STUDY AND ANALYSIS OF REMOTE SENSING DATA PARALLEL PROCESSING

This paper analyzes a variety of remote sensing data processing procedures and explores the common mathematical models, common algorithm models, and public function processing units shared by different tasks, or even by different parts of an individual task. Public modules are established to improve the parallelism of remote sensing data processing on FPGA, which has excellent parallel processing performance. In addition, in order to reduce resource consumption and increase the calculation efficiency of the designed FPGA program, methods of avoiding floating-point arithmetic and division operations in FPGA programming are discussed. There are a large number of common calculation modules between different tasks, such as the rotation matrix calculation module in the attitude solution, geometric correction, and orthorectification tasks. Image preprocessing, feature information extraction, image threshold separation, and connected region marking are all common processing modules for target detection tasks. Common calculation modules also exist within a single task. When designing a program for FPGA, powers of 2 can be used to convert floating-point operations to fixed-point operations with acceptable precision. A similar approach can transform the division operation into multiplication and shift operations, thereby improving the computational performance of FPGA programming.


INTRODUCTION
With the development of new Earth observation satellites, the spatial, spectral and temporal resolutions of satellite sensors have been continuously improved, and the volume of Earth observation data obtained by remote sensing has grown dramatically. Space applications therefore urgently require higher data processing accuracy and speed. When processing massive remote sensing data, the traditional uniprocessor serial program is limited by its lack of parallelism, which is a serious drawback for real-time data processing. As a result, the application of remote sensing images is restricted, especially in real-time disaster monitoring and military applications. In view of these problems, it makes great sense to research the parallel processing of remote sensing data on high-performance clusters. With the development of MEMS (Micro-electromechanical Systems) and sensing technology in the past several decades, the application of field programmable gate array (FPGA) technology has greatly improved data processing efficiency, and FPGA technology has been widely used in the parallel processing of remote sensing data. To optimize the parallelism of an algorithm designed for FPGA-based parallel processing of remote sensing data, adequate attention should be given to feasibility analysis and design strategy. For feasibility analysis, it is necessary to clarify the general task flow of remote sensing data processing, so that the associated attributes and inheritance relationships between different data processing tasks can be identified. In addition, the detailed algorithm implementation process of each task provides a way to establish parallelism between the algorithms of the tasks and thereby reduce the required calculation time. With an appropriate design strategy, both calculation and resource usage can be optimized, as long as solution accuracy is guaranteed.
Considering these two concerns, this paper aims to improve the performance of FPGA-based parallel processing of remote sensing data.

General procedure for remote sensing data processing
The flow chart of data processing in common remote sensing technology is shown in Fig. 1 (Gu et al., 2016). As can be seen from the flow chart, the data processing can be roughly divided into several stages according to task type: the image preprocessing stage, the image processing stage, the information extraction stage, the knowledge application stage, and the remote sensing application stage. After the remote sensing images have been preprocessed by atmospheric and radiation correction, attitude solution, geometric correction and orthorectification are the key to image information extraction and remote sensing applications. Finally, the remote sensing products required by the user are obtained through information extraction, image classification, and inversion of the parameters for the region of interest.

Radiation calibration data processing and algorithm analysis:
The data processing of the radiometric calibration mainly includes acquisition of original images under different brightness levels with the light homogenizing device, and determination of the uniformity and oversaturation of the image. The non-uniformity correction coefficient is calculated by the two-point method, which provides an approach to obtain a uniform image. With the radiation calibration coefficient, which is also calculated by the two-point method, the radiation corrected image is finally obtained.

Figure 2. Radiation calibration data processing flow chart

Cloud detection data processing flow and algorithm analysis:
After the radiation pre-processing, the next step is cloud detection on the image. In this task, images with higher cloud coverage are directly removed, in order to reduce the number of images to be processed and improve the efficiency of the subsequent remote sensing tasks. The main data processing flow of cloud detection includes image feature information extraction, image threshold separation, and labelling of connected components. The data processing flow chart is as follows:

The attitude calculation consists of (1) relative attitude calculation and (2) absolute attitude calculation.
The relative attitude calculation includes detection and matching of local features, mismatch elimination, and relative attitude parameter calculation. The absolute attitude calculation includes detection and matching of local features, mismatch elimination, and absolute attitude parameter calculation. By analyzing the processing flows of the relative and absolute attitude calculations, it can be found that the two tasks share the method for eliminating mismatched feature points and both use quaternions to calculate the rotation matrix. Therefore, parallel optimization can be performed on the algorithms common to the relative and absolute attitude calculations.

Satellite orthorectification data processing flow and algorithm analysis:
The calculation process of this task mainly includes the following steps. A rigorous geometric processing model is established, and the pixel coordinates and geodetic coordinates of the "virtual control points" are obtained with it. After the Rational Function Model (RFM) is established, the relevant model parameters are determined by the least squares method. The RFM is then simplified by an optimization based on the irrelevance of the RFM parameters, which reduces the number of model parameters. The orthoimage position and its numbers of rows and columns are obtained through the rigorous geometric processing model, and subsequently the orthophoto pixel coordinates and geodetic coordinates are calculated. By substituting the orthophoto pixel coordinates and geodetic coordinates into the optimized RFM, the pixel coordinates in the original image are derived, and the grey value is determined by bilinear interpolation. The determined grey value is then assigned to the orthophoto pixel, which completes the orthorectification. The main flow chart for data processing is shown in Fig. 6.

Figure 6. Data processing flow chart for orthophoto correction without ground control point

Target detection data processing flow and algorithm analysis:
Taking ship target detection as an example, the main data processing flow includes image data preprocessing (filtering, denoising, enhancement), target feature extraction and target discrimination, and connected area labelling. The main calculation process is shown in Fig. 7.

According to the above analysis, the common processing modules between different tasks are:
(1) The image grey value calculation module in the radiometric calibration part is also applicable to target detection tasks such as cloud detection and ship monitoring, where it can mainly be used for image enhancement.
(2) The common processing modules among attitude calculation, geometric correction and orthorectification mainly include the module that calculates the off-diagonal elements of the rotation matrix with the quaternion method instead of Euler angles, and the least squares solution module.
(3) The cloud detection and ship target detection tasks share more common processing modules, including the filtering and denoising module in image preprocessing, the image threshold segmentation module, the binarization module, the connected area labelling module, and the redundant data elimination module.
There are also common processing modules within an individual task, including:
In the radiation calibration part, the two-point method module is used to calculate not only the non-uniformity correction coefficients but also the radiation calibration coefficients.
In the attitude calculation part, the absolute attitude and relative attitude calculations share the eigenvalue description sub-calculation module and the mismatch
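The threshold segmentation, binarization and connected area labelling modules listed in item (3) as shared by cloud detection and ship target detection can be illustrated with a minimal software sketch. It is not the FPGA implementation of the paper: the function names `threshold` and `label_components`, the 4-connectivity rule, the threshold level and the sample image are all assumptions chosen for illustration.

```python
from collections import deque


def threshold(image, level):
    """Binarize a grey image: 1 where the pixel is >= level, otherwise 0."""
    return [[1 if px >= level else 0 for px in row] for row in image]


def label_components(binary):
    """Label 4-connected foreground regions with 1, 2, ...; return (labels, count)."""
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue  # background pixel, or already labelled
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill of one region
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and binary[ny][nx] == 1 and labels[ny][nx] == 0:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count


# Hypothetical 3x4 grey image with two bright regions.
IMAGE = [
    [200, 210, 10, 10],
    [190, 20, 10, 220],
    [10, 10, 10, 230],
]
BINARY = threshold(IMAGE, 128)
LABELS, COUNT = label_components(BINARY)
```

Because both detection tasks would call the same two functions with different threshold levels, the sketch also shows why a single shared module can serve several tasks.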

REMOTE SENSING DATA PARALLEL PROCESSING DESIGN METHOD BASED ON FPGA
Remote sensing data processing on FPGA is programmed in a hardware description language (HDL), and the remote sensing data processing tasks are implemented in the FPGA. FPGA programming has to improve calculation accuracy and efficiency while reducing resource consumption, so the processing algorithms for the original data are optimized to make them more convenient for FPGA implementation. Remote sensing data processing consists of multiple tasks. Each task consists of specific algorithm modules, each algorithm module has specific implementation steps, and each implementation step includes multiple computational units. Establishing common calculation units facilitates the application of parallel algorithms to different calculation modules, and thereby improves the parallel processing efficiency of each task.

Remote Sensing Data Processing Design Flow Based on FPGA
The design and implementation of an FPGA-based system for remote sensing data processing tasks usually adopts a top-down hierarchical design strategy. The task is the top level of the hierarchy, each calculation module of the task is a second-level module, and the solution steps in each calculation module are further refined into third-level elements, until the top-level design task is fully decomposed into basic FPGA computing units or IP cores. Consequently, remote sensing data processing can be roughly divided into first, second and third layers of operation according to the hierarchy. The top layer is the task layer, which mainly includes radiation calibration, cloud detection, satellite attitude calculation, geometric correction, orthorectification, and target recognition (flood, ship, etc.). The second layer is the calculation module layer, which contains the calculation modules of each task. The third layer is the computing unit layer, which mainly includes arithmetic operations, logical operations, bitwise operations, relational operations, equality operations, shift operations, conditional operations, and the like. Among them, the most frequently used units are addition, subtraction, multiplication, division, logarithm, and matrix operations. There are a large number of matrix operations in remote sensing data processing algorithms, including matrix addition, matrix multiplication (fast matrix multiplication), matrix inversion, matrix hybrid product, matrix determinant, etc.
Through hierarchical analysis of the data processing algorithm flow of each task, the algorithm relationships between different tasks can be obtained, and the parallelism of the remote sensing tasks can then be planned for FPGA-based parallel processing.
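The hierarchical analysis described above can be sketched in software as a mapping from tasks to calculation modules, from which the modules shared by several tasks are extracted. This is only an illustration: the dictionary `TASK_MODULES` and the function `shared_modules` are hypothetical names, and the module lists are a simplified reading of the common modules named in this paper rather than a complete decomposition.

```python
# Task layer -> calculation module layer (simplified, illustrative assignment).
TASK_MODULES = {
    "radiation calibration": {"two-point method"},
    "cloud detection": {
        "filtering and denoising",
        "threshold segmentation",
        "connected area labelling",
    },
    "ship target detection": {
        "filtering and denoising",
        "threshold segmentation",
        "connected area labelling",
    },
    "attitude calculation": {
        "quaternion rotation matrix",
        "least squares solution",
    },
    "geometric correction": {
        "quaternion rotation matrix",
        "least squares solution",
    },
    "orthorectification": {
        "quaternion rotation matrix",
        "least squares solution",
        "bilinear interpolation",
    },
}


def shared_modules(task_modules):
    """Map each module used by two or more tasks to the sorted list of those tasks."""
    users = {}
    for task, modules in task_modules.items():
        for module in modules:
            users.setdefault(module, []).append(task)
    return {m: sorted(tasks) for m, tasks in users.items() if len(tasks) > 1}


SHARED = shared_modules(TASK_MODULES)
for name in sorted(SHARED):
    print(name, "->", ", ".join(SHARED[name]))
```

Modules that appear in the result are candidates for a single public implementation; modules used by only one task, such as bilinear interpolation in this simplified table, are left out.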

Public Model of Remote Sensing Data Processing
To study the parallel data processing of remote sensing based on FPGA, it is necessary to examine the relationships among the tasks, algorithm modules and computing units, and to design common modules according to the common mathematical models, common typical algorithms and public functions shared by different tasks, in order to improve the efficiency of remote sensing data processing. Because of the characteristics of FPGA programming, the design of these common models, algorithms and functions requires corresponding improvements to remote sensing data processing in terms of precision, efficiency and resource consumption.

Public mathematical model of remote sensing data processing based on FPGA:
When calculating the satellite attitude angles on FPGA, the traditional Euler angle model involves a complex rotation matrix structure, which requires a large number of trigonometric functions. The calculation consumes a lot of hardware resources, and the number of iterations is unfavorable to the real-time processing of remote sensing data. In addition, when the Euler angles are derived in reverse from the rotation matrix, the non-uniqueness of the result leads to convergence problems. As a result, in order to facilitate the application of FPGA to attitude calculation, a quaternion model is introduced to replace the Euler angle model (Diebel J, 2006).
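According to the quaternion definition, the rotation from the coordinate system $C_a$ to the coordinate system $C_b$ is described by the rotation matrix $R$:

$$R = \begin{bmatrix} d^2+a^2-b^2-c^2 & 2(ab-cd) & 2(ac+bd) \\ 2(ab+cd) & d^2-a^2+b^2-c^2 & 2(bc-ad) \\ 2(ac-bd) & 2(bc+ad) & d^2-a^2-b^2+c^2 \end{bmatrix} \tag{1}$$

where $R$ is the quaternion expression of the rotation matrix, which can describe the rotation of a rigid body by an arbitrary angle, and $d$, $a$, $b$ and $c$ are the scalar part and the three imaginary parts of the unit quaternion. This matrix is identical to the traditional orthogonal rotation matrix used in photogrammetry. When the relationship between $C_a$ and $C_b$ is described with Euler angles, $R$ is expressed as the product of three rotations about the coordinate axes, and each rotation about a coordinate axis can be represented by a quaternion. Denoting the rotation angles about the x, y and z axes by $\omega$, $\varphi$ and $\kappa$:

$$q_z = \cos\frac{\kappa}{2} + 0i + 0j + \sin\frac{\kappa}{2}\,k \tag{3}$$

$$q_y = \cos\frac{\varphi}{2} + 0i + \sin\frac{\varphi}{2}\,j + 0k \tag{4}$$

$$q_x = \cos\frac{\omega}{2} + \sin\frac{\omega}{2}\,i + 0j + 0k \tag{5}$$

The quaternion of the transformation from $C_a$ to $C_b$ is:

$$q_a^b = q_x q_y q_z \tag{6}$$

Expansion of the above equation gives the components of the quaternion $q_a^b = d + ai + bj + ck$ in terms of the Euler angles:

$$d = \cos\frac{\omega}{2}\cos\frac{\varphi}{2}\cos\frac{\kappa}{2} - \sin\frac{\omega}{2}\sin\frac{\varphi}{2}\sin\frac{\kappa}{2} \tag{7}$$

$$a = \sin\frac{\omega}{2}\cos\frac{\varphi}{2}\cos\frac{\kappa}{2} + \cos\frac{\omega}{2}\sin\frac{\varphi}{2}\sin\frac{\kappa}{2} \tag{8}$$

$$b = \cos\frac{\omega}{2}\sin\frac{\varphi}{2}\cos\frac{\kappa}{2} - \sin\frac{\omega}{2}\cos\frac{\varphi}{2}\sin\frac{\kappa}{2} \tag{9}$$

$$c = \cos\frac{\omega}{2}\cos\frac{\varphi}{2}\sin\frac{\kappa}{2} + \sin\frac{\omega}{2}\sin\frac{\varphi}{2}\cos\frac{\kappa}{2} \tag{10}$$

The Euler angles can in turn be calculated from the quaternion through arctangent expressions of the elements of $R$.
The use of quaternions instead of Euler angles greatly simplifies the calculation of angles in attitude calculation, geometric correction and orthorectification. When the rotation matrix model is designed on FPGA, the calculation of the off-diagonal elements of $R$ involves the same variables: $R_{12}$ and $R_{21}$ share the products $ab$ and $cd$; $R_{13}$ and $R_{31}$ share $ac$ and $bd$; $R_{23}$ and $R_{32}$ share $bc$ and $ad$. As a result, the FPGA needs to calculate each of these products only once when computing $R$, which reduces the use of floating-point multipliers and improves data processing parallelism.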
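The reuse of shared products in the off-diagonal elements can be sketched as follows. The sketch rests on assumptions that are not confirmed by the surviving text: a unit quaternion $q = d + ai + bj + ck$ with scalar part $d$, the Hamilton product with composition order $q_x q_y q_z$, and rotation angles named `omega`, `phi` and `kappa` about the x, y and z axes. A different sign convention would change some signs but not the sharing pattern. The function names are hypothetical.

```python
import math


def euler_to_quaternion(omega, phi, kappa):
    """Compose q = q_x * q_y * q_z (assumed order) and return (d, a, b, c)."""
    cx, sx = math.cos(omega / 2), math.sin(omega / 2)
    cy, sy = math.cos(phi / 2), math.sin(phi / 2)
    cz, sz = math.cos(kappa / 2), math.sin(kappa / 2)
    d = cx * cy * cz - sx * sy * sz
    a = sx * cy * cz + cx * sy * sz
    b = cx * sy * cz - sx * cy * sz
    c = cx * cy * sz + sx * sy * cz
    return d, a, b, c


def rotation_matrix(d, a, b, c):
    """Build R from a unit quaternion, computing each shared product once."""
    # Six cross products serve all six off-diagonal elements.
    ab, cd = a * b, c * d  # shared by R12 and R21
    ac, bd = a * c, b * d  # shared by R13 and R31
    bc, ad = b * c, a * d  # shared by R23 and R32
    dd, aa, bb, cc = d * d, a * a, b * b, c * c
    return [
        [dd + aa - bb - cc, 2 * (ab - cd), 2 * (ac + bd)],
        [2 * (ab + cd), dd - aa + bb - cc, 2 * (bc - ad)],
        [2 * (ac - bd), 2 * (bc + ad), dd - aa - bb + cc],
    ]


Q = euler_to_quaternion(0.1, 0.2, 0.3)
R = rotation_matrix(*Q)
```

Computing the off-diagonal elements independently would take twelve multiplications for the cross products; sharing them needs six, which is the saving the paragraph above refers to.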

Common computing unit for data parallel processing based on FPGA:
(1) Matrix inversion module - LU matrix decomposition: Matrix inversion is needed in the attitude calculation, geometric correction, and orthorectification tasks. In FPGA-based remote sensing processing, if the inversion is performed directly by the adjoint matrix method, more hardware resources are consumed and the desired real-time performance cannot be provided. It is therefore necessary to find an alternative for implementing matrix inversion in the FPGA. The essence of the LU decomposition algorithm used here is to decompose the large matrix into several small matrices, so that standard LU decomposition can be applied to the small matrices. The inversion process is the same for matrices of different sizes. Here, a 5×5 matrix is taken as an example. First, B is the matrix to be inverted, and it is partitioned in the following way:

$$B = \begin{bmatrix} b_{11} & b_{12} & b_{13} & b_{14} & b_{15} \\ b_{21} & b_{22} & b_{23} & b_{24} & b_{25} \\ b_{31} & b_{32} & b_{33} & b_{34} & b_{35} \\ b_{41} & b_{42} & b_{43} & b_{44} & b_{45} \\ b_{51} & b_{52} & b_{53} & b_{54} & b_{55} \end{bmatrix} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix} \tag{17}$$

According to the definition of the LU decomposition algorithm, the relations between the blocks of $B$ and the blocks of its factors $L$ and $U$ can be obtained; they are referred to below as equations (20) and (21).

Figure 9.

The calculation process is as follows:
1) $B_{11}$ is decomposed using the LU module to obtain $L_{11}$ and $U_{11}$;
2) $L_{11}$ and $U_{11}$ are inverted and substituted into equation (20) to get $U_{12}$ and $L_{21}$;
3) $U_{12}$ and $L_{21}$ are substituted into equation (21)

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020 International Conference on Geomatics in the Big Data Era (ICGBD), 15-17 November 2019, Guilin, Guangxi, China

(2) Module for matrix multiplication: The time complexity of sequentially performing matrix multiplication is $O(n^3)$. Therefore, as the matrix dimensions increase, the time required for sequential matrix multiplication increases dramatically.
In order to obtain the result of matrix multiplication efficiently, matrix multiplication can be accelerated in hardware by the FPGA through a parallel structure. This structure is mainly composed of mutually independent multiply-accumulate processing elements (PEs). Since the PEs do not need to exchange data or communicate with each other, the parallel structure has good scalability. With sufficient hardware resources, matrix multiplication of arbitrary dimensions can theoretically be realized. Taking A×B=C as an example, the parallel computation of matrix multiplication is described as follows. The matrices A, B, and C are each of size 39×39. As shown in Figure 10, A0101~A3939 and B0101~B3939 are the elements of matrix A and matrix B, respectively, and C0101~C3939 are the elements of matrix C. To ensure a correct result, the elements of A and B must be sent to the corresponding PEs simultaneously and in parallel; matrix A sends its data in column-major order and matrix B in row-major order. At time t, A0101 is sent to the PEs of the first row (i.e. PE0101~PE0139), A0201 to the PEs of the second row (i.e. PE0201~PE0239), and so on, until A3901 is sent to the PEs of the 39th row (i.e. PE3901~PE3939); likewise, B0101 is sent to the PEs of the first column (i.e. PE0101~PE3901), and so on, until B0139 is sent to the PEs of the 39th column (i.e. PE0139~PE3939). After these data complete the multiply-accumulate operation, the second column of matrix A and the second row of matrix B are sent to the corresponding PEs in the same manner, and the elements C0101 to C3939 of matrix C are updated.
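The data flow described above can be mimicked in software to check the schedule. The following sketch is a sequential simulation, not hardware code: the function name `pe_array_matmul` is hypothetical, and the two inner loops stand in for PEs that would all update their accumulators at the same time step on an FPGA.

```python
def pe_array_matmul(A, B):
    """Simulate an n x p array of multiply-accumulate PEs computing C = A x B.

    At step k, column k of A is broadcast along the PE rows and row k of B
    along the PE columns; PE(i, j) adds A[i][k] * B[k][j] to its accumulator.
    """
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]  # one accumulator per PE
    for k in range(m):  # one time step per column of A / row of B
        col_a = [A[i][k] for i in range(n)]
        row_b = B[k]
        # In hardware, all PEs perform this update concurrently.
        for i in range(n):
            for j in range(p):
                C[i][j] += col_a[i] * row_b[j]
    return C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = pe_array_matmul(A, B)
```

Only the outer loop over `k` corresponds to elapsed time in the parallel structure, so a 39×39 product would need 39 multiply-accumulate steps once every PE has its own multiplier.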
When the elements of the last column of matrix A and the elements of the last row of matrix B have completed the multiply-accumulate operation according to the above procedure, the final matrix C is obtained. From the above analysis, the time complexity of matrix multiplication is reduced from $O(n^3)$ for sequential execution to $O(n)$ for parallel execution (Wu, 2011).
The optimized algorithm saves one multiplier and avoids floating-point operations. Therefore, reasonable algorithm optimization can reduce the consumption of hardware resources.
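As a hedged illustration of how a floating-point coefficient can be replaced by fixed-point arithmetic scaled by a power of 2, consider the sketch below. The number of fractional bits, the function names `to_fixed` and `fixed_mul`, and the sample coefficients are assumptions for illustration; the actual word lengths depend on the accuracy the task tolerates.

```python
FRACTION_BITS = 10  # hypothetical choice: coefficients are scaled by 2**10


def to_fixed(value, bits=FRACTION_BITS):
    """Approximate a floating-point coefficient by an integer multiple of 2**-bits."""
    return round(value * (1 << bits))


def fixed_mul(x, coeff_fixed, bits=FRACTION_BITS):
    """Multiply integer x by a fixed-point coefficient using integer multiply and shift.

    Adding half of 2**bits before the right shift rounds to the nearest
    integer instead of truncating.
    """
    return (x * coeff_fixed + (1 << (bits - 1))) >> bits


COEFF = to_fixed(0.299)        # integer stand-in for the coefficient 0.299
RESULT = fixed_mul(200, COEFF)  # approximates 200 * 0.299 = 59.8
```

The quantization error of the coefficient is at most half of 2 to the power of minus `FRACTION_BITS`, so increasing the number of fractional bits trades multiplier width for accuracy.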

Method of avoiding division
The division operation consumes a large amount of hardware resources. When a division is performed, the denominator can be approximated with powers of 2, provided the accuracy remains acceptable. After this conversion, the division is replaced by multiplication, shift, and addition operations. This method can be used, for example, to calculate the mean value of a 9×9 image without a division. In this case, it is only necessary to judge whether $a$ is less than or equal to $bN$. With the above methods, floating-point operations and division operations can be avoided as much as possible when designing the algorithm, thereby reducing hardware resource consumption and improving parallel computing speed.
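Two common ways of avoiding a division can be sketched for the 9×9 mean. Both are illustrations under stated assumptions rather than the formula of the paper: the 16-bit scaling, the 8-bit pixel range, and the names `mean_from_sum`, `mean_9x9` and `mean_at_most` are hypothetical. The first replaces division by 81 with a multiplication by a precomputed reciprocal and a shift; the second compares a sum against a threshold multiplied by the pixel count, so that no mean has to be computed at all.

```python
SHIFT = 16
PIXELS = 81                                 # 9 x 9 window
RECIP_81 = round((1 << SHIFT) / PIXELS)     # 809, approximates 2**16 / 81


def mean_from_sum(total):
    """Approximate total / 81 with one multiplication, one addition and one shift."""
    return (total * RECIP_81 + (1 << (SHIFT - 1))) >> SHIFT


def mean_9x9(window):
    """Mean grey value of a 9x9 window of 8-bit pixels, computed without division."""
    return mean_from_sum(sum(sum(row) for row in window))


def mean_at_most(total, limit, n):
    """Decide whether total / n <= limit by testing total <= limit * n instead."""
    return total <= limit * n


WHITE = [[255] * 9 for _ in range(9)]
MEAN_WHITE = mean_9x9(WHITE)
```

For 8-bit pixels the approximated mean stays within about half a grey level of the exact value, and the threshold test is exact because multiplying both sides of the comparison by a positive count does not change its outcome.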

CONCLUSION
The algorithms of remote sensing data processing are analyzed through a hierarchical design strategy. There are a large number of public data processing modules shared between different remote sensing tasks, including public mathematical models, common algorithm modules, and public computing units. The public data processing modules identified by the analysis can be algorithmically optimized for implementation in the FPGA. Among them, the analysis focuses on the quaternion model, which is the common mathematical model of the rotation matrix in the geometric correction and orthorectification parts, on the LU decomposition matrix inversion module, and on the parallel matrix multiplication module. The establishment of these public modules on FPGA can increase the parallelism of the algorithms of each task. According to the characteristics of FPGA technology, methods of avoiding floating-point operations and division operations are analyzed in order to decrease hardware resource consumption and increase calculation efficiency with acceptable accuracy. By using powers of 2 to approximate floating-point numbers and denominators, floating-point operations and division operations can be transformed into multiplication and shift operations, which are advantageous for FPGA implementation. With the above data processing methods, the parallel processing of remote sensing data on FPGA can achieve more efficient computing performance.