FPGA-based on-board Cubic Convolution Interpolation for Spaceborne Georeferencing

: This paper proposes a novel low-complexity implementation with one parameter two dimensions cubic convolution interpolation (2D-CCI) of image interpolation for spaceborne georeferencing. The experiment results demonstrate that (1) the proposed scheme can achieve better performance without increasing the computational complexity compared to the traditional CCI; (2) the number of multiplication and addition is reduced to 33.33% and 40% respectively compared with original CCI; (3) the occupied resources of LUTs, register, DSP and memory are reduced to 23.7%, 21.9%, 18.5% and 25%, respectively; (4) the speedup time of the proposed Implementation is about 25 times higher than that when using the PC-based implementation.


INTRODUCTION
Cubic convolution interpolation (CCI) is an originally process for the development for the reconstruction (i.e., interpolation, resampling) of Landsat digital image (Rifman, 1973(Rifman, , 1974Robert, 1981;. The function has a good comprise between computational complexity and reconstruction. However, the traditional CCI consumes resource in hardware implementation. To solve this problem, this paper proposes an improved parallel pipelining two dimensional CCI (PP-2D-CCI) method based on field programmable date array (FPGA). Geometric calibration is one of the most important steps for remotely sensed (RS) imagery . With increasing demands in real-time or near real-time, such as military, disaster monitoring. The on-board implementation of RS processing has been investigated by many scientists and scholars (Zhou, 2017). To improve the georeferencing, this paper development the PP-2D-CCI scheme for real-time interpolation. The architecture of indirect georeferencing is descripted in Figure 1. All the function modules are managed by the timing control module. Mathematical distortion module is used to choose a suitable mathematical distortion model, which describes the relationship between image coordinate and geodetic coordinate. The coordinate transformation module is implemented pixel-bypixel through the image coordinate (X ref , Y ref ), and the corresponding transformed coordinate (x, y) are calculated at the same clock. In general, the value of (x, y) coordinate will not be integer. Hence, a new coordinate must be estimated by an interpolation process. The paper focuses on the interpolation of the spaceborne georeferencing. For the details of other modules, it can be referenced to Zhou et al. (Zhou, 2010;Liu, 2019;Zhou, 2016). The rest of work is organized as follow: Section 2 overviews the previously relevant work. Section 3 overviews of the cubic convolution interpolation and derives the proposed PP-CCI kernels. An FPGA-based implementation is presented in Section 4. The experiment and discussion are presented in Section 5. Finally, conclusions are given in the last section.

RELEVANT WORK
To correct the distortion, it must essentially reposition pixels from their original location into a specified reference grid. RS Imagery georeferencing includes mathematical distortion module, coordinate transformation and resampling (interpolation) (Zhou, 2016). Interpolation is fundamental to digital image processing, especially in operations that require image resampling, such as scaling, registration, warping, and correction for geometric distortions (Schowengerdt, 2006). The convolution interpolation method for RS imagery georeferencing includes nearest-neighbor interpolation (Toutin, 2004), bilinear interpolation (Zhang, 2005), cubic convolution interpolation (Rifman, 1973(Rifman, , 1974Robert, 1981;, and cubic B-spline interpolation (Arif, 2005). CCI function has been widely used to image interpolation and provided a good compromise between computational complexity and accuracy. Reichenbanch et al. (2001) proposed an improved 2D-CCI method for image reconstruction that is implemented the most general two-dimensional, non-separable, piecewise cubic interpolator with constraints for symmetry, continuity, and smoothness, and overcame the disadvantage of the traditional approach uses a separable convolution kernel. Meijering et al. (2018) established a link between classical interpolation and modern convolution-based interpolation and discussed the computational differences and given examples of other cubic interpolation schemes not previously studied in signal and image processing. Shi et al. (2003) presented 2D-3PCC and 2D-5PCC methods that were low-order polynomials with small spatial support and easy to implement and efficient to apply. Zhou el al. (2016) proposed a novel edge directed CCI scheme which can been adapt to the varying edge structure of image. Hilal (2018) proposed a new resampling interpolation kernel which was depended on five independent parameters that controlled its amplitude, angular frequency, standard deviation, and duration. Now, the CCI method are widely used in image processing, but the traditional CCI is still compute complex for hardware implementation. FPGAs have been becoming the standard for on-board RS processing due to their smaller size, weight, and power consumption compared to other high-performance computing systems. However, the interpolation method for georeferencing on-board has not yet been reported worldwide so far. And conventional computing scheme are unsuitable for real time processing of image, since the computing power required to process large volumes of image data is enormous. To solve these problems, this paper presents an improved on-board cubic convolution interpolation based on FPGA for RS imagery georeferencing.

Basic Concept of CCI Method
If ( ) f x is a sampled function, g( ) x is the corresponding interpolation function. Mathematically speaking, the interpolation function in one dimensional (1D) is : where, k c is the coefficient which is determined from the resampled data. h is the resampling increment. k x represents the interpolation node, ( ) K s signifies the interpolation kernel.
The basic function of the interpolation is that the value at the interpolation notes or resample spots is equal to the value of the resampled data (Robert, 1981) (2) CCI uses a separable piecewise cubic convolution (PCC) interpolation kernel with the constraint that it is continuous and has a continuous first derivative. These constraints result in the following kernel of Equation (3) (Rifman, 1973(Rifman, , 1974Robert, 1981;. The parameter a has a physical significance in Equation (3), it is the slope of K(s) at  1 s . The common value is 1 a   (Rifman, 1973(Rifman, , 1974, which duplicates the slope of the ideal interpolator at  1 s (also called standard cubic convolution), however, it was proven that (Robert,1981) 0.5 a   is superior to  1 a in that it provides an interpolation with better convergence properties. An intermediate choice, 0.75 a   , is also sometimes advocated, this choice results from forcing Equation (3) to have second derivative continuity at  1 s (Hou, 1978). The various members of interpolation family are illustrated in Figure 2 (time domain). Since ( ) K s is an even function, only values  0 s is shown in Figure 2. Now support that x is any point at which the sampled data is to be interpolated, then x must be between two consecutive interpolation nodes which can be denote (3) is substituted into Equation (1), the CCI function is obtained. (4) reduces to (Shlien, 1979;Bailey, 2011): The resulting one-dimensional interpolation weights two samples on either side of the desired data point, giving a region of support of four samples. Extending a one-dimensional cubic convolution to two dimensional (2D) cubic convolution interpolation, which requires sixteen (4*4) spots near the resampled spot. Figure 3 displays the processing of the 16-2D-CCI.
Equation 6 is rewritten as matrix form, The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020 International Conference on Geomatics in the Big Data Era (ICGBD), 15-17 November 2019, Guilin, Guangxi, China where, ( , ) x y is coordinate of the image after interpolation. i and j are the integer part of x and y , respectively; v and u are the fractional part of x and y , respectively.  I  I  I  I  I  I  I  I  I  I  I  I   I  I  I  I .

3.2.1
Designing Memory Access: In order to improve the efficiency of memory access, a multi-array memory architecture is designed. Assuming the size of RS image is  m n . As such m n  memory unites are designed in double data rate (DDR).
The relationships between the address of DDR and (i,j) coordinate can be described as Equation (8).
where, i and j are the row and column of the raw image, respectively.
ij addr is the corresponding address of the memory.
3.2.2 Optimization the kernel function: Substituting the weight into equation (3) and simplifying by the power of the weight, Equations (9) thru (10) can be defined as (Robert, 1981).
From Equation (14), the proposed architecture of PP-2D-CCI is descripted in Figure 5, which are consist of four modules: decomposition coefficient module, Reading ROM module, Calculation weight module, and cubic convolution interpolation module. The function of decomposition coordinate module aims to generate the ROM address (the integer part of the coordinates x_row and y_colm, respectively) and the weights (the fractional part of the coordinates x_row and y_colm, respectively) that are needed to be interpolated. The reading ROM module is used to compute sixteen ROM address and read the corresponding value. The computation weight module parallel calculates the weight of the interpolation point in the horizontal and vertical directions according to the interpolation kernel. CCI module is used to accomplish the piecewise cubic convolution interpolation. All the function modules are managed by the timing control module. Figure 5. The FPGA-based system architecture

Parallel Decomposition of (x_row, y_colm ) Coordinate
Generally, the coordinate is floating-point, but the ROM address is integer, in order to reduce the resource consumption of the FPGA, the strategy of mixed data type is adopted, and the fixed-to-float IP core is used to realize the transformation. The specific flowchart is shown in Figure 6. Figure 7 displays the parallel decomposition of coordinate, which includes six subtraction and two absolute operation are implemented through Xilinx's IP cores. x_row and y_colm are parallel decomposed at the same clock.

FPGA-Based Parallel Reading Memory
From Equation (8), An architecture of reading memory is designed, which includes implementation address and reading memory. The structure is illustrated in Figure 8. A strategy for converting multiplication operations into left-shift operations is adopted, which not only reduces the consumption of FPGA resources, but also raises the speed of operation. Eleven adders, two subtractions, and three RTL left shifts are used to implement this system. Figure 8. FPGA-based reading DDR memory Figure 9 shows the architecture of parallel calculating the weights u and v, sixteen multiplies, eight adders and six add3 modules (containing twelve adders) are used. At the timing controlling, the results are output into the next module (interpolation module) at the same clock cycle.

FPGA-Based Parallel Compute the Interpolation
After implementing the matrices of u K , v K and I , the value of interpolation can be computed from Equation (14). The parallel architecture was designed in Figure 10. Twenty multipliers, and fifteen adders are used in this architecture. Figure 10. The architecture of PP-CCI.

The Software and Hardware Environment
The proposed architecture is implemented on a custom-designed board which contains a Xilinx Artix-7 XC7VX980-tffg1930-1 FPGA. The selected FPGA has 612,000 Logic Cells, 1,500kb Block RAMS, 1,224,000 Flip-Flops and 3,600 DSP Slices. In addition, the design tool is Vivado 2014.2, the simulation tool is ModelSim SE-64 10.4, and hardware design language is Verilog HDL. To validate the proposed method, the georeferenced algorithm is also implement using MATLAB R2014a on PC with Windows 7(64 bit) operation system, which is equipped with an Inter(R) Core i7-4790 CPU @ 3.6 GHz and 8 Gb RAM.

Data
To validate the proposed system based on an FPGA, 256*256 pixels SPOT TM data is used to perform the interpolation for Georeferenced image. The data set is bldt_tm.img from the ENVI software, with its own control points file is bldt_tm.pts. The bldt_tm.img image with the UMT protection, the zone of 13°, the resolution of 30m, the band of three, the wavelength range of 0.63～0.69μm. Figure 11 displays the raw image. with different parameter a of PP-2D-CCI method based on FPGA Various members of this family of interpolators corresponding to a=l, 0, -0.5, -0.75, -1, and -2 are compared with the same data, as shown in Figure 12. By visual inspection, when a=1, and 0, the interpolated image texture information is lost more, and the interpolation effect is significantly reduced. When a=-0.5, -0.75, -1, and -2, the interpolated image contains more texture information, and the interpolation effect is better than that of a=1, and 0. (2) Comparison with different interpolation method To validate the accuracy of proposed interpolation method, nearest neighbour interpolation, bilinear interpolation, and CCI algorithms are executed based on MATLAB. The interpolation images of nearest neighbour interpolation, and bilinear interpolation are shown in Figure 13. Through the visual check Figure 13 and 12, it can be found that the result of CCI interpolation georeferenced image has the enough smoothness and visual effect is better than that of the bilinear interpolation and nearest neighbour interpolation. However, the CCI algorithm consumes more resources.  Figure 14. Table 1 lists the comparison of interpolation accuracy. It can be found that the gray value of CPs obtained by PP-2D-CCI are approximately equal, respectively. respectively. There are 69371 LUTs, 107964 registers, 1368 RAM/FIFO, and 481 DSP48s are used with utilization rates of 11.34%, 8.82%, 30.40%, and 7.42%, respectively in the 16-2D-CCI algorithm. In PP-2D-CCI scheme, 52932 LUTs, 84314 registers, 1026 RAM/FIFO, and 292 DSP48s are used with utilization rates of 8.65%, 6.89%, 68.4%, and 10.59%, respectively. The results of the utilization ratio of the logical unit for the bilinear interpolation, 16-2D-CCI and PP-2D-CCI scheme are shown in Figure 15. From Figure 15, the utilization rate of PP-2D-CCI scheme is higher than that of bilinear interpolation algorithm, but lower than that of 16-2D-CCI method. Comparison with PP-2D-CCI and 16-2D-CCI, LUTS, register, memory, and DSP48s are reduced by 16439, 23650, 342, and 89, respectively. The reduction rates are 23.7%, 21.9%, 25% and 18.5%, respectively.
(a) (b) Figure 15. The utilization ratio of logic units for bilinear interpolation method, 16-2D-CCI algorithm, and PP-2D-CCI scheme. (a) The occupied FPGA resources. (b) The utilization ratio of logic units.

Processing speed comparison:
The processing speed is one of the most important factors on-board georferencing [6]. To evaluate the speedup of the proposed method. This paper compares the speedup of the bilinear interpolation, 16-2D-CCI and proposed PP-2D-CCI FPGA-based with PC-based. The clock frequency is defined as 100MHz in FPGA architecture.

CONCLUSION
This paper develops an improved to the cubic convolution interpolation based on FPGA. The PP-2D-CCI method is a new technique for resampling of RS image. It has several desirable features than others interpolation scheme.
The PP-2D-CCI method is an improved 16-2D-CCI algorithm, not only improves the speed but also reduces the occupied FPGA resources and the cost. Comparing PP-2D-CCI with 16-2D-CCI resource usage, LUTS, register, memory, and DSP48s were reduced by 16439, 23650, 342, and 89, respectively, the reduction rates are 23.7%, 21.9%, 25% and 18.5%, respectively. The speedup time of the proposed Implementation is about 25 times higher than that when using the PC-based. Therefore, the new scheme has lower computational complexity and higher interpolation speed while keeping the same precision as original CCI.