COMPARISON OF MULTI-IMAGES DEEP LEARNING SUPER RESOLUTION FOR PASSIVE MICROWAVE IMAGES OF ARCTIC SEA ICE

The observation of Arctic sea ice is of great significance for monitoring the polar environment, researching global climate change and supporting Arctic navigation. Compared with optical and SAR imagery, passive microwave images can be acquired under all-sky conditions with high temporal resolution. However, the spatial resolution of passive microwave images (6.25 to 25 km) is too low for observing detailed sea ice characteristics and small-scale sea ice phenomena. Therefore, in this paper, considering the suitability of different alignment and fusion strategies to the characteristics of passive microwave images of sea ice, two multi-image deep learning super-resolution (SR) algorithms, the Recurrent Back-Projection Network (RBPN) and the Temporal Group Attention network (TGA), are selected to test the effectiveness of SR for passive microwave images of sea ice. Both qualitative and quantitative comparisons of the SR results obtained from the two algorithms are provided. Overall, TGA outperforms RBPN on passive microwave images of sea ice.


INTRODUCTION
The observation of Arctic sea ice is of great significance for monitoring the polar environment, researching global climate change and supporting Arctic navigation (Serreze and Stroeve, 2015). Data from a variety of satellite sensors, including optical images, passive microwave images and synthetic aperture radar (SAR) images, have been used to observe polar sea ice. Although optical sensors, such as the Moderate Resolution Imaging Spectroradiometer (MODIS), the Medium Resolution Imaging Spectrometer (MERIS) and the Advanced Very High Resolution Radiometer (AVHRR), have high temporal and spatial resolution, their images are often contaminated by clouds, and under poor atmospheric conditions no usable images may be obtained at all (Petrou et al., 2018). SAR images, such as those from Sentinel-1 (Xian and Tian, 2017), have high spatial resolution but are limited by narrow swaths and low temporal resolution, so producing Arctic-wide sea ice products requires processing massive amounts of data.
Passive microwave sensors, such as the Special Sensor Microwave Imager (SSM/I) on the satellites of the Defense Meteorological Satellite Program (DMSP), the Advanced Microwave Scanning Radiometer-EOS (AMSR-E) on the Aqua satellite of the National Aeronautics and Space Administration (NASA) Earth Observing System (EOS), and the Advanced Microwave Scanning Radiometer 2 (AMSR2) on the Global Change Observation Mission 1st-Water "SHIZUKU" (GCOM-W1), are important data sources for Arctic sea ice observation owing to their wide coverage, high temporal resolution, strong surface penetration and all-weather operation (Petrou et al., 2018). Among them, AMSR2 is a representative passive microwave sensor that has been observing sea ice since 2012; it has more frequency bands and a relatively high spatial resolution (Han and Kim, 2018). Although AMSR2 provides daily coverage of the entire Arctic, its typical spatial resolution of around 6.25 to 25 km makes it difficult to monitor small leads and ridges, and it is prohibitively coarse for some fine-scale applications, such as studying detailed sea ice characteristics and small-scale geographical phenomena (Agency and Project, 2013; Wagner et al., 2020).
Considering the high cost and limitations of increasing resolution through hardware, especially for large imaging instruments like AMSR2, signal processing methods known as SR techniques have become a promising way to improve image resolution (Yue et al., 2016). SR, the process of recovering a high-resolution (HR) image from one or a sequence of low-resolution (LR) images, is an important technique in computer vision and image processing. According to the number of input LR images, SR techniques can be divided into single-image SR (SISR) and multi-image SR (MISR) (Dong et al., 2016). Compared with SISR, MISR methods have the advantage of combining spatial and temporal information from image sequences. Traditional MISR methods based on the spatial or frequency domain not only have difficulty handling complex motions but also require heavy computation (Tom and Katsaggelos, 1995; Li et al., 2001; Daithankar and Ruikar, 2020).
Since 2015, deep learning-based MISR (DL-MISR) techniques have been developed and applied to natural images with good performance (Liu et al., 2020). DL-MISR uses deep neural networks to perform motion estimation and to fuse the complementary information of image sequences into HR images. The latest DL-MISR methods not only make full use of temporal information but also handle complex motion. Once a model has been trained on a large amount of sample data, DL-MISR is computationally efficient at inference time and saves time and cost, which makes it well suited to batch processing and valuable for SR of large numbers of low-resolution images of the same type.
At the same time, passive microwave images have the characteristic of "more and less": the quantity of images is large while their spatial resolution is low. Because AMSR2 provides daily coverage and observes the polar regions repeatedly, the amount of image data covering the same polar sea ice area is huge (Agency and Project, 2013). These data can serve as the dataset for training DL-MISR models. Therefore, in this paper, we analyse existing DL-MISR methods and, considering the suitability of different alignment and fusion strategies to the characteristics of passive microwave images of Arctic sea ice, select RBPN (Haris et al., 2019) and TGA (Isobe et al., 2020) to test the effectiveness of DL-MISR for AMSR2 images of sea ice. The experimental results are compared and analysed both qualitatively and quantitatively.

METHODOLOGY
MISR methods basically consist of three modules, namely an alignment module, a fusion module and a reconstruction module, as shown in Fig. 1 (Liu et al., 2020). The alignment module estimates the motion between each neighbouring frame and the reference frame, i.e. the spatial transformation that registers the misaligned images. The fusion module combines the complementary information from the aligned images into a single feature map. The reconstruction module transforms the aggregated features into the final output image using deconvolution or sub-pixel convolution layers.
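The sub-pixel convolution mentioned above ends with a "pixel shuffle" that rearranges C·r² low-resolution feature channels into C channels at r times the spatial size. The following is a minimal pure-Python sketch of that rearrangement only (the learned convolution that precedes it is omitted); the function name pixel_shuffle is illustrative, and the channel ordering assumed here is the one used by PyTorch's nn.PixelShuffle. For the 4× factor used in this paper, a single-channel output would need 16 input feature channels.

```python
def pixel_shuffle(features, r):
    """Rearrange nested lists [C*r*r][H][W] into [C][H*r][W*r]."""
    n_in = len(features)
    if n_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    c_out = n_in // (r * r)
    h, w = len(features[0]), len(features[0][0])
    out = [[[0.0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(r):          # row offset inside each r x r output block
            for j in range(r):      # column offset inside each r x r output block
                src = features[c * r * r + i * r + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * r + i][x * r + j] = src[y][x]
    return out


# Four single-pixel channels become one 2 x 2 image (r = 2).
print(pixel_shuffle([[[1]], [[2]], [[3]], [[4]]], 2))  # [[[1, 2], [3, 4]]]
```

Each LR pixel position thus expands into an r × r block of HR pixels, with the r² channels supplying the block's entries.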
The design of the alignment and fusion modules can cause large differences in performance and efficiency (Chan et al., 2021). Therefore, we choose DL-MISR methods mainly by considering the suitability of these two modules for the characteristics of passive microwave images and of polar sea ice motion. Sea ice motion in the Arctic is complex: it includes rigid motions, such as drifting and rotating in different directions and at different speeds, as well as non-rigid changes due to melting, freezing and disintegration. In addition, large motions occur due to both seasonal and regional influences (Tschudi et al., 2010; Maeda et al., 2020). Passive microwave images also contain some noise and unreliable values.
For the alignment module, motion estimation and motion compensation (MEMC) methods and deformable convolution (DCN) methods are two common strategies. Both MEMC and DCN may be able to handle complex sea ice motions. However, DCN-based methods such as the Temporally Deformable Alignment Network are difficult to train and may smooth out high-frequency details. Some early MEMC methods, such as the detail-revealing deep network, are prone to generating artifacts, which leads to inconsistencies across image sequences (Liao et al., 2015; Kappeler et al., 2016; Liu et al., 2017; Tao et al., 2017). Therefore, we select RBPN, an MEMC model based on optical flow. Compared with other MEMC methods, the back-projection module of RBPN adopts an iterative error-correcting feedback mechanism that computes both up- and down-projection errors to minimize the error of the feature maps. This reduces artifacts and temporal inconsistencies to a certain extent (Haris et al., 2018).
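RBPN learns its projection operators with convolutional layers, but the error-correcting feedback idea it builds on can be illustrated with classical iterative back-projection. The sketch below is an illustration under assumptions, not RBPN itself: it works on a 1-D signal, assumes block averaging as the down-projection (degradation) and sample replication as the up-projection, and starts from a linear-interpolation guess. Each iteration down-projects the current HR estimate, compares it with the observed LR signal, and feeds the up-projected error back into the estimate. All function names are hypothetical.

```python
def down_project(hr, s):
    """Assumed degradation: average each block of s HR samples into one LR sample."""
    return [sum(hr[i:i + s]) / s for i in range(0, len(hr), s)]


def up_project(lr, s):
    """Assumed up-projection: repeat each LR sample s times."""
    return [v for v in lr for _ in range(s)]


def linear_upsample(lr, s):
    """Initial HR guess by linear interpolation (clamped at the borders)."""
    n = len(lr)
    out = []
    for k in range(n * s):
        u = (k + 0.5) / s - 0.5              # HR sample position in LR coordinates
        u = min(max(u, 0.0), n - 1.0)
        i = min(int(u), n - 2) if n > 1 else 0
        t = u - i
        out.append(lr[i] * (1 - t) + lr[min(i + 1, n - 1)] * t)
    return out


def iterative_back_projection(lr, s, n_iter=30, step=0.5):
    hr = linear_upsample(lr, s)
    for _ in range(n_iter):
        simulated = down_project(hr, s)                      # down-projection
        error = [a - b for a, b in zip(lr, simulated)]       # error in the LR domain
        correction = up_project(error, s)                    # up-projection of the error
        hr = [h + step * c for h, c in zip(hr, correction)]  # feedback correction
    return hr


lr = [1.0, 3.0, 2.0, 5.0]
hr = iterative_back_projection(lr, 4)
print(max(abs(a - b) for a, b in zip(down_project(hr, 4), lr)) < 1e-6)  # True
```

With these particular operators the LR-domain error halves at every iteration, so the HR estimate quickly becomes consistent with the observed LR signal; RBPN replaces the fixed operators with learned ones acting on feature maps.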
For the fusion module, existing DL-MISR methods fuse features either by direct concatenation (Huang et al., 2015; Sajjadi et al., 2018) or by temporal and spatial attention (TSA). Because the information provided by different adjacent frames is not equally useful, owing to occlusion or blurring, TSA computes a weight map for each neighbouring frame, which makes it better suited than direct concatenation. Therefore, we select the TGA model based on TSA fusion (Isobe et al., 2020), which makes full use of the complementary information across frames to recover missing details in the reference frame. In addition, TGA adopts a group fusion strategy.
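The per-frame weighting behind temporal attention can be sketched as follows. This is a simplified illustration under stated assumptions, not the TSA module itself: the learned embeddings are replaced by the raw feature vectors, the weight of each frame at each pixel is the sigmoid of the dot product between that frame's feature and the reference feature, and the learned fusion convolution is replaced by a normalized weighted average. The function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def temporal_attention_fuse(frames, ref_index):
    """frames: T lists of per-pixel feature vectors (same number of pixels and channels).
    Returns one fused feature vector per pixel."""
    ref = frames[ref_index]
    fused = []
    for p in range(len(ref)):
        num = [0.0] * len(ref[p])
        den = 0.0
        for frame in frames:
            # Weight = similarity between this frame's feature and the reference feature
            w = sigmoid(sum(a * b for a, b in zip(frame[p], ref[p])))
            num = [n + w * f for n, f in zip(num, frame[p])]
            den += w
        fused.append([n / den for n in num])  # normalized weighted average
    return fused


# One pixel, three frames: the third (e.g. occluded) frame disagrees with the reference.
print(temporal_attention_fuse([[[1.0, 0.0]], [[1.0, 0.0]], [[-3.0, 0.0]]], ref_index=1))
```

In this example the dissimilar third frame is largely suppressed, so the fused value stays close to the reference (about 0.87) instead of the plain average of about -0.33 that direct, unweighted combination would give.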

TGA
TGA adopts a fast spatial alignment method. It estimates a homography between every two consecutive frames and warps the neighbouring frames to the reference frame, so it can handle image sequences with large sea ice motion. Since a homography is a global transformation, TGA preserves image structure better and introduces few artifacts. Considering that neighbouring frames at different temporal distances contribute unequally, TGA groups the frames temporally. For each group, an intra-group fusion module is employed for feature extraction and fusion, and each intra-group fusion module uses a dilation rate that models the motion level associated with its group. To better integrate features from different groups, a temporal attention module is introduced; it guides the efficient aggregation of information across the temporal groups and produces a high-resolution residual map (Isobe et al., 2020). The final SR output is obtained through a sub-pixel convolution layer.
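The grouping and group-attention steps can be sketched in a few lines. The sketch assumes, following our reading of Isobe et al. (2020), that 7 input frames are split by temporal distance into three groups that each contain the reference frame, with dilation rates 1, 2 and 3. The group features and attention logits, which TGA computes with learned layers, are taken here as given scalars per pixel, and the helper names are hypothetical. The attention weights are a softmax across groups at each pixel.

```python
import math


def temporal_groups(num_frames=7):
    """Group frame indices by temporal distance to the centre (reference) frame."""
    ref = num_frames // 2
    groups = []
    for d in range(1, ref + 1):  # d doubles as the group's dilation rate
        groups.append({"frames": [ref - d, ref, ref + d], "dilation": d})
    return groups


def softmax(xs):
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def group_attention_fuse(group_features, group_logits):
    """group_features[g][p], group_logits[g][p]: scalars for group g at pixel p.
    Softmax over groups at each pixel, then a weighted sum of group features."""
    fused = []
    for p in range(len(group_features[0])):
        weights = softmax([logits[p] for logits in group_logits])
        fused.append(sum(w * feats[p] for w, feats in zip(weights, group_features)))
    return fused


print(temporal_groups(7))
# [{'frames': [2, 3, 4], 'dilation': 1}, {'frames': [1, 3, 5], 'dilation': 2},
#  {'frames': [0, 3, 6], 'dilation': 3}]
```

Because the softmax is taken per pixel, different regions of the image can rely on different groups, for example on the wider groups where sea ice moves slowly.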

DATA
The training dataset comes from the AMSR2 passive microwave sensor on the GCOM-W1 polar-orbiting satellite of the Global Change Observation Mission (GCOM) of the Japan Aerospace Exploration Agency (JAXA). This series of satellites carries out long-term (10 to 15 years) observation of the Earth and provides data for studying the global water cycle and the mechanisms of climate change. Water-related observations are currently carried out by AMSR2, mounted on the water cycle observation satellite GCOM-W1, launched in 2012. AMSR2 obtains images at more frequencies and with higher resolution than other passive microwave sensors. It has the largest antenna (approximately 2 m in diameter) among satellite-borne scanning microwave radiometers, rotates at 40 revolutions per minute while scanning a wide swath of the Earth's surface, and covers almost the entire globe, day and night, within two days (Agency and Project, 2013). AMSR2 measures microwave emission from the land and ocean in seven frequency bands (about 6 to 89 GHz). AMSR2 products are provided at five levels: Level 0, Level 1B, Level 1R, Level 2 and Level 3. In particular, the Level 1B brightness temperature swath data at 89 GHz in horizontal and vertical polarization are selected because they have the highest spatial resolution (5×5 km). As Fig. 1 shows, all AMSR2 swath data of one day are gridded onto the 6.25 km polar stereographic grid of the National Snow and Ice Data Center (NSIDC) to obtain daily average brightness temperature images, which serve as the ground-truth HR images. To produce the LR images, we downscale the HR images by a factor of 4 with bicubic interpolation, giving a 25 km grid. The experimental images cover 30° N to 90° N and 180° W to 180° E, including the entire Arctic region, and were acquired from 2013 to 2016. The dataset contains a total of 33,600 image sequences, of which 30,660 are used for training and 2,940 for evaluation.
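The exact bicubic routine used to produce the LR images is not specified. As a minimal sketch under assumptions, the pure-Python code below downscales by an integer factor with a separable Keys cubic kernel (a = -0.5), stretches the kernel by the scale factor to suppress aliasing, and replicates border pixels. Tools such as MATLAB's imresize or OpenCV may differ in border handling and anti-aliasing, so results would not match them exactly. The function names are hypothetical.

```python
import math


def cubic(x, a=-0.5):
    """Keys cubic convolution kernel; a = -0.5 is a common bicubic choice."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x ** 3 - (a + 3) * x ** 2 + 1
    if x < 2:
        return a * x ** 3 - 5 * a * x ** 2 + 8 * a * x - 4 * a
    return 0.0


def downscale_1d(row, s):
    """Downscale a 1-D signal by integer factor s with an anti-aliased cubic kernel."""
    n = len(row)
    out = []
    for i in range(n // s):
        u = (i + 0.5) * s - 0.5  # output sample centre in input coordinates
        weights, values = [], []
        for j in range(math.floor(u - 2 * s), math.ceil(u + 2 * s) + 1):
            w = cubic((j - u) / s)  # kernel stretched by s to suppress aliasing
            if w != 0.0:
                weights.append(w)
                values.append(row[min(max(j, 0), n - 1)])  # replicate border samples
        out.append(sum(w * v for w, v in zip(weights, values)) / sum(weights))
    return out


def downscale_2d(img, s=4):
    """Separable 2-D downscaling: rows first, then columns."""
    rows = [downscale_1d(r, s) for r in img]
    cols = [downscale_1d(list(c), s) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A 256 x 256 HR patch (6.25 km grid) becomes a 64 x 64 LR patch (25 km grid).
hr_patch = [[float(r + c) for c in range(256)] for r in range(256)]
lr_patch = downscale_2d(hr_patch, 4)
print(len(lr_patch), len(lr_patch[0]))  # 64 64
```

Normalizing by the sum of the weights keeps uniform areas (for example open water or consolidated ice with constant brightness temperature) unchanged after downscaling.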
Each image sequence consists of 7 adjacent images, and each image is cropped to 256×256 pixels. In addition, we use data augmentation techniques such as rotation, mirroring and random cropping to expand the training set and improve the generalization ability of the model.
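Building the 7-frame sequences and augmenting them can be sketched as below. The sketch assumes that "adjacent" means consecutive daily images and that augmentation is applied to equally sized frames (for instance the HR frames before downscaling). The key point is that the same random crop, rotation and flip must be applied to every frame of a sequence so that the temporal relationship between frames is preserved. All function names are hypothetical.

```python
import random


def sliding_sequences(daily_images, length=7):
    """Group consecutive daily images into overlapping sequences of `length` frames."""
    return [daily_images[i:i + length] for i in range(len(daily_images) - length + 1)]


def rot90(img):
    """Rotate a nested-list image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def hflip(img):
    """Mirror a nested-list image left to right."""
    return [row[::-1] for row in img]


def crop(img, top, left, size):
    return [row[left:left + size] for row in img[top:top + size]]


def augment_sequence(frames, size, rng=random):
    """Apply the SAME random crop, rotation and flip to every frame of a sequence."""
    h, w = len(frames[0]), len(frames[0][0])
    top, left = rng.randrange(h - size + 1), rng.randrange(w - size + 1)
    n_rot, flip = rng.randrange(4), rng.random() < 0.5
    out = []
    for img in frames:
        img = crop(img, top, left, size)
        for _ in range(n_rot):
            img = rot90(img)
        if flip:
            img = hflip(img)
        out.append(img)
    return out


# Ten days of imagery give four overlapping 7-day sequences.
print(len(sliding_sequences(list(range(10)), 7)))  # 4
```

Drawing the random parameters once per sequence, rather than once per frame, is what keeps the simulated sea ice motion between frames physically consistent.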

Implementation and training details
In this experiment, we convert the passive microwave images from GeoTIFF to PNG format, which is more convenient as network input. The original brightness temperatures are stored as 16-bit values. In all experiments, we use a 4× upsampling factor to evaluate the different methods. The input data are normalized before being fed into the networks.
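The normalization step is not specified beyond the 16-bit storage format. One simple possibility, assumed here and not confirmed by the text, is to divide the 16-bit values by 65535 so that network inputs lie in [0, 1], and to invert the mapping (with clipping and rounding) before writing SR outputs back to 16-bit PNG. A minimal sketch with hypothetical function names:

```python
def normalize_16bit(pixels):
    """Map unsigned 16-bit integers (0..65535) to floats in [0, 1]."""
    return [p / 65535.0 for p in pixels]


def denormalize_16bit(values):
    """Map network outputs back to 16-bit integers, clipping to [0, 1] first."""
    return [int(round(min(max(v, 0.0), 1.0) * 65535)) for v in values]


print(normalize_16bit([0, 65535]))  # [0.0, 1.0]
```

Clipping matters because a network trained with an L1 loss can produce values slightly outside [0, 1], which would otherwise wrap around or overflow when stored as 16-bit integers.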
The hyper-parameters of RBPN are set as follows: the network uses Kaiming initialization and the Adam optimizer, and the loss function is the per-pixel L1 loss between the predicted frame and the ground-truth HR frame. The learning rate is 0.0001 for all layers and is decreased by a factor of 10 after half of the total 150 epochs. The batch size is set to 4. The RBPN model is trained for two days in the environment Ubuntu 16.04 + 2 × NVIDIA GTX 1080 GPUs + Python 3.5 + CUDA 9.2 + PyTorch 1.0.
TGA is also supervised by the pixel-wise L1 loss; it uses Kaiming initialization and the Adam optimizer with β1 = 0.9 and β2 = 0.999. Weight decay is set to 0.0005 during training. The learning rate is initially set to 0.0001 and is then reduced by a factor of 0.1 every 10 epochs, for 50 epochs in total. The batch size of TGA is 8. The TGA model is trained for one day in the environment Ubuntu 16.04 + 2 × NVIDIA GTX 1080 GPUs + Python 3.6 + CUDA 9.2 + PyTorch 1.2.
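Both learning-rate descriptions are step-decay schedules. A minimal sketch, assuming that "decreased by a factor of 10 after half of 150 epochs" means a single drop at epoch 75 and that TGA's decay is applied at epochs 10, 20, 30 and 40 (our interpretation); the function name step_lr is hypothetical:

```python
def step_lr(epoch, base_lr=1e-4, gamma=0.1, step_size=10):
    """Learning rate at a 0-based epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


# RBPN as described: 1e-4, divided by 10 once after half of 150 epochs.
rbpn = [step_lr(e, step_size=75) for e in range(150)]
# TGA as described: 1e-4, multiplied by 0.1 every 10 epochs, 50 epochs in total.
tga = [step_lr(e, step_size=10) for e in range(50)]
print(rbpn[74], tga[9])  # 0.0001 0.0001
```

In PyTorch, a comparable TGA setup would be torch.optim.Adam(params, lr=1e-4, betas=(0.9, 0.999), weight_decay=5e-4) combined with torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.1); whether the authors used exactly these classes is not stated. Note that under this reading TGA's learning rate reaches 1e-8 in the final 10 epochs.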

Analysis of the super-resolution results
Evaluation indexes: the image quality indexes used in this experiment are the full-reference indexes peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) (Daithankar and Ruikar, 2020).
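A larger PSNR indicates a smaller difference between the evaluated image and the reference image, and hence better image quality. PSNR is computed as follows:

$$\mathrm{PSNR} = 10\log_{10}\frac{MAX^2}{MSE}, \qquad MSE = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(X(i,j) - Y(i,j)\right)^2 \qquad (1)$$

where $X$ and $Y$ are the original and SR images of size $M \times N$ and $MAX$ is the maximum possible pixel value (65535 for the 16-bit data used here). SSIM measures the similarity of two images in terms of luminance (equation (2)), contrast (equation (3)) and structure (equation (4)). SSIM is at most 1, a value reached only when the two images are identical; a higher value indicates less distortion, and SSIM reflects human visual perception better than PSNR.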
$$l(X,Y) = \frac{2\mu_X\mu_Y + C_1}{\mu_X^2 + \mu_Y^2 + C_1} \qquad (2)$$

$$c(X,Y) = \frac{2\sigma_X\sigma_Y + C_2}{\sigma_X^2 + \sigma_Y^2 + C_2} \qquad (3)$$

$$s(X,Y) = \frac{\sigma_{XY} + C_3}{\sigma_X\sigma_Y + C_3} \qquad (4)$$

$$\mathrm{SSIM}(X,Y) = l(X,Y)\,c(X,Y)\,s(X,Y)$$

where $X$ and $Y$ are the original image and the SR image, $\mu_X$ and $\mu_Y$ are their mean values, $\sigma_X^2$ and $\sigma_Y^2$ their variances ($\sigma_X$ and $\sigma_Y$ the standard deviations), $\sigma_{XY}$ their covariance, and $C_1 = (0.01 \times 65535)^2$, $C_2 = (0.03 \times 65535)^2$ and $C_3 = C_2/2$ are constants.

The quantitative comparison is given in Tab. 1. TGA performs better on both evaluation indexes, especially SSIM. In addition, the results for vertically polarized images are slightly better than those for horizontally polarized images. Fig. 4 shows two sub-regions of passive microwave images of Arctic sea ice, including the LR images, the SR images of the two models and the HR images. Sub-region A in Fig. 3 mainly contains different sea ice textures, and sub-region B contains sea ice and sea water. After DL-MISR, the boundaries are clearly finer and the textures more detailed. TGA appears to generate more detailed edges, while the SR results of RBPN are smoother. However, RBPN maintains the consistency between the LR and SR images better than TGA. In addition, both DL-MISR methods introduce little noise.
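The two indexes can be computed directly from their definitions. The sketch below assumes flat lists of pixel values, MAX = 65535 and the constants C1, C2 and C3 given above. For brevity it evaluates SSIM once over the whole image, whereas common SSIM implementations average the index over local (for example 11 × 11 Gaussian) windows, so values would differ somewhat from such tools. The function names are hypothetical.

```python
import math

MAX_VAL = 65535.0                 # 16-bit data range
C1 = (0.01 * MAX_VAL) ** 2
C2 = (0.03 * MAX_VAL) ** 2
C3 = C2 / 2


def psnr(x, y, max_val=MAX_VAL):
    """PSNR = 10 * log10(MAX^2 / MSE) for two flat, equally long pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    if mse == 0:
        return math.inf           # identical images
    return 10 * math.log10(max_val ** 2 / mse)


def ssim_global(x, y):
    """Luminance * contrast * structure, evaluated once over the whole image."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n            # variance of X
    vy = sum((b - my) ** 2 for b in y) / n            # variance of Y
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx, sy = math.sqrt(vx), math.sqrt(vy)             # standard deviations
    l = (2 * mx * my + C1) / (mx ** 2 + my ** 2 + C1)
    c = (2 * sx * sy + C2) / (vx + vy + C2)
    s = (cov + C3) / (sx * sy + C3)
    return l * c * s


# A constant error of 1 % of the 16-bit range gives 40 dB.
print(round(psnr([0.0] * 4, [655.35] * 4), 6))  # 40.0
```

Because PSNR depends only on the pixel-wise error while SSIM also compares local structure, the two indexes can rank methods differently, which is why both are reported.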

CONCLUSION
Considering the suitability of different alignment and fusion strategies to the characteristics of passive microwave images of sea ice, two DL-MISR methods, RBPN and TGA, are applied to passive microwave images of sea ice; both improve the spatial resolution of the images and yield finer boundaries and more details. TGA not only performs better in the quantitative evaluation but also generates finer boundaries and more detailed textures than RBPN. However, RBPN maintains the consistency between the LR and SR images well. Overall, TGA might be the more suitable method for improving the resolution of passive microwave images for Arctic sea ice observation. In the future, we will optimize the model according to the characteristics of sea ice motion and the applicable scenarios of different alignment and fusion strategies. In addition, we will carry out a more detailed analysis and evaluation of the SR process and the experimental results.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition)