DEEP LEARNING TRAINING WITH UNBALANCED SAMPLE DISTRIBUTION FOR REMOTE SENSING IMAGE SEGMENTATION

ABSTRACT: The intelligent interpretation of remote sensing images based on deep learning has become a research hot spot, as the rapid development of aerospace technology delivers ever more satellite imagery. Sufficient and reasonably distributed samples are essential for the accuracy of deep learning. The spatial distribution of natural features in the real world is inhomogeneous. When people create sample datasets, they often collect within a certain local range, which may cause unbalanced sample distributions, including imbalance between the training and validation datasets and imbalance among sample categories. This long-tail distribution of samples (i.e., a few classes account for most of the data, while most classes are under-represented) can bias the trained model and make it difficult to ensure accuracy. In this paper we address the above-mentioned problem in landcover classification with high spatial and spectral resolution (HSSR) remote sensing images. We first adopt an iterative stratification method for multi-label data to ensure that both the training and validation datasets contain reasonable proportions of the landcover classes. We then propose a weighted loss algorithm to further strengthen the learning ability of the model for rare categories. Experiments on a large-volume HSSR dataset show that with our methods the accuracy of landcover classification increased by 2%.


INTRODUCTION
Land cover and its change are important for resource planning and monitoring, ecological environment assessment, and sustainable development. With the accelerating process of economic globalization, the demand for land cover information with high spatial and temporal resolution at regional and even global scope is increasing (Nilsson et al., 2016) (UN, 2019). In 2015, the United Nations formulated the 2030 Agenda for Sustainable Development and proposed to achieve 17 Sustainable Development Goals (SDGs) by 2030 (Grekousis et al., 2015), (CHEN and CHEN, 2018); many of the SDGs are closely related to land cover. Thus, land cover information can effectively support the formulation and implementation of national, regional and global public policies and economic and political programs. The realization of the SDGs further creates an urgent demand for global high-resolution land cover data.
At present, the most precise global land cover data are the GlobeLand30 product with 30-meter resolution developed by Chen Jun et al. (CHEN and CHEN, 2018), the FROM-GLC10 product with 10-meter resolution developed by Gong Peng et al. (Gong et al., 2013), and the fine-classification product GLC_FCS30-2015 with 30-meter resolution developed by Liu Liang-yun et al. (Zhang et al., 2019). With the development of Earth observation capabilities, satellite remote sensing images with high spatial and temporal resolution are increasing rapidly, supporting the extraction of finer ground objects over large areas to produce landcover data with higher resolution. At present, however, the methods and technologies of satellite remote sensing interpretation are not sufficient to support large-scale production of high-resolution land cover data.
The intelligent interpretation of remote sensing images based on deep learning has become a hot spot. Sufficient and reasonably distributed samples (annotated datasets) are essential for the accuracy of deep learning. This has been proved by many successful cases of deep Convolutional Neural Networks (CNNs) for visual recognition (Hinton, 2006) (Huang and Learned-Miller, 2014) trained on large-scale, real-world annotated datasets (Rawat and Wang, 2017).
For remote sensing image interpretation, the inhomogeneous spatial distribution of natural features in the real world must be considered when creating sample datasets. When people create a sample dataset, they often collect within a certain local range, which may cause unbalanced sample distributions, including imbalance between the training and validation datasets and imbalance among sample categories. This long-tail distribution of samples (i.e., a few classes account for most of the data, while most classes are under-represented) can bias the trained model and make it difficult to ensure accuracy (Chen et al., 2016) (Xu et al., 2019) (Khoshgoftaar et al., 2010).
People have tried to handle the problems caused by long-tailed training data through re-sampling and re-weighting (Bengio, 2015) (Zhang et al., 2017). Cui et al. proposed a re-weighting approach to design a class-balanced loss (Cui et al., 2019).
In this paper we address the above-mentioned problem in landcover classification with high spatial and spectral resolution (HSSR) remote sensing images. We first adopt an iterative stratification method for multi-label data to ensure that both the training and validation datasets contain reasonable proportions of the landcover classes. We then propose a weighted loss algorithm to further strengthen the learning ability of the model for rare categories. Experiments on a large-volume HSSR dataset show that with our methods the accuracy of landcover classification increased by 3%.

Class-balanced loss
To address the problem of long-tailed sample distribution, typical solutions adopt class re-balancing strategies such as re-sampling and re-weighting based on the number of observations for each class. Cui et al. proposed a class-balanced loss method to handle this problem (Cui et al., 2019). They argued that as the number of samples increases, the additional benefit of a newly added data point diminishes, and introduced a framework that measures data overlap by associating with each sample a small neighbouring region rather than a single point. The effective number of samples is defined as the volume of samples and can be calculated by the simple formula (1 - β^n)/(1 - β), where n is the number of samples and β ∈ [0, 1) is a hyperparameter. A re-weighting scheme using the effective number of samples for each class is then applied to re-balance the loss, yielding a class-balanced loss. We adopted this method to balance the contribution of the different landcover categories.
In this paper, we calculate the adjustment amount using the class-balanced loss proposed by Cui et al. We first define the effective number of each category (the expected volume of its samples). Suppose that a newly sampled data point relates to the previously sampled data in one of two ways: either it lies completely inside the previously sampled data, with probability p, or completely outside it, with probability (1 - p). As the number of sampled data points grows, the probability p also increases: using more samples of a certain category brings diminishing marginal benefits, because real-world data have inherent similarities, and a newly added sample is increasingly likely to be a near-duplicate of existing samples.
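The closed form for the effective number follows by induction (after Cui et al., 2019). With expected total volume N and β = (N - 1)/N, a new sample overlaps the previously sampled data with probability p = E_{n-1}/N, so:

```latex
E_1 = 1, \qquad
E_n = p\,E_{n-1} + (1-p)\,(E_{n-1}+1) = 1 + \beta\,E_{n-1},
\qquad p = \frac{E_{n-1}}{N}, \quad \beta = \frac{N-1}{N},
```

which unrolls to E_n = 1 + β + β^2 + … + β^{n-1} = (1 - β^n)/(1 - β), the formula quoted above.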
The effective number of samples is denoted E_n = (1 - β^n)/(1 - β), where n ∈ ℤ_{>0} is the number of samples and the hyperparameter β ∈ [0, 1) controls how fast E_n grows with n. To obtain the balanced loss function, we introduce a weighting factor 1/E_{n_i}, which is inversely proportional to the effective number of samples of class i. Adding this weighting factor to the loss function, the class-balanced loss can be written as: CB(p, y) = (1/E_{n_y}) · L(p, y) = ((1 - β)/(1 - β^{n_y})) · L(p, y), where n_y is the number of samples of the ground-truth class y and L(p, y) is the original loss.
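As a concrete numeric sketch of this weighting scheme (the β value and the class counts below are placeholders, not the actual Luojia-HSSR statistics; the normalization so that the weights sum to the number of classes follows Cui et al.):

```python
import numpy as np

def class_balanced_weights(class_counts, beta=0.999):
    """Per-class weights 1/E_n with E_n = (1 - beta^n) / (1 - beta)."""
    counts = np.asarray(class_counts, dtype=np.float64)
    effective_num = (1.0 - np.power(beta, counts)) / (1.0 - beta)
    weights = 1.0 / effective_num
    # Normalize so the weights sum to the number of classes,
    # keeping the overall loss magnitude comparable.
    return weights / weights.sum() * len(counts)

# Example: a frequent class (e.g. paddy field) vs. a rare class (e.g. road).
w = class_balanced_weights([45_700, 2_000], beta=0.999)
```

The rare class receives the larger weight, so its per-pixel loss contributes more to training.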

Iterative sample division
Generally, samples should be divided into training, validation, and test sets, and the class distribution in each subset is often unbalanced. Szymański et al. proposed an iterative stratification method for multi-label data classification (Szymański and Kajdanowicz, 2017) (Szymanski and Kajdanowicz, 2019). This method considers the second-order relationships between labels, so a more balanced sample distribution can be obtained. We adopted this method to ensure that both the training and validation datasets contain a reasonable proportion of each landcover class.
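A simplified first-order sketch of the idea follows: distribute examples label by label, rarest label first, each to whichever subset most needs that label. This toy implementation is only illustrative; the method of Szymański and Kajdanowicz additionally balances label pairs (second order), and their scikit-multilearn package provides the full implementation.

```python
from collections import defaultdict

def iterative_stratified_split(labels, fractions):
    """labels: one set of labels per sample; fractions: e.g. [0.8, 0.2].
    Returns one list of sample indices per subset."""
    n = len(labels)
    remaining = set(range(n))
    desired = [f * n for f in fractions]  # desired subset sizes
    total = defaultdict(int)
    for labs in labels:
        for lab in labs:
            total[lab] += 1
    # Desired number of examples of each label in each subset.
    desired_lab = [{lab: f * c for lab, c in total.items()} for f in fractions]
    subsets = [[] for _ in fractions]
    while remaining:
        counts = defaultdict(int)  # remaining examples per label
        for i in remaining:
            for lab in labels[i]:
                counts[lab] += 1
        # Process the rarest remaining label first; unlabeled samples
        # are spread by subset-size deficit alone.
        rare = min(counts, key=counts.get) if counts else None
        candidates = sorted(i for i in remaining
                            if rare is None or rare in labels[i])
        for i in candidates:
            # Pick the subset that most wants this label
            # (ties broken by the larger remaining subset deficit).
            s = max(range(len(subsets)),
                    key=lambda k: (desired_lab[k].get(rare, 0), desired[k]))
            subsets[s].append(i)
            remaining.discard(i)
            desired[s] -= 1
            for lab in labels[i]:
                desired_lab[s][lab] = desired_lab[s].get(lab, 0) - 1
    return subsets

labels = [{"water"}, {"water"}, {"water"}, {"water"}, {"road"}, {"road", "water"}]
train_idx, val_idx = iterative_stratified_split(labels, [0.5, 0.5])
```

Because the rare "road" label is distributed first, both subsets receive road samples even though water dominates the data.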

Deep learning Model
Zheng et al. proposed a fast patch-free global learning (FPGA) framework for hyperspectral image (HSI) classification (Zheng et al., 2020). It uses an encoder-decoder based FCN that captures global spatial information by processing the whole image, and their experiments show that the FPGA framework is superior to patch-based frameworks in both speed and accuracy for HSI classification.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022, XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
Figure 2. The spectral attention module (Zheng et al., 2020)
As shown in Figure 1 and Figure 2, FPGA uses an encoder network and a decoder network. The encoder computes hierarchical convolutional feature maps over the entire input HSI. The decoder progressively recovers the spatial dimension of the coarsest convolutional feature map via lateral-connection-based SSF, outputting a classification probability map of the same spatial size as the input image.

Original sample dataset
We used the high spatial and spectral resolution sample dataset Luojia-HSSR developed by Wuhan University (2022). It is constructed from aerial hyperspectral imagery of the southern part of Shenyang City, Liaoning Province, China, covering an area of 161 square kilometers, with a spatial resolution of 0.75 m, 249 VNIR spectral bands, a geometric accuracy of 1.5-3 m, and 47 categories of field-validated ground cover. To our knowledge, it is the largest hyperspectral sample dataset to date.

Adjustment of the sample dataset
The sample size of each category in Luojia-HSSR varies greatly due to the uneven spatial distribution of landscape types in the real world. For example, paddy fields account for 45.7% of the total, while roads account for only about 2%.
We adjusted the original Luojia-HSSR dataset with the methods proposed in the previous sections of this paper, in the following three steps: (1) Category distribution adjustment: categories with too few samples are deleted or merged, reducing the number of sample categories from 47 to 23, as shown in Table 1. (2) Sample-size balancing: the sample sizes of the categories are re-balanced so that they are closer to each other. (3) Dataset division: the samples are divided into training, testing and validation subsets using the iterative stratification method described above. Figure 3 shows the samples before adjustment: the categories follow a long-tail distribution, and the number of samples per category is very uneven. Figure 4 shows the samples adjusted according to the method in the previous section; the sample sizes of the categories are closer. Figure 5 shows the divided training, testing and validation datasets; the class proportions among the subsets are relatively equal.
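Step (1) amounts to remapping label codes in the annotation rasters. A minimal sketch (the merge table below is purely hypothetical and does not reproduce the actual 47-to-23 mapping of Luojia-HSSR):

```python
import numpy as np

IGNORE = 255  # assumed code for deleted categories

# Hypothetical merge table: original code -> merged code.
MERGE_MAP = {
    26: 21,      # e.g. merge a shrub class into a forest class
    89: IGNORE,  # e.g. delete a category with too few samples
}

def remap_labels(label_img, merge_map):
    """Apply a category merge/delete table to a label raster."""
    out = label_img.copy()
    for src, dst in merge_map.items():
        out[label_img == src] = dst
    return out

patch = np.array([[3, 26], [89, 21]], dtype=np.uint8)
merged = remap_labels(patch, MERGE_MAP)
```

Codes not present in the table pass through unchanged; pixels mapped to IGNORE would be masked out of the loss during training.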

Results and discussion
We verified the effect of sample adjustment through experiments on the dataset described in 4.2, using Frequency Weighted Intersection over Union (FWIoU) as the evaluation index.
FWIoU is an improvement of the Mean Intersection over Union (MIoU): it weights each category by its frequency of occurrence, so it yields a more reasonable result when the sample sizes of the classes are unevenly distributed.
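Concretely, FWIoU can be computed from a confusion matrix as follows (a sketch assuming rows index ground truth and columns index predictions):

```python
import numpy as np

def fwiou(confusion):
    """Frequency Weighted IoU from a confusion matrix
    (rows = ground truth, columns = prediction)."""
    cm = np.asarray(confusion, dtype=np.float64)
    tp = np.diag(cm)                # true positives per class
    gt = cm.sum(axis=1)             # ground-truth pixels per class
    pred = cm.sum(axis=0)           # predicted pixels per class
    union = gt + pred - tp
    iou = np.where(union > 0, tp / np.maximum(union, 1e-12), 0.0)
    freq = gt / cm.sum()            # frequency weight per class
    return float((freq * iou).sum())

# Two classes, one misclassified pixel in the first class.
score = fwiou([[3, 1], [0, 4]])
```

Unlike MIoU, which averages the per-class IoUs uniformly, FWIoU weights each IoU by the class's share of the ground-truth pixels.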
Figure 6 shows the landcover classification results with the original and the adjusted datasets, using the FPGA model described in section 3.3. Overall, the classification accuracy (FWIoU) improved from 0.7098 to 0.7325. In the first and second groups, roads, which occupy fewer pixels, are delineated more completely with the adjusted dataset. The misclassification among natural forest, grass and dry land in the third group is also reduced with the adjusted dataset.
Figure 7 and Figure 8 show the confusion matrices of the classification results with the original and the adjusted sample datasets, respectively. After adjustment the accuracy of the model is improved, but misclassifications still occur: for example, dry farmland (code 3), arboreal forest (code 21), natural grassland (code 36), building (code 50), country road (code 74) and trampled surface (code 89) are confused with nursery (code 11), shrub-wood (code 26), other structure (code 80), open dump (code 88), dug land (code 120), soil surface (code 142) and sandy surface (code 143). This is because the model has insufficient discrimination ability for categories with small inter-class differences. For example, the spectrum of dry farmland mixes vegetation and bare soil, so it should be distinguished by spatial features; FPGA focuses more on hyperspectral features and is insufficient for spatial features.

CONCLUSION
Unbalanced sample distributions for intelligent interpretation of remote sensing images can bias the trained model and make it difficult to ensure the accuracy of interpretation. This paper proposed a weighted loss algorithm to strengthen the learning ability of the model for rare categories, and adopted an iterative stratification method for multi-label data classification to ensure that both the training and validation datasets contain reasonable proportions of the landcover classes. Experiments on a large-volume high spatial and spectral resolution dataset show that the proposed methods improved the accuracy of landcover classification.