TRANSFER LEARNING WITH LIMITED SAMPLES FOR THE SAME SOURCE HYPERSPECTRAL REMOTE SENSING IMAGES CLASSIFICATION

A classification method based on a transferred convolutional neural network (CNN) is proposed for hyperspectral datasets with a limited number of samples. A CNN model needs many labeled samples to classify hyperspectral images, but annotating images takes considerable time and labor. In our work, a CNN model and transfer learning are combined to address this problem. By pre-training the model on another hyperspectral dataset, the classification results for the target hyperspectral images can be effectively improved when the number of training samples is limited. Three transfer approaches are used to classify hyperspectral images, and their performance is compared and analyzed. As the number of samples decreases, transfer learning has an increasing impact on the classification results of hyperspectral images. Among the three transferred models, freezing the convolutional layer weights and retraining the fully connected layer weights yields the best classification performance, reaching a classification accuracy of 77.23% when the number of samples per class is set to 10. When the number of training samples per class is 5, the growth in classification accuracy reaches a maximum of 33%. The results indicate that a relatively high classification accuracy can be obtained by training on only a limited number of samples with parameters transferred from the same domain.


INTRODUCTION
Hyperspectral remote sensing has become an important means of Earth observation. Hyperspectral imaging technology combines imaging with spectroscopy to capture both the two-dimensional geometric space and the one-dimensional spectral information of the target, yielding continuous, narrowband image data with high spectral resolution. Compared with traditional monochrome, panchromatic, and multispectral imaging technologies, hyperspectral remote sensing provides richer spectral information and a greater ability to discriminate ground features. Therefore, hyperspectral data allow for a more detailed classification of ground objects. In recent years, machine learning has provided an important means of automatic hyperspectral image classification (Zhang et al., 2020). In the field of deep learning, the purpose of hyperspectral image (HSI) classification is to predict a unique label for each pixel so that it can be well represented by a given land-cover class (Bioucas-Dias et al., 2013).
There are now many methods for HSI classification. In early research, machine learning methods based on statistical learning, such as Support Vector Machines (SVM) and Random Forests (RF), were dominant. Subsequently, deep learning methods were proposed and applied to hyperspectral image classification. Compared with traditional classification methods that extract texture and edge features, deep learning methods can extract more complex features at a deeper level, and the learning process is entirely automatic. Therefore, they can adapt to various situations and overcome the poor generalization of traditional methods. A framework for HSI classification using multiscale spatial texture features, namely Multiscale Local Binary Pattern (MS-LBP) and Multiscale Complete Local Binary Pattern (MS-CLBP), has been proposed and achieves good classification performance (Sidike et al., 2016).
The main problem is that a hyperspectral image has a large number of bands with considerable information redundancy between them, which makes the classification of hyperspectral images very difficult. Common methods for reducing band redundancy include Linear Discriminant Analysis (LDA) and Principal Component Analysis (PCA) (Gao et al., 2020). Chen et al. used PCA to reduce the dimension of the original space, and then used a two-dimensional convolutional neural network to extract the features of the spatial information contained in the neighborhood of the input center pixel. In the field of deep learning, the generalization of hyperspectral image classification models has always been a concern. Transfer learning is an effective way to improve the generalization of deep learning models. Transfer learning methods have been applied to medical image classification with good results (Fradi et al., 2020). In the field of remote sensing image segmentation, transfer learning has also gradually been applied. Li et al. proposed an interesting target detection framework with a transferred deep convolutional neural network (CNN). Lin et al. applied a deep mapping transfer learning model to hyperspectral image classification, constructed a deep mapping network between the target domain and the source domain, and used correlation analysis to correlate the two datasets (Lin et al., 2018). Hyperspectral imagery has a large number of bands and a large imaging range, and labeling all images is very labor-intensive. Therefore, if a model with good classification performance can be obtained using only a limited number of training samples, the classification efficiency will be significantly improved.
Labeled hyperspectral images are difficult to obtain. Usually, the available images are unlabeled, so few-shot learning is an important research direction for hyperspectral image classification and segmentation. Many scholars in the field of remote sensing have carried out in-depth research on this topic. Few-shot learning is a major challenge in hyperspectral image classification, and the concept of deep metric learning has been applied to it. This addresses the classification problem when the number of hyperspectral image samples in the same scene or across scenes is insufficient (Deng et al., 2020). Liu et al. proposed a model that combines deep convolutional neural networks with deep metric learning: the deep convolutional neural network extracts image features to reduce the uncertainty of labeling, and deep metric learning is then used to classify the hyperspectral images (Liu et al., 2019). A deep self-attention and mutual-attention few-shot learning (SMA-FSL) method has been proposed for HSI few-shot classification (Huang et al., 2021). Li et al. proposed a method for few-shot HSI classification based on two-branch deep learning (Li et al., 2020). However, none of these studies considered the performance of transfer learning for few-shot learning on same-source hyperspectral images.
In this paper, transfer learning is introduced to few-shot learning for hyperspectral image classification. By pre-training the model on another hyperspectral dataset, the classification results of hyperspectral images can be effectively improved when the number of training samples is limited. First, we pre-train our model on the Indian Pines dataset. Using the hyperspectral feature information extracted from the Indian Pines dataset, we can obtain relatively good classification of the whole Salinas dataset with only a small number of Salinas samples. Three transfer methods were applied and their results were compared. This approach not only ensures accuracy but also improves the training speed. The main contributions of this paper are as follows: First, a classification method for hyperspectral datasets with a limited number of samples based on a transferred CNN is proposed. Second, the performance of the three transfer methods is evaluated and analyzed.
The rest of the paper is organized as follows. The proposed classification framework and its critical steps are described in section 2. Section 3 presents the results of the experiment, and a summary is given in section 4.

PROPOSED CLASSIFICATION FRAMEWORK
The proposed classification framework mainly includes three steps: (1) pre-training a CNN model on the labeled Indian Pines dataset; (2) selecting different numbers of small training samples from the Salinas dataset; (3) fine-tuning the CNN model with the extracted few-shot samples. The detailed structure is shown in Figure 1.

3D-CNN Model
The three-dimensional CNN (3D-CNN) model is chosen as our basic model. Compared with a two-dimensional CNN, the 3D-CNN has fewer parameters and is more suitable for processing hyperspectral images. 3D convolution can extract not only the spatial information from hyperspectral images, but also their spectral information.
Two consecutive convolution layers are used to extract the features of hyperspectral images. The size of the convolution kernels is 3×3, and the activation function is the rectified linear unit (ReLU). We also apply dropout with a rate of 0.25 to avoid overfitting. Two fully connected layers follow the convolution layers and map the spectral features to classes. In addition, the optimizer of the 3S model or 3ST model is stochastic gradient descent (SGD), and its parameters are given in Table 1.

Pre-training
Pre-training the model is an essential step in the experiment. At this stage, a hyperspectral dataset that is homologous to the target dataset is used for pre-training, so that spatial and spectral feature information relevant to the target can be extracted. The designed model is pre-trained on the Indian Pines dataset, with 75% of the dataset selected as the training set and the remaining 25% as the test set. After preprocessing, the dimension of our input data is 5×5×220: the height and width of each input are 5, and the number of bands is 220.
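As a rough illustration of how 5×5×220 inputs could be cut from an image cube, the sketch below extracts one window per labeled pixel with NumPy. The function name `extract_patches`, the zero padding at the image border, and the convention that label 0 means unlabeled are assumptions made for illustration, not details given in this paper. The random cube merely stands in for Indian Pines.

```python
import numpy as np


def extract_patches(cube, labels, window=5):
    """Return one (window, window, bands) patch per labeled pixel.

    cube:   array of shape (H, W, B), the hyperspectral image
    labels: array of shape (H, W); 0 is treated as unlabeled (assumption)
    """
    margin = window // 2
    # Zero-pad the spatial borders so edge pixels also get a full window
    # (assumed border handling; the paper does not specify it).
    padded = np.pad(cube, ((margin, margin), (margin, margin), (0, 0)),
                    mode="constant")
    rows, cols = np.nonzero(labels)
    patches = np.stack([padded[r:r + window, c:c + window, :]
                        for r, c in zip(rows, cols)])
    targets = labels[rows, cols] - 1  # shift labels to 0-based class indices
    return patches, targets


# Toy stand-in data: an 8x9 scene with 220 bands and 3 classes plus background.
rng = np.random.default_rng(0)
cube = rng.random((8, 9, 220))
labels = rng.integers(0, 4, size=(8, 9))
patches, targets = extract_patches(cube, labels)
print(patches.shape[1:])  # spatial window and band count of each sample
```

Each patch is centered on its labeled pixel, so the class of the center pixel is the target for the whole 5×5 neighborhood.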

Small Sample Selection
As the number of samples of each ground object in the Salinas dataset is different, how to extract samples from the target dataset is a very important step. There are many methods for extracting a limited number of samples from the target hyperspectral dataset. Two extraction methods were tried in this experiment. Taking edge samples for each kind of ground object gave poor experimental performance, whereas random sampling for each kind of ground object performed better in classification. In the Salinas dataset, 1, 5, and 10 samples of each ground object are therefore extracted randomly, and all the remaining samples are used as the test set.
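The random per-class selection described above can be sketched in a few lines of standard-library Python. The function name `sample_per_class`, the fixed seed, and the toy label list are hypothetical, and the sketch only illustrates the idea of drawing k samples per class and keeping the rest for testing.

```python
import random
from collections import defaultdict


def sample_per_class(labels, k, seed=0):
    """Draw k random indices per class for training; the rest form the test set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train = []
    for label in sorted(by_class):
        members = by_class[label]
        # Guard against classes that have fewer than k labeled pixels.
        train.extend(rng.sample(members, min(k, len(members))))

    chosen = set(train)
    test = [i for i in range(len(labels)) if i not in chosen]
    return sorted(train), test


# Toy stand-in for the Salinas labels: 16 classes with 50 pixels each.
labels = [c for c in range(16) for _ in range(50)]
for k in (1, 5, 10):
    train_idx, test_idx = sample_per_class(labels, k)
    print(k, len(train_idx), len(test_idx))
```

With 16 classes, the three settings yield 16, 80, and 160 training samples in total, which makes clear how small the fine-tuning sets are relative to the test set.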

Transfer Learning and Fine-tuning
After the pre-trained model weights are transferred from the original dataset and loaded for the target dataset, the 3D-CNN needs to be fine-tuned with the small samples. Figure 2 shows three different transfer learning methods. In each case, we first load the weights of the model pre-trained on the original dataset. The convolutional layers and fully connected layers are then either retrained or frozen during fine-tuning. In detail, the CNN TL1 model retrains all layers of the pre-trained model, while the CNN TL2 model freezes the parameters of the convolutional layers of the pre-trained model and trains the parameters of the fully connected layers. The CNN TL3 model freezes the parameters of the shallow convolutional layer of the pre-trained model and trains those of the deep convolutional layer and the fully connected layers.
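The difference between the three transfer methods comes down to which layers remain trainable during fine-tuning. The following minimal sketch encodes that choice for a network with two convolutional and two fully connected layers. The layer names, the function `trainable_flags`, and the reading of "shallow convolutional layer" as the first convolutional layer are assumptions for illustration.

```python
# Hypothetical layer list: (name, kind) in order from input to output.
LAYERS = [("conv1", "conv"), ("conv2", "conv"), ("fc1", "dense"), ("fc2", "dense")]


def trainable_flags(layers, strategy):
    """Map each layer name to True (retrain) or False (freeze)."""
    conv_names = [name for name, kind in layers if kind == "conv"]
    if strategy == "TL1":
        frozen = set()                 # retrain every layer
    elif strategy == "TL2":
        frozen = set(conv_names)       # freeze all convolutional layers
    elif strategy == "TL3":
        frozen = set(conv_names[:1])   # freeze only the shallowest conv layer
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return {name: name not in frozen for name, _ in layers}


for strategy in ("TL1", "TL2", "TL3"):
    print(strategy, trainable_flags(LAYERS, strategy))
```

In a Keras model, flags like these would typically be applied by setting each layer's `trainable` attribute before compiling the model for fine-tuning.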

EXPERIMENT
The Indian Pines data used for pre-training is a classic hyperspectral dataset for image classification. Since the experiments are based on transfer learning, both hyperspectral datasets used in the experiments were collected by the same sensor, AVIRIS. The Indian Pines scene was acquired in 1992 over the Indian Pines test site in northwestern Indiana, United States. The size of the dataset is 145×145 pixel vectors, with a spatial resolution of 20 m. There are 16 labeled classes in the Indian Pines dataset, mainly including soybean, seedlings, and trees. The small samples used for fine-tuning are extracted from the Salinas dataset, which was also gathered by the AVIRIS imaging spectrometer, over the Salinas Valley in California, USA. The Salinas dataset has 224 bands with a size of 512×217 pixel vectors. The dataset contains a large number of labeled samples in 16 classes, such as bare soil, vegetables, and vineyards, each of which has hundreds of labeled samples. The details of the Salinas dataset are shown in Figure 3, where each color represents a category of ground objects. All experiments are performed on an Intel Xeon Silver 4210R CPU @ 2.40 GHz (40 CPUs) with 64 GB RAM. A GPU is also used for the experiments. All experiments were carried out with TensorFlow 2.6, using the numpy, sklearn, scipy, skimage, and matplotlib libraries.
The base model we use is a 3D-CNN because it works well with images and with any data that can be transformed into an image structure. In addition, the 3D-CNN model has a better ability to extract the spatial and spectral information of hyperspectral images. Compared with traditional algorithms and other neural networks, convolutional neural networks can efficiently extract multi-dimensional local information from images, extract image features, and classify images.
In the experiment, the Indian Pines hyperspectral dataset was used to pre-train the CNN model, and the trained parameters were transferred to the target Salinas dataset, which is used to evaluate the performance of hyperspectral image classification with limited samples.
The hyperspectral image is first reduced in dimension by principal component analysis (PCA), and a large number of principal components are retained to preserve as much information as possible. Specifically, 220 principal components are kept after PCA for both the Indian Pines and Salinas datasets. This setting reduces the loss of spectral and spatial information in hyperspectral images, thus achieving better results in transferred classification.
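To make the PCA step concrete, the sketch below projects the spectral axis of a cube onto its leading principal components using NumPy's SVD. The function name `pca_reduce` and the random 224-band cube are illustrative assumptions. The experiments mention sklearn among the libraries used, but the exact PCA call is not stated, so this is a plain NumPy equivalent rather than the authors' code.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project the spectral axis of an (H, W, B) cube onto n_components PCs."""
    height, width, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(np.float64)
    centered = flat - flat.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    return reduced.reshape(height, width, n_components)


# Toy stand-in for a 224-band scene (Salinas has 224 bands).
rng = np.random.default_rng(0)
scene = rng.random((16, 16, 224))
reduced = pca_reduce(scene, 220)
print(reduced.shape)
```

Keeping 220 components gives both datasets the same spectral depth as the 5×5×220 network input, which is presumably what allows the pre-trained weights to be loaded for the target dataset without changing the architecture.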
To examine the effect of sample size on classification performance, the number of selected samples per class was set to 1, 5, and 10. Accuracy, precision, and recall are used as indicators to evaluate classification performance. After fine-tuning, the transfer learning-based model is significantly more accurate than the model trained directly on the small samples.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
As the number of selected samples increases, the classification accuracy of the model gradually improves. However, the effect of transfer learning on classification is most remarkable when there are few samples. The comparison of the classification results of the transfer model (T) and the non-transfer model (UT) with different sample numbers is shown in Figure 4. As shown in the figure, when the number of samples per class is 10, there is a 13% difference in classification accuracy between the transfer model and the initial model. When the number of selected samples is 1 and 5, the gap increases to 19% and 33%, respectively.
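For reference, the three indicators named above can be computed as in the following sketch. Macro averaging over classes is an assumption here, since the averaging scheme is not stated, and the function name `classification_metrics` and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return overall accuracy plus macro-averaged precision and recall."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for c in classes:
        true_pos = sum(t == c and p == c for t, p in pairs)
        predicted = sum(p == c for _, p in pairs)
        actual = sum(t == c for t, _ in pairs)
        # A class that is never predicted (or never present) contributes 0.
        precisions.append(true_pos / predicted if predicted else 0.0)
        recalls.append(true_pos / actual if actual else 0.0)

    macro_precision = sum(precisions) / len(classes)
    macro_recall = sum(recalls) / len(classes)
    return accuracy, macro_precision, macro_recall


# Toy example with three classes and six test pixels.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(round(accuracy, 4), round(precision, 4), round(recall, 4))
```

Macro averaging weights every class equally, which matters for scenes such as Salinas where the number of pixels per class differs widely.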
The convergence performance of the transfer model (CNN TL2) trained with 10 samples per class is shown in Figure 5. When the number of iterations reaches 30, the accuracy becomes stable, and the loss is stable by epoch 27. When training is extended to 50 epochs, the loss and training accuracy do not change significantly, so for convenience only the first 30 epochs are shown here. It can be seen from the figure that the training of the model does not show the severe overfitting that often occurs when training deep learning models on other hyperspectral datasets. All these results show that parameters transferred from the same domain are able to improve hyperspectral image classification accuracy with limited training samples.
Table 3 shows the classification results of the three methods with 1, 5, and 10 labeled samples per class for fine-tuning. When the number of samples is 1 and 10, the classification accuracy of CNN TL2 is higher than that of the other two transfer methods. When the number of samples is 5, the CNN TL3 method performs better. Although the best model is not always the same for different numbers of samples, the classification accuracy of the CNN TL1 model is always the worst.
The classification performance of the CNN TL2 transfer model after training with 10 samples of each class of the Salinas dataset is shown in Figure 6. Based on the transfer learning method, we can obtain a good classification result with only a limited number of samples. As shown, Vinyard_vertical_trellis and Celery have the highest classification accuracy, reaching 99%, while the classification accuracy of Lettuce_romaine_7wk is the worst, and Vinyard_untrained, Fallow_smooth, and Lettuce_romaine_5wk are also easily confused with other classes.
In the northwest and middle areas in Figure 6 (a), various types of ground objects are densely arranged, and the classification accuracy is relatively poor.
The results show that the proposed method can classify most categories with high accuracy, especially objects covering large areas. Meanwhile, the number of target training samples affects the performance of our method. That is, the improvement in overall accuracy is more obvious with fewer target training samples, while better absolute performance can be obtained with more target training samples. However, the CNN model used in our work is relatively simple, and only some fine-tuning of the learning rate of the transfer model is carried out. Compared with training on large sample sets, our method still has considerable room for improvement.

CONCLUSION
A classification method for hyperspectral datasets with a limited number of samples, based on a CNN and same-domain transfer learning, is proposed. The experiments in this paper show that transfer learning between hyperspectral datasets is effective for improving classification accuracy with few samples. As the number of samples increases, the classification accuracy of the model improves. However, the effect of transfer learning on classification is most remarkable when there are fewer samples. Three types of transfer models are considered, and their classification accuracies are obtained and analyzed. The transfer method also affects the classification performance of the transferred 3D-CNN model. In our experiments, the transfer method of freezing the convolutional layer weights and retraining the fully connected layer weights shows the best classification performance.
Since the adopted model is relatively simple, the classification accuracy still has considerable room for improvement. In our classification results, there are still some categories with poor classification performance. To obtain better classification results, our next step is to study further transfer methods, such as deep metric learning or meta-learning methods from few-shot learning, on homologous datasets. Meanwhile, more training sample sizes and more target datasets will be considered to validate the applicability and validity of our method.