Noise-Tolerant Hyperspectral Image Classification Using Discrete Cosine Transform and Convolutional Neural Networks

Hyperspectral image classification has drawn significant attention in recent years, driven by the increasing abundance of sensor-generated hyper- and multi-spectral data combined with rapid advances in machine learning. A wide range of techniques, especially deep learning models, have been proposed, attaining high levels of classification accuracy. However, many of these approaches deteriorate significantly in performance in the presence of noise in the hyperspectral data. In this paper, we propose a new model that effectively addresses the challenge of noise residing in hyperspectral images. The proposed model, called DCT-CNN, combines the representational power of Convolutional Neural Networks with the noise elimination capabilities of frequency-domain filtering enabled by the Discrete Cosine Transform. In particular, the proposed method transforms pixel macroblocks to the frequency domain and discards the information corresponding to the higher frequencies in every patch, where abrupt changes and noise often reside. Experimental results on the Indian Pines, Salinas and Pavia University datasets indicate that DCT-CNN is a promising new model for accurate hyperspectral image classification, offering robustness to different types of noise, such as Gaussian and salt and pepper noise.


INTRODUCTION
Hyperspectral imaging systems have been increasingly used in a wide range of applications, including agriculture (Sahu et al., 2019), surveillance (Freitas et al., 2019), biomedical imaging (Offerhaus, Bohndiek & Harvey, 2019), and astronomy (Hege et al., 2004). This is because they efficiently combine spatial information with the subtle differences in the spectral signatures of various objects, making them valuable tools for material detection and object recognition. These tasks can be viewed as classification problems, i.e. classifying image pixels based on their spectral characteristics in order to recognise various materials and objects.
The high dimensionality and complexity of hyperspectral information (HSI) make classification an arduous task that requires specialised learning architectures. This has discouraged the use of shallow neural network architectures, such as Feedforward Neural Networks, since the complexity of the data could result in the accumulation of errors, especially when the derivative lies in the saturation regions, where small error changes affect the weights significantly. This was evident even when classifying vision-based data with fewer dimensions than HSI, such as RGB or thermal narrow-spectrum data.
The advent of deep learning brought a new era to hyperspectral image classification. Architectures such as Convolutional Neural Networks (CNN) tackled the aforementioned challenges by using a number of convolution steps that extract representative features from the input data at different hierarchical levels of resolution. These features are forwarded to neural network layers that perform the classification step. The feature extraction capabilities of CNN have significantly enhanced the performance of visual data classification.
However, the majority of these approaches significantly deteriorate in performance in the presence of noise in the hyperspectral data, thus indicating insufficient robustness to noise and outliers. In this paper, we propose a new model that effectively addresses the challenge of noise residing in hyperspectral images. The proposed model, which will be called DCT-CNN, combines the representational power of Convolutional Neural Networks with the noise elimination capabilities introduced by frequency-domain filtering enabled through the Discrete Cosine Transform.

Related Work
Earlier works on HSI classification consist of two main steps: the computation of hand-crafted features from the raw data, and the use of these features to train classifiers such as Support Vector Machines (SVM) and Neural Networks (NN) (Camps-Valls & Bruzzone, 2009). Camps-Valls et al. (2014) employ statistical learning methods to classify high-dimensional data when only a few training samples are available. These approaches, however, assume a priori knowledge of the features that are important for classification, which is usually not available.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)
More efficient approaches employ the paradigm of pattern recognition and machine learning (LeCun et al., 1998; Hinton and Salakhutdinov, 2006; Bengio et al., 2007) to automate the feature extraction step, building high-level features from low-level ones. Moreover, when larger training sets are available, such as large images with very high spatial and spectral resolution, deep learning approaches appear better suited to the classification problem (Chen et al., 2014). Techniques based on deep learning have already shown promising results for the detection (classification) of specific objects (Han et al., 2018), materials (Mnih and Hinton, 2012), vehicles (Montavon et al., 2012), actions (Bakalos et al., 2019a), and HSI (Chen et al., 2014). Chen et al. (2014) utilise a greedy layer-wise training framework (Bengio et al., 2007) with autoencoders as the basic building block for spectral feature extraction; the extracted features are in turn combined with spatial information and fed to a logistic regression classifier. However, classifying high-dimensional data requires various adaptations of simple deep learning techniques. This is evident even in application scenarios where the dimensionality of the data is much lower than that of HSI, such as Bakalos et al. (2019b).
Typical deep learning architectures contain a large number of tunable parameters, implying that a large number of samples is needed to train the network accurately and that the training process is highly complex. Furthermore, such approaches are often not sufficiently tolerant to the presence of noise within the hyperspectral information. The approach proposed in this paper introduces a pre-processing step based on the Discrete Cosine Transform into a CNN framework for HSI classification.

THE PROPOSED DCT-CNN FOR NOISE-TOLERANT HYPERSPECTRAL IMAGE CLASSIFICATION
The proposed model, DCT-CNN, combines the representational power of Convolutional Neural Networks in pixel-level classification of HSI with the noise elimination capabilities of frequency-domain filtering enabled by the Discrete Cosine Transform. In particular, the proposed method transforms pixel macroblocks to the frequency domain and discards the information corresponding to the higher frequencies in every patch, where abrupt changes and noise often reside. Below we present the steps of the proposed technique: we first describe the plain CNN approach without DCT pre-processing, and then explain the additional DCT step.
The deep learning approach presented in this work employs a CNN for pixel-level classification of a hyperspectral image. However, training deep CNN architectures requires the convolution of hundreds of channels along the network inputs, which increases the computational cost of both training and prediction. Moreover, a statistical analysis of the spectral responses of pixels that belong to the same class shows that the variance of these responses is small. This suggests that pixels of the same class tend to have very similar values in every channel, whereas pixels of different classes present different spectral properties.
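The statistical observation above can be reproduced with a short NumPy sketch that computes, for every labelled class, the mean and standard deviation of each spectral band. The function name `class_spectral_stats` and the convention that label 0 marks unlabelled pixels are assumptions made for illustration, not details taken from the text.

```python
import numpy as np

def class_spectral_stats(cube, labels):
    """Per-class mean and standard deviation of every spectral band.

    cube:   H x W x B array of spectra.
    labels: H x W integer map; 0 is assumed to mean "unlabelled".
    Returns {class_id: (mean_per_band, std_per_band)}.
    """
    stats = {}
    for c in np.unique(labels):
        if c == 0:
            continue                      # skip unlabelled pixels (assumed convention)
        spectra = cube[labels == c]       # shape (pixels_in_class, B)
        stats[int(c)] = (spectra.mean(axis=0), spectra.std(axis=0))
    return stats
```

Small per-band standard deviations relative to the differences between class means indicate the spectral redundancy that the dimensionality-reduction step below exploits.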
Based on the above, and in order to achieve the best performance, a pre-processing step based on Randomised Principal Component Analysis (R-PCA) is employed. R-PCA is applied along the spectral dimension only, leaving the spatial information unaffected.
The number of principal components retained after R-PCA is set so that at least 99.9% of the initial information (explained variance) is preserved. In experiments on widely used hyperspectral datasets, this amount of information is preserved by the first 10 to 30 principal components, which reduces the dimensionality of the raw input by a factor of up to 15.
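A minimal sketch of the R-PCA step is given below, assuming the hyperspectral cube is stored as an H × W × B NumPy array. It flattens the spatial axes so that each pixel spectrum becomes one sample, which means PCA acts on the spectral axis only. It then approximates the leading principal axes with a randomized range finder plus power iterations, and keeps the smallest number of components whose cumulative explained-variance ratio reaches 99.9%. The function name, the cap of 30 components (taken from the upper end of the range reported above) and the oversampling and iteration defaults are illustrative assumptions, not the authors' settings.

```python
import numpy as np

def randomized_pca_bands(cube, retain=0.999, max_components=30,
                         n_oversamples=10, n_iter=4, seed=0):
    """Reduce the spectral axis of an H x W x B cube with randomized PCA.

    Returns the H x W x k reduced cube and k, the smallest number of
    components (at most max_components) whose cumulative explained-variance
    ratio reaches `retain`. All defaults are illustrative assumptions.
    """
    H, W, B = cube.shape
    X = cube.reshape(-1, B).astype(float)   # one row per pixel spectrum
    Xc = X - X.mean(axis=0)                 # centre each band
    rng = np.random.default_rng(seed)
    k = min(max_components + n_oversamples, *Xc.shape)
    # Randomized range finder: orthonormal basis Q for the dominant column
    # space of Xc, refined with a few power iterations.
    Q, _ = np.linalg.qr(Xc @ rng.standard_normal((B, k)))
    for _ in range(n_iter):
        Z, _ = np.linalg.qr(Xc.T @ Q)
        Q, _ = np.linalg.qr(Xc @ Z)
    # Exact SVD of the small k x B matrix approximates the principal axes.
    _, sing, Vt = np.linalg.svd(Q.T @ Xc, full_matrices=False)
    explained = sing ** 2 / np.sum(Xc ** 2)  # ratio w.r.t. total variance
    cumulative = np.cumsum(explained[:max_components])
    n_keep = min(int(np.searchsorted(cumulative, retain)) + 1, len(cumulative))
    reduced = Xc @ Vt[:n_keep].T            # project spectra onto kept axes
    return reduced.reshape(H, W, n_keep), n_keep
```

In practice, scikit-learn's `PCA(svd_solver="randomized")` provides a maintained implementation of the same idea. The hand-written version is shown only to make the individual steps explicit.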
The hyperspectral image can be expressed as a 3D tensor of size $M \times N \times B$, where $M$ and $N$ represent the spatial dimensions and $B$ the number of spectral bands. We analyse the captured hyperspectral images in $s \times s$ square patches. After R-PCA, in the plain CNN approach, each patch is a tensor of size $s \times s \times N_{PC}$, where $N_{PC}$ is the number of principal components extracted by R-PCA. During performance evaluation, different values of the patch size $s$ were tried. Increasing $s$ increases the number of neighbouring pixels considered during classification, but it also increases the complexity of the algorithm. The selection of the patch size is therefore a critical step, as it governs the trade-off between accuracy and complexity; patch sizes with $s > 13$ usually deteriorate the performance of the model.
The CNN structure includes the aforementioned data input and pre-processing (R-PCA) layers. These layers are followed by two convolutional layers of size $3 \times 3$ and $3 \times N_{PC}$. The network was trained with the standard backpropagation algorithm, minimising the negative log-likelihood of the training data under the model parameters. The CNN architecture can be seen in Figure 1.
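The patch-based input described above can be sketched as follows. The sketch assumes the R-PCA output is an H × W × C array, where C is the number of retained components, and that every labelled pixel is classified from the s × s neighbourhood centred on it. The text does not specify how image borders are handled, so reflect padding is an assumption here. The function name and the use of 0 for unlabelled pixels are also illustrative assumptions.

```python
import numpy as np

def extract_patches(cube, s, labels=None):
    """Return s x s x C patches centred on each (labelled) pixel of an H x W x C cube."""
    if s % 2 == 0:
        raise ValueError("patch size s must be odd so the patch has a centre pixel")
    r = s // 2
    # Reflect-pad the spatial axes so border pixels also get full patches (assumption).
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    if labels is None:
        rows, cols = np.indices(cube.shape[:2]).reshape(2, -1)
    else:
        rows, cols = np.nonzero(labels > 0)      # label 0 = unlabelled (assumed)
    # Pixel (i, j) sits at (i + r, j + r) in the padded cube, so its patch
    # spans padded rows i .. i + s - 1 and columns j .. j + s - 1.
    patches = np.stack([padded[i:i + s, j:j + s, :] for i, j in zip(rows, cols)])
    return patches, rows, cols
```

The result has shape (number of pixels, s, s, C), which is the input a CNN operating on s × s × C patches would consume. Because each patch grows quadratically with s, this also illustrates the complexity cost of larger patch sizes discussed above.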
When the hyperspectral data are assumed to contain noise, we additionally apply the Discrete Cosine Transform (DCT). In the frequency domain, the core information is concentrated in the low-frequency coefficients, whereas high-frequency coefficients tend to represent abrupt changes, edges, outliers and noise; eliminating this high-frequency information can therefore provide robustness to noise. Based on this, the noisy hyperspectral data are divided into patches of a fixed size (different patch sizes were investigated in the experimental evaluation, as noted above) and a DCT is applied to every patch. In the frequency domain, each patch is thus represented by coefficients of increasing frequency. In our method, the coefficients corresponding to the higher frequencies of each patch are set to zero. More specifically, in the evaluation we experimented with various percentages of zeroed coefficients, ranging from 0% to 64%, always starting from the highest frequency, i.e. from the lower-right corner of the patch.
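The DCT filtering step can be sketched in NumPy as below. The sketch builds the orthonormal DCT-II matrix, applies a 2-D DCT to every channel of every s × s patch, zeroes the lower-right `zero_size` × `zero_size` block of coefficients (the highest frequencies), and transforms back. Two points are assumptions rather than details given in the text: that each spectral channel is filtered independently, and that the filtered coefficients are returned to the pixel domain before reaching the CNN. The function names are hypothetical.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so that coeffs = D @ x for a length-n signal x."""
    k = np.arange(n)[:, None]          # frequency index (rows)
    i = np.arange(n)[None, :]          # sample index (columns)
    D = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    D[0, :] = np.sqrt(1.0 / n)         # the DC row uses the smaller normalisation
    return D

def dct_lowpass(patches, zero_size):
    """Zero the highest-frequency 2-D DCT coefficients of each s x s patch.

    patches:   array of shape (n, s, s, C); every channel is filtered separately.
    zero_size: side length of the lower-right coefficient block set to zero.
    """
    s = patches.shape[1]
    D = dct_matrix(s)
    # Forward 2-D DCT per patch and channel: Y = D X D^T.
    coeffs = np.einsum("ki,nijc,lj->nklc", D, patches, D)
    if zero_size > 0:
        coeffs[:, s - zero_size:, s - zero_size:, :] = 0.0
    # Inverse 2-D DCT (D is orthonormal, so its inverse is D^T): X = D^T Y D.
    return np.einsum("ki,nklc,lj->nijc", D, coeffs, D)
```

If SciPy is available, `scipy.fft.dctn(..., norm="ortho")` and `scipy.fft.idctn(..., norm="ortho")` compute the same transforms. The explicit matrix form is used here to keep the sketch dependency-light. A constant patch passes through unchanged, since all of its energy lies in the DC coefficient in the top-left corner.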

Experimental Setup
In our study, we validated the developed framework on three publicly available datasets: i) the Indian Pines dataset, which consists of 145 × 145 pixels and 224 spectral reflectance bands in the wavelength range $0.4$ to $2.5 \times 10^{-6}$ meters; ii) the Salinas dataset, a 224-band hyperspectral image characterised by high spatial resolution; and iii) the Pavia University dataset, which has 102 spectral bands.
Supervised training was conducted using the ground truth images of the aforementioned datasets. In particular, we split the labelled parts of each image into a training set and a test set, with a split ratio of 5:95 for the Salinas and Pavia University datasets, whereas for the Indian Pines dataset we used a ratio of 25:75. The different ratio for Indian Pines is due to the high variability in the number of samples between its classes.
The split was performed randomly; each experiment was repeated 10 times, and the results reported in the next section are averages over these runs. All models were evaluated quantitatively in terms of classification accuracy.
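The evaluation protocol can be sketched as follows: the labelled pixels are split at random, the split is repeated 10 times, and the accuracies are averaged. `fit_predict` is a hypothetical callable standing in for any of the evaluated models (CNN, SVM, k-NN). It is assumed to train on the training coordinates and return predicted labels for the test coordinates. Overall accuracy is assumed as the accuracy measure, and label 0 is assumed to mark unlabelled pixels.

```python
import numpy as np

def random_split(labels, train_fraction, rng):
    """Randomly split labelled pixel coordinates into train and test sets."""
    rows, cols = np.nonzero(labels > 0)      # label 0 = unlabelled (assumed)
    order = rng.permutation(len(rows))
    n_train = int(round(train_fraction * len(rows)))
    tr, te = order[:n_train], order[n_train:]
    return (rows[tr], cols[tr]), (rows[te], cols[te])

def overall_accuracy(y_true, y_pred):
    """Fraction of test pixels whose predicted label matches the ground truth."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def mean_accuracy(labels, train_fraction, fit_predict, n_runs=10):
    """Average overall accuracy over `n_runs` independent random splits."""
    scores = []
    for run in range(n_runs):
        rng = np.random.default_rng(run)     # a different split per run
        train, test = random_split(labels, train_fraction, rng)
        y_pred = fit_predict(train, test)    # hypothetical model wrapper
        scores.append(overall_accuracy(labels[test], y_pred))
    return float(np.mean(scores))
```

With `train_fraction=0.05`, this mirrors the 5:95 split used for Salinas and Pavia University. With `train_fraction=0.25`, it mirrors the 25:75 split used for Indian Pines.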
In the experimental evaluation phase, two different types of noise were introduced. First, zero-mean Gaussian noise was added, with a standard deviation of either 300 or 100. Second, we applied the well-known Salt and Pepper (SnP) noise, with overall noise amounts of 10% and 80% and an equal (50:50) proportion of salt and pepper.
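The two noise models can be sketched as below. The text specifies the Gaussian parameters, the SnP amounts and the 50:50 ratio. It does not specify the values written for salt and pepper, or whether noise is drawn per pixel or per band value. Here each array element is corrupted independently, and salt and pepper take the maximum and minimum value of the cube. These choices, and the function names, are assumptions.

```python
import numpy as np

def add_gaussian_noise(cube, std, rng, mean=0.0):
    """Return cube plus N(mean, std^2) noise drawn independently for every element."""
    return cube + rng.normal(mean, std, size=cube.shape)

def add_salt_and_pepper(cube, amount, rng, salt_ratio=0.5):
    """Replace a fraction `amount` of elements with salt (cube max) or pepper (cube min).

    Using the cube's max/min as salt/pepper values is an assumption.
    """
    noisy = np.array(cube, dtype=float)              # independent float copy
    corrupted = rng.random(cube.shape) < amount      # which elements receive noise
    is_salt = rng.random(cube.shape) < salt_ratio    # salt vs pepper, 50:50 by default
    noisy[corrupted & is_salt] = cube.max()
    noisy[corrupted & ~is_salt] = cube.min()
    return noisy
```

For instance, `add_gaussian_noise(cube, 300, rng)` and `add_salt_and_pepper(cube, 0.8, rng)` would correspond to the stronger settings listed above, under the stated assumptions.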

[Figure: hyperspectral classification maps for the Salinas dataset. Panels: ground truth; Gaussian noise without DCT; Gaussian noise with DCT; Salt and Pepper without DCT; Salt and Pepper with DCT.]

Results
Experimental results are provided for various patch sizes, ranging from 5 × 5 up to 15 × 15. Furthermore, we experimented with different ratios of zeroed high-frequency DCT coefficients. The results give the classification accuracy achieved by CNN, SVM and k-NN models for different patch sizes and different ratios of zeroed high-frequency DCT coefficients (i.e., no coefficients zeroed, a 3 × 3 sub-patch in the lower-right corner zeroed, and a 5 × 5 sub-patch in the lower-right corner zeroed). Figures 2, 4 and 6 present the accuracy rates for the Indian Pines, Salinas and Pavia University datasets respectively, whereas Figures 3, 5 and 7 provide the hyperspectral classification maps for indicative cases of noisy data handled with plain CNN and DCT-CNN.
The results indicate that DCT-CNN handles noisy hyperspectral data more effectively than plain CNN approaches. As expected, the CNN-based noise-tolerant approach outperforms the same technique applied to other learning models such as SVM and k-NN, although the benefits of DCT for noise robustness are visible in those cases as well. It can also be observed that, as the patch size increases, zeroing a larger number of high-frequency DCT coefficients generally yields better accuracy.

CONCLUSION
In this paper, we have proposed DCT-CNN, a model that combines the representational power of Convolutional Neural Networks with the noise elimination capabilities of frequency-domain filtering enabled by the Discrete Cosine Transform. The presented method transforms pixel macroblocks to the frequency domain and discards the information corresponding to the higher frequencies in every patch, where abrupt changes and noise often reside. Experimental results on the Indian Pines, Salinas and Pavia University datasets indicate that DCT-CNN is a promising new model for accurate hyperspectral image classification, offering robustness to different types of noise, such as Gaussian and Salt and Pepper noise. As future work, we aim to investigate the efficacy of other transformations, such as wavelets, in conjunction with deep learning, as well as tensor-based machine learning methods, which have been shown to require less annotated data for training.