DIMENSIONALITY REDUCTION VIA AN ORTHOGONAL AUTOENCODER APPROACH FOR HYPERSPECTRAL IMAGE CLASSIFICATION

Nowadays, the increasing amount of information provided by hyperspectral sensors requires optimal solutions to ease the subsequent analysis of the produced data. A common issue in this matter relates to the hyperspectral data representation for classification tasks. Existing approaches address the data representation problem by performing a dimensionality reduction over the original data. However, mining complementary features that reduce the redundancy present across the multiple levels of hyperspectral images remains challenging. Thus, exploiting the representation power of neural network-based techniques becomes an attractive alternative in this matter. In this work, we propose a novel dimensionality reduction implementation for hyperspectral imaging based on autoencoders, ensuring orthogonality among features to reduce the redundancy in hyperspectral data. The experiments conducted on the Pavia University, Kennedy Space Center, and Botswana hyperspectral datasets evidence the representation power of our approach, leading to better classification performances compared to traditional hyperspectral dimensionality reduction algorithms.


INTRODUCTION
According to Ghamisi et al. (2017b), recent advances in hyperspectral imaging acquisition provide rich and boundless spectral and spatial information. Subsequent hyperspectral image analysis can be used for different Remote Sensing applications, such as Earth observation, monitoring of environmental changes, forest management, precision agriculture, urban planning, and even disease detection, among many others. Most such applications base their final processes on well-known classification or segmentation schemes (Ghamisi et al., 2017a; Su et al., 2017; Benediktsson et al., 2015), or simply profit from the increasing spectral resolution to improve pattern recognition processes (Ghamisi et al., 2017b).
However, several factors in hyperspectral image analysis, such as high dimensionality, data complexity, and mixed pixels, to name a few, make this type of data naturally non-linear and complex, posing challenges to effectively process and analyze hyperspectral image data. In this sense, a major challenge that the hyperspectral imaging scientific community has to overcome relies on adequately handling the enormous spectral dimensionality associated with hyperspectral images (Ghamisi et al., 2017b; Thilagavathi et al., 2018). Such a characteristic exposes an increasing need for computational power along with a diminished loss of essential characteristics in the data (Su et al., 2017; Sellami et al., 2018). To circumvent these issues, the use of dimensionality reduction techniques has shown to be a good alternative in the analysis of hyperspectral images.
In this regard, there exists a plethora of methods devoted to describing hyperspectral imaging data in lower-dimensional spaces, either through feature selection (Zhang et al., 2015) or, most commonly, via feature extraction processes. Among the latter, there is a special interest in developing methods based on supervised and semi-supervised learning, such as the Principal Component Analysis and its variants (Harsanyi et al., 1994; Zang et al., 2012; Jia et al., 2013; Su et al., 2017; Sellami et al., 2018; Xu et al., 2018).
Nevertheless, thus far, such processes still need improvement in terms of preserving the nature of the data, maximizing the independence among the new features, and, more recently, retaining the neighborhood relationships among the image pixels (Huang et al., 2019b). Most recently, the emergence of deep learning technologies has allowed sophisticated approaches for performing such dimensionality reduction tasks (Arati et al., 2019; Huang et al., 2019a). For example, Chen et al. (2014) employed Autoencoders, whereas Liu et al. (2018) used Recurrent Neural Networks for hyperspectral image classification tasks. In a similar path, Wang et al. (2019) developed a semi-supervised learning method for clustering, which improves class-separability among the latent features via Orthogonal Autoencoders in general classification tasks. However, to the best of our knowledge, Wang's method has never been used in hyperspectral image analysis thus far.
In this work, we propose a novel semi-supervised approach to reduce the spectral dimensionality of hyperspectral images. Our method, called Hyperspectral Orthogonal Autoencoders (HOAE), extends the work developed by Wang et al. (2019) on autoencoders to ensure orthogonality among the features compounding the new lower-dimensional space from hyperspectral image data. To assess our approach, first, we used it to reduce the spectral dimensionality of hyperspectral images corresponding to the Pavia University, Kennedy Space Center, and Botswana public hyperspectral datasets. Next, the outcomes served as input for performing a classification process via a Support Vector Machine (SVM) algorithm. Then, the classification results were compared to the classification benchmarks reported in Ghamisi et al. (2017a), using a Principal Component Analysis (PCA) and conventional Autoencoders (AE) for the corresponding dimensionality reduction step over the three datasets.
The rest of the paper is organized as follows. The next section presents the considerations for implementing our method with Autoencoders. Section 3 provides information regarding the experimental procedure adopted to assess the performance of our method. Section 4 exposes the analysis performed over the experimental results. Finally, Section 5 finishes the paper presenting our final remarks and future works.

HYPERSPECTRAL DIMENSIONALITY REDUCTION VIA ORTHOGONAL AUTOENCODERS
In this section we describe our dimensionality reduction technique for hyperspectral images via autoencoders. Unlike typical neural network-based approaches, our method, called Hyperspectral Orthogonal Autoencoders (HOAE), learns to project hyperspectral image data onto an orthogonal space of lower dimensions, which improves subsequent hyperspectral image classification schemes.
Conventional Autoencoders are neural network architectures composed of an encoding stage followed by a decoding stage, whose aim is to recreate given input patterns. In essence, an Autoencoder is built upon a visible input layer of d neurons, one hidden layer of h neurons, and an output layer of d neurons, as depicted in Figure 1. Thus, given an input pattern x ∈ R^d, representing a pixel within a hyperspectral image, the Autoencoder reconstructs it by first mapping x into a latent representation z ∈ R^h via the encoder f(·), then mapping z to a close estimate x̂ ∈ R^d of the original input through the decoder g(·). Formally, the encoder and decoder are defined in Equations 1 and 2, respectively:

z = f(x) = σ_f(W_f x + b_f)    (1)

x̂ = g(z) = σ_g(W_g z + b_g)    (2)

where W_f, σ_f, and b_f represent the weights, activation function, and bias associated with the encoder, and W_g, σ_g, and b_g denote the weights, activation function, and bias of the decoder, respectively.
Generally, the weights W_f and W_g, as well as the bias vectors b_f and b_g, are estimated using the Backpropagation algorithm, which aims to minimize the reconstruction error L(·) between the input pattern x and its reconstructed estimate x̂, according to Equation 3:

L(x, x̂) = ‖x − x̂‖²    (3)

where ‖·‖ represents the Euclidean norm operator.
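The encoding, decoding, and reconstruction-error steps described above can be sketched as follows; this is a minimal NumPy illustration in which randomly initialized parameters stand in for trained weights, and the dimensions are placeholders (e.g., d = 103 spectral bands, as in Pavia University):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 103, 8                       # illustrative: input bands, latent neurons

# Random parameters stand in for weights learned via Backpropagation
W_f, b_f = rng.normal(size=(h, d)) * 0.1, np.zeros(h)
W_g, b_g = rng.normal(size=(d, h)) * 0.1, np.zeros(d)

def encode(x):                      # z = sigma_f(W_f x + b_f), with tanh as sigma_f
    return np.tanh(W_f @ x + b_f)

def decode(z):                      # x_hat = sigma_g(W_g z + b_g)
    return np.tanh(W_g @ z + b_g)

def reconstruction_error(x):        # L(x, x_hat) = ||x - x_hat||^2
    x_hat = decode(encode(x))
    return float(np.sum((x - x_hat) ** 2))

x = rng.uniform(size=d)             # a (normalized) hyperspectral pixel
err = reconstruction_error(x)
```

Training amounts to adjusting W_f, b_f, W_g, and b_g so that this error is minimized over the training pixels.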
During the Autoencoder's training process, the network enforces the learning of a compressed representation z of the input pattern x in a latent space Z. However, estimating the network's parameters via the minimization of the reconstruction error L(·) does not guarantee that the features in the latent space are orthogonal to each other, a property that might provide better class-discriminability. Thus, in order to ensure orthogonality among the components in the latent space, we added a regularization term to the reconstruction error L(·), similar to Wang et al. (2019). Formally, the orthogonal reconstruction error can be defined as:

L(x, x̂) = ‖x − x̂‖² + λ‖Z^T Z − I‖²    (4)

where I is the identity matrix, Z denotes the matrix stacking the compressed representations z of a training batch, Z^T its transpose, and λ is a penalization parameter. It is worth noticing that a zero value of the penalization parameter leads to a conventional Autoencoder.
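The regularized loss can be sketched in NumPy as below; this is an illustrative batch formulation (latent vectors stacked row-wise), with names and default values chosen for the example only:

```python
import numpy as np

def orthogonal_loss(X, X_hat, Z, lam=0.001):
    """Reconstruction error plus an orthogonality penalty (after Wang et al., 2019).

    X, X_hat : (n, d) batch of input pixels and their reconstructions.
    Z        : (n, h) batch of latent representations.
    lam      : penalization parameter; lam = 0 recovers the conventional AE loss.
    """
    recon = np.sum((X - X_hat) ** 2)
    gram = Z.T @ Z                               # (h, h) Gram matrix of latent features
    penalty = np.sum((gram - np.eye(Z.shape[1])) ** 2)
    return float(recon + lam * penalty)
```

The penalty vanishes exactly when the latent features are orthonormal (Z^T Z = I), so minimizing it pushes the learned features toward mutual orthogonality.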

EXPERIMENTAL SETTINGS
We validated our orthogonal dimensionality reduction proposal in the context of hyperspectral image classification. Thus, we compared the classification performances from a Support Vector Machine (SVM) classifier using the features provided by our method (HOAE), the Principal Component Analysis (PCA), and the conventional Autoencoder (AE), when executed on three hyperspectral images datasets.
Next, we provide information about the datasets, the HOAE implementation details, and the experimental procedure adopted to conduct our experiments.

Datasets Descriptions
The datasets used in our study are among the most challenging datasets for hyperspectral image classification tasks, as they contain a mixture of different land cover and land use characteristics.
• The Botswana dataset is a collection of 145 spectral bands covering a portion of the spectrum ranging from 400 nm to 2500 nm. The images have a 1476×256 pixel resolution and a 30 m spatial resolution per pixel, containing information on 14 land cover types, such as hippo grass, floodplain grasses, riparian, acacia woodlands, and water, among others.

HOAE Implementation Details
We implemented our dimensionality reduction approach for hyperspectral images in Keras 2.3.1 and the Python 3.7.6 programming language. Moreover, we constructed the HOAE with a single hidden layer and tied weights to simplify the network training.
During the training stage, we adopted a min-max normalization strategy to mitigate the effects of large input values on the network, and we randomly shuffled the training data at each epoch. The HOAE was configured to use the hyperbolic tangent activation function and Glorot and Bengio's algorithm for weight initialization at the encoder layer.
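The min-max normalization mentioned above can be sketched as follows; this illustration assumes per-band scaling of a pixel matrix to [0, 1], which is one common choice (the paper does not specify whether scaling is per band or global):

```python
import numpy as np

def min_max_normalize(X, eps=1e-12):
    """Scale each spectral band of X (n_pixels, n_bands) to the [0, 1] range."""
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn + eps)  # eps guards against constant bands
```

This keeps all input magnitudes comparable across bands, which is well suited to the bounded tanh activations used by the network.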
As for the optimizer, the network was set to use the RMSprop algorithm with a learning rate of 0.001 and a moving average value of 0.9 for faster convergence. Finally, we included an early stopping criterion to obtain the best performing model in reconstructing the input patterns.
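A network configured along these lines might look like the sketch below. It uses tf.keras rather than the standalone Keras 2.3.1 of the paper, employs untied decoder weights for brevity (the paper's tied-weight variant would require a custom layer), and the layer sizes are illustrative placeholders:

```python
import numpy as np
from tensorflow import keras

d, h = 103, 8  # illustrative: number of input bands and latent dimensions

# Single-hidden-layer autoencoder with tanh activations and Glorot initialization
inputs = keras.Input(shape=(d,))
z = keras.layers.Dense(h, activation="tanh",
                       kernel_initializer="glorot_uniform")(inputs)
outputs = keras.layers.Dense(d, activation="tanh")(z)
model = keras.Model(inputs, outputs)

# RMSprop with learning rate 0.001 and moving-average coefficient (rho) 0.9
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.001, rho=0.9),
              loss="mse")

# Early stopping on the validation reconstruction error keeps the best model
early_stop = keras.callbacks.EarlyStopping(monitor="val_loss",
                                           restore_best_weights=True)
# model.fit(X_train, X_train, validation_data=(X_val, X_val),
#           shuffle=True, epochs=200, callbacks=[early_stop])
```

Note that the plain "mse" loss corresponds to a conventional AE; the HOAE would add the orthogonality penalty to this objective.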

Experimental Design
We evaluated our HOAE dimensionality reduction algorithm in a hyperspectral image classification context. The main idea is to assess the impact of our technique, considered as a feature extractor, within a classification process using a Support Vector Machine (SVM) as the classifier. To further validate our proposal, we included the Principal Component Analysis (PCA) and a standard Autoencoder with tied weights (AE) as features extractors in the classification procedure.
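The feature-extractor-plus-SVM evaluation scheme can be sketched with scikit-learn for the PCA baseline; the data below are synthetic stand-ins for flattened hyperspectral pixels (real experiments would use the labeled pixels of each dataset):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for labeled pixels: (n_pixels, n_bands) and class labels
X, y = make_classification(n_samples=300, n_features=40, n_informative=10,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# PCA baseline: reduce to 8 features, then classify with an SVM
clf = make_pipeline(PCA(n_components=8), SVC())
clf.fit(X[:200], y[:200])
acc = clf.score(X[200:], y[200:])   # overall accuracy on held-out pixels
```

Swapping the PCA step for the encoder of the trained AE or HOAE yields the other two pipelines compared in the experiments.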
For the execution of our experiments, we divided the hyperspectral datasets into training, validation, and testing sets.
Moreover, we adopted a uniform class distribution strategy to build the training and validation sets, so that all classes received an equal number of training and validation samples, respectively. Explicitly, the number of training samples per class corresponded to 65% of the samples in the minority class, whereas the validation samples corresponded to 15%. The remaining samples were used to construct the testing set. Notice that the validation set was provided to the HOAE and AE methodologies to evaluate the reconstruction error during the networks' training to obtain the best models.
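The uniform split described above can be sketched as follows; this is an illustrative implementation in which function and variable names are our own:

```python
import numpy as np

def uniform_split(y, train_frac=0.65, val_frac=0.15, seed=0):
    """Per-class split sized by the minority class.

    Every class contributes the same number of training samples (65% of the
    minority-class count) and validation samples (15%); the rest forms the
    testing set. Returns index arrays (train, val, test).
    """
    rng = np.random.default_rng(seed)
    counts = np.bincount(y)
    minority = counts[counts > 0].min()
    n_train = int(train_frac * minority)
    n_val = int(val_frac * minority)

    train_idx, val_idx, test_idx = [], [], []
    for c in np.unique(y):
        idx = rng.permutation(np.where(y == c)[0])
        train_idx.extend(idx[:n_train])
        val_idx.extend(idx[n_train:n_train + n_val])
        test_idx.extend(idx[n_train + n_val:])
    return np.array(train_idx), np.array(val_idx), np.array(test_idx)
```

Sizing the per-class quotas by the minority class keeps the training and validation sets perfectly balanced, at the cost of leaving more samples of the majority classes in the testing set.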
Finally, we compared the classification performances based on the overall classification accuracy, the producer's accuracy, and the average accuracy over the PaviaU, KSC, and Botswana hyperspectral datasets. To maintain a uniform evaluation across the datasets, we set the PCA, AE, and HOAE to provide reduced spaces of 1, 2, 4, 8, 16, 32, and 64 features. We also included the penalization parameter λ in our analysis by setting its value to 0.001, 0.01, and 0.1.
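The three accuracy measures can all be derived from the confusion matrix, as the following NumPy sketch shows (function names are our own, and the matrix below is a made-up example, not experimental data):

```python
import numpy as np

def accuracies(conf):
    """Metrics from a confusion matrix whose rows are the reference (true) classes.

    producer's accuracy: per-class recall (diagonal entry / row sum)
    average accuracy:    mean of the producer's accuracies
    overall accuracy:    trace / total number of samples
    """
    conf = np.asarray(conf, dtype=float)
    producers = np.diag(conf) / conf.sum(axis=1)
    return producers, producers.mean(), np.trace(conf) / conf.sum()

conf = np.array([[50,  0,  0],
                 [10, 30,  0],
                 [ 0,  0, 10]])
producers, avg_acc, overall = accuracies(conf)
# producers -> [1.0, 0.75, 1.0]; average -> ~0.9167; overall -> 0.9
```

Unlike the overall accuracy, the average accuracy weights every class equally, which matters when the testing set is class-imbalanced, as in our splits.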

RESULTS AND EXPERIMENTAL ANALYSIS
In this section, we analyze the results obtained from the execution of the experiments described in Section 3.3, regarding the use of the HOAE as an alternative technique to reduce the dimensionality of the spectral input space in hyperspectral image classification tasks. In general, the results in Table 2 indicate that our method contributed to class-discriminability, as it produced steady and remarkably better classification performances compared to the PCA and AE techniques. Moreover, the results show that the HOAE method mostly ranked first and second (strong and light gray cells, respectively) across the different feature spaces of 1, 2, 4, 8, 16, 32, and 64 dimensions on the three hyperspectral datasets.

Table 2. Overall accuracy classification performances for the Support Vector Machine classifier in combination with the Principal Component Analysis (PCA), Autoencoder (AE), and Hyperspectral Orthogonal Autoencoder (HOAE) feature extraction techniques. The results correspond to different numbers of dimensions in the reduced space and values of the penalization parameter (λ).
Except for a few cases, the results also reveal that, unlike the PCA and AE techniques, the classification performances associated with the HOAE tend to improve as the number of dimensions in the reduced space increases, scoring the best global performances beyond eight dimensions, as represented by the values in black bold in Table 2.
Despite the good results of our proposal, it is not possible to establish any specific behavior regarding the values of the penalization parameter (λ) for classification, which remains an open avenue for investigation.
In the following, we present a classifier performance analysis focused on the producer's and average accuracies for the Pavia University hyperspectral dataset. We expect to extend this analysis to the Kennedy Space Center and Botswana datasets in the near future. Table 3 presents the producer's accuracies, along with the average and overall accuracies, of the best performing classification models on the testing set of the Pavia University hyperspectral dataset. The classification models correspond to the combinations of the SVM classifier with the PCA, AE, and HOAE feature extraction techniques that achieved the best overall accuracies across their different parameter configurations, as presented in Table 2.
Table 3. Summary of the producer, average, and overall accuracies over the Pavia University hyperspectral dataset produced by the SVM classifier working on a reduced feature space of 8 dimensions provided by the PCA, AE, and HOAE (with λ equal to 0.001) feature extraction schemes.

Figure 2 presents the false color composition (a), the ground truth (b), and the class predictions for the testing set of the Pavia University hyperspectral dataset regarding the best performing models, which correspond to the reduced feature space of eight dimensions (as can be inferred from Table 2) and involved the use of the PCA (c), AE (d), and HOAE (e) methods. It is worth mentioning that the pixels associated with the missing values causing the salt-and-pepper effect in the ground truth image correspond to the data used to train the models.
A visual inspection of these images gives the insight that our method (HOAE) provides promising class-discriminability attributes, yielding better classifications in comparison to the PCA- and AE-based models. Nevertheless, the three methods seem to be affected by the similar reflectance characteristics of the Gravel and Self-blocking Bricks classes, as the SVM returned mixed responses for these classes. Finally, despite the reasonably good predictions, the Bare Soil class represents a challenge to overcome, as the classifier confused it with the Meadows and Self-blocking Bricks classes.

CONCLUSIONS AND FUTURE WORK
In this work, we have presented a novel dimensionality reduction algorithm for hyperspectral images based on an Autoencoder, called Hyperspectral Orthogonal Autoencoders (HOAE). In contrast to typical Autoencoders, our approach enables the learning of orthogonal features in a lower-dimensional space by adding an orthogonality constraint to the loss function.
Our experiments conducted on the Pavia University, Kennedy Space Center, and Botswana hyperspectral datasets confirm that the orthogonal features, provided by our method, contributed to better classification rates in comparison with the standard Principal Component Analysis and the typical Autoencoder methods.
Despite the remarkably good classification performances originating from the use of our method, in line with related works, we envisage its adaptation to more complex neural network architectures, such as Convolutional Autoencoders, to further improve our results.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition)