IMPROVING LULC CLASSIFICATION FROM SATELLITE IMAGERY USING DEEP LEARNING - EUROSAT DATASET

Machine learning (ML) has proven useful for a very large number of applications in several domains, and its use in remote-sensing image analysis has grown remarkably over the past few years. In this work, Deep Learning (DL), a subset of machine learning, was applied to achieve a better classification of Land Use Land Cover (LULC) in satellite imagery using Convolutional Neural Networks (CNNs). The EuroSAT benchmark dataset, which is built from Sentinel-2 satellite images, is used as the training dataset. Sentinel-2 provides images with 13 spectral bands, but surprisingly little attention has been paid to these bands in deep learning models. The majority of applications have used only RGB because of the high availability of RGB models in computer vision. While RGB gives an accuracy of 96.83% using a CNN, we present two approaches to improve the classification performance on Sentinel-2 images. In the first approach, features are extracted from the 13 spectral bands of Sentinel-2 instead of RGB, which leads to an accuracy of 98.78%. In the second approach, features are extracted from the 13 spectral bands of Sentinel-2 in addition to calculated indices used in LULC, such as Blue Ratio (BR), Vegetation Index based on Red Edge (VIRE) and Normalized Near Infrared (NNIR), which gives a better accuracy of 99.58%.


INTRODUCTION
Remote sensing plays a significant role in the modern world. It supports society in several fields, including but not limited to agriculture, environmental monitoring, geology, hydrology, and LULC. Remote sensing facilitates a country's development by making it possible to classify and monitor land use, as well as to detect problems in dangerous or inaccessible areas.
Satellite technology can generate regular updates on urban areas, desertification, agricultural land monitoring, crop area estimation, soil mapping and monitoring, water resources monitoring, and the identification of the characteristics of soil, water, and crops.
This study uses Sentinel-2 images for analysis because these satellite data are free to use, easy to obtain, have a sufficiently short revisit time, and are capable of supporting LULC analysis.
In this paper, we aim to build a model able to classify and indicate the land use and cover situation using satellite imagery. This work looks into improving a LULC classification method using remote sensing technology. In addition, it explores new proposed methods for supervised LULC classifications using deep learning methods, specifically CNN.
This research proposes two approaches to improve performance.
The first approach investigates the use of all 13 spectral bands, which have shown success in other image classification models, while the second approach involves the use of LULC calculated indices. These two approaches can be formulated as the following research sub-questions:
1. Can using the 13 bands of Sentinel-2 spectral features improve the LULC classification performance?
2. Can adding the LULC calculated indices to the 13 bands of Sentinel-2 spectral features improve the LULC classification performance?
The overall objective is to evaluate the new model, which uses the 13 bands of Sentinel-2 in addition to LULC calculated indices.
(Chong, 2020) implemented increasingly complex deep learning models to identify LULC classifications on the EuroSAT dataset [16]. The best overall model uses all 13 bands and a 50% training set, with 4 convolution-max-pooling layer pairs before a dropout layer and a dense layer. The training data was augmented through random shearing, rotating, and flipping. This model accurately predicted the classification for 94.9% of the testing set.
His best RGB model, based on VGG16 with image augmentation of the training dataset through random shearing, rotating, and flipping, accurately classified 94.5% of the testing set images.
Knowledge distillation (KD) is one kind of teacher-student training (TST) method, first defined in (Hinton et al., 2015) [1], in which knowledge is distilled from an ensemble of models into a single smaller model via high-temperature softmax training. (Chen et al., 2018) introduced KD into remote sensing scene classification for the first time to improve the performance of small and shallow network models [15]. They performed experiments on several public datasets, including EuroSAT, and made a quantitative analysis to verify the effectiveness of KD. Their proposed model achieved a total accuracy of 94.74%.
(Sonune, 2020) fine-tuned three models using RGB bands [17]. The first model, VGG_19, achieved a classification accuracy of 97.66%; the second, ResNet_50, achieved 94.25%; and the RandomForest model achieved 61.46%.
(Helber et al., 2019), the creators of the EuroSAT dataset [12], experimented with two approaches. First, they compared GoogLeNet and ResNet-50 models using RGB bands and achieved accuracies of 98.18% and 98.57%, respectively. In the second approach, they evaluated different band combinations (RGB, CI, and SWIR) and achieved accuracies of 98.57%, 98.30%, and 97.05%, respectively.
(Li et al., 2020) proposed the DDRL-AM method for remote sensing scene classification [18]. They addressed the problem of class ambiguity by learning more discriminative features. The approach involves two main tasks: (1) building a two-stream architecture to fuse the attention map semantic feature with the original image semantic feature; (2) training DDRL-AM coupled with a center loss to obtain discriminative features for remote sensing images. Extensive experiments conducted on the EuroSAT dataset obtained a classification accuracy of 98.74%.

Data
The EuroSAT dataset [12], on which this research depends, is based on Sentinel-2 satellite images covering 13 spectral bands and consists of 10 classes with a total of 27,000 labelled and geo-referenced images.
Table 1.

Sentinel
The three bands B01, B09 and B10 are intended to be used for the correction of atmospheric effects (e.g. aerosols, cirrus or water vapor). The remaining bands are primarily intended to identify and monitor land use and land cover classes. In addition to the mainland, large islands as well as inland and coastal waters are covered by the two Sentinel-2 satellites.
Each satellite will deliver imagery for at least 7 years with a spatial resolution of up to 10 meters per pixel. Both satellites carry fuel for up to 12 years of operation, which allows for an extension of the operation. The two-satellite constellation covers almost the entire Earth's land surface about every five days, i.e. the satellites capture each point in the covered area about every five days. This short repeat cycle, together with the future availability of the Sentinel satellites, allows continuous monitoring of the Earth's land surface for the next 20-30 years. Most importantly, the data is openly and freely accessible and can be used for any application (commercial or non-commercial use).
Looking at the preview of the different classes, we can see some similarities and stark differences between the classes (Fig. 1).
Urban environments such as Highway, Residential and Industrial images all contain structures and some roadways. AnnualCrop and PermanentCrop both feature agricultural land cover, with straight lines delineating different crop fields. Finally, HerbaceousVegetation, Pasture, and Forest feature natural land cover. River could also be categorized as natural land cover, but may be easier to distinguish from the other natural classes.
If we consider the content of each image, we can estimate which classes might be confused with each other. For example, an image of a river might be mistaken for a highway, or an image of a highway junction with surrounding buildings could be mistaken for an Industrial site. We therefore have to train a classifier powerful enough to differentiate these nuances.

Convolutional Neural Networks (CNNs)
Convolutional Neural Networks (CNNs) are a type of neural network [10] which, following their impressive results in image classification challenges [5], [7], [19], became the state-of-the-art image classification method in computer vision and machine learning.
To classify remotely sensed images, various feature extraction and classification methods (e.g., Random Forests) were evaluated on the introduced datasets. (Yang et al., 2010) evaluated Bag-of-Visual-Words (BoVW) and spatial extension approaches on the UCM dataset [8]. (Basu et al., 2015) analyzed deep belief networks, basic CNNs and stacked denoising autoencoders on the SAT-6 dataset [14]. (Basu et al., 2015) also presented their own framework for the land cover classes introduced in the SAT-6 dataset.
The framework extracts features from the input images, normalizes the extracted features, and uses the normalized features as input to a deep belief network. Besides low-level color descriptors, (Penatti et al., 2015) also evaluated deep CNNs on the UCM and BCS datasets [4]. In addition to deep CNNs, (Castelluccio et al., 2015) intensively evaluated various machine learning methods (e.g., Bag-of-Visual-Words, spatial pyramid match kernels) for the classification of the UCM and BCS datasets [2].
In the context of deep learning, the deep CNNs used have been either trained from scratch or fine-tuned from a pretrained network [2], [3]. The networks were mainly pretrained on the ILSVRC-2012 image classification challenge dataset [11]. Even though these pretrained networks were trained on images from a totally different domain, the features generalized well. Therefore, the pretrained networks proved to be suitable for the classification of remotely sensed images [9]. The previous works extensively evaluated the proposed machine learning methods and concluded that deep CNNs outperform non-deep learning approaches on the considered datasets [2], [6], [11], [9].

Satellite Remote Sensing
Remote sensing is the process of identifying the physical characteristics of an object or an area by measuring its reflected and emitted radiation at a distance.
Each material has a unique spectral signature which becomes the basic criterion for material identification. The graph below in Fig. 2 depicts the typical reflectance characteristics of water, vegetation, and soil. The typical curves of water, vegetation and soils are close in the visible region and quite different in the infrared spectrum.

Figure 2. Material reflectance at different wavelengths
The basic information for developing classification models is the discriminative reflectance characteristics of materials at different wavelengths [13]. The spectral reflectance characteristics of Sentinel-2 image data for different crops are presented in Fig. 3. In the visible region, the spectral reflectance values of different crops are close, but in the infrared region, crops are discriminable.

Dataset Preparation
As mentioned before, the EuroSAT dataset is split into 10 classes of land cover. The classes vary in size (Fig. 4), so we split the data per class into training, validation, and testing sets of 70%, 20%, and 10%, respectively.
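A per-class 70/20/10 split can be sketched as follows. This is only a minimal illustration under stated assumptions, not the code used in this work: the function name `split_per_class`, the file names, the fixed random seed, and the choice to give any rounding remainder to the test set are all hypothetical.

```python
import random
from collections import defaultdict


def split_per_class(samples, percents=(70, 20, 10), seed=42):
    """Split (path, label) pairs into train/val/test lists, class by class."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append(path)

    rng = random.Random(seed)  # assumed fixed seed, for reproducibility
    train, val, test = [], [], []
    for label in sorted(by_class):
        paths = sorted(by_class[label])
        rng.shuffle(paths)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_train = len(paths) * percents[0] // 100
        n_val = len(paths) * percents[1] // 100
        train += [(p, label) for p in paths[:n_train]]
        val += [(p, label) for p in paths[n_train:n_train + n_val]]
        # Whatever remains (about 10%) goes to the test set.
        test += [(p, label) for p in paths[n_train + n_val:]]
    return train, val, test


# Hypothetical file names: two classes of different sizes.
samples = [(f"Forest_{i}.tif", "Forest") for i in range(30)]
samples += [(f"Pasture_{i}.tif", "Pasture") for i in range(20)]
train, val, test = split_per_class(samples)
print(len(train), len(val), len(test))
```

Splitting inside each class keeps the 70/20/10 proportions for small and large classes alike, which a single split over the whole dataset would not guarantee.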

Figure 4. EuroSAT Class Distribution
In the ImageDataGenerator, the batch size is 64. For the training dataset, we applied rotation and horizontal and vertical flips to the images to generate more data. Moreover, we calculated the mean and standard deviation (std) of the bands and indices, as shown in Table 2. These values were used to normalize the inputs, as recommended for deep learning models, by subtracting the per-channel mean and then dividing by the per-channel std. The RGB model reads bands 2, 3, and 4; the All 13 bands model reads all 13 bands; and the new proposed model reads all 13 bands, calculates the following indices (MSI, NDSI, NDWI, NDVI, GNDVI, BNDVI, NDRE, BareSoil, NDCI, NBR, DSWI, CVI), and appends them to the data.
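The per-channel normalization described above can be illustrated with a small sketch. It assumes the population standard deviation and uses made-up two-channel pixel values; the helper names `channel_stats` and `normalize` are hypothetical, and the actual statistics are the ones reported in Table 2.

```python
from statistics import fmean, pstdev


def channel_stats(pixels):
    """Return per-channel means and stds for a list of per-pixel tuples."""
    channels = list(zip(*pixels))  # regroup values channel by channel
    means = [fmean(values) for values in channels]
    stds = [pstdev(values) for values in channels]  # population std (assumption)
    return means, stds


def normalize(pixel, means, stds):
    """Subtract the channel mean, then divide by the channel std."""
    return [(v - m) / s for v, m, s in zip(pixel, means, stds)]


# Made-up pixels with two channels each, standing in for training data.
training_pixels = [(1.0, 10.0), (3.0, 30.0)]
means, stds = channel_stats(training_pixels)
print(means, stds)
print(normalize((3.0, 30.0), means, stds))
```

In such a setup the statistics would normally be computed on the training set only and then reused unchanged for the validation and testing sets.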

Calculated Indices
Today many different remote sensing indices exist. They have been successfully used for the identification of drifting sand areas, vegetation cover, loss of wetlands, and urban land use mapping. The calculated indices used in the new model are mainly related to LULC classification; we present an overview of these indices as follows:

MSI: Moisture index
The index is inverted relative to the other water vegetation indices; higher values indicate greater water stress and less water content. The values of this index range from 0 to more than 3. The common range for green vegetation is 0.4 to 2.

NDSI: Normalized difference snow index
The normalized difference snow index is a ratio of two bands: one in the visible range (Band 3) and one in the SWIR (Band 11). Values above 0.42 are usually snow.
NDSI = (B03 - B11) / (B03 + B11)

NDWI: Normalized difference water index
NDWI was proposed by (McFeeters 1996). It is used to monitor changes related to water content in water bodies. As water bodies strongly absorb light in the visible to infrared electromagnetic spectrum, NDWI uses the green and near infrared bands to highlight water bodies. It is sensitive to built-up land and can result in over-estimation of water bodies.
NDWI = (B03 - B08) / (B03 + B08)
Index values greater than 0.5 usually correspond to water bodies. Vegetation usually corresponds to much smaller values and built-up areas to values between zero and 0.2.

GNDVI: Green normalized difference vegetation index
GNDVI is similar to NDVI, but it uses the visible green band instead of the visible red band, together with the near infrared band. It is useful for measuring rates of photosynthesis and monitoring plant stress.
GNDVI = (B08 - B03) / (B08 + B03)

BNDVI: Blue Normalized Difference Vegetation Index
BNDVI is an index similar to NDVI that uses the visible blue band instead of the red band, so it can be computed when no red channel is available. This index is useful for areas sensitive to chlorophyll content.
BNDVI = (B08 - B02) / (B08 + B02)

NDRE: Normalized Difference Red Edge
This index is formulated with the NIR and red edge bands, and it is useful for areas sensitive to chlorophyll content in leaves against soil background effects.
NDRE = (B08 - B05) / (B08 + B05)

NDCI: Normalized difference chlorophyll index
NDCI is an index that aims to predict the plant chlorophyll content, which plays a critical role in plant growth and helps in predicting the plant type in addition to its health condition. It is calculated using the red spectral band B04 and the red edge spectral band B05.
NDCI = (B05 - B04) / (B05 + B04)

NBR: Normalized burn ratio
To detect burned areas, the NBR index is the most appropriate choice. Using bands 8 and 12, it highlights burnt areas in large fire zones greater than 500 acres; darker pixels indicate burned areas.
NBR = (B08 - B12) / (B08 + B12)
To observe burn severity, you may subtract the post-fire NBR image from the pre-fire NBR image.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition)

DSWI: Disease-Water Stress Index
This index is used to detect water stress and disease in vegetation and to assess its health condition.
DSWI = B03 / B04

CVI: Chlorophyll vegetation index
CVI is an index that aims to estimate the chlorophyll content of vegetation. Chlorophyll plays a critical role in plant growth and helps in predicting the plant type in addition to its health condition.
CVI = (B08 * B04) / (B03)^2
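The index formulas given in this section can be collected into a short per-pixel sketch. It covers only the indices whose formulas are stated above (MSI, NDVI and BareSoil are left out because their formulas are not given here). The function names, the zero-denominator convention, and the reflectance values are assumptions made for illustration.

```python
def normalized_difference(a, b):
    """Return (a - b) / (a + b), or 0.0 when both inputs are zero."""
    total = a + b
    return (a - b) / total if total != 0 else 0.0


def lulc_indices(bands):
    """Compute the indices above from a dict of Sentinel-2 band values."""
    b02, b03, b04 = bands["B02"], bands["B03"], bands["B04"]
    b05, b08 = bands["B05"], bands["B08"]
    b11, b12 = bands["B11"], bands["B12"]
    return {
        "NDSI": normalized_difference(b03, b11),
        "NDWI": normalized_difference(b03, b08),
        "GNDVI": normalized_difference(b08, b03),
        "BNDVI": normalized_difference(b08, b02),
        "NDRE": normalized_difference(b08, b05),
        "NDCI": normalized_difference(b05, b04),
        "NBR": normalized_difference(b08, b12),
        # The two ratios below assume non-zero B04 and B03 values.
        "DSWI": b03 / b04,
        "CVI": (b08 * b04) / b03 ** 2,
    }


# Made-up reflectance values for a single vegetated pixel.
pixel = {"B02": 0.04, "B03": 0.08, "B04": 0.04, "B05": 0.12,
         "B08": 0.40, "B11": 0.20, "B12": 0.10}
for name, value in lulc_indices(pixel).items():
    print(f"{name}: {value:.3f}")
```

Because NDWI and GNDVI use the same two bands in opposite order, they are always exact negatives of each other, so appending both adds no information beyond a sign change.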

Model Architecture
In this research we use the DenseNet201 architecture, excluding the fully-connected layer at the top of the network and without loading its pretrained weights (random initialization). The network input has three dimensions: width, height, and number of channels (W, H, C). Moreover, we added a global spatial average pooling layer using GlobalAveragePooling2D, and the Softmax activation function is applied to the very last layer of the model, since it is suited to the output layer of a classification network.
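A network of this shape could be assembled in Keras roughly as follows. This is a hedged sketch rather than the exact model of this work: the function name `build_classifier`, the 64 x 64 input size, and the final Dense layer with one unit per class are assumptions, since the text only states that Softmax is applied to the last layer.

```python
def build_classifier(num_channels, height=64, width=64, num_classes=10):
    """DenseNet201 backbone without top or pretrained weights, plus a softmax head."""
    # Imported here so that this sketch can be loaded without TensorFlow installed.
    from tensorflow import keras

    base = keras.applications.DenseNet201(
        include_top=False,   # drop the fully-connected layer at the top
        weights=None,        # random initialization, no pretrained weights
        input_shape=(height, width, num_channels),
    )
    pooled = keras.layers.GlobalAveragePooling2D()(base.output)
    # Assumed head: one softmax unit per land cover class.
    outputs = keras.layers.Dense(num_classes, activation="softmax")(pooled)
    return keras.Model(inputs=base.input, outputs=outputs)
```

Under these assumptions, `build_classifier(3)` would correspond to the RGB model, `build_classifier(13)` to the All 13 bands model, and `build_classifier(25)` to the model with 13 bands plus 12 indices. Random initialization is what allows an input with more than three channels, because ImageNet weights are tied to 3-channel input.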

Figure 5. Updated DenseNet201 Model Architecture
This model was compiled using the Adam optimizer, an extension of stochastic gradient descent that has recently seen broader adoption for deep learning applications in computer vision and natural language processing. The compile parameters set 'categorical_crossentropy' as the loss function and 'categorical_accuracy' as the model metric. After compiling the model, we ran the training on the training dataset using the fit function with two callback functions: 'Checkpointer', which monitors the accuracies and saves the best weights of the model to Google Cloud Storage buckets, and 'EarlyStopping', which stops the training when a monitored metric has stopped improving and was configured with a patience of 50 epochs. The maximum number of epochs specified to train the model is 10000, i.e. the maximum number of iterations over the entire data provided. Moreover, each epoch trains on 1000 batches of 64 samples, i.e. 64000 images per epoch.
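The interplay between the epoch limit, the patience value, and the best-weights checkpoint can be illustrated without any deep learning library. The sketch below is a simplified stand-in for the two callbacks, not their actual implementation: it assumes the monitored metric is one where higher is better (for example a validation accuracy), and the function name and score values are hypothetical.

```python
BATCH_SIZE = 64
STEPS_PER_EPOCH = 1000
SAMPLES_PER_EPOCH = BATCH_SIZE * STEPS_PER_EPOCH  # 64 * 1000 = 64000 images


def run_with_early_stopping(scores, patience=50, max_epochs=10000):
    """Return (best_epoch, best_score, epochs_run) for a sequence of scores."""
    best_score = float("-inf")
    best_epoch = 0
    epochs_run = 0
    for epoch, score in enumerate(scores, start=1):
        if epoch > max_epochs:
            break
        epochs_run = epoch
        if score > best_score:
            # A checkpoint callback would save the model weights here.
            best_score, best_epoch = score, epoch
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` consecutive epochs: stop training.
            break
    return best_epoch, best_score, epochs_run


# Made-up validation scores; a patience of 3 keeps the example short.
scores = [0.80, 0.90, 0.85, 0.88, 0.89, 0.95]
print(run_with_early_stopping(scores, patience=3))
print(SAMPLES_PER_EPOCH)
```

In this toy run the training stops before the last score is ever reached, which shows the trade-off behind a patience as large as 50 epochs: it tolerates long plateaus at the cost of extra training time.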

EXPERIMENTAL RESULTS
This section presents the results of the various models including Confusion Matrix, Classification Report, and mislabeled images.

Models Benchmarking
We are presenting and comparing the results of the new models supported by analyzing the Confusion Matrix, Classification Report, and mislabeled images of each model.

RGB Model
After 60 epochs the training process for the RGB model finished with an accuracy of 96.83%, with unstable learning as seen in the accuracy graph (Fig. 6) and the loss graph (Fig. 7). Although this model is straightforward, it achieves a good accuracy. Below we present the Confusion Matrix (Fig. 8) and Classification Report (Fig. 9) for this model. Both figures show that this model classifies the main classes almost perfectly, with minimal wrong classifications, but is limited in differentiating between similar classes (Table 3).
We can find from the Classification Report (Fig. 12) and Confusion Matrix (Fig. 13)

All Bands with Calculated Indices Model
This model input consists of 25 channels, of which 13 are the Sentinel-2 bands and 12 are calculated indices, as follows: MSI, NDSI, NDWI, NDVI, GNDVI, BNDVI, NDRE, BareSoil, NDCI, NBR, DSWI, and CVI. After 90 epochs the training process finished with an accuracy of 99.58%, with better learning as seen in the accuracy graph (Fig. 14) and the loss graph (Fig. 15).
From the Classification Report (Fig. 17) for this model, we can find that adding the indices that target LULC classification to the proposed model helps to better classify the similar classes, as follows (Table 3):
This significant enhancement in the LULC classification performance for Sentinel-2 images in the new model can play a major role in our society, especially in the environmental and agricultural sectors. These accurate classifications can help us manage, monitor and predict agricultural areas periodically and over a wide range without any effort spent going onsite, as well as detect problems in dangerous or inaccessible areas.
Due to time and computational limitations, this study covers only the EuroSAT dataset. Future research may focus on the BigEarthNet dataset using TPUs and on benchmarking different sets of calculated indices.