Deep Learning Algorithm for Urban Feature Extraction using SAR Data

Abstract. This paper discusses the extraction of urban features from airborne NISAR (NASA-ISRO SAR) data using a deep learning algorithm for a part of Ahmedabad City. NISAR data are acquired in two wavelength bands (L and S) in hybrid polarization, i.e., RH and RV. This study used level-two data, viz., amplitude data. Pre-processing of the NISAR data in the L and S wavelength bands was carried out using MIDAS, software developed and provided by the Space Applications Centre. Pre-processing comprised speckle suppression using different filters in varying window sizes, followed by radiometric and geometric calibration. The variation of the backscattering coefficient (sigma-nought) across wavelengths and polarizations for different land use features was analysed. The NISAR data, in conjunction with LISS-4 (5.8 m resolution) data, were subjected to different fusion techniques; after qualitative and quantitative analysis, the Gram-Schmidt technique was chosen for further analysis. Segmentation was performed to achieve better analysis of the fused image and the amplitude image. Lastly, a deep learning architecture was developed for automatic classification of the image, and a Convolutional Neural Network model was designed using MobileNet and regularization techniques. The deep learning architecture, in conjunction with the eCognition Developer, was used for extracting urban features.



INTRODUCTION
Synthetic aperture radar (SAR) is an active remote sensing technology that can gather ground information at any time and under any weather conditions. In SAR imaging, the coherent interaction between elementary scatterers on the ground and the electromagnetic waves leads to a multiplicative noise, known as speckle, affecting the SAR images. It is essential to monitor urban change at spatial and temporal scales to track the growth of cities and their impact on natural assets and environmental systems. In this study, urban features are analysed using deep learning, and the developed architecture is used to extract the urban area. Deep learning-based algorithms have exceeded conventional algorithms in performance by a significant margin. Successful management policies require revised and accurate information representing the current urban status, but present reporting methods are inefficient and incapable of keeping this information up to date. Remote sensing is therefore used for better identification of features and extraction of information, and the identification and monitoring of urban areas through airborne and spaceborne missions is the most practical approach. SAR has long been accepted in many remote sensing systems as an effective tool for urban data analysis due to its all-weather capability and its ability to depict the geometrical properties of the target. In this work, a deep learning algorithm is used for the classification of the images together with a library of backscatter signatures (Pottier 2011).

STUDY AREA AND DATA USED
The selected study area was one slice of the Bopal region of Ahmedabad city, a growing city of Gujarat, chosen to characterize its urban form. Under the city corporation's jurisdiction, the extent of the city is 190.84 square kilometres. The river Sabarmati flows through Ahmedabad, and the city is renowned for its textile industries. It is observing a massive rate of urban sprawl in all directions, extending around 20 km from its centre.
NISAR data are acquired in two wavelengths, L and S, in hybrid polarization: the signal is transmitted in circular polarization (R) and received in linear horizontal (H) and vertical (V) polarizations, giving the RH and RV channels.

METHODOLOGY
The methodology comprises data preparation, analysis of the data, and modelling. It was implemented in four phases: literature survey and selection of data and study area; pre-processing of SAR data; fusion techniques and segmentation; and the deep learning algorithm and feature extraction. In the first phase, SAR data and its basics were studied for better implementation; various manuscripts and reports were referred to in order to understand the different approaches applied in the present study. In the second phase, pre-processing of NISAR data in the L and S wavelength bands was carried out using MIDAS, software developed by SAC. Pre-processing comprised speckle suppression using different filters in varying window sizes, followed by radiometric and geometric calibration. After pre-processing, the variation of the backscattering coefficient (sigma-nought) in different wavelengths and polarizations for different features was analysed. For a better understanding and enhancement of the features, the NISAR data were fused with LISS-4 MX data. Specific fusion techniques were performed and analysed, such as IHS (Intensity Hue Saturation), Brovey, Ehlers resolution merge, PCA (Principal Component Analysis), and Gram-Schmidt (GS). The appropriate technique (GS) was selected from the qualitative and quantitative analysis. In the last phase, segmentation was performed for better analysis of the fused image and the amplitude image. A deep learning architecture was developed for automatic classification of the image, and a Convolutional Neural Network model was designed using MobileNet and regularization techniques. The deep learning architecture, in conjunction with the eCognition Developer, was used for extracting urban features. The aim of this research is to extract urban features from NISAR data. Pre-processing of the data and generation of the primary results were performed in Microwave Data Analysis Software (MIDAS), in-house developed SAR data processing software.
This software was developed in the Advanced Microwave and Hyperspectral Techniques Development Group (AMHTDG)/EPSA (Techniques, 2018). To start with, speckle suppression was carried out using different types of filters in different window sizes. Different window sizes gave different results, and one window size was fixed for our data set and analysed further. To remove noise from the data, we used ten different types of speckle suppression filters and window sizes to identify which filter best suited the pre-processing of the data. The appropriate speckle suppression filter was chosen based on the Speckle Suppression Index (SSI): the lower the SSI, the better the filter. From this trial-and-error method, we found that the Enhanced Lee filter with a 3x3 window size is the appropriate speckle suppression filter for this study.
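The filter selection step described above can be illustrated with a short sketch. The Enhanced Lee filter used in the study is more elaborate; as a stand-in, the snippet below implements the basic Lee local-statistics filter together with the SSI metric (ratio of the coefficients of variation of the filtered and original images, where lower means stronger speckle suppression). This is an illustrative assumption-laden sketch, not the MIDAS implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=3, looks=1.0):
    """Basic Lee speckle filter (local-statistics MMSE estimate).
    NOTE: illustrative stand-in for the Enhanced Lee filter used in MIDAS."""
    mean = uniform_filter(img, size)
    sq_mean = uniform_filter(img ** 2, size)
    var = sq_mean - mean ** 2
    # multiplicative speckle noise model with `looks` looks
    noise_var = (mean ** 2) / looks
    weight = var / np.maximum(var + noise_var, 1e-12)
    return mean + weight * (img - mean)

def ssi(original, filtered):
    """Speckle Suppression Index: CV(filtered) / CV(original); lower is better."""
    cv = lambda x: x.std() / x.mean()
    return cv(filtered) / cv(original)
```

Running several filters and window sizes through `ssi` and keeping the lowest value mirrors the trial-and-error selection described above.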

Backscatter Coefficient
The backscatter coefficient is defined via the area of an isotropic target that would reflect (backscatter) the same power as that received by the system, for a given transmit power. The normalized measure of the radar return from a distributed target is called the backscatter coefficient; scattering by distributed targets produces returns in multiple directions (Pottier 2011).

Figure 3. Pre-processed Image
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition)

Figure 4. Mean of Backscatter
As per the graph, the values differ for different types of features for the same polarization (RH) in the two bands, L and S. The backscattering coefficient values observed for the different land uses are: urban areas range from -5 to +5 dB; roads from -12 to -20 dB; water bodies from -18 to -23 dB; barren land from -13 to -19 dB; and agricultural areas from -8 to -11 dB. From this analysis, we observed that L-band RH gives better results than the others and is therefore useful for the identification of urban features. There is an overlap of backscatter coefficient values between the different features, hence they may not be discriminated accurately. As the L- and S-band characteristics are different, the RH values of both bands were compared with each other over the urban features after processing; the graph clearly shows that the result for urban features is better in the L band than in the S band.
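The dB ranges read from the graph, and the overlap problem noted above, can be made concrete with a small sketch. The boundaries below are the approximate values quoted in the text for L-band RH, not calibrated thresholds:

```python
import numpy as np

# Approximate sigma-nought (dB) ranges for L-band RH, as read from the
# backscatter graph in the text; boundaries are illustrative only.
DB_RANGES = {
    "urban":       (-5.0,  5.0),
    "agriculture": (-11.0, -8.0),
    "road":        (-20.0, -12.0),
    "barren":      (-19.0, -13.0),
    "water":       (-23.0, -18.0),
}

def classify_db(sigma0_db):
    """Return every class whose dB range contains the value.
    Because the ranges overlap, a value can match several classes."""
    return [name for name, (lo, hi) in DB_RANGES.items()
            if lo <= sigma0_db <= hi]
```

A value of -15 dB, for instance, matches both road and barren land, which is precisely why thresholding alone cannot discriminate these features and a learned classifier is used instead.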

Analysis on Processed Image
After processing the image, our goal is to extract urban features.
In the analysis, we observed that built-up areas with different orientations have different backscatter values. To analyse this, different Regions of Interest (ROIs) were selected for different building orientations, as shown in the image, and the variation of backscatter with respect to building orientation was studied. The backscatter value also differs between high-rise and low-rise buildings: for the ROIs taken, the reflection from high-rise buildings is stronger than from low-rise buildings. Similarly, the backscatter value differs between highly dense and sparsely dense urban areas; as shown in the figure, sparsely dense areas have lower values than highly dense urban areas.

Image Fusion
The image fusion process merges multispectral imagery of relatively low spatial resolution with co-registered panchromatic images of relatively high resolution. However, in this research, fusion is used not for resolution enhancement but for feature enhancement. The available literature suggests several approaches to perform the fusion; the commonly used processes include Principal Component Analysis (PCA), the Brovey method, IHS, the Ehlers resolution merge, and Gram-Schmidt (GS).

Qualitative Test
From visual interpretation, the fused images are good. To determine which fusion method gives improved results compared to the others, the interpreter analyses the tone, contrast, saturation, sharpness, and texture of the fused images.

A) Brovey Method
This method was developed to visually increase the contrast in the high and low ends of the image histogram. Three bands at a time are combined from the multispectral scene.
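The Brovey combination of three multispectral bands with a panchromatic band is a simple band-ratio operation, sketched below (a minimal illustration, not the software implementation used in the study):

```python
import numpy as np

def brovey_fuse(r, g, b, pan, eps=1e-9):
    """Brovey band-ratio fusion: each multispectral band is scaled by the
    ratio of the panchromatic band to the band sum, which redistributes
    the pan intensity across the three bands."""
    total = r + g + b + eps  # eps avoids division by zero
    return (r * pan / total, g * pan / total, b * pan / total)
```

By construction, the three fused bands sum (up to the eps term) to the panchromatic band, which is what produces the contrast stretch in the high and low ends of the histogram.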

B) Intensity Hue Saturation (IHS) Method
It separates the spatial (intensity) and spectral information of a standard RGB image, preserving more spatial features and the required functional information with no colour distortion. Only three bands are involved (Web 2018).
Figure 11. IHS Fused Image.

C) Ehlers Resolution Merge Method
It is based on an IHS transform coupled with Fourier-domain filtering. To meet these demands, two prerequisites have to be addressed: first, the colour and spatial information have to be separated (Bao et al. 2012).
Figure 14. Ehlers Fused Image.

E) Gram-Schmidt Method
The Gram-Schmidt process transforms a set of vectors into a new set of orthogonal, linearly independent vectors (Web 2018).
Figure 16. Gram-Schmidt Fused Image.
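The orthogonalisation underlying GS pan-sharpening is the classical Gram-Schmidt process, which can be sketched directly (this shows the vector operation itself, not the full pan-sharpening pipeline):

```python
import numpy as np

def gram_schmidt(vectors):
    """Classical Gram-Schmidt: orthogonalise a list of linearly independent
    vectors by subtracting each vector's projections onto the previously
    produced basis vectors."""
    basis = []
    for v in vectors:
        w = v.astype(float)
        for u in basis:
            w = w - (w @ u) / (u @ u) * u  # remove the component along u
        basis.append(w)
    return basis
```

In GS pan-sharpening, the multispectral bands are orthogonalised against a simulated low-resolution panchromatic band, the real panchromatic band is swapped in, and the transform is inverted.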

Spectral Evaluation
Bias of mean: the value is given relative to the mean value of the original image; the ideal value is zero (Web 2018).
Spatial Evaluation
Entropy: if the entropy of the fused image is higher than that of the parent image, the fused image contains more information (Bao et al. 2012).
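Both evaluation measures defined above are simple statistics and can be sketched as follows (a minimal version; formula details such as the histogram bin count are assumptions):

```python
import numpy as np

def bias_of_mean(original, fused):
    """Bias of mean relative to the original image; the ideal value is 0."""
    return (original.mean() - fused.mean()) / original.mean()

def entropy(img, bins=256):
    """Shannon entropy (bits) of the image histogram; a higher value for
    the fused image than for the parent image suggests it carries more
    information."""
    hist, _ = np.histogram(img, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                      # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())
```

Comparing these two numbers across the five fusion products is exactly the quantitative test reported in Table 3.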

Table 3. Quantitative Statistical Test
As per the analysis, the Gram-Schmidt fusion gives mean and standard deviation values closest to those of the original image, the highest entropy value, and likewise the lowest bias value. For this research, therefore, the Gram-Schmidt method was taken into consideration.

Segmentation
This section provides a detailed specification of different segmentation software. Image segmentation was applied using different segmentation tools, including ENVI. The image segmentation parameters were defined by applying a trial-and-error approach, and the best segmentation result was selected by visual assessment. A qualitative quality assessment was made based on visual observation, and a descriptive explanation of the algorithms used by the different segmentation tools was also included. The final result was achieved by applying multi-resolution segmentation.

Image Segmentation: Multi-resolution Segmentation
Multi-resolution segmentation was used to test the segmentation result. The scale parameter was defined as 50 for image segmentation, and the shape and colour properties were assigned as 0.1 and 0.5, respectively. The overall segmentation quality is quite good, despite some over-segmentation and very little under-segmentation. The major reason for the mismatch between the urban feature edge polygons and the segmented result is that the reference polygons were drawn in a much-generalized way that represents only the outline of the features.

Diagram of the Convolutional Neural Network
Data Creation: After fusion with the LISS-4 image, we have two different types of input for classification: the amplitude RH image and the fused image. In the first part, we have to create data for the CNN. Using the deep learning algorithm tool, we sliced the images into different window sizes; for this study, 256 x 256 and 350 x 350 windows were taken into consideration. The tool takes input files in JPEG, PNG, and TIFF formats. After the whole image is sliced into windows of a particular size, all the image chips are labelled. The first part of the pipeline is then ready to feed the images into the model.
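The chip-creation step described above amounts to tiling the image into fixed-size windows. A minimal sketch (the actual tool and its file handling are not shown; labelling is done separately):

```python
import numpy as np

def slice_chips(image, window=256, stride=None):
    """Cut an image array into window x window chips. With the default
    stride the chips are non-overlapping; incomplete chips at the
    right/bottom edges are dropped."""
    stride = stride or window
    h, w = image.shape[:2]
    chips = []
    for y in range(0, h - window + 1, stride):
        for x in range(0, w - window + 1, stride):
            chips.append(image[y:y + window, x:x + window])
    return chips
```

Larger windows (e.g. 350 x 350) yield fewer chips from the same scene, which is why the 350-window models in the results below train on fewer samples.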

RESULTS AND DISCUSSION
For target identification, classes such as built-up areas within the city, linear features like roads and bridges, vegetated land, open fields, trees, and water bodies were taken for analysis and study.

Result of CNN for Fused Image (256 x 256)
The CNN was applied to the fused image for classification because it can identify all the features in the image. A 256 x 256 window size with approximately 1800 image chips was used in this model (Table 5); the larger the dataset, the more accurate the results. Regularization was used in the model, and different accuracies were obtained across runs.
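The model is described as being designed using MobileNet, whose defining building block is the depthwise-separable convolution: a per-channel spatial convolution followed by a 1x1 "pointwise" convolution that mixes channels, at a fraction of the cost of a full convolution. The plain-NumPy sketch below illustrates that block only; it is not the authors' network, and the padding/stride choices are assumptions for brevity.

```python
import numpy as np

def depthwise_separable_conv(x, dw_kernels, pw_weights):
    """One MobileNet-style block in plain NumPy ('valid' padding, stride 1):
    a per-channel depthwise convolution followed by a 1x1 pointwise
    convolution that mixes channels.
    x: (H, W, C), dw_kernels: (k, k, C), pw_weights: (C, C_out)."""
    h, w, c = x.shape
    k = dw_kernels.shape[0]
    oh, ow = h - k + 1, w - k + 1
    dw = np.zeros((oh, ow, c))
    for ch in range(c):                  # depthwise: one kernel per channel
        for i in range(oh):
            for j in range(ow):
                dw[i, j, ch] = np.sum(x[i:i+k, j:j+k, ch] * dw_kernels[:, :, ch])
    return dw @ pw_weights               # pointwise: 1x1 conv = matmul over channels
```

Stacking such blocks, with regularization (e.g. dropout or weight decay) on top, gives a lightweight classifier suitable for the roughly 1800 labelled chips available here.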

Result of CNN for Fused Image (350 x 350)
As the window size increases, the accuracy decreases, but the feature classification is also better in this model. Here, as shown in Fig. 22, the result obtained for agriculture and plantation classification is better, and the model accuracy can improve with the use of different numbers of epochs.

Result of CNN for Amplitude Image (256 x 256)
In this part, we use only the urban feature for classification of the image, because the amplitude image gives better classification for urban features while other features are mixed (Table 7).

Result of CNN for Amplitude Image (350 x 350)
As the window size increases, the number of training samples decreases, hence the accuracy is lower and the loss is higher compared to the 256 window size. For this model, we achieved an accuracy of 0.66 and a loss of 0.048 (as shown in Fig. 26); the loss is small, but the accuracy suffers because some features, such as the highly dense class, are overfitted in this model. Other features, like the low-density areas, are identified clearly (Fig. 27).

Extracting Urban features from the Image
Using the eCognition Developer, a class is assigned to each feature. In the first part, after the final image is obtained, the image is segmented to identify the different features, and a class is assigned to each feature for classification. The features are then selected using their different backscatter values, and the polygons belonging to the same feature are merged. Small polygons that do not lie within the urban features are removed from the image using a geometry criterion based on pixel area. After this removal, the neighbouring urban polygons are merged, and the features are ready to be extracted from the image in different formats. As shown in Fig. 49, the urban features are extracted from the amplitude image. Comparing this result with the fused image (as shown in Fig. 28), all the features can be identified in the fused image, whereas in the amplitude image only the urban features can be identified; in addition, the highly dense, low-density, and high-rise buildings can be identified and extracted from the image, as shown in the figure. Figure 28. Fused and Amplitude Image feature extraction.
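The eCognition rule set itself is built interactively, but its small-polygon removal step has a direct raster-side analogue: label the connected components of the urban mask and drop those below a pixel-area threshold. A hedged sketch of that one step (the threshold and the binary-mask representation are assumptions):

```python
import numpy as np
from scipy import ndimage

def remove_small_regions(mask, min_pixels=10):
    """Drop connected components smaller than `min_pixels` from a binary
    urban mask, mimicking the pixel-area small-polygon removal step."""
    labels, n = ndimage.label(mask)
    sizes = ndimage.sum(mask, labels, index=range(1, n + 1))
    keep = {i + 1 for i, s in enumerate(sizes) if s >= min_pixels}
    return np.isin(labels, list(keep))
```

The surviving regions correspond to the urban polygons that are then merged with their neighbours and exported.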

CONCLUSION AND FUTURE SCOPE
The present study addressed the potential of airborne NISAR data to discriminate land cover classes with an emphasis on the urban area. Pre-processing of the NISAR data was carried out, and it was observed that the Enhanced Lee filter with a 3x3 window size is suitable for the present study. Different fusion techniques were used, as SAR and optical data give complementary information about the target. Based on qualitative and quantitative analysis, fused data of NISAR and LISS-4 MX using the Gram-Schmidt technique were found suitable for the study and analysed further. For automatic classification of the image, a deep learning architecture was developed, and a Convolutional Neural Network model was built using MobileNet. For model training and testing, labelled data were created and different hyper-parameters were used. Classification accuracies of 80.67% and 70.81% were achieved for the fused and amplitude images, respectively. We also observed that with an increase in the number of epochs, one of the hyper-parameters, the accuracy increases and the loss decreases. The deep learning architecture, in conjunction with the eCognition Developer, was used for extracting urban features by applying a different rule set for each feature and assigning classes. Segmentation of the fused image was able to discriminate different land cover classes, viz., urban, vegetation, open land, water, and roads, while segmentation of the RH amplitude data was able to discriminate different built-up forms, viz., high-rise, highly dense, and sparsely dense built-up areas.
This research has vast scope for the future. In the present study, the identification of urban polygons was carried out using a CNN. With the availability of high-resolution data, however, the identification of slums, industrial areas, residential areas, etc., within the urban area can also be done accurately. Urban growth monitoring, change detection, and prediction can be addressed more accurately, as today's needs demand, using deep learning. By using a larger number of datasets for automation of image classification, an accuracy of more than 90% can also be achieved.