DEEP LEARNING-BASED METHOD TO EXTEND THE TIME SERIES OF GLOBAL ANNUAL VIIRS-LIKE NIGHTTIME LIGHT DATA

Nighttime light (NTL) remote sensing imagery has been applied to monitoring human activities from many perspectives. The two most widely used NTL sensors, the Defense Meteorological Satellite Program (DMSP) Operational Linescan System and the Suomi National Polar-orbiting Partnership (NPP)-Visible Infrared Imaging Radiometer Suite (VIIRS), have different spatial and radiometric resolutions. Thus, some long time series analyses cannot be conducted without effective and accurate cross-calibration of these two datasets. In this study, we propose a deep learning-based model to simulate VIIRS-like DMSP NTL data by integrating the enhanced vegetation index (EVI) data product from MODIS. Evaluation of the spatial patterns of the results shows that the modified Self-Supervised Sparse-to-Dense network delivers satisfying spatial resolution downscaling. Quantitative comparison of the simulated VIIRS-like DMSP NTL with the original VIIRS NTL shows good consistency at the pixel level for four selected sub-datasets, with R² ranging from 0.64 to 0.76 and RMSE ranging from 3.96 to 9.55. Our method shows that a deep learning model can learn cross-sensor calibration and NTL simulation from relatively raw data rather than from finely processed data that rely on expert knowledge.


INTRODUCTION
Since 1992, when the first spaceborne nighttime light sensor, the Defense Meteorological Satellite Program (DMSP) Operational Linescan System, was launched, nighttime light (NTL) remote sensors have continued to develop, and more NTL satellites have been launched, such as the Suomi National Polar-orbiting Partnership (NPP)-Visible Infrared Imaging Radiometer Suite (VIIRS) (Elvidge et al., 2021), LuoJia1-01, and EROS-B (Levin et al., 2014). NTL remote sensing imagery has been widely used to monitor human activities from many perspectives. Researchers have used NTL data for monitoring urbanization (Wang et al., 2021), estimating and mapping gross domestic product (Elvidge et al., 1997), mapping greenhouse gas emissions (Oda and Maksyutov, 2011), monitoring and predicting urban crime (Yang et al., 2020), managing and monitoring disasters (Molthan and Jedlovec, 2013), and monitoring regional armed conflicts (Román and Stokes, 2015). It is noteworthy that most of these studies used the DMSP and VIIRS NTL data.
DMSP NTL measures lights from cities, towns, and other lit areas at night in digital numbers ranging from 0 to 63 (6-bit depth). The annual DMSP product is provided in 30 arc-second grids, which is approximately 1 km spatial resolution at the equator. Because of its lower radiometric resolution, and because it records digital numbers instead of radiance values, DMSP NTL data suffer from saturation and blooming problems in urban areas (Levin et al., 2020). The NPP-VIIRS day/night band (DNB) started collecting data in April 2012. Like DMSP NTL, VIIRS NTL spans the globe from -180 to 180 degrees longitude and -65 to 75 degrees latitude (Elvidge et al., 2017). The products are produced in 15 arc-second geographic grids (approximately 500 m spatial resolution at the equator) (Elvidge et al., 2017). VIIRS NTL data are expressed in nW/cm²/sr and stored at 14-bit depth (Elvidge et al., 2017).
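The grid spacings and bit depths quoted above can be checked with a short calculation. The sketch below is only illustrative: the function names are hypothetical, and the WGS84 equatorial radius is a standard constant that is not taken from the paper.

```python
import math

# WGS84 equatorial radius in metres (standard constant, an assumption of this sketch).
EQUATORIAL_RADIUS_M = 6378137.0


def arcsec_to_metres_at_equator(arcsec):
    """Ground length of an angular grid spacing measured along the equator."""
    metres_per_degree = 2.0 * math.pi * EQUATORIAL_RADIUS_M / 360.0
    return arcsec / 3600.0 * metres_per_degree


def grey_levels(bit_depth):
    """Number of distinct values that can be stored at a given bit depth."""
    return 2 ** bit_depth


dmsp_pixel_m = arcsec_to_metres_at_equator(30)   # roughly 1 km
viirs_pixel_m = arcsec_to_metres_at_equator(15)  # roughly 500 m
dmsp_levels = grey_levels(6)     # digital numbers 0 to 63
viirs_levels = grey_levels(14)
```

Under these assumptions a 30 arc-second cell is a little under 1 km wide at the equator and a 15 arc-second cell is a little under 500 m, which matches the approximate resolutions stated in the text.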
Because the DMSP and VIIRS datasets have different spatial and radiometric resolutions, some applications of NTL data cannot conduct long-term time series analysis. For instance, Zhou et al. presented a global urban dynamics monitoring study using DMSP NTL, which was limited to 1992-2013. Chen and Nordhaus proposed a method for building a time series relationship between VIIRS NTL and GDP, which was limited to 2014-2016 (Chen and Nordhaus, 2019). Therefore, some studies have focused on extending NTL data by using both DMSP and VIIRS NTL data to obtain a longer NTL time series. Zhu et al. (2017) applied a power function to NPP-VIIRS NTL intensity within each local region in China to generate simulated DMSP NTL intensity and construct a time series dataset from 1992 to 2015. A similar study applied a power function to DMSP annual data and VIIRS monthly data to intercalibrate the two datasets and analyse Syria's regional armed conflict. However, these methods rely heavily on the selected training samples and are limited to a regional scale. Zhao et al. (2019) applied a sigmoid function model to construct DMSP-like NTL data from 1992 to 2018 in Southeast Asia. Another study proposed a global DMSP-like dataset that uses a stepwise calibration method to build a harmonized long time series of NTL data across the world. Most of these methods either focus on a regional scale or simulate a DMSP-like dataset, which wastes the higher spatial and radiometric resolution of VIIRS NTL. Therefore, simulating a global time series of VIIRS-like NTL data could contribute to further analysis in all the applications mentioned above.
Since NTL radiance depends strongly on human settlements and activities, which correlate with vegetation distribution and urban structures, some studies have proposed methods to cross-calibrate the DMSP and VIIRS NTL by fusing vegetation index data. Liu et al. (2019) proposed a vegetation adjusted NTL urban index (VANUI) based on the inverse correlation between vegetation and urban surfaces. The results improved urban extent extraction by improving the DMSP dataset. Zhuo et al. (2015) presented an improved index, the enhanced vegetation index adjusted NTL index (EANTLI), whose similarity to VIIRS NTL data is consistently higher than that of VANUI (Zhuo et al., 2015).
With deep learning techniques widely used in remote sensing, researchers have started applying them to image processing problems such as denoising, super-resolution, image fusion, and registration (Ma et al., 2019). Downscaling the lower spatial resolution DMSP data can be treated as a single image super-resolution (SR) problem, in which an SR algorithm estimates a high-spatial-resolution (HR) image from a given low-spatial-resolution (LR) image (Yang et al., 2015). In 2014, Dong et al. first proposed a convolutional neural network-based super-resolution method (SRCNN) (Dong et al., 2014). Since then, neural networks have been widely adopted for super-resolution, and many deep learning-based super-resolution networks have been introduced, such as FSRCNN (Dong et al., 2016), SRGAN (Ledig et al., 2017), and RCAN (Zhang et al., 2018). These methods showed the promising capability of deep learning models for image downscaling. However, improving DMSP NTL data cannot be treated as a simple super-resolution problem without auxiliary data. The task requires not only downscaling the spatial resolution but also enhancing the radiometric resolution, as our prior experiments also confirmed.
Improving DMSP data in both spatial and radiometric resolution to obtain VIIRS-like DMSP datasets requires more supplementary data. Vandal et al. (2017) introduced the DeepSD model, a stacked SRCNN framework with auxiliary data, to statistically downscale daily precipitation data. By integrating DEM data into the training process, the deep learning method performed better than traditional precipitation downscaling methods (Vandal et al., 2017). Zhang et al. (2021) proposed a spatial and spectral reconstruction network (SSR-NET) to reconstruct an HR hyperspectral image by fusing an LR hyperspectral image and its corresponding HR multispectral image. The experiments showed that SSR-NET can deliver satisfying results in super-resolving the spatial and spectral resolution of satellite imagery. Li et al. (2021) presented a CNN-based relative radiometric calibration method to acquire consistent satellite images from different sensors, especially for long-term time series. This CNN-based regression model outperformed other methods (Li et al., 2021). These studies demonstrated that deep learning-based models are capable of enhancing resolution from different perspectives.
Previous studies show that vegetation indices correlate strongly with the distribution of NTL radiance. Building on this, Chen et al. (2021) proposed a learning-based method to extend the VIIRS time series from DMSP data by cross-sensor calibration. They fed the EANTLI into an auto-encoder-based convolutional neural network (CNN) to simulate VIIRS-like DMSP NTL data. The results of this method are satisfactory, with a good spatial pattern and temporal consistency at the pixel and city levels (Chen et al., 2021). However, this method needs to calculate the EANTLI before using it as model input, which inspired us to use EVI and DMSP NTL directly as the input to train a deep learning-based model without additional processing.
This study presents a deep learning-based model based on the Self-Supervised Sparse-to-Dense networks (SSSD) (Ma et al., 2019) to cross-calibrate DMSP data and simulate VIIRS-like NTL data with higher spatial and radiometric resolution, in order to support extending the annual NTL time series. The vegetation index and DMSP data are input directly to the CNN-based model. Compared to other simulation or enhancement methods, our model is easier to apply because it needs no additional image preprocessing. The rest of this paper is structured as follows. The second section introduces the datasets used, the experiment environment, and the methods of this study. Section 3 presents the experimental results, evaluation, and discussion. The last section concludes the paper.

Materials
This section introduces the datasets used and their preprocessing, the methods of this study, and the evaluation metrics. (Elvidge et al., 2021). The geographical coordinate system of both datasets is WGS84 (EPSG 4326). The annual DMSP and VIIRS NTL products have approximately 1 km and 500 m spatial resolution at the equator, respectively, as shown in Figure 1. Figure 1 clearly shows that the DMSP NTL data have severe saturation and blooming problems, as mentioned in the previous section, while the VIIRS NTL data provide more urban structural detail.

MODIS Vegetation Indices Dataset:
The MODIS Terra MOD13A1 V6 16-Day Global 500 m products were used to provide the EVI value on a per-pixel basis (Didan et al., 2015). The EVI layer minimizes canopy background variations, maintains sensitivity over dense vegetation, and removes residual atmospheric contamination. The MOD13A1 products are produced from atmospherically corrected surface reflectance data. To reduce the sensitivity to seasonal and interannual fluctuations, annual average EVI datasets were produced from MOD13A1 using the Google Earth Engine (GEE) platform. The annual average EVI datasets were reprojected to EPSG 4326 to match the NTL datasets, and no-data values were assigned as NaN.
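The annual averaging step can be illustrated with a minimal NumPy sketch. The paper performed this step on GEE; the function `annual_mean_evi` and the in-memory stack of 16-day composites below are hypothetical stand-ins, not the authors' code. Pixels that are NaN in every composite stay NaN, in line with the no-data convention described above.

```python
import numpy as np


def annual_mean_evi(composites):
    """Per-pixel mean over a (time, rows, cols) stack, ignoring NaN no-data values."""
    stack = np.asarray(composites, dtype=float)
    valid = ~np.isnan(stack)
    # Sum only the valid observations and count them per pixel.
    total = np.where(valid, stack, 0.0).sum(axis=0)
    count = valid.sum(axis=0)
    # Pixels with no valid observation keep the NaN they are initialised with.
    mean = np.full(total.shape, np.nan)
    np.divide(total, count, out=mean, where=count > 0)
    return mean


# Tiny synthetic example: three composites of a 2 x 2 scene.
example_stack = np.array([
    [[0.2, np.nan], [0.4, np.nan]],
    [[0.4, 0.6], [np.nan, np.nan]],
    [[0.6, 0.2], [np.nan, np.nan]],
])
example_mean = annual_mean_evi(example_stack)
```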

Preparing Training Samples:
To train the deep learning-based model, we preprocessed the datasets into training samples. Firstly, the method requires the input data to have the same spatial resolution as the target data, so we resampled the DMSP NTL to 500 m. Since different interpolation methods, such as nearest neighbour and bilinear interpolation, do not improve the prediction accuracy obtained from the resampled input imagery (Ma et al., 2019), we used an easily implemented method, cubic resampling, to match the 500 m input spatial resolution. Secondly, we aligned and retiled the global NTL datasets and EVI data into paired data samples of 256 by 256 pixels. Lastly, we removed outliers from the data samples using several strategies. The VIIRS sensor detects light radiance from 1 nW/cm²/sr, and the histogram of radiance values shows that only a few sites across the world have radiances over 1000 nW/cm²/sr. Thus, to avoid the influence of skewed large values on the loss function, we set VIIRS NTL pixel values greater than 1000 to 1000. In addition, ocean areas usually have no radiance captured by the sensor and account for a large proportion of the global area, which may cause a class imbalance problem during training. Therefore, we filtered out the data pairs in which all pixel values of the DMSP NTL or the VIIRS NTL are 0. Similarly, we removed the data pairs in which all EVI pixel values are NaN. In total, 8375 paired data samples were created.
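The tiling, clipping, and filtering rules described above can be sketched as follows. This is a minimal illustration assuming the three rasters are already resampled and aligned on the same 500 m grid and held as NumPy arrays; the function names are hypothetical, and skipping partial edge tiles is an assumption of this sketch rather than something the paper states.

```python
import numpy as np

TILE = 256          # tile size in pixels, as stated in the text
VIIRS_CAP = 1000.0  # radiance cap in nW/cm2/sr, as stated in the text


def tile_pairs(dmsp, evi, viirs, tile=TILE):
    """Yield aligned (dmsp, evi, viirs) tiles; partial edge tiles are skipped."""
    rows, cols = viirs.shape
    for r in range(0, rows - tile + 1, tile):
        for c in range(0, cols - tile + 1, tile):
            window = (slice(r, r + tile), slice(c, c + tile))
            yield dmsp[window], evi[window], viirs[window]


def keep_pair(dmsp_tile, evi_tile, viirs_tile):
    """Apply the filtering rules: drop all-zero NTL tiles and all-NaN EVI tiles."""
    if not np.any(dmsp_tile) or not np.any(viirs_tile):
        return False
    if np.all(np.isnan(evi_tile)):
        return False
    return True


def build_samples(dmsp, evi, viirs, tile=TILE):
    """Collect the retained tiles, capping VIIRS radiance at VIIRS_CAP."""
    samples = []
    for d, e, v in tile_pairs(dmsp, evi, viirs, tile):
        if keep_pair(d, e, v):
            samples.append((d, e, np.minimum(v, VIIRS_CAP)))
    return samples


# Tiny synthetic example with 2 x 2 tiles: the right-hand tile has no DMSP signal.
demo_dmsp = np.array([[10.0, 20.0, 0.0, 0.0], [0.0, 5.0, 0.0, 0.0]])
demo_evi = np.full((2, 4), 0.3)
demo_viirs = np.array([[1500.0, 2.0, 3.0, 4.0], [0.0, 1.0, 5.0, 6.0]])
demo_samples = build_samples(demo_dmsp, demo_evi, demo_viirs, tile=2)
```

In the synthetic example only the left-hand tile is kept, and its 1500 value is capped at 1000.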

Methods
To simulate the VIIRS-like DMSP NTL data, we formulate the problem as a deep regression learning problem. The network adopted in this study is SSSD, initially proposed for depth completion.
The modified model architecture is shown in Figure 2. The resampled 500 m DMSP and the EVI are processed by the initial convolutional block (Conv). Then a total of four residual (Res.) blocks of ResNet-34 are used as the encoder, which sequentially increase the number of filters and downsample the spatial resolution of the features. The decoder has a reversed structure with four transposed (Transp.) convolutional blocks. The output of each encoding layer is passed to the corresponding decoding layer by skip connections. Finally, the predicted VIIRS-like DMSP is produced with the same spatial resolution as the network input. Except for the last Conv, each Conv is followed by batch normalization and ReLU.
The pixel values of the DMSP NTL, EVI, and VIIRS NTL images are scaled to 0-1 according to their possible minimum and maximum values. During prediction, to mitigate the blooming effect of DMSP NTL images, the NaN values in the EVI images are used as a mask to mask out water bodies.
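The scaling and masking steps can be sketched as below. The paper does not list the exact minimum and maximum values it used, so the ranges passed in the example (0 to 63 for DMSP digital numbers) and the fill value of 0 for masked water pixels are assumptions; the function names are hypothetical.

```python
import numpy as np


def scale_to_unit(values, vmin, vmax):
    """Min-max scale values to the 0-1 range using fixed possible extremes."""
    return (np.asarray(values, dtype=float) - vmin) / (vmax - vmin)


def unscale(values, vmin, vmax):
    """Invert scale_to_unit to recover physical units from network output."""
    return np.asarray(values, dtype=float) * (vmax - vmin) + vmin


def mask_water(prediction, evi, fill=0.0):
    """Replace predictions with `fill` wherever the EVI is NaN (assumed water)."""
    return np.where(np.isnan(evi), fill, prediction)


# Example: DMSP digital numbers span 0-63 according to the text.
dmsp_scaled = scale_to_unit([0, 31.5, 63], 0, 63)
masked = mask_water(np.array([[5.0, 7.0]]), np.array([[0.3, np.nan]]))
```

Using fixed possible extremes rather than per-tile extremes keeps the scaling identical across tiles, so predictions from different tiles can be mapped back to the same physical units.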

Figure 2.
The proposed deep regression network with DMSP and EVI as input. The dashed lines denote skip connections and circles denote concatenation of channels. F represents the number of channels.

Accuracy Evaluation Metrics
To evaluate the quality of the simulated VIIRS-like DMSP data, the Root Mean Square Error (RMSE) and the coefficient of determination R² between the results and the VIIRS NTL data are used (Chen et al., 2021). Given the observed image (here the VIIRS NTL imagery) and its corresponding predicted image (here the VIIRS-like DMSP imagery), the RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$

where $y_i$ = the radiance of the VIIRS NTL imagery
$\hat{y}_i$ = the predicted radiance value of the VIIRS-like NTL imagery
$N$ = the sample size

The R² is defined as:

$$R^2 = 1 - \frac{\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{N}\left(y_i - \bar{\hat{y}}\right)^2}$$

where $\bar{\hat{y}}$ = the mean value of the predicted radiance of the VIIRS-like NTL imagery
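A minimal NumPy sketch of the two metrics is given below. It is an illustration rather than the authors' evaluation code. R² is computed in the form one minus the ratio of the residual sum of squares to the total sum of squares; the centring value of the total sum of squares is exposed as the hypothetical `center` argument, which defaults to the mean of the observed values as in the conventional coefficient of determination.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean square error between observed and predicted radiance."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def r_squared(observed, predicted, center=None):
    """1 - SS_res / SS_tot, with SS_tot taken about `center`."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if center is None:
        # Conventional choice (an assumption here): the mean of the observations.
        center = observed.mean()
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - center) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Small worked example with made-up radiance values.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 6.0]
example_rmse = rmse(obs, pred)
example_r2 = r_squared(obs, pred)
```

In the worked example the only error is 2 on the last pixel, so the squared errors sum to 4 over four pixels, and the total sum of squares about the observed mean of 2.5 is 5.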

Experiment Setting
Among the 8375 pairs of data samples, 98% were randomly selected as the training set, and the remaining 2% were used as the validation set. The proposed deep regression network was trained with the Adam optimizer and an initial learning rate of 0.0001. The learning rate was reduced by 10% if the validation loss reached a plateau. The L2 loss was chosen to better reconstruct high-intensity areas.
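The split and the learning rate schedule can be sketched in plain Python as follows. This is a framework-independent illustration, not the authors' training code: the seed, the `patience` value, and the class name `PlateauDecay` are hypothetical, and "reduced by 10%" is read here as multiplying the learning rate by 0.9, which is an assumption.

```python
import random


def split_indices(n_samples, val_percent=2, seed=0):
    """Randomly split sample indices into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = n_samples * val_percent // 100
    return indices[n_val:], indices[:n_val]


class PlateauDecay:
    """Multiply the learning rate by `factor` after `patience` epochs without improvement."""

    def __init__(self, lr=1e-4, factor=0.9, patience=5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.best = float("inf")
        self.stale = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.stale = 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr *= self.factor
                self.stale = 0
        return self.lr


train_idx, val_idx = split_indices(8375)
scheduler = PlateauDecay(lr=1e-4, factor=0.9, patience=2)
for loss in (1.0, 1.0, 1.0):  # validation loss that stops improving
    scheduler.step(loss)
```

With integer arithmetic, 2% of 8375 samples gives 167 validation samples and 8208 training samples; the exact rounding used in the paper is not stated.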
The experiments were implemented in PyTorch 1.6 with Python 3.7 on a workstation with an Intel Core i9-9900K CPU @3.60GHz, 32GB RAM, and an NVIDIA RTX 2080Ti GPU, under Ubuntu 20.04.

Experimental Result
The deep regression network was trained using data pairs in 2013, and it was tested to reconstruct higher-resolution NTL images using DMSP NTL and EVI image pairs in 2012. Figure 3 illustrates the comparison between the resampled DMSP NTL data, VIIRS NTL data, and the predicted VIIRS-like NTL data in 2012 in four major metropolitan areas, including Cape Town, South Africa, the Greater Toronto Area (GTA), Canada, Los Angeles, USA and Shanghai (extended to the Yangtze River Delta area), China.
Compared with the DMSP NTL, which suffers from saturation and blooming effects, the reconstructed VIIRS-like NTL images resemble the higher-resolution VIIRS NTL images, with better representations of urban spatial structures in all tested areas. The reconstructed NTL images show better spatial detail in the blooming areas of the DMSP data. For instance, the DMSP dataset of the GTA has a significant blooming problem, especially along the shoreline of Lake Ontario. However, the reconstructed VIIRS-like DMSP data eliminate most blooming areas and deliver spatial variations that correspond closely to the original VIIRS imagery. Similarly, the Yangtze River Delta area consists of many towns and cities of different levels. In the DMSP image, because of high light radiance, the whole region appears connected without spatial detail, whereas our predicted VIIRS-like DMSP shows the hierarchical urban structures of this area more clearly. In addition, some main road networks were successfully reconstructed. This demonstrates that the fusion of lower-resolution DMSP NTL with higher-resolution EVI images can significantly improve the interpretability of DMSP NTL images in saturated regions, with the potential of reconstructing higher-resolution NTL images from historical records to extend the NTL time series.
Quantitative comparisons between the reconstructed VIIRS-like NTL images and the true VIIRS NTL images in the four sample metropolitan areas are shown in Figure 4. In the scatter plots, each dot represents one corresponding pixel in the prediction and the ground truth, and the green line represents the 1:1 line. The overall RMSE is relatively low considering the total range of the data, and the R² values show that the points generally follow the 1:1 line.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France
The GTA has the highest accuracy with an R² of 0.76, followed by Cape Town with an R² of 0.74. Shanghai and Los Angeles have R² values of 0.68 and 0.64, respectively. The RMSE values of Cape Town, the GTA, and the Yangtze River Delta area (Shanghai) are very close, ranging from 3.96 to 4.84. In contrast, the RMSE of Los Angeles is relatively large at 9.55. In this case, many scatter points along the 0 value of the VIIRS imagery are not predicted well. Meanwhile, Figure 3 and Figure 4 show that the predictions do not reconstruct the high radiance values well, with maximum radiances significantly smaller than those of the actual VIIRS NTL images.

Discussions
The evaluation of the predicted VIIRS-like DMSP imagery shows that it is possible to use DMSP NTL data and EVI directly to simulate VIIRS-like NTL data. However, some issues still need further investigation. Firstly, as Figure 5 shows, the blue rectangle area has a high radiance value, while the predicted NTL data do not show this high radiance. On closer examination, the underestimated area is where the California State Prison, Los Angeles County is located. This facility may produce more light than other facilities with similar land cover, so the prediction failed to estimate it. In contrast, light radiance was overestimated in the red rectangle areas. Most of these areas are in the suburban and rural parts of Los Angeles. Their landscape consists of human settlements surrounded by barren soil and desert, with little vegetation cover. Since this study used only EVI as supplementary data to estimate VIIRS-like DMSP NTL data, this type of error may result. Land cover and land use data may reduce the under- and overestimation issues.
Secondly, a skewed data distribution has been observed in both the DMSP and the VIIRS NTL datasets because most pixels do not have valid radiance values. Such pixels account for over 92% and 98% of the pixels in the DMSP and the VIIRS NTL datasets, respectively.
Moreover, high radiance values are seldom observed, so the intensity distributions remain significantly skewed even with low values removed, especially in the VIIRS NTL dataset, as shown in Figure 6. The skewed distribution may significantly constrain the performance of the deep regression network. Future work can improve the network to better address this issue, for example through data normalization with log scaling and masked losses.
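The two suggested remedies can be illustrated with a short NumPy sketch. It is one possible reading of "log scaling" and "masked losses" and was not implemented or evaluated in this study; the function names, the use of log(1 + x), and the reuse of the 1000 nW/cm²/sr cap as the normalization constant are all assumptions.

```python
import numpy as np


def log_scale(radiance, cap=1000.0):
    """Compress the radiance range with log(1 + x) and normalize to 0-1."""
    clipped = np.clip(np.asarray(radiance, dtype=float), 0.0, cap)
    return np.log1p(clipped) / np.log1p(cap)


def inverse_log_scale(scaled, cap=1000.0):
    """Map log-scaled values back to radiance units."""
    return np.expm1(np.asarray(scaled, dtype=float) * np.log1p(cap))


def masked_mse(predicted, target, mask):
    """Mean squared error computed only over pixels where mask is True."""
    predicted = np.asarray(predicted, dtype=float)
    target = np.asarray(target, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 0.0
    diff = predicted[mask] - target[mask]
    return float(np.mean(diff ** 2))


# Example: exclude the last pixel (for instance, one without valid radiance).
example_loss = masked_mse([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [True, True, False])
```

Log scaling spreads the many low radiance values over a larger part of the 0-1 range, while the mask keeps pixels without valid radiance from dominating the loss.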
In addition, although the higher-resolution EVI images significantly improve the spatial details of DMSP NTL data in saturated regions, such as road networks, the results are still not as clear as the VIIRS NTL data in urban areas.
Considering that light radiance corresponds closely to the development status of human settlements, we suggest that future work try new supporting datasets, such as MODIS land surface reflectance data or other optical daytime datasets, to improve the spatial details of the VIIRS-like DMSP NTL data.

CONCLUSIONS
This study proposed a deep regression model that simulates VIIRS-like NTL data with higher spatial and radiometric resolution from lower-resolution DMSP NTL and the corresponding EVI data. The results demonstrate that the reconstructed VIIRS-like DMSP NTL data have spatial patterns similar to those of the original VIIRS NTL data. Further work can integrate more supplementary datasets and a refined network. It is also worth adding more pre- and post-processing to the framework to produce more accurate VIIRS-like DMSP NTL data. The extended time series of VIIRS-like NTL data could be used to monitor urbanization and in related socioeconomic applications.