A MULTI-SCALE SEGMENTATION APPROACH TO FILLING GAPS IN LANDSAT ETM+ SLC-OFF IMAGES THROUGH PIXEL WEIGHTING

Monitoring changes on Earth’s surface is a difficult task commonly performed using multi-spectral remote sensing images. The absence of surface information in optical images due to the presence of cloud, low temporal resolution and sensors defects interfere in analyses. In this context, we present an approach for filling gaps in imagery mainly caused by small clouds and sensor defects. Our method consists of an adaptation from an existing method that uses spatial context of close-in-time images through the use of the most frequent value obtained using multiscale segmentation. Our method uses the pixel proportion contained in each segment to fill missing values. We applied the gap-filling methodology on three dates containing simulated images from Landsat7 using Landsat8 images. We validated the method by introducing and filling artificial gaps, and comparing the original data with model predictions. The developed approach surpassed Maxwell et al. (2007) gap-filling method for all bands, presenting a minimal R of 0.78. Our method proved to enhance the Maxwell et al. (2007) gap-filling method, while also asymptotically maintaining the algorithm cost. It also allowed image texture to be conserved on reconstructed images. This characteristic enables narrow features, e.g., as roads, riparian areas, and small streams capable of being detected on the filled images. Based on that, further object-based approaches can be used on images filled using this methodology, demonstrating its capacity to estimate Earth’s surface data.


INTRODUCTION
Satellite remote sensing instruments are one of the most valuable tools available for understanding and monitoring changes occurring in land, coastal and oceanic ecosystems. Several remote sensing applications, such as crop mapping and agricultural monitoring, need a great amount of images acquired in different times and free of clouds to map dynamic changes (Kuenzer et al., 2015, Bendini et al., 2017. At the present, Landsat program consists in the most consistent historical source used when mapping Earth's surface (Wulder et al., 2012). Nowadays, Landsat-8 and Landsat-7 are active and acquiring images. However, Landsat-7, began to present hardware failure on Enhanced Thematic Plus (ETM+) Scan Line Corrector (SLC) in May 31, 2003 (Markham et al., 2004). This problem resulted acquisition loss for certain regions of the images. Images collected before such failures are called SLC-on images, and afterwards they are referred as SLC-off. This missing sections, also called gaps, correspond to a single pixel near the center of a path/row image and can reach 14 pixels in width near image borders (Storey et al., 2005), as illustrated in Figure 1.
Considering the possibility of data absence, it is important to develop tools capable of reconstructing Earth Observation (EO) data, removing null observations and estimating close-to-reality values to fill the gaps. Reconstruction of EO data to obtain images without gaps, remains a challenge, nevertheless some studies have been developed to solve these problems (Vuolo et al., 2017, Maxwell, 2004, Maxwell et al., 2007. Considering Tobler's first law of geography which establishes that near things are more related to each other than distant things (Tobler, 1970), interpolating Landsat-7 SLC-off missing data would fill the gaps with neighboring pixel values. This procedure may be acceptable for small gaps, however it produces less satisfactory results for large gaps. Following that, (Maxwell, 2004) developed a segmentation approach to fill Landsat-7 SLC-off miss-ing data, pairing it with a reference image that does not contains gaps, based on three principles: • adjacent pixels are more likely to be similar; • groups of the same landscape unit are likely to have similar spectral values; • most landscapes remain constant for certain periods.
This approach consists in using a segmentation to delimit homogeneous regions, in which pixels are similar. This segmentation is acquired using a close-in-time reference image from a source that does not suffer from the gaps, e. g. a cloud-free image from Landsat-5/TM (Maxwell, 2004) or Landsat-8/OLI (Marujo et al., 2017b). Once the reference image segmentation is obtained, it is overlaid on the image which contain the gaps, the target image, and each gap area is replaced by the segment mode value, calculated using the values from the target image.
The mentioned gap-filling through segmentation has been improved further by using several segmentation levels instead of just one (Maxwell et al., 2007). The method employed three hierarchical segmentation levels and a nearest neighbor approach in case the third level isn't enough. In this approach, if a segment is fully contained within a gap, a segmentation level with larger area segments is used to fill the gap instead and so on with the third segmentation level (Maxwell et al., 2007). This approach is exemplified in Figure 2. The process is similar to (Maxwell, 2004), being the difference the number of segmentation levels.
Considering that nowadays Landsat-7/ETM+, Landsat-8/OLI and similar sensors, e. g. CBERS-4/MUX are in operation and also considering that these sensors present similar spectral bands (visible and near-infrared) and similar spatial resolution (30 and 20m), with a minimal spectral alignment and spatial resampling, they can be used to build a virtual constellation and increase the number of EO (Marujo et al., 2017a), helping gap-fill Landsat-7/ETM+ data. Within this perspective, the objective of this work is to propose a procedure to reconstruct Landsat-7/ETM+ gaps based on a multi-scale segmentation approach that considers pixel weights, to maintain image texture. The method uses non-gapped images of close-in-time-dates as reference to obtain homogeneous regions through segmentation. For each segment an internal distribution of the pixel values is calculated and the segments are overlaid on the gapped image. The filling values are then extracted from the gapped image, from within the segments and the pixel distribution is applied. To validate the method, artificial gaps were simulated in images and the filled result was compared to the reference.

Datasets
In this study, Landsat-7/ETM+ SLC-off effect was simulated using Landsat-8/OLI images. Landsat-8/OLI images were acquired, Figure 2: (a) a target image that contains gaps, in this case a Landsat-7/ETM+ SLC-off, (b) a reference image, in this case a Landsat-8/OLI acquired in a close date; (c) the reference image is segmented; (d) the segmentation is overlayed in the target image; (e) the mode value within the segment is calculated and used to fill missing data; (f) resulted synthetic image.
as shown in Table 1. The images were selected considering their overlap areas, which enables images from the same sensor with a difference between 7-9 days. The images were obtained as surface reflectance products, processed by Center Science Processing Architecture (ESPA) (UNITED STATES GEOLOGICAL SURVEY -USGS, 2017) using the Landsat 8 surface reflectance code (LaSRC), (UNITED STATES GEOLOGICAL SURVEY -USGS, 2019).
The SLC-off effect was simulated using a Landsat-7/ETM+ image from the same orbits (Path/Row 219/075 and Path/Row 220/075) by replicating gap pixels to the Landsat-8/OLI images. These areas were stored for further use as validation. Since the objective of this paper was to evaluate the feasibility of the proposed gap-filling algorithm, data from Landsat-7/ETM+ and Landsat-8/OLI was not used combined due to the need of harmonize the data from the different sensors (Bendini et al., 2017, Marujo et al., 2017a, which means that the only spectral information being used is derived from Landsat-8/OLI with synthetic SLC-off effect.

Methodology
As mentioned, our proposed algorithm, is based on Maxwell et al. (2007). Both algorithms are similar in most part of their processing, being expected that the algorithm's asymptotic cost is maintained since only linear operations are added to calculate the pixel weighting.
As input, the algorithm requires one gapped image, here called target, one non-gapped image, here called reference, and three segmentations, containing segments of three different scales, obtained by a hierarchical segmentation procedure. Considering a bottom-up approach, the method starts using the segmentation level with smaller segments, here called level-1 segmentation. For each segment that contains a gap, the segment mean value is calculated using the reference image values. Also on the reference image, a weighting factor is calculated for each pixel within the segment, which is the pixel value divided by the segment mean. This pixel weight is then used on the target image, in which gap values are estimated as the segment mean value of the target image multiplied by the pixel weight.
The validation procedure was performed by comparing the gapfilled values and the original values, which were initially removed from the image. The Pearson Correlation Coefficient (R 2 ), Root Mean Square Deviation (RM SE), Mean Absolute Error (M AE), Universal Image Quality Index (U IQI) (Wang and Bovik, 2002) and Visual Information Fidelity (V IF ) (Sheikh and Bovik, 2006) were used to compare the results. The Pearson Correlation Coefficient indicates how the results are correlated with the expected results (the reference). However correlation does not imply causality. Based on that the visual indices were used to evaluate the results and the RMSE and MAE were used to measure the errors. RMSE and MAE both measure the results associated error, being MAE less affected by outlier values (Chai and Draxler, 2014). UIQI and VIF both measure compare the result image with the reference and are indicators of image quality. However, VIF is capable of detecting changes more similarly to the human eye when compared to UIQI (Sheikh and Bovik, 2006), while UIQI is more susceptible to be masked by the image areas that haven't changed.

RESULTS
In order to gap-fill the Landsat SLC-off images, reference segmentations were needed. Based on that, using a hierarchical segmentation, which is required for the proposed method, a multiscale segmentation (Baatz and Schäpe, 2000) algorithm was adopted and several segmentation parameters were tested. The scale parameter (the algorithm merge criteria) was empirically adjusted to 50, 100, 200, respectively for levels 1, 2 and 3, whereas the compactness and shape parameters were maintained in 0.5 (which implies 0.5 for smoothness) and 0.1 (which implies in 0.9 for color), respectively. These parameters were adopted for all segmentation levels and were estimated by visual interpretation, since the human eye is a strong and experienced evaluator of segmentation techniques (Gamanya et al., 2007).
A common effect through (Maxwell et al., 2007) gap-filling approach is the loss of narrow features such as roads, riparian areas, and small streams. In our method these features remain detectable, since texture is maintained. This is mainly due to the pixel weighting step. In larger gaps, like the ones found closer to the Landsat Path/Row borders, (Maxwell et al., 2007) approach fills the areas as homogeneous regions due to the repetition of the same spectral value across missing values, while in our method the filled areas look visually more realistic due to the textured filling. These visual differences can be seen in Figure 3, which illustrates a target Landsat image, presenting SLC-off gaps (upper image) and the same image filled by Maxwell et al. (2007) (central image) and by our method (bottom image).
Filling small gaps with an homogeneous value may be unnoticed in homogeneous target, as the central pivot at the bottom right corner of 3. However, in heterogeneous areas, e.g. the gallery forests and mixed fields, the single value gap-filling may not represent the reality distribution of that area, which may imply affect analysis depending on the adopted strategy. Although it is also an estimation of a non-observed area, gap-filling using the weighting pixel strategy conserves the texture, which may be a valid option for certain methods, e. g. Geographic Object-Based Image Analysis (GEOBIA).
To quantitatively evaluate both methods, Table 2 and Table 3 shows the agreement between (Maxwell et al. 2007) and our proposed method when comparing the results to a reference. The R 2 , RMSE, MAE, UIQI and VIF were used to evaluate the results.   For all spectral bands, the R 2 obtained using our method presented was more correlated to the original images than the R 2 obtained when using the original method, being the least R 2 obtained 0.78. Based on the same approach can also be noted that the residues RMSE and MAE are lower using our method. Considering that our data is scaled from 0 to 10,000, as provided by (UNITED STATES GEOLOGICAL SURVEY -USGS, 2017), the worse RMSE was obtained at the SWIR 1 band presenting values of 344.98 and 283.99, for Maxwell et al. (2007) gap-filling method and our proposed adaptation, while MAE worse results were obtained for the NIR band, presenting 199.97 and 171.55 respectively for both methods. As stated by Chai et al. (2004), MAE presented lower values than RMSE due to this index deal better with outliers.

Band
Analysing the image indices, UIQI and VIF, UIQI was close to 1 (0.99) at all image bands. This effect was due to UIQI been affected by the values that have not changed from the reference and the filled images. Based on that, the VIF describe better the visual differences between the images (Sheikh and Bovik, 2006). Based on the obtained VIF values, the gap-filled image quality assessment is closer to the reference when using our method to fill the gaps rather than the original proposition, since for all bands the VIF presented better values when using the proposed approach.
In Table 2 and Table 3 is also possible to note that the blue and green spectral bands presented lower correlation results when compared to the other spectral bands. The almost crescent correlation results occurred probably due to atmospheric interference in shorter wavelength bands, also known as Rayleigh Scattering (Jensen, 2007), that is not completely suppressed even with atmosphere correction.
Another characteristic that we noticed is that the algorithm's asymp-totic cost was maintained since only linear operations were added to calculate the pixel weighting. We tested the algorithm performance using a 16GB of memory ram and an Intel processor i7-6500U of 2.5GHz computer. Figure 4 shows the execution time for both algorithms using images ranging from 100x100 to 2000x2000 pixels. Implementations of the methods, in Python and R are being developed as an Earth Observation fill package and can be found in https://github.com/marujore/.

CONCLUSIONS
Our multiscale segmentation gap-filling method enhanced the original Maxwell et al. (2007) method, while asymptotically maintaining the algorithm cost. Our approach allowed image texture to be conserved on reconstructed images, allowing object based analysis to be further applied on Landsat-7/ETM+ images. Another improvement was that narrow features, e.g., roads, riparian areas, and small streams, remained detectable in reconstructed images.
Pragmatically, it is expected that the approach developed in this The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W11, 2020 PECORA 21/ISRSE 38 Joint Meeting, 6-11 October 2019, Baltimore, Maryland, USA Figure 4: Execution time of Maxwell et al. (2007) gap-filling algorithm (blue) and the proposed method (red) using images ranging from 100x100 to 2000x2000 pixels.
thesis can be used effectively in academic and industrial environments and henceforth contribute to improves overall multi-source harmonization and Earth observation data reconstruction. The developed codes are open source, free and published so it can be used or adapted, as is all its dependencies.