STRATEGIES FOR PS PROCESSING OF LARGE SENTINEL-1 DATASETS

Several advanced DInSAR techniques have been used to map surface deformations due to volcanism, active tectonics, landslides, subsidence, and uplift as well as to monitor the deformation of critical infrastructure such as bridges and dams. Recently, studies have explored the potential of these techniques to be integrated into a permanently operating monitoring system. ESA’s Sentinel-1 satellites have been providing SAR images for such a purpose since 2014. Nowadays, it is easy to access more than 230 SAR images of any area of interest, and update this dataset every six days with a new image. Due to the high frequency of the data acquisition, the question arises on how to best handle such a dataset. Is it suitable to always consider the whole available dataset or would a partial processing of the dataset and combining the results at a later point be more appropriate? To answer these questions, three different processing strategies are investigated in this paper. The first is a continuously growing dataset and for the second and third strategy, the dataset was divided into sub-stacks with and without overlap. In this study, the key parameters of each strategy are analyzed. In addition, the size of the sub-stacks is varied and the results are compared.


INTRODUCTION
In the last decades, numerous improvements and developments in the field of DInSAR techniques have been made to overcome its major limitations (i.e., the spatial and temporal decorrelation as well as the atmospheric phase delay), resulting in techniques such as persistent scatterer interferometry (PSI) (Ferretti et al., 2001, Hooper et al., 2004, small baseline subset approach (Berardino et al., 2002), and SqueeSAR T M (Ferretti et al., 2011). The advantage of these techniques is their capability to map displacement fields accurately down to a millimeter over a large area, providing a full picture of the surface deformation (Marinkovic et al., 2007). Numerous studies have employed these techniques to map or monitor either naturally occurring or human-induced surface deformations, such as landslides, uplift, subsidence, or the deformation of critical infrastructure (e.g. dams or bridges) (Berardino et al., 2003, Bonì et al., 2015, Bonì et al., 2018, Tomás et al., 2013, Sousa et al., 2013. More recent studies have explored the possibility of integrating these techniques into a permanently operating monitoring system. Their primary focus is on automatically identifying areas of active deformation, characterizing their behavior and creating alerts, if any changes in the behavior occur (Berti et al., 2013, Tomás et al., 2019, Kalia, 2018. A continuous stream of interferometric SAR images is essential for such applications. The European Space Agency's (ESA) Sentinel-1 satellites have been continuously mapping the surface of the earth since they were launched in April 2014 and April 2016. Their products are provided by the Copernicus initiative free of charge. Today, it is easy to access more than 230 SAR images of any area of interest and update this base dataset every six days with a new image. Due to this high frequency of the data acquisition, some questions concerning their PSI analysis arise. On the one hand, a large dataset is needed to accurately identify a set of persistent scatterers (PS) and correct their differential phase for unwanted contributions. Also, a larger dataset is needed for a full history * Corresponding author -E-Mail: madeline.evers@iosb.fraunhofer.de of the PS movements to accurately characterize their behavior. On the other hand, the likelihood of a pixel to be a PS drops with an increase in images. Also, if the PS density is too low, some areas of active deformation might not be identified or their extent is underestimated. Furthermore, to integrate this method into a permanently operating monitoring system a timely analysis of the data is needed. Thus, a smaller dataset is preferred. Hence, the question is, is there a processing strategy that can fulfill all these requirements? Is it acceptable to always consider the whole available dataset or would a partial processing of the dataset and a later combination of the results be more appropriate? To answer these questions, we investigated three different processing strategies. The first strategy is to always consider the entire available data set, while the second and third strategies only take into account a part of the full available dataset, which from now is referred to as a sub-stack. In all three cases the dataset is regularly updated and reprocessed. Thus, the PS analysis always provides information about the current situation. The parameters processing time, memory consumption, number of PS candidates (PSC), number of PS, PS density, percentage of PSC identified as PS, and number of PS common between sub-stacks are analyzed. The paper is structured as follows. The method PSI and the three processing strategies are described in Section 2. The study area and dataset are briefly described in Section 3. The results for each processing strategy are presented in Section 4. In Section 5, the results are compared concerning their performance and some conclusions are drawn.

Persistent Scatter Interferometry (PSI)
The response signal, a pixel in a SAR image depicts, is the coherent sum of the backscattered signal of each scatterer within the corresponding ground resolution cell. Taking into account the deviation in look angle and the possible relative movement of the individual scatterers, the constructive and destructive interference of their response signals leads to variations in the phase and amplitude of the corresponding signal. However, this effect is greatly reduced, if a single scatterer dominates the response signal of the ground resolution cell. These scatterers can be man-made structures such as the corner of a building or naturally occurring ones, for example a large rock. A pixel whose response signal is dominated by such a scatterer is called a persistent scatterer (PS). The goal of all PSI algorithms is to identify these pixels and to determine their relative movement over time. This concept was first presented in 2001 (Ferretti et al., 2001). However, we used the Standford Method for Persistent Scatterer (StaMPS) (Hooper et al., 2004).

Interferometric Pre-Processing
In order to use StaMPS, a stack of interferograms and a second stack with the corresponding single look complex (SLC) images were created. All the SLC images need to be coregistered to the same master. This was done using the software SNAP provided by ESA. The interferometric pre-processing of Sentinel-1 images for a PSI analysis consists of eight steps: (1) S-1 TOPS Split, (2) Apply-Orbit-File, (3) Back-Geocoding, (4) Enhanced-Spectral-Diversity, (5) S-1 TOPS Deburst, (6) Subset Creation, (7) Interferogram Generation and (8) Topographic Phase Removal. A Digital Elevation Model (DEM) is needed for steps 3 and 4, which are responsible for the coregistration of the slaves to the master. We used the freely available SRTM-3 DEM. The results of the interferometric pre-processing are the differential interferograms, the coregistered SLC images, the height information, and geocoordinates for each pixel. These serve as input for the StaMPS algorithm.

PSI Analysis
The PS selection of the StaMPS algorithm consists of two steps. In the first step, a set of PS candidates (PSC) is identified employing the amplitude dispersion DA (Hooper et al., 2004). This parameter is a good proxy for the phase standard deviation σΦ, in case of a constant signal and high signal to noise ratio. The amplitude dispersion is calculated for each pixel as the quotient of the mean and standard deviation of a time series of amplitude values (Ferretti et al., 2001). All pixels with DA ≤ 0.4 are chosen as PSC (Hooper et al., 2004). The second step is the phase analysis. The goal is to identify PSC, which distinguish themselves by having a low phase noise level. The algorithm uses the value γx as an indicator. It is similar to the magnitude of coherence, however, the amplitude is not used to calculate it: Here, ψint,x,i is the wrapped phase of pixel x in the i th topographically corrected interferogram, ψint,x,i is the estimate for the spatially correlated part of the phase contributions due to look angle error, atmospheric disturbances, orbit inaccuracies, and ground deformation. The non-spatially correlated part of the phase contribution due to look angle error is denoted by ψ nc θ,x,i , and N is the number of interferograms. The value γx is calculated for each pixel and all pixels with γx ≥ γ thres are considered to be PS. The threshold γ thres is chosen, so that the percentage of false positives is still acceptable (Hooper et al., 2004). We decided to use the default value, which is 20 %. After identifying the set of PS, their phase is corrected for the spatially uncorrelated look angle error and afterwards a 3D un-wrapping algorithm is used to derive the displacement for each interferogram and the corresponding mean velocity for the time series . The last two steps are the estimation of the spatially correlated look angle error and the atmospheric filtering . The final results are a map of the mean velocity in line-of-sight (LOS) of the satellite and a time series of the displacement of each PS relative to the master.

Different Scenarios
The motivation for this study is the recent push to utilize advanced DInSAR techniques for continuous monitoring systems. At the moment, in case of PSI, this would be done by adding new images to the dataset when they become available, and completely reprocess the entire dataset. However, this does not seem to be the best approach. The most important demands towards a PSI processing strategy are: (1) a high enough PS density to accurately estimate the extent and intensity of the deformation, (2) a high sampling rate of the mean velocity to detect short term variations and (3) a possibility for recombining the sub-stack results. In this study, three different processing strategies were tested. The strategies are described in detail in the following paragraphs. The pre-processing steps are not further considered in this study because the amount of time needed to pre-process a single image does not change with the total number of images used.

Approach A:
The first approach represents the conventionally used procedure. As a first step a base dataset of 28 images covering half a year was processed. This base dataset was then updated 13 times with all available images acquired within the time period of two months (i.e, 7 to 10 new images) and entirely reprocessed. For each sub-stack the same master scene as in the base dataset was used. The main questions for this approach are: 1. How does the number of initially chosen PSCs and the number of used images affect the processing time?
2. How does the PS density change with an increasing number of images?
3. How does the ratio between PSC and PS change with an increasing number of images?

Approach B:
For the second approach, we decided to process consecutive sub-stacks instead of adding new images to a base dataset. These sub-stacks did not overlap in time, i.e., as soon as enough new images were available, they were used to perform a PSI analysis independently from the previous sub-stack. The master was chosen individually for each sub-stack. To analyze the impact of the sub-stack size, we analyzed sub-stacks with sizes of 20, 30 and 40 images. The main questions here were: 1. What is the processing time of this approach?
2. How many PSC are common to the sub-stacks and how many are particular to one of them?
3. Does the size of the sub-stack influence the number of PS common to the sub-stacks?
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2020, 2020 XXIV ISPRS Congress (2020 edition)

Approach C:
The third approach, is similar to Approach B in the sense that we also processed sub-stacks. However, in this case the substacks did overlap in time. The procedure can be best described as a sliding window approach. Whenever a new image became available, it was used to create a new interferogram. The oldest interferogram of the current sub-stack was then exchanged for the new one and the sub-stack was reprocessed. In this case, we also varied the size of the sub-stacks and processed substacks with 20, 30, and 40 images. Here, we processed on the following questions: 1. What is the processing time of this approach?
2. What changes does the set of PSC undergo after exchanging one image?
3. Does the size of the sub-stack influence the changes the set of PSC undergoes after exchanging one image?
Furthermore, we compared the three approaches, weighted their advantages and disadvantages, and chose the approach, which best serves our purpose.

STUDY AREA & DATASET
The different processing strategies were applied to a dataset covering an area of roughly 2,344 km 2 on the northwestern tip of the Peloponnese Peninsula (Greece). The area of interest is framed by the Erymanthos and Panachaiko mountains to the East and South as well as the Mediterranean Sea to the West and North. The topography of the area varies from flat at the shoreline to hilly terrain with steep slopes in the East and South. The area consists of non-urban as well as urban parts. The expected ground deformations for this area are landslides in the hilly terrain (Del  and vertical as well as horizontal displacement alongside known active faults within the area (Sakkas et al., 2018). The SAR dataset used for this study consists of 147 images. The images were recorded by the Sentinel-1 satellites and provided by the Copernicus Initiative. The images were recorded in descending acquisition geometry and with the standard acquisition mode Interferometric Wide Swath. They cover the time period from 10/06/2016 to 07/23/2019. The study area and the data set are described in more detail in (Evers et al., 2019). Figure 1 shows an exemplary SLC image of the dataset. Due to the size of the scene, we chose a subset. The subset is marked with a red rectangle. While Approach A was used to process the entire dataset, only part of the dataset was processed using Approaches B and C. The long processing time of a PSI analysis limited the number of sub-stacks processed for this study. For this reason, some key parameters needed to be extrapolated. Figure 2 illustrates the segmentation of the dataset into sub-stacks for each of the three approaches. In case of Approach A (blue bars), the dataset was divided into 14 sub-stacks. Each sub-stack was used to perform a PSI analysis. In case of Approaches B (green bars) and C (red bars), 9 sub-stacks each were used to perform a PSI analysis. The sub-stacks were processed in groups of three consecutive sub-stacks. The sub-stacks of one group had the same size. The number next to each bar signifies either the number of used images for the respective process (Approach A) or the size of the three consecutive sub-stacks (Approaches B and C).

RESULTS
In this section, the results of each approach are analyzed individually. First, the curve progression of each parameter regarding an increasing number of images is analyzed. Second, the velocity maps are analyzed visually. Afterwards, a general recommendation for each approach is given.

Approach A:
Processing the entire dataset with Approach A leads to the values summarized in Table 1. The memory consumption increases linearly with an increase in the number of images. The base dataset of 28 images needs 40 GB and after the last update (i.e., 147 images) the dataset needs 147 GB. The parameters listed in Table 1 are: the number of identified PSC NP SC , the number of PS NP S , the quotient Q of NP S and NP SC , the PS density ρP S and the processing time TP . The parameters NP SC , NP S , ρP S , and TP decrease with an increasing number of images. The quotient Q, which signifies the percentage of PSC identified as PS, increases the longer the time series is. The last parameter ρP S drops from 823 P S km 2 for the base dataset to 140 P S km 2 after the last update.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2020, 2020 XXIV ISPRS Congress (2020 edition) One initial question concerning the processing of large datasets was, how the number of images and the number of PSC influence processing time. Figure 3 illustrates their relationship. Both, TP (black) and NP SC (red), follow the same trend. At first, both drop significantly, while the number of images increases continuously. The processing time then stabilizes around 22 hours. The number of PSC continues to decrease, however, at a much smaller rate. The outlier in the TP graph is most likely due to background process of the server, which we used to conduct this study. The similar curve progression of TP and NP SC indicates, that NP SC has more influence on the processing time than the number of images. The second question relates to the percentage of initially selected PSC, which are actually identified as PS. The graphs of NP SC (blue) and NP S (red) follow a similar trend as shown in Figure 4. Though, as expected there is an offset between them. The number of selected PSC is usually higher than the number of identified PS. This offset, however, decreases with an increasing number of images. This is also reflected in the black graph of Figure 4, which illustrates the percentage Q of PSC actually identified as PS. Q increases with increasing number of images, as is shown in Figure 4. For the base dataset, only 56 % of the PSC turned out to actually be PS and after the last update, the percentage has increased to 95 %. Therefore, the conclusion can be made that the longer the time series is, the higher is the probability that a PSC is identified as a PS. Though a drawback of longer time series is that short term deformations are not noticeable anymore. In addition to the analysis of the numeric values, a visual comparison of the results was performed. The analysis of the spatial distribution and local variations is especially important concerning the PS density. The PS dens-  ity is a crucial value for the detection of areas of active ground deformation. However, the average density calculated for the entire scene does not provide information concerning the spatial distribution of the PS. The PS density might be high enough in one area of the scene, but not in another one. This becomes visible in Figure 5. The figure shows two mean velocity maps of the same area. The only difference between the maps is the size of the sub-stacks used to generate them. Stack I (left map) considers 99 images and Stack II (right map) 147 images. In general, both mean velocity maps look similar. However, a closer look shows that the PS density differs. Particularly in the area marked with the red rectangle the PS density is lower for Stack II than for Stack I. The blue clusters of PS in this area mark active landslides in the rural parts of the study area. Their extent and maximum velocity are underestimated in case of Stack II. If the PS density decreases further in this area, they would probably not even be identified anymore. The black rectangle marks a part of the study area located in the city of Patras. The difference in PS density is less noticeable in this area. However, it is visible that less noise is superimposed on the mean velocities map of Stack II. It is necessary to balance between the PS density and the superimposed noise to identify the optimal sub-stack size for this scene (taking into account image repetition time and type of ground deformation). The PS density needs to be high to detect the ground deformation of interest, while the noise superimposed on the mean velocities needs to be minimized. The optimal sub-stack size for this scene is between 80 and 100 images. The processing time stabilizes around 22 hours for a sub-stack size of more than 70 images and therefore is not a limiting factor. This recommendation is tailored for this dataset. In the case of another dataset with other types of deformation, the parameters would need to be readjusted. Nevertheless, it can generally be said that a sub-stack size can be too large to detect certain types of deformation accurately.

Approach B:
In case of Approach B, memory consumption exhibits different values for the same sub-stack size, e.g., sub-stacks with 30 images need between 38 GB and 45.7 GB with an average of 41.4 GB of hard disk space. Sub-stacks with 20 and 40 images need an average of 41.2 GB and 46.3 GB, respectively. The parameters listed in Table 2 are the same as listed in Table 1. Each one of these parameters varies significantly for sub-stacks of the same size. For example, the number of selected PSC varies, in case of a sub-stack size of 30 images, between 4,232,665 and 2,794,999. Additionally, the number of selected PS varies between 2,770,730 and 1,361,120. The processing time varies between 41 hours and 124 hours with an average of 75 hours. The average processing time for a sub-stack size of 20 and 40 images is 188 hours and 50 hours, respectively.
Further, the number of PSC identified in one sub-stack and the number of PSC identified in multiple sub-stacks was a concern. Table 3 provides information concerning this. For sub-stacks with 40 images no PSC was identified in more than one substack. For sub-stacks with 20 and 30 images 3 PSC were identified in two sub-stacks, but none were identified in all three Figure 6. The PSC, which are identified in one subset (dark blue), in two (petrol) or in three (yellow) subsets, are featured in the maps.
There is a map for each subset size, starting with the size 20 images on the left side, 30 images in the middle and 40 images on the right side (Approach C).
sequential sub-stacks. The lack of PSC identified in more than one sub-stack is most likely due to using different masters for the sequential sub-stacks. The change in master has the consequence that the localization of the PSC is slightly different. This would make a recombination of the sub-stack results difficult. A solution could be to coregister all slaves to the same master, which is from now on referred to as a super-master. The interferogram generation would then be performed with a different master, which can be interchangeable. Another possible solution could be to interpolate the results. In that case, areas of interest would need to be analyzed instead of individual PSC or PS.
An additional drawback of Approach B is the sampling rate of the mean velocity. Depending on the sub-stack size, a new value for the mean velocity is calculated every 120, 180, or 240 days. Short term variations of the mean velocity are not detectable.
In order to provide a high sampling rate of the mean velocity to detect short term variations, a small sub-stack size is preferred.
However, this would lead to a higher noise level and longer processing time. In addition, the mean velocity maps would either need to be interpolated for each sub-stack or all slaves would need to be coregistered to the same master. This would increase the processing time further.
40.6 GB and 47.5 GB, respectively. In addition to the key parameters, the number of PSC identified in one sub-stack and the number of PSC identified in multiple sub-stacks was a concern. In Figure 6 three maps of the same area are shown, which correspond to a different sub-stack size (i.e., 20, 30 and 40 images). All PSC that are identified in one sub-stack are featured in the corresponding map. The color indicates, if the PSC is identified in one sub-stack (dark blue), in two (petrol) or three sub-stacks (yellow). Figure 6 illustrates that the portion of PSC identified in one or two sub-stacks decreases with an increasing sub-stack size. For the sub-stack size of 20 images 20 % of the PSC are only identified in one of the sub-stacks, 17 % are identified in two sub-stacks, and 58 % in three sub-stacks. In case the substack size is 30 images, 18 % of the PSC are only identified in one of the sub-stacks, 14 % are identified in two sub-stacks, and 68 % in three sub-stacks. And for a sub-stack size of 40 images 15 % of the PSC are only identified in one of the sub-stacks, 11 % are identified in two sub-stacks, and 74 % in three sub-stacks.
In case of Approach C, we did not vary the master for each substack. However, it would be necessary to change the master, if the master scene is no longer included within the sub-stack.
In that case, PSC common between the sub-stacks would most likely not be identified. A solution would be to either use a super-master or interpolate the results. However, the interpolation would need to be performed less frequently than in case of Approach B. Further, Approach C provides a high sampling rate of the mean velocity. A new value is calculated every 6 days.
In case of Approach C, a larger sub-stack size is preferential, because the same master could be used for a longer time period and the processing time would be less than in case of a smaller sub-stack. The noise level would also be lower in case of a larger sub-stack.

DISCUSSION & CONCLUSION
The recent advances in integrating advanced DInSAR techniques into permanently operating monitoring systems (Berti et al., 2013, Tomás et al., 2019, Kalia, 2018 were the motivation for this study. We wanted to identify a processing strategy, which best serves the purpose of continuously monitoring active deformation areas employing PSI. Therefore, at the beginning of this paper we asked the question, if it is a better approach to always consider the whole available dataset or to process sub-stacks and later combine the results. The most important demands towards a PSI processing strategy are: (1) a high enough PS density to accurately estimate the extent and intensity of the deformation, (2) a high sampling rate of the mean velocity to detect short term variations and (3) a possibility for recombining the sub-stack results.
Using Approach A showed, that the average PS density decreases with an increasing number of images. However, the average value calculated for the entire scene does not provide information concerning the spatial distribution of the PS. Figure 5 illustrates, how important the spatial distribution is to detect areas of active deformation. The differences between the two maps are particularly given in the areas of the black and red rectangles. While in the urban area (black rectangle) of the mean velocity map the differences in PS density are not visible, the map of Stack II shows less noise superimposed on the mean velocities compared to the results from Stack I. In contrast, the differences in PS density are visible in the non-urban area (red rectangle). The area is located about 12 km southeast of Patras and is characterized by a hilly terrain with steep slopes. This is an area known to be affected by landslides and in case of Stack I, these landslides are visible as clusters of dark blue colored PS. In case of Stack II, they are not as clearly visible. In addition, their extent and maximum velocity are underestimated. Thus, it would be preferential to always use the same number of images (exact number depends on the scene, image repetition time and type of deformation to be observed) to achieve a consistent PS density and distribution. Table 2 indicates that a change of master leads to a lack of PSC, and thus PS, which are identified in all sequential substacks. The change in master occurs once the original master is no longer part of the sub-stack. Since, the sub-stacks of Approach B do not overlap in time, the master is changed for every sub-stack. In case of Approach C, only one image of the substack is interchanged and for this reason the same master can be used for a longer time period. Identifying common PS between the sub-stacks is necessary for a seamless and continuous postprocessing analysis of the PS displacement time series. In case of Approach A, this is easily achieved, because the approach always considers the entire available dataset, and therefore a change in master would not matter. In case of Approaches B and C the master needs to be changed once it is no longer part of the sub-stack. Here, two solutions are possible. Either all images are coregistered to a super-master and an additional interchangeable master is used for the interferogram generation or the results of the sub-stacks are interpolated. An additional step such as interpolating the results would be needed less frequently in case of Approach C than for Approach B. Another concern is the sampling rate of the mean velocity. Sentinel-1 provides a new image every six days. Approach C provides a new value for the mean velocity at the same rate, while Approach B provides a new value, depending on the sub-stack size, every 120, 180, or 240 days. In case of Approach A, a new value is provided roughly every 60 days. However, Approach A could be adjusted to provide a new value every 6 days.
Even though the challenges resulting from changing the master have yet to be solved, we concluded that Approach C serves our purpose the best. Approach C provides a high sampling rate of the mean velocity and a consistent number of PS to detect short term and long term variations. Depending on the sub-stack size, the master can be reused for a longer time period than for Approach B. Thus, additional steps to recombine the sub-stack results need to be performed less often. Also, because Approach C uses sub-stacks that have a large overlap, most of the interferograms are used multiple times. At this point, there might be the potential to reuse some of the intermediate results already calculated with the previous sub-stack.
In the future, we will focus on solving the challenge of changing the master and still recombing the sub-stack results. Additionally, we will explore the possibility to decrease processing time by reusing intermediate results and comparing the total amount of time needed per year to use the approaches for a continuous monitoring of ground deformation.