Concept to Analyze the Displacement Time Series of Individual Persistent Scatterers

Persistent Scatterer Interferometry (PSInSAR) exploits a time series of Synthetic Aperture Radar (SAR) images to estimate the mean velocity with which the surface of the earth is deforming. However, most PSInSAR algorithms estimate the mean velocities using a linear regression model. Since some deformation phenomena can exhibit a more complex behavior over time, using a linear regression model leads to potentially wrong estimations for the mean velocity. For example, the velocity of a landslide moving down a steep slope can change depending on the water content of the material of the landslide, or an inactive landslide can reactivate due to an earthquake. Both scenarios would not result in a time series with a constant linear slope but in a piecewise linear time series. This paper presents a Matlab-based tool to analyze an individual Persistent Scatterer (PS) time series. The Persistent Scatterer Deformation Pattern Analysis Tool (PSDefoPAT) aims to build a mathematical model that sufficiently describes the time series trend and seasonal and noise components. The trend component is estimated using polynomial regression and piecewise linear models, while a sine function approximates the seasonal component. The goal is to identify the best fitting model for the displacement time series of a PS. PSDefoPAT is introduced by examine the time series of three different PS located in the region surrounding Patras, Greece. Based on the derived models, we discuss the nature of their deformation patterns.


INTRODUCTION
Persistent Scatterer Interferometry (PSInSAR) is a method widely used to monitor the deformation of the surface of the earth. In contrast to Synthetic Aperture Radar Interferometry (InSAR), PSInSAR uses a time series of repeat-pass SAR images instead of only two images to overcome the limitations of InSAR, such as the atmospheric phase delay and the spatial and temporal decorrelation (Ferretti et al., 2001;Hooper et al., 2004). The main product of this method is a set of estimated rates for the mean deformation velocity over a specific observation period. Studies have shown that these estimates are very accurate (Adam et al., 2009). In recent years, studies such as Tomás et al. (2019) have used these estimates to, for example, distinguish active persistent scatterers (PS) from non-active ones and then define areas of active deformation. Afterward, they use the displacement time series of those areas to associate them with different deformation patterns. However, most PSInSAR algorithms estimate the mean velocities using a linear regression model. Figure 2a provides an example of a linear time series. Since some deformation phenomena can exhibit a more complex behavior over time, using a linear regression model leads to potentially wrong estimations for the mean velocity. An example of a mean velocity map is given in Figure 1. The map shows the mean velocities of the city of Patras in southern Greece and its surrounding region, estimated for the observation period from January 2019 to January 2021. Based on this map, many red or blue PS clusters would be regarded as actively deforming areas. This conclusion is possibly a fallacy, because, for example, the velocity of a landslide can change over time, resulting in a piecewise linear time series. An example of a piecewise linear time series is shown in Figure 2b. Estimating the mean velocity over the entire observation period with a linear regression model would not provide the information that the deformation process is slowing down. Faults can switch from being active to being dormant and vice versa, meaning the mean velocity of that area is underestimated or overestimated.
Further, a periodic shift between subsidence and uplift due to changing soil-water content results in a periodic signal. An example of a periodic time series is presented in Figure 2c. In this case, the linear regression slope estimate, i.e., the mean velocity, would be zero. An algorithm that identifies actively deforming areas based on the mean velocity would not identify this area as active. Additionally, several deformation patterns may overlap, as shown in Figure 2d. The time series has a linear trend and a seasonal component. An example of such a deformation is the deformation of a newly built dam. The dam body experiences subsidence due to the settlement of its building materials and the dead load of the reservoir water. The water level of the reservoir can vary over time due to variations in the water intake of the reservoir and water consumption due to irrigation, freshwater usage, or power generation. The varying water level can result in a periodic shift between subsidence and uplift.  Hence, different deformation models need to be considered, and their significance needs to be evaluated to determine the best fitting model. A corresponding approach was implemented by Berti et al. (2013). The study considers linear, quadratic, and piecewise linear models to approximate the trend component of the time series. However, the study does not take into account that the time series may have a seasonal component as well.
In this paper, we present a Matlab-based tool to analyze the individual Persistent Scatterer time series. The Persistent Scatterer Deformation Pattern Analysis Tool (PSDefoPAT) aims to build a mathematical model that sufficiently describes the time series trend, seasonal and noise components. The trend component is estimated using polynomial regression and piecewise linear models, while the model for the seasonal component is a sine function. The best-fitting model for the displacement time series is determined using the Schwarz Bayesian Information Criterion (BIC) and tests for the significance of the regression models. Based on these models, we will discuss, which deformation patterns can be identified solely using the information gathered from the displacement time series and which may need further information. The paper is structured into five sections. The area of interest (AOI) and the data set are presented in Section 2. PSDefoPAT and the statistical methods it uses are described in Section 3. In Section 4, we present the time series of three PS located in Areas A, B, and C marked in Figure 1. The time series are decomposed into their trend, seasonal, and noise components. Based on the resulting model, we discuss what deformation phenomena may be their cause. Our conclusions are presented in Section 5.

Dataset
The AOI is located in southern Greece on the Peloponnese Peninsula. The Gulf of Patras frames the AOI in the North, the Hellenic subduction zone to the West, and the Erymanthos and Panachaiko mountains in the East and the South. Expected deformation patterns are vertical and horizontal displacements due to landslides and numerous active faults (Sakkas et al., 2018;Chalkias et al., 2014). The dataset used to analyze the surface deformation of the area over time consists of 122 Sentinel-1 A and B SAR images. The images were acquired with a descending acquisition geometry and Interferometric Wide-Swath (IW) acquisition mode. The dataset covers the period from January 2019 to January 2021 with a repeat time of 6 days. The master scene is the image recorded on January 7, 2020.

Persistent Scatterer Interferometry Processing
PSInSAR exploits a time series of SAR images instead of only two images to address decorrelation and atmospheric phase delay. The main idea is to identify PS pixels. A PS pixel distinguishes itself from other pixels by having a low noise level (Ferretti et al., 2001;Hooper et al., 2004). PSInSAR algorithms such as the Standford Method for Persistent Scatters (StaMPS) only use the PS pixels to estimate the deformation occurring during the observation period (Hooper et al., 2004). The main products of each PSInSAR analysis are the estimated mean velocities of the observation period and the corresponding displacement time series for each PS. Figure 1 shows the estimated mean velocity in the direction of line of sight (LOS) for the area surrounding the city of Patras. The mean velocity in Figure 1 ranges from -12 mm/a (blue) to -10 mm/a (red). The negative velocity represents a movement away from the sensor, and the positive velocity a movement towards the sensor. Figure  1 shows many actively deforming areas in the region surrounding Patras and in the city itself. However, in this paper, we will concentrate on three areas marked with black rectangles in Figure  1. Area A includes the south and central part of Patras, while Area B contains a mainly mountainous region, and the main feature of Area C is a newly built and filled dam.

METHODS FOR PS DISPLACEMENT ANALYSIS
In general, a time series is defined as a chronologic sequence of observations made on a specific variable, in this case, the displacement of a PS. Time series analysis aims to characterize the time series and develop a mathematical model that describes the evolution of the variable over time (Neusser, 2016;Montgomery et al., 2015). PSDefoPAT, presented in this paper, was designed to analyze the individual displacement time series of each PS. The tool is Matlab-based and uses mainly the following three toolboxes: The user interface of PSDefoPAT is shown in Figure 3. It is structured into three areas. The estimated mean velocities of the PS are plotted in Area I. Area II provides general functions such as selecting a specific PS to analyze or loading another data set. Area III is dedicated to examining the displacement time series of a selected PS. We divided the analysis into four steps:

1) Exploratory Data Analysis (EDA) 2) Data Preparation 3) Estimation of the Trend 4) Seasonal Component
Each step is represented by one tab. The components of the decomposed time series are plotted in the last tab of Area III.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition) Figure 3. The user interface of PSDefoPAT is structured into Areas I, II, and III.
Part of EDA is to use basic statistical concepts such as the mean , the median , the variance 2 , or the standard deviation to characterize the time series. The mean and the median are both measures of central tendency. While the mean is also commonly referred to as the average value of a time series, the median is the middle value of the time series, if all data points are sorted from the lowest to the highest value. Comparing both parameters provides information on the distribution of the data points. For example, the median and the mean are identical, if the data distribution is symmetric and mound-shaped with a single peak. If the data distribution is asymmetric, meaning one tail of the data distribution is longer than the other, it is referred to as skewed. In that case, the mean and the median differ. The mean is shifted towards the elongated tail of the data distribution.
Positively skewed: > Symmetrical: = Negatively skewed: < The skewness is a measure of how much the symmetry of the data distribution differs from the symmetry of a normal distribution (Ott et al., 2015). Another measure parameter that expresses the deviation of the data distribution from a normal distribution is kurtosis. It compares the likelihood of extreme values to be present in the data distribution to the one for normally distributed data. The kurtosis of a normal distribution is three. A value above three indicates that extreme values are more likely to be present in the data, and a value below three that they are less likely to occur (Shumway et al., 2017). The variance and the standard deviation are measures of dispersion. Both parameters indicate how spread out the data is. The variance is the average squared difference between the mean and the data points, while the standard deviation is simply the root square of the variance (Ott et al., 2015). Outliers in the time series have an impact on the regression model used to estimate the trend and seasonal component of the time series. Hence, outliers need to be identified and replaced. The detection of outliers is influenced by the data distribution of the time series. Therefore, the type of data distribution needs to be known to choose the appropriate approach to identify outliers. Graphical methods such as histograms and box plots visualize the data distribution. The histogram of the time series of a selected PS is shown in the first tab of Area III in Figure 3, while the box plot is displayed in the second tab. Both graphical methods support the detection of outliers in the time series. For the outlier detection, we implemented a manual and a semi-automatic approach. In the case of a manual outlier selection, the user can declare data points as outliers after examining the time series plot, the box plot, and the histogram. An exemplary box plot can be seen in Figure 4. The method employs the first and third quartile and the median to visualize the symmetry and variability of the data. The first or lower quartile marks the 25 % percentile, and the third or upper quartile the 75% percentile of data points, if the data points are arranged from the lowest to the highest value. The interquartile range (IQR) is the difference between both. In this context, the median is sometimes referred to as the second quartile (Ott et al., 2015). The upper and lower line of the blue box, displayed in Figure 4, represent the upper and lower quartile, while the red middle line of the box marks the median. Additionally, the points situated at 1.5 IQR above and below the upper and lower quartile are marked with black whiskers. Any point situated outside these limits is represented by a red cross and is considered as an outlier. In the case of the semi-automatic outlier detection, the user can choose between three different approaches. A data point is considered as an outlier if:  it deviates more than three times the scaled median absolute deviation from the median,  it deviates more than three times the standard deviation from the mean, or  it is situated more than 1.5 interquartile ranges above the upper quartile or below the lower quartile.
The first two options are both applied over a moving window. In our tool, the size of the moving window is set to 5 % of the time series length. Extreme measurements influence the median less than the mean. Therefore, using the median as a reference value is more robust towards outliers than using the mean. The third approach, which uses the upper and lower quartile, and the interquartile range to identify outliers, performs better for nonnormally distributed data than the first two approaches (Ott et al., 2015). After identifying data points as outliers, they are replaced by linear interpolation using the non-outlier neighbors. The alternative to outlier detection is smoothing the time series. We included five different methods to smooth the time series: All methods are applied locally over a moving window. The window size is set to 10% of the time series length. Both LLR approaches use the concept of weighted least squares. However, the robust LLR method assigns a weight of zero to possible outliers and is more robust to very noisy data. After removing the outliers from the time series or eliminating part of the noise by smoothing the time series, the trend and periodic component become more apparent. The next step in time series analysis is to develop a mathematical model that describes the evolution of the variable of interest over time. Therefore, the time series needs to be decomposed into its trend, periodic, and noise component (Neusser, 2016). In our tool, estimating the trend and seasonal component are two separate steps. We test first-, second-, and third-degree polynomial models and a piecewise linear regression model for the trend component. Estimating the first-, second-, and third-degree polynomial regression models is based on the concept of ordinary least squares (OLS). A k-degree polynomial regression model can be written as follows: ̂= 0 + 1 + 2 2 + ⋯ + . (1) Here represents the regression coefficients, the predictor variable, and ̂ the predicted data points. In the case of a linear model, the number of regression coefficients would reduce to two, and the equation can be written as follows: The OLS approach minimizes the squared difference between the collected data points and the data points ̂ predicted by the regression model (Ott et al., 2015).
Instead of a polynomial regression model, the time series can also be approximated by a piecewise linear regression model, representing the time series as a sequence of linear segments. The connecting data points between these segments are referred to as change points (CP). If only one CP exists, the regression model can be described as follows (Malash et al., 2010): with = { 0, < 1 1, ≥ 1 .
In our tool, the regression coefficients and the CP 1 are determined using the concept of non-linear least squares. However, to solve the non-linear least squares problem, an initial value for the CP 1 needs to be approximated. For that reason, we included a semi-automatic change point detection (CPD) and the option to select a CP manually. The semi-automatic CPD detects changes in the mean, root-mean-square level, standard deviation, or the estimated linear slope. The next step is to determine the goodness of the fit for each model. Commonly used criteria are the Schwarz Bayesian Information Criterion (BIC), the Akaike Information Criterion (AIC), and R². Another option is to perform a significance test for the regression models as a whole or their individual terms (Montgomery et al., 2015). All four options rely on knowing the following parameters:  total sum of squares (SST),  sum of squares due to the regression model (SSR),  sum of squares due to the residual or error (SSE),  number of samples N,  number of predictor variables ,  degree of the polynomial regression model.
The relationship between SST, SSR, and SSE can be written as follows: The criterion 2 is calculated using SSE and SST.
The model that maximizes the criterion R² is considered the bestfitting model. Since SST is independent of the regression model and SSE tends to minimize with higher degree polynomial regression models, R² tends to favor higher degree polynomial regression models. The AIC and BIC criteria both penalize SSE for adding more terms to the regression model. However, the BIC uses a more severe penalty term than the AIC. Therefore, relying on R² or the AIC to select the best fitting model is more likely to result in overfitting than using the BIC (Montgomery et al., 2015). A significance test for a regression model determines whether or not the null hypothesis 0 is rejected. The null hypothesis assumes that the regression model does not sufficiently explain the evolution of the collected data points over time.
0 : 0 = 1 = ⋯ = = 0 (9) 1 : at least one ≠ 0 If the null hypothesis is true, the test statistic 0 is calculated as follows: The null hypothesis is rejected if the corresponding p-Value of the 0 test statistic is less than the threshold . The p-Value is defined as the area under the curve of the F-distribution between 0 and infinity (Montgomery et al., 2015). An example of an Fdistribution is presented in Figure 5a. If the significance test is applied to the individual terms of the regression model, the test hypothesis is: The term in question can be deleted from the model, if 0 is accepted. The test statistic, in this case, is the t-statistic.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition) The parameter is a diagonal element of the variancecovariance matrix of the estimated regression coefficient . The null hypothesis is again rejected, if the corresponding p-Value of the test statistic is less than the threshold . In this case, the p-Value is defined as the sum of the area under the curve of the tdistribution between and infinity and between − and negative infinity (Montgomery et al., 2015). An example of a tdistribution is presented in Figure 5b. We decided to use the BIC value, a significance test for the model as a whole, and its individual terms as the criteria to select the best-fitting model. The threshold α for the p-Value is usually set to 0.1, 0.05, or 0.01. We use α = 0.05, i.e., if the p-Value is smaller than 0.05, the probability that the null hypothesis is accepted is less than 5%. After selecting the best-fitting model for the trend, the original time series is detrended. The residual time series may still hold a seasonal component. Displacement time series that have a seasonal component are often linked to deformation phenomena induced by, for example, the varying water content or temperature of a material. Sine functions can describe both phenomena. Thus, a sine function is used to approximate the seasonal component.
The amplitude , the frequency , and the time offset offset are estimated by solving a non-linear least squares problem, which requires initial values. The initial value for is set to = max and the one for offset is set to zero. The initial value for is estimated using the frequency spectrum of the collected data points . The final step is to subtract the estimated seasonal component from the residual time series and display the trend, seasonal, and noise components separately.

RESULTS & DISCUSSION
In the following section, the displacement time series of three different PS, referred to as PS 1, 2, and 3, will be presented and discussed. The PS are located within Areas A, B, and C, marked in Figure 1. The first time series belongs to PS 1, which is located in Area C. Figure 6 shows the time series plot of PS 1, the corresponding histogram, and provides an overview of statistical parameters such as the mean, the median, and the standard deviation of the time series. The time series is normally distributed, but comparing the values for the median, ( = −0.74 mm a ⁄ ) and the mean ( = −2.23 mm a ⁄ ), reveals that the distribution is negatively skewed. The initially estimated mean velocity for this PS is mean = −13.0631 mm a ⁄ . The time series plot in Figure 6a indicates that the time series has a linear trend and that some data points might be outliers. Figure 6b illustrates the second step. The manually selected outliers are marked in red in the time series plot. Even though the box plot indicates that no outliers are present in the dataset, three data points were declared as outliers. The data points were marked as outliers, because the time series plot shows that they deviate significantly more from the trend of the time series than the other data points. The outliers are replaced, and the edited time series is shown in Figure 7a. The trend of the time series was estimated using the edited time series. Three polynomial regression models were tested: (1) linear, (2) quadratic, and (3) cubic. A piecewise linear regression model was not tested, because the semi-automatic CPD did not identify a significant change within the slope of the time series. The table in Figure 7a presents the significance test results and the calculated BIC value for each tested model. The p-Value for the overall significance of each model is smaller than 0.005, meaning all models describe the time series sufficiently. However, the p-Values for each term of the regression models (see column Coefficient p-Value in Figure 7a) indicate that the cubic model is overfitted, while the linear and quadratic models are not. The last parameter to consult is the BIC. The regression model with the smallest BIC explains the time series sufficiently and is the least overfitted. In this case, the BIC of the linear regression model is the smallest. Therefore, the linear regression model is selected as the best-fitting model. After choosing a model for the trend component, the time series is detrended. The residual time series, shown in Figure 7b, is inspected for periodicity. The power spectrum of the residual time series is presented in Figure 7b. It reveals which frequencies are present in the time series. The frequency with the highest significant peak is selected as an initial value to fit a sine function to the data points. The result is shown in Figure 7b as a red line. The blue dotted line is a fitted sine function with a period of 365 days. Comparing both fitted curves to the data points reveals that the time series has a seasonal component. The estimated period of the seasonal component is 373 days. All three components of the time series of PS 1 are plotted separately in Figure 8. PS 1 is located on top of the newly built Paraperios-Peiros Dam. The construction of the dam body was finished in early 2019, and the filling process started in September 2019 (Evers et al., 2020). The dam body was expected to experience subsidence due to its dead load and the reservoir water's dead load (Hunter et al., 2003). The original time series plotted in Figure 8 has a level shift between October 2019 and January 2020. This level shift was not detected as a CP because the slope did not change significantly enough after the level shift. However, the level shift may coincide with an increase in the amount of water being impounded due to heavy rainfalls, because the west coast of Greece is known to have heavy rains beginning in September or October (Gofa et al., 2019). Another possibility is the phenomenon called collapse compression. The building material of an embankment dam can experience additional short-term subsidence upon its first wetting (Hunter et al., 2003). Since, the estimated seasonal component of the time series of PS 1 has a period of 373 days, which is close to a periodicity of one year. A seasonally varying water level might cause the seasonality in the displacement time series. Further, in order to confirm whether the level shift was caused by the amount of water being impounded or collapse compression, a time series of PS 1 would need to be compared to a time series of the reservoir water level and the time series of surrounding PS. The effect of collapse compression depends on material-specific parameters, and embankment dams are not homogenously built. Therefore, collapse compression would have only a local impact on the dam body. The dead load of the reservoir water, on the other hand, affects the entire dam body (Hunter et al., 2003). Also, a water level time series would be needed to confirm a correlation between the varying water level and the seasonal component in the time series of PS 1.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2021 XXIV ISPRS Congress (2021 edition) The original displacement time series of PS 2 and 3 and their components are illustrated in Figure 9. PS 2 is located in Area A, which is a mainly urban area. The initially estimated mean velocity of PS 2 is mean = − 2.0505 mm a ⁄ . Decomposing the time series revealed that in addition to a linear trend with a slope of mean = −3.139 mm a ⁄ , the time series also has a seasonal component. The estimated period of the seasonal component is 355 days. A periodicity of one year is typically linked to weather conditions. For example, a varying groundwater level could cause a shift between subsidence and uplift. To verify this, the displacement time series would need to be compared to a time series of groundwater levels or precipitation levels. Concerning the linear trend, additional information in the form of a digital elevation model would be needed to distinguish a landslide from an area of subsidence or uplift. PS 3 is located in Area B, which is a mountainous area west of Patras. The area is prone to landslides (Del Soldato et al., 2018). The mean velocity of PS 3 was initially estimated to be mean = − 7.197 mm a ⁄ . The time series analysis revealed that the time series is piecewise linear, meaning the slope changes over time. This change occurred 416 days after the beginning of the observation period in January 2019. The slope changed from − 12.045 mm a ⁄ to −1.679 mm a ⁄ implicating that the velocity with which the PS is displaced decreased. PS 3 belongs most likely to a landslide. However, without the information that the PS is located on a slope, it would not be possible to distinguish a landslide from an area of subsidence or uplift. Additionally, it needs to be stated that only displacement rates in LOS direction were analyzed. Therefore, displacement time series in an ascending and descending LOS direction would be necessary to distinguish between a vertical and a horizontal displacement.

CONCLUSION
In this paper, we presented a Matlab-based tool to analyze the displacement time series of the individual PS. Our PSDefoPAT aims to determine a mathematical model that sufficiently describes the time series. We demonstrated with three exemplary PS displacement time series, that additional relevant information is gained by decomposing the time series into its trend, seasonal and noise component. For example, the time series of PS 1 was decomposed into a linear trend component with a slope of 13.32 ⁄ and a seasonal component with a period of 373 days. The information that the displacement time series of PS 1 has a seasonal component would not be available, if the deformation of the area was only analyzed using the mean velocity map. Since PS 1 is located on the Parapeiros-Peiros Dam, it is plausible to assume that the seasonal component of the deformation time series is linked to the water level in the reservoir. To make this conclusion with confidence, the seasonal component of PS 1 would need to be compared to the seasonality of the reservoir water level. Therefore, we will include additional data sources and implement options to analyze the correlation between different time series in the future.