DAM DEFORMATION PREDICTION BASED ON EEMD-SARIMA MODEL

Abstract. Many factors affect dam deformation, and deformation time series are often modeled directly without considering the seasonality and periodicity of each influencing factor. To address this, a prediction model combining Ensemble Empirical Mode Decomposition (EEMD) and the Seasonal Autoregressive Integrated Moving Average (SARIMA) model is proposed in this paper. Firstly, the deformation time series is decomposed by EEMD, which weakens its volatility to some extent and separates the various factors affecting dam deformation, yielding a series of Intrinsic Mode Functions (IMFs) with different frequencies. Secondly, according to the seasonal and periodic characteristics of each IMF, a SARIMA model is established for each IMF for rolling prediction. Thirdly, the final forecast is obtained by superimposing the forecasts of all IMFs. Experiments comparing the EEMD-SARIMA model with the Grey Model, the Kalman Filter Model and the SARIMA model show that the EEMD-SARIMA model has higher prediction accuracy and a better fit, which means that it is an effective method for dam deformation prediction.



INTRODUCTION
A dam is affected by many factors during construction and use. Under the joint action of these factors, the deformation data observed during the construction and use of a dam tend to be highly volatile. Grasping the law of dam deformation and making accurate forecasts in time is of great significance for dam safety (Jiang et al., 2006). At present, the prediction methods commonly used in dam deformation research include the time series method, grey theory, neural networks, regression analysis and wavelet analysis. For dam deformation data with strong volatility, each of these methods has certain limitations (Wen et al., 2000). The accuracy of the time series method and grey theory is lower for highly volatile data; neural networks depend strongly on the selection of influence factors and are prone to over-fitting; regression analysis works only from the perspective of the data and lacks physical significance and objectivity; wavelet analysis depends strongly on the selection of the basis function and lacks adaptability (Rong et al., 2018).
The key to prediction is fitting the deformation trend. Only by accurately fitting the deformation trend of the settlement data can the prediction accuracy be fundamentally improved. The main factors influencing dam deformation include aging, temperature and water pressure (Liu et al., 2009). These factors often have obvious periodicity and seasonality, which gives the deformation data a certain periodicity and seasonality as well. However, the periodicity and seasonality of the various factors inevitably differ slightly. If the influencing factors can be distinguished, it is reasonable to establish a model for the periodicity and seasonality of a single factor or a small number of factors. Based on this, this paper establishes a model based on EEMD and SARIMA.
Because EEMD is adaptive and each decomposed IMF has a certain physical meaning (Pan et al., 2018), EEMD is used to decompose the original observation data of the horizontal displacement of the dam crest and to separate the influencing factors, thereby weakening the volatility to a certain extent. SARIMA models are then established to better capture the periodic and seasonal influences and fit the trend of the dam deformation, so as to improve the prediction accuracy.

EEMD
EEMD is an improved version of Empirical Mode Decomposition (EMD) (Zhao et al., 2015). Huang held that any signal is a superposition of several intrinsic mode functions (Wang et al., 2010). In essence, EMD stabilizes the signal to obtain a series of IMFs with gradually stabilizing frequency and a residual B. However, when the time scale of the signal changes abruptly, a single IMF will contain components with different time scale characteristics, so EMD is prone to mode aliasing. In view of this, Huang and Wu proposed EEMD (Wu et al., 2009). By adding white noise to the signal to be decomposed, EEMD exploits the uniform spectrum of white noise so that the original signal is spread over a white noise background that is uniformly distributed throughout the time-frequency space. Signal components of different time scales are automatically projected onto the appropriate reference scales, so that the signal is continuous across scales and mode aliasing is suppressed. Because the mean of white noise is zero, the added noise cancels out when the results of many noise-added trials are averaged, and what remains is the signal itself. The decomposition steps of EEMD have been studied in great detail in previous work, so they are not repeated here.
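The noise-cancellation argument above can be checked numerically. The following is a minimal sketch in Python and not an EEMD implementation: it only averages noisy copies of a toy signal, whereas EEMD averages the IMFs obtained from each noisy copy. The sine signal, the absolute noise standard deviation of 0.2 and the 100 trials are illustrative assumptions, not values taken from the experiments in this paper.

```python
import math
import random


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def ensemble_average(signal, amplitude, n_trials, rng):
    """Average n_trials copies of the signal, each with fresh Gaussian noise added."""
    total = [0.0] * len(signal)
    for _ in range(n_trials):
        for i, s in enumerate(signal):
            total[i] += s + rng.gauss(0.0, amplitude)
    return [t / n_trials for t in total]


rng = random.Random(0)  # fixed seed so the run is repeatable
signal = [math.sin(2 * math.pi * t / 50) for t in range(500)]  # toy signal (assumption)
amplitude = 0.2  # noise standard deviation (illustrative assumption)
n_trials = 100   # ensemble size (illustrative assumption)

single = ensemble_average(signal, amplitude, 1, rng)
averaged = ensemble_average(signal, amplitude, n_trials, rng)

# Noise left in the result = result minus the clean signal
single_noise_rms = rms([a - s for a, s in zip(single, signal)])
residual_noise_rms = rms([a - s for a, s in zip(averaged, signal)])
print(f"noise RMS, one trial: {single_noise_rms:.4f}")
print(f"noise RMS, averaged:  {residual_noise_rms:.4f}")
```

For independent zero-mean noise, the residual noise level falls roughly in proportion to one over the square root of the number of trials, which is why a larger ensemble leaves less added noise in the result.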
When performing EEMD, the amplitude of the added normally distributed white noise and the number of repetitions n need to be determined through multiple experiments. Chen Zhong (Chen et al., 2013) pointed out that the amplitude is usually taken as 0.01~0.5, and the repetition number n is generally 100~200.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020 International Conference on Geomatics in the Big Data Era (ICGBD), 15-17 November 2019, Guilin, Guangxi, China

SARIMA
SARIMA is an improved version of the Autoregressive Integrated Moving Average (ARIMA) model (Chen et al., 2018). The ARIMA model is written as ARIMA(p,d,q), where p is the autoregressive order, d is the number of differences, and q is the moving average order. It is obtained by applying an ARMA model to the d-th order difference of the series. Through differencing, a non-stationary time series is transformed into a stationary one; the parameters (p,q) are then determined by examining the cut-off and tailing behavior of the autocorrelation and partial autocorrelation functions, after which the time series can be modeled. However, when the time series shows obvious cyclical trends and seasonal changes, the results of analyzing it directly with ARIMA are often not ideal. In this case, the multiplicative seasonal model SARIMA(p,d,q)(P,D,Q) is introduced to obtain a better analysis, where p, d and q are as above, P is the seasonal autoregressive order, Q is the seasonal moving average order, and D is the order of seasonal differencing. The time series of dam settlement displacement values is often accompanied by periodicity and seasonality. Therefore, the author chooses SARIMA for the experimental analysis.
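The effect of ordinary and seasonal differencing described above can be illustrated with a small sketch. The synthetic series (a linear trend plus a pattern of period 4) and the helper name `difference` are assumptions made for illustration; they are not data or code from this paper.

```python
def difference(series, lag=1, order=1):
    """Apply the lag-`lag` difference y[t] - y[t - lag] `order` times."""
    out = list(series)
    for _ in range(order):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


# Synthetic series (assumption): linear trend plus a seasonal pattern of period 4
season = [0.0, 2.0, 0.0, -2.0]
y = [0.5 * t + season[t % 4] for t in range(16)]

# Seasonal difference (D = 1, period 4) removes the repeating pattern,
# leaving only the per-period increase of the trend.
seasonal_diff = difference(y, lag=4, order=1)

# An ordinary first difference (d = 1) then removes what is left of the trend.
stationary = difference(seasonal_diff, lag=1, order=1)

print(seasonal_diff)
print(stationary)
```

Each difference shortens the series by `lag` observations, so a seasonal difference with a long period consumes a noticeable part of a short training sample.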
Modeling steps of SARIMA:
Step 1 Establish an ARMA(p,q) model for the equidistantly sampled time series $Y_t$, written in terms of the lag operator and an autoregressive polynomial of order p.
Step 2 When the original time series $Y_t$ is non-stationary, apply d-th order differencing to it, where d is the differencing order.
Step 3 After d-th order differencing, the non-stationary time series $Y_t$ becomes a stationary time series. After autocorrelation and partial autocorrelation analysis and examination of the tailing and cut-off properties, ARIMA(p,d,q) modeling can be carried out on the differenced series.
Step 4 If seasonal factors exist in the original time series, a seasonal model with period c is used to extract the seasonal periodicity, and the ARIMA model is used to extract the short-term correlation (Fan et al., 2009). The two are combined to establish the SARIMA model, in which D is the seasonal differencing order, c is the seasonal period, and the seasonal part consists of a seasonal moving average polynomial of order m and a seasonal autoregressive polynomial of order n.
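For reference, the models named in these steps are commonly written in the following textbook form. This is a sketch under assumed notation and not necessarily the exact equations used by the authors: $B$ denotes the lag operator ($B Y_t = Y_{t-1}$), $\varepsilon_t$ a white noise term, $\phi_p$ and $\theta_q$ the non-seasonal autoregressive and moving average polynomials, $\Phi_P$ and $\Theta_Q$ their seasonal counterparts, and $c$ the seasonal period. The sign convention of the moving average polynomial varies between references.

```latex
\begin{align}
\phi_p(B) &= 1 - \phi_1 B - \cdots - \phi_p B^p, \qquad
\theta_q(B) = 1 + \theta_1 B + \cdots + \theta_q B^q \\
\text{ARMA}(p,q):\quad
  & \phi_p(B)\, Y_t = \theta_q(B)\, \varepsilon_t \\
\text{differencing:}\quad
  & \nabla^d Y_t = (1 - B)^d\, Y_t \\
\text{ARIMA}(p,d,q):\quad
  & \phi_p(B)\,(1 - B)^d\, Y_t = \theta_q(B)\, \varepsilon_t \\
\text{SARIMA}(p,d,q)(P,D,Q)_c:\quad
  & \phi_p(B)\,\Phi_P(B^c)\,(1 - B)^d\,(1 - B^c)^D\, Y_t
    = \theta_q(B)\,\Theta_Q(B^c)\, \varepsilon_t
\end{align}
```

Setting $P = D = Q = 0$ in the last line recovers the ARIMA form, and additionally setting $d = 0$ recovers the ARMA form of Step 1.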

Establish EEMD-SARIMA Model
Owing to the complexity of dam projects, the uncertainty of the external environment, and the various uncertain factors in the observation process, time series of dam observations tend to be highly volatile. Decomposing the original data by EEMD yields IMFs of different frequencies, and a SARIMA model with suitable parameters can then be established for each IMF, which fits each IMF better and is more practical. Therefore, the EEMD-SARIMA model is adopted to analyze the settlement data. The modeling steps are as follows:
Step 1 Perform EEMD on the original observation data to obtain a series of IMFs with different frequencies;
Step 2 Carry out a unit root test on each IMF, and difference any non-stationary IMF;
Step 3 Preliminarily determine the model for each IMF through autocorrelation and partial autocorrelation analysis;
Step 4 Adjust the model to be optimal using the Akaike Information Criterion (AIC) and the Schwarz Criterion (SC);
Step 5 Predict the values of each IMF using the tested model;
Step 6 Superimpose the forecasts of all IMFs to obtain the final result.
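The overall decompose, forecast and superimpose workflow (Steps 1, 5 and 6) can be sketched as follows. This is only a structural illustration under loud assumptions: `toy_decompose` is a moving-average split standing in for EEMD, and `seasonal_naive` is a repeat-the-last-period rule standing in for a fitted SARIMA model. Steps 2 to 4 (unit root test, model identification, AIC/SC selection) are omitted, and all function names are hypothetical.

```python
def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    smooth = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smooth.append(sum(chunk) / len(chunk))
    return smooth


def toy_decompose(series):
    """Stand-in for EEMD (assumption): two components that sum to the input."""
    smooth = moving_average(series)
    remainder = [x - s for x, s in zip(series, smooth)]
    return [remainder, smooth]


def seasonal_naive(component, horizon, period):
    """Stand-in for a fitted SARIMA model (assumption): repeat the last period."""
    last = component[-period:]
    return [last[h % period] for h in range(horizon)]


def decompose_forecast_superimpose(series, horizon, period):
    components = toy_decompose(series)                                    # Step 1
    forecasts = [seasonal_naive(c, horizon, period) for c in components]  # Step 5
    return [sum(values) for values in zip(*forecasts)]                    # Step 6


pattern = [1.0, 3.0, 2.0, 0.0]
history = pattern * 6  # 24 synthetic observations with period 4
prediction = decompose_forecast_superimpose(history, horizon=8, period=4)
print([round(v, 6) for v in prediction])
```

The design point is that the final forecast is simply the element-wise sum of the per-component forecasts, so each component can be given its own model and parameters without changing the last step.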
The basic flow chart of the model is as follows:
[Figure: flow chart of the EEMD-SARIMA model]
Since the sampling interval of the original observation data is extremely uneven, the author first interpolates the original data to obtain settlement data with a time resolution of one day, and then analyzes whether the data satisfy the experimental conditions. Correlation analysis and significance tests were carried out on the yearly data of each dam using the Statistical Package for the Social Sciences (SPSS). (Because the amount of data is limited, correlation analysis is performed only on the years in which the data are complete.)
As shown in the following table, the correlation coefficients of the observation data of Fengman Dam for 1985, 1986 and 1987 over the years are more than 0.9, with significance less than 0.05, and the correlation coefficients of the observation data of Fengman Dam for 1988 and 1989 over the years are more than 0.3, with significance less than 0.05. The number of samples selected in this paper is large, with an average of more than 300, and a larger number of samples increases the differences among samples, which reduces the correlation coefficient. However, the significance test shows that there is significant correlation among the observation data over the years, that is, the observation data have significant periodicity and seasonality. In summary, the settlement observation data of Fengman Dam are applicable to this experiment (Cao et al., 2018).
Correlation analysis of the observation data of Xiaolangdi Dam shows that the correlation coefficient between 2007 and 2008 is 0.922, and the significance is 0, which is less than 0.05. Therefore, the observation data of Xiaolangdi Dam are also applicable to this experiment.
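The year-to-year correlation check described above can be sketched as below. It assumes the Pearson correlation coefficient, which the text does not state explicitly, and it uses synthetic yearly curves instead of the dam observations. The significance test carried out in SPSS is not reproduced here.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Synthetic "years" (assumption): same seasonal shape, different scale and offset
year_a = [math.sin(2 * math.pi * day / 365) for day in range(365)]
year_b = [1.1 * v + 0.3 for v in year_a]
# A year whose seasonal cycle is shifted by a quarter period
year_c = [math.cos(2 * math.pi * day / 365) for day in range(365)]

print(round(pearson_r(year_a, year_b), 6))
print(round(pearson_r(year_a, year_c), 6))
```

A coefficient close to 1 between two years indicates that the seasonal pattern repeats from year to year, which is the property the experiment relies on.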
The observation data of the two dams over the years, together with the upstream water level and temperature, are shown in Figure 4 and Figure 5.
It can be seen from Figure 4 and Figure 5 that the two dams are in different geographical positions (Fengman Dam is located in Jilin City, Jilin Province, and Xiaolangdi Dam is located in Luoyang City, Henan Province), so the climates of the two dam sites differ. The differences in temperature at the dam site and in upstream water level cause the displacement components generated by the influencing factors to differ in direction and size. The weights of the influencing factors also change over time, but within a given period the weights can be considered relatively stable (He et al., 2003). The joint action of these factors leads to the change in the displacement of the dam crest. Because the influencing factors of the two dams differ, the horizontal displacements of the two dam crests are considered to have different seasonal behavior. Therefore, the author selects these two groups of data for the experiments and considers them representative.
Figure 6. IMF diagram of the horizontal displacement observation of the Xiaolangdi dam crest

Prediction Experiment
Since the daily deformation of the dam is small, the observation data of the two dams are resampled to a time resolution of one week, and EEMD is then carried out on each series (the IMFs of the observed displacement of the Xiaolangdi dam crest are shown in Figure 6). A SARIMA model is then trained on each IMF for the prediction experiments. The first 215 phases were used as training samples. Prediction step sizes of 10, 15 and 20 phases were selected, and the 60 phases of each IMF from phase 216 to phase 275 were predicted by rolling (for example, with a step size of 10 phases, phases 216-225 are forecast from phases 1-215, phases 226-235 are forecast from phases 11-225, and so on). After that, the prediction results of all IMFs are superimposed to obtain the final forecast.
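The rolling scheme described above can be made concrete by listing the training and target phase ranges for a given step size. This sketch assumes a fixed-length training window of 215 phases that slides forward by one step each round, which is how the example in the text (1-215, then 11-225) reads; the function name `rolling_windows` is hypothetical.

```python
def rolling_windows(n_train=215, first_target=216, last_target=275, step=10):
    """Return ((train_start, train_end), (target_start, target_end)) pairs.

    Phases are numbered from 1 and both ends of each range are inclusive.
    """
    windows = []
    start = first_target
    while start <= last_target:
        end = min(start + step - 1, last_target)
        train_end = start - 1
        train_start = train_end - n_train + 1
        windows.append(((train_start, train_end), (start, end)))
        start = end + 1
    return windows


for step in (10, 15, 20):
    print(f"step size {step}:")
    for train, target in rolling_windows(step=step):
        print(f"  train {train[0]}-{train[1]} -> predict {target[0]}-{target[1]}")
```

With these settings a step size of 10 needs six rounds to cover phases 216-275, a step size of 15 needs four, and a step size of 20 needs three.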
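Precision indexes, namely Mean Absolute Error (MAE), Mean Relative Error (MRE) and Root Mean Square Error (RMSE), are introduced, and the prediction results are compared with those obtained using only the SARIMA model in order to analyze the results and evaluate the model. The mathematical expressions of the three evaluation indexes are as follows:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|X_i - \hat{X}_i\right|$$
$$\mathrm{MRE} = \frac{1}{n}\sum_{i=1}^{n}\left|\frac{X_i - \hat{X}_i}{\hat{X}_i}\right| \times 100\%$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(X_i - \hat{X}_i\right)^2}$$
where n is the number of prediction phases, $X_i$ is the predicted value of phase i, and $\hat{X}_i$ is the observed value of phase i.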
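The three indexes can be computed as in the following sketch, which assumes their usual definitions: MAE is the mean absolute difference, MRE is the mean absolute difference relative to the observed value expressed in percent, and RMSE is the square root of the mean squared difference. The sample numbers are made up for illustration.

```python
import math


def mae(predicted, observed):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def mre(predicted, observed):
    """Mean relative error in percent, relative to the observed values."""
    ratios = [abs(p - o) / abs(o) for p, o in zip(predicted, observed)]
    return 100.0 * sum(ratios) / len(observed)


def rmse(predicted, observed):
    """Root mean square error."""
    squares = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squares) / len(observed))


predicted = [1.0, 2.0, 4.0]  # made-up predictions
observed = [2.0, 2.0, 2.0]   # made-up observations (must be non-zero for MRE)

print(f"MAE  = {mae(predicted, observed):.4f}")
print(f"MRE  = {mre(predicted, observed):.2f}%")
print(f"RMSE = {rmse(predicted, observed):.4f}")
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same data, which makes the pair a quick consistency check when reading accuracy tables.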

Analysis of Prediction Results
Prediction experiments were conducted on the observation data of the two dams, and the accuracy of the prediction results is compared in Table 2 and Table 3. It can be seen from Table 2 and Table 3 that the prediction accuracy of the EEMD-SARIMA model is higher than that of the SARIMA model for all three prediction step sizes. This is because EEMD separates the factors affecting the dam to a certain extent, which weakens the volatility of the data. After EEMD of the observation time series, each IMF component has a certain physical meaning rather than being the superposition of all influencing factors; fewer factors influence each IMF component, which makes the periodicity and seasonality of each IMF easier to capture. Establishing a model specific to each IMF component better reflects the deformation trend and deformation law, so the settlement data are fitted better and more accurate predictions are obtained, and the model is more meaningful in practice.
According to the above tables, for the EEMD-SARIMA model, when the prediction step size is 10 phases, the MRE values for the two dams are 8.24% and 7.10% respectively, with a mean of 7.67%. When the prediction step size is 15 phases, the MRE values for the two dams are 12.76% and 9.23% respectively, with a mean of 10.99%. When the prediction step size is 20 phases, the MRE values for the two dams are 16.03% and 10.86% respectively, with a mean of 13.45%. As the prediction step size increases, the prediction accuracy gradually decreases, indicating that the EEMD-SARIMA model is better suited to short-term prediction and less effective for medium- and long-term prediction. The MRE value over all prediction step sizes is 11.6%, indicating that the prediction effect is good and that the EEMD-SARIMA model can be used as a predictive model for dam deformation. For the SARIMA model, when the prediction step size is 20 phases, the MRE values for the two dams are 29.49% and 17.73% respectively, with a mean of 23.61%. Judging from the prediction process, the accuracy is low because the deformation trend is not captured well, which results in a large deviation between the predicted data and the actual observed data, as shown in the following figure:
Figure 7. Forecast of the two dams 256-265
In both cases, the trend of the SARIMA prediction results clearly differs from that of the original observation data. Further correlation analysis against the observation data gives correlation coefficients of 0.7015 and 0.6882 respectively, which essentially amount to prediction failures. The prediction results of the EEMD-SARIMA model are basically consistent with the trend of the original observation data, and the correlation analysis gives correlation coefficients of 0.9726 and 0.9919 respectively.
Therefore, compared with the SARIMA model, the EEMD-SARIMA model better fits the deformation trend of the horizontal displacement of the dam crest and has strong applicability and stability. To further compare the prediction performance of the EEMD-SARIMA model, the Kalman Filter Model and the Grey Model are also used to predict the data adopted in this paper (the observation data of the first 215 phases are used as training samples, and phases 216-275 are predicted by rolling with a prediction step size of 10 phases). The prediction results are shown in Figure 8:

Figure 8. Comparison of prediction results
As can be seen from the figure, the EEMD-SARIMA model fits the trend of the data better than the other models. Further analysis of the prediction accuracy shows that, for Fengman Dam, the RMSE, MAE and MRE of the Kalman Filter are 1.12 mm, 0.91 mm and 13.97% respectively, while the accuracy indexes of the EEMD-SARIMA model are 0.34 mm, 0.49 mm and 7.1% respectively. The prediction accuracy of the model proposed in this paper is therefore better, and the trend of the Grey Model prediction results is obviously different from the original trend. For Xiaolangdi Dam, the RMSE, MAE and MRE of the Kalman Filter are 30.12 mm, 25.28 mm and 20.64% respectively, while the accuracy indexes of the EEMD-SARIMA model are 4.77 mm, 6.25 mm and 8.24% respectively; again, the trend of the Grey Model prediction results is obviously different from the original trend. In conclusion, compared with the SARIMA model, the Kalman Filter and the Grey Model, the EEMD-SARIMA model proposed in this paper has higher prediction accuracy and fits the deformation trend better.

CONCLUSIONS
In this paper, the observation data of the horizontal displacement of the dam crests of Xiaolangdi Dam and Fengman Dam are analyzed and prediction experiments are carried out. From the above analysis, the author draws the following conclusions:
1) The time series of horizontal displacement of the dam crest has strong seasonal characteristics. Although predictions can be obtained using the SARIMA model alone, many factors affect the deformation of the dam and their seasonal characteristics differ, so the prediction accuracy is not good. The EEMD-SARIMA model first decomposes the observations so that the various influencing factors are separated and the influence of the differences between them is reduced, thereby improving the prediction accuracy.
2) In the prediction of dam deformation, the EEMD-SARIMA model weakens the volatility of the data to a certain extent, and it has strong stability, good applicability and a certain physical significance. Compared with the Kalman Filter and the Grey Model, the prediction accuracy of the EEMD-SARIMA model is better, and the results show that this model is an effective method for dam deformation prediction.