RESEARCH ON GPS HEIGHT FITTING BASED ON LINEAR REGRESSION MODEL

This paper sets out the parameter estimation method, outlier diagnosis and construction of the optimal regression equation in linear regression model theory, analyzes the principle of the polynomial fitting model, derives the algorithm, and studies methods of accuracy evaluation. A GPS survey area is fitted and computed, and the fitting models are analyzed and compared in detail. Better parameter values and regression equation models for the planar region are estimated. The fitting accuracy meets the requirements of fourth-order leveling, so the method can be used in actual engineering in place of fourth-order leveling.


Introduction
Using the general linear hypothesis theory of the linear regression model, significance tests for the regression equation and the regression coefficients, and in particular a gross error detection theory, are established, and a robust linear regression model is built to improve prediction accuracy.
Because the purpose of modeling is to fit the elevation anomalies at points that have not been leveled, a reliable quasi-geoid contour map of the study area is established. The criteria for evaluating regression equations are examined, and the optimal regression equation is established according to the criteria suited to different needs.
Emphasis is placed on using stepwise regression to compare different methods of building linear models, so as to improve the applicability of the model.

Regression model overview
The linear statistical model is highly practical and is widely used in processing surveying data. In height fitting in particular, many methods rely on it, such as the moving surface method and the polynomial fitting method. The theory comprises parameter estimation, hypothesis testing, linear regression and related topics. The following discusses three aspects:
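To make the link between polynomial fitting and linear regression concrete, the following NumPy sketch fits a quadratic surface $\zeta = a_0 + a_1x + a_2y + a_3x^2 + a_4xy + a_5y^2$ to elevation anomalies $\zeta$ by least squares. The quadratic form and the function names (`quadratic_design`, `fit_quadratic_surface`, `predict`) are illustrative assumptions, not the paper's exact model; the point is that the surface is nonlinear in the coordinates but linear in the coefficients, so the regression theory below applies unchanged.

```python
import numpy as np

def quadratic_design(x, y):
    """Design matrix for zeta = a0 + a1*x + a2*y + a3*x^2 + a4*x*y + a5*y^2."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack([np.ones_like(x), x, y, x**2, x * y, y**2])

def fit_quadratic_surface(x, y, zeta):
    """Least-squares estimate of the six surface coefficients."""
    A = quadratic_design(x, y)
    coef, *_ = np.linalg.lstsq(A, np.asarray(zeta, dtype=float), rcond=None)
    return coef

def predict(coef, x, y):
    """Evaluate the fitted surface at (possibly new) points."""
    return quadratic_design(x, y) @ coef
```

In practice the plane coordinates are usually centered and scaled before the design matrix is built, because raw coordinates of order $10^5$ to $10^6$ m make $X^{\mathsf T}X$ badly conditioned.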

2.1 Research on theoretical modeling of parameter estimation
This study uses parameter estimation theory from linear regression, together with regression diagnostics (Cook's distance) and the Box-Cox transformation, to improve the accuracy and reliability of parameter estimation in the linear transformation model. In estimating the regression parameters, the regression diagnostic statistics are computed from the known point data, and abnormal points are identified by comparing the magnitudes of these statistics, thereby ensuring the accuracy required for height fitting.
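Cook's distance can be computed directly from the least-squares fit. The sketch below uses the standard formula $D_i = \dfrac{\hat e_i^2}{p\,\hat\sigma^2}\cdot\dfrac{h_{ii}}{(1-h_{ii})^2}$, where $h_{ii}$ is the leverage (diagonal of the hat matrix), $p$ the number of parameters and $\hat\sigma^2 = \mathrm{RSS}/(n-p)$. The function name `cooks_distance` is hypothetical, and the design matrix is assumed to already contain the intercept column.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation of the model y = X beta + e."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # Leverages: diagonal of the hat matrix H = X (X^T X)^{-1} X^T
    h = np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)
    s2 = resid @ resid / (n - p)  # unbiased estimate of sigma^2
    return resid**2 / (p * s2) * h / (1.0 - h) ** 2
```

A point whose $D_i$ stands out clearly from the rest (common rules of thumb compare it with $4/n$ or with 1) has a large influence on the estimated coefficients and should be checked before it is used as a fitting point.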

Significance test of regression equation
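The significance test of the regression equation, for the model $y=\beta_0+\beta_1X_1+\cdots+\beta_{p-1}X_{p-1}+e$ fitted to $n$ observations, tests the hypothesis that all regression coefficients are zero:

$$H_0:\ \beta_1=\beta_2=\cdots=\beta_{p-1}=0 .$$

If the test rejects the null hypothesis $H_0$, we accept $H_1$: at least one $\beta_i\neq 0$. Conversely, if the test accepts $H_0$, all $\beta_i=0$; that is, relative to the random error, none of the independent variables has a significant effect on the dependent variable. The test statistic is

$$F=\frac{(\mathrm{TSS}-\mathrm{RSS})/(p-1)}{\mathrm{RSS}/(n-p)}\ \sim\ F_{p-1,\,n-p}\quad\text{under } H_0,$$

where $\mathrm{TSS}=\sum_{j=1}^{n}(y_j-\bar y)^2$ is the total sum of squares and RSS is the residual sum of squares of the fitted model. If $F>F_{p-1,\,n-p}(\alpha)$, we accept the alternative hypothesis $H_1$; otherwise we accept the null hypothesis $H_0$.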
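A minimal sketch of the overall test, assuming a design matrix with a leading column of ones: it returns $F=\dfrac{(\mathrm{TSS}-\mathrm{RSS})/(p-1)}{\mathrm{RSS}/(n-p)}$ and its degrees of freedom. The function name is hypothetical, and the critical value $F_{p-1,n-p}(\alpha)$ is left to a table (or a statistics library) rather than computed here.

```python
import numpy as np

def regression_f_statistic(X, y):
    """Overall F statistic for H0: beta_1 = ... = beta_{p-1} = 0.

    X is n x p with a leading column of ones (the intercept beta_0).
    Returns F and its degrees of freedom (p - 1, n - p).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))   # residual sum of squares
    tss = float(np.sum((y - y.mean()) ** 2))   # total sum of squares
    F = ((tss - rss) / (p - 1)) / (rss / (n - p))
    return F, (p - 1, n - p)
```

If the returned `F` exceeds the tabulated $F_{p-1,n-p}(\alpha)$, $H_0$ is rejected and the regression as a whole is significant.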

Significance test of regression coefficient
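The significance test of the regression equation is an overall test of the linear regression. If the test rejects the null hypothesis, the dependent variable $Y$ depends linearly on the regression variables $X_1,\dots,X_{p-1}$ as a whole, but this does not rule out that $Y$ is independent of some of them, i.e., some $\beta_i$ could be zero. Therefore, when the overall null hypothesis is rejected, each independent variable must also be tested one by one: for fixed $i$, $1\le i\le p-1$, test

$$H_{0i}:\ \beta_i=0\quad\text{versus}\quad H_{1i}:\ \beta_i\neq 0 .$$

For the model $y=X\beta+e$, $e\sim N(0,\sigma^2I_n)$, the test statistic is

$$t_i=\frac{\hat\beta_i}{\hat\sigma\sqrt{c_{ii}}}\ \sim\ t_{n-p}\quad\text{under } H_{0i},$$

where $c_{ii}$ is the diagonal element of $(X^{\mathsf T}X)^{-1}$ corresponding to $\hat\beta_i$ and $\hat\sigma^2=\mathrm{RSS}/(n-p)$. If $|t_i|>t_{n-p}(\alpha/2)$, we reject $H_{0i}$, i.e., $X_i$ has a significant effect on $Y$; otherwise, we accept the null hypothesis $H_{0i}$.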
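The coefficient-by-coefficient test can be sketched as follows, computing $t_i=\hat\beta_i/(\hat\sigma\sqrt{c_{ii}})$ with $c_{ii}$ taken from the diagonal of $(X^{\mathsf T}X)^{-1}$ and $\hat\sigma^2=\mathrm{RSS}/(n-p)$. The function name is hypothetical; each $|t_i|$ would then be compared with $t_{n-p}(\alpha/2)$ from a table.

```python
import numpy as np

def coefficient_t_statistics(X, y):
    """Estimates beta_hat and t_i = beta_hat_i / (sigma_hat * sqrt(c_ii))."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)            # sigma_hat^2 = RSS / (n - p)
    se = np.sqrt(sigma2 * np.diag(XtX_inv))     # standard errors of beta_hat
    return beta, beta / se
```

Variables whose $|t_i|$ falls below the critical value are candidates for removal from the equation, which is the basic step repeated in stepwise regression.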

2.2.3 Gross error detection theory
There are two general approaches to gross error detection. The first treats the gross errors as parameters to be estimated and uses quasi-stable adjustment to resolve the resulting rank deficiency, obtaining the gross errors directly. The second selects some observations as quasi-observations (assumed free of gross errors), estimates the parameters from them by least squares, and treats the residuals of the remaining observations as gross errors. Assume that the error equation is

$$y=X\beta+e \qquad (8)$$

The null hypothesis of the data snooping method is that observation $i$ contains no gross error, $H_0:\ E(e_i)=0$. With known standard deviation $\sigma$, the statistic

$$w_i=\frac{\hat e_i}{\sigma\sqrt{r_{ii}}}\ \sim\ N(0,1)\quad\text{under } H_0,$$

where $\hat e_i$ is the least-squares residual and $r_{ii}$ is the $i$-th diagonal element of $R=I-X(X^{\mathsf T}X)^{-1}X^{\mathsf T}$, can be used as a standard normal statistic. If $|w_i|>u_{1-\alpha/2}$, there may be a gross error in observation $i$.
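Assuming the data detection method above is Baarda's data snooping with equally weighted observations and a known standard deviation $\sigma_0$, the $w$-statistics $w_i=\hat e_i/(\sigma_0\sqrt{r_{ii}})$ can be sketched as below. The function name and the default critical value 3.29 (two-sided $\alpha\approx 0.001$ for the standard normal distribution) are assumptions for illustration.

```python
import numpy as np

def data_snooping(X, y, sigma0, crit=3.29):
    """w_i = e_i / (sigma0 * sqrt(r_ii)) and indices with |w_i| > crit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # r_ii: diagonal of R = I - X (X^T X)^{-1} X^T (redundancy numbers)
    r = 1.0 - np.diag(X @ np.linalg.pinv(X.T @ X) @ X.T)
    w = resid / (sigma0 * np.sqrt(r))
    flagged = np.flatnonzero(np.abs(w) > crit).tolist()
    return w, flagged
```

Because one gross error spreads into the residuals of other points, data snooping is normally applied iteratively: remove only the observation with the largest $|w_i|$, refit, and test again.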

3. Optimal regression equation design
Selecting a regression equation involves two parts. The first is choosing the type of regression equation, that is, deciding whether the relationship between the variables is linear or nonlinear for the specific problem. The second is choosing the independent variables once the model type has been determined. When the dependent variable and the independent variables that may affect it are judged suitable for a linear regression model, simply including all of them in the regression equation, even those that have no effect on the dependent variable, increases the amount of computation and can considerably lower the prediction accuracy. Therefore, when regression analysis is applied to practical problems, it is important to select an optimal subset from the set of independent variables that are linearly related to the dependent variable.
The specific steps for selecting the optimal regression equation are:
(1) Divide the set of regression variables X(1~n) into two parts: the set of important independent variables, denoted X1 and containing k variables, and the set of remaining variables.
(2) Force m (m = 1, 2, ..., k) of the independent variables in X1 into the equation, use stepwise regression to screen the remaining n − m variables, and keep the total number of variables in the equation within seven. For example, if there are k = 4 important independent variables, the number of equations that can be established is $2^4-1=15$.
(3) Perform statistical tests on each of these equations and combine the test results to select the optimal regression equation.
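The count $2^k-1$ in step (2) is the number of non-empty subsets of the $k$ important variables, each of which is forced into an equation before stepwise screening. The sketch below enumerates these starting sets and reports how many further variables stepwise regression may still add under the limit of seven. The term names are hypothetical, and the stepwise screening itself is not shown.

```python
from itertools import combinations

def forced_subsets(important):
    """All non-empty subsets of the important variables X1 (2**k - 1 of them)."""
    subsets = []
    for m in range(1, len(important) + 1):
        subsets.extend(combinations(important, m))
    return subsets

def free_slots(forced, max_vars=7):
    """How many further variables stepwise regression may still add."""
    return max(0, max_vars - len(forced))

# Hypothetical important terms of a polynomial height-fitting model
for subset in forced_subsets(["x", "y", "x2", "xy"]):
    print(subset, "-> up to", free_slots(subset), "more variables")
```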
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020 International Conference on Geomatics in the Big Data Era (ICGBD), 15-17 November 2019, Guilin, Guangxi, China

As can be seen from the foregoing, the residual sum of squares RSS reflects the degree of deviation between the actual data and the theoretical model and is an important criterion for evaluating a regression equation. Generally, the smaller the RSS, the better the model fits the data. Assume the full model is

$$y=X\beta+e=X_q\beta_q+X_t\beta_t+e,\qquad e\sim N(0,\sigma^2I_n),$$

where $X=(X_q\ X_t)$ is $n\times p$ and the candidate submodel $y=X_q\beta_q+e$ keeps $q$ of the $p$ columns. The residual sum of squares of the submodel is then

$$\mathrm{RSS}_q=y^{\mathsf T}\left(I_n-X_q(X_q^{\mathsf T}X_q)^{-1}X_q^{\mathsf T}\right)y .$$

Since $\mathrm{RSS}_q$ decreases as $q$ increases, to prevent too many independent variables from being selected we multiply the residual sum of squares by a penalty factor that increases with $q$, giving the average residual sum of squares

$$\mathrm{RMS}_q=\frac{\mathrm{RSS}_q}{n-q}.$$

According to this property, the subset of independent variables with the smaller $\mathrm{RMS}_q$ is preferred; this is called the $\mathrm{RMS}_q$ criterion.
Similar criteria are the $C_p$ criterion proposed by Mallows in 1964,

$$C_p=\frac{\mathrm{RSS}_q}{\hat\sigma^2}-(n-2q),\qquad \hat\sigma^2=\frac{\mathrm{RSS}_p}{n-p},$$

where $\mathrm{RSS}_p$ is the residual sum of squares of the full model, and the AIC criterion proposed by the Japanese statistician Akaike in 1974,

$$\mathrm{AIC}=n\ln\mathrm{RSS}_q+2q .$$

For all three criteria, smaller values are better. After the full model between the elevation anomaly $Y$ and the independent variables $X$ is set up, the optimal subset is selected with these criteria, and the optimal regression model is then built to fit the heights of the points to be determined.
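The three subset-selection criteria can be evaluated together by brute force when the number of candidate variables is small. The sketch below assumes column 0 of $X$ is the intercept and is always kept, counts $q$ including the intercept, and uses $\mathrm{RMS}_q=\mathrm{RSS}_q/(n-q)$, $C_p=\mathrm{RSS}_q/\hat\sigma^2-(n-2q)$ with $\hat\sigma^2$ from the full model, and $\mathrm{AIC}=n\ln\mathrm{RSS}_q+2q$. The function name and the dictionary layout are hypothetical.

```python
import numpy as np
from itertools import combinations

def subset_criteria(X, y):
    """RMS_q, C_p and AIC for every subset that keeps the intercept (column 0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    def rss(cols):
        A = X[:, cols]
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        r = y - A @ beta
        return float(r @ r)

    sigma2_full = rss(list(range(p))) / (n - p)  # sigma_hat^2 from the full model
    results = []
    for k in range(p):
        for extra in combinations(range(1, p), k):
            cols = [0, *extra]
            q = len(cols)
            r = rss(cols)
            results.append({
                "cols": tuple(cols),
                "RMS": r / (n - q),
                "Cp": r / sigma2_full - (n - 2 * q),
                "AIC": n * np.log(r) + 2 * q,
            })
    return results
```

A useful sanity check is that the full model always gives $C_p=p$ exactly; a subset whose $C_p$ is close to its own $q$ is considered to have little bias.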

Source of experimental data
The experimental data come from a GPS control network in a flat riverside area. The control network is of order B, and the normal heights were measured by second-order leveling. The values are given in Table 1.

Table 1 Raw data of elevation anomalies

Height fitting data calculation
There are 18 points in the survey area. Seven of them, points 5, 9, 11, 2, 15, 16 and 12, are selected as fitting points, and the remaining points are used as check points. Their distribution was plotted with CASS. … D are also very small. Therefore, point 18 has a large influence on the regression estimates and needs special attention. After the data were carefully checked, the recorded values were found to be transcribed correctly, so the anomaly is not a recording error, and the point is removed.

4.2.2 Box-Cox transformation calculation
(1) Before the data are normalized, applying the Box-Cox transformation gives the results in Table 3.

Table 3 Correspondence between transformation parameters and RSS

Table 3 shows that the residual sum of squares RSS is smallest when λ = 2, so λ = 2 can be taken as an approximately optimal transformation parameter. The regression model selected with this parameter gives equal internal and external accuracy, u = m = 1.0335; this fitting accuracy is far too poor, so the result is invalid.

(2) After the data are normalized, the results are as shown in Table 4.

Table 4 Correspondence between transformation parameters and RSS

Table 4 shows that RSS is smallest when λ = 0.75, so λ = 0.75 can be taken as an approximately optimal transformation parameter. The fitted elevation anomalies and residuals are listed in Table 8.

Table 8 Elevation anomalies and residuals after fitting

The internal accuracy is u = 0.0069, the external accuracy is m = 0.0069, and the internal precision is q = 0.0002729. When the check points are fitted with the full model, the accuracy is m = 0.0078. When the Box-Cox transformation is applied to the dependent variable of the height fit, centering the data sample easily leads to an optimal parameter of 1, which corresponds to plane fitting.
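The λ-versus-RSS tables can be reproduced in outline with a scan over candidate parameters. The sketch below is an assumption about how such a table might be computed: RSS values for different λ are only comparable if the transformed variable is scaled by the geometric mean $\dot y$, i.e. $z=(y^\lambda-1)/(\lambda\dot y^{\lambda-1})$ and $z=\dot y\ln y$ for $\lambda=0$, which is the standard Box-Cox device (minimizing this RSS maximizes the profile likelihood). The transformation also requires $y>0$, which is one practical reason a shift or normalization step may be needed, since elevation anomalies can be negative. Function names and the λ grid are hypothetical.

```python
import numpy as np

def boxcox_scaled(y, lam):
    """Geometric-mean-scaled Box-Cox transform; requires strictly positive y."""
    y = np.asarray(y, dtype=float)
    if np.any(y <= 0):
        raise ValueError("Box-Cox needs strictly positive y")
    gm = np.exp(np.mean(np.log(y)))  # geometric mean of y
    if abs(lam) < 1e-12:
        return gm * np.log(y)
    return (y**lam - 1.0) / (lam * gm ** (lam - 1.0))

def boxcox_rss_profile(X, y, lambdas):
    """RSS of the linear fit to the transformed response, for each lambda."""
    X = np.asarray(X, dtype=float)
    out = []
    for lam in lambdas:
        z = boxcox_scaled(y, lam)
        beta, *_ = np.linalg.lstsq(X, z, rcond=None)
        r = z - X @ beta
        out.append(float(r @ r))
    return out
```

The λ with the smallest RSS in the scan plays the role of the "optimal transformation parameter" read off Tables 3 and 4; a finer grid around that value refines the choice.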
In this case, the data processing method must be changed, that is, the data must be normalized. There is no direct, necessary relationship between normalizing the data and the choice of the Box-Cox transformation parameter; however, different transformation parameters correspond to different fitting models and therefore to different final conversion accuracies. In general, normalizing the data gives higher accuracy. For example, in the Box-Cox transformation the optimal parameter after data centering is 1, and fitting with this parameter gives u = 0.0192 and m = 0.0109, whereas the optimal parameter after normalization is 2, and the corresponding fit gives u = 0.0142 and m = 0.0081. Clearly, fitting after the data are normalized is more accurate.

CONCLUSIONS
In this paper, GPS height fitting models based on linear regression theory are used to optimize the quasi-geoid and to achieve high-precision conversion from geodetic height to normal height. Using regression diagnostic statistics, including the residuals and Cook's distance, the abnormal data among the control points can be found accurately; the records are then checked and the abnormal points eliminated, reducing their influence on the fit.
The Box-Cox transformation is a parametric transformation applied to the data as a whole that makes the errors approximately follow a normal distribution. By choosing the parameter value that minimizes the residual sum of squares, the complex nonlinear elevation anomaly problem is transformed into a linear one. A worked example demonstrates the feasibility and reliability of the linear regression model in GPS height fitting: it reflects the trend and regularity of the elevation anomalies, achieves effective GPS height fitting, and meets the accuracy requirements of general surveying in practical engineering. This provides a useful reference for the wide application of GPS heights in practical engineering. However, the polynomial surface fitting model based on linear regression theory discussed here is intended for small areas; for survey areas that are larger or have more complicated terrain, the applicability and fitting performance of the model require further study.