Sampling Method Analysis and Quality Evaluation Strategy for Remote Sensing Big Data

Against the background of increasingly unified natural resource management, remote sensing big data will become the main data source supporting a number of major projects. How to sample natural resource results efficiently and reliably during quality evaluation has long been a research hotspot for results that involve remote sensing big data. A sequential quality evaluation model based on the root mean square propagation (RMSprop) optimization algorithm is constructed through theoretical analysis, and numerical experiments validate the effectiveness of the method.


INTRODUCTION
In the existing sampling process for surveying and mapping product quality evaluation, the design goal of sampling is to minimize the overall risk: a risk minimization model is designed so that a small number of samples suffices for the overall quality evaluation. This paper proposes building a quality model of remote sensing big data results and evaluating the quality of those results by mining their data characteristics. An important basis of this method is verifying the validity of multiple sequential sampling with a small sample size. This paper therefore studies the validity, advantages and experimental validation of this sampling method.
The production process of remote sensing data is generally stable. Assuming that the quality model of the data results is independent and identically distributed, similar validity can be achieved by drawing small samples and increasing the number of sampling rounds. More sampling rounds introduce more noise into the quality evaluation results, so the RMSprop optimization algorithm, which suits this sampling method, is used to largely suppress the noise caused by multi-batch small-sample sampling. The aim is to achieve effective evaluation through multiple small samples and an algorithm adapted to the quality evaluation model, while reducing the manual interpretation required by mass data quality inspection.
With reference to equations (1) and (2), the gradient is obtained from the partial derivatives of the loss function J(θ) with respect to θ, θ is iterated along the negative gradient, and the objective function is fitted as shown in equations (3) and (4). Since gradient descent requires calculation over every sample and every feature, with m samples and n features the computational cost of one iteration is O(mn).
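For orientation, the batch gradient descent procedure described here is commonly written as follows. This is a sketch under assumptions, not a reproduction of the paper's equations (1) to (4): the linear hypothesis h_θ, the squared loss, and the learning rate α are assumed for illustration.

```latex
\begin{align*}
h_\theta(x) &= \sum_{j=1}^{n} \theta_j x_j
  && \text{(assumed linear hypothesis)} \\
J(\theta) &= \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
  && \text{(assumed squared loss over } m \text{ samples)} \\
\frac{\partial J(\theta)}{\partial \theta_j} &= \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
  && \text{(gradient component)} \\
\theta_j &\leftarrow \theta_j - \alpha \, \frac{\partial J(\theta)}{\partial \theta_j}
  && \text{(negative-gradient update)}
\end{align*}
```

Each update sums over all m samples for each of the n parameters, which is where the O(mn) cost per iteration comes from.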
Stochastic gradient descent is an improved algorithm based on gradient descent. It constructs a new risk minimization function by rewriting the loss function J(θ) for a single sample, as in equation (5), so that the gradient update of each θ can be rewritten as equation (6). Comparing the two optimization methods, the computational cost of one stochastic gradient descent iteration is O(n). When the number of samples m is large, stochastic gradient descent therefore has advantages in iteration speed and computation. Because each update uses a smaller sample, however, its noise is larger. With big data it can iterate at a faster speed, and it generally outperforms computing over all the data because it completes more update rounds.
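The contrast between the two update rules can be sketched in a few lines of Python. This is a minimal illustration on synthetic least-squares data, not the paper's model: the data, the learning rate, the epoch count, and all function names are assumptions.

```python
import numpy as np

def batch_gradient_step(theta, X, y, lr):
    """One batch update: touches all m samples, so the cost is O(m * n)."""
    residual = X @ theta - y
    grad = X.T @ residual / len(y)
    return theta - lr * grad

def stochastic_gradient_step(theta, x_i, y_i, lr):
    """One stochastic update: touches a single sample, so the cost is O(n)."""
    residual = x_i @ theta - y_i
    return theta - lr * residual * x_i

def fit(X, y, lr=0.05, epochs=200, stochastic=False, seed=0):
    """Fit theta with either update rule (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        if stochastic:
            # m cheap updates per pass over the data
            for i in rng.permutation(len(y)):
                theta = stochastic_gradient_step(theta, X[i], y[i], lr)
        else:
            # one expensive update per pass over the data
            theta = batch_gradient_step(theta, X, y, lr)
    return theta

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=500)

theta_batch = fit(X, y, stochastic=False)
theta_sgd = fit(X, y, stochastic=True)
print("batch:", np.round(theta_batch, 3))
print("sgd:  ", np.round(theta_sgd, 3))
```

For the same number of passes over the data, the stochastic variant performs m times as many parameter updates, at the price of a noisier trajectory.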
In this paper, the idea of stochastic gradient descent over small sample sets is applied to the sampling process of quality evaluation, that is, a multiple sequential sampling evaluation method with a small sample size.

CHARACTERISTICS AND ADVANTAGES OF MULTIPLE SEQUENTIAL SAMPLING WITH SMALL SAMPLE SIZE
Sequential sampling does not fix the total sample size in advance; it decides the next sampling step according to the results of the preceding ones.
Compared with fixed sampling, sequential sampling can judge during the sampling process whether enough has been sampled, and so obtains more stable results.
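A sequential scheme of this kind can be sketched as a loop that draws one small batch at a time and checks a stopping rule after each batch. The stopping rule below (a normal-approximation confidence interval half-width on the pass rate) and every parameter value are assumptions chosen for illustration; the paper does not specify its rule here.

```python
import math
import random

def sequential_sample(population, batch_size=20, min_batches=3,
                      max_batches=50, half_width=0.05, z=1.96, seed=0):
    """Inspect small batches until the pass-rate estimate is stable enough.

    population: sequence of 0/1 inspection outcomes (1 = item passes).
    Assumed stopping rule: stop once the approximate 95% confidence
    interval half-width of the pass rate is at most `half_width`.
    Returns (estimated_rate, items_inspected, batches_used).
    """
    rng = random.Random(seed)
    remaining = list(population)
    rng.shuffle(remaining)

    inspected = 0
    passed = 0
    batches_used = 0
    rate = 0.0
    while batches_used < max_batches and remaining:
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        batches_used += 1
        inspected += len(batch)
        passed += sum(batch)
        rate = passed / inspected
        margin = z * math.sqrt(rate * (1.0 - rate) / inspected)
        # The result of the batches so far decides whether to keep sampling.
        if batches_used >= min_batches and margin <= half_width:
            break
    return rate, inspected, batches_used

# Hypothetical result set: 10,000 items, 90% of which pass inspection
population = [1] * 9000 + [0] * 1000
rate, inspected, batches = sequential_sample(population)
print(f"estimated pass rate {rate:.3f} after {inspected} items in {batches} batches")
```

Unlike a fixed plan, the number of inspected items is an output of the procedure rather than an input, so a homogeneous result set is cleared with fewer inspections than a mixed one.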
Against the background of remote sensing big data, the scale of data keeps growing and fixed sampling faces unprecedented challenges. Large-scale projects that produce large volumes of results usually have long production cycles, many production units and complex quality characteristics, so to meet project needs the inspection of such results is generally carried out in stages. However, because of the many uncertainties in actual project implementation, there is often a short-term need for a large

Generating experimental data sets
The numerical simulation needs a high-dimensional array. The experimental results show that the smaller the sample size of a single extraction, the greater the noise disturbance of the whole curve. The final results of the three experiments tend to converge, but the final accuracy is 82.3% for the 1% sample size and 89.9% for the 2% sample size, both lower than the 92.3% of the 4% sample size. In other comparative experiments with sample sizes below 1%, final convergence could not be reached.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020. International Conference on Geomatics in the Big Data Era (ICGBD), 15-17 November 2019, Guilin, Guangxi, China. This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-3-W10-11-2020 | © Authors 2020. CC BY 4.0 License.
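The link between single-extraction sample size and noise can be illustrated with a small simulation. This sketch measures only how much a pass-rate estimate fluctuates across repeated draws at 1%, 2% and 4% sample sizes; it does not reproduce the paper's experiment or its accuracy figures, and the population, round count and function name are assumptions.

```python
import numpy as np

def estimate_spread(population, fraction, rounds=1000, seed=0):
    """Standard deviation of the sample mean over repeated small draws."""
    rng = np.random.default_rng(seed)
    size = max(1, int(len(population) * fraction))
    estimates = [
        rng.choice(population, size=size, replace=False).mean()
        for _ in range(rounds)
    ]
    return float(np.std(estimates))

# Hypothetical result set: 10,000 items with a 90% pass rate
population = np.array([1] * 9000 + [0] * 1000)

spreads = {f: estimate_spread(population, f) for f in (0.01, 0.02, 0.04)}
for fraction, spread in spreads.items():
    print(f"sample size {fraction:.0%}: spread of estimate = {spread:.4f}")
```

The spread shrinks roughly with the square root of the sample size, so halving the sample size raises the per-round noise by about 40%.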

Experimental analysis
The experimental results show that after 1400 sampling rounds with the 4% small-sample extraction method, this method effectively obtains four different variance gradients in highly coupled data and distinguishes the gradient of any sample with 92.3% accuracy. That is to say, it realizes the mining of batch sample features.
The comparative experimental results also confirm the limitation of the small-sample multiple sampling method: as the sample size decreases, the noise in the feature mining process grows. However, with the deep learning neural network method and the RMSprop (root mean square propagation) optimization algorithm adopted in this paper, the noise of small-batch sampling can be kept within an acceptable range and the data features can be mined effectively.
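The RMSprop update itself is short enough to sketch. The version below applies it to a least-squares fit driven by repeated 4% batches; the model, the data and the hyperparameters (learning rate, decay, round count) are assumptions for illustration and stand in for the neural network used in the paper.

```python
import numpy as np

def rmsprop_step(theta, grad, cache, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSprop update: divide the step by a running RMS of past gradients."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

def fit_with_small_batches(X, y, fraction=0.04, rounds=1400, seed=0):
    """Fit least squares from repeated small random batches (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    batch_size = max(1, int(len(y) * fraction))
    theta = np.zeros(X.shape[1])
    cache = np.zeros_like(theta)
    for _ in range(rounds):
        idx = rng.choice(len(y), size=batch_size, replace=False)
        residual = X[idx] @ theta - y[idx]
        grad = X[idx].T @ residual / batch_size  # noisy small-batch gradient
        theta, cache = rmsprop_step(theta, grad, cache)
    return theta

# Synthetic data (assumed for illustration only)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 3))
true_theta = np.array([1.5, -2.0, 0.5])
y = X @ true_theta + rng.normal(scale=0.1, size=500)

theta_hat = fit_with_small_batches(X, y)
print(np.round(theta_hat, 2))
```

Because each step is divided by the recent gradient magnitude, a batch that happens to produce an unusually large gradient moves the parameters no further than a typical one, which is the sense in which the noise of small-batch sampling is damped.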
Because the feature data generated in this experiment are still far from actual remote sensing data in complexity, practical problems remain in applying the method to remote sensing big data, such as the regularization method needed for mining remote sensing data features, the minimum sample size, the number of rounds needed for effective convergence, product characteristics, and the establishment of reliable models and parameters. These are all problems that need to be solved in practical engineering. However, the numerical experiments in this paper have explored the feasibility of these methods at the theoretical level and achieved the expected results.

CONCLUSIONS
In the existing engineering practice, due to the small scale of data and the limitation of production mode, it is not yet mature to evaluate the quality of remote sensing data by