RESEARCH ON THE CONSTRUCTION OF REMOTE SENSING AUTOMATIC INTERPRETATION SYMBOL BIG DATA

The remote sensing automatic interpretation symbol (RSAIS) is an inexpensive and fast means of providing precise in-situ information for image interpretation and accuracy validation. This study designed a scientific and precise RSAIS data characterization method, together with a massive data storage method built on a distributed cloud architecture. It also introduced a combined offline and online data update mode and a dynamic data evaluation mechanism, with the aim of creating an efficient approach to RSAIS big data construction. A national RSAIS database with more than 3 million samples covering 86 land types was constructed during 2013-2015 based on the National Geographic Conditions Monitoring Project of China, and it has been updated annually since 2016. The RSAIS big data has proven to be a good resource for large-scale image interpretation and field validation. It also has the potential to support automatic image interpretation with the assistance of deep learning technology in the remote sensing big data era.

* Corresponding author. Address: 28 Lianhuachi Road, Beijing, China. E-mail address: gaoyin@nsdi.gov.cn (Gao Yin)


INTRODUCTION
With the advent of the earth observation big data era, fields such as ecological evaluation and land regulation have developed a real demand for rapid remote sensing interpretation and field validation. Traditional field investigation has poor timeliness and high labor costs, which makes wide-area application and real-time monitoring difficult.
The remote sensing automatic interpretation symbol (RSAIS) is a ground photo with a set of helpful attributes that provides a precise record of field conditions for image interpretation and accuracy validation. Recently, Peng Gong et al. (2015) proposed a global wetland validation sample dataset based on the Web of Science database. They also created 3682 sample points from Google Earth imagery and called for a global wetland information sharing and volunteer service platform. Jun Chen et al. (2015) developed the world's first 30-meter land cover product, known as GlobeLand30, and applied crowdsourcing to worldwide ground verification and accuracy evaluation. These studies have vigorously promoted RSAIS research, but standardized and generalized theories and techniques for large-scale, dynamic application have been lacking.
Based on the national geographical conditions monitoring mission of China, this study proposes a standard and streamlined RSAIS representation method. It also develops a complete technical platform for massive data storage, update and quality optimization, and applies it to the construction of the Chinese RSAIS big data.

RSAIS data characterization method design
The traditional interpretation symbol usually has few attributes and is difficult to apply to spatial region matching and automatic image interpretation. To meet the requirements of precise location matching, automatic interpretation and large-scale application, a characterization system with refined attributes was designed; the detailed attributes are listed in Table 1. Schematic diagrams of the ground geometric record and region matching are shown in Figure 1. This method meets the requirements in two respects: (1) it reflects land cover with an accurate geometric identity, and (2) it can be recorded easily and has potential for large-scale automatic acquisition.

Attribute    Content and acquisition method
photo        in-situ photo of target

Throughout actual data acquisition, the key criterion is that each RSAIS should be typical of the local land cover. Every RSAIS should represent the characteristics of its own land cover type, and the RSAIS data as a whole should reflect the general characteristics of the regional land cover types. Within a given area, the acquired data taken together should represent the overall characteristics of the region, and their distribution should closely follow the distribution of the land cover types.

Distributed and cloud data storage architecture
For massive data storage, an elastic architecture combining a distributed database with cloud storage was designed. All RSAIS photos are stored in a distributed cloud storage file system, and the attribute information is formatted and stored in the distributed database. The photo and the attributes of each RSAIS are associated through a unique identifier. All files are stored with redundant backups across the cluster, which supports elastic expansion and concurrent access. The GeoSOT subdivision grid spatial index is applied for efficient spatial retrieval.
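
The linkage between photo, attributes and spatial index can be illustrated with a minimal sketch. It is not the system described here: the in-memory dictionaries stand in for the distributed file system and database, and the quadtree cell code is a simplified stand-in for a subdivision grid index, not the GeoSOT encoding. All names (`grid_code`, `RsaisStore`, the attribute keys) are hypothetical.

```python
import uuid
from collections import defaultdict


def grid_code(lon, lat, level=12):
    """Return a quadtree cell code (a string of digits 0-3) for a point.

    Simplified stand-in for a subdivision grid index; NOT the GeoSOT encoding.
    """
    west, east, south, north = -180.0, 180.0, -90.0, 90.0
    digits = []
    for _ in range(level):
        mid_lon = (west + east) / 2
        mid_lat = (south + north) / 2
        quadrant = 0
        if lon >= mid_lon:      # eastern half
            quadrant += 1
            west = mid_lon
        else:
            east = mid_lon
        if lat >= mid_lat:      # northern half
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        digits.append(str(quadrant))
    return "".join(digits)


class RsaisStore:
    """In-memory stand-in for the photo store and the attribute database."""

    def __init__(self, level=12):
        self.level = level
        self.photos = {}                    # identifier -> photo bytes
        self.attributes = {}                # identifier -> attribute dict
        self.cell_index = defaultdict(set)  # cell code -> identifiers

    def add(self, photo, lon, lat, land_type):
        """Store one sample; photo and attributes share one identifier."""
        sample_id = uuid.uuid4().hex
        self.photos[sample_id] = photo
        self.attributes[sample_id] = {
            "lon": lon, "lat": lat, "land_type": land_type,
        }
        self.cell_index[grid_code(lon, lat, self.level)].add(sample_id)
        return sample_id

    def in_same_cell(self, lon, lat):
        """Return identifiers of samples in the grid cell containing a point."""
        cell = grid_code(lon, lat, self.level)
        return sorted(self.cell_index.get(cell, ()))
```

Looking up a cell code is a single dictionary access, which is why grid codes of this kind suit retrieval over very large sample sets; a production index would also need to search neighbouring cells and multiple levels.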

An offline and online update mode
A combined offline and online update mode is implemented so that the data can be updated flexibly. In national or regional territory and resource investigation projects, the RSAIS can be updated in offline batch mode, which is convenient for large-scale production and updates of all data types.
In addition, a web-based crowdsourcing update method is adopted for global, dynamic and continuous updates. With sufficient volunteers, the RSAIS data of most regions and all land types can be updated at an acceptable frequency, so that the RSAIS big data grows continuously.

A dynamic evaluation mechanism
A dynamic evaluation mechanism is applied to improve the quality and promote the value of RSAIS big data. Users can tag RSAIS online and openly score the RSAIS they have used on several key aspects: correctness, typicality, photo quality, attribute quality and comprehensive evaluation. The evaluation data build an open RSAIS evaluation library on the server side. Scores and tags are refreshed regularly as RSAIS data are updated, and RSAIS data with poor quality or accuracy are given a lower priority.
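
One way such scores could feed a priority ordering is sketched below. This is an illustration under stated assumptions, not the platform's scoring rule: the aspect keys, the equal weighting of aspects, the numeric rating scale and the choice to place unrated samples last are all hypothetical.

```python
from statistics import mean

# Aspect names follow the text; the identifiers themselves are assumptions.
ASPECTS = (
    "correctness",
    "typicality",
    "photo_quality",
    "attribute_quality",
    "comprehensive",
)


def sample_score(ratings):
    """Average one sample's user ratings into a single score.

    `ratings` is a list of dicts mapping each aspect to a numeric score.
    Equal aspect weights are an assumption; unrated samples return None.
    """
    if not ratings:
        return None
    return mean(mean(r[a] for a in ASPECTS) for r in ratings)


def rank_by_priority(ratings_by_sample):
    """Order sample identifiers from highest to lowest score.

    Unrated samples are placed last (an arbitrary choice for this sketch);
    ties are broken by identifier so the order is deterministic.
    """
    scored = {sid: sample_score(r) for sid, r in ratings_by_sample.items()}
    return sorted(
        scored,
        key=lambda sid: (scored[sid] is None, -(scored[sid] or 0.0), sid),
    )
```

Recomputing the ranking whenever new ratings arrive keeps poorly rated samples at the bottom of retrieval results without deleting them.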

Workflow of RSAIS big data construction in China
National geographic conditions monitoring is an important new mission of China and is vital for obtaining fundamental information on the natural, ecological and human activities of the land surface. The basic goal is to acquire annual, nationwide land cover datasets from remote sensing images with a resolution better than 1 meter.

Table 2: Sample counts of China RSAIS samples across 31 provinces

Massive RSAIS data display in two-dimensional and three-dimensional system
To display the massive RSAIS data accurately, both two-dimensional and three-dimensional display systems based on distributed parallel scheduling technology were built. The spatial distribution of some RSAIS data and photo samples in the two-dimensional display system are shown in Figure 3, and some other RSAIS data in the three-dimensional display system are shown in Figure 3.

Potential value of RSAIS big data
In the era of remote sensing big data, the RSAIS big data provides prior field knowledge for automatic image interpretation and has the potential to address the mechanism behind the worldwide problem of automatic image interpretation. In addition, a number of case studies were conducted to provide a scientific reference for expanding the application scenarios of the RSAIS big data. Big data calculation and analysis can uncover a large amount of valuable information.
Reflect the links between land cover types and ground terrain features: In the absence of field work, the RSAIS data can help to accurately distinguish land cover types that are easily confused in imagery.
Reflect the land cover types of areas with similar geographical environments: In adjacent or geographically similar areas that are difficult to reach for field work, image interpretation can be carried out through analysis of spectral similarity, texture similarity and geographic correlation.
Reflect the spatial distribution and diversity characteristics of land cover: Based on spatial analysis and statistical calculation, it is possible to obtain the spatial distribution and density characteristics of particular land covers across the administrative divisions of the country.
Reflect the spatial-temporal characteristics of specific vegetation: The RSAIS data are captured in different areas and at different times, so it is easy to analyze the morphological and image features of a specific land type, such as broadleaf arbor, in different seasons.
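
The statistical calculation mentioned for administrative divisions can be sketched in its simplest form as a per-division tally. The function name and the `(division, land_type)` input layout are assumptions made for illustration. Sample counts reflect where photos were taken, so the resulting shares describe sample composition rather than land cover area.

```python
from collections import Counter, defaultdict


def land_type_shares(samples):
    """Return, per division, the fraction of samples of each land type.

    `samples` is an iterable of (division, land_type) pairs.
    """
    counts = defaultdict(Counter)
    for division, land_type in samples:
        counts[division][land_type] += 1

    shares = {}
    for division, counter in counts.items():
        total = sum(counter.values())
        shares[division] = {lt: n / total for lt, n in counter.items()}
    return shares
```

Adding a season or acquisition month to the grouping key would give the same kind of tally for the spatial-temporal case.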

CONCLUSION
The RSAIS representation, storage, update and optimization techniques adopted in practice in China have proven to be an efficient and scientific method for RSAIS big data construction. The dynamically updated RSAIS data of China hold important knowledge of land cover and support an automatic image interpretation method developed on the basis of multi-scale segmentation and deep learning classification. In our future work, this nationwide prior knowledge shows potential for solving automatic image interpretation with the assistance of deep learning technology.
As the nationwide monitoring project is conducted every year, the RSAIS big data will gradually evolve into a spatial-temporal big data of prior field knowledge, containing a large amount of field knowledge of land cover across the country. This will also lay the foundation for natural resource surveys, geographical environment analysis and information mining.

Figure 3. The spatial distribution of the RSAIS data and photo sample in two-dimensional display system

Figure 5. The interface of the crowdsourcing update and open evaluation platform