RESEARCH ON GIS MASSIVE TRAFFIC DATA ANALYSIS PLATFORM BASED ON HADOOP

ABSTRACT: To address the storage and computation limits of traditional GIS platforms when handling massive traffic data, this paper proposes a Hadoop-based GIS massive traffic data analysis platform. The platform uses MapReduce as its distributed programming model to analyze massive data for urban traffic decision-making, and uses the HDFS distributed file system to store and manage traffic data at the TB or even PB level. The results are displayed using GIS spatial visualization technology, and the effects of data volume and the number of cluster nodes on computation time are analyzed and compared. The experimental results show that a distributed multi-node cluster effectively improves the storage and computing efficiency for massive traffic data and greatly reduces the total task scheduling time.


INTRODUCTION
With rapid economic development, people's quality of life has improved and urban car ownership has increased rapidly. GIS, GPS and other spatial information technologies make it possible to monitor the operation of the traffic system in real time and to provide drivers with accurate road conditions and the best driving route. Traffic text, image, video and other data collected by various types of vehicle networks have grown explosively, and these data increasingly show the characteristics of large volume, multiple data types, low value density, fast processing speed and increased complexity, namely "4V+1C" [1]. Because urban traffic data are large in volume and highly time-sensitive, current intelligent transportation systems extract traffic data in real time through the card, GIS and other equipment, but traditional data storage and processing technology no longer meets their needs [2]. Therefore, how to expand the data structure and storage capacity flexibly, and how to access, upload, aggregate and store traffic data accurately and efficiently in real time, has become a major problem. Big data technology represented by Hadoop can store and process massive traffic flow data in real time and provide more accurate analysis results.
Early on, the most commonly used traffic flow theoretical models were car-following theory [3], the vehicle kinematics model [4], the cellular automata model [5], etc. References [6][7][8] describe traffic flow data mining algorithms in more detail; these algorithms are mainly divided into frequent pattern mining, clustering analysis and classification analysis. Reference [9] proposes a method to integrate dynamic traffic conditions (DRC), such as traffic accidents, into a passenger sharing system to avoid unexpected delays caused by re-planning routes. Jiang Z et al. [10] proposed a Vehicle Cloud Computing (VCC) system in which, because vehicle motion is uncertain, a task replication method is used and the optimal strategy is obtained through value iteration.
Because of Hadoop's powerful storage and parallel computing capabilities, many scholars at home and abroad use Hadoop for information mining and analysis of massive data. To deal with imbalanced big data, López V et al. proposed a classification algorithm based on fuzzy rules. The algorithm uses the MapReduce framework to distribute the computation of the fuzzy model, and adds cost-sensitive learning to the design to deal with the uncertainty introduced by the large amount of data without ignoring the learning of undervalued classes [11]. To address frequent and occasional traffic congestion, Sheng Zihao [12] used cross-validation to input information flow into a vector classifier and then used a statistical method to distinguish frequent from occasional congestion. Reference [13] proposes a MapReduce-based big data mining method for gliding trajectories: by combining MapReduce, the distributed computing framework of the Hadoop platform, with a mining algorithm, the trajectory characteristics of taxis are extracted and analyzed. Chen Xiaobo et al. [14] proposed a Least Square Support Vector Regression (LSSVR) model optimized by a sparse hybrid genetic algorithm to predict short-term traffic flow.
Many traditional GIS traffic management and analysis systems at home and abroad cannot effectively store and process massive data in data centers [15][16][17]. Therefore, this paper combines Hadoop's distributed computing and storage with geographic information system and database technology to propose a Hadoop-based GIS massive traffic data analysis platform. The platform integrates collection, screening, storage, display and other functions, and displays the data in the browser.

MAPREDUCE WORKFLOW IN HADOOP
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020. International Conference on Geomatics in the Big Data Era (ICGBD), 15-17 November 2019, Guilin, Guangxi, China.
The MapReduce model, designed for distributed environments, has considerable advantages in parallel processing. It takes care both of scheduling work to achieve load balancing and of ensuring smooth data communication. Users do not need to consider how MapReduce is implemented when coding; they only need to understand the Map mechanism, which divides data sets, and the Reduce mechanism, which aggregates results. Therefore, the data file submitted by the user must be splittable into many blocks, each of which can complete its own computing task, as shown in Figure 1.
The platform is built on Hadoop and comprises a physical layer, a resource pool, a data acquisition layer, a cloud storage and distributed computing layer, and an application interface layer. The whole platform structure is shown in Figure 2.
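The Map and Reduce mechanisms described in this section can be illustrated with a small in-process simulation. This is only a sketch under stated assumptions, not the platform's implementation and not the Hadoop API: the record layout ("vehicle_id,road_id,speed_kmh"), the sample records and all function names are hypothetical, and the job chosen here (average speed per road segment) is an illustrative example.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical sample records: "vehicle_id,road_id,speed_kmh"
RECORDS: List[str] = [
    "v1,r1,40.0",
    "v2,r1,60.0",
    "v3,r2,30.0",
    "v4,r2,45.0",
    "v5,r2,60.0",
]


def split_blocks(records: List[str], block_size: int) -> List[List[str]]:
    """Divide the input into blocks that can be processed independently."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def map_block(block: List[str]) -> Iterator[Tuple[str, float]]:
    """Map step: emit one (road_id, speed) pair per record in the block."""
    for line in block:
        _vehicle_id, road_id, speed = line.split(",")
        yield road_id, float(speed)


def shuffle(pairs: List[Tuple[str, float]]) -> Dict[str, List[float]]:
    """Group all mapped values by key, as happens between Map and Reduce."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_group(road_id: str, speeds: List[float]) -> Tuple[str, float]:
    """Reduce step: aggregate the values of one key into an average speed."""
    return road_id, sum(speeds) / len(speeds)


def run_job(records: List[str], block_size: int = 2) -> Dict[str, float]:
    """Run split -> map -> shuffle -> reduce sequentially in one process."""
    blocks = split_blocks(records, block_size)
    mapped = [pair for block in blocks for pair in map_block(block)]
    grouped = shuffle(mapped)
    return dict(reduce_group(road, speeds) for road, speeds in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

In a real cluster the blocks would be mapped on different nodes in parallel; here they are processed one after another, so the sketch shows only the data flow, not the speed-up. The result does not depend on the block size, which is the property that allows the input file to be split.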

Massive Traffic Data Analysis Platform Based on Hadoop
In this paper, the overall design is mainly based on MVC [19].
(1) The data to be processed are converted from their original format into a data processing format that the Hadoop platform supports, and then the file reading class is called to read the data in batches.
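Step (1) can be sketched as follows. The paper does not name the source format, the target format or the file reading class, so everything here is an assumption made for illustration: the input is taken to be comma-separated text with a header row, the target is header-less, tab-separated lines (one record per line), and `to_tsv_lines` and `read_batches` are hypothetical helper names.

```python
import csv
import io
from itertools import islice
from typing import Iterator, List, TextIO


def to_tsv_lines(raw_csv: str) -> str:
    """Convert CSV text with a header row into header-less, tab-separated lines."""
    reader = csv.reader(io.StringIO(raw_csv))
    next(reader)  # drop the header row; only data records are kept
    return "".join("\t".join(row) + "\n" for row in reader)


def read_batches(stream: TextIO, batch_size: int) -> Iterator[List[str]]:
    """Yield lists of at most batch_size lines until the stream is exhausted."""
    while True:
        batch = [line.rstrip("\n") for line in islice(stream, batch_size)]
        if not batch:
            return
        yield batch


# Hypothetical sample input
RAW = "vehicle_id,road_id,speed_kmh\nv1,r1,40.0\nv2,r1,60.0\nv3,r2,30.0\n"
CONVERTED = to_tsv_lines(RAW)
BATCHES = list(read_batches(io.StringIO(CONVERTED), batch_size=2))

if __name__ == "__main__":
    for number, batch in enumerate(BATCHES, start=1):
        print(number, batch)
```

Reading in fixed-size batches keeps memory use bounded regardless of file size; an in-memory stream stands in for the real file here.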
