REVEALING SPATIAL VARIATION AND CORRELATION OF URBAN TRAVELS FROM BIG TRAJECTORY DATA

With the development of information and communication technology, spatial-temporal data that contain rich human mobility information are growing rapidly. However, the consistency of multi-mode human travel behind multi-source spatial-temporal data is not clear. To address this, we used one week of taxi and bus GPS trajectory data and smart card data in Shenzhen, China to extract city-wide travel information for taxi, bus and metro, and tested the correlation of multi-mode travel characteristics. Both the global correlation and the local correlation of typical travel indicators were examined. The results show that: (1) Significant differences exist among urban multi-mode travels. The correlations between bus and taxi travel, and between metro and taxi travel, are globally low but locally high. (2) The correlation between bus, metro and taxi travel varies across space. These findings deepen our understanding of urban travel and thereby facilitate both transport policy making and human-space interaction research.


INTRODUCTION
Human travel plays an important role in transportation planning, traffic management, and urban spatial structure analysis. Traditionally, human travel studies have relied on questionnaires to acquire sufficient and accurate travel information, which usually requires considerable time and cost. Thanks to the development of spatial information technology, the acquisition of large-scale individual spatial-temporal data is becoming a reality. Alternative data sources, such as smart card data (SCD) (Zhou et al., 2017), mobile phone positioning data (Yue et al., 2016; TU et al., 2017), vehicle GPS data (TU et al., 2010; TU et al., 2016), and even SNS check-in data from web crawlers have been emerging. Using such multi-source data to reveal human travel patterns has become a hot topic. Nowadays, many studies use spatial-temporal data to analyze human travel. Research related to travel includes human mobility pattern analysis (Rhee et al., 2008; Wu et al., 2014), urban traffic flow simulation (Wu et al., 2012), critical transportation infrastructure (Fang et al., 2012), urban spatial structure analysis (Zhong et al., 2013), and urban land use (Liu et al., 2012).
The integration of information and communication technology (ICT) and spatial information technology helps to collect big trajectory data, which imply massive city-wide human travels. For example, TU et al. (2012) extracted all taxi trajectories across the Yangtze River and evaluated the vulnerability of critical transportation services. Liu et al. (2013) uncovered the association between human travel derived from taxi data and land use. These studies relied on a single type of data, taxi trajectories, and therefore made insufficient use of the available data; more generally, current human travel research typically utilizes only one source of trajectory data. On the other hand, big trajectory data not only challenge the existing approaches that use a single data source but also bring new research opportunities (Kang et al., 2013). One valuable question is whether urban travels derived from multi-source trajectories are consistent. The answer to this question will provide a comprehensive understanding of urban travel. Recently, Cao et al. (2016) investigated the relationship between taxi travel and human travel derived from mobile phone positioning data. However, the relationships between individual modes, such as bus, metro and taxi, have not been investigated.
This study aims to reveal the spatial pattern and the correlation of urban travel from multi-source trajectory data. By fusing bus GPS trajectories, smart card data, and metro GIS data, the bus travel (B) or metro travel (M) of an individual was extracted. Taxi travel (T) was summarized from massive taxi GPS trajectories. Then, an evaluation framework based on correlation theory was developed to investigate the consistency between them. The results indicate that the correlation between bus, metro and taxi travel is spatially differentiated. This provides valuable insights for trajectory-driven urban research and applications.

Study Area
The study was conducted in Shenzhen, China. The city comprises six administrative districts (Futian, Luohu, Nanshan, Yantian, Baoan and Longgang) and four functional areas (Guangming, Longhua, Pingshan and Dapeng) (Figure 1). The Transportation Commission of Shenzhen Municipality (TCSM) provides bus, metro and taxi services. As of 2014, the Shenzhen metro system had five subway lines and 118 stations (including 13 transfer stations) with a total length of about 178 km, covering six districts: Luohu, Futian, Nanshan, Baoan, Longgang and Longhua. The bus system had 874 bus lines and 5265 bus stations. The taxi system had 16135 taxis cruising on the road to serve travel demand. According to the annual report of TCSM, the three transport systems together carry, on average, more than 50% of the travels in the city.

Dataset
The dataset includes taxi GPS trajectories, bus GPS trajectories and smart card data. All of them were collected by TCSM from September 24 to 30, 2014. The taxi GPS trajectories record the locations of all taxis and their service statuses with a sampling interval of about 40 seconds. The bus GPS trajectories store the locations of all buses with a sampling interval of about 40 seconds. The smart card data indicate when an individual boards or alights from a bus, or enters or leaves a metro station. In total, there are 9107567 taxi GPS records, 10757142 bus GPS records, and 2466341 smart card records.
Using these massive multi-source trajectory data, we extracted individual travels from one or two types of trajectory data. Each travel is a 7-tuple, as in equation (1).
$$T = (id, x_o, y_o, t_o, x_d, y_d, t_d) \quad (1)$$
where id = the record id, x_o = the longitude of the origin of a travel, y_o = the latitude of the origin of a travel, t_o = the departure time of a travel, x_d = the longitude of the destination of a travel, y_d = the latitude of the destination of a travel, t_d = the arrival time of a travel.
Taxi travels are directly inferred from the taxi GPS data according to changes in the service status. Metro travels are retrieved from the smart card data. Bus travels are much more difficult to extract, as the smart card data only contain the card swiping time and not the location. By fusing the bus trajectories with the swiping times, the actual boarding bus stations are interpolated. In this way, we obtained one week of taxi, metro, and bus travels in Shenzhen. Table 1 summarizes the total ridership of the three modes. It shows that there are 2.82 million taxi travel records, 4332914 bus travel records and 9.84 million metro travel records in the studied week. The scale of the data affects the reliability of the spatial analysis results.
Without loss of generality, we used the traffic analysis zone (TAZ) as the basic spatial unit. A TAZ is composed of adjacent land parcels of similar nature and can therefore better reflect the spatial distribution of travels. Shenzhen is divided into 491 TAZs.
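To illustrate how a taxi travel can be inferred from changes in the service status, the following Python sketch scans the time-ordered GPS records of one taxi and emits a 7-tuple whenever the status switches from vacant to occupied and back. The record layout `(t, lon, lat, occupied)` and the names `Travel` and `extract_taxi_travels` are assumptions made for illustration; this is a minimal sketch, not the extraction procedure used in this study.

```python
from typing import NamedTuple


class Travel(NamedTuple):
    """The 7-tuple of equation (1); field names follow the text."""
    id: int
    xo: float  # origin longitude
    yo: float  # origin latitude
    to: int    # departure time (seconds)
    xd: float  # destination longitude
    yd: float  # destination latitude
    td: int    # arrival time (seconds)


def extract_taxi_travels(records):
    """Split one taxi's GPS records into travels.

    records: iterable of (t, lon, lat, occupied) sorted by time
             (hypothetical layout; occupied is the service status).
    A travel starts at the first occupied record after a vacant one
    and ends at the first vacant record after an occupied one.
    """
    travels = []
    origin = None
    prev_occupied = False
    for t, lon, lat, occupied in records:
        if occupied and not prev_occupied:
            origin = (lon, lat, t)  # passenger picked up
        elif not occupied and prev_occupied and origin is not None:
            xo, yo, to = origin     # passenger dropped off
            travels.append(Travel(len(travels), xo, yo, to, lon, lat, t))
            origin = None
        prev_occupied = occupied
    return travels
```

A travel that is still in progress at the end of the record stream is discarded in this sketch, since its destination is unknown.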

Travel indicators
According to previous literature (Deng et al., 2000; Yao et al., 2006; Huang et al., 2008; Chen et al., 2009; Jang et al., 2012), the transportation domain usually takes travel volume, travel distance, travel time, travel direction and travel purpose as indicators to describe human travel. Considering the extracted travel information, we chose three typical indicators to investigate the relationship between taxi, metro, and bus travels: travel volume, travel time, and travel distance. Using the travel records obtained in Section 2.2, we summarized the hourly travel indicators of taxi, bus, and metro respectively. In other words, TAZ-based time series of taxi, bus, and metro indicators were gathered for the further correlation analysis.
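The hourly summarization can be sketched as follows. The sketch groups the travels of one mode by TAZ and departure hour and reports the travel volume together with the mean travel time and mean travel distance. The input layout `(taz, depart_s, arrive_s, distance_km)` and the use of means are assumptions for illustration; the text does not state how travel time and distance are aggregated.

```python
from collections import defaultdict


def hourly_indicators(travels):
    """Aggregate travels of one mode into TAZ-by-hour indicators.

    travels: iterable of (taz, depart_s, arrive_s, distance_km),
             with times in seconds since midnight (assumed layout).
    Returns {(taz, hour): {"tv": count, "tt": mean seconds, "td": mean km}}.
    """
    acc = defaultdict(lambda: [0, 0.0, 0.0])  # count, total time, total distance
    for taz, depart_s, arrive_s, distance_km in travels:
        hour = (depart_s // 3600) % 24
        cell = acc[(taz, hour)]
        cell[0] += 1
        cell[1] += arrive_s - depart_s
        cell[2] += distance_km
    return {
        key: {"tv": n, "tt": total_t / n, "td": total_d / n}
        for key, (n, total_t, total_d) in acc.items()
    }
```

Applying the function separately to the taxi, bus and metro travels yields three sets of TAZ-by-hour series that can be compared pairwise.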

METHODOLOGY
The main idea of this study is to investigate the relationship between multi-mode urban travels using the Pearson correlation analysis approach; that is, the correlation level between the human travels of two modes is measured. Furthermore, considering the spatial effect, both global and local correlation analyses are conducted.

Correlation analysis
Correlation analysis is a statistical method to study the relationship between random variables (Li et al., 2012). It refers to the extent to which two variables have a linear relationship with each other. The correlation can be expressed by the correlation coefficient r, which lies in the range [-1, 1]. A correlation coefficient within the range (0, 1] indicates a positive correlation.
There are many approaches to measure the correlation level, such as the Pearson product-moment correlation, Spearman's rank correlation, and Kendall's tau rank correlation. The Pearson correlation coefficient is a widely used measure of the degree of correlation between two variables. The specific formula is as follows:
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \quad (2)$$
where x = a random variable, y = a random variable, $\bar{x}$ = the expected value of the random variable x, $\bar{y}$ = the expected value of the random variable y, n = the number of samples.
In this study, assuming that the travel indicators are independent, the random variables are the travel indicators selected in Section 2.3.
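As a concrete illustration of the Pearson coefficient, the short Python sketch below computes it from two equally long samples using only the standard library. The function name `pearson` is chosen here for illustration; returning NaN for a constant series is a design choice of this sketch, since the coefficient is undefined when either variance is zero.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a constant series
    return sxy / math.sqrt(sxx * syy)
```

For example, two hourly series that rise and fall together give a value close to 1, while series that move in opposite directions give a value close to -1.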

Global correlation
Depending on the variables used, the global correlation is divided into two categories: (1) spatial correlation, which uses the travel indicator statistics of each TAZ as variables; (2) temporal correlation, which uses the travel indicator statistics of each hour as variables.
The measure of global correlation is based on travel volume (tv), travel time (tt) and travel distance (td), that is, $s = (tv, tt, td)$ as travel indicators. Given two travel modes, the correlation coefficients $\delta_{tv}$, $\delta_{tt}$, $\delta_{td}$ of each indicator are first calculated with the Pearson correlation approach. Therefore, the correlations of a travel indicator between bus, metro and taxi are as follows:
$$r^{s} = \begin{pmatrix} 1 & \delta_{s}^{BM} & \delta_{s}^{BT} \\ \delta_{s}^{MB} & 1 & \delta_{s}^{MT} \\ \delta_{s}^{TB} & \delta_{s}^{TM} & 1 \end{pmatrix} \quad (3)$$
where B = bus, M = metro, T = taxi.
The elements on the diagonal of this matrix are 1. The off-diagonal elements are calculated using the Pearson correlation coefficient method. It should be noted that, assuming that the travel indicators have the same effect on travel, the correlation coefficients of the travel indicators can be summed with equal weights to get the global correlation coefficient matrix $R^{2}_{B-M-T}$, which describes the global correlation relationship between bus, metro and taxi travel:
$$R^{2}_{B-M-T} = \frac{1}{3} \sum_{s} r^{s} \quad (4)$$
where s = the selected travel indicator (tt, td, tv).
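The construction of the per-indicator matrices and their equally weighted combination can be sketched in Python as follows. The nested-dictionary input, the helper names, and the reading of "equally weighted" as a simple average over indicators are assumptions of this sketch rather than details confirmed by the text.

```python
import math

MODES = ("B", "M", "T")  # bus, metro, taxi


def pearson(x, y):
    """Pearson coefficient; assumes equal lengths and non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def indicator_matrix(series):
    """3x3 correlation matrix of one indicator.

    series: {"B": [...], "M": [...], "T": [...]}, one value per TAZ
            (spatial correlation) or per hour (temporal correlation).
    """
    return [
        [1.0 if a == b else pearson(series[a], series[b]) for b in MODES]
        for a in MODES
    ]


def global_matrix(indicators):
    """Equally weighted average of the per-indicator matrices.

    indicators: {"tv": series, "tt": series, "td": series}.
    """
    mats = [indicator_matrix(s) for s in indicators.values()]
    k = len(mats)
    return [[sum(m[i][j] for m in mats) / k for j in range(3)] for i in range(3)]
```

The resulting matrix is symmetric with ones on the diagonal, so only the three off-diagonal pairs (bus-metro, bus-taxi, metro-taxi) need to be reported.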

Local correlation
Local correlation focuses on travel at the TAZ scale. The local correlation calculation makes use of five travel indicators: travel time (tt), travel distance (td), travel volume of origins (tvo), travel volume of destinations (tvd), and travel direction (tdr), where tt and td are the same as in the above section. The additional tvo, tvd and tdr describe the travel in/out characteristics of a TAZ.
For each selected travel indicator, the correlation coefficient of two travel modes in a TAZ is first calculated. Following the philosophy of the above section, the correlation matrix of a TAZ is formed by equation (5). Again, the diagonal elements are 1, and each off-diagonal element is calculated using a travel indicator of two travel modes. Finally, the correlation matrix $R^{2}_{i,B-M-T}$ of the bus, metro and taxi travel in a TAZ is obtained by equation (6).
$$r_{i}^{s} = \begin{pmatrix} 1 & \delta_{i,s}^{BM} & \delta_{i,s}^{BT} \\ \delta_{i,s}^{MB} & 1 & \delta_{i,s}^{MT} \\ \delta_{i,s}^{TB} & \delta_{i,s}^{TM} & 1 \end{pmatrix} \quad (5)$$
$$R^{2}_{i,B-M-T} = \frac{1}{5} \sum_{s} r_{i}^{s} \quad (6)$$
where B = bus, M = metro, T = taxi, i = the index of a TAZ, s = the selected travel indicator.
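The TAZ-level calculation for one indicator and one pair of modes can be sketched as below. For each TAZ the sketch correlates the hourly series of the two modes and counts how many TAZs yield a defined coefficient and how many exceed a significance threshold. The dictionary layout, the function names and the default threshold of 0.40 (the value quoted in this paper for 24 hourly samples) are assumptions for illustration.

```python
import math


def pearson_or_none(x, y):
    """Pearson coefficient, or None when it is undefined."""
    n = len(x)
    if n < 2 or n != len(y):
        return None
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None  # e.g. a TAZ with no travel of one mode
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def local_summary(mode_a, mode_b, r_crit=0.40):
    """Correlate two modes TAZ by TAZ for one indicator.

    mode_a, mode_b: {taz_id: [hourly values]} (assumed layout).
    Returns (coefficients by TAZ, number of TAZs compared,
             number with a defined coefficient, number with |r| > r_crit).
    """
    coeffs = {}
    n_total = n_valid = n_significant = 0
    for taz in sorted(set(mode_a) & set(mode_b)):
        n_total += 1
        r = pearson_or_none(mode_a[taz], mode_b[taz])
        if r is None:
            continue
        coeffs[taz] = r
        n_valid += 1
        if abs(r) > r_crit:
            n_significant += 1
    return coeffs, n_total, n_valid, n_significant
```

Repeating the call for each indicator and averaging the coefficients per TAZ would give one combined value per TAZ and mode pair, which can then be mapped.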

Significance test
In order to evaluate the correlation level, it is necessary to test the significance of the correlation. When the sample size is larger than 20, the t statistic can be used to test the significance of the correlation coefficient (Box, 1987). The t statistic is given by equation (7).
$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^{2}}} \quad (7)$$
where r = the Pearson coefficient, n = the sample size.
In this study, for temporal correlation and local correlation, the sample size of each travel indicator is 24, so the degrees of freedom are df = n - 2 = 22. According to the critical values of the correlation coefficient, at the 0.05 significance level the threshold value is $r_{0.05}$ = 0.40. With respect to spatial correlation, the sample size is equal to the number of TAZs, so the degrees of freedom are df = 491 - 2 = 489. At the 0.05 significance level, the threshold value is $r_{0.05}$ = 0.09.
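The relation between the t statistic and the threshold values of the correlation coefficient can be checked with a short Python sketch. Inverting the t statistic gives the critical coefficient r = t / sqrt(t^2 + df). The two-tailed critical t values used below (about 2.074 for df = 22 and about 1.965 for df = 489) are taken from standard t tables and are assumptions of this sketch, as are the function names.

```python
import math


def t_statistic(r, n):
    """t statistic of a Pearson coefficient r computed from n samples."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    if abs(r) >= 1:
        return math.copysign(math.inf, r)  # perfect correlation
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def critical_r(t_crit, df):
    """Smallest |r| that is significant for a given critical t and df."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Two-tailed critical t values at the 0.05 level (assumed table values).
T_CRIT_DF22 = 2.074
T_CRIT_DF489 = 1.965

r_temporal = critical_r(T_CRIT_DF22, 22)    # threshold for 24 hourly samples
r_spatial = critical_r(T_CRIT_DF489, 489)   # threshold for 491 TAZs
```

Rounded to two decimals, the two thresholds reproduce the values 0.40 and 0.09 quoted above. A coefficient is significant at the 0.05 level when its absolute value exceeds the threshold or, equivalently, when the absolute value of its t statistic exceeds the critical t value.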

Global correlation results
From the spatial perspective, the distributions of bus, metro and taxi travel are generally different. Overall, the spatial correlation coefficients of bus-metro, bus-taxi and metro-taxi travel are 0.27, 0.06 and 0.09 respectively. Figure 3 displays the temporal distribution of the travel indicators.
From the temporal perspective, the correlation coefficients are higher than the spatial correlation coefficients. The Pearson correlation coefficients for metro and bus, taxi and bus, and taxi and metro are 0.96, 0.27, and 0.29 respectively. However, the coefficients for taxi and bus and for taxi and metro are not significant. The correlation coefficients of each travel indicator between bus, metro and taxi are presented in Table 6. This result indicates that the correlations of travel time for bus-taxi and metro-taxi are not significant. All three travel indicators of bus and metro are significantly correlated.

Local correlation results
The results show that the local correlations of the multi-mode travels differ from the global correlations, and that the correlation level varies among TAZs. As Table 7 shows, 86 TAZs are significantly correlated for bus and metro. For bus and taxi, only 12 TAZs are significantly correlated. However, no TAZ shows a significant relationship between metro and taxi.

CONCLUSION
In the era of big data, current studies that use a single source of spatial-temporal trajectory data to investigate urban travel patterns face great challenges. Understanding the relationship between multi-mode urban travels is essential to advance such studies. In this paper, we presented a data-driven framework to test the correlation level between bus, metro and taxi travel. Both global and local correlations were explored. The study shows that there are significant differences between travel modes, although there are some places with significant correlation coefficients. Detailed conclusions are listed below:
(1) There are significant differences between travel modes. The global correlations of bus-taxi and metro-taxi are low, while the temporal global correlation of bus-metro is high but its spatial global correlation is low. The temporal global correlation coefficients and the spatial global correlation coefficients of bus-taxi, metro-taxi and bus-metro are 0.25, 0.28, 0.96 and 0.09, 0.11, 0.27 respectively.
(2) The correlation levels of bus-metro, bus-taxi and metro-taxi differ among TAZs. The areas with high correlation coefficients of metro-taxi and metro-bus lie along the metro lines. The areas with high correlation coefficients of bus-taxi are scattered throughout the city.

Figure 1. Study area

Figure 2 displays the workflow. Using spatial or temporal travel indicators, both the spatial and the temporal correlation levels are tested. Then, using TAZ-based travel indicators, the local correlation is tested. Finally, the significance level is tested.

Figure 3
Figure 3 displays the distribution of local correlation coefficients. This result suggests the following: (1) The TAZs with high correlation coefficients between bus and metro appear along the metro lines, most of them along line 1 and line 3. This is because bus and metro cooperate to serve daily commuting travel; both are mainly driven by the cyclical home-work rhythm. (2) Regarding the relationships between metro and taxi and between bus and taxi, there are a few TAZs with high correlation values, although the global correlation is low, as shown in Section 4.1. (3) Most of the highly correlated TAZs appear in the city center, Futian and Luohu, as Figure 5 shows.

Mode pair   TAZ   r ≠ 0   |r| > r_0.05
B-M         491   87      86
B-T         491   465     12
M-T         491   87      0
Table 7. The summary of local correlation coefficients

Table 1. Summary of daily public ridership in Shenzhen, China

Table 3 reports the correlation coefficients of each pair of travel modes. It suggests that bus and metro, metro and taxi are significantly correlated at the 0.05 level, but the correlation is weak. Metro and taxi are not significantly correlated. Table 4 gives the Pearson correlation coefficients of each pair of travel modes for each indicator. The indicator-level correlation results indicate that, compared to travel time and travel volume, bus, metro, and taxi travel are more weakly correlated in travel distance.

Table 4. Spatial correlation coefficients of each travel indicator (B = bus, M = metro, T = taxi)

Table 6. Temporal correlation coefficients of each travel indicator (B = bus, M = metro, T = taxi)