MONITORING FARMLAND LOSS CAUSED BY URBANIZATION IN BEIJING FROM MODIS TIME SERIES USING HIERARCHICAL HIDDEN MARKOV MODEL

In this study, we proposed a method to map urban encroachment onto farmland using satellite image time series (SITS) based on the hierarchical hidden Markov model (HHMM). In this method, the farmland change process is decomposed into three hierarchical levels, i.e., the land cover level, the vegetation phenology level, and the SITS level. Then a three-level HHMM is constructed to model the multi-level semantic structure of farmland change process. Once the HHMM is established, a change from farmland to built-up could be detected by inferring the underlying state sequence that is most likely to generate the input time series. The performance of the method is evaluated on MODIS time series in Beijing. Results on both simulated and real datasets demonstrate that our method improves the change detection accuracy compared with the HMM-based method. * Corresponding author


INTRODUCTION
Urbanization has brought convenience to human living conditions.At the same time, it also leads to various environmental problems such as air pollution, water contamination, and local climate change, etc (Qiao et al., 2013).China has experienced rapid industrialization and urbanization since late 1970s.According to the National Bureau of Statistics of China 2010, urban areas have increased from 17.6% to 46.6% during the period 1978 to 2009, mainly converted from agricultural land.Large amount of farmland loss due to urbanization not only poses a great threat to the security of food provision, but also lead to severe ecological system degradation (Song and Deng, 2014).Therefore, monitoring the conversion of farmland to built-up land is one of the most important issues of land use/land cover change (LUCC) research.
Satellite remote sensing can be a powerful tool for detecting land use/land cover change.Most prior studies used postclassification method to detect urbanization-induced farmland change with bi-temporal or multi-temporal satellite images (Tian et al., 2014, Kraemer et al., 2015).However, this method has its limitations.On one hand, the accuracy of change detection is totally dependent on the accuracy of individual classification processes.The classification errors in the initial classification phase are compounded, leading to an unsatisfied post-classification comparison result (Ridd et al., 1998).On the other hand, it is not easy to discriminate true changes from seasonal variations in the vegetation (Coppin et al., 2004).The uncertainty of spectral characteristics due to various crop types and planting patterns makes it even harder to apply a uniform classification method to extract farmland.
Satellite image time series (SITS) record the continuous change process of the earth's surface, which have been widely applied to monitor urban land cover conversions (Xue et al., 2014, Guo and Gong, 2016, Song et al., 2016).Through spectral-temporal trajectory analysis, real land cover changes can be separated from other seasonal changes.In a previous study, we proposed a continuous change detection and classification algorithm based on SITS using hidden Markov model (HMM), named HCCDC (Yuan et al., 2015).It demonstrated that the HMM is a powerful tool to distinguish changes in time series of different land cover types.In this study, a novel method is developed that uses a hierarchical hidden Markov model (HHMM) to identify areas of farmland occupied by built-up land.HHMM is a generalization of the HMM which provides a way to model complex multilevel sequences.It can model multi-level structures that naturally exist in many domains, such as speech, text, and handwriting, and has been extensively studied in those fields.However, it is rarely used to model the land use/land cover change process.Specifically, the objectives of this study are (1) to establish an HHMM that describes the farmland change process and (2) to derive position and timing of farmland change at a per-pixel basis by model inference, and (3) to apply this method in monitoring urbanization-induced farmland loss from MODIS time series.

STUDY AREA AND DATA
As a case study, the proposed method is applied to monitoring urban encroachment onto farmland in Beijing using 10-year MODIS time series for 2001 to 2010.As the capital city of China, Beijing has undergone fast urbanization in recent decades, where agricultural land has been declining dramatically every year, mostly converting to urban or related uses (Tian et al., 2014).
A total of 230 MODIS 16-day composite 250m (MOD13Q1) images are used for experiments.All the spectral reflectance bands (blue/red/NIR/MIR) are adopted.The long time span, frequent revisit time, high data quality, and simple data preprocessing make MODIS imagery suitable for long-term land cover change detection.Although the spatial-resolution of MODIS images is relatively low, they are able to validate the methodology.
MODIS images are pre-processed to remove cloud and snow pixels with the VI Quality band, then Fourier regression fitting is implemented on each spectral band to reconstruct highquality time series datasets.ESA Global Land Cover (GlobCover) map version 2.3 for 2009 (Bontemps et al., 2015) is used as ancillary data for getting training samples (Figure 1).Assuming no farmland change has occurred in 2009, we use the annual time series as training samples for farmland and built-up.
Figure 1.GlobCover2009 map in the study area.

Hierarchical Hidden Markov Model
The HHMM, first introduced by Fine et al (1998), is derived from the HMM and provides better modelling of domains with hierarchical structures.The HHMM generalize the standard HMM to include a hierarchy of hidden states.An HHMM consists of two kinds of states: internal states and production states.Each internal state has its own sub-states.An internal state in the high-level recursively activates one of its sub-states until a production state in the lowest level is reached.A transition between high level states is activated only when the lower level model reaches the final state.Observations are only emitted by the production states.
The HHMM is able to correlate structures occurring relatively far apart in observation sequences, while maintaining the simplicity and computational tractability of simple Markov processes.In addition, it is capable of handling statistical inhomogeneity commonly exist in many complex time series datasets (Fine, et al., 1998).

Semantic Analysis of Farmland Change Process
Different land cover types have inherent phenological patterns, which cause distinct seasonal variations in the spectral-temporal trajectory of a pixel.According to this idea, we decompose the farmland change process into three hierarchical levels, as shown in Figure 2.Here only the change from farmland to built-up land is considered, but this is not the limitation of the algorithm itself.
The top level is the land cover level, which shows annual land cover change, namely, unchanged farmland, farmland-to-builtup, and unchanged built-up.The dotted lines in Fig. 1(a) indicate valid change directions (a change from built-up to farmland will not tend to occur).The second level is the vegetation phenology level that describes phenological phases of vegetation growth for each land cover type.It should be noted that due to the existing of different kinds of vegetation within each land cover type, the actual intra-annual phenological changes of farmland and built-up are diverse.The bottom level is the SITS level denoting the temporal evolution of the spectral reflectance or spectral index.During each phenological phases, time series can be considered as a stable signal.
Figure 2.An illustration of the farmland change process.

HHMM structure Definition
A three-level HHMM is proposed to model the multi-level semantic structure of farmland change process.Figure 3 illustrates the model structure of the proposed HHMM.
Level 1 is the root state 0 q that initiates a new stochastic sequence generation.Level 2 (the top HMM) and Level 3 (the bottom HMM) represent the land cover level and vegetation phenology level, respectively.The top HMM has two internal states, i.e., 1 1 q and 1 2 q , corresponding to farmland and built-up, respectively.The root state always initiates 1 1 q because the initial land use type of the input time series is farmland.The arrow lines in the top HMM represent inter-annual land cover changes.Once the state in the top HMM enters 1 2 q , it will stay in 1 2 q and never return to 1 1 q .The states in the bottom HMM (except for the final state) are implicitly related to crude phenological phases of farmland or built-up.They are production states and emit observations of the satellite images.The final state 2 e q controls the termination of the stochastic state activation process.
The same number of states is set for both bottom HMMs without loss of generality.A left-right model topology with no skip path is chosen to accommodate the intra-annual phenological variations.The probability of observations emitted from the state in the bottom HMM is modelled with a Gaussian mixture model (GMM), which has been proven to closely approximate any probability density functions (PDFs) using a mixture of a finite number of Gaussian components.

HHMM Parameter Learning and Inference
In addition to the model structure, an HHMM is characterized by the state transition probability between the internal states and the output probability distribution of the production states.
According to the model definition, the vertical transition probabilities have been determined.In this study, the undetermined parameters of the HHMM are derived through the following three steps.
1) Train two individual HMMs by Baum-Welch algorithm for farmland and built-up separately.Then the parameters of all the bottom HMMs are obtained.
2) Estimate the transition probability from farmland to built-up according to the statistical data of the study area.In order to reduce omission error, this parameter is recommended to set higher than the estimate value.Then the transition matrix of the top HMM is acquired.
3) Assign the corresponding parameters to the HHMM.
Once the HHMM is established, we can detect farmland change by inferring the hidden state sequence in the top HMM that is most likely to generate the input SITS.First, the time series of a pixel is input into the HHMM.Then the most probable state in the top HMM at each time step is estimated by the generalized Viterbi algorithm.If it converts from 1 1 q to 1 2 q , the pixel is considered to have changed from farmland to built-up.The year in which the state converts is the change year detected.

Experiment on Simulation Dataset
For method validation, simulation data are adopted.In this experiment, the number of states in the bottom HMM is set to 5, and the number of Gaussian mixture components (GMCs) is set to 10.
The simulation dataset includes 1000 simulated changed and unchanged time series, separately.To generate a changed time series, one sample each from the training sets of farmland and built-up are chosen randomly at a time, and then they are connected end to end to construct a two-year time series.In the same way, unchanged farmland samples are generated as well.Experimental results on the dataset indicate a 95.9% change detection rate with a 4.4% false alarm rate.

Experiment on Real Dataset
The performance of the proposed method is evaluated on a validation set for different model structures.The validation set is composed of a total of 500 pixels, in which 250 pixels are farmland areas that are persistent throughout the period, and 250 pixels convert from farmland to built-up.The test pixels are collected by visual interpretation from Google Earth highresolution images (Figure 4).This validation set has been used in our previous study (Yuan et al., 2015) and the results for both algorithms are compared.The performance of HHMM using different numbers of states and GMCs in the bottom HMM is evaluated.As shown in Figure 5, the change detection rates in all the cases are higher than 92.4% with the false alarm rates lower than 3.6%.The best result is obtained using 8 states and 30 GMCs, with a change detection rate of 98.4% and 4.4% false alarm rate.In comparison with the HCCDC algorithm, of which the change detection rate is 94.80% with a false alarm rate of 0.40%, the proposed method improves the change detection rate in charge for a small increase in the false alarm rate.The confusion matrices for HHMM and HCCDC are shown in Table 1 and Table 2 respectively.The overall accuracy of the proposed method is also slightly better than the HCCDC algorithm.
In the HCCDC algorithm, the commission errors mainly come from the switching of crop types.In comparison, this method is more robust to interannual vegetation changes within farmland.Figure 6 illustrates an unchanged farmland time series and its corresponding state sequence inferred by the HHMM.Though the crop type of the pixel has changed in 2008, it is always identified as farmland in the HHMM (with different state durations in the bottom HMM).This might be due to fact that the method uses the time series of various crops to train a multimodel HMM, in order to model complex phenological changes in farmland.At the same time, the method can make full use of the prior knowledge of farmland change process the as well as the information of the entire time series to detection changes.
The method is applied to detect farmland changes in Beijing during the time of analysis.5 states and 10 GMCs are used in the experiment.Experimental results on the simulated and real data demonstrate good performance of the proposed method.

CONCLUSION
In this study, we proposed a novel method to monitor urban encroachment onto farmland using SITS.In this method, the farmland change process is modelled by a three-level HHMM.
A change from farmland to built-up is detected by inferring the underlying state sequence that is most likely to generate the input SITS.
The performance of the method is evaluated on both simulated and real changed MODIS time series.The experimental result demonstrated that our method outperforms the HMM-based method with a lower change detection rate.It is because our method is more robust to interannual vegetation changes within farmland.
However, this study also has limitation.Conversions from farmland to other land use types are ignored in this method, in order to simplify the model design.As further research, more land use types are going to be integrated in the model.

Figure 3 .
Figure 3.The HHMM of depth 3 constructed for the farmland change process.

Figure 4 .
Figure 4. MODIS time series of a changed pixel and its corresponding area on the Google Earth imagery.
Figure 7 shows the change detection results in Changping and Shunyi districts in Beijing.Cropland areas were changed dramatically in those regions.In order to highlight the changed pixels (shown in yellow), they are overlapped with a Landsat TM image acquired in August 8, 2010.Most of the changed pixels are detected from the visual point of view.The commission errors are mostly due to conversions from farmland to non-built-up land, such as forest.This is because our model simplifies the process of farmland change.

Figure 5 .
Figure 5. Performance comparison between models trained with different number of states and GMCs.

Figure 6 .
Figure 6.A farmland time series and its corresponding state sequence in the bottom HMM.

Figure 7 .
Figure 7. Chang detection results in Changping and Shunyi districts in Beijing.

Table 1 .
The confusion matrix for HHMM with 8 states and 30 GMCs.

Table 2 .
The confusion matrix for HCCDC.