MULTITEMPORAL CROP TYPE CLASSIFICATION USING CONDITIONAL RANDOM FIELDS AND RAPIDEYE DATA

Crop type classification with multitemporal imagery is nowadays often performed with classifiers originally developed for single images, such as support vector machines (SVM). These approaches do not model temporal dependencies explicitly. Existing approaches that do exploit temporal dependencies are in most cases quite simple and rule-based; approaches that integrate temporal dependencies into statistical models are rare and at an early stage of development. Our approach CRFmulti, based on conditional random fields (CRF), is intended as a contribution to this direction. Conditional random fields consider context knowledge among neighboring primitives in the same way as Markov random fields (MRF) do; in addition, they take into account the feature vectors of the neighboring primitives and not only their class labels. Besides spatial context, we present an approach for multitemporal data processing in which a temporal interaction potential is integrated into the common CRF model to capture temporal dependencies. The classification works on pixel level using spectral image features, with all available single images treated separately. For our experiments a high resolution RapidEye satellite data set of 2010 is used, consisting of four images acquired over the vegetation period from April to October. Six crop type categories are distinguished: grassland, corn, winter crop, rapeseed, root crops and other crops. To evaluate the potential of the new conditional random field approach, the classification result is compared to a manual reference on pixel and on object level. Additionally, an SVM classification is applied under the same conditions and serves as a benchmark.


INTRODUCTION
In 2008 the German RapidEye system, a constellation of five satellites, was launched. Each satellite has five spectral bands (blue, green, red, red edge and near infrared) with a GSD of 6.5 m. This system, like many other recently launched high resolution optical remote sensing satellites, increases the availability of multitemporal image data. These data can be used to enhance classification accuracy and to analyze land cover changes. While in the latter case one has to deal with potential class transitions, in the first case the class remains unchanged. Nevertheless, the appearance of the individual classes may change between epochs, and this can be exploited to achieve higher classification accuracy.
In this work we present two CRF-based approaches for multitemporal crop type classification of high resolution optical remote sensing data. The first one, called CRFall, is a common CRF approach with a 2D-grid graph structure. For this approach, all features extracted from all images are concatenated into one feature vector per pixel. The second approach, CRFmulti, is based on an extension of the CRF concept by an additional temporal interaction potential. Here each pixel of each image is represented as one node in a 3D-grid. The decision for a pixel belonging to one class is based on the features extracted at the pixel site and on spatial and temporal context. Although this approach was originally developed for change detection, in this work we investigate its potential for multitemporal crop type analysis, where no class transitions occur during one vegetation period. Neither approach requires an existing land cover map. Nevertheless, the results are finally overlaid with cropland objects of an existing GIS to evaluate the geometric quality. The approaches are demonstrated on a set of four multispectral RapidEye images. Additionally, an SVM classification that serves as a benchmark is performed.
In chapter 2 an overview of related work on multitemporal crop type classification and on CRF is given. Next, the data and the extracted features for our tests are described. Chapter 4 deals with the principle of CRF and our implementation. A brief overview of the applied SVM can be found in chapter 5. Finally, chapter 6 presents the results of our experiments.

Multitemporal crop type classification
In the field of multitemporal crop type classification, three main directions of approaches can be observed. In the first category, powerful classifiers or combinations of several classifiers that were originally developed for single image classification are used. The temporal dependencies are not modeled explicitly and are only implicitly contained: to use a time series of images simultaneously, the multitemporal images are simply stacked into one image. The contributions of Bruzzone et al. (2004), Gislason et al. (2006) and Waske and Braun (2009), as well as our approach CRFall, belong to this category. Bruzzone et al. (2004) propose a multi-classifier system and apply three classifiers in parallel, one of them considering the k nearest neighbors of each pixel. Decision-tree ensembles (Random Forest) are applied by Gislason et al. (2006) and Waske and Braun (2009).
The second category comprises rule-based approaches. One such approach (2008) calculates NDVI profiles that are evaluated using combinations of different thresholds. Lucas et al. (2007) propose a rule-based classification using the software eCognition; the approach leads to a quite complex rule base that is difficult to handle. De Wit and Clevers (2004) apply a pixel-wise Maximum Likelihood classification and combine the result with an evaluation of the NDVI profiles with respect to the phenological behavior.
The third category of approaches integrates temporal dependencies into statistical models. Our approach CRFmulti, based on conditional random fields, continues this line of work; to assess the developed statistical methodology, an SVM classifier is chosen as the state-of-the-art reference. Examples of the third category are Feitosa et al. (2009) and Melgani and Serpico (2004). Feitosa et al. (2009) model temporal dependencies by Markov chains for detecting land cover transitions in Landsat images, but spatial context is not taken into account. In Melgani and Serpico (2004) the Markov Random Field (MRF) framework is extended by a temporal energy term based on a transition probability matrix in order to improve the classification results for two consecutive images.

Conditional random fields
The interaction between neighboring image sites (pixels or segments) in MRF is restricted to the class labels, whereas the features extracted from different sites are assumed to be conditionally independent. This restriction is overcome by Conditional Random Fields (CRF), which were introduced by Lafferty et al. (2001) for classifying one-dimensional data. CRF provide a discriminative framework that can also model dependencies between the features from different image sites and interactions between the labels and the features. They were first applied to image data by Kumar and Hebert (2003) for the detection of man-made objects in terrestrial images.
In remote sensing, CRF have been used for the classification of settlement areas in high-resolution optical satellite images (Zhong and Wang, 2007; Hoberg and Rottensteiner, 2010) and for generating a digital terrain model from LiDAR data (Lu et al., 2009). Roscher et al. (2010) classified crop types amongst other land cover classes in monotemporal Landsat data and achieved an accuracy of approximately 70% for the crop type classes. Hoberg et al. (2010) applied CRF to multitemporal high resolution data in order to enhance classification accuracy in no-change areas as well as to detect changes. In most approaches applying CRF to imagery, the graph is constructed as a regular grid with nodes representing pixels or square patches. In contrast, Wegner et al. (2011) use an irregular graph derived from a mean-shift segmentation for their CRF-based building detection approach, based on features generated from aerial images and airborne InSAR data.

RapidEye data
RapidEye is a constellation of five satellites launched in 2008. Each satellite has five spectral bands (blue, green, red, red edge and near infrared) with a GSD of 6.5 m and a dynamic range of 12 bit. Parts of the time series used are depicted in Figure 1. For the experiments, automatically preprocessed RapidEye orthophotos are used. The preprocessing includes orthorectification, resampling to a pixel size of 5 m and an automatic atmospheric correction. The images were acquired between April and October 2010.

Features
The spectral bands green, red, red edge and near infrared are crucial for the classification of vegetation classes; only these bands are taken as input for feature extraction. During feature extraction, the mean value of each band is calculated within a filter radius of 5 pixels, i.e. a filter window of 11×11 pixels. The four available images of 2010 thus lead to a total of 16 features. For both approaches (CRF and SVM) the same features are taken as input to allow a direct comparison.
Because the focus of this work is not on feature selection, and because tests with gradient-based features did not improve our results, we decided to use only these simple features.
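The feature extraction described above (per-band mean in an 11×11 window, four bands × four epochs = 16 features) can be sketched as follows. This is a minimal illustration, not the authors' code; the border handling (edge padding) is an assumption, since the paper does not specify it.

```python
import numpy as np

def mean_filter_features(images, radius=5):
    """Site-wise features: per-band mean in a (2*radius+1)^2 window.

    images: list of arrays of shape (rows, cols, bands), one per epoch.
    Returns an array of shape (rows, cols, n_epochs * bands).
    Border pixels use edge padding (an assumption of this sketch).
    """
    feats = []
    k = 2 * radius + 1
    for img in images:
        padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)),
                        mode="edge")
        for b in range(img.shape[2]):
            band = padded[:, :, b]
            # integral image for a fast box mean
            ii = np.cumsum(np.cumsum(band, axis=0), axis=1)
            ii = np.pad(ii, ((1, 0), (1, 0)))
            rows, cols = img.shape[:2]
            s = (ii[k:k+rows, k:k+cols] - ii[:rows, k:k+cols]
                 - ii[k:k+rows, :cols] + ii[:rows, :cols])
            feats.append(s / (k * k))
    return np.stack(feats, axis=-1)
```

With four RapidEye epochs and four bands each, the last axis of the result has the 16 features used in the experiments.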

Principle
In many classification algorithms the decision for a class at a certain image site is based only on information derived at the regarded site, where a site might be a pixel, a square block of pixels in a regular grid or a segment of arbitrary shape. In fact, the class labels and also the data of spatially and temporally neighboring sites are often similar or show characteristic patterns, which can be modeled using CRF. In monotemporal classification, we want to determine the vector of class labels x, whose components x_i correspond to the classes of the image sites i ∈ S, from the given image data y by maximizing the posterior probability P(x | y) (Kumar and Hebert, 2006):

P(x | y) = (1/Z) · exp[ Σ_{i∈S} A_i(x_i, y) + Σ_{i∈S} Σ_{j∈N_i} I_ij(x_i, x_j, y) ]   (1)

In Equation 1, N_i is the spatial neighborhood of image site i (thus, j is a spatial neighbor of i), and Z is a normalizing constant, called the partition function. The association potential A_i links the class label x_i of image site i to the data y, whereas the term I_ij, called interaction potential, models the dependencies between the labels x_i and x_j of neighboring sites i and j and the data y. The model is very general in terms of the definition of the functional model for both A_i and I_ij; refer to (Kumar and Hebert, 2006) for details.

P(x | y) = (1/Z) · exp[ Σ_{t∈T} Σ_{i∈S} ( A_i^t(x_i^t, y^t) + Σ_{j∈N_i} IS_ij^t(x_i^t, x_j^t, y^t) + Σ_{k∈K_i} IT_i^tk(x_i^t, x_i^k, y^t, y^k) ) ]   (2)

In Equation 2, y^t and y^k are the images observed at epochs t and k, respectively. The association potential A_i^t is identical to A_i for epoch t in Equation 1. The second term in the exponent, IS_ij^t, is identical to I_ij for epoch t in Equation 1, but it is called spatial interaction potential in order to distinguish it from the third term in the exponent, the temporal interaction potential IT_i^tk. K_i is the temporal neighborhood of image site i at epoch t; thus k is the time index of an epoch that is a "temporal neighbor" of t. The temporal interaction potential models the dependency of the class labels at consecutive epochs and the observed data. In our case the image sites are chosen to be pixels and are thus ordered in a regular grid (Figure 2). We model the CRF to be isotropic and homogeneous; hence the functions used for A_i^t, IS_ij^t and IT_i^tk are independent of the location of image site i.
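The monotemporal model of Equation 1 can be made concrete on a toy grid. The sketch below evaluates the (unnormalized) log posterior for a given labeling and, for tiny grids only, the normalized P(x | y) by brute-force enumeration of the partition function Z; the concrete potentials passed in are placeholders, not the paper's models.

```python
import numpy as np
import itertools

def log_posterior_unnorm(labels, assoc, inter):
    """Unnormalized log P(x|y) of Eq. 1 on a regular 2D grid.

    labels: (rows, cols) int array of class labels x_i.
    assoc:  (rows, cols, C) array, assoc[r, c, k] = A_i(x_i=k, y).
    inter:  callable inter(c1, c2) -> I_ij for a neighboring pair
            (an isotropic, homogeneous model, as in the paper).
    """
    rows, cols = labels.shape
    e = assoc[np.arange(rows)[:, None], np.arange(cols)[None, :], labels].sum()
    # 4-neighborhood; each edge counted once (right and down)
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                e += inter(labels[r, c], labels[r, c + 1])
            if r + 1 < rows:
                e += inter(labels[r, c], labels[r + 1, c])
    return e

def posterior(labels, assoc, inter):
    """P(x|y): normalize over all label configurations (toy sizes only)."""
    rows, cols = labels.shape
    C = assoc.shape[2]
    logs = [log_posterior_unnorm(np.array(cfg).reshape(rows, cols), assoc, inter)
            for cfg in itertools.product(range(C), repeat=rows * cols)]
    Z = np.logaddexp.reduce(logs)
    return np.exp(log_posterior_unnorm(labels, assoc, inter) - Z)
```

A Potts-type interaction, e.g. `inter = lambda a, b: 0.5 if a == b else 0.0`, already shows the smoothing effect: labelings with locally consistent classes receive higher posterior probability.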

Implementation
The image data are represented by site-wise feature vectors f_i^t(y^t) that may depend on the whole image at epoch t, e.g. by using features at different scales in scale space (Kumar and Hebert, 2006).
For our task of multitemporal crop type classification we apply two CRF-based approaches. CRFall: The extracted features f_i^t of all images at each site are concatenated into one feature vector f_i^all. Thus all information of one site is merged and no temporal structure (e.g. the order of the images) remains. This allows the use of standard classification techniques, here a common CRF approach with a 2D-grid structure as described in Equation 1. It should be noted that class changes between epochs could not be detected with this approach; in our application this is not required.
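The concatenation step that turns the per-epoch features into the CRFall input is a one-liner; this tiny sketch only illustrates the loss of temporal structure mentioned above (the function name is ours):

```python
import numpy as np

def stack_epochs(per_epoch_feats):
    """CRFall input: concatenate the per-epoch feature vectors per site.

    per_epoch_feats: list of (rows, cols, d) arrays f_i^t, one per epoch.
    Returns (rows, cols, T*d): f_i^all; the epoch order is no longer
    explicit once the vectors are merged.
    """
    return np.concatenate(per_epoch_feats, axis=-1)
```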
CRFmulti: Implementation of the approach described in Equation 2 and Figure 2. Each image site i at each epoch t is represented as one node in the graph with its feature vector f_i^t. In general this approach allows class transitions between epochs; in this work we avoid such transitions by an appropriate definition of the temporal interaction potential.
In Equation 3, E^t_fc and Σ^t_fc are the mean and covariance matrix of the features of class c, respectively. For CRFmulti they are determined from the features f_i^t(y^t) in training sites individually for each epoch t and each class c. With t = 1, Equation 3 is also applied for CRFall, but using the concatenated feature vector f_i^all.
The spatial interaction potential IS_ij^t is a measure for the influence of the data y^t and the neighboring labels x_j^t on the class x_i^t of site i at epoch t. For both approaches, CRFall (with t = 1) and CRFmulti, we applied the identical implementation and parameters to ensure comparability of the results. The data are represented by site-wise vectors of interaction features µ_ij^t(y^t) (Kumar and Hebert, 2006); we use the component-wise differences of the feature vectors. The neighborhood N_i in Equation 2 consists of the four neighboring image sites in a regular grid (Figure 2).
The temporal interaction potential IT_i^tk that we use for CRFmulti models the dependencies between the data y and the labels x_i^t and x_i^k of site i at epochs t and k. We apply a bidirectional transfer of temporal information, so the temporal neighborhood K_i of x_i^t is chosen to contain the two elements x_i^t-1 and x_i^t+1. Due to seasonal effects on the vegetation and due to different atmospheric and lighting conditions, it would not be sufficient to derive IT_i^tk just from the difference of the feature vectors as in the spatial case. Instead we use a class-specific difference measure ν_ic^tk(y^t, y^k) (Equation 5), which supports the decision for a site i to belong to a class c if the features at that site show a development between epochs t and k that is typical for that class; this is realized when computing IT_i^tk (Equation 8). The temporal interaction potential IT_i^tk is set to zero if the classes assigned to x_i^t and x_i^k differ (Equation 8), so class transitions are avoided. The only parameter of our temporal model is ε (Equation 9); it adjusts the effect of one site having a development that differs from the average development of its class. It could be estimated from training data, but currently it is set by the user.
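Since Equations 3 to 9 are not fully reproduced in this text, the following is only one plausible reading of the temporal model described above, not the authors' exact formulation: the class-typical feature development between two epochs is learned from training sites, the potential is zero for differing labels (ruling out class transitions), and for equal labels it rewards sites whose observed development matches the class-typical one, floored by the user-set parameter ε. All function names and the exact functional form are our assumptions.

```python
import numpy as np

def class_mean_change(train_feats_t, train_feats_k, train_labels, n_classes):
    """Average feature development E[f^k - f^t] per class, from training sites.

    train_feats_t, train_feats_k: (n_sites, d) features at epochs t and k.
    Returns a (n_classes, d) array of class-typical developments.
    """
    diffs = train_feats_k - train_feats_t
    return np.stack([diffs[train_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def temporal_potential(f_t, f_k, x_t, x_k, mean_change, eps=0.1):
    """Hypothetical IT_i^tk: zero for differing labels (class transitions
    are ruled out); otherwise larger when the site's observed development
    matches the class-typical one, with eps as a lower floor."""
    if x_t != x_k:
        return 0.0
    # deviation of the observed development from the class average
    nu = np.linalg.norm((f_k - f_t) - mean_change[x_t])
    return max(eps, np.exp(-nu))
```

The floor ε plays the role described in the text: it limits how strongly a single site with an atypical development is penalized.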
Exact inference is computationally intractable for CRF (Kumar and Hebert, 2006). In (Vishwanathan et al., 2006), several methods for parameter learning and inference are compared. In this paper we use Loopy Belief Propagation (LBP) (Nocedal and Wright, 2006), a standard technique for approximate probability propagation in graphs with cycles.
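The message-passing updates that LBP iterates on the cyclic grid graphs used here are easiest to see on a chain, where sum-product belief propagation is exact. The sketch below computes site marginals on a chain with unary (association) and pairwise (interaction) potentials; on a loopy graph, LBP repeats the same updates until convergence and the beliefs become approximations.

```python
import numpy as np

def chain_marginals(unary, pairwise):
    """Sum-product belief propagation on a chain (exact in this case).

    unary:    (n, C) nonnegative site potentials.
    pairwise: (C, C) nonnegative potential for each consecutive pair,
              pairwise[a, b] for (x_i = a, x_{i+1} = b).
    Returns (n, C) normalized marginals.
    """
    n, C = unary.shape
    fwd = [np.ones(C)]                      # message arriving at node 0 from the left
    for i in range(1, n):
        m = pairwise.T @ (unary[i - 1] * fwd[i - 1])
        fwd.append(m / m.sum())             # normalize for numerical stability
    bwd = [np.ones(C) for _ in range(n)]    # messages arriving from the right
    for i in range(n - 2, -1, -1):
        m = pairwise @ (unary[i + 1] * bwd[i + 1])
        bwd[i] = m / m.sum()
    beliefs = unary * np.array(fwd) * np.array(bwd)
    return beliefs / beliefs.sum(axis=1, keepdims=True)
```

On a chain the returned beliefs equal the true marginals of the Gibbs distribution, which makes the routine easy to verify against brute-force enumeration.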

SVM
Since support vector machines (SVM) have been successfully applied in numerous remote sensing applications (Mountrakis et al., 2011), an SVM classifier serves as the reference classification method.

Principle
In principle, the SVM is a binary classifier: samples of two classes are used to train a model that separates these classes in feature space. Being a maximum margin classifier, the SVM maximizes the margin between the class boundaries. The separating hyperplane is defined by the support vectors, which are a subset of the training vectors. To allow the separation of non-linearly separable data, a common approach in machine learning is applied: the feature space is mapped to a higher dimensional space in which the classes become linearly separable. This is done by applying kernel methods (Hofmann et al., 2008). In our case, more than two classes have to be discerned. There are several strategies to transfer two-class problems to multi-class problems (Vapnik, 1998; Schölkopf and Smola, 2002). The most common are one-vs-one and one-vs-rest. The first uses a voting scheme that accumulates the number of wins in pairwise classifications of every pair of classes. The second classifies each class against all others; the highest output determines the winning class. Here, the implementation of Chang and Lin (2001) is applied with the one-vs-one strategy, using the concatenated feature vector f_i^all. It performs similarly to the one-vs-rest strategy but is faster in training. The output of the classification process is a pixel-wise classification result. A comprehensive tutorial on SVM is given in (Burges, 1998).
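The one-vs-one voting scheme described above can be sketched independently of the binary classifier used. In the sketch below a trivial nearest-centroid rule stands in for the binary classifier; the paper itself uses kernel SVMs via the implementation of Chang and Lin (LIBSVM). Function names and the tie-breaking rule (lowest class index) are our assumptions.

```python
import numpy as np
from itertools import combinations

def one_vs_one_predict(train_X, train_y, test_X, binary_fit_predict):
    """One-vs-one multi-class scheme: train one binary classifier per class
    pair; each casts one vote per test sample, and the class with the most
    votes wins (ties broken by lowest class index)."""
    classes = np.unique(train_y)
    votes = np.zeros((len(test_X), len(classes)), dtype=int)
    for a, b in combinations(range(len(classes)), 2):
        mask = np.isin(train_y, [classes[a], classes[b]])
        # binary problem: "is class b" vs "is class a"
        pred = binary_fit_predict(train_X[mask],
                                  train_y[mask] == classes[b], test_X)
        votes[:, b] += pred.astype(int)
        votes[:, a] += (~pred).astype(int)
    return classes[np.argmax(votes, axis=1)]

def centroid_classifier(X, y_is_b, test_X):
    """Stand-in binary classifier (nearest class centroid); the paper uses
    kernel SVMs here instead."""
    mu_a = X[~y_is_b].mean(axis=0)
    mu_b = X[y_is_b].mean(axis=0)
    da = np.linalg.norm(test_X - mu_a, axis=1)
    db = np.linalg.norm(test_X - mu_b, axis=1)
    return db < da
```

For C classes this trains C·(C-1)/2 binary classifiers, each on a smaller subset of the training data, which is why one-vs-one training tends to be faster than one-vs-rest despite the larger number of classifiers.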

RESULTS
Both CRF-based approaches as well as the SVM-classifier are applied to the data described in chapter 3.1.

Description of classes
The choice of crop type classes is based on expert knowledge about the most important categories of agricultural crops.

Description of manual reference
A manual reference acquired in the same year (2010) as the satellite images is used. It consists of 121 separate fields with a total area of about 322 ha and was acquired by field walking. The proportions of the individual crop types are 12% grassland, 21% corn, 28% winter crop, 11% rapeseed, 11% root crops and 17% other crops. An existing GIS was used to define the borders of the single fields. The manual reference provides the training and evaluation samples for the test classifications.

CRF vs. SVM
For our tests we applied cross-validation, separating the learning sample into two equal parts. Tables 1-4 show the results for the two CRF approaches and the SVM classification. Overall, 129,001 pixels were classified.
Both CRF-based approaches slightly outperform the SVM classification, with CRFall being best. For all approaches the overall accuracy is well over 80%; only the class grassland is classified with lower accuracy in each case. The classification results for a section of 21 fields are displayed in Figure 3.
In a next step, the majority class of the pixels in each reference field was determined. This indicates how well the approaches are suited for classifying complete fields, ignoring classification errors at the field borders. With CRFall, 108 of the 121 reference fields were classified correctly (89.3%); with CRFmulti, 102 fields were correct (84.3%).
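The per-field majority vote used for this object-level evaluation can be sketched in a few lines of standard-library Python (names are illustrative):

```python
from collections import Counter

def field_majority_accuracy(pixel_preds, pixel_field_ids, field_truth):
    """Object-level check: assign each reference field the majority class
    among its pixels, then count the correctly classified fields.

    pixel_preds:     iterable of per-pixel predicted class labels.
    pixel_field_ids: iterable of the field id each pixel belongs to.
    field_truth:     dict mapping field id -> reference class.
    Returns (n_correct_fields, n_fields).
    """
    by_field = {}
    for pred, fid in zip(pixel_preds, pixel_field_ids):
        by_field.setdefault(fid, Counter())[pred] += 1
    correct = sum(1 for fid, counts in by_field.items()
                  if counts.most_common(1)[0][0] == field_truth[fid])
    return correct, len(by_field)
```

Errors confined to field borders are outvoted by the interior pixels, which is exactly why this measure ignores them.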
In general there are two main reasons for misclassifications. First, some fields show an appearance that is untypical for their class, e.g. most fields of a class are already harvested at one acquisition date while on some fields the crop is still present. Second, some fields are very slender, so their characteristics become blurred by the feature extraction in an 11×11 window.

Comparison to GIS data
To evaluate the geometric accuracy of the results, we superimposed them with a national GIS dataset, the German Authoritative Topographic Cartographic Information System (ATKIS). Among other sources, ATKIS data are collected from aerial photography with a resolution of 20 cm or 40 cm, supported by ground truth data, and are intended for use at scales between 1:10,000 and 1:25,000. The objects of interest are point, line and area objects listed in (AdV, 1997) with a minimum mapping unit of 0.1 ha to 1 ha; the geometric accuracy is 3 m. Figure 4 illustrates that the class borders obtained by the CRF-based approaches fit the boundaries of the GIS dataset quite well.

CONCLUSION
In this work we presented two CRF-based approaches for multitemporal crop type analysis and tested them on RapidEye data of four epochs. Even with only a few simple features, we achieved a classification accuracy of well over 80% for six crop type classes (grassland, corn, winter crop, rapeseed, root crops and other crops) with both approaches. Both performed better than an SVM classification that served as a benchmark, with the "classic" approach CRFall being slightly better than CRFmulti. Nevertheless, the CRFmulti approach generally has a higher potential for multitemporal analysis: because of its flexibility in the definition of the temporal interaction potential, it is also applicable to tasks such as change detection or multi-scale analysis.

Figure 2. CRFmulti graph structure. Red node: processed primitive; orange nodes: spatial neighbors; green nodes: temporal neighbors.

In the multitemporal case, we have M co-registered images. The components of the image data vector y are site-wise data vectors y_i consisting of M components y_i^t, where y_i^t is the vector of the observed pixel values of image site i at epoch t ∈ T, with T = {1, …, M}. The components of x are vectors x_i = [x_i^1, …, x_i^M]^T, where x_i^t describes the class of image site i at epoch t ∈ T. For each image site we want to determine its class x_i^t for each epoch t from a set of C pre-defined classes. In order to model the mutual dependency of the class labels at an image site at different epochs, the model for P(x | y) in Equation 1 has to be expanded (Equation 2).

Figure 3. Reference data and classification results in comparison.

Figure 4. CRFmulti classification superimposed with GIS cropland object borders (black).

6.5 Detailed results of CRFmulti
The association potential in CRFmulti is the context-free result of a separate Maximum Likelihood classification for each epoch (Equation 3). A section of the ML classification result for the four individual epochs is displayed in Figure 5 a)-d). For most of the pixels the classification results for the epochs differ; sometimes three or even four different classes are assigned. Moreover, the classification result within many fields is inhomogeneous. Overall, the classification accuracy for the single epochs is 59.6%.

In the spatial interaction potential, R is the dimension of the interaction feature vectors and ||µ_ij^t(y^t)|| is the Euclidean norm of µ_ij^t(y^t). This is a very simple model that penalizes local changes of the class labels if the data are similar. Its only model parameter could be determined from training data if fully labeled training images were available, but currently it is defined by the user. The neighborhood N_i of image site i over which IS_ij^t has to be summed in Equation 2 consists of the four neighboring image sites in a regular grid (Figure 2).

The class winter crop consists in detail of the classes barley, oat, rye, wheat and triticale. Other crops include in our case the classes asparagus and strawberries.

Table 1. Confusion matrix for the CRFall classification

Table 2. Confusion matrix for the CRFmulti classification

Table 4. Overview of overall accuracy and kappa coefficient