DYNAMIC TIME WARPING FOR CROPS MAPPING

: Dynamic Time Warping (DTW) has been successfully used for crops mapping due to its capability to achieve good classification results when a reduced number of training samples and irregular satellite image time series is available. Despite its recognized advantages, DTW does not account for the duration and seasonality of crops and local differences when assessing the similarity between two temporal sequences. In this study, we implemented a Weighted Derivative modification of DTW (WDDTW) and compared it with DTW and Time Weighted Dynamic Time Warping (TWDTW) for crops mapping. We show that WDDTW outperformed DTW achieving an overall accuracy of 67 %, whereas DTW obtained an accuracy of 57%. Yet, TWDTW performed better than both methods obtaining an accuracy of 88%.


INTRODUCTION
The earth's population exceeds 7 billion (Brelsford et al., 2018). This population growth puts pressure on the food supply systems across the globe. Therefore, food security related challenges are high on the agenda of the Sustainable Development Goals (SDG) (United-Nations, 2015), in particular SDG Goal 2-Zero Hunger. To be able to address these challenges, decision-makers require accurate and spatially explicit information on crops (Fritz et al. 2015). Such information is of paramount importance for evidence-based action and policy formulation. The importance of remote sensing data for generating this information is widely recognized.
Spatially and temporally explicit information on cropland extent and crop types is a prerequisite for monitoring the multiannual dynamics of crop production (Bégué et al. 2018) and associated water consumption (Meier et al. 2018). Research has been carried out in recent years to develop automated methods for crop mapping. Yan and Roy (2014) proposed an object-based method to delineate crop fields in multi-temporal satellite images available through Web Enabled Landsat Data project. Waldner et al (2015) described a cropland areas classification method that relies on expert knowledge to derive features that are expected to remain the same from one year to another. The efficiency and transferability of this method were tested in Argentina, Belgium, China, and the Ukraine. Salmon et al. (2015) mapped three classes of agricultural land use (rainfed, irrigated, and paddy croplands) by means of machine learning techniques and by combining data from a number of different sources including MODIS imagery, agricultural census data, meteorological measurements, and surface moisture measurements. Xiong et al. (2017) created a cropland areas map of the African continent using the Google Earth Engine and a MODIS dataset. Dahal et al. (2018) produced a 250 m spatial resolution cropland map using MODIS NDVI, weather, and climate information, together with elevation data. Important research has been carried out for crops mapping from multi-temporal satellite images using machine learning classifiers such as random forest (Wang et al., 2019), or deep learning (Zhong et al., 2019). These classifiers are, however, challenged by the following problems: 1) lack of a large number of training samples required for achieving good classification results; 2) large variability of crops phenology across geographic regions and time, which prevents the reusability of these classifiers from one region to another and from one year to another; 3) lack of a consistent multi-temporal satellite image stack to be processed due to the presence of clouds.
The increasing availability of high-resolution satellite image time series has created new premises for crops mapping. In an ideal scenario, we can compare two temporal sequences representing the growth pattern of target crops using simple metrics such as the Euclidean distance. In reality though, the temporal sequences for the crops of interest may not line up on the time axis because of the different climatic conditions or different agricultural practices. In this case, we need to warp the measurements on the time axis for a better alignment. Dynamic Time Warping (DTW) is an efficient solution for a non-linear alignment of two temporal sequences (Sakoe, Chiba, 1978). DTW obtained good crops classification results with a reduced number of training samples and with multi-temporal images that present gaps caused by the presence of clouds (Belgiu, Csillik, 2018;Csillik et al., 2019;Guan et al., 2018;Maus et al., 2016;Petitjean, et al., 2011). DTW has initially been developed for speech recognition applications (Sakoe, Chiba, 1978), and it has been adopted in other domains including medicine (Tsevas, Iakovidis, 2010) or robotics (Johnen, Kuhlenkoetter, 2016). DTW is used to assess the similarity between the phenological patterns generated from, for example, Normalized Difference vegetation Index (NDVI, of the labelled samples with the phenological patterns computed for the remaining unlabelled pixels. When aligning two phenological patterns, the following factors might affect the similarity assessment results: (1) noise: the presence of clouds might introduce noise in the evaluated phenological patterns; (2) amplitude scaling: two phenological patterns representing the same crop might have different amplitude, i.e. different NDVI values; (3) time-shifting: the phenological pattern of the same crops might be shifted because of the difference in the time of cultivation; (4) time scaling: one phenological pattern of a crop is shorter than the other one.
Despite its recognized advantages, DTW has some limitations for agriculture applications that need to be carefully addressed. First, DTW does not account for the crop growth phase, i.e. duration and seasonality (Maus et al., 2016). Second, DTW addresses the differences of the spectral measurements (e.g. the peak on the NDVI axis is higher on the test sequence than on the reference sequence) by warping the values on the time axis (Sakoe, Chiba, 1978).
The above-mentioned problems have been addressed by introducing different modifications of DTW. For example, Maus et al. (2016) proposed the so-called Time-Weighted Dynamic Time Warping (TWDTW) method that makes use of a time constraint to calculate the distance between two phenological patterns. Due to this time constraint, it accounts for the crop growing phase. TWDTW has successfully been applied for land cover classification in a tropical area (Maus et al., 2016). The authors reported an overall accuracy of 87.32%, as compared to 70% obtained by DTW. Belgiu, Csillik (2018) applied TWDTW for crops mapping from Sentinel-2-based NDVI layers in three agricultural landscapes from Romania, Italy and USA and obtained an overall accuracy of 96%. Guan et al. (2018) applied successfully an open-boundary locally weighted DTW distance method to NDVI time-series from MODIS for cropland mapping in Southeast Asia.
In contrast to DTW and TWDTW, Weighted Derivative Dynamic Time Warping (WDDTW) (Jeong et al., 2011) takes into account the first derivative of the evaluated sequences and penalizes the high phase differences. Jeong et al. (2011) compared WDDTW and DTW using time series datasets from different application domains and concluded that WDDTW outperformed DTW in all evaluated cases.
The goal of this paper is to implement and evaluate to what extent TWDTW and WDDTW can overcome the challenges encountered by DTW when classifying crops. It is the first study dedicated to implementing the weighted derivative variation of DTW for classifying crops from satellite image time series.

STUDY AREA AND DATASET
The study area is situated in Imperial County, southern California, USA. Seven crops were investigated, each of them occupying at least 3% of the study area: alfalfa, fallow, other hay, onion, sugar beet, winter wheat, and lettuce ( Figure 1).
We used the red and near-infrared spectral bands (band 4 and band 8A with a spatial resolution of 10 m) acquired by Sentinel-2 MultiSpectral Instrument (MSI) to calculate the NDVI for eighteen images (September 2016-September 2017). These images were downloaded from Copernicus Open Access Hub. We used the atmospherically corrected satellite images as input in our study.

METHODOLOGY
A three-step methodology was implemented in this study: (i) generation of NDVI-based temporal sequences; (ii) implementing DTW, TWDTW and WDDTW; (iii) evaluating the results using standard classification accuracy metrics.

Generating the NDVI stack
The NDVI generated from 10 m resolution Sentinel-2 red and near-infrared spectral bands was used to compute the temporal phenological patterns of the target classes: NDVI was used in this study because of its efficiency for vegetation phenology studies (Yan, Roy, 2014).
The DTW distance matrix D is obtained as the recursive sum of the minimum distances: TWDTW adopts a weight into the calculation of DTW distance. The weight is controlled by a Modified Logistic Weight Function (MLWF) which takes into account the phase difference between two matching points. For example, if a point in one sequence has a larger time lag than a point in another sequence, the weight between these two points will be larger (Maus et al., 2016). The weight can be fine-tuned when measuring the similarity between two sequences. This can be done by defining different values of two MLWF parameters, i.e. half-length of the sequence and the level of penalization for the points with larger phase difference. Thus, TWDTW relies on a weight ω given to the dbase, The WDTW distance wd is calculated as follows: The weight ω is defined using a modified logistic weighted function: , WDDTW transforms the original temporal sequences generated using e.g. NDVI into higher level features containing the shape information of that sequence. The equation for transforming data point ui in sequence U is presented below: where n = length of sequence U.
The weighted version of WDDTW is obtained as follows (Jeong et al., 2011): where u i d and v j d are the transformed sequences from the sequences U and V with lengths n and m, respectively.
The three methods evaluated in this study have been implemented in Python and are available to interested users per request.

Training and validation samples
The training and validation samples used for this study have been generated using the CropScape -Cropland Data Layer (CDL) for 2017. CDL data are collected by the United States Department of Agriculture (USDA), National Agricultural Statistics Service (Boryan, 2011). Given that the spatial resolution of CropScape data is 30 m, we decided to resample the Sentinel-2 spectral bands from 10 m to 30 m. The spatial distribution of the target crops is depicted in Figure 2.
CropScap data are obtained by classifying high-resolution aerial images collected throughout the year across the USA (USDA, 2017). Therefore, the published data are not error free. The classification accuracy (user's accuracy and producer's accuracy) is presented in Table 1 (USDA, 2017). Except for Alfalfa, a class with a producer's accuracy of 90%, all classes had to be carefully investigated before using them for this study.
To improve the quality of this sample set, we removed the wrongly classified samples by implementing the methodology presented by Belgiu, Csillik (2018) and generated randomly 100 samples per class. We computed their phenological profiles, i.e. growth cycle, assessed the differences between these resulting profiles and the crop calendar and discarded all samples that deviate from the pattern of the represented crops.
In the end, we had 50 training samples per class.  Figure 2. Spatial distribution of target crops in the study area.

Class
The legend was adopted from the USDA (USDA, 2017) The validation samples were generated using the same procedure as explained above. Careful attention was paid to the spatial distribution of these data and exclude the agricultural plots used to generate the training samples.

Classification accuracy
Given the limitation of the widely used kappa index for remote sensing applications as reported by Foody (2020), we used overall accuracy together with user's accuracy and producer's accuracy metrics to evaluate the quality of the classification results obtained by using the three methods (Congalton, 1991).

RESULTS AND DISCUSSION
TWDTW outperformed both DTW and WDDTW, obtaining an overall accuracy of 88% as compared to 67% and 57% accuracy obtained by WDDTW and DTW respectively.
TWDTW performed better than the other two evaluated methods because of its capability to reduce considerably the misclassification rate of the crops with a large time lag on the temporal NDVI sequences. For example, the temporal patterns of the investigated crops ( Figure 1) show that winter wheat and onions have similar duration and amplitude of their NDVI values. The only difference consists in the fact that the peak of vegetation of winter wheat occurs one month earlier than those of onions. DTW and WDDTW do not account for this difference and, therefore, the classification errors for these two crops are high. TWDTW, on the other hand, obtained a higher user's accuracy for winter wheat (82%) as compared to DTW and WDDTW which obtained an user's accuracy of only 30% and 40%, respectively. TWDTW achieved good classification results as quantified by both user's and producer's accuracy for the following classes: alfalfa, fallow, other hay, and sugar beets (Figure 3 and Figure  4). The best classified classes are fallow (producer's accuracy of 100% and user's accuracy of 96%) and alfalfa with a producer's accuracy of 94% and an user's accuracy of 90%.
Onions yielded a producer's and user's accuracy of 76% and 70% respectively. WDDTW outperformed DTW by 10% (67% vs. 57%). It performs better than DTW for the classification of sugar beets (producer's accuracy of 80%) or double cropping such as lettuce (Figure 2 and Figure 3), obtaining a producer's accuracy of 64% and an user's accuracy of 70%.
DTW is computationally very demanding. To overcome this limitation, the input satellite images time series should be reduced to those images that are relevant for the investigated crops. Previous studies proved that identifying the optimal temporal window for crops mapping does not only reduce the computational time, but it also improves the classification results (Meng et al., 2020). The optimal temporal window can be selected using either a data-driven or a knowledge-based approach, i.e. using the crop calendar of the cultivated crops in the investigated agricultural regions. The quality of the DTW-based classification results depends on the quality of the training samples. Collecting training samples through fields campaign for a large number of cops is a timeconsuming and expensive task (Maxwell et al. 2018). Therefore, we need efficient solutions to generate training samples to support the production of timely and reliable crop information. Transfer learning (Tuia et al., 2011), crowdsourcing initiatives (Fritz et al. 2009), and utilization of existing inventories to guide the labeling of the new training samples ) are promising solutions to address this challenge.

CONCLUSIONS
This study evaluated the performance of DTW and two of its variations, TWDTW and WDDTW for crops mapping from Sentinel-2 time series mapping. TWDTW proved to be the most efficient among them due to the additional time constraint that can be defined when matching two temporal sequences representing the crops growth cycles. In future work, we plan to investigate a knowledge-based selection of the optimal temporal window for each crop when applying WDDTW.