MAPPING IRRIGATED AREAS USING RANDOM FOREST BASED ON GF-1 MULTI-SPECTRAL DATA

Irrigation districts need high-resolution information on the spatial distribution of irrigated fields to manage irrigation water effectively and achieve sustainable water resources management, especially where croplands are fragmented, as in China. However, most remote sensing methods for mapping irrigated areas rely on MODIS time series with a relatively low resolution of 250-1000 m. To fill this gap, this study used a pixel-based random forest to map irrigated areas from two multi-spectral GF-1 satellite images with a resolution of 16 m in an irrigation district of China during the winter-spring irrigation period of 2018. The accuracy of the resulting 16-m map was assessed with an error matrix built from 210 ground-truth samples. The map had an overall accuracy of 93.33% and a Kappa coefficient of 0.9164. It shows that irrigated wheat, rain-fed wheat, irrigated fruit trees, and fallow croplands in the study area covered 52066.48 ha, 12932.33 ha, 18104.32 ha, and 4641.25 ha respectively, accounting for 52.57%, 13.06%, 18.28% and 4.69% of the total study area, which is broadly consistent with the field investigations. Compared with SVM, the random forest results are more accurate, with fewer misclassifications. Pixel-based random forest mapping at high resolution yields a more refined spatial distribution of irrigated areas than low-resolution images, which makes it suitable for fragmented croplands. In addition, the method can effectively distinguish irrigated crops from rain-fed crops, demonstrating that random forest can map irrigated areas at high resolution from only two images.

* Corresponding author: songwl@iwhr.com


INTRODUCTION
Irrigation plays an important role in crop growth, affecting agricultural yields and price stability, especially in arid and semi-arid regions where irrigation is the main source of water for crops. Agricultural irrigation consumes more than 70% of the world's available fresh water resources and affects atmospheric convection, rainfall distribution and local climate by changing the distribution of surface water and groundwater (Ambika et al., 2016; Deines et al., 2017; Thenkabail, 2010). However, over-exploitation of water resources for irrigation has depleted groundwater aquifers and reduced annual river flows (Postel, 2003). Moreover, population growth and rapid economic development will consume more of the limited fresh water resources, exacerbating irrigation water shortages. Irrigation districts therefore need better irrigation planning to limit water budgets, avoid waste and relieve the water resource crisis. In this regard, the spatiotemporal distribution of irrigated areas is important for sustainable water resources management and for improving irrigation water efficiency (Abuzar et al., 2015; Deines et al., 2017; Pervez et al., 2010; Thenkabail et al., 2004). In addition, agricultural drought has tended to become more frequent and recurrent under the combined effects of global climate change and high-intensity human activities, which directly affects food production and in turn threatens social stability and sustainable development (Dai, 2013; Rockström et al., 2012; Sheffield et al., 2007).
Because irrigation is the most effective way to mitigate the adverse effects of agricultural drought, the spatial distribution and spatiotemporal changes of irrigated croplands are of great significance for food security, scientific water resources management during droughts, and drought monitoring in the context of climate warming (Alexandridis et al., 2008; Gamo et al., 2013; Meier et al., 2018; Mutlu et al., 2008; Zhang et al., 2015). Remote sensing has become an important method for mapping the spatial distribution of croplands thanks to its wide coverage, short revisit interval and low cost (Gumma, 2011; Ouzemou et al., 2015; Wardlow et al., 2008). It can be used to classify crops, identify irrigated areas, and monitor crop growth, all of which play an important role in irrigation district management. In 2006, the International Water Management Institute (IWMI) produced the first global irrigated area map by remote sensing at a resolution of 10 km, mainly using spectral matching techniques (SMTs) based on unsupervised ISOCLASS k-means classification and image segmentation by precipitation, temperature and elevation applied to MODIS time series data (Thenkabail et al., 2006). However, object-based SMTs cannot take full advantage of high-resolution images and are therefore not applicable to mapping small croplands. Since then, a series of annual irrigated area maps based on MODIS time series have been released at regional or national scale (Biggs et al., 2006; Dheeravath et al., 2010; Teluguntla et al., 2015; Thenkabail et al., 2005; Thenkabail et al., 2009; Wardlow et al., 2014). At present, most remote sensing methods for mapping irrigated areas are based on MODIS time series data with a relatively low resolution of 250-1000 m (Dong et al., 2010; Lin et al., 2008; Shahriar et al., 2014; Teluguntla et al., 2017), which is not conducive to effective application at field scale.
Croplands in China, however, are relatively small and scattered, and the planting structure is complex. Chinese irrigation districts need high-resolution information on the spatial distribution of irrigated areas to manage irrigation water effectively and achieve sustainable water resources management as population growth and economic development increase the demand for irrigation water (Teluguntla et al., 2018; Xiong et al., 2016). MODIS can hardly meet the needs of mapping small croplands. For fragmented croplands in China, high-resolution images such as Landsat data (with a spatial resolution of 30 m) and GF-1 data (with a spatial resolution of 16 m) can better resolve small or scattered croplands, providing accurate locations and improving the accuracy of irrigated area mapping. Nevertheless, high-resolution time series are hard to obtain because of weather limitations and satellite revisit cycles, which makes irrigated areas difficult to map. Consequently, this study focused on how to map irrigated areas accurately at high resolution from a small number of images. Random forest is a supervised classification method with fast training speed and high accuracy (Belgiu et al., 2016; Liaw et al., 2002). In crop classification, random forest has achieved good results and shows potential for mapping irrigated areas (Machwitz et al., 2010; Xu et al., 2019). Here, a pixel-based random forest was explored for mapping irrigated areas at a spatial resolution of 16 m from only two images, and its accuracy was assessed with ground-truth samples, demonstrating that the approach can map irrigated areas precisely and effectively distinguish between rain-fed and irrigated croplands.

Study area
Donglei Irrigated District (Phase II) (109°10'E to 110°10'E and 34°41'N to 35°N) is located in Weinan City, Shaanxi Province, covering the three counties of Fuping, Dali and Pucheng (Fig. 1). The terrain is higher in the northwest than in the southeast, with elevations of 385-600 m. The study area is dominated by a semi-arid climate, with an annual ET of 1700-2000 mm and precipitation of 519-552 mm that is concentrated from July to September. Irrigation is therefore the main source of water for crop growth. Donglei Irrigated District (Phase II) draws water from the Yellow River and delivers it to croplands by canal. The main irrigation period, called winter-spring irrigation, runs from October to April, and winter wheat is the main crop during this period. Winter wheat is typically irrigated 2-3 times, and occasionally 3-4 times during drought.

Data
The study area contains rain-fed winter wheat, which is generally harvested about ten days earlier than irrigated winter wheat. This difference makes it possible to distinguish irrigated wheat from rain-fed wheat. Therefore, to increase classification accuracy, images acquired before and after the harvest of rain-fed wheat were chosen as input to the random forest algorithm. Two 16-m GF-1 images, acquired on March 29, 2018 and May 12, 2018 during the winter-spring irrigation period of the study area, were selected. Each image is composed of four bands (blue: 0.45-0.52 μm, green: 0.52-0.59 μm, red: 0.63-0.69 μm, NIR: 0.77-0.89 μm) and was downloaded from the China Center for Resources Satellite Data and Application (www.cresda.com/CN/). After pre-processing, the surface reflectance of the four bands of each image was calculated and the bands were composited into the input data layer for the random forest algorithm. To map irrigated areas and verify the accuracy of the results, field surveys were conducted in 2018 to collect 420 ground-truth samples for training and validation, recording GPS coordinates, crop types, and whether the fields were irrigated. The ground-truth samples were gathered randomly from irrigated wheat, rain-fed wheat, irrigated fruit tree, fallow croplands and others.

Random forest is efficient on large data sets and offers high prediction accuracy, good noise tolerance and good stability for image classification (Belgiu et al., 2016). It uses bootstrap sampling, drawing part of the sample set with replacement, to build an ensemble of trees, and the final classification result is obtained by voting. The basic process is:
(1) Use bootstrap sampling to randomly draw K training sets from the original samples.
(2) Build one decision tree model for each of the K training sets, giving K classification results. The input variables of each decision tree are M features drawn at random from the N available features.
(3) Determine the final classification result by voting on the K classification results.
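The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the classifier used in this study: each "tree" is reduced to a single-split stump, and the function names (`train_stump`, `train_forest`, `predict`), the toy 8-feature samples and the two class labels are all hypothetical.

```python
import random
from collections import Counter

def train_stump(samples, labels, feature_ids):
    """Fit a one-split tree: keep the (feature, threshold) with the fewest errors."""
    best = None
    for f in feature_ids:
        for t in sorted({s[f] for s in samples}):
            left = [y for s, y in zip(samples, labels) if s[f] <= t]
            right = [y for s, y in zip(samples, labels) if s[f] > t]
            if not left or not right:
                continue
            l_lab = Counter(left).most_common(1)[0][0]
            r_lab = Counter(right).most_common(1)[0][0]
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:  # no valid split: always predict the majority label
        majority = Counter(labels).most_common(1)[0][0]
        return (feature_ids[0], float("inf"), majority, majority)
    return best[1:]

def train_forest(samples, labels, n_trees=25, n_sub_features=3, seed=0):
    rng = random.Random(seed)
    n, n_features = len(samples), len(samples[0])
    forest = []
    for _ in range(n_trees):
        # Step 1: bootstrap sample, drawn with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        # Step 2: draw M of the N features at random, then fit one tree
        feats = rng.sample(range(n_features), n_sub_features)
        forest.append(train_stump([samples[i] for i in idx],
                                  [labels[i] for i in idx], feats))
    return forest

def predict(forest, x):
    # Step 3: majority vote over the K trees
    votes = [l_lab if x[f] <= t else r_lab for f, t, l_lab, r_lab in forest]
    return Counter(votes).most_common(1)[0][0]

# Toy data (hypothetical): 8 features per sample, two well-separated classes.
X = [[0.10 + 0.01 * i] * 8 for i in range(10)] + [[0.50 + 0.01 * i] * 8 for i in range(10)]
y = ["rain-fed"] * 10 + ["irrigated"] * 10
forest = train_forest(X, y)
```

A full implementation would grow deeper trees and handle all five classes; the stump is used here only to keep the bootstrap, feature sampling and voting steps visible.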
The random forest classifier, built with 1000 trees and trained on 210 samples, was applied to the composite of surface reflectance from the two multi-spectral images. It labelled each pixel as irrigated wheat, irrigated fruit tree, rain-fed wheat, fallow cropland or others, producing a map of the spatial distribution of irrigated areas and other land use/land cover (LULC) classes.
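The compositing and per-pixel labelling described above might look like the following sketch. It assumes each image is held as nested lists of shape rows x columns x 4 bands. The helper names (`stack_images`, `classify_map`), the toy reflectance values and the threshold rule standing in for the trained classifier are all hypothetical.

```python
def stack_images(image_a, image_b):
    """Concatenate the band values of two co-registered images pixel by pixel."""
    return [[pa + pb for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(image_a, image_b)]

def classify_map(composite, classify_pixel):
    """Apply a per-pixel classifier to every feature vector in the composite."""
    return [[classify_pixel(px) for px in row] for row in composite]

# Toy 2 x 2 images with bands ordered blue, green, red, NIR (hypothetical values).
march = [[[0.05, 0.08, 0.06, 0.40], [0.05, 0.08, 0.06, 0.38]],
         [[0.05, 0.08, 0.07, 0.39], [0.05, 0.08, 0.06, 0.41]]]
may = [[[0.04, 0.07, 0.05, 0.45], [0.08, 0.10, 0.12, 0.20]],
       [[0.08, 0.11, 0.13, 0.22], [0.04, 0.07, 0.05, 0.50]]]

composite = stack_images(march, may)  # 8 features per pixel

# Placeholder rule on the May NIR band (index 7); a trained model would go here.
labels = classify_map(composite, lambda px: "class_a" if px[7] > 0.3 else "class_b")
```

In practice `classify_pixel` would be the trained random forest's prediction function, and the images would be read from raster files rather than typed in.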

Irrigated areas of croplands
This study used the random forest classifier trained on ground-truth samples to map the spatial distribution of irrigated areas during the winter-spring irrigation period of 2018 at a spatial resolution of 16 m, as shown in Figure 3 and Figure 4. The total irrigated area, consisting of irrigated wheat and irrigated fruit tree, was 70170.80 ha. Specifically, irrigated wheat covered 52066.48 ha, accounting for 52.57% of the total study area. Wheat is thus the most important crop planted during the winter-spring irrigation period, which is consistent with what was learned from the field visit. Irrigated fruit tree covered 18104.32 ha, accounting for 18.28% of the total study area, and was mainly distributed around Pucheng town or in the Dali system. In addition, fallow croplands, where no crop was planted, covered 4641.25 ha. Moreover, some croplands cannot be irrigated by canal because of the high terrain. These rain-fed areas, in the northeast of the study area, covered 12932.33 ha, which shows that the random forest method can distinguish irrigated wheat from rain-fed wheat. Given the fragmentation of croplands in China, this method yields a more refined spatial distribution of irrigated areas than low-resolution images.
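As a rough illustration of how class areas follow from a classified map, the sketch below counts pixels per class and converts them to hectares, assuming square 16 m pixels (0.0256 ha each). The projection and pixel grid actually used for the reported areas are not stated, so the helper `class_areas_ha` and the tiny label map are hypothetical. The last lines only re-check the reported total irrigated area.

```python
from collections import Counter

PIXEL_SIZE_M = 16.0
HA_PER_PIXEL = PIXEL_SIZE_M ** 2 / 10_000.0  # 256 m^2 = 0.0256 ha

def class_areas_ha(label_map):
    """Return the area in hectares of each class in a 2-D map of labels."""
    counts = Counter(label for row in label_map for label in row)
    return {label: n * HA_PER_PIXEL for label, n in counts.items()}

# Hypothetical 2 x 3 label map: 4 irrigated-wheat pixels and 2 fallow pixels.
toy_map = [["irrigated wheat", "irrigated wheat", "fallow"],
           ["irrigated wheat", "irrigated wheat", "fallow"]]
areas = class_areas_ha(toy_map)

# Reported figures: irrigated wheat plus irrigated fruit tree.
total_irrigated_ha = 52066.48 + 18104.32
total_irrigated_share = 52.57 + 18.28  # percent of the study area
```

With the reported values, `total_irrigated_ha` comes to 70170.80 ha, or about 70.85% of the study area.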

Accuracy assessment
The accuracy of the 16-m map derived by pixel-based random forest was assessed with an error matrix based on 210 randomly distributed validation samples from the field visit, as shown in Table 1. The error matrix gave an overall accuracy of 93.33% and a Kappa coefficient of 0.9164, indicating that irrigated areas were mapped accurately. Irrigated fruit tree was extracted well, with a producer's accuracy of 92.50% (errors of omission = 7.50%) and a user's accuracy of 94.87% (errors of commission = 5.13%). For irrigated wheat, the producer's accuracy and user's accuracy were 96.00% and 90.57% respectively (errors of omission = 4.00%; errors of commission = 9.43%). For fallow croplands, the producer's accuracy and user's accuracy were both 95.00% (errors of omission = 5.00%; errors of commission = 5.00%). For rain-fed wheat, the producer's accuracy was 97.50% (errors of omission = 2.50%) and the user's accuracy was 90.70% (errors of commission = 9.30%). The producer's accuracy of the others class was lower, at 85.00% (errors of omission = 15.00%), while its user's accuracy was 97.14% (errors of commission = 2.86%). Overall, the random forest classification results are accurate for irrigated area mapping. For example, the enlarged details show that the method can distinguish croplands from roads and buildings and can map broken and isolated small fields. This makes it suitable for the small, scattered croplands and complex planting structure found in China, and reduces classification errors caused by mixed pixels (Fig. 5).
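The reported accuracy figures can be cross-checked with a short sketch. The full error matrix is not reproduced in the text, so the counts below are an assumption: they are one set of integer counts consistent with the 210 validation samples and the reported producer's and user's accuracies. The function name `accuracy_metrics` is hypothetical.

```python
def accuracy_metrics(correct, reference_totals, mapped_totals):
    """Overall accuracy, Kappa, and per-class producer's/user's accuracy.

    correct[i]          : samples of class i that were mapped as class i
    reference_totals[i] : ground-truth samples of class i
    mapped_totals[i]    : samples that the map assigns to class i
    """
    n = sum(reference_totals)
    overall = sum(correct) / n
    # Chance agreement expected from the marginal totals alone
    expected = sum(r * m for r, m in zip(reference_totals, mapped_totals)) / n ** 2
    kappa = (overall - expected) / (1 - expected)
    producers = [c / r for c, r in zip(correct, reference_totals)]  # 1 - omission
    users = [c / m for c, m in zip(correct, mapped_totals)]         # 1 - commission
    return overall, kappa, producers, users

# Class order: irrigated fruit tree, irrigated wheat, fallow, rain-fed wheat, others.
# Counts inferred from the reported percentages (assumption, not given in the text).
correct = [37, 48, 38, 39, 34]
reference_totals = [40, 50, 40, 40, 40]   # sums to 210
mapped_totals = [39, 53, 40, 43, 35]      # sums to 210

overall, kappa, producers, users = accuracy_metrics(correct, reference_totals, mapped_totals)
```

Under these inferred counts, 196 of the 210 samples are correct, which reproduces the reported overall accuracy of 93.33% and a Kappa coefficient that rounds to 0.9164.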

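Table 1. Accuracy assessment of the random forest classification
Class | Omission Error | Producer's Accuracy | Commission Error | User's Accuracy
Irrigated wheat | 4.00% | 96.00% | 9.43% | 90.57%
Irrigated fruit tree | 7.50% | 92.50% | 5.13% | 94.87%
Rain-fed wheat | 2.50% | 97.50% | 9.30% | 90.70%
Fallow croplands | 5.00% | 95.00% | 5.00% | 95.00%
Others | 15.00% | 85.00% | 2.86% | 97.14%
Overall accuracy: 93.33%; Kappa coefficient: 0.9164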

DISCUSSIONS
The field investigation found that some fallow croplands had been left unused and were overgrown with weeds, which made them difficult to identify and easy to confuse with rain-fed wheat. There was also some confusion between irrigated wheat and fruit tree because of interplanting, which caused the major classification errors in irrigated area mapping. The validation results suggest that the mapped areas of fallow croplands and rain-fed wheat were slightly larger than the true areas because pixels of the others class were misclassified, which affected the final classification accuracy. In addition, when selecting input data, an image acquired after rain-fed wheat has been harvested but before irrigated wheat is harvested is crucial and can effectively improve classification accuracy. Support vector machine (SVM) is a commonly used supervised classification method that is widely applied in agricultural monitoring and has achieved good classification results. An SVM classifier was trained on the same training samples and assessed with the same validation samples, and the results are shown in Table 2. The error matrix gave an overall accuracy of 90.00% and a Kappa coefficient of 0.8748, both lower than those of random forest. Compared with the random forest results, the omission errors of irrigated wheat and fallow croplands in the SVM results are significantly larger, which means that SVM extracts irrigated wheat and fallow croplands less accurately than random forest. In the SVM results, more irrigated wheat was misclassified as irrigated fruit tree, and the confusion between fallow croplands and others was more serious. In conclusion, the classification results of random forest are better.

Table 2. Accuracy assessment by ground-truth samples

CONCLUSIONS
Based on the multi-spectral reflectance of two GF-1 images acquired on March 29 and May 12, 2018, this study trained a random forest classifier on ground-truth samples to map irrigated areas at a resolution of 16 m during the winter-spring irrigation period of 2018 in Donglei Irrigated District (Phase II). The 16-m map derived by pixel-based random forest was assessed with an error matrix and had an overall accuracy of 93.33% and a Kappa coefficient of 0.9164, indicating that irrigated areas were mapped accurately. Irrigated wheat, rain-fed wheat, irrigated fruit tree, and fallow croplands in the study area covered 52066.48 ha, 12932.33 ha, 18104.32 ha, and 4641.25 ha respectively, accounting for 52.57%, 13.06%, 18.28% and 4.69% of the total study area. These results are broadly consistent with those obtained from field investigations. Compared with SVM, the random forest classification results are more accurate, with fewer misclassifications. Pixel-based random forest classification is suitable for mapping irrigated areas at high resolution, especially for fragmented croplands, and reduces errors caused by mixed pixels. In addition, the method can effectively distinguish irrigated crops from rain-fed crops, demonstrating that random forest can map irrigated areas at high resolution from only two images.