DEEP LEARNING FOR REMOTE SENSING IMAGE CLASSIFICATION FOR AGRICULTURE APPLICATIONS

This research examines the ability of deep learning methods to classify remote sensing images for agricultural applications. U-Net and FCN-8s, two convolutional neural network architectures, are fine-tuned and tested for crop/weed classification. The dataset for this study includes 60 top-down images of an organic carrot field, which were collected by an autonomous vehicle and labeled by experts. The FCN-8s model achieved 75.1% accuracy in detecting weeds, compared to 66.72% for U-Net, using 60 training images. However, the U-Net model performed better in detecting crops, with 60.48% accuracy compared to 47.86% for FCN-8s.


INTRODUCTION
Weed control is one of the main problems that farmers must deal with. Weeds profoundly affect farm productivity by invading crops and smothering pastures, and they significantly decrease the quality of the harvested crops (Milberg et al., 2004). Herbicides are widely used globally to enhance food production; however, they can harm the environment and the ecosystem (Horrigan et al., 2002). In traditional weed control approaches, the herbicide is applied over the whole field, even in areas without weeds where no treatment is required. Precision agriculture techniques should regularly monitor crop growth to maximize yield while minimizing the use of resources such as chemicals and reducing the side effects of herbicides on the environment (Duckett et al., 2018). Thus, accurate weed detection and mapping, followed by local treatment, are essential steps to improve weed and crop control in modern agriculture.
Recently, unmanned aerial vehicles (UAVs) have become suitable platforms for acquiring data for crop and weed monitoring. UAVs can acquire high-resolution imagery at a low cost and with less dependence on weather conditions (Hashemi-Beni et al., 2018, Gebrehiwot et al., 2019, Vinh et al., 2019). Several methods have been proposed for weed recognition over the last decades. Vegetation indices, such as the normalized difference vegetation index (NDVI), are one way to segment weeds in an agricultural field (Dyrmann et al., 2014). The main challenge of this approach is separating weeds and crops when plants overlap. Texture-based models have shown great performance in detecting and discriminating plants in images with overlapping leaves (Pahikkala et al., 2015). Machine learning-based approaches have also gained attention for detecting weeds and crops (Murawwat et al., 2018). Murawwat et al. (2018) used a support vector machine (SVM) classifier to classify carrot crops and weeds. They used 72 samples for training and eight samples for testing and achieved a classification accuracy of more than 50%. The challenge of traditional ML approaches such as SVM or RF classifiers is that feature extraction is not automatic, and generating handcrafted features is a time-consuming stage. Some studies show that, unlike conventional machine learning methods, deep learning can efficiently deal with the limitations of handcrafted features for classifying weeds and crops by extracting the features directly from the input data (Lee et al., 2015).
Recently, there has been considerable progress in the classification of remote sensing data using deep learning for different applications, including agricultural tasks. Some studies used a convolutional neural network (CNN) in agricultural applications such as weed and crop classification (Mortensen et al., 2016, Potena et al., 2016, Di Cicco et al., 2017). Mortensen et al. (2016) used the VGG-16 CNN model to classify weeds in images of mixed crops from an oil radish plot containing barley, weed, stump, grass, and background soil. Potena et al. (2016) presented a perception system for weed/crop classification that uses a shallow and a deeper CNN. The shallow CNN is used to detect vegetation, while the deeper one is used to classify weeds and crops. Di Cicco et al. (2017) trained a SegNet on large synthetic training datasets that were procedurally generated by randomizing the key features of the target environment (i.e., crop and weed species, type of soil, light conditions). Hashemi-Beni et al. (2020) applied the U-Net model to detect and discriminate crops and weeds using a small dataset. They applied data augmentation techniques such as random cropping, random rotation, and reflection to improve the classification results.
This research compares the U-Net and FCN-8s models for segmenting weeds in the Crop/Weed Field Image Dataset (CWFID), which was introduced by Haug et al. (2014).

DATASET AND ANNOTATION
The CWFID (Haug et al., 2014) was used to train and test the U-Net and FCN-8s classifiers in this study. The CWFID comprises top-down field images that were collected with the autonomous field robot Bonirob on an organic carrot farm in 2013. It includes 60 images with a size of 1296 x 966 pixels. All data acquisition was carried out during field tests, and images were acquired while the crop was in the growth stage where one or more true leaves were present. Each image was labeled with a vegetation segmentation mask and a crop/weed label image (in total, 162 crop plants and 332 weed plants). The CWFID was downloaded from https://github.com/cwfid. Figure 1 shows a sample training image and its corresponding labeled image. The black, green, and red colors represent soil, crops, and weeds, respectively.
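The color coding of the label images can be turned into class indices before training. The following is a minimal sketch, assuming the annotation is available as rows of (r, g, b) tuples and that thresholding each channel at 127 is sufficient to separate the three colors; the function names and the threshold are illustrative assumptions, not part of the CWFID tooling.

```python
# Assumed colour coding, taken from the text: black = soil, green = crop, red = weed.
SOIL, CROP, WEED = 0, 1, 2


def decode_annotation(rgb_rows):
    """Map an RGB annotation (rows of (r, g, b) tuples) to class indices."""
    def classify(pixel):
        r, g, _ = pixel
        if r > 127 and g <= 127:   # predominantly red -> weed
            return WEED
        if g > 127 and r <= 127:   # predominantly green -> crop
            return CROP
        return SOIL                # everything else is treated as soil

    return [[classify(pixel) for pixel in row] for row in rgb_rows]


def class_counts(label_rows):
    """Count how many pixels fall into each class."""
    counts = {SOIL: 0, CROP: 0, WEED: 0}
    for row in label_rows:
        for label in row:
            counts[label] += 1
    return counts


# Toy 2 x 3 annotation (not taken from the dataset).
annotation = [
    [(0, 0, 0), (0, 255, 0), (0, 255, 0)],
    [(255, 0, 0), (0, 0, 0), (255, 0, 0)],
]
labels = decode_annotation(annotation)
print(labels)
print(class_counts(labels))
```

Counting pixels per class in this way also gives a quick view of class imbalance between soil, crops, and weeds.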

METHODS AND DATA PROCESSING
Network Architecture
U-Net architecture:
The U-Net is a CNN architecture proposed by Ronneberger et al. (2015) for biomedical image segmentation. It is based on the fully convolutional network and was modified to give accurate classification results with less training data. As shown in Figure 2, the architecture resembles the letter 'U', which gives the network its name. The U-Net has two parts: a shrinking path (left side) and an expanding path (right side). The shrinking path (also called the encoder) is a traditional stack of convolutional and max-pooling layers that extracts and captures the context in the image. The expanding path (or decoder) uses transposed convolutions and combines its feature maps with those from the encoder to enable precise localization, so that more precise predictions can be obtained for pixels on object edges. The U-Net is thus an end-to-end fully convolutional network: it contains only convolutional layers and no dense layers, and can therefore accept images of any size.
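The encoder/decoder structure described above can be traced with a small shape calculation. The sketch below is illustrative only. It uses four pooling steps and 64 base channels as in Ronneberger et al. (2015), and it additionally assumes 'same'-padded convolutions so that only pooling and upsampling change the spatial size; this is a simplifying assumption and not necessarily the configuration used in this study. The helper names are hypothetical. Under these assumptions, the skip connections only align when the input height and width are divisible by 16, so an image with a side of 966 pixels would typically be padded or cropped first.

```python
def pad_to_multiple(size, multiple):
    """Smallest value >= size that is divisible by `multiple`."""
    return -(-size // multiple) * multiple


def trace_unet_shapes(height, width, depth=4, base_channels=64):
    """Return (stage, channels, height, width) for every U-Net stage.

    Assumes 'same'-padded convolutions, 2x2 max pooling and 2x transposed
    convolutions, so only pooling/upsampling change the spatial size.
    """
    stages, skips = [], []
    channels, h, w = base_channels, height, width
    for level in range(depth):                      # shrinking path (encoder)
        stages.append((f"enc{level}", channels, h, w))
        skips.append((h, w))
        if h % 2 or w % 2:
            raise ValueError(f"size {h}x{w} at level {level} is not divisible by 2")
        h, w = h // 2, w // 2                       # 2x2 max pooling
        channels *= 2
    stages.append(("bottleneck", channels, h, w))
    for level in reversed(range(depth)):            # expanding path (decoder)
        h, w = h * 2, w * 2                         # 2x transposed convolution
        channels //= 2
        assert (h, w) == skips[level]               # skip connection must align
        stages.append((f"dec{level}", channels, h, w))
    return stages


# CWFID images are 1296 x 966 pixels; pad each side to a multiple of 2**4.
padded_h, padded_w = pad_to_multiple(966, 16), pad_to_multiple(1296, 16)
stages = trace_unet_shapes(padded_h, padded_w)
for stage in stages:
    print(stage)
```

The decoder stages mirror the encoder stages in size, which is what allows the encoder feature maps to be concatenated with the upsampled decoder feature maps.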

FCN-8s architecture:
The FCN was proposed by Long et al. (2015) to train an end-to-end network for semantic segmentation. In this model, the fully connected classification layers of VGG-16 are replaced by convolutional layers to maintain the 2-D structure of images. VGG-16 is a CNN architecture proposed by Simonyan et al. (2014) to investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting. For this study, a fully convolutional neural network (FCN) with a stride of 8 (FCN-8s) was fine-tuned. As shown in Figure 3, the FCN-8s obtains a fine label map by gradually upsampling the scoring layer and merging features from earlier layers. The FCN-8s is composed of locally connected layers, such as convolution, pooling, and upsampling, without any dense layer, which reduces the number of parameters and the computation time. Given that all connections are local, FCN-8s can work on any image size.
Figure 3. FCN-8s architecture (Skovsen et al., 2017)
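The gradual upsampling and merging of earlier layers can be illustrated on toy score maps. The sketch below follows the fusion order of FCN-8s in Long et al. (2015): scores at stride 32 are upsampled by 2 and added to the pool4 scores, the result is upsampled by 2 and added to the pool3 scores, and the sum is upsampled by 8 to the input resolution. Nearest-neighbor upsampling is used here as a stand-in for the learned transposed convolutions, a single class channel is shown, and all function names and values are illustrative assumptions.

```python
def upsample(grid, factor):
    """Nearest-neighbour upsampling (stand-in for a learned transposed convolution)."""
    out = []
    for row in grid:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def add(a, b):
    """Element-wise sum of two equally sized 2-D score maps."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def fcn8s_fuse(score_pool3, score_pool4, score_final):
    """Fuse score maps at strides 8, 16 and 32 into a full-resolution map."""
    fused16 = add(upsample(score_final, 2), score_pool4)  # stride 32 -> 16
    fused8 = add(upsample(fused16, 2), score_pool3)       # stride 16 -> 8
    return upsample(fused8, 8)                            # stride 8 -> input size


# Toy single-class score maps for a 32 x 32 input (values are made up).
score_final = [[1.0]]                            # stride 32: 1 x 1
score_pool4 = [[0.1, 0.2], [0.3, 0.4]]           # stride 16: 2 x 2
score_pool3 = [[0.0] * 4 for _ in range(4)]      # stride 8:  4 x 4

output = fcn8s_fuse(score_pool3, score_pool4, score_final)
print(len(output), len(output[0]))
```

The coarse stride-32 scores contribute the same value everywhere, while the pool4 and pool3 scores add progressively finer spatial detail.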

Training U-Net and FCN-8s
In this study, the U-Net and FCN-8s models were fine-tuned and trained using Stochastic Gradient Descent (SGD) with a learning rate of 0.001 and a maximum of 16 epochs. A 10-fold cross-validation method was used to estimate the performance of the U-Net and FCN-8s models on unseen data. For this purpose, we randomly partitioned the training images into 10 equal parts. At each run, 9 parts were combined to form a training set, and the remaining part was used as a test set to estimate the classification errors. These steps were repeated ten times, using a different fold as the test set each time. Finally, the mean error over all folds was used to estimate the potential of the U-Net and FCN-8s models. It took approximately 8 hours to cross-validate U-Net and FCN-8s on a single GPU (NVIDIA Quadro M4000).
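The cross-validation procedure described above can be sketched as follows. This is a minimal illustration, assuming the number of images is divisible by the number of folds (as with 60 images and 10 folds). The `evaluate` callback, which would train a model on the training indices and return its error on the test indices, is a hypothetical placeholder and not the training code used in this study.

```python
import random


def k_fold_splits(n_items, k, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if n_items % k:
        raise ValueError("n_items must be divisible by k for equal folds")
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)          # random partition
    fold_size = n_items // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    for i in range(k):
        test = folds[i]                           # one fold held out
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(n_items, k, evaluate):
    """Mean of the per-fold errors returned by `evaluate(train, test)`."""
    errors = [evaluate(train, test) for train, test in k_fold_splits(n_items, k)]
    return sum(errors) / len(errors)


splits = list(k_fold_splits(60, 10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))


def dummy_evaluate(train, test):
    """Hypothetical stand-in: a real version would train and test a model."""
    return len(test) / (len(train) + len(test))


mean_error = cross_validate(60, 10, dummy_evaluate)
```

With 60 images and 10 folds, each run trains on 54 images and tests on the remaining 6, and every image is used for testing exactly once.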

RESULTS AND DISCUSSION
In this study, a confusion matrix was used to analyze the classification results of the U-Net and FCN-8s models. The qualitative classification results of U-Net and FCN-8s are shown in Figure 4, and the detailed per-class results (for the soil, weed, and crop classes) are given as confusion matrices in Table 1 and Table 2. The U-Net and FCN-8s models achieved an overall accuracy of 75.2% and 72.1%, respectively, for separating weeds from crops. The results also show that about 33% and 24% of weeds were misclassified as crops by U-Net and FCN-8s, respectively. This is because weeds and crops have a similar spectral response, which makes it hard to separate them with the U-Net and FCN-8s classifiers solely from optical imagery.
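For reference, the quantities discussed above can be derived from a confusion matrix as in the following sketch. The labels are toy values chosen for illustration and do not reproduce Table 1 or Table 2; the class order and function names are assumptions.

```python
CLASSES = ("soil", "weed", "crop")   # assumed class order for this example


def confusion_matrix(true_labels, predicted_labels, n_classes):
    """matrix[t][p] counts pixels of true class t predicted as class p."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted_labels):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all pixels on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_accuracy(matrix):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


# Toy pixel labels (0 = soil, 1 = weed, 2 = crop); not results from the paper.
true_labels = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
predicted = [0, 0, 0, 1, 1, 1, 2, 2, 2, 1]

matrix = confusion_matrix(true_labels, predicted, len(CLASSES))
weed_as_crop = matrix[1][2] / sum(matrix[1])   # share of weeds labeled as crop
print(matrix)
print(overall_accuracy(matrix), per_class_accuracy(matrix), weed_as_crop)
```

The off-diagonal entry for true weeds predicted as crops corresponds to the weed-to-crop misclassification rate reported above.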

CONCLUSION
Mapping the location of weeds and treating those areas locally are essential to improve weed control capacity. This study compared the performance of two well-known deep learning-based methods for distinguishing crops from weeds using high-resolution imagery. The U-Net and FCN-8s models were fine-tuned to classify the CWFID into three classes (soil, crops, and weeds). Fine-tuning and transfer learning allowed us to overcome the problem of small dataset size. The U-Net and FCN-8s models achieved an overall accuracy of 75.2% and 72.1%, respectively, for classifying the CWFID into three classes using 60 training images. In future research, we will incorporate crop geometry constraints into the model to improve classification accuracy.

ACKNOWLEDGEMENT
This material is based upon work supported by the National Science Foundation under Grants No. 1832110 and No. 1800768.