EVALUATION OF MACHINE LEARNING TECHNIQUES IN VINE LEAVES DISEASE DETECTION: A PRELIMINARY CASE STUDY ON FLAVESCENCE DORÉE

Vine culture is influenced by many factors, such as weather, soil and topography, which can trigger phytosanitary issues. Among these are diseases responsible for major economic losses, which can nevertheless be managed with timely interventions in the field that prevent damage from propagating. Although not all symptoms are visible, hyperspectral sensors can address this limitation through their ability to measure hundreds of narrow, contiguous bands that extend beyond the visible spectrum. With this research line in mind, a hyperspectral sensor was used in this work to analyse the spectral status of vine leaf samples collected in three chronologically distinct campaigns, while costly and destructive laboratory methods were used to detect Flavescence Dorée (FD) in the same samples, providing ground truth information. For data processing, machine learning approaches were used: several classifiers were selected to detect FD in hyperspectral images of vine leaves. The goal was to evaluate them and find the most suitable classifier for this task.


INTRODUCTION
The sustainable production of vines faces challenges induced by many factors that are hard to control, such as weather, soil and topography. Among these challenges are phytosanitary issues in general, and diseases in particular, which are responsible for major economic losses in the agriculture industry worldwide (Martinelli et al., 2015). In the 1990s, a new discipline called precision viticulture was developed. It aims to adjust vineyard management to the spatial variability present in the field in order to improve its economic and environmental sustainability (Santesteban, 2019). Disease management usually involves inadequate administration of plant protection products (PPPs), due to the lack of proper detection techniques and decision support systems capable of identifying affected areas and quantifying the real needs in a timely manner. In fact, well-timed detection would have a positive effect on disease control actions and plant growth management strategies (Akila, Deepan, 2018). Savary et al. (2012) also pointed out that early detection increases treatment effectiveness. From an ecological perspective, PPP management also aims to reduce negative environmental impacts. The study by Popp et al. (2013) demonstrates the real consequences of unmanaged pesticide spraying for consumers' health. Traditional in-field visual observation by an agronomist or similar professional, together with the related laboratory tests, is only practicable for small areas because it is very time-consuming and demanding. Moreover, the subjectivity of visually identifying phytopathological problems can lead to misclassification and wrong conclusions, even by experienced agricultural experts (Akila, Deepan, 2018).
Nevertheless, non-invasive optical sensors that can capture the reflectance properties of plants over a wide range of the electromagnetic spectrum in narrow bands have been emerging, with the potential for more effective disease detection (Chen et al., 2019; Lowe et al., 2017; Mahlein et al., 2012; Thomas et al., 2017; Xie et al., 2015; Zhou et al., 2019). Among them, hyperspectral sensors stand out. Their principle is comparable to that of common RGB or multispectral cameras, i.e., measuring the amount of light that reaches the sensor and storing that information. However, they differ in the range of the electromagnetic spectrum they can capture, as well as in the number and width of the associated bands. An RGB camera measures only three wide bands in the visible part of the spectrum and a multispectral camera typically measures from three to fifteen bands, while a hyperspectral camera measures up to several hundred narrow bands that extend well beyond what the eye can perceive. This matters because, in most cases, by the time symptoms become visible the disease has already reached a middle or late stage of development (Lowe et al., 2017). A hyperspectral image, also known as a hyperspectral cube, combines two kinds of information: spatial and spectral (Thomas et al., 2017). The detailed spectral information in hyperspectral images can be associated with physiological processes in plants. Hyperspectral reflectance patterns for such characterization have already been validated against destructive methods in previous works (e.g., determination of photosynthetic pigments (Zhao et al., 2016)). Furthermore, hyperspectral imaging is an objective method, in contrast to visual interpretation, which makes it suitable for implementation in automated systems (Mahlein, 2015; Virlet et al., 2017; Walter et al., 2015). Hyperspectral imaging can be applied at different scales. In the work carried out by Thomas et al. (2017), the scales are divided into tissue, leaf, single plant and canopy.
Tissue is the most detailed scale, with resolution in millimetres, while the leaf scale operates at centimetre resolution. Both scales are commonly acquired in a laboratory environment. The plant scale is defined as up to 30 cm and the canopy scale as up to 50 cm resolution. Even larger scales can be considered, such as field and landscape, namely when resorting to UAVs, airplanes or satellites carrying hyperspectral sensors (Adão et al., 2017). For each application, the trade-off between spatial resolution and measurement throughput needs to be carefully considered. Several studies have detected disease at larger scales (Han, 2013; Izzuddin et al., 2018; Huang et al., 2007).
In this study, Flavescence Dorée (FD) is classified at leaf scale. It is a bacterial grapevine disease caused by a phytoplasma and has spread to many countries in Europe (Chuche, Thiéry, 2014). The symptoms of the disease start to appear in late summer and remain visible until mid-autumn. Typical transformations include leaves of white varieties turning yellow near the leaf veins and leaves of red varieties turning dark red in the spaces between the veins. Progressively, the leaves of both kinds of variety curl into triangular shapes (Albetis et al., 2017). However, the goal of this study is to use a machine learning (ML) approach and to evaluate selected classifiers on their performance in classifying FD, even at a stage where no visible symptoms are present. Two Vinhão leaves with no visible symptoms of the disease can be seen in Figure 1, one of which is actually infected with FD (as confirmed by laboratory tests).
ML approaches have proved very useful for hyperspectral image analysis, mainly because of their capability to model the relationship between the reflectance values and the desired information while being robust against noise and uncertainties in spectral and ground truth measurements (Gewali et al., 2019).
This was demonstrated in a wide variety of studies (Preidl & Doktor, 2011; Schneider et al., 2010; Verrelst et al., 2012), in which ML approaches outperformed traditional methods. Besides generally performing better than classical methods such as spectral matching, ML can also handle variability and noise in spectral and ground truth data (Schneider et al., 2010). The combination of hyperspectral imaging and ML is booming in precision plant protection because a number of applications have shown promise for early disease detection (Golhani et al., 2018). However, the main disadvantages include a high dependency on the patterns of variables and on the features to be extracted, as well as the need for some classifiers to be trained several times before real application (Zhang et al., 2015).

Data collection
The leaves were collected in three campaigns, carried out in July, August and October of 2018, in which leaves were picked based on visual symptoms of the disease. The campaigns took place in the Minho region, Portugal (see Figure 2). The three campaigns were planned to acquire samples that would capture the behaviour of the disease at different stages of its development.

Figure 2. Vine leaves collection
Leaves were collected from two vine varieties, Vinhão and Loureiro, which are red and white cultivars, respectively. The collected leaves were examined by nested polymerase chain reaction (Pelt-Verkuil et al., 2008).

Pre-processing
After sample collection, the data were pre-processed. The acquired samples were in raw data cube format and required calibration, which was done in SpectralView, the software included in the hyperspectral sensor package. Afterwards, the background was removed from each hyperspectral image by applying the Spectral Angle Mapper (SAM) algorithm, which calculates the spectral angle error, in radians, between two vectors, here the spectra of background and leaf pixels.
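As an illustration of the SAM step, the following minimal sketch computes the spectral angle between a pixel spectrum and a reference spectrum and uses it to flag background pixels. It is not the SpectralView implementation: the function names, the 0.1 rad threshold and the handling of all-zero spectra are assumptions made for the example.

```python
import math


def spectral_angle(spectrum, reference):
    """Return the angle, in radians, between two reflectance vectors."""
    dot = sum(s * r for s, r in zip(spectrum, reference))
    norm_s = math.sqrt(sum(s * s for s in spectrum))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_s == 0.0 or norm_r == 0.0:
        # Assumption: an all-zero spectrum has no defined angle, so it is
        # treated as maximally dissimilar for non-negative reflectances.
        return math.pi / 2
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cos_angle = max(-1.0, min(1.0, dot / (norm_s * norm_r)))
    return math.acos(cos_angle)


def is_background(spectrum, background_reference, threshold=0.1):
    """Flag a pixel as background when its angle to the reference is small.

    The threshold (in radians) is a hypothetical value, not from the study.
    """
    return spectral_angle(spectrum, background_reference) < threshold
```

Because the angle depends only on the direction of the vectors, a pixel whose spectrum is a scaled copy of the reference (for example, the same material under different illumination) yields an angle close to zero.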

Sample extraction
Each hyperspectral leaf sample was composed of approximately 250 000 pixels, each one characterized by a spectral signature. Due to the lack of positional information about disease incidence within the leaf area, two masks were used for selecting the spectral reflectance (see Figure 3). The first mask (Figure 3, left) represented the whole leaf. To reduce the computational burden, 10 000 spectral signatures per leaf were randomly selected to create the first dataset. Following the guidance of plant science experts, a mask around the lower midrib, near the leafstalk, was created (Figure 3, right). With data balancing in mind, 100 randomly selected spectral signatures per leaf were used to create the second dataset.
Figure 3. Masks for dataset production: the left part shows the whole leaf-based mask; the right part shows an example of the low midrib-based mask
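The random selection of spectral signatures under a mask can be sketched as follows. This is an illustrative example only: the nested-list representation of the cube, the function name `sample_signatures`, the seed parameter and the choice of sampling without replacement are assumptions, since the text does not specify them.

```python
import random


def sample_signatures(cube, mask, n_samples, seed=None):
    """Randomly pick n_samples spectral signatures from masked pixels.

    cube: rows x columns nested list, each entry a list of band values.
    mask: rows x columns nested list of booleans (True = inside the mask).
    Sampling is done without replacement (an assumption for this sketch).
    """
    rng = random.Random(seed)
    candidates = [
        cube[r][c]
        for r, mask_row in enumerate(mask)
        for c, inside in enumerate(mask_row)
        if inside
    ]
    if n_samples > len(candidates):
        raise ValueError("mask contains fewer pixels than n_samples")
    return rng.sample(candidates, n_samples)
```

With the numbers given above, the whole-leaf dataset would correspond to `n_samples=10000` and the low midrib dataset to `n_samples=100` per leaf.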

Selected classifiers
Classifiers can be viewed as systems that learn to assign labels. They make few assumptions and require no prior knowledge of the data pattern. The goal of a classifier in this study is to estimate the presence of FD on vine leaves by analysing reflectance signatures, which is treated as a classification problem. The first classifier was Logistic Regression (LR). This popular classifier is known for being suitable for binary classification problems (Hoffman, 2019). Linear discriminant analysis (LDA) is a supervised dimensionality reduction method that finds the directions that maximally separate the different classes while minimizing the spread within each class (Fisher, 1936; Tavernier et al., 2019). Quadratic discriminant analysis (QDA) is a multivariate statistical method. In QDA it is assumed that the measurements in each class are normally distributed, without assuming that the covariance of each class is the same (Eker et al., 2015). Multi-layer Perceptron (MLP) is a supervised learning algorithm that resorts to backpropagation for training. It can distinguish non-linearly separable data (Van Der Malsburg, 1986). Naive Bayes (NB) is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions between the features (Maron, 1961). The random forest classifier (RF) is a meta estimator that fits a number of decision tree classifiers on several subsamples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting (Breiman, 2001). The decision tree classifier (DT) is a predictive modelling approach that uses a decision tree to go from observations about an item to conclusions about its target value (Quinlan, 1986). K-Nearest Neighbours (KNN) is one of the best known and most widely used methods for supervised pattern recognition (Coomans and Massart, 1982). A sample is classified by a plurality vote of its neighbours and is assigned to the class that is most common among its k nearest neighbours (Altman, 1992).
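The plurality vote used by KNN can be illustrated with a short sketch. It assumes Euclidean distance between spectral signatures and a default of k = 3, neither of which is stated in the text, and the function name `knn_predict` is hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by plurality vote among its k nearest neighbours.

    Euclidean distance is an assumption made for this sketch.
    """
    # Sort training samples by their distance to the query signature.
    ranked = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Count the labels of the k closest samples.
    votes = Counter(label for _, label in ranked[:k])
    # On a tied vote, the label encountered first (the nearer one) wins.
    return votes.most_common(1)[0][0]
```

For a binary problem such as healthy versus FD, choosing an odd k avoids tied votes altogether.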

RESULTS
Results were calculated from the two different datasets created with the selected masks (see Figure 3).

Predictions within campaign
The first predictions were estimated for unseen samples from the same campaign as the training samples. Because of the limited number of samples, and to avoid a biased estimate, the standard k-fold cross-validation resampling procedure was applied. The parameter k was set to 10, which splits the training dataset into 10 subsets. This approach provides a reasonable estimate of classifier performance on unseen samples. For the Vinhão variety, the best results in the October campaign were achieved by MLP and DT. The former achieved 85% accuracy when using samples from the whole leaves and 84% when using samples from the middle part of the leaf, while the latter achieved 87% and 83% on the same datasets. The worst results were obtained by NB and QDA when using the dataset composed of middle-area spectral signatures. Compared with October, classification accuracies were clearly worse in the August campaign, not exceeding 77%. In the July campaign, the overall accuracies were higher than in August and similar to those in October.
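The 10-fold cross-validation procedure described above can be sketched as follows. This is a minimal illustration rather than the code used in the study: shuffling before splitting, the fixed seed, and the `fit` and `predict` callables are assumptions introduced for the example.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return the mean accuracy over k folds.

    `fit(X_train, y_train)` returns a model and `predict(model, x)` returns
    a label; both are hypothetical callables supplied by the caller.
    """
    accuracies = []
    for train, test in k_fold_indices(len(X), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k
```

Each sample is used for testing exactly once and for training in the remaining k - 1 folds, so the reported accuracy is an average over predictions on data the classifier did not see during training.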
The results for Loureiro can be seen in Table 3. The table makes clear that, in October, classification accuracies obtained with the second dataset were higher for all classifiers than those obtained with samples from the whole leaf. LR and LDA achieved 94% accuracy. However, the same trend was not observed in the August campaign. The top two classifiers were again LR and LDA, but their best results were achieved using samples from the whole leaf. Very poor results were achieved in July, when most of the classifiers did not reach 60% accuracy.
Table 3. Prediction accuracies for Loureiro in three campaigns
Overall, accuracies when predicting an earlier campaign were low for the Loureiro variety. The highest scores were achieved when predicting from October to August, by LR, LDA, MLP and KNN, with about 70%. Predictions from October to July and from August to July did not surpass 65%.

CONCLUSION
The presented results showed that classification of FD using hyperspectral vine leaf images and an ML approach is possible with reasonable accuracy when predicting within the same campaign, especially in the latest one (October), when the disease is most developed. In the Vinhão variety, the best classifiers were DT, with 87% using the samples from the whole leaf, and MLP, with 85% on the same dataset. In the Loureiro variety, the best performance was achieved by LR and LDA, with 94% using the low midrib-based dataset. Only minor differences were observed between the two datasets, despite the large difference in the number of samples collected for each (10 000 versus 100 spectral signatures per leaf). The greatest difference can be identified in the results for the earliest campaign (July), in which the prediction accuracies for the Loureiro variety were poor. Several hypotheses might explain this observation. One of them concerns the early stage of the disease in that particular campaign, which smooths the spectral characteristics and makes it harder to separate "healthiness" from "infection" from a spectral perspective. Another can be related to how representative the samples selected for training are, more concretely to the uncertainty associated with labelling pixels as FD on leaves confirmed as infected through laboratory tests, which can mislead the classifier. Selecting all the samples from a leaf would not solve this issue either, because it would likely increase the number of wrong labels in use. Finally, the "curse of dimensionality", also known as the Hughes phenomenon, caused by the high number of features and the limited number of training samples, might be at the origin of the aforementioned poor results. In that sense, the success of supervised ML is, of course, intimately related to the consistency and correctness of dataset labelling.
Unsupervised learning might be complementary to this task, enabling user-based categorization of organized data clusters. Among the predictions for the Vinhão variety, backward prediction across campaigns from October to August was best performed by QDA, with 86%, using the low midrib-based dataset. The best prediction from October to the earliest campaign, in July, was achieved by KNN, with 76%, using the whole leaf-based dataset. In the August to July backward prediction, the best performances were achieved by MLP and QDA, with 88%, using the low midrib-based dataset. Backward prediction in Loureiro had diverse outcomes. From October to August, the best classifier was KNN, with 73%, using the whole leaf-based dataset. However, for October to July predictions, none of the classifiers exceeded 58%. In August to July predictions, only two classifiers exceeded 62%: KNN and QDA. To achieve higher prediction accuracies in plant disease detection, more robust and verified datasets of healthy and diseased leaves are needed. In future work, dimensionality reduction techniques will be explored, due to the redundancy induced by the hundreds of contiguous spectral bands composing hyperspectral imagery.