ESTIMATING BIOCHEMICAL PARAMETERS OF TEA (CAMELLIA SINENSIS (L.)) USING HYPERSPECTRAL TECHNIQUES

Tea (Camellia sinensis (L.)) is an important economic crop, and its market price depends largely on its quality. This research explores the potential of hyperspectral remote sensing for predicting the concentration of biochemical components, namely total tea polyphenols, as indicators of tea quality at canopy scale. Experiments were carried out on tea plants growing in the field and in a greenhouse. Partial least squares regression (PLSR), which has proven to be one of the most successful empirical approaches, was used to establish the relationship between reflectance and biochemical concentration across six tea varieties in the field. In addition, a novel integrated approach combining the successive projections algorithm (SPA), as a band selection method, with neural networks was developed and applied to estimate the concentration of total tea polyphenols for one tea variety, in order to model complex nonlinear relationships between the independent (wavebands) and dependent (biochemicals) variables. The good prediction accuracies (r2 > 0.8 and relative RMSEP < 10%) achieved with both linear (partial least squares regression) and nonlinear (artificial neural network) modelling approaches in this study demonstrate the feasibility of using airborne and spaceborne sensors to cover wide areas of tea plantation for rapid, low-cost in situ monitoring of tea quality.


INTRODUCTION
Tea consumption has been rising in recent years because of its distinctive flavour and possible health benefits. Consequently, it has become increasingly important to be able to give reliable estimates of tea quality (Yan, 2007).
Traditionally, tea quality is assessed either by tea experts, which may give inconsistent and subjective results, or by wet chemical analysis, which is time-consuming and labour-intensive. Because they are efficient and quantitative, new techniques based on hyperspectral remote sensing data offer the possibility of estimating and monitoring vegetation quality in space and time (Knox et al., 2011; Mutanga and Kumar, 2007).
Hyperspectral remote sensing techniques evolved from laboratory-based near infrared spectroscopy (NIRS) (Curran et al., 2001). Their narrow bands (10 nm or less) make it possible to detect subtle variations in reflectance spectra caused by differences in the biochemical composition and physiology of vegetation (Davey et al., 2009; Schlerf et al., 2010). In recent years, researchers have extended reflectance spectroscopy to measure biochemical parameters of vegetation with field spectrometers and airborne or spaceborne sensors, in order to explore the chemical variation of vegetation in a spatial context (Curran, 1989; Schlerf et al., 2010; Skidmore et al., 2010).
Tea polyphenols comprise four main groups of substances (catechins, flavonoids, anthocyanins and phenolic acids), which together account for 20-35% of the total dry matter and contribute greatly to tea taste and quality. In practice, only the young tender buds and leaves are plucked to produce high-quality tea. Compared with older leaves, this part of the plant contains the optimal ratio of polyphenols to amino acids, which gives the tea beverage its characteristic taste (Mitscher and Dolby, 1997).
This research aims to estimate the concentration of the main tea quality-related compounds (total tea polyphenols) using reflectance spectroscopy of tea plants at canopy level. Both linear (partial least squares regression) and nonlinear (artificial neural network) regression methods were tested. To determine whether the spectral-chemical relationship holds across the tea species, partial least squares regression was used to relate reflectance to biochemical concentration across different tea varieties. Furthermore, a hybrid approach combining a neural network with the successive projections algorithm (for variable selection) was applied to estimate total tea polyphenols for one tea variety grown in a greenhouse.

Study area and Data sets
The research was conducted at Huazhong Agriculture University in Wuhan, China (latitude 30°28'41"N, longitude 114°21'48"E). Part of the data was collected in the university's tea garden, and the rest was collected from a greenhouse experiment (Figure 1). Six tea varieties in the tea garden, Fuding dabai (FD), Fu yun 6 (FY), E cha 1 (EC), Tai cha 12 (TC), Huang dan (HD) and Mei zhan (MZ), were selected to test whether the modelling methods can be extended to various tea varieties. The tea bushes are so dense that the soil background is barely visible from above the canopy. Eight samples were randomly collected for each variety, giving a total of 48 (8 × 6) samples.
For the greenhouse experiment, young Fuding dabai tea plants were grown under controlled conditions. To extend the range of chemical variation in the samples, eight soil treatments with different levels of available soil nutrients were designed (Table 1). Each soil treatment had eight replicates, giving a total of 64 (8 × 8) samples for the greenhouse experiment.

Canopy spectral measurement
On a cloud-free sunny day, canopy reflectance was measured with an ASD FieldSpec Pro FR spectrometer (Analytical Spectral Devices). The spectrometer covers 350-2500 nm, with a sampling interval of 1.4 nm between 350 and 1000 nm and 2 nm between 1000 and 2500 nm. The fiber optic was held by hand approximately 10-20 cm above the top of the canopy. To minimize bidirectional reflectance distribution function (BRDF) effects, the pots were rotated by 60° after every ninth measurement of the canopy. Before each canopy measurement, the radiance of a white Spectralon panel was measured to normalize the target reflectance.
After the canopy measurements, shoots consisting of one bud with three or four leaves were clipped from the tea bushes in the field. For tea plants grown in the greenhouse, four or five pots together were treated as one observation and their leaves were plucked, to ensure enough leaf material for chemical analysis in the laboratory. The fresh leaf weight of each sample unit had to be at least 40 g to meet the needs of wet chemical analysis.
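The white-reference normalization described above amounts to a band-by-band ratio of target radiance to panel radiance. The sketch below is a minimal illustration, assuming both spectra are already on the same wavelength grid and that the panel's own reflectance is supplied as a calibration factor (1.0 for an ideal white reference); the function name `to_reflectance` and the radiance values are hypothetical, not part of the instrument software.

```python
def to_reflectance(target_radiance, panel_radiance, panel_reflectance=1.0):
    """Convert target radiance to reflectance with a white reference panel.

    Reflectance in each band is target radiance divided by panel radiance,
    multiplied by the panel's own reflectance (assumed 1.0 for an ideal
    white reference unless a per-band calibration list is supplied).
    """
    if len(target_radiance) != len(panel_radiance):
        raise ValueError("target and panel spectra must have the same length")
    if isinstance(panel_reflectance, (int, float)):
        panel_reflectance = [float(panel_reflectance)] * len(panel_radiance)
    return [
        t / p * k
        for t, p, k in zip(target_radiance, panel_radiance, panel_reflectance)
    ]


# Hypothetical radiance readings for three bands (arbitrary units).
canopy = [0.12, 0.30, 0.45]
panel = [1.20, 1.50, 1.50]
print(to_reflectance(canopy, panel))  # roughly [0.1, 0.2, 0.3]
```

A per-band panel calibration list can be passed instead of the scalar when the reference panel is not treated as perfectly white.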

Biochemical Assay
Standard wet chemistry methods were used to determine the concentration of total polyphenols. The leaves were steamed for three and a half minutes to inactivate the enzymes that cause oxidation of the tea (Yamamoto et al., 1997) and then dried in an oven at 80°C. The dried leaves were then ground with an electric mill. Total tea polyphenols were determined by the ferrous tartrate colorimetric method, with absorbance measured at 540 nm (Iwasa and Torii, 1962).

Spectral pre-processing
The band regions 350-400 nm, 1350-1420 nm, 1800-1970 nm and 2300-2500 nm showed high levels of noise due to atmospheric absorption and were excluded from the data. Before data analysis, the reflectance spectra of the 64 observations were mean-centered by subtracting their mean (Araújo et al., 2001; Cho et al., 2007).
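The two pre-processing steps above can be sketched in plain Python as follows. This assumes the region bounds are inclusive and that "mean-centering" means subtracting the mean spectrum (the per-band mean over all samples), which is the usual convention before PLSR; the helper names and the toy spectra are hypothetical.

```python
# Noisy regions listed in the text (nm); inclusive bounds are an assumption.
NOISY_REGIONS = [(350, 400), (1350, 1420), (1800, 1970), (2300, 2500)]


def keep_band(wavelength, regions=NOISY_REGIONS):
    """Return True if the wavelength lies outside every noisy region."""
    return not any(lo <= wavelength <= hi for lo, hi in regions)


def drop_noisy_bands(wavelengths, spectra, regions=NOISY_REGIONS):
    """Remove noisy bands from a list of spectra (one list per sample)."""
    keep = [i for i, w in enumerate(wavelengths) if keep_band(w, regions)]
    kept_wl = [wavelengths[i] for i in keep]
    kept_spectra = [[row[i] for i in keep] for row in spectra]
    return kept_wl, kept_spectra


def mean_center(spectra):
    """Subtract the mean spectrum (per-band mean over all samples)."""
    n = len(spectra)
    means = [sum(col) / n for col in zip(*spectra)]
    centered = [[x - m for x, m in zip(row, means)] for row in spectra]
    return centered, means


# Two hypothetical spectra sampled at seven wavelengths.
wl = [390, 500, 1400, 1500, 1900, 2100, 2400]
X = [[0.01, 0.05, 0.20, 0.30, 0.10, 0.25, 0.05],
     [0.02, 0.07, 0.22, 0.34, 0.12, 0.27, 0.06]]
kept_wl, kept = drop_noisy_bands(wl, X)
centered, means = mean_center(kept)
print(kept_wl)  # [500, 1500, 2100]
```

The stored `means` would be reused to center any new spectrum before prediction, so that training and prediction share the same reference.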

Partial least squares regression (linear regression approach)
Partial least squares regression (PLSR) combines features of principal component analysis and multiple linear regression. It compresses a large number of variables into a few latent variables (PLS factors). It is particularly useful when the number of independent variables (here, wavebands) is much larger than the number of dependent variables or observations and the independent variables are highly collinear. PLSR reduces the overfitting problem encountered with multiple regression (Card et al., 1988; Curran, 1989).
Partial least squares regression was performed to establish the relationship between reflectance and biochemical contents across different tea varieties.
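To illustrate how PLSR compresses many collinear bands into a few latent factors, here is a minimal single-response PLS (PLS1) sketch using the NIPALS algorithm in plain Python. It is a generic textbook formulation, not the software used in the study (which the paper does not name); the toy data are hypothetical, and in practice the number of factors would be chosen by cross-validation.

```python
def _mean(values):
    return sum(values) / len(values)


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with the NIPALS algorithm.

    X: list of samples, each a list of band values; y: list of responses.
    Returns column means, response mean, and per-factor weights (W),
    X loadings (P) and y loadings (Q).
    """
    n, p = len(X), len(X[0])
    x_mean = [_mean([row[j] for row in X]) for j in range(p)]
    y_mean = _mean(y)
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centred X
    f = [v - y_mean for v in y]  # centred y
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: band direction with maximal covariance with y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sum(v * v for v in w) ** 0.5
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate: remove what this latent factor explains from X and y.
        E = [[E[i][j] - t[i] * p_load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p_load)
        Q.append(q)
    return x_mean, y_mean, W, P, Q


def pls1_predict(model, x):
    """Predict the response for one spectrum by applying the same deflation."""
    x_mean, y_mean, W, P, Q = model
    e = [xj - mj for xj, mj in zip(x, x_mean)]
    y_hat = y_mean
    for w, p_load, q in zip(W, P, Q):
        t = sum(ej * wj for ej, wj in zip(e, w))
        y_hat += q * t
        e = [ej - t * pj for ej, pj in zip(e, p_load)]
    return y_hat


# Hypothetical toy data: y = 2*x1 - x2 + 3 exactly, with two "bands".
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0]]
y_toy = [3.0, 6.0, 4.0, 8.0]
model = pls1_fit(X_toy, y_toy, n_components=2)
print(pls1_predict(model, [5.0, 5.0]))  # close to 8.0
```

With as many factors as the rank of the centred X, PLS1 reproduces ordinary least squares, which is why the toy prediction is exact; with real canopy spectra, far fewer factors than bands are retained.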

A hybrid approach (nonlinear regression approach)
For the single tea variety grown in the greenhouse, an artificial neural network was applied to model the spectral-chemical relationship by nonlinear regression. A feed-forward, error back-propagation network with one hidden layer was adopted, because this architecture has been used frequently and successfully in previous studies (Skidmore et al., 1997). To find the optimal number of hidden-layer nodes, we compared training and test accuracies for networks with 1 to 20 hidden neurons (the maximum was capped at 20 to keep the model parsimonious and save computation time). The Levenberg-Marquardt optimization method, in which the network parameters are adjusted adaptively, was used to train the networks (Lera and Pinzolas, 2002; Moré, 1978), and early stopping was applied to avoid overtraining (Lin and Chen, 2004).
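The model-selection logic described above (early stopping plus a search over 1 to 20 hidden neurons) can be sketched independently of the training algorithm. The Levenberg-Marquardt training itself is not implemented here: `train_and_score` is a hypothetical, user-supplied function, the patience value is an assumption, and selecting the size by minimum held-out error is one reasonable reading of the procedure rather than the authors' exact rule.

```python
def early_stopping_epoch(val_errors, patience=6):
    """Return the 0-based epoch whose weights early stopping would keep.

    val_errors holds the validation error after each training epoch.
    Training halts once the error has failed to improve for `patience`
    consecutive epochs (patience=6 is an assumed default), and the weights
    from the best epoch seen so far are retained.
    """
    best_epoch, best_err, waited = 0, float("inf"), 0
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


def choose_hidden_size(train_and_score, max_neurons=20):
    """Try 1..max_neurons hidden neurons and return (best_size, best_error).

    train_and_score(n) is a hypothetical, user-supplied function that trains
    a one-hidden-layer network with n neurons (e.g. with Levenberg-Marquardt
    and early stopping) and returns its error on held-out data.
    """
    errors = {n: train_and_score(n) for n in range(1, max_neurons + 1)}
    best = min(errors, key=errors.get)
    return best, errors[best]


# Toy stand-ins for real training runs.
print(early_stopping_epoch([1.0, 0.8, 0.7, 0.72, 0.71, 0.75, 0.74, 0.8], patience=3))  # 2
print(choose_hidden_size(lambda n: (n - 7) ** 2 + 1.0))  # (7, 1.0)
```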
Before the neural network model was run, an effective variable selection method, the successive projections algorithm (SPA), was applied to the pre-processed spectral data (350-2500 nm, excluding the noisy regions removed during pre-processing). SPA is a forward selection approach whose purpose is to select wavebands containing minimally redundant information, so that the collinearity problems inherent in hyperspectral data are minimized.
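The core of SPA is a chain of projections: starting from one band, every column is projected onto the subspace orthogonal to the most recently selected band, and the band with the largest remaining norm (the least collinear one) is added next. The plain-Python sketch below shows one such chain under that standard formulation; the full algorithm additionally compares chains from every starting band and chain length using the validation error of a multiple linear regression, which is omitted here. The function names and toy matrix are hypothetical.

```python
def _project_out(col, ref, ref_sq):
    """Remove from `col` its component along `ref`."""
    coef = sum(c * r for c, r in zip(col, ref)) / ref_sq
    return [c - coef * r for c, r in zip(col, ref)]


def spa_chain(X, start, n_select):
    """Return one SPA chain of selected band indices.

    X is a list of samples (rows), each a list of band values. Starting from
    column `start`, each step projects every column onto the subspace
    orthogonal to the last selected (already projected) column and picks the
    unselected column with the largest projected norm.
    """
    n_bands = len(X[0])
    cols = [list(col) for col in zip(*X)]  # work column-wise
    selected = [start]
    for _ in range(n_select - 1):
        ref = cols[selected[-1]]
        ref_sq = sum(v * v for v in ref)
        if ref_sq == 0.0:  # remaining bands add no new information
            break
        cols = [_project_out(col, ref, ref_sq) for col in cols]
        candidates = [j for j in range(n_bands) if j not in selected]
        best = max(candidates, key=lambda j: sum(v * v for v in cols[j]))
        selected.append(best)
    return selected


# Hypothetical 3-sample x 4-band matrix; band 1 is nearly collinear with band 0.
X_spa = [[1.0, 1.0, 0.0, 0.0],
         [0.0, 0.01, 1.0, 0.0],
         [0.0, 0.0, 0.0, 2.0]]
print(spa_chain(X_spa, start=0, n_select=3))  # [0, 3, 2]
```

Band 1 is never chosen because, once band 0 is selected, almost nothing of it survives the projection, which is exactly the redundancy SPA is designed to avoid.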
The available data (64 samples) were randomly divided into three groups: a training set (n = 32, 50% of the samples), a validation set (n = 16, 25%) and a test set (n = 16, 25%). The performance of the ANN model was evaluated by the root mean square error of prediction (RMSEP) between predicted and measured concentrations in the test set (Mutanga et al., 2004). To speed up training of the neural network models, the chemical concentration data were normalized to the range 0-1 (Mutanga et al., 2004).
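The random 32/16/16 split and the 0-1 scaling can be sketched with the Python standard library. This is an illustration only: the seed is arbitrary, the split is purely random (not stratified by soil treatment, which the text does not mention), and the scaling bounds default to the values passed in, although taking them from the training set alone is the safer choice. Function names are hypothetical.

```python
import random


def split_indices(n, n_train, n_val, seed=0):
    """Randomly split range(n) into training, validation and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def minmax_scale(values, lo=None, hi=None):
    """Scale values to [0, 1]; lo/hi default to the data's own min and max."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values], (lo, hi)


def minmax_unscale(scaled, lo, hi):
    """Map network outputs in [0, 1] back to concentration units."""
    return [s * (hi - lo) + lo for s in scaled]


train_idx, val_idx, test_idx = split_indices(64, 32, 16, seed=42)
print(len(train_idx), len(val_idx), len(test_idx))  # 32 16 16
```

Predictions made on the 0-1 scale must be mapped back with `minmax_unscale` before the RMSEP is computed in mg g-1.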

RESULTS
Table 2 shows the measured concentrations of total tea polyphenols by variety and soil treatment. All values are reported on a dry-matter basis. The range of the chemical data agrees with previously reported values. In the greenhouse experiment, the combination of higher levels of nitrogen, phosphorus and potassium resulted in the maximum polyphenol concentration, and vice versa.
Figure 2 shows observed versus predicted concentrations of total tea polyphenols across the different tea varieties, using partial least squares regression, for both training (N = 30) and test (N = 18) data. Satisfactory prediction accuracy was obtained at canopy level: on the independent test set, total tea polyphenols were estimated with a high r2 (> 0.8) and a low RMSEP (RMSEP = 13.68 mg g-1, RMSEP/mean = 6.63%).
Figure 2. Scatter plots of measured versus predicted total tea polyphenols for the training and test sets using canopy spectra (mean-centred). r2 is the coefficient of determination between model predictions and measured chemical concentrations on the test set, and RMSEP is the root mean square error of prediction on the test set.
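The accuracy measures quoted in the results can be written out explicitly. In this sketch, r2 is computed as the squared Pearson correlation between measured and predicted values; the paper does not state whether it used this form or 1 - SSE/SST, so that choice is an assumption. The measured and predicted values are hypothetical.

```python
import math


def rmsep(measured, predicted):
    """Root mean square error of prediction on a test set."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def relative_rmsep(measured, predicted):
    """RMSEP as a percentage of the mean measured value (RMSEP/mean)."""
    return 100.0 * rmsep(measured, predicted) / (sum(measured) / len(measured))


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    n = len(measured)
    mm = sum(measured) / n
    mp = sum(predicted) / n
    sxy = sum((m - mm) * (p - mp) for m, p in zip(measured, predicted))
    sxx = sum((m - mm) ** 2 for m in measured)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy * sxy / (sxx * syy)


# Hypothetical test-set values in mg g-1.
measured = [200.0, 210.0, 190.0, 220.0]
predicted = [205.0, 207.0, 195.0, 214.0]
print(round(rmsep(measured, predicted), 3))           # 4.873
print(round(relative_rmsep(measured, predicted), 2))  # 2.38
print(round(r_squared(measured, predicted), 3))       # 0.942
```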
Figure 3 presents the relationship between predicted and measured concentrations obtained with a hybrid of neural networks and SPA variable selection (SPA-ANN). On the test set, using the wavebands selected by the successive projections algorithm, the neural network with optimal settings predicted total tea polyphenols in the greenhouse experiment with a coefficient of determination (r2) of 0.82 and a root mean square error of 4.30 mg g-1 (3.0% of the mean).
Figure 4 shows how the optimal number of wavelengths was chosen by the successive projections algorithm. Based on the root mean square error of validation, 12 wavebands were selected for predicting total tea polyphenols. In order of importance (from most to least), the wavelengths selected by SPA are 2001 nm, 2206 nm, 1424 nm, 1799 nm, 1439 nm, 1426 nm, 689 nm, 1971 nm, 1428 nm, 1435 nm, 1422 nm and 1502 nm.
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B8, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia
Figure 3. Relationships between the predicted and measured total tea polyphenols using a hybrid of neural networks and SPA variable selection, according to the test dataset (n = 16).
Figure 4. Choice of the optimal number of wavelengths (circled positions) by the successive projections algorithm for the prediction of tea polyphenols. The criterion is to find the minimum number of wavelengths for which the error is not significantly larger than the lowest error, as determined by an F-test.
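The F-test criterion in the Figure 4 caption can be sketched as follows: take the lowest validation RMSE, then return the smallest number of wavelengths whose squared RMSE ratio to that minimum does not exceed a critical F value. The critical value is passed in by the caller (for example, it could be computed with SciPy's F distribution); the significance level, the degrees of freedom, the function name and the RMSE values below are all assumptions for illustration.

```python
def choose_n_wavelengths(rmse_by_n, f_crit):
    """Smallest number of wavelengths not significantly worse than the best.

    rmse_by_n maps a candidate number of wavelengths to its validation RMSE.
    The ratio of mean squared errors (RMSE_n / RMSE_min) ** 2 is compared
    with a caller-supplied critical F value (must be >= 1).
    """
    if f_crit < 1.0:
        raise ValueError("critical F value must be at least 1")
    rmse_min = min(rmse_by_n.values())
    for n in sorted(rmse_by_n):
        if (rmse_by_n[n] / rmse_min) ** 2 <= f_crit:
            return n


# Hypothetical validation RMSE (mg g-1) for 1..6 selected wavelengths.
rmse_example = {1: 10.0, 2: 6.0, 3: 5.0, 4: 4.5, 5: 4.4, 6: 4.45}
print(choose_n_wavelengths(rmse_example, f_crit=1.2))  # 4
```

Here five wavelengths give the lowest error, but four are kept because their error is within the assumed F threshold, mirroring the parsimony goal described in the caption.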

DISCUSSION AND CONCLUSION
This paper demonstrates the utility of reflectance spectroscopy for predicting tea quality-related biochemicals at canopy level (Figures 2 and 3). In previous studies, several phenolic substances of tea, including total tea polyphenols, catechins, epigallocatechin gallate (EGCG) and epicatechin (EC), have been successfully estimated from near infrared spectra of dried tea powders and dried leaves (Chen et al., 2008; Luypaert et al., 2003). Our results show that foliar chemical concentrations of tea (total tea polyphenols) can be retrieved not only from the spectra of dried powders but also from those of living tea plants.
Prediction accuracy was lower with canopy spectra than in reported results using powder spectra. Reflectance variability in dried tea powders is mainly related to the amounts of chemical compounds, because water absorption is considerably reduced and the effect of leaf cell structure may be disrupted (Curran et al., 2001; Kokaly and Clark, 1999). At canopy level, reflectance variability is also driven by additional factors such as LAI (leaf area index), foliar water content and canopy architecture (Gitelson et al., 2003; Kokaly et al., 2009). This may be the main reason for the relatively lower prediction accuracy for tea plants.
Our study showed that partial least squares regression is an effective method for retrieving biochemical parameters from the canopy spectral reflectance of tea plants (Figure 2), and the PLSR predictive models produced satisfactory accuracy. This result is consistent with Darvishzadeh et al. (2008), who, in a field experiment on grass, reported better predictive performance for PLS regression than for the other approaches they compared for estimating biophysical and biochemical parameters. PLS has the potential to exploit the rich information content of hyperspectral data.
Our study also demonstrated that tea quality can be predicted with satisfactory accuracy from canopy-level hyperspectral data using artificial neural networks combined with the successive projections algorithm (Figure 3). Using the optimal wavelengths selected by SPA, neural networks predicted total tea polyphenols from the canopy spectra of tea plants well: the relative root mean square error (RMSEP/mean) was below 10% on an independent test set. The goal of SPA is to find a small, representative set of spectral variables while minimizing collinearity (Figure 4). Our results agree with recent studies that have successfully applied the successive projections algorithm to predict biochemical concentrations in vegetation science (Liu and He, 2009).
Because the data were collected in the field or in a greenhouse under natural atmospheric and illumination conditions, this research demonstrates the potential of using reflectance spectra to predict tea quality in situ in space and time. However, as our experiment was carried out at canopy level with a field spectrometer, retrieving biochemical parameters of tea plants from airborne or spaceborne hyperspectral data may be more difficult, because biochemical absorption features may be affected by complex environmental factors such as atmospheric and topographic effects.
The following conclusions were drawn from this study:
(1) Our results suggest that the biochemical components of tea quality (total tea polyphenols) can be quantitatively estimated from canopy spectroscopy; canopy spectra may have the potential to predict the foliar biochemical concentration of tea.
(2) When up-scaling to canopy level, total tea polyphenols were predicted with lower accuracy than in reported studies using spectra of dried or ground powders.
(3) Partial least squares regression is able to locate surrogate spectral features for estimating the concentration of fresh-leaf biochemicals of tea.
(4) The novel integrated approach proposed in this study, which combines a forward selection algorithm (the successive projections algorithm) for choosing the optimal number of wavelengths with neural networks, can better simulate the nonlinear relationship between biochemical concentration and the spectral signatures of the tea canopy.
In summary, the successful chemical estimation from canopy spectra shows that, based on the methodology described in this paper, hyperspectral remote sensing (airborne or spaceborne sensors) could be used to predict tea quality quantitatively and non-destructively at landscape or regional scales before plucking.

Figure 1. Location of the Huazhong Agriculture University, Wuhan, China (left part of the figure). The right part shows the pictures of the tea garden (top) and the greenhouse setup before fertilization (bottom) in the university.

Table 2. Descriptive statistics (mean, minimum and maximum, mg g-1) of the total tea polyphenols measured in the laboratory