ESTIMATION OF MAIZE NITRATE CONCENTRATIONS USING EO-1 DATA AND A NON-LINEAR REGRESSION MODEL

Nitrogen compounds such as nitrates are considered the most important limiting factor for crop productivity. Monitoring the status of this element in crops has moved from destructive to non-destructive approaches, and remote sensing, with its ever-evolving technologies, has taken the lead for different crops across the world. This study assessed the potential of EO-1 (Hyperion) data to estimate nitrate concentrations in maize (Zea mays) leaves. The image was captured over the study area after the 11th week of planting. The random forest algorithm was used for band selection, to reduce data redundancy in the imagery, and for regression analysis to predict nitrate. Maize nitrate concentrations were detectable, with the key contributing wavebands being 752, 1043, 681, 851, 1820, 762, 862, 640, 1850, 609, 589, 569 and 650 nm. From this list, a subset corresponding to previously identified bands was used to develop vegetation spectral ratios. Prediction accuracy improved progressively when the independent variables for the model were all selected wavebands, all developed ratios, and selected ratios, with the 752-681 ratio contributing the most to the best model (R² = 0.90; RMSEP = 0.15). Therefore, ratios developed from selected Hyperion bands could be used to monitor the spatial variation of nitrate concentrations in maize at canopy level.


INTRODUCTION
Crop productivity depends, among other factors, on sustainable management regimes guided by timely monitoring of plant nutrient status at critical growth stages (Fageria, 2009). Plant nutrient assessment has traditionally relied on numerous manual field visits followed by tedious laboratory work. The process is labour intensive and destructive, as the many visits involve collecting plant tissue samples (roots, stems and/or leaves) for analysis. Over the years, non-destructive methods have been developed and applied to monitor plant nutrient status, such as that of nitrogen. These methods include the use of chlorophyll meters to measure the concentration of the pigment in plants and relate it to their nutrient status (Blackmer & Schepers, 1994). The concentration of chlorophyll depends largely on nitrogen (N), which is usually supplied through fertilisation that must be applied not just at the right time but also in the right amounts for the different crops (Fageria, 2009). Correct application is important for both economic and environmental reasons. Spatial variability of nutrients arising from soil quality confounds the correct application of fertilizers, because it requires a thorough assessment of N in the soil and/or crops, and such assessments can be costly over large areas. Remote sensing offers the potential to assess or monitor nutrient status rapidly and at lower cost, in soils and crops alike (Tilling et al., 2007).
The relationship between N and chlorophyll has been established (Lee et al., 1999) and shown to be linear (Houlès et al., 2007; Ziadi et al., 2008; Haboudane et al., 2008). Remote sensing has been used to assess N through canopy-level chlorophyll content in the light of growth stages and other agronomic factors (Strachan et al., 2002; Chen et al., 2010). In a critical review, Patane and Vibhute (2014) confirmed this relationship, although results varied not only with the techniques applied but also with the crop biochemical constituents considered.
The complexity of hyperspectral data has seriously challenged its management, a challenge usually overcome by selecting the most relevant spectral wavelengths or bands (Clevers & Jongschaap, 2003; Thenkabail et al., 2004). The selection rests on the fact that sunlight reaching vegetation surfaces is either absorbed or reflected, with the latter (reflectance) measured by a remote sensor at different wavelengths (Figure 1). Reflectance in the visible region (VIS) is influenced by leaf tissue pigments such as chlorophyll, which should relate to leaf N status (Haboudane et al., 2002; Rodriguez et al., 2005). Chlorophyll a and b relate to the two absorption regions in the VIS, namely the blue (400-500 nm) and red (600-670 nm) regions. Chlorophyll pigments are housed in the chloroplast, which contains about 75% of total plant N (Lawlor, 2001). These absorption regions can therefore serve as a measure of N, or of related compounds such as nitrate, in plants: the higher the absorption, the higher the chlorophyll content and hence the N content. Accordingly, hyperspectral data collected over canopies of agricultural fields have been used to characterise the variation of chlorophyll with an accuracy of 92% (Gitelson et al., 2005).
The red edge marks the transition from the VIS to the near-infrared (NIR) region (Figure 1) and is also a region of interest for crop biochemical studies. It lies between 680 and 760 nm, and its reflectance has correlated positively with both leaf and canopy N status (Barnes et al., 2000; Cho & Skidmore, 2006). The next portion of the electromagnetic spectrum is the NIR region, stretching from 700 to 1300 nm, which contributes information on crop nutrient status through the leaf internal structure; this structure reflects strongly, in contrast to the absorption in the VIS. The region from 1300 to 2500 nm is known as the shortwave infrared (SWIR) (Figure 1) and has more absorption features as a result of water content in the leaves (Thenkabail et al., 2004). However, some wavelengths in this region reflect energy in relation to the protein and starch content of the leaves (Murray & Williams, 1987). The visible and near-infrared region (VNIR) has been described as the chlorophyll absorption feature (Kumar et al., 2003), and its reflectance shows a strong relationship with nitrogen content in plants (Mutanga & Skidmore, 2003; Zhao et al., 2005). In contrast, a weak relationship between nitrogen content and canopy reflectance has been reported for crops such as rice under controlled conditions (Stroppiana et al., 2006), although this has not deterred more recent researchers from successfully using this region to assess rice nitrogen content with different sensors (Wang et al., 2013). Other studies across other agricultural crops have reported varying but acceptable levels of accuracy (Ngie et al., 2014).
The wavelength range has also been investigated as an alternative route to more efficient and accurate modelling or prediction of N status in plants. Spectra measured in the visible and NIR regions (400-900 nm) and calibrated for plant N status resulted in an R² of 0.71 with a prediction error of 0.38% (Hansen & Schjoerring, 2003). A similar study using spectra measured at 530-1100 nm gave better results, with an R² of 0.81 and a prediction error of 0.27% (Alchanatis et al., 2005), and a subsequent increase of the wavelength range to 400-2500 nm yielded an R² of 0.89 with a prediction error of 0.64% (Morón et al., 2007). However, some of these differences could have arisen from the different analytical methods applied to the data sets or from the sensors used to measure the reflectance.
While some studies applied ratios of the reflectance measured in the red and NIR regions to detect plant nutrients (Jackson et al., 1981), others utilised single wavebands (Stroppiana et al., 2006). Most of the studies cited above used reflectance values from single wavebands or optimised wavebands that could predict plant nitrogen content. One advantage of vegetation indices is that they reduce variations resulting from canopy geometry, irradiance and shading. They also help to minimise the effect of soil background on canopy reflectance (Jackson & Huete, 1991). Hence, for space-borne hyperspectral studies of biochemical content in field crops under irrigated or controlled conditions, selecting wavebands to create vegetation indices or ratios as input variables for predictive models is more relevant than using single wavebands alone (Jain et al., 2007; Zhu et al., 2008). In addition, most of these studies used handheld spectroscopic devices to measure reflectance for assessing N status, with only a few using airborne and satellite-borne sensors (Huang et al., 2004; Oppelt & Mauser, 2004; Vigneau et al., 2011).
Hence, this study used EO-1 data with a non-linear regression model to assess nitrate concentrations, at canopy level, in maize growing under field conditions. The study was carried out in a subtropical area where the terrain can make moving through the fields difficult. The first step was to select the wavebands that are important in relation to maize nitrate content. The second was to use these selected wavebands to develop spectral vegetation ratios as input (in varied numbers) to a non-linear regression.
(Botha et al., 2007). Its summers are hot and dry with scarce episodes of rainfall, while the winters are frosty and cold. The local geology consists primarily of sedimentary rocks that underlie superficial deposits of rich agricultural soils, mostly of sandy clay loam texture (Botha et al., 2007). Summer field crops such as maize are cultivated once per farming season, between December and July. Cultivation is usually supported by flood-type irrigation schemes.

Field and laboratory analyses
The field was planted with maize and received variable fertilizer applications in three categories of low, medium and high (or normal) nitrate content. After the 11th week of planting, with almost full ground cover, thirty leaf samples (10 from each nitrate category) were randomly collected by excising the third fully expanded leaf on the maize plants, and the coordinates of each sample point were recorded. The leaf samples were packaged in sampling bags and taken to the laboratory for chemical analyses. The samples were oven dried to eliminate moisture, and the midrib region of the leaves was excluded before crushing in a pulveriser. A sample solution was prepared from the crushed leaves of each field sample.
The solutions were used to quantify the amount of nitrate in the maize leaves through ion chromatography (IC), a technique for analysing solutions containing complex mixtures of ions that is also considered rapid and sensitive for the separation of anions. Plant tissue extraction with water was preferred since it reduces the challenges of safety, disposal, or masking of ions that might occur in extractions with acids, as shown with eluent of corn leaf sap (Masson et al., 1996). The Dionex™ Potassium Hydroxide Eluent Generator Cartridge (EGC-KOH) system was used at a flow rate of 0.25 mL per minute to analyse the anions, and the results were recorded.

Image acquisition and pre-processing
Hyperion images were acquired over the study area after tasking on the United States Geological Survey (USGS) website during the period of field visits. A cloud-free image was obtained on 2 April 2014 for the 171/080 path/row scene. Some Hyperion bands usually suffer from poor illumination while others fall in the overlap between the two spectrometers; these bands end up without values, or with reflectance values set to zero, during Level 1B pre-processing (Datt et al., 2003). A review of the available bands of the acquired Hyperion images showed that over 44 of the 242 bands were without reflectance values. These consisted of bands 1-7, 56-78 and 225-242, with the remaining 196 bands radiometrically corrected and calibrated to at-sensor radiance (Beck, 2003; Green et al., 2003).
The images were geometrically referenced to a Landsat ETM+ image (19 January 2012) that was already georeferenced (Universal Transverse Mercator (UTM), zone 35 South). To minimise the effects of systematic noise in the image, the destreaking algorithm proposed by Datt et al. (2003) was applied to reduce the striping effect. The radiance images were then atmospherically corrected and transformed to canopy-level reflectance using the MODTRAN-based Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) algorithm built into the Environment for Visualising Images (ENVI version 5.0) software package. By deriving atmospheric properties including water vapour, surface albedo and others, FLAASH provides a well-adjusted input for the atmospheric correction (Thenkabail et al., 2013). The images were resampled to their initial spatial resolution of 30 m using the nearest neighbour algorithm, and a root mean square error (RMSE) of less than a pixel was taken as the indicator of a good geometric correction (Ferencz et al., 2004). Using the ground truth points recorded during sample collection, the maize canopy reflectance spectra were extracted from single pixels for statistical analysis.
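The single-pixel extraction step can be illustrated with a minimal Python sketch. It assumes a north-up raster in UTM coordinates with 30 m pixels, an origin at the upper-left corner, and a reflectance cube indexed as cube[band][row][col]. The function names, the origin and the sample coordinates are hypothetical and are not taken from the study.

```python
import math

PIXEL_SIZE_M = 30.0  # Hyperion spatial resolution stated in the text


def utm_to_pixel(easting, northing, origin_easting, origin_northing,
                 pixel_size=PIXEL_SIZE_M):
    """Return (row, col) of the pixel containing a UTM point.

    Assumes a north-up raster whose origin is the outer corner of the
    upper-left pixel, so rows increase southwards and columns eastwards.
    """
    col = math.floor((easting - origin_easting) / pixel_size)
    row = math.floor((origin_northing - northing) / pixel_size)
    return row, col


def extract_spectrum(cube, row, col):
    """Collect the reflectance of one pixel across all bands."""
    return [band[row][col] for band in cube]


# Hypothetical two-band, 3 x 2 pixel cube and image origin (illustration only).
cube = [
    [[0.01, 0.02], [0.03, 0.04], [0.05, 0.06]],
    [[0.41, 0.42], [0.43, 0.44], [0.45, 0.46]],
]
origin_e, origin_n = 600000.0, 7100000.0

# Hypothetical ground truth point recorded at a sampled maize plant.
row, col = utm_to_pixel(600045.0, 7099940.0, origin_e, origin_n)
spectrum = extract_spectrum(cube, row, col)
```

In practice the origin and pixel size would be read from the georeferenced image header rather than typed in by hand.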

Statistical analysis
The relationship between biochemical concentrations in plants and their measured reflectance may indeed be non-linear (Miglani et al., 2008). Non-linear regression models have predicted nitrogen concentrations in vegetation from canopy-level airborne hyperspectral data with higher R² values than linear models. In predicting sugar cane leaf nitrogen concentrations from the reflectance values of a space-borne sensor, a non-linear model again outperformed multiple linear regression models, with higher R² values (Abdel-Rahman et al., 2013). Hence, a non-linear model, the random forest (RF) ensemble, was chosen for this study.
The RF is a non-linear machine learning algorithm governed by two major parameters, ntree and mtry. The ntree parameter is the number of trees used in the ensemble, while mtry is the number of variables chosen randomly at each split. Recursive partitioning is used to divide the data into regression trees, and the results of all trees are averaged. Every regression tree is grown independently to maximum size, without pruning, using bootstrap samples, with the data divided into training (2/3 of total samples) and testing (1/3 of total samples) sets. At each node of a regression tree, a subset of input variables (mtry) is chosen at random, and the split is calculated from that subset (Breiman, 2001).
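The roles of ntree and mtry can be sketched in a few lines of Python. This is only an illustration of the two sources of randomness in an RF (bootstrap sampling per tree and a random variable subset per split), not the implementation used in the study. The ntree value of 500 and the mtry rule of one third of the variables are assumptions, while the 30 samples and 196 bands are taken from the text.

```python
import random


def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement; the rest are out-of-bag."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def candidate_variables(n_variables, mtry, rng):
    """Pick mtry distinct variable indices to evaluate at one node split."""
    return rng.sample(range(n_variables), mtry)


rng = random.Random(42)
NTREE = 500                      # assumed number of trees
N_SAMPLES = 30                   # leaf samples collected in the study
N_VARIABLES = 196                # calibrated Hyperion bands reported in the text
MTRY = max(1, N_VARIABLES // 3)  # assumed rule: one third of the variables

forest_plan = []
for _ in range(NTREE):
    in_bag, out_of_bag = bootstrap_split(N_SAMPLES, rng)
    first_split_vars = candidate_variables(N_VARIABLES, MTRY, rng)
    forest_plan.append((in_bag, out_of_bag, first_split_vars))
```

Each tree would be fitted on its in-bag samples only, and a fresh variable subset would be drawn at every node, not just the first one shown here.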
The out-of-bag (OOB) error, estimated from predictions on the samples left out of each tree, is a means of evaluating the performance of the RF ensemble. For each variable in a regression tree, the OOB error is calculated as the difference between the mean square error of the data used to develop the regression trees and that of the OOB data. Comparing the OOB errors after permutation with the original ones yields values that indicate the importance of a variable, because the comparison shows how the error varies when that variable is permuted and all the other variables are left unchanged (Prasad et al., 2006). For the predictive model, however, the accuracy of each run was measured by the root mean squared error of prediction (RMSEP), which is considered more stable than the OOB error (Abdel-Rahman et al., 2013).
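The two accuracy measures reported in this study can be written out as a short Python sketch. RMSEP is the square root of the mean squared difference between measured and predicted values. R² is computed here as the coefficient of determination, which is an assumption, since the text does not state whether R² was obtained this way or as a squared correlation. The nitrate values are hypothetical.

```python
import math


def rmsep(measured, predicted):
    """Root mean squared error of prediction."""
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_measured = sum(measured) / len(measured)
    ss_residual = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_total = sum((m - mean_measured) ** 2 for m in measured)
    return 1.0 - ss_residual / ss_total


# Hypothetical measured and predicted nitrate values (% of total anions).
measured = [0.2, 0.6, 1.0, 1.4]
predicted = [0.3, 0.5, 1.1, 1.3]

error = rmsep(measured, predicted)
fit = r_squared(measured, predicted)
```

Because RMSEP is expressed in the units of the measured variable, it can be compared directly with the 0.2% to 1.4% range of the laboratory measurements.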
The ensemble can perform variable importance selection, which reduces the high dimensionality of hyperspectral data sets and supports the predictive modelling. The aim is to select a subset of relevant variables for use in model construction, given that the data contain many redundant wavelengths, as indicated by their narrow spectral ranges (Thenkabail et al., 2004; Kokaly, 2001). The original RF ensemble of Breiman (2001) can perform this variable selection but has been criticised for biased selection (Strobl et al., 2007). The cforest function was subsequently introduced as an RF implementation that performs the selection with minimal bias (Strobl et al., 2009).
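The idea behind permutation-based variable importance can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it permutes each variable once over a single data set and uses a hypothetical toy model, whereas an RF computes the measure on the OOB samples of every tree, and cforest uses its own refinement of the procedure. All names and values are illustrative.

```python
import random


def mean_squared_error(model, rows, targets):
    """Mean squared difference between model predictions and targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, rng):
    """Increase in MSE when one variable is shuffled and the rest are kept."""
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for j in range(len(rows[0])):
        column = [r[j] for r in rows]
        rng.shuffle(column)
        permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, permuted, targets) - baseline)
    return importances


def toy_model(row):
    """Hypothetical fitted model that depends only on variable 0."""
    return 2.0 * row[0]


# Hypothetical data: two variables, of which only the first is informative.
rows = [[float(i), float((i * 7) % 5)] for i in range(10)]
targets = [toy_model(r) for r in rows]

importances = permutation_importance(toy_model, rows, targets, random.Random(1))
```

A variable the model ignores gets an importance of zero, because shuffling it leaves every prediction unchanged; ranking variables by this value is what allows a subset of wavebands or ratios to be retained.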

Maize leaf nitrate concentrations
Maize nitrate concentrations were measured in the laboratory from leaves collected during the field visit (12/03/2014), about 13 days before the EO-1 sensor passed over. The minimum and maximum amounts were 0.2% and 1.4% of total anions respectively, with a standard deviation of 0.3% across the three fertilization categories. No leaf sample had 0% nitrate, which confirms that N fertilizers were applied across the entire field at planting to enable germination of all seeds. The fertilization categories were created after germination through top dressing with regulated nitrate amounts. The amounts of nitrate measured in the maize leaves in the laboratory corresponded with the categories from which the samples were taken.

Hyperion wavelength selection
The 26 Hyperion wavelengths most important to maize nitrate content were identified for this study (Figure 3). The wavelengths came from the red, NIR and early mid-infrared regions of the electromagnetic spectrum. Six of the 26 selected wavelengths had high OOB errors and corresponded to those identified in previous studies (Thenkabail et al., 2004; Miglani et al., 2008). These were 609, 640, 650, 681, 752 and 1043 nm, corresponding to bands 26, 29, 30, 33, 40 and 90 respectively. Note that the order of these wavelengths here does not follow the OOB error ranking shown in Figure 3 (highest to lowest of the 26); they are instead sorted by wavelength.
Figure 3: 26 wavelengths selected for maize canopy nitrate content prediction from Hyperion image
According to Thenkabail et al. (2004), the six bands identified in this study fall in regions with known sensitivities: 599-650 nm is relatively sensitive to biomass; 671-681 nm is a chlorophyll absorption region; 742-752 nm, the red edge region, is sensitive to vegetation stress and/or dynamics; and 1003-1053 nm is related to plant moisture status, biomass and even the leaf area index (LAI), all of which are functions of the biochemical status of the plants. The first three regions have jointly been labelled chlorophyll absorption bands and are closely related to nitrogen concentration in plants (Huang et al., 2004; Curran, 1989). Among the 26 most relevant wavebands (those with larger OOB error values) in this study, some lay around the protein absorption region at 1648 nm, which is close to the 1645 nm identified by Murray and Williams (1987). The 752 nm waveband was also identified by Vigneau et al. (2011), in addition to the other studies cited above, as important in estimating N concentration in wheat plants.

Development of vegetation spectral ratios and nitrate assessment
The six identified wavelengths were used to develop all possible combinations of spectral ratios. The ratios were calculated from the spectra using the formula for the normalised difference vegetation index (NDVI) (Rouse et al., 1974) and are therefore NDVI-based ratios. All the developed NDVI-based ratios, or vegetation spectral ratios, were used as independent variables in the predictive RF model, after which a further selection of variables was made based on their contribution in the permutation (Figure 4). Once again, the results presented are the top 26 ratios ranked by their OOB errors. Those ranked most important were created from the red edge bands, including 640, 650, 681 and 752 nm.
Figure 4: Selected vegetation-based ratios from 6 X 6 identified bands
The red edge-based ratio (752-681) and others were selected as the most important variables in the RF ensemble for nitrate content assessment in maize plants. The bands involved are related to those in the modified chlorophyll absorption in reflectance index (MCARI), which was designed to reduce the influence of soil background on photosynthetically active radiation (Daughtry et al., 2000). The index was later refined to improve sensitivity to the non-photosynthetic materials below the crop canopy, leading to the transformed chlorophyll absorption in reflectance index combined with the optimised soil-adjusted vegetation index (Haboudane et al., 2002). All these indices are made up of bands within the red edge region and have potential for estimating the biochemical content of crops (Miglani et al., 2008; Bannari et al., 2008; Wang et al., 2017). The red edge position has been shown to be influenced by N concentration in the leaves and by other critical crop parameters such as water status (Schlemmer et al., 2005; Schlemmer et al., 2013). However, the variables selected in this study were not near the known water absorption regions (970, 1450 and 1950 nm), which suggests that water absorption did not influence the performance of the models (Figure 4). It would be of interest to repeat the experiments while varying not just N concentrations but also other critical parameters such as water status, since both scenarios could occur in a field.
The 752-681 vegetation-based ratio identified in this study, although built from wavebands related to those in previous studies, did not match some of the ratios or indices reported in the past. In assessing the potential of EO-1 for estimating the chlorophyll content of wheat through a wide range of chlorophyll indices, Bannari et al. (2008) found that the best-performing index was the normalised difference pigment index (NDPI), which is developed from reflectance values centred around 430 and 680 nm, in the VIS and red edge regions. The NDPI was also identified as the best input to a model testing the potential of VIS/NIR spectroscopy for predicting nitrogen concentrations in pear orchards (Wang et al., 2017). What these studies and the present one have in common is that the wavebands making up the ratios come from a similar region of the electromagnetic spectrum; the difference in specific bands could be attributed to differences in leaf structure or in the conditions under which the spectra were collected. Nevertheless, the studies all point to a relationship between nitrogen or chlorophyll concentration in plants and spectral response.
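The construction of the NDVI-based ratios described in this section can be sketched in Python. The NDVI formula is (NIR - Red) / (NIR + Red); here it is applied to every ordered pair of the six identified wavelengths. The use of ordered pairs is an assumption, made because the text reports 26 top ratios from a 6 X 6 band grid, which is more than the 15 unordered pairs available. The reflectance values are hypothetical.

```python
from itertools import permutations

# The six wavelengths (nm) identified in the study.
WAVELENGTHS_NM = [609, 640, 650, 681, 752, 1043]


def normalised_difference(reflectance_a, reflectance_b):
    """NDVI-style ratio (a - b) / (a + b) for two reflectance values."""
    return (reflectance_a - reflectance_b) / (reflectance_a + reflectance_b)


def ndvi_based_ratios(spectrum):
    """Compute the ratio for every ordered pair of wavelengths.

    spectrum maps wavelength (nm) to canopy reflectance; the result maps
    (wavelength_a, wavelength_b) to the normalised difference of the pair.
    """
    return {
        (a, b): normalised_difference(spectrum[a], spectrum[b])
        for a, b in permutations(sorted(spectrum), 2)
    }


# Hypothetical canopy reflectance for one maize pixel (illustration only).
reflectance = [0.05, 0.04, 0.04, 0.03, 0.40, 0.45]
spectrum = dict(zip(WAVELENGTHS_NM, reflectance))

ratios = ndvi_based_ratios(spectrum)
red_edge_ratio = ratios[(752, 681)]
```

Swapping the two bands of a pair only changes the sign of the ratio, so the ordered pairs carry the same information as the unordered ones. A pair whose reflectances sum to zero would need to be masked before the division.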

Maize leaf nitrate prediction
The prediction of maize leaf nitrate content from canopy-level imaging spectroscopy was done with the RF ensemble as a non-linear regression model. The accuracies of the predictive models varied not only with the type of independent variable used but also with the number of variables. For instance, comparing the selected spectral bands with all the developed vegetation spectral ratios, the latter recorded a higher R² of 0.82 with an RMSEP of 0.29, whereas the former had an R² of 0.76 and an RMSEP of 0.17 (Figure 5(a.) & (b.)). The implication is that the selected spectral bands of importance predicted maize leaf nitrate content with a smaller error margin than the unselected spectral ratios. The selected vegetation spectral ratios, also used as independent variables for the predictive model, gave better accuracy, with R² = 0.86 and RMSEP = 0.12 (Figure 6 (c.)). These results were better than those obtained from all the possible vegetation spectral ratios. To test variable performance further, the cforest ranking of the vegetation spectral ratios was used to identify a subset of the top 10 most important ratios.
Figure 6: The relationship between measured and predicted maize nitrate concentration from the RF model using: (c.) Selected 25 vegetation spectral ratios and (d.) selected top 10 ratios
These top 10 ratios were mostly developed from the red edge bands, with the exception of 1043 nm. They outperformed all the other variable sets implemented in this study in predicting maize leaf nitrate content at canopy level in terms of R² (0.905), but with an RMSEP of 0.151 (Figure 6 (d.)). This RMSEP value implies that the values of maize leaf nitrate content predicted using the 10 ratios were further from the measured amounts than those predicted with the 26. It is interesting that narrowing down the variables to selected ones of importance improved the results only up to a point. Studies on discriminating crop types have shown that there is an asymptote in the optimal number of wavelengths or bands, reported as 12 (Thenkabail et al., 2000), 22 (Thenkabail et al., 2004) or 26 (Miglani et al., 2008). While the first was performed on a shorter spectral range (maximum of 1100 nm), the more recent ones covered the full range up to 2500 nm. Further research will, however, be required to establish the optimal number of variables for predictive models of maize leaf nitrate content, since that was not within the scope of this study.

CONCLUSIONS
The imaging spectroscopic data from EO-1 Hyperion images predicted maize leaf nitrate concentrations at canopy level under field conditions. This was possible through the dual functionality of the random forest algorithm: reducing redundancy by selecting variables of importance, and building regression models for prediction. The selection of variables helped to improve the accuracy of the models. The spectral vegetation ratio developed from the 752 and 681 nm bands contributed to the most accurate model for maize leaf nitrate concentrations, with an R² of 0.90 and an RMSEP of 0.15. This consolidates the importance of the red edge region in assessing plant biochemical status.