INTEGRATED QUALITY EVALUATION OF THE IN-SITU NETWORKING MEASUREMENTS AND UPSCALING USING GAUSSIAN PROCESS REGRESSION

The rapid development of wireless sensor network technology enables the effective and inexpensive collection of in-situ networking measurements, which will contribute to the temporal validation of coarse-resolution remote sensing products. However, evaluating the quality of the in-situ networking measurements and of their upscaling remains problematic. This study proposes an evaluation method based on Gaussian Process Regression (GPR). Specifically, the quality of the networking measurements is evaluated through the relevance of each plot, and the quality of the upscaling through the pixelwise coefficient of variation of the upscaled results, both of which can be generated by GPR. The preliminary results demonstrate the potential of the proposed method for the quality evaluation of upscaling. Its potential for evaluating the quality of the measurements per se will be analysed in future work.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W7, 2017. ISPRS Geospatial Week 2017, 18–22 September 2017, Wuhan, China. This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-W7-567-2017 | © Authors 2017. CC BY 4.0 License.


INTRODUCTION
Direct validation of coarse-resolution remote sensing products needs the support of in-situ measurements (Camacho et al., 2013; Fang et al., 2012; Yan et al., 2016). Moreover, an upscaling process is needed to interpolate the discrete measurements into a spatially explicit reference map (Morisette et al., 2006). Traditional in-situ measurements are often collected through field campaigns, which are labor-intensive and time-consuming because of their manual nature (Breda, 2003; Jonckheere et al., 2004; Weiss et al., 2004). The rapid development of low-cost near-surface remote sensing enables the effective and inexpensive collection of in-situ measurements (Campos-Taberner et al., 2016b; Ryu et al., 2014). A wireless sensor network (WSN) is one such near-surface remote sensing system; it comprises an array of sensor nodes and a wireless communications system (Qu et al., 2014). The sensor nodes can be located according to site-specific spatial sampling strategies to capture the surface heterogeneity. Therefore, a WSN can provide unattended networking observations, which are important for temporal validation.
Although the validation protocol was proposed long ago (Morisette et al., 2006), one scientific question remains open: is the reference map itself credible, and how can its credibility be evaluated? For the in-situ networking observations from a WSN, an additional question must be addressed: are the automated, unattended observations accurate and robust enough to generate high-resolution reference parameter maps?
These questions lead to the main objective of this study: to propose an integrated framework for evaluating the quality of both the in-situ networking observations and the upscaling. The evaluation method is based on Gaussian Process Regression (GPR). The performance of the proposed method for the quality evaluation of upscaling was tested over a crop site. The in-situ networking observations and the corresponding high-resolution NDVI were provided by LAINet and the CACAO-fused NDVI images, respectively. The evaluation of the LAINet observations through GPR is still in progress.

DATA COLLECTION

LAINet observations
The research was conducted in a 5 km × 5 km region (centred at approximately 40°22′N, 115°46′E) near Huailai, northern China (Figure 1), which is one of the core observation fields of the Validation network for Remote sensing Products in China (VRPC) (Ma et al., 2015). The size of the region was chosen to match MODIS pixels. The LAINet observation system (Qu et al., 2014), based on WSN technology, was used to obtain temporally continuous field measurements. It measures the leaf area index (LAI) automatically and continuously. LAINet consists of the Below Node (BN), which is deployed below the canopy and records transmitted radiation, the Above Node (AN), which records downward radiation above the canopy, and the Central Node (CN), which serves as a data reception and control node. Communication among the nodes uses the Zigbee protocol, while data exchange between the CN and the remote Data Server (DS) uses the General Packet Radio Service (GPRS) network.
The two types of measurement nodes (AN and BN) have the same hardware configuration and software functions to ensure a consistent response to radiation, which is a prerequisite for calculating gap fraction. Because the downward radiation above the canopy can be considered spatially homogeneous at the local scale, a small number of measurements can represent the 5 km × 5 km study area; therefore a single AN, equipped with three quantum sensors, was deployed. The LAINet deployed in our study area contained 12 plots, whose locations were determined by the SMP (Sampling strategy based on Multi-temporal Prior knowledge) approach (Zeng et al., 2015). SMP can capture the spatiotemporal variation of vegetation growth.
The LAI estimation algorithm is based on gap fraction theory and uses beam radiation at different solar zenith angles to measure multi-angle gap fraction. When calculating the gap fraction at a given solar zenith angle for each BN, the average of the measurements from the three quantum sensors of the AN was taken as the reference radiation (EA), the average of the measurements from the nine quantum sensors of the BN was taken as the transmitted radiation under the canopy (EB), and the ratio between them (EB/EA) was taken as the gap fraction.
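The gap fraction ratio described above can be sketched in a few lines of Python. This is only an illustration of the EB/EA ratio for one BN at one solar zenith angle; the function name and the sensor readings are hypothetical and are not taken from the LAINet processing chain.

```python
from statistics import mean

def gap_fraction(above_readings, below_readings):
    """Gap fraction as the ratio EB / EA for one below node at one solar zenith angle."""
    e_a = mean(above_readings)  # reference radiation above the canopy (EA)
    e_b = mean(below_readings)  # transmitted radiation under the canopy (EB)
    if e_a <= 0:
        raise ValueError("reference radiation must be positive")
    return e_b / e_a

# Hypothetical readings (arbitrary units): 3 sensors on the AN, 9 on one BN.
above = [1500.0, 1510.0, 1490.0]
below = [300.0, 280.0, 320.0, 310.0, 290.0, 305.0, 295.0, 300.0, 300.0]
print(round(gap_fraction(above, below), 3))
```

Averaging the sensors before taking the ratio, rather than averaging per-sensor ratios, follows the order of operations stated in the text.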
The study period runs from July 1 (DOY 182) to September 14 (DOY 257), 2013, when the LAINet observation system was deployed and operated in the study area. Because of unexpected instrument failures (e.g., dead batteries and communication failures) and weather conditions (e.g., cloudy and rainy weather), not all measured LAINet values were valid. Therefore, the number of available field LAI measurements varies from day to day (always less than or equal to 12). In addition, the estimated LAI values from each plot were averaged over eight-day intervals consistent with the compositing periods of the MODIS eight-day LAI products. This averaging reduces random errors and facilitates the comparison with the MOD15A2 LAI product. The eight-day composited measurements are hereafter denoted by the first day (as Day Of Year, DOY) of the compositing period. The number of processed LAI measurements is 12, 12, 11, 10, 6, 8 and 8 on DOY 185, 193, 201, 209, 217, 225 and 233, respectively, for a total of 67 during the study period.
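The eight-day compositing step can be illustrated with the following sketch. It assumes that compositing periods start on DOY 1, 9, 17, and so on, which is consistent with the DOY labels listed above (185, 193, ...); the record layout and plot identifiers are hypothetical.

```python
from collections import defaultdict

def composite_start(doy, period=8):
    """First DOY of the compositing period containing `doy` (periods start at DOY 1)."""
    return 1 + period * ((doy - 1) // period)

def composite_lai(daily, period=8):
    """Average the valid daily LAI values per plot and compositing period.

    `daily` is an iterable of (plot_id, doy, lai) tuples; lai is None when the
    measurement is invalid (e.g. instrument failure or bad weather).
    Returns {(plot_id, start_doy): mean_lai}.
    """
    groups = defaultdict(list)
    for plot_id, doy, lai in daily:
        if lai is not None:
            groups[(plot_id, composite_start(doy, period))].append(lai)
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

# Hypothetical daily records for one plot.
records = [("P01", 185, 2.0), ("P01", 186, None), ("P01", 187, 3.0), ("P01", 193, 4.0)]
print(composite_lai(records))
```

A plot with no valid measurement in a period simply has no entry for that period, which matches the varying number of composited measurements per date.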

NDVI data
Many vegetation indices (VIs) have been used to predict fine-resolution LAI maps. The normalized difference vegetation index (NDVI), one of the most widely used vegetation indices in LAI retrieval, was selected in this study.
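For reference, NDVI is conventionally computed from near-infrared and red reflectance as (NIR - Red) / (NIR + Red). A minimal sketch follows; the reflectance values are hypothetical and the function is not part of the CACAO processing.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        return float("nan")  # undefined when both reflectances are zero
    return (nir - red) / total

# Hypothetical reflectances for a dense crop canopy.
print(round(ndvi(0.45, 0.05), 2))
```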
Spatio-temporal matching between field measurements and NDVI maps must be ensured to reduce the uncertainty caused by the spatial heterogeneity and temporal dynamics of vegetation. To obtain fine-resolution NDVI maps with temporal sampling frequent enough to match the field measurements from LAINet, the CACAO method (Verger et al., 2013) was used to blend frequent MODIS NDVI data with fine-resolution OLI NDVI data in our study area. See Yin et al. (2017) for further details.
The NDVI value corresponding to each LAINet measurement was extracted from the fused NDVI images according to the locations of the LAINet plots and the dates. In total, 67 NDVI-LAI pairs were established as the training dataset of the GPR model.

DESCRIPTION OF THE EVALUATION FRAMEWORK
GPR has recently been introduced as a powerful regression tool (Rasmussen and Williams, 2006). The model provides a probabilistic, kernel-based approach for learning the relationship between the input (NDVI, in this study) and the output (reference LAI, in this study). It has been widely used in biophysical parameter retrieval (Campos-Taberner et al., 2016a; Verrelst et al., 2012a; Verrelst et al., 2012b).
In this section, we first review the general formulation of GPR for regression problems, and then describe its potential for the integrated quality evaluation of the in-situ networking observations and upscaling.
The GPR model establishes a transformation from the input to the output of the form:

$$\hat{y} = f(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x) \qquad (1)$$

where $\{x_i\}_{i=1}^{N}$ are the NDVI values used in the training phase, $\alpha_i$ is the weight assigned to each of them, and $K$ is a kernel function evaluating the similarity between the test NDVI $x$ and all $N$ training NDVI values. In this study, a scaled Gaussian kernel function was used:

$$K(x_i, x_j) = \nu \exp\left(-\sum_{b=1}^{B} \frac{\left(x_i^{(b)} - x_j^{(b)}\right)^2}{2\sigma_b^2}\right) + \sigma_n^2 \delta_{ij} \qquad (2)$$

where $\nu$ is a scaling factor, $B$ is the number of input dimensions ($B = 1$ in this study), $\sigma_b$ is a dedicated parameter controlling the spread of the relations for each input dimension, $\sigma_n$ is the noise standard deviation and $\delta_{ij}$ is the Kronecker delta. The model hyperparameters $\theta = \{\nu, \sigma_b, \sigma_n\}$ and the model weights $\alpha_i$ can be optimized automatically by Type-II Maximum Likelihood, using the marginal likelihood (also called evidence) of the observations (the LAINet-measured LAI in this study).
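For training purposes, we assume that the observed variable is formed by noisy observations of the true underlying function, $y = f(x) + \varepsilon$. Moreover, we assume the noise to be additive and independently and identically Gaussian distributed with zero mean and variance $\sigma_n^2$. Let $\mathbf{y} = (y_1, \ldots, y_N)^\top$ denote the stacked training outputs, $\mathbf{K}$ the $N \times N$ matrix of kernel values between the training inputs without the noise term, $\mathbf{k}_* = (K(x_*, x_1), \ldots, K(x_*, x_N))^\top$ the vector of kernel values between a test input $x_*$ and the training inputs, and $k_{**} = K(x_*, x_*)$ the self-similarity of $x_*$. Under these assumptions, the training outputs and the test output are jointly Gaussian:

$$\begin{bmatrix} \mathbf{y} \\ f(x_*) \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \mathbf{K} + \sigma_n^2 \mathbf{I} & \mathbf{k}_* \\ \mathbf{k}_*^\top & k_{**} \end{bmatrix}\right) \qquad (3)$$

Conditioning on the training data gives a Gaussian posterior for the test output, with the predictive mean (pointwise predictions):

$$\mu_* = \mathbf{k}_*^\top \left(\mathbf{K} + \sigma_n^2 \mathbf{I}\right)^{-1} \mathbf{y} \qquad (4)$$

and the predictive variance (confidence intervals):

$$\sigma_*^2 = k_{**} - \mathbf{k}_*^\top \left(\mathbf{K} + \sigma_n^2 \mathbf{I}\right)^{-1} \mathbf{k}_* \qquad (5)$$

Note that the predictive mean is a linear combination of the observations $\mathbf{y}$, with weights $\boldsymbol{\alpha} = (\mathbf{K} + \sigma_n^2 \mathbf{I})^{-1} \mathbf{y}$ as in Eq. (1), while the predictive variance depends only on the input data and can be taken as the difference between the prior kernel and the information given by the observations about the approximation function (Verrelst et al., 2012b).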
The GPR model can be readily applied to the integrated quality evaluation of the LAINet observations and upscaling, with the following two advantages. First, the obtained weights αi give the relevance of each plot (see Eq. (1)); a plot with a high weight is regarded as highly credible, so the weights can be used to evaluate the observation quality. Second, a GPR model provides a pixelwise uncertainty level (see Eq. (5)) for the upscaled LAI map, which can be used to evaluate the upscaling quality.
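The two quantities used by the framework, the weights and the predictive variance, can be illustrated with a small NumPy sketch. This is a minimal illustration under stated assumptions, not the implementation used in the study: the prior mean is zero, the hyperparameters (`v`, `sigma_b`, `sigma_n`) are fixed by hand instead of being optimized by Type-II Maximum Likelihood, and the NDVI-LAI training pairs are hypothetical.

```python
import numpy as np

def kernel(a, b, v, sigma_b):
    """Scaled Gaussian kernel (without the noise term) between 1-D inputs a and b."""
    d = a[:, None] - b[None, :]
    return v * np.exp(-d**2 / (2.0 * sigma_b**2))

def gpr_fit(x, y, v, sigma_b, sigma_n):
    """Return the weights alpha = (K + sigma_n^2 I)^-1 y and the Cholesky factor L."""
    A = kernel(x, x, v, sigma_b) + sigma_n**2 * np.eye(len(x))
    L = np.linalg.cholesky(A)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return alpha, L

def gpr_predict(x_train, x_test, alpha, L, v, sigma_b):
    """Predictive mean and variance of the latent function at x_test."""
    k_star = kernel(x_train, x_test, v, sigma_b)  # shape (N, M)
    mean = k_star.T @ alpha                       # weighted sum of kernel values
    w = np.linalg.solve(L, k_star)                # L^-1 k_*
    var = v - np.sum(w**2, axis=0)                # k_** - k_*^T (K + sigma_n^2 I)^-1 k_*
    return mean, var

# Hypothetical NDVI-LAI training pairs and hand-picked hyperparameters.
ndvi_train = np.array([0.55, 0.62, 0.70, 0.78, 0.85])
lai_train = np.array([1.0, 1.6, 2.4, 3.1, 3.9])
v, sigma_b, sigma_n = 4.0, 0.1, 0.2

alpha, L = gpr_fit(ndvi_train, lai_train, v, sigma_b, sigma_n)
ndvi_test = np.array([0.70, 0.30])  # inside and outside the training range
mean, var = gpr_predict(ndvi_train, ndvi_test, alpha, L, v, sigma_b)
print(mean, np.sqrt(var))
```

The second test value lies well outside the training NDVI range, so its predictive variance stays close to the prior variance, whereas the first is surrounded by training samples and has a much smaller variance.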

PRELIMINARY RESULTS AND DISCUSSION
In this section, we show the potential of the GPR model for the evaluation of the upscaling results. The evaluation of the LAINet observations through GPR is in progress.
The GPR regression of the training NDVI-LAI pairs, together with the predicted LAI values plus and minus twice the standard deviation (corresponding to the 95% confidence region), is shown in Figure 2. In terms of uncertainty, several distinct regions can easily be observed.
When NDVI < 0.5 or > 0.9, no training NDVI data are available, and the predicted LAI values have high uncertainty; when NDVI is between 0.5 and 0.9, where most of the training samples lie, the uncertainty in the predicted LAI values is relatively low. Therefore, the uncertainty given by GPR depends on the representativeness of the training dataset. Intuitively, the predicted value is credible if the GPR has seen similar values in the training phase; otherwise, a high uncertainty is returned because the test value is unfamiliar to the GPR.
Although GPR copes well with the strong nonlinearity of the functional dependence between the LAI and NDVI values, it displays a nonphysical trend. First, when the NDVI is very low, the LAI should approach zero. Second, the regressed curve should rise rapidly with the NDVI for high NDVI values, because of the saturation effect of the NDVI. However, the predicted LAI values level off when the NDVI is lower than 0.5 or higher than 0.9. This result is caused by the data-driven nature of GPR, and additional training samples with NDVI values lower than 0.5 or higher than 0.9 should be added to enhance the generalization ability of the trained GPR model. The limited sampling of the training dataset at low and high NDVI values does not lower the usability of the derived GPR model, because a high uncertainty is reported for these NDVI values.
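The confidence band of Figure 2 and a pixelwise coefficient of variation (CV) can both be derived from the GPR predictive mean and standard deviation. The sketch below assumes that the CV is expressed in percent as 100 × standard deviation / mean; this definition and the example values are assumptions for illustration, not details confirmed by the text.

```python
def uncertainty_summary(mean_lai, sd_lai):
    """Return (lower, upper, cv_percent) for one pixel or one NDVI value."""
    lower = mean_lai - 2.0 * sd_lai  # lower edge of the approximate 95% band
    upper = mean_lai + 2.0 * sd_lai  # upper edge of the approximate 95% band
    # CV in percent (assumed definition); undefined for non-positive mean LAI.
    cv = 100.0 * sd_lai / mean_lai if mean_lai > 0 else float("inf")
    return lower, upper, cv

# Hypothetical predictive mean and standard deviation for one pixel.
print(uncertainty_summary(2.0, 0.5))
```

Because the CV divides by the predicted mean, pixels with low predicted LAI can show a large CV even when their absolute uncertainty is moderate.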
A point-by-point comparison between the observed LAI values from LAINet and the predicted LAI values from GPR was also carried out (Figure 3). In general, the GPR displays excellent accuracy, with R² and RMSE values of 0.72 and 0.45, respectively. GPR performed better than the empirical regression modeling approach implemented in a previous work (Yin et al., 2017), which resulted in an R² value of 0.72 and an RMSE value of 0.55; see Fig. 7(b) in the paper cited above.
Besides the LAI time series, the GPR also generated the pixelwise CV time series, which indicates the quality of the resulting LAI. Figure 4 shows that the quality of the LAI maps also followed an obvious temporal pattern, opposite to that of the LAI series. On DOY 185, the LAI map was of low quality, especially in the northwest of the study area, where many pixels had a CV greater than 50. After DOY 185, the quality of the upscaled LAI map improved continuously, and then decreased again after DOY 225. In fact, the quality of the upscaled LAI is inherited from the representativeness of the plots in LAINet. The SMP sampling strategy was designed to capture the spatio-temporal variation of vegetation growth and showed good sampling efficiency (Zeng et al., 2015). However, when implementing the SMP, the LAI was proxied by NDVI, SR and EVI (Zeng et al., 2015), which show a larger dynamic range at lower LAI values because of background disturbance. Therefore, VIs less sensitive to background influence should be used in future when determining the plot locations through SMP.
The temporal variation of the average LAI values and the uncertainty within our study area was also analysed and is shown in Figure 5. Generally, these two variables show nearly symmetrical patterns, i.e., lower NDVI values are associated with higher uncertainties and vice versa. The negative correlation between the uncertainty and the NDVI values results from the under-sampling of the training dataset for low NDVI values (see Figure 2). Note that high NDVI values (> 0.9) were also under-sampled, yet they are not associated with large uncertainties. This is because very few pixels in our study area have an NDVI value higher than 0.9.

CONCLUSIONS
Quality evaluation of the in-situ networking observations and upscaling is a prerequisite for their proper application. This study proposes an integrated framework based on Gaussian Process Regression (GPR) that can evaluate their quality together. The performance of the proposed method for the quality evaluation of upscaling was tested over a crop site. The in-situ networking observations and the corresponding high-resolution NDVI were provided by LAINet and the CACAO-fused NDVI images. The results show that GPR can give a pixelwise coefficient of variation (CV) indicating the quality level of the upscaled results. Through the analysis of the CV time series, we found that the quality of the upscaled LAI map has an obvious temporal pattern, which is caused by the low representativeness of the plots for low LAI values. The preliminary results showed the necessity of using background-insensitive VIs when sampling plots, demonstrating that the proposed evaluation method can provide feedback for the optimization of the networking observations. In future studies, the potential of GPR for the quality evaluation of the in-situ networking observations per se will be investigated and presented during the session.

Figure 1. Map of the study area overlaid by the MOD15A2 product pixels (nominal resolution of 1 km × 1 km). The displayed image corresponds to a color composite (bands 5-4-3) of the Landsat-8 OLI image acquired on August 23, 2013. The points represent the locations of the plots in the LAINet observation system deployed in the study area.

Figure 2. Regression result of the training dataset using GPR. The shaded area represents the pointwise mean plus and minus twice the standard deviation for each NDVI value (corresponding to the 95% confidence region).

Figure 3. The comparison between the observed LAI values from LAINet and the predicted LAI values from GPR. The dashed line represents the 1:1 line.

Figure 4. High spatial resolution LAI maps and the corresponding quality evaluation results. The quality of each pixel is represented by its CV. Both LAI and CV were generated from the GPR model using LAINet measured LAI and CACAO reconstructed NDVI. The white parts represent non-vegetation land cover types.

Figure 5. The temporal variation of average LAI and uncertainty values in our study area.