ABOVEGROUND BIOMASS ESTIMATION USING RECONSTRUCTED FEATURE OF AIRBORNE DISCRETE-RETURN LIDAR BY AUTO-ENCODER NEURAL NETWORK

Aboveground biomass (AGB) estimation is critical for quantifying carbon stocks and essential for evaluating the carbon cycle. In recent years, airborne LiDAR has shown great ability for high-precision AGB estimation. Most studies estimate AGB from feature metrics extracted from the canopy height distribution of the point cloud, which is calculated from a precise digital terrain model (DTM). However, if forest canopy density is high, the probability of the LiDAR signal penetrating the canopy is lower, so there are not enough ground points to establish the DTM. The distribution of forest canopy height is then imprecise, and some critical feature metrics that correlate strongly with biomass, such as percentiles, maximums, means and standard deviations of the canopy point cloud, can hardly be extracted correctly. To address this issue, we propose a strategy of first reconstructing LiDAR feature metrics through an Auto-Encoder neural network and then using the reconstructed feature metrics to estimate AGB. To assess the prediction ability of the reconstructed feature metrics, both original and reconstructed feature metrics were regressed against field-observed AGB using multiple stepwise regression (MS) and partial least squares regression (PLS), respectively. The results showed that the estimation models using reconstructed feature metrics improved R by 5.44% and 18.09%, decreased RMSE by 10.06% and 22.13%, and reduced RMSEcv by 10.00% and 21.70% for AGB, respectively. Therefore, reconstructing LiDAR point feature metrics has potential for addressing the AGB estimation challenge in dense canopy areas.


INTRODUCTION
Forest aboveground biomass (AGB) is a key biophysical parameter for assessing the health and productivity of vegetation ecosystems (Swatantran et al., 2011; Ediriweera et al., 2014). It is commonly used for carbon stock estimation, climate change studies and ecological modelling, and has been increasingly studied (Naesset et al., 2013). The accuracy of these models depends largely on the accuracy of AGB. Therefore, rapid and accurate estimation of forest AGB is critical for improving the accuracy and applicability of these models. However, accurately estimating forest AGB remains a challenging issue. Conventional field methods for estimating forest AGB, such as forest inventories or destructive sampling, are the most reliable and accurate (Huang et al., 2013). However, field measurements are often labor intensive, time consuming and costly, and the scope of a survey is limited and cannot be extended to large scales (Ahmed et al., 2013; Ene et al., 2012). Therefore, direct observation methods are not applicable in large study areas.
Remote sensing techniques can provide effective solutions for rapidly and repeatedly collecting land surface information at regional scales, from which forest AGB can then be estimated (Sun et al., 2011). Numerous studies have estimated forest AGB using optical remotely sensed data (Jin et al., 2009) and radar data (Gao et al., 2013). Forest AGB cannot be acquired directly from remote sensing data; it is estimated through empirical relationships established between vegetation indices derived from remotely sensed data and field-measured biomass data. However, optical remotely sensed data are less sensitive to forest vertical structure because of their low penetration into forest canopies (Tao et al., 2014). Although radar is capable of penetrating cloud and forest canopies, high biomass or high canopy density in a study area can lower the estimation accuracy of vegetation parameters (He et al., 2013; Lu et al., 2012).
Unlike optical sensors and radar instruments, LiDAR is an active remote sensing technology that can rapidly acquire three-dimensional point clouds of objects with high vertical and horizontal accuracies (Lefsky et al., 1999; Qin et al., 2015; Tsui et al., 2012). In addition, LiDAR technology can provide effective solutions for estimating forest AGB at the landscape or regional levels. Moreover, airborne LiDAR has the advantage of being able to penetrate forest canopies and accurately obtain vertical structure parameters (Kobal et al., 2015). It is therefore regarded as the best remote sensing technique for accurately estimating forest AGB (Kankare et al., 2013). Many studies have successfully estimated forest biomass using airborne discrete-return LiDAR data. Luo et al. (2017) extracted vertical structure features from a discrete-return LiDAR point cloud, and these vertical structure features are based on the normalized vegetation point cloud. Lin et al. (2014) evaluated the fusion of airborne discrete-return and full-waveform LiDAR data in estimating the AGB of subtropical forests. In their study, airborne discrete-return LiDAR derived metrics are extracted from the normalized vegetation point cloud, but the [...] No matter which structure feature metrics are extracted from the point cloud data, it is necessary to establish precise normalized vegetation point cloud data. The normalized vegetation point cloud is the elevation of the point cloud relative to the surface topography. If the canopy density is high, the probability of the LiDAR signal penetrating the canopy is lower, and the number of points reaching the ground is very small, which reduces the estimation accuracy of vegetation parameters (Luo et al., 2017). To obtain accurate normalized vegetation point cloud data, a precise DTM must be established. If the ground points are few, the DTM needs to be interpolated over a large range. The result is relatively imprecise, which leads to inaccuracy in the structure features acquired.
Feature reconstruction has been widely used in the image processing field. In this paper, we propose a strategy of first reconstructing LiDAR features through an Auto-Encoder neural network and then using the reconstructed features to estimate forest aboveground biomass. The specific objectives of this study were to: 1) establish biomass prediction models using direct features and reconstructed features, respectively; 2) assess the potential of first reconstructing LiDAR features and then estimating biomass for improving biomass estimation accuracy.

Study Area
The study area is located in Dayekou, Qilian Mountain, Zhangye City, Gansu Province, northwest China (Figure 1). The weather is mainly affected by the high-latitude air circulation. The study area is the main body of the ecosystem in the Qilian Mountains. It was chosen as a study area by an airborne, satellite-borne and ground-based remote sensing experiment, the Watershed Allied Telemetry Experimental Research (Li et al., 2009). The elevation ranges from 2500 m to 3800 m above sea level, with slopes below 15 degrees. The area covers three vegetative climate zones and significantly affects soil and water conservation and biodiversity protection. The forest is coniferous-dominated with dense canopy cover. The dominant tree species is Picea crassifolia, occupying about 95 percent of the forestland in the study area.

Field Measurement
In the study area, field measurements were conducted on two field plots distributed along the flight route (Figure 1). The entire sample plot is divided into a super-plot and a line-plot. The super-plot was a 100 m × 100 m area with a slope of less than 20 degrees, divided into 16 subplots of 25 m × 25 m. The line-plot was a sample line consisting of 20 subplots of 20 m × 20 m distributed along the direction of flight. The interval between the subplots in the line-plot was 50 m. The total number of subplots was 36. The center coordinate of each subplot was positioned with a differential global positioning system (DGPS) station. Each DGPS station was placed in an open area to ensure its positional accuracy. Tree height and diameter at breast height (DBH) were measured using a laser hypsometer and a diameter tape, respectively.

Biomass Calculation
Individual tree biomass was calculated from DBH and height according to empirical relative growth equations. Subplot forest AGB was calculated as the sum of individual tree AGB in each subplot.

 
where DBH = diameter at breast height (cm) and H = tree height (m).

Processing of LiDAR Data
To extract the vertical structural features of the point cloud, we preprocessed the original point cloud data. Original LiDAR point datasets usually contain outliers that lie far above or below the earth surface, and these influence the accuracy of the extracted features. In the study area, we first removed such outliers manually. The LiDAR point clouds were then classified into canopy and ground returns using the adaptive triangulation network filter algorithm in Terrasolid software. An interpolated digital terrain model (DTM) with a grid cell size of 1 m was generated from the ground points. Using this DTM, we removed the influence of topography and obtained DTM-normalized LiDAR points (Nie et al., 2017). The heights of non-ground points were thus normalized, and feature metrics were calculated from the DTM-normalized LiDAR points.
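The height normalization step described above can be sketched as follows. This is a minimal illustration, not the Terrasolid workflow used in the study: the function name, the grid layout (upper-left origin, row 0 at the northern edge) and the nearest-cell lookup are all assumptions made for the example.

```python
import numpy as np

def normalize_heights(points, dtm, x_min, y_max, cell_size=1.0):
    """Subtract the DTM ground elevation from each point's z value.

    points       : (N, 3) array of x, y, z coordinates
    dtm          : (rows, cols) ground elevations, row 0 at the northern edge
    x_min, y_max : upper-left corner of the grid (hypothetical layout)
    cell_size    : grid resolution in metres (1 m in the text)
    """
    points = np.asarray(points, dtype=float)
    cols = np.floor((points[:, 0] - x_min) / cell_size).astype(int)
    rows = np.floor((y_max - points[:, 1]) / cell_size).astype(int)
    # Clamp so that points on the grid boundary still map to a valid cell.
    rows = np.clip(rows, 0, dtm.shape[0] - 1)
    cols = np.clip(cols, 0, dtm.shape[1] - 1)
    # Nearest-cell lookup (an assumption; bilinear interpolation is also common).
    heights = points[:, 2] - dtm[rows, cols]
    return np.column_stack([points[:, 0], points[:, 1], heights])

# Tiny synthetic example: a 2 x 2 DTM and two returns.
example_dtm = np.array([[2600.0, 2601.0],
                        [2602.0, 2603.0]])
example_points = np.array([[0.5, 1.5, 2612.0],   # canopy return
                           [1.5, 0.5, 2603.0]])  # ground return
normalized = normalize_heights(example_points, example_dtm, x_min=0.0, y_max=2.0)
```

When ground returns are sparse, the DTM cells used here are themselves interpolated over large gaps, which is where the height errors discussed in this paper originate.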

LiDAR Metrics Calculation
A variety of LiDAR-derived metrics can be used to estimate AGB, including minimum height, maximum height, mean vegetation return height, LiDAR height percentiles and canopy cover (Bright et al., 2012). The feature metrics calculated from DTM-normalized LiDAR points are consistent with those of previous studies (Cao et al., 2014; Chen et al., 2012). For canopy cover, a height threshold of 2.0 m was set to separate canopy returns from ground returns in this study. In addition, density metrics, cumulative LiDAR height and kurtosis of LiDAR height are also highly correlated with aboveground biomass. Therefore, we extracted all these feature metrics from the normalized points, totaling 47 features. The optimal LiDAR metrics for estimating biomass may vary with the vegetation type, environment and landscape of the study area. In this study, we selected various LiDAR feature metrics to obtain the most effective biomass estimation model. A summary of the LiDAR metrics used in this study is provided in Table 1.

Table 2. Summary of estimation accuracies of AGB using original LiDAR feature metrics and reconstructed LiDAR feature metrics, respectively. Both types of regression models show better AGB estimation ability when using reconstructed feature metrics.
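A few of the height metrics named in this section can be computed from normalized heights as in the sketch below. It covers only a small subset of the 47 features; the metric names, the choice of percentiles, the use of a strict "greater than" comparison for the 2.0 m threshold, and the definition of canopy cover as the fraction of returns above that threshold are assumptions for illustration.

```python
import numpy as np

def height_metrics(heights, cover_threshold=2.0, percentiles=(25, 50, 75, 95)):
    """Compute a few plot-level metrics from DTM-normalized return heights (m)."""
    heights = np.asarray(heights, dtype=float)
    if heights.size == 0:
        raise ValueError("at least one return is required")
    canopy = heights[heights > cover_threshold]
    # Canopy cover as the share of returns above the threshold (assumed definition).
    metrics = {"canopy_cover": canopy.size / heights.size}
    if canopy.size == 0:
        return metrics
    mean = canopy.mean()
    std = canopy.std()  # population standard deviation
    metrics["h_min"] = float(canopy.min())
    metrics["h_max"] = float(canopy.max())
    metrics["h_mean"] = float(mean)
    metrics["h_std"] = float(std)
    # Pearson (non-excess) kurtosis; undefined when all heights are equal.
    metrics["h_kurtosis"] = (
        float(np.mean((canopy - mean) ** 4) / std ** 4) if std > 0 else float("nan")
    )
    for p in percentiles:
        metrics[f"h_p{p}"] = float(np.percentile(canopy, p))
    return metrics

# Five synthetic returns: two near the ground, three in the canopy.
example_metrics = height_metrics([0.0, 0.5, 3.0, 5.0, 7.0])
```

If the DTM under a dense canopy is biased, every value returned here shifts with it, which is why the percentiles, maximum, mean and standard deviation are singled out in the abstract as hard to extract correctly.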

Reconstructing Feature Metrics by Auto-Encoder Network
In the study area, biomass regression can be carried out directly if the feature metrics are extracted from accurately normalized vegetation points. However, the features extracted from areas with insufficient vegetation points are deficient. Nevertheless, we can learn an effective feature representation from accurate features extracted from sufficient vegetation points in low-density forest areas. The Auto-Encoder neural network is an algorithm that can learn the relationships between features and has the ability to repair features. Moreover, because the Auto-Encoder is an unsupervised algorithm, data from the whole study area can be used for training. We extracted 100,000 samples from the study area, each with a size of 25 m × 25 m. For each of these samples, we extracted the LiDAR features mentioned above as training samples. In addition, we have 36 samples with biomass information as our validation samples. Before the training samples were input into the network, we normalized the data. The Auto-Encoder generally adopts L2 regularization, but L2 regularization is susceptible to noise and high-variance data. Unfortunately, our data exhibit exactly this problem. To address it, we use L1 regularization, which is widely applied in image inpainting. The network contains an input layer with 47 neurons, a hidden layer with 36 neurons and an output layer with 47 neurons, and the activation function is ReLU. We apply stochastic gradient descent (SGD) to train the Auto-Encoder with a mini-batch size of 128. In SGD, a learning rate of 0.0001 and a momentum of 0.9 are applied.
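The network described above (47-36-47, ReLU, SGD with momentum 0.9, learning rate 0.0001, mini-batch size 128) can be sketched in plain NumPy as follows. Several details are not given in the text and are assumptions here: the L1 term is applied to the reconstruction error, the output layer is linear, the weights use He-style initialization, the inputs are z-score normalized, and the training data are synthetic stand-ins for the 100,000 feature vectors.

```python
import numpy as np

class AutoEncoder:
    """47-36-47 auto-encoder trained with SGD and momentum (illustrative sketch)."""

    def __init__(self, n_in=47, n_hidden=36, seed=0):
        rng = np.random.default_rng(seed)
        # He-style initialization (an assumption; not stated in the paper).
        self.params = {
            "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.normal(0.0, np.sqrt(2.0 / n_hidden), (n_hidden, n_in)),
            "b2": np.zeros(n_in),
        }
        self.velocity = {k: np.zeros_like(v) for k, v in self.params.items()}

    def forward(self, x):
        p = self.params
        z1 = x @ p["W1"] + p["b1"]
        h = np.maximum(z1, 0.0)          # ReLU hidden layer
        out = h @ p["W2"] + p["b2"]      # linear output layer (assumption)
        return z1, h, out

    def reconstruct(self, x):
        return self.forward(x)[2]

    def train_step(self, x, lr=1e-4, momentum=0.9):
        """One SGD-with-momentum update on a mini-batch; returns the L1 loss."""
        p = self.params
        z1, h, out = self.forward(x)
        diff = out - x
        loss = float(np.abs(diff).mean())
        # Gradient of the mean absolute error with respect to the output.
        d_out = np.sign(diff) / diff.size
        d_z1 = (d_out @ p["W2"].T) * (z1 > 0)
        grads = {
            "W1": x.T @ d_z1,
            "b1": d_z1.sum(axis=0),
            "W2": h.T @ d_out,
            "b2": d_out.sum(axis=0),
        }
        for k in p:
            self.velocity[k] = momentum * self.velocity[k] - lr * grads[k]
            p[k] += self.velocity[k]
        return loss

# Synthetic stand-in for the training features, z-score normalized.
rng = np.random.default_rng(0)
features = rng.normal(size=(1024, 47))
features = (features - features.mean(axis=0)) / features.std(axis=0)

model = AutoEncoder()
for start in range(0, len(features), 128):       # one epoch, mini-batches of 128
    batch_loss = model.train_step(features[start:start + 128])
reconstructed = model.reconstruct(features)       # reconstructed feature metrics
```

In the strategy proposed here, the rows of `reconstructed` for the 36 field subplots would replace the original metrics as regression inputs.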

Statistical Analyses and Modelling
In this study, two types of regression models were used to estimate forest biomass, i.e., multiple stepwise regression (MS) and partial least squares regression (PLS). Moreover, log-transformed biomass values and feature metrics were tested. The biomass estimation results from original feature metrics and reconstructed feature metrics were compared and analyzed to assess the potential of reconstructing LiDAR feature metrics for estimating biomass.
Leave-one-out cross-validation (LOOCV) is an effective method for evaluating the generalization capability of regression models and is particularly useful for models with only a small number of samples available (Crespi et al., 2008; Peduzzi et al., 2012). To assess the reliability of these models, LOOCV was performed, and the predicted residual sum of squares (PRESS statistic) was calculated. To validate the predictive power of the models, the root mean square error of cross-validation (RMSEcv) was calculated from the PRESS statistic and the number of observed samples (Nie et al., 2017). Close agreement in magnitude between RMSE and RMSEcv suggests that the fitted model has less overfitting and better generalization (Jensen et al., 2008).
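The LOOCV procedure above can be sketched for an ordinary least-squares model as follows. The function names are hypothetical, and RMSEcv is taken as the square root of PRESS divided by the number of samples, which is one common reading of "calculated from the PRESS statistic and the number of observed samples".

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def loocv_press(X, y):
    """Return (PRESS, RMSEcv) from leave-one-out cross-validation."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i                 # hold out sample i
        coef = fit_ols(X[keep], y[keep])
        pred = predict_ols(coef, X[i:i + 1])[0]
        press += (y[i] - pred) ** 2
    return press, float(np.sqrt(press / n))

# Synthetic example with 36 samples, matching the number of field subplots.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(36, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=36)
press_demo, rmse_cv_demo = loocv_press(X_demo, y_demo)
```

For least-squares models PRESS is never smaller than the residual sum of squares of the full fit, so RMSEcv is at least as large as RMSE; the gap between the two is the overfitting indicator referred to in the text.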
For MS regression, a single regression analysis was first performed for each metric, and then a stepwise multiple regression analysis (criteria: probability of F to enter ≤ 0.05; probability of F to remove ≥ 0.1) was performed on all of the derived LiDAR feature metrics to determine the optimal independent variables and biomass estimation models. Because all predictor metrics and plot biomass values were log-transformed, a simple linear model is applicable instead of a nonlinear regression model. However, LiDAR feature metrics suffer from multicollinearity because of the high correlations among them (Laurin et al., 2014). PLS regression can effectively handle the small-sample and multicollinearity problems faced in multiple linear regression (Chen et al., 2012), and it has been increasingly used in vegetation biomass estimation. In our experiment, we selected the optimal number of latent variables using the LOOCV method to avoid overfitting of the PLS regressions.
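Choosing the number of PLS latent variables by LOOCV can be sketched as below. The paper does not say which PLS implementation was used; this example assumes a single-response PLS (PLS1) fitted with the NIPALS algorithm, and all function names and the synthetic collinear data are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """Fit PLS1 by NIPALS; returns (intercept, coefficient vector)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered, then deflated in the loop
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        norm = np.linalg.norm(w)
        if norm < 1e-12:                     # nothing left to explain
            break
        w = w / norm
        t = E @ w                            # scores of the latent variable
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = f @ t / tt                       # y loading
        E = E - np.outer(t, p)
        f = f - c * t
        W.append(w); P.append(p); q.append(c)
    if not W:
        return y_mean, np.zeros(X.shape[1])
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # coefficients in original X space
    return y_mean - x_mean @ beta, beta

def select_components(X, y, max_components):
    """Return the component count with the lowest LOOCV RMSE, plus all RMSEs."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    rmse_cv = {}
    for k in range(1, max_components + 1):
        press = 0.0
        for i in range(n):
            keep = np.arange(n) != i
            b0, b = pls1_fit(X[keep], y[keep], k)
            press += (y[i] - (b0 + X[i] @ b)) ** 2
        rmse_cv[k] = float(np.sqrt(press / n))
    return min(rmse_cv, key=rmse_cv.get), rmse_cv

# Synthetic, strongly collinear predictors for 36 samples.
rng = np.random.default_rng(0)
base = rng.normal(size=(36, 2))
X_pls = np.column_stack([base, base[:, 0] + 0.05 * rng.normal(size=36),
                         base[:, 1] + 0.05 * rng.normal(size=36)])
y_pls = base @ np.array([2.0, -1.0]) + rng.normal(scale=0.2, size=36)
best_k, cv_table = select_components(X_pls, y_pls, max_components=4)
```

Because each latent variable is a weighted combination of all metrics, highly correlated predictors are compressed into a few components without being dropped, which is the property that makes PLS suitable for the multicollinear metric set described here.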

RESULTS AND DISCUSSION
To assess the effectiveness of the reconstructed feature metrics, we performed MS regression and PLS regression of the 36 ground-based AGB values against the original LiDAR feature metrics and the reconstructed LiDAR feature metrics, respectively. The corresponding experimental results are listed in Table 2 and Figure 2. For MS regression, the results showed that LiDAR feature metrics derived directly from LiDAR discrete-return point data were relatively weakly related to biomass (R² = 0.698, RMSE = 16.6841 Mg/ha, RMSEcv = 16.8531 Mg/ha) compared with reconstructed LiDAR feature metrics (R² = 0.724, RMSE = 15.9534 Mg/ha, RMSEcv = 16.0874 Mg/ha), and the improvement in biomass estimation was not pronounced (Figure 2 (a) (b)). The MS regression showed low prediction ability for AGB because of its modeling limits with these feature metrics.
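Improvements of this kind are usually expressed as a percentage change relative to the original-metric model. The sketch below applies that calculation to the MS values quoted in this paragraph; the helper name is hypothetical, and the paper does not state the exact formula it used for its reported percentages.

```python
def relative_change(original, reconstructed):
    """Percentage change of a statistic relative to its original-metric value."""
    return (reconstructed - original) / original * 100.0

# MS regression values quoted in the text above.
ms_original = {"R2": 0.698, "RMSE": 16.6841, "RMSEcv": 16.8531}
ms_reconstructed = {"R2": 0.724, "RMSE": 15.9534, "RMSEcv": 16.0874}

ms_changes = {
    name: relative_change(ms_original[name], ms_reconstructed[name])
    for name in ms_original
}
# A positive change in R2 and negative changes in RMSE and RMSEcv
# both indicate a better model.
```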
In contrast, the reconstructed feature metrics performed much better than the original feature metrics when using PLS regression. This is because PLS regression can effectively resolve multicollinearity problems, especially with small samples. R² improved by 16.16%, and RMSE and RMSEcv were reduced by 17.21% and 16.95%, respectively, compared with the original LiDAR feature metrics (Table 2). This further indicates that the reconstructed feature metrics have strong prediction ability for AGB estimation. The Auto-Encoder neural network has a powerful feature representation ability learned from numerous training samples and can learn the relationships between various features. Therefore, we can use the Auto-Encoder neural network to reconstruct the structure feature metrics extracted from dense areas. This contributes to improving the accuracy of the AGB regression, as our experiments have shown.

CONCLUSION
Our study aimed to address the difficulty of estimating AGB in dense vegetation areas. The structure parameters obtained from reconstructed feature metrics are good indicators of forest AGB, whereas those derived directly from discrete-return LiDAR data showed a relatively weak correlation with ground-based AGB because of the insufficient number of ground points. This suggests that our method can alleviate the problem of insufficient points and achieve good AGB estimation results. This is our main contribution to AGB estimation. Additionally, the regression model is an essential factor for AGB estimation accuracy. Our results showed that PLS regression performed better than the MS regression model, owing to the serious multicollinearity among our feature variables.
In summary, the results showed that reconstructing structure feature parameters has the potential to improve biomass estimation accuracy. The methods developed and tested in this study could be useful for accurately estimating biomass using discrete-return LiDAR data and may also be used in similar studies. However, the AGB estimation was conducted only in a coniferous forest and may not be directly applicable to areas with different vegetation types. Therefore, future work should focus on testing a wide range of forest types.

Figure 1. The study area and distribution of sampling sites (black points "." are the centers of field plots).

Figure 2. Ground-based AGB value (Mg/ha) versus predicted AGB value (Mg/ha) from the regression model using LiDAR feature metrics. (a) and (b) are the regression results using original LiDAR feature metrics by the multiple stepwise regression (MS) model; (c) and (d) display the regression results using reconstructed LiDAR feature metrics by the partial least squares regression (PLS) model.

Table 1. Summary of the feature metrics derived from the LiDAR data used as candidate variables for estimating biomass.