A COMPARATIVE ANALYSIS OF PIXEL-BASED AND OBJECT-BASED APPROACHES FOR FOREST ABOVE-GROUND BIOMASS ESTIMATION USING RANDOM FOREST MODEL

ABSTRACT: An accurate above-ground biomass (AGB) map is of paramount importance for carbon stock and climate change monitoring. The main objective of this study is to compare the performance of pixel-based and object-based approaches for AGB estimation of temperate forests in north-eastern New York State. The second objective is to investigate the capabilities of optical, SAR, and combined optical + SAR data. To achieve these goals, the random forest (RF) regression algorithm was used to model and predict AGB values. Optical data (i.e. Landsat 5 TM, Landsat 8 OLI, and Sentinel-2), synthetic aperture radar (SAR) data (Sentinel-1 and global phased array type L-band SAR (PALSAR/PALSAR-2)), and their integration were used to estimate AGB. An airborne light detection and ranging (LiDAR) AGB raster was used as reference data for training and testing. The results demonstrate that the object-based image analysis (OBIA) approach reduced the RMSE of AGB estimation by about 5.32 Mg/ha, 8.9 Mg/ha, and 5.29 Mg/ha for optical, SAR, and optical + SAR data, respectively. Moreover, optical + SAR data provided the best results, with an RMSE of 42.63 Mg/ha and R² of 0.72 for the pixel-based approach and an RMSE of 37.31 Mg/ha and R² of 0.77 for the object-based approach.


INTRODUCTION
Besides providing habitat for plants and animals, forests supply oxygen, wood, fuel, and medicine, and they help prevent soil erosion and severe floods (Simonian 1995). Forests play an important role in global ecosystems, and their management requires accurate, timely, consistent, and comprehensive monitoring (S. Li, Quackenbush, and Im 2019). In particular, forests are a crucial part of the carbon cycle, which influences climate change (Jackson et al. 2008). Above-ground biomass (AGB) is one of the key parameters for carbon stock calculations, and accurate AGB prediction provides useful information for assessing carbon emission and sequestration (Jackson et al. 2008).
Forest AGB can be estimated by cutting trees and measuring their dry weight, or by applying allometric equations to measured tree height or diameter at breast height (Silveira, Santo, et al. 2019). However, these techniques are destructive, laborious, and time consuming, and they are practical only for small regions. As a solution, remote sensing offers a non-destructive method for AGB estimation using airborne or spaceborne optical and synthetic aperture radar (SAR) imagery (Dube et al. 2016; Issa et al. 2020). Although optical datasets provide valuable spectral information, they suffer from saturation, and their quality depends on weather conditions. Saturation occurs in forested regions with dense canopy, where the recorded spectral reflectance no longer responds to increases in biomass (Zhou et al. 2016; Urbazaev et al. 2018). In comparison to optical data, SAR imagery is captured at longer wavelengths, which allows the signal to penetrate the forest canopy. SAR images are independent of weather conditions and provide information about the physical structure of the trees (Berninger et al. 2018; Urbazaev et al. 2018). SAR data may also be affected by saturation, especially in densely forested areas (Urbazaev et al. 2018). Thus, many studies have recommended combining optical and SAR data to improve AGB estimation (Boudreau et al. 2008; Karlson et al. 2015; Dube et al. 2016; Berninger et al. 2018; Cao et al. 2018; Dang et al. 2019; Duncanson et al. 2020; Issa et al. 2020; C. Li, Li, and Li 2020). In this study, optical datasets (i.e. Landsat 5 TM, Landsat 8 OLI, and Sentinel-2), SAR datasets (global phased array type L-band SAR (PALSAR/PALSAR-2) and Sentinel-1), and their combination were used to examine the efficiency of different remote sensing sources. Airborne light detection and ranging (LiDAR) data were used to produce AGB rasters as reference maps.
One of the crucial steps in estimating AGB is to find a well-suited algorithm. Generally, AGB can be estimated using statistical and mathematical regression models (Wu et al. 2016). Traditionally, multiple linear regression models have been used to model the relationship between input predictors and the AGB of sample plots. A multiple linear regression model is easy to implement, but it is poorly suited to remote sensing data, whose relationship with AGB is non-linear. Thus, machine learning algorithms have recently been used to predict AGB more accurately. (Y. ) have shown that decision-tree based models such as random forest (RF) provide promising results. Other studies have also used the RF regression algorithm for AGB estimation (Mutanga, Adam, and Cho 2012; Karlson et al. 2015; Dang et al. 2019; Huang et al. 2019; Hudak et al. 2020).
The feature extraction method is another factor that influences AGB predictions. The pixel-based technique is widely used, while object-based image analysis (OBIA) has gained attention in recent years (Salehi, Daneshfar, and Davidson 2017). OBIA groups similar pixels into objects and can overcome the limitations of mixed pixels (Salehi et al. 2012). So far, two studies have focused on estimating AGB using an OBIA approach (Hirata et al. 2018; Silveira, Silva, et al. 2019). Hirata et al. (2018) used OBIA and a multiple linear regression model for AGB estimation of seasonal tropical forests in Cambodia. Silveira et al. (2019) used RF regression to model AGB over mountainous Brazilian forest. Both studies addressed AGB prediction in tropical forests; this research therefore focuses on the temperate forests of north-eastern New York State.
The primary objective of this study is to compare pixel-based and object-based feature extraction approaches for temperate forest AGB estimation. Since the RF model can handle non-linear datasets and provides accurate estimates, we modelled AGB using RF regression. The second objective is to investigate the potential of optical and SAR data when each is used separately. The third objective is to examine the performance of combined optical + SAR data for AGB estimation.

Study Area
This study focuses on estimating the forest AGB of two forest properties in the Adirondack Park, located in north-eastern New York State (Figure 1). Huntington Wildlife Forest (HWF) covers an area of 6,000 ha, and its elevation ranges from 473 m to 908 m above mean sea level. HWF is a temperate forest with a mean annual temperature of 4.4 °C and a mean annual precipitation of 1010 mm. Pack Demonstration Forest (PDF), which covers an area of 2,500 ha, is located in the southern Adirondacks. Elevation in PDF ranges from 204 m to 377 m above mean sea level, the mean annual temperature is 5.07 °C, and the mean annual precipitation is 1158 mm. In Figure 1, white circles indicate the sample plots located in HWF and PDF.

Field Inventory Data Collection
Continuous Forest Inventory (CFI) field measurements were used as reference datasets for training and testing. These datasets were collected by the State University of New York, College of Environmental Science and Forestry (ESF) during July and August of 2011 and 2013 in HWF and PDF, respectively (Breitmeyer et al. 2019). There are 288 sample plots in HWF, each with a radius of 16.02 m. The CFI data over PDF contain 95 sample plots, each with a radius of 11.3 m. The AGB of the sample plots was calculated using the measured diameter at breast height (DBH) and species-specific Component Ratio Method (CRM) allometric equations (Woodall et al. 2011; Clough et al. 2018).

Remote Sensing Data
Airborne LiDAR: Airborne LiDAR data were acquired over Warren, Washington, and Essex counties by the New York State GIS Program Office (NYSGPO) between April 2015 and June 2015 using the Leica Airborne Laser Scanner 70 (ALS70). The resulting point cloud has a density of 2.5 points per square metre. To process the LiDAR data, raw point clouds were first converted into height-normalized point clouds using a k-nearest neighbor imputation algorithm (k = 5). Then, height and intensity predictors were computed from the height-normalized LiDAR data to create a 30 m grid cell dataset.
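The gridding step can be illustrated with a minimal Python sketch. It assumes the points are already height-normalized, and it summarises each 30 m cell with the mean, maximum, and 95th percentile height. These three metrics, the function names, and the sample points are illustrative assumptions; the study does not list the exact height and intensity predictors it computed, and the k-nearest neighbor normalization is not reproduced here.

```python
import math
from collections import defaultdict

CELL_SIZE = 30.0  # grid cell size in metres, as stated in the text


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    if len(sorted_values) == 1:
        return sorted_values[0]
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac


def grid_height_metrics(points, cell_size=CELL_SIZE):
    """Group (x, y, height) points into square cells and summarise heights.

    `points` holds height-normalized returns (height in metres above ground).
    Returns {(col, row): {"mean": ..., "max": ..., "p95": ...}}.
    """
    cells = defaultdict(list)
    for x, y, h in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(h)

    metrics = {}
    for key, heights in cells.items():
        heights.sort()
        metrics[key] = {
            "mean": sum(heights) / len(heights),
            "max": heights[-1],
            "p95": percentile(heights, 95),
        }
    return metrics


# Hypothetical points: three returns in one cell, one in the neighbouring cell.
pts = [(1.0, 2.0, 10.0), (5.0, 29.0, 20.0), (12.0, 3.0, 30.0), (31.0, 2.0, 4.0)]
result = grid_height_metrics(pts)
```

Each dictionary key is the (column, row) index of a grid cell, so the output can be written directly into a raster aligned with the 30 m grid.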
Optical Data: Landsat 5 TM, Landsat 8 OLI/TIRS, and Sentinel-2 imagery from 2011, 2013, and 2016, respectively, were used as optical data. The Google Earth Engine (GEE) cloud platform was used to download and pre-process the imagery (Gorelick et al. 2017). First, the surface reflectance data for July and August of each year were collected. Second, spectral bands were extracted, and vegetation indices including the normalized difference vegetation index (NDVI), soil adjusted vegetation index (SAVI), ratio vegetation index (RVI), normalized burn ratio (NBR), and normalized difference moisture index (NDMI) were calculated from the spectral bands. All input layers were resampled using bicubic interpolation and re-projected to the NAD83 / Conus Albers (EPSG:5070) coordinate system so that they aligned with the 30 m LiDAR grid cells.
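For reference, the five indices can be written as a short Python sketch using their commonly cited definitions. The function name, the soil adjustment factor of 0.5 for SAVI, the choice of NIR/red for RVI, and the example reflectance values are assumptions for illustration; the study does not state which variants or band assignments it used.

```python
def vegetation_indices(red, nir, swir1, swir2, soil_factor=0.5):
    """Return common vegetation indices from surface reflectance values (0-1).

    Typical band assignments (red, NIR, SWIR1, SWIR2):
      Landsat 5 TM: B3, B4, B5, B7
      Landsat 8 OLI: B4, B5, B6, B7
      Sentinel-2:   B4, B8, B11, B12
    """
    return {
        "NDVI": (nir - red) / (nir + red),
        # soil_factor is the SAVI adjustment term L; 0.5 is a common default
        "SAVI": (1 + soil_factor) * (nir - red) / (nir + red + soil_factor),
        # RVI is taken here as the simple ratio NIR / red (an assumption)
        "RVI": nir / red,
        "NBR": (nir - swir2) / (nir + swir2),
        "NDMI": (nir - swir1) / (nir + swir1),
    }


# Hypothetical reflectances for a densely vegetated pixel.
example = vegetation_indices(red=0.05, nir=0.45, swir1=0.20, swir2=0.10)
```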

SAR Data:
Both L-band and C-band SAR data were used in this study so that the structural information provided by these sensors could improve the accuracy of the AGB estimation model. First, the 25 m resolution yearly mosaic of the global phased array type L-band SAR (PALSAR/PALSAR-2), onboard the advanced land observing satellite (ALOS), was utilized (Gorelick et al. 2017). The dual polarization yearly mosaics (horizontal transmit/horizontal receive (HH) and horizontal transmit/vertical receive (HV)) for 2011 and 2013 were used for HWF and PDF, respectively. Second, Sentinel-1 dual polarization (VV and VH) C-band data from 2015 with 10 m resolution were used. Then, a smoothing speckle filter was applied in GEE to reduce speckle noise (Lee and Pottier 2009). Finally, the dual polarization backscatter values were used to calculate span and band ratios.
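The span and ratio features can be sketched in Python as follows. The sketch assumes backscatter is supplied in decibels, takes span to be the total power of the two channels in linear units, and takes the band ratio to be co-polarized over cross-polarized power. These definitions and all names are assumptions, since the study does not give its formulas.

```python
import math


def db_to_linear(db):
    """Convert backscatter from decibels to linear power."""
    return 10.0 ** (db / 10.0)


def linear_to_db(power):
    """Convert linear power back to decibels."""
    return 10.0 * math.log10(power)


def span_and_ratio(co_pol_db, cross_pol_db):
    """Span (total power) and co/cross-polarization ratio, both in dB.

    co_pol_db is HH (PALSAR) or VV (Sentinel-1);
    cross_pol_db is HV (PALSAR) or VH (Sentinel-1).
    """
    co = db_to_linear(co_pol_db)
    cross = db_to_linear(cross_pol_db)
    return {
        "span_db": linear_to_db(co + cross),
        "ratio_db": linear_to_db(co / cross),
    }


# Hypothetical forest backscatter values in dB.
features = span_and_ratio(co_pol_db=-7.0, cross_pol_db=-13.0)
```

Working in linear power before summing matters: adding decibel values directly would multiply the two powers rather than add them.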

METHODOLOGY
In this study, an RF machine learning model was used to estimate the AGB of HWF and PDF from optical data, SAR data, and their combination. To provide more training and testing samples for the RF model, airborne LiDAR data were used to produce AGB rasters that served as reference data for the pixel-based and object-based approaches. First, the height and intensity predictors of the airborne LiDAR data were used to generate the AGB rasters of HWF and PDF, with the CFI plots as field measurements. Then, the generated AGB rasters were used as reference data for training and testing. Finally, the results of the pixel-based and object-based models were compared using the spaceborne optical and SAR data.

Random Forest (RF)
The RF regression model, proposed by Breiman, is an ensemble machine learning algorithm that combines a large set of regression trees (Breiman 2001; Zhou et al. 2016). The RF starts by drawing bootstrap samples, i.e. random samples taken with replacement from the training dataset. Then, a regression tree is fitted to each bootstrap sample. At each node, a subset of input predictors is randomly selected for binary partitioning (Zhou et al. 2016; Izquierdo-Verdiguier and Zurita-Milla 2020). In regression trees, the split at each node is chosen to minimise the variance (sum of squared errors) of the response within the resulting child nodes; the Gini index plays the corresponding role in classification trees. Finally, the predicted value is obtained by averaging the predictions of all trees (Zhou et al. 2016; Izquierdo-Verdiguier and Zurita-Milla 2020).
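The procedure can be illustrated with a compact, standard-library Python sketch of a random forest regressor. It is a teaching example under stated assumptions, not the implementation used in the study: splits are scored by the sum of squared errors of the child nodes (the usual criterion for regression trees), thresholds are taken from observed predictor values, and the function names, hyperparameters, and toy data are all hypothetical.

```python
import random


def sse(values):
    """Sum of squared deviations from the mean (zero for an empty sequence)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets, feature_ids):
    """Find the (feature, threshold) that minimises the children's total SSE."""
    best, best_score = None, sse(targets)
    for f in feature_ids:
        # Dropping the largest value keeps both children non-empty.
        for threshold in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[f] > threshold]
            score = sse(left) + sse(right)
            if score < best_score:
                best, best_score = (f, threshold), score
    return best


def grow_tree(rows, targets, n_try, rng, depth=0, max_depth=5, min_size=2):
    """Recursively grow one regression tree; leaves store the mean target."""
    leaf = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < min_size:
        return leaf
    feature_ids = rng.sample(range(len(rows[0])), n_try)  # random predictor subset
    split = best_split(rows, targets, feature_ids)
    if split is None:
        return leaf
    f, threshold = split
    left = [(r, t) for r, t in zip(rows, targets) if r[f] <= threshold]
    right = [(r, t) for r, t in zip(rows, targets) if r[f] > threshold]
    l_rows, l_targets = zip(*left)
    r_rows, r_targets = zip(*right)
    return (f, threshold,
            grow_tree(l_rows, l_targets, n_try, rng, depth + 1, max_depth, min_size),
            grow_tree(r_rows, r_targets, n_try, rng, depth + 1, max_depth, min_size))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] <= threshold else right
    return node


def fit_forest(rows, targets, n_trees=50, n_try=1, seed=0):
    """Fit each tree to a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(grow_tree([rows[i] for i in idx],
                                [targets[i] for i in idx], n_try, rng))
    return forest


def predict_forest(forest, row):
    """Average the predictions of all trees."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)


# Toy data: the first predictor drives a step change in AGB, the second is noise.
rows = [(i, (i * 7) % 5) for i in range(20)]
targets = [10.0 if i < 10 else 50.0 for i in range(20)]
forest = fit_forest(rows, targets)
low, high = predict_forest(forest, rows[2]), predict_forest(forest, rows[17])
```

In practice a tested library implementation would be preferable; the sketch only makes the bootstrap, random predictor subset, and averaging steps explicit.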

LiDAR AGB raster as reference data
Generally, machine learning algorithms require a large amount of training data to perform well. To increase the number of training and testing samples, we used AGB rasters derived from airborne LiDAR predictors as reference datasets. Several studies have used LiDAR-derived AGB maps as reference samples (Hirata et al. 2018; Hudak et al. 2020). To cover the full range of AGB values, a stratified random sampling technique was used to generate training and testing samples from the AGB rasters. Pixels of the LiDAR AGB raster were sorted into 5 Mg/ha bins from 0 to the maximum AGB. Then, 200 pixels/objects were randomly chosen within each bin. When a bin contained fewer than 200 pixels/objects, one half of its samples were selected (Hudak et al. 2020). For the OBIA approach, the object boundaries were overlaid on the LiDAR-derived raster to calculate the mean AGB of each object. The reference samples were divided into 70% training and 30% testing samples. The performance of the model was evaluated using the root mean square error (RMSE) and the coefficient of determination (R²).
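The sampling, splitting, and evaluation steps can be sketched in standard-library Python as follows. The function names, random seeds, and example values are hypothetical, and R² is computed here as one minus the ratio of residual to total sum of squares, which is an assumption because the study does not state which definition it used.

```python
import math
import random
from collections import defaultdict


def stratified_sample(agb_values, bin_width=5.0, per_bin=200, seed=0):
    """Pick sample indices from each AGB bin.

    Bins with at least `per_bin` members contribute `per_bin` random members;
    smaller bins contribute one half of their members (rounded down).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for i, agb in enumerate(agb_values):
        bins[int(agb // bin_width)].append(i)
    chosen = []
    for key in sorted(bins):
        members = bins[key]
        k = per_bin if len(members) >= per_bin else len(members) // 2
        chosen.extend(rng.sample(members, k))
    return chosen


def train_test_split(indices, train_fraction=0.7, seed=0):
    """Shuffle the indices and split them into training and testing sets."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted))
                     / len(observed))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical raster: 300 pixels in the 0-5 Mg/ha bin, 100 in the 5-10 Mg/ha bin.
agb = [2.0] * 300 + [7.0] * 100
sample = stratified_sample(agb)            # 200 from the first bin, 50 from the second
train, test = train_test_split(sample)     # 70% / 30%
error = rmse([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
fit = r_squared([10.0, 20.0, 30.0], [12.0, 18.0, 30.0])
```

Capping each bin at 200 samples keeps the abundant mid-range AGB values from dominating the training set, which is the purpose of stratifying.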

Object-based Image Analysis (OBIA)
Recently, the object-based approach has received more attention in remote sensing applications (Blaschke 2010). While the pixel-based approach considers each pixel separately, OBIA groups pixels with similar spectral reflectance into objects (Addink and Coillie 2010). OBIA can reduce mixed pixel issues by clustering similar pixels (Salehi et al. 2012). In this study, we used the simple non-iterative clustering (SNIC) method in GEE (Achanta and Susstrunk 2017) to segment forest canopies in HWF and PDF. The SNIC segmentation parameters (i.e. size, compactness, connectivity, neighborhoodSize, and seeds (Tassi and Vizzari 2020)) were selected by trial and error, taking the size of the resulting objects into account. The SNIC parameters were set as follows: size=5, compactness=0.1, connectivity=8, neighborhoodSize=60, and seeds=10. The mean, the variance, and gray level co-occurrence matrix (GLCM) features such as angular second moment (ASM), contrast, entropy, and homogeneity were calculated as input predictors.
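The object-level predictors can be illustrated with a standard-library Python sketch. It assumes a segment label grid is already available (the SNIC segmentation itself is not reproduced), and it computes the GLCM for a single offset of one pixel to the right using the textbook definitions of the four features. The function names, the offset, the quantisation, and the example grids are assumptions; the GEE implementation may differ in these details.

```python
import math
from collections import defaultdict


def object_mean_variance(labels, values):
    """Per-object mean and population variance of `values`, grouped by `labels`.

    Both arguments are 2-D lists of equal shape; `labels` holds segment ids.
    """
    groups = defaultdict(list)
    for label_row, value_row in zip(labels, values):
        for label, value in zip(label_row, value_row):
            groups[label].append(value)
    stats = {}
    for label, vals in groups.items():
        mean = sum(vals) / len(vals)
        stats[label] = (mean, sum((v - mean) ** 2 for v in vals) / len(vals))
    return stats


def glcm_features(image, levels):
    """Texture features from a symmetric, normalised GLCM (offset: one pixel right).

    `image` is a 2-D list of integer grey levels in range(levels).
    """
    counts = [[0.0] * levels for _ in range(levels)]
    for row in image:
        for a, b in zip(row, row[1:]):
            counts[a][b] += 1
            counts[b][a] += 1  # count each pair in both directions (symmetric)
    total = sum(map(sum, counts))
    asm = contrast = entropy = homogeneity = 0.0
    for i in range(levels):
        for j in range(levels):
            p = counts[i][j] / total
            if p > 0:
                asm += p * p
                contrast += p * (i - j) ** 2
                entropy -= p * math.log(p)
                homogeneity += p / (1 + (i - j) ** 2)
    return {"ASM": asm, "contrast": contrast,
            "entropy": entropy, "homogeneity": homogeneity}


# Hypothetical 2 x 3 scene with two segments and a band of values.
labels = [[1, 1, 2], [1, 2, 2]]
band = [[10.0, 20.0, 40.0], [30.0, 50.0, 60.0]]
stats = object_mean_variance(labels, band)

# Hypothetical image quantised to two grey levels.
texture = glcm_features([[0, 0, 1], [1, 1, 1]], levels=2)
```

The same grouping function also yields the mean LiDAR AGB of each object when the AGB raster is passed as `values`.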

RESULTS AND DISCUSSION
The results of the RF model for optical, SAR, and optical + SAR data are listed in Table 1 for both the pixel-based and object-based approaches. As Table 1 shows, OBIA outperforms the pixel-based model for all datasets. Since OBIA groups pixels into objects, the mixed pixel issue is mitigated, which reduces prediction errors. Compared with the pixel-based approach, the object-based approach improved R² from 0.72 to 0.77 for optical imagery, from 0.61 to 0.67 for SAR imagery, and from 0.73 to 0.78 for optical + SAR data. Other studies have also shown that OBIA improves AGB predictions (Hirata et al. 2018; Silveira, Silva, et al. 2019). Although OBIA provides better results, finding appropriate segmentation parameters and optimal objects remains a challenge. Thus, in this study, a trial and error method based on the RMSE metric and the shape of the objects was used to select the best objects.
As shown in Table 1, in both the pixel-based and object-based approaches, optical + SAR data provide the best results in terms of RMSE and R². Optical data rank second and SAR data third. It can be concluded that the spectral and structural information provided by the combination of optical and SAR data improves AGB estimation.
Optical datasets have been widely used for forest AGB estimation (Singh et al. 2012; Zhang et al. 2019). Although optical imagery provides valuable spectral information, saturation and weather conditions limit its capabilities. Thus, other studies have used SAR data to overcome the limitations of optical imagery (Saatchi 2019). In addition, SAR signals are more sensitive to the geometrical and physical characteristics of forest canopies. SAR data may nevertheless suffer from saturation, depending on the wavelength and the forest canopy density (Zhou et al. 2016; Saatchi 2019). Therefore, some studies have focused on combining optical and SAR data to enhance AGB estimation (Boudreau et al. 2008; Karlson et al. 2015; Dube et al. 2016; Berninger et al. 2018; Cao et al. 2018; Dang et al. 2019; Duncanson et al. 2020; Issa et al. 2020; C. Li, Li, and Li 2020). This study also emphasizes the importance of integrating optical and SAR data for improved AGB estimation.

Table 1. Comparison of pixel-based and object-based AGB estimation in HWF and PDF forests using the RF model for optical, SAR, and optical + SAR data.

Figure 2 compares the RMSE values of the pixel-based and object-based approaches for optical, SAR, and optical + SAR data. The OBIA approach reduced the RMSE of AGB estimation by about 5.32 Mg/ha, 8.9 Mg/ha, and 5.29 Mg/ha for optical, SAR, and optical + SAR data, respectively.

Figure 2. Comparison of the RMSEs of pixel-based and object-based approaches for optical, SAR, and optical + SAR data.

CONCLUSION
This study compared the results of AGB estimation for pixel-based and object-based approaches using optical, SAR, and optical + SAR data. The RF regression model was used to predict the AGB of the HWF and PDF temperate forests in north-eastern New York State. Multiple freely available optical (Landsat 5 TM, Landsat 8 OLI, and Sentinel-2) and SAR (Sentinel-1 and global phased array type L-band SAR (PALSAR/PALSAR-2)) datasets, and their combination, were utilized. According to the results, the object-based approach outperformed the pixel-based approach regardless of the dataset. Moreover, the combination of optical and SAR data enhanced AGB prediction in both the pixel-based and object-based approaches.