ESTIMATING SOIL MOISTURE USING POLSAR DATA: A MACHINE LEARNING APPROACH

Soil moisture is an important parameter that affects several environmental processes. It plays a key role in many fields, including agriculture, hydrology, meteorology, flood prediction, and drought occurrence. However, field measurement of soil moisture is not feasible over a vast agricultural region, because surveys over large areas are difficult and costly and because soil moisture varies strongly in space. Polarimetric synthetic aperture radar (PolSAR) imaging is a powerful tool for estimating soil moisture, since these images provide a wide field of view and high spatial resolution. In this study, a support vector regression (SVR) model for estimating soil moisture is proposed, based on AIRSAR data acquired in 2003 in the C, L, and P channels. Sequential forward selection (SFS) and sequential backward selection (SBS) are evaluated for selecting suitable features from the polarimetric image dataset for efficient modeling. The estimates are compared with in-situ data. The results show that the SBS-SVR method yields higher modeling accuracy than the SFS-SVR model. The statistics obtained with this method show an R of 97% and an RMSE below 0.00041 m3/m3 for the P, L, and C channels, which is better accuracy than that of the other feature selection algorithms.


INTRODUCTION
Due to the abundance of aerial measurement data from different sources, the application of remote sensing data has expanded considerably. The wide coverage of remote sensing data compared with field data has helped remote sensing technology develop in different fields (Tabatabaeenejad et al., 2015; Narvekar et al., 2015; Oh et al., 1992). One such field is the estimation of soil moisture from radar data. Soil moisture anomalies have been shown to have a significant effect on local weather conditions; thus, accurate soil moisture data support a better understanding of local weather. The potential of synthetic aperture radar (SAR) for retrieving soil parameters has been known for more than thirty years (Tabatabaeenejad et al., 2015; Narvekar et al., 2015; Oh et al., 1992). Therefore, finding better methods for predicting soil moisture from SAR data is of significant importance. Because polarimetric synthetic aperture radar (PolSAR) images combine wide coverage with high spatial resolution, they are a more effective tool for estimating soil moisture than passive data (Narvekar et al., 2015). Vegetation and signal intensity affect active data more strongly than passive data, and this characteristic is used for predicting soil moisture (Narvekar et al., 2015). In addition, the lack of databases for these parameters, together with the difficulty of separating the effects of soil and vegetation on the backscattering coefficient (σ⁰), is considered a major hindrance for many applications (Ahmad et al., 2010). To achieve accurate soil moisture estimation and avoid the above-mentioned difficulties, a data-driven model is needed that can efficiently relate the inputs to the desired output and that is not computationally intensive (Ahmad et al., 2010). Artificial neural networks (ANN) are models inspired by the human ability to learn. 
These models are robust to noisy data and are able to provide a nonlinear multivariable relation between variables (Twarakavi et al., 2006). Lately, another data-driven model, the support vector machine (SVM), has gained popularity among researchers as an alternative to ANN (Lin et al., 2009; Kalra and Ahmad, 2009). The SVM has been recognized as a basic kernel learning method and has been used broadly after its successful application to pattern recognition and regression in fields such as bioinformatics and artificial intelligence (Barh et al., 2015). Lin et al. used an SVM model for predicting windstorms and hourly rainfall in catchment basins in northern Taiwan and compared the results with an ANN model. They demonstrated the superiority of the SVM model over the ANN model (Lin et al., 2009). Kalra and Ahmad used an SVM model for long-lead-time streamflow forecasting based on oceanic oscillations in the Colorado River basin (Kalra and Ahmad, 2009). Gill et al. applied an SVM model to data over four and a half days for predicting soil moisture from meteorological variables, and compared the results with an ANN model. They concluded that the SVM model achieved higher prediction accuracy than the ANN model (Gill et al., 2006). Ahmad et al. studied soil moisture in the lower Colorado River basin in the western United States using remote sensing data and the SVM regression technique for ten sites, and compared the results with a feed-forward back-propagation ANN model. They demonstrated that the SVM model predicts soil moisture better than the ANN and multivariate linear regression (MLR) models (Ahmad et al., 2010). Sarti et al. also estimated soil moisture with high accuracy. They applied a polarimetric extraction technique to AIRSAR data in the C and L channels. 
This technique was applied after filtering for surface roughness and separating vegetation from the ground surface (Sarti and Mascolo, 2012). In addition, many polarimetric features for classifying land use/cover can be obtained from PolSAR images (Lardeux et al., 2007; Jafari et al., 2015). Given this wide applicability, it is natural to use the various features obtained from these images for tasks such as classification and the estimation of soil surface parameters. Thus, some studies have used feature selection techniques to reduce the feature space of these images. In 2008, Batia et al. used a genetic algorithm (GA) coded with integers to choose suitable features. In this algorithm, the length of the chromosome corresponds to the number of features chosen, and the genes of the chromosome hold the numbers of the extracted features (Zhang et al., 2009). Haddadi et al. used GA with NN classification, and Salehi et al. used GA with SVM, to provide a method for choosing suitable features from radar images (Haddadi et al., 2011; Salehi et al., 2014). Mao also selected features with a hybrid algorithm, which led to higher classification accuracy and a faster algorithm (Mao, 2004). Many other scientists have developed methods to determine the link between radar signals and surface features. Their studies have focused on the effect of vegetation and surface roughness in determining soil moisture (Tabatabaeenejad et al., 2015; Oh et al., 1992; Attema and Ulaby, 1978). A number of empirical and theoretical models have been proposed so far for determining soil moisture, each with its own drawbacks. However, given the high reliability of radar images in extracting varied features of the earth's surface, the authors are encouraged to evaluate PolSAR images for estimating soil moisture. 
The goal of this study is to propose an SVM model for predicting soil moisture from PolSAR images and to identify the optimal features for this model. The proposed model considers two scenarios. In the first scenario, the model is built with all available features to estimate soil moisture. In the second scenario, an optimal subset of features is chosen with a feature selection method. In this study, the SFS and SBS algorithms are used to choose optimal features for modeling soil moisture from remotely sensed radar images.

Producing PolSAR features
In general, the polarimetric features evaluated in this study are obtained from target decomposition. Features obtained from target decomposition models provide useful information about scattering mechanisms. Target decomposition methods are divided into two groups, coherent and non-coherent (Table 1) (Cloude and Pottier, 1996). Coherent decomposition methods express the scattering matrix as a combination of scattering matrices of simpler or canonical targets (Cloude and Pottier, 1996; Lee and Pottier, 2009). Non-coherent decomposition methods express the covariance matrix as a combination of second-order descriptors of simpler or canonical targets (Cloude and Pottier, 1996; Lee and Pottier, 2009). The Pauli decomposition belongs to the first group, while the Freeman and Yamaguchi decompositions and the entropy parameters belong to the second (Lee and Pottier, 2009). The Pauli decomposition coefficients indicate the scattering power of targets with single-bounce, double-bounce, and volume scattering (Cloude and Pottier, 1996). The non-coherent Freeman decomposition divides the covariance matrix into a set of covariance matrices for single-bounce, double-bounce, and volume scattering (Freeman and Durden, 1998). Therefore, the power scattered by each of these target types can be used as a feature.
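As an illustration of how such power features arise, the following minimal sketch computes the three Pauli powers for a single pixel. It assumes a reciprocal scattering matrix (S_hv equal to S_vh) and uses the textbook Pauli basis; the function name `pauli_powers` and the sample pixel values are hypothetical and are not taken from the AIRSAR processing used in this study.

```python
import math


def pauli_powers(s_hh: complex, s_hv: complex, s_vv: complex) -> dict:
    """Pauli powers for one pixel of a reciprocal scattering matrix (assumed S_hv == S_vh)."""
    a = (s_hh + s_vv) / math.sqrt(2)  # odd-bounce (single-bounce, surface-like) component
    b = (s_hh - s_vv) / math.sqrt(2)  # even-bounce (double-bounce) component
    c = math.sqrt(2) * s_hv           # cross-polarized component, often read as volume scattering
    return {
        "single": abs(a) ** 2,
        "double": abs(b) ** 2,
        "volume": abs(c) ** 2,
    }


# Hypothetical pixel: equal co-polarized returns and no cross-polarized return,
# which the Pauli basis attributes entirely to single-bounce scattering.
pixel = pauli_powers(1.0 + 0.0j, 0.0j, 1.0 + 0.0j)
```

The three powers sum to the total power |S_hh|² + 2|S_hv|² + |S_vv|², so each can be read as a share of the pixel's backscattered power.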

Support vector regression method
The original support vector machine (SVM) algorithm was proposed by Vapnik (Vapnik, 1987). SVM is a supervised learning method used for classification, regression, and other learning tasks. An SVM maps the data into a new space by means of a predefined kernel, in which the data can be separated linearly. A linear equation is then found by searching for the support vectors, such that it gives the largest margin between the classes. Results obtained with these methods are highly stable as well as highly accurate (Vapnik, 1987). One type of SVM is support vector regression (SVR), which uses training data to build a model and then uses that model to predict test data. The quality of SVM and SVR models depends on proper tuning of the model parameters (Nikraftar and Hasanlou, 2015). SVR has been regarded as a reliable method for the last two decades (Ahmad et al., 2010; Nikraftar and Hasanlou, 2015). In this method, the aim is to estimate an unknown function from a limited number of samples. In SVR, the input x is first mapped nonlinearly into a feature space of dimension M, and a linear model is then constructed in this feature space according to Eq. 1:
$$f(x, w) = \sum_{j=1}^{M} w_j \, g_j(x) + b \qquad (1)$$
In Eq. 1, g_j(x) refers to a set of nonlinear transformations, w_j are the model weights, and b is the bias term. The quality of the regression function depends on a good choice of the kernel parameter gamma (γ) and of epsilon (ε).
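For reference, the two tuned quantities enter a standard ε-SVR as sketched below. This is the textbook formulation rather than a formula taken from this study, and the radial basis function (RBF) kernel is an assumption, since the kernel is not named here; f(x) denotes the model output and y the measured value.

```latex
% epsilon-insensitive loss: residuals smaller than epsilon are not penalized
L_{\varepsilon}\bigl(y, f(x)\bigr) = \max\bigl(0,\; |y - f(x)| - \varepsilon\bigr)

% RBF kernel (assumed): gamma controls how quickly similarity decays with distance
K(x, x') = \exp\bigl(-\gamma \, \lVert x - x' \rVert^{2}\bigr)
```

A larger ε widens the tube inside which errors are ignored, and a larger γ makes the kernel more local, which is why both values influence the fitted regression function.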
In recent years, various methods have been proposed for determining the optimal parameters of an SVM (Ahmad et al., 2010; Nikraftar and Hasanlou, 2015). In this research, we use a grid search (GS) to determine suitable parameters for modeling. To choose the gamma and epsilon parameters, each parameter is assigned the range 2^n, in which the exponent n runs over a range of integers from a negative to a positive value. For each value of n, the quality of every parameter set is evaluated, and the parameters with the lowest error are selected as optimal.
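The search over powers of two can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: `grid_search_pow2` is a hypothetical helper, the exponent range -5 to 5 is assumed because the range is not given here, and `toy_error` merely stands in for the validation error of an SVR trained with a given (gamma, epsilon) pair.

```python
import math
from itertools import product
from typing import Callable, Iterable, Tuple


def grid_search_pow2(
    error_fn: Callable[[float, float], float],
    exponents: Iterable[int],
) -> Tuple[float, float, float]:
    """Evaluate every (gamma, epsilon) pair on a 2**n grid; return the pair with the lowest error."""
    grid = [2.0 ** n for n in exponents]
    if not grid:
        raise ValueError("exponents must not be empty")
    best = None
    for gamma, epsilon in product(grid, grid):
        err = error_fn(gamma, epsilon)
        if best is None or err < best[2]:
            best = (gamma, epsilon, err)
    return best


def toy_error(gamma: float, epsilon: float) -> float:
    """Stand-in for a model's validation error; its minimum is at gamma=2**1, epsilon=2**-3."""
    return (math.log2(gamma) - 1) ** 2 + (math.log2(epsilon) + 3) ** 2


# Assumed exponent range: n = -5, ..., 5
best_gamma, best_epsilon, best_err = grid_search_pow2(toy_error, range(-5, 6))
```

In practice `error_fn` would train the regression model on the training points and return an error measure such as RMSE on held-out points.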

Feature selection method
Feature selection (FS) is an important subject in machine learning and in statistical pattern recognition (Bf and Ap, 2005). It is essential in many applications (such as classification), since in these applications many features are impractical or carry no informational value (Bf and Ap, 2005). In effect, feature selection reduces the amount of data passed to the model, which makes classification and regression algorithms easier and faster to apply. In some cases the coefficient of determination (R²) can be improved; in other cases, feature selection yields clearer and more concise results (Bf and Ap, 2005). Moreover, the quality of the SVR model depends on the selected features. Hence, the aim in this research is to choose distinctive and suitable features using optimization algorithms. These features help the model relate radar features to soil surface parameters and so provide the best kernel-based model for estimating soil moisture. In this study, two algorithms, sequential forward selection (SFS) and sequential backward selection (SBS) (Guo et al., 2011), are used to select among the PolSAR features. SFS is a very simple search algorithm that starts from an empty feature set (Guo et al., 2011). Each feature is first evaluated on its own, and the feature with the best value of the evaluation function is selected. Each remaining feature is then evaluated together with the features already selected, the best one is added, and this cycle continues until adding a feature brings no further improvement. The most important drawback of SFS is that an added feature is not deleted from the solution set even when it later proves irrelevant (Aha and Bankert, 1996). In comparison, SBS starts from a set that includes all features, and in each iteration the feature chosen by the evaluation function is deleted from the set. This continues until the number of features equals a fixed number. 
Likewise, the main drawback of SBS is that a deleted feature is not restored to the set even if it later proves suitable (Aha and Bankert, 1996).
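The two greedy searches described above can be sketched as follows. This is a minimal illustration, not the implementation used in this study: the feature names and the `toy_score` function are hypothetical, and in practice the score would be the accuracy of an SVR trained on the candidate subset (higher is better).

```python
from typing import Callable, FrozenSet, List, Sequence

Score = Callable[[FrozenSet[str]], float]  # higher is better


def sfs(features: Sequence[str], score: Score) -> List[str]:
    """Sequential forward selection: add the best feature until the score stops improving."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining:
        scored = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(scored, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate improves the current subset
        selected.append(top_feature)  # an added feature is never removed again
        remaining.remove(top_feature)
        best_score = top_score
    return selected


def sbs(features: Sequence[str], score: Score, n_keep: int) -> List[str]:
    """Sequential backward selection: drop one feature per iteration until n_keep remain."""
    selected = list(features)
    while len(selected) > n_keep:
        scored = [(score(frozenset(g for g in selected if g != f)), f) for f in selected]
        _, drop = max(scored, key=lambda t: t[0])  # removal that leaves the best subset
        selected.remove(drop)  # a removed feature is never added back
    return selected


# Hypothetical features: two informative ones and two that only add noise.
USEFUL = frozenset({"pauli_single", "pauli_double"})
ALL_FEATURES = ["pauli_single", "noise_1", "pauli_double", "noise_2"]


def toy_score(subset: FrozenSet[str]) -> float:
    """Stand-in for model accuracy: reward useful features, penalize the others."""
    return len(subset & USEFUL) - 0.1 * len(subset - USEFUL)


sfs_result = sfs(ALL_FEATURES, toy_score)
sbs_result = sbs(ALL_FEATURES, toy_score, n_keep=2)
```

The comments on `selected.append` and `selected.remove` mark the two drawbacks noted in the text: neither search revisits an earlier decision.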

PROPOSED METHOD
Modeling based on PolSAR imagery is a common approach for predicting soil moisture in the remote sensing community (Sarti and Mascolo, 2012), and there has been some research on soil moisture estimation from radar features (Sarti and Mascolo, 2012). As mentioned previously, the quality of SVR depends on proper tuning of its parameters, so it is essential in this study to optimize the SVR parameters when modeling soil moisture. Hence, based on the input data listed in Table 1, two scenarios are suggested. In the first scenario, the SVR parameters are optimized using all PolSAR features, and soil moisture is then estimated. In the second scenario, the best features are chosen with two selection methods (SFS and SBS). In a parallel procedure, soil moisture is modeled without optimizing the SVR parameters but with suitable features selected. The overall procedure in this study therefore consists of four main parts: (1) feature generation from the AIRSAR dataset, (2) soil moisture modeling with the SVR model using the features selected by SFS and SBS, without optimizing the SVR parameters, (3) soil moisture modeling with the SVR model using the features selected by SFS and SBS, with optimized SVR parameters, and (4) comparison of the accuracy of the different models (Figure 1).

Study area
The study area is located in southern Oklahoma in the United States and is covered with vegetation (Figure 2). This area is hot and dry in summer and moderate in winter (Miralles et al., 2010). The data for this area were gathered from an airborne platform and in situ during the soil moisture experiment in 2003 (Table 2).
Figure 2. Study area in south Oklahoma.

Remotely sensed datasets
The in-situ data used in this study were gathered during the SMEX03 campaign on 10 July 2003 ('http://nsidc.org/data/amsr_validation/soil_moisture'). Soil moisture was measured and analyzed at fourteen locations in Oklahoma. Of these surface points, eight were chosen randomly for model training, and testing was done on the remaining points. The PolSAR dataset for this area was acquired by the AIRSAR instrument. This dataset has a resolution of 6.6 meters in the angular direction. Every AIRSAR pixel records radar backscattering for each combination of vertical and horizontal polarization (VV, HH, VH, and HV). Each pixel includes backscattering information in three channels, C (5.31 GHz), L (1.26 GHz), and P (0.45 GHz), as illustrated in Figure 3.
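The random division of the fourteen measurement points into eight training points and six test points can be sketched as follows. The point identifiers 1 to 14, the helper name `split_points`, and the seed are assumptions for illustration; the actual points drawn in this study are not listed here.

```python
import random
from typing import Iterable, List, Tuple


def split_points(point_ids: Iterable[int], n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly pick n_train points for training; the remaining points are used for testing."""
    ids = list(point_ids)
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    train = sorted(rng.sample(ids, n_train))
    test = sorted(set(ids) - set(train))
    return train, test


# Fourteen hypothetical point identifiers, eight of them used for training
train_ids, test_ids = split_points(range(1, 15), n_train=8)
```

Fixing the seed keeps the same split across the different feature sets and channels, so that differences in accuracy are not caused by a different choice of test points.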

EXPERIMENT AND RESULTS
Since soil moisture modeling depends both on the radar features and on the SVR parameters, in this section we evaluate soil moisture using all features, the features selected by SFS, and the features selected by SBS, in each case with and without optimization of the SVR parameters.
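The two statistics used to compare the models, the correlation coefficient (R) and the root mean square error (RMSE), can be computed as in the following minimal sketch. The sample values are invented for illustration only and are not measurements or results from this study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between in-situ and modeled values (same units as the inputs)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def pearson_r(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between in-situ and modeled values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


# Invented soil moisture values in m3/m3, for illustration only
in_situ = [0.10, 0.15, 0.20, 0.25]
modeled = [0.11, 0.14, 0.21, 0.24]
error = rmse(in_situ, modeled)
r = pearson_r(in_situ, modeled)
```

RMSE carries the units of soil moisture (m3/m3), whereas R is dimensionless and is often quoted as a percentage.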

Soil moisture estimation with all features
As mentioned previously, the quality of the SVR model results depends on suitably selected features. As is clear from the obtained results, the proposed SVR model estimates soil moisture better with the P-channel AIRSAR image than with the C- and L-channel AIRSAR images.

SFS algorithm
In this scenario, four suitable features, listed in Table 4, are chosen from all features to build a support vector model based on the SFS algorithm. As expected, with the features proposed by the SFS algorithm, soil moisture is modeled better when the SVR parameters are optimized than when they are not. The results also show that the proposed SVR model estimates soil moisture better with the P-channel AIRSAR image than with the C- and L-channel AIRSAR images.

SBS algorithm
In this scenario, ten suitable features, listed in Table 7, are chosen from all features to build a support vector model based on the SBS algorithm. For the features chosen by the SBS algorithm, the results of the support vector model without optimization of the SVR parameters are given in Table 8, and the results with optimized SVR parameters are given in Table 9. As expected, with the features proposed by the SBS algorithm, soil moisture is modeled better when the SVR parameters are optimized than when they are not. The results also show that the proposed SVR model estimates soil moisture better with the P-channel AIRSAR image than with the C- and L-channel AIRSAR images.

CONCLUSION
This study used features chosen by two selection algorithms, SFS and SBS, in the three channels C, L, and P of the AIRSAR data from the 2003 soil moisture experiment in Oklahoma in the United States. In the first comparison, the feature selection algorithms for the kernel-based soil moisture model were evaluated. Among these models, the support vector model using the SBS algorithm estimated soil moisture more accurately than the models using all features or the features selected by the SFS algorithm. In the second comparison, the support vector model was evaluated on the selected input features with and without optimization of the SVR parameters. The results show that the SVR model with optimized parameters models soil moisture more accurately than the model whose parameters are not optimized. In the third comparison, the soil moisture models were evaluated across channels. Among the three channels of the AIRSAR data, the P band models soil moisture most accurately in all models used, followed by the L and C bands. The C band provides the least accurate estimates of the three channels, and it also gives weaker results with the SFS feature selection algorithm than with the other algorithm.