APPLICATION OF SPATIAL MODELLING APPROACHES, SAMPLING STRATEGIES AND 3S TECHNOLOGY WITHIN AN ECOLOGICAL FRAMEWORK

How to describe ecological patterns effectively over broad spatial scales and build an ecological modeling framework has become an important issue in ecological research. We tested four modeling methods (MAXENT, DOMAIN, GLM and ANN) for predicting the potential habitat of Schima superba (Chinese guger tree, CGT) at different spatial scales in the Huisun study area, Taiwan. We created three sampling designs (from small to large scales) for model development and validation, using different combinations of CGT samples from three sites (the Tong-Feng watershed, Yo-Shan Mountain, and the Kuan-Dau watershed). The models combine points of known occurrence with topographic variables to infer the potential spatial distribution of CGT. At the small spatial scale, method performance from highest to lowest was MAXENT, DOMAIN, GLM and ANN, and MAXENT and DOMAIN were the most capable of predicting the tree's potential habitat. However, models based only on topographic variables performed poorly when extrapolated over a large distance, from Tong-Feng to Kuan-Dau. The reason is that the humidity and solar illumination of the two watersheds are shaped by their micro-terrain and differ considerably. Thus, models developed from topographic variables can be applied without significant error only within a limited geographical extent. Future studies will use variables carrying species-related spectral information, extracted from remotely sensed data of high spatial and spectral resolution (especially hyperspectral imagery), so that the models can be applied at large spatial scales.

Building ecological modeling frameworks has been at the core of ecological research since the latter half of the 20th century (Guisan and Zimmermann, 2000). Such models can provide a measure of a species' occupancy potential in areas not covered by biological surveys. Consequently, they are becoming indispensable tools for conservation planning and forest management. Technological innovation over the last few decades has greatly enhanced scientists' capacity to meet this challenge, especially in remote sensing (RS) and geographic information systems (GIS). These tools enable scientists to describe patterns in nature over broader spatial scales and in greater detail than ever before (Guisan and Zimmermann, 2000).
In addition, advances in statistical techniques have improved researchers' ability to tease apart complex relationships. The effective incorporation of RS and GIS tools also permits more accurate descriptions of spatial patterns and suggests directions for species distribution research. Several alternative methods have been used to predict the geographical distributions of species (Elith et al., 2006).
Despite the extensive use of species distribution models, several conceptual, biotic and algorithmic uncertainties need to be clarified to improve their predictive performance. Examples include uncertainties related to species' ecological characteristics, sample size, model selection and predictor contribution (Araújo and Guisan, 2006). Predictions of a species' occupancy potential in areas not covered by biological surveys must therefore be interpreted carefully. Generally, models for species with broad geographic ranges and wide environmental tolerance tend to be less accurate than those for species with smaller geographic ranges and limited environmental tolerance (Thuiller et al., 2004; Elith et al., 2006).
Given these species characteristics, the target species chosen for this study was Schima superba (Chinese guger tree, CGT). It is widespread in central Taiwan at elevations from 300 to 2,300 m and is one of the fine broad-leaved tree species, well suited for fitment. CGTs have high water content, dense crown closure, and high dispersal ability. As a result, they have excellent fire-resistance characteristics and can grow to form a fire line (Liu et al., 1994). In this study, we considered different types of predictive models, the complex environment of the study area, and the ways in which ecological relationships are affected by changes in scale. Our aim was to develop models for predicting the potential habitat of the species, in five steps: (1) in situ CGT data were collected with GPS from the Tong-Feng watershed, the Yo-Shan Mountain area, and the Kuan-Dau watershed in the Huisun study area, central Taiwan; (2) GIS techniques were used to overlay the CGT layer with environmental variables; (3) three sampling designs were created for model development and validation using different combinations of CGT samples from the three sites; (4) MAXENT, DOMAIN, GLM, and ANN were used to build predictive models; and (5) a multi-model assessment was performed, which included applying a single model to data describing patterns at different spatial scales and comparing several models on a common dataset.

STUDY AREA
We chose a rectangular study area of 17,136 ha encompassing the Huisun Forest Station. The station lies in central Taiwan between 24°2′ and 24°5′ N latitude and 121°3′ and 121°7′ E longitude (Figure 1) and is the property of National Chung Hsing University. Elevation across the study area ranges from 454 m to 3,418 m, and the climate is temperate and humid. The area supports more than 1,100 plant species and is a representative forest of Taiwan. It comprises five watersheds, including two larger ones: Kuan-Dau in the west and Tong-Feng in the east. All Chinese guger tree samples (in situ data) collected so far were taken with GPS from the Tong-Feng, Yo-Shan, and Kuan-Dau sites in the Huisun study area.

Species occurrence data
We collected in situ CGT data using a GPS receiver linked to a laser rangefinder. We then applied post-processed differential correction, which gave the positions sub-meter accuracy. The dataset was then converted into ArcView shapefile format for later use. In total, 122, 8, and 64 CGT samples were collected from the Tong-Feng, Yo-Shan, and Kuan-Dau sites of the Huisun study area, respectively.
Pseudo-absences were generated for the models that require them (all except DOMAIN) by randomly sampling 500 locations in the study area. Three sampling designs (SD) were created for model development and validation through different combinations of CGT samples from the three sites (see Figure 1).
SD-1: we randomly selected two-thirds of the Tong-Feng dataset to build the "Tong-Feng base model" and used the remaining one-third for model validation.
SD-2: we tested the same base model built in SD-1 using only samples from Yo-Shan, about 0.5 km from the Tong-Feng site.
SD-3: we again tested the SD-1 base model, this time using only samples from the Kuan-Dau site, about 5 km from the Tong-Feng site.
SD-2 and SD-3 were used to evaluate the spatial extrapolation ability of the four models.
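The sampling procedure above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code:

- The function names `sample_pseudo_absences`, `build_sampling_designs` and `kappa` are hypothetical.
- Samples are represented as plain IDs or as (row, col) cells of the study-area grid.
- Skipping cells that hold known occurrences when drawing pseudo-absences is an assumption; the text only says the 500 points were drawn at random.

The `kappa` function shows how an error matrix for a test set can be reduced to the kappa coefficient used to compare the designs. The text does not specify how kappa was computed.

```python
import random

import numpy as np


def sample_pseudo_absences(n_rows, n_cols, n_samples=500, presence_cells=(), seed=None):
    """Draw n_samples distinct random grid cells (row, col) as pseudo-absences.

    Hypothetical helper. Excluding known presence cells is an assumption of
    this sketch, not a step stated in the text.
    """
    rng = np.random.default_rng(seed)
    excluded = {r * n_cols + c for r, c in presence_cells}
    total = n_rows * n_cols
    if n_samples > total - len(excluded):
        raise ValueError("not enough candidate cells")
    chosen = set()
    while len(chosen) < n_samples:
        # Rejection sampling: draw flat cell indices, keep new non-presence ones.
        for idx in rng.integers(0, total, size=n_samples):
            if int(idx) not in excluded:
                chosen.add(int(idx))
                if len(chosen) == n_samples:
                    break
    flat = np.array(sorted(chosen))
    rows, cols = np.unravel_index(flat, (n_rows, n_cols))
    return np.column_stack([rows, cols])


def build_sampling_designs(tong_feng, yo_shan, kuan_dau, seed=0):
    """Hypothetical helper: one Tong-Feng training set and the SD-1/2/3 test sets."""
    shuffled = list(tong_feng)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 2 // 3          # two-thirds for training
    return {
        "train": shuffled[:n_train],          # builds the "Tong-Feng base model"
        "SD-1": shuffled[n_train:],           # remaining one-third of Tong-Feng
        "SD-2": list(yo_shan),                # Yo-Shan, about 0.5 km away
        "SD-3": list(kuan_dau),               # Kuan-Dau, about 5 km away
    }


def kappa(error_matrix):
    """Cohen's kappa from a square error (confusion) matrix."""
    n = sum(sum(row) for row in error_matrix)
    observed = sum(error_matrix[i][i] for i in range(len(error_matrix))) / n
    row_totals = [sum(row) for row in error_matrix]
    col_totals = [sum(col) for col in zip(*error_matrix)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)
```

With 122 Tong-Feng IDs, `build_sampling_designs` yields 81 training samples and 41 SD-1 test samples. `kappa([[20, 5], [10, 15]])` gives approximately 0.4: observed agreement is 0.7 and chance agreement is 0.5.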

Environmental data
We collected a digital elevation model (DEM) with 5 m resolution, orthophoto base maps (1:10,000), and two-date SPOT-5 images. The DEM was acquired from the Aerial Survey Office, Forestry Bureau, Council of Agriculture, Taiwan. To meet the requirements of the study, the DEM was interpolated to a 5 × 5 m grid. It was georeferenced to the TWD67 coordinate system (Taiwan Datum 1967, spheroid GRS67) in a Transverse Mercator projection with a two-degree zone and a central meridian of 121°E. The two-date SPOT-5 images were acquired from the Center for Space and Remote Sensing Research, National Central University (CSRSR, NCU), Taiwan (© SPOT Image Copyright 2004 and 2005, CSRSR, NCU). The images were processed to Level 2B (system calibration and geometric correction) and rectified to the TWD67 Transverse Mercator projection. They were then resampled to 5 m resolution to match the DEM-derived layers. We chose these two dates (07/10/2004 and 11/11/2005) because the images were of the best quality, with cloud cover below 10%.
Elevation, slope, and aspect were generated from the DEM with ERDAS Imagine, and the hill-shade layer with the ArcGIS Spatial Analyst module. A terrain position layer was generated from the DEM together with the ridges and valleys of the study area. The main ridges and valleys were interpreted directly from the orthophoto base maps and digitized in ARC/INFO. The resulting vector layer was converted to raster format in ERDAS Imagine and combined with the DEM to derive terrain position (Skidmore, 1990). Vegetation indices were derived from the two-date SPOT images, one from summer (07/10/2004) and the other from autumn (11/11/2005). The derivation was based on the concepts stated in Hoffer (1978), as expressed in equation (1).
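The terrain layers above were produced with ERDAS Imagine and ArcGIS. The following is a minimal numpy sketch of comparable quantities, not those tools' exact algorithms. It rests on these assumptions:

- The DEM is a 2-D array with north at the top row and a 5 m cell size.
- Slope and aspect use central differences from `np.gradient`, whereas dedicated GIS software typically uses a 3 × 3 neighbourhood.
- The hill-shade models only local illumination (surface normal against sun direction) and does not model cast shadows.

The default sun position (elevation 71°, azimuth 91°) is the one the text gives for the summer SPOT image. The names `terrain_derivatives` and `hillshade` are hypothetical.

```python
import numpy as np


def terrain_derivatives(dem, cell_size=5.0):
    """Slope (degrees) and aspect (degrees clockwise from north; -1 = flat)."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    dz_dx = dz_dcol          # change toward the east (increasing column)
    dz_dy = -dz_drow         # change toward the north (row index grows southward)
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the downslope direction, measured clockwise from north.
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    aspect = np.where((dz_dx == 0) & (dz_dy == 0), -1.0, aspect)
    return slope, aspect


def hillshade(dem, cell_size=5.0, sun_azimuth_deg=91.0, sun_elevation_deg=71.0):
    """0-255 illumination from the angle between surface normal and sun (no cast shadows)."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    # Unit surface normal (east, north, up) = (-dz/dx, -dz/dy, 1) / norm.
    normal = np.stack([-dz_dcol, dz_drow, np.ones_like(dz_dcol)])
    normal /= np.linalg.norm(normal, axis=0)
    az, el = np.radians(sun_azimuth_deg), np.radians(sun_elevation_deg)
    sun = np.array([np.sin(az) * np.cos(el), np.cos(az) * np.cos(el), np.sin(el)])
    illumination = np.tensordot(sun, normal, axes=1)
    # Faces turned away from the sun get 0, matching a 0 (shadow) to 255 scale.
    return np.clip(255.0 * illumination, 0.0, 255.0)
```

For a plane dropping 5 m per 5 m cell toward the east, `terrain_derivatives` gives a slope of 45° and an aspect of 90°. The hill-shade of that plane is 255 with the sun due east at 45° elevation and 0 with the sun due west.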

Model development
Predictive distribution models were formulated using four different modeling algorithms, which are briefly described below.
1) MAXENT can make predictions or inferences from incomplete information (Phillips et al., 2006) and can remain effective with small sample sizes (Kumar and Stohlgren, 2009). MAXENT is based on the concept of entropy, which originated in thermodynamics and has since been used to describe probability distributions in several domains. Statistical inference is then used to find the probability distribution over the pixels that has maximum entropy, subject to the constraints imposed by the available information. This is the distribution closest to uniform. In other words, MAXENT finds the probability distribution that is most likely to occur given what is known. The formula for MAXENT is shown in equation (2):
$$P(x) = \frac{\exp\left(\sum_{j} \lambda_j f_j(x) - C\right)}{Z} \tag{2}$$
where λ_j is the coefficient of feature f_j and Σ_j λ_j f_j(x) is the linear predictor. C is the linear predictor normalizer, a constant for numerical stability. Z is a scaling constant that ensures that P sums to 1 over all grid cells.
2) DOMAIN derives a point-to-point similarity metric to assign a classification value to a potential site based on its proximity in environmental space to the most similar occurrence. The Gower metric (Gower, 1971) provides a suitable means of quantifying similarity between two sites. The distance d between two points A and B in a p-dimensional environmental space is defined as equation (3):
$$d_{AB} = \frac{1}{p}\sum_{k=1}^{p}\frac{\lvert A_k - B_k \rvert}{\mathrm{range}(k)} \tag{3}$$
where range(k) is the range of the k-th environmental variable over the known record sites. The similarity between A and B is then
$$S_{AB} = 1 - d_{AB} \tag{4}$$
which lies between 0 and 1 for points within the ranges used in equation (3). We define S_A, the maximum similarity between candidate point A and the set of known record sites T_j, as equation (5):
$$S_A = \max_{j} S_{A T_j} \tag{5}$$
Evaluating S for all grid points in a target area generates a matrix of continuously varying similarity values. These values are not probability estimates but degrees of classification confidence (Carpenter et al., 1993).
3) GLM is a generalization of the general linear model and is made up of three components: a random component, a systematic component, and a link function. The random component identifies the response variable Y and its probability distribution. The systematic component identifies the set of predictor variables (X_1, ..., X_k). The link function g(μ) relates the mean μ = E(Y) to a linear function of the predictor variables. The formula for GLM is shown in equation (6):
$$g(\mu) = \alpha + \beta_1 X_1 + \cdots + \beta_k X_k \tag{6}$$
where α is a constant, β_1, ..., β_k are regression coefficients, and X_1, ..., X_k are predictor variables. A logit link function, g(μ) = ln[μ/(1 − μ)], transforms the scale of the response variable. It relaxes the assumptions about the response distribution and constant variance that are commonly required by traditional linear models (McCullagh and Nelder, 1989). Consequently, the GLM is particularly suitable for predicting species distributions and has proven successful in various ecological applications (Guisan et al., 2002).
4) A back-propagation artificial neural network (BPANN) consists of input, hidden, and output layers. The input layer may contain information about individual training pixels, including percent spectral reflectance in various bands and ancillary data such as elevation and slope. Each layer consists of interconnected nodes, and this interconnectedness allows information to flow in multiple directions as the network is trained. The weights of these interconnections are learned by the network, stored, and then used during classification. The output layer might represent, for example, a single thematic land-cover class.
We used four layers (one input layer, two hidden layers, and one output layer), trained with the back-propagation algorithm, and also implemented a particle swarm optimization (PSO) algorithm. The structure of the back-propagation neural network is shown in Figure 2.

Model validation
To evaluate the different sampling designs, we used split-sample validation. The first subset (training dataset) was used to build each model and the second (test dataset) to validate it. For each model, we predicted the response for the test data and calculated the error matrix (De'ath and Fabricius, 2000). Common statistical measures derived from the error matrix included producer's accuracy, user's accuracy, overall accuracy and the kappa coefficient (Jensen, 2005; Lillesand et al., 2008).

RESULTS AND DISCUSSION
We first depicted and compared the effect of micro-terrain features in the two watersheds (Table 1). The Tong-Feng watershed is enclosed by high surrounding ridges. This U-shaped envelope prevents solar radiation from fully reaching all sites in the watershed. Hence, most sites have relatively low evaporation, and the watershed as a whole remains highly humid.
In contrast, the Kuan-Dau watershed has a gently sloping valley and low surrounding ridges. This incomplete V-shaped envelope lets the west side of the valley receive ample solar radiation and thus experience stronger evaporation. As a result, the Kuan-Dau watershed is relatively drier and hotter than the Tong-Feng watershed. In short, the topographic attributes of the Tong-Feng watershed differ considerably from those of the Kuan-Dau watershed. Table 2 summarizes the statistics of environmental variables for CGT samples at the three sites (Tong-Feng, Yo-Shan and Kuan-Dau).
The table shows that CGTs occur across a broad range of elevations and tolerate a wide range of environmental conditions.
In addition, hill-shade by definition captures the effects of differential solar radiation due to variation in slope angle, aspect and position, and of shading from adjacent hills. Following the summer SPOT image of the study area (07/10/2004), we used a sun elevation of 71° and a sun azimuth of 91°.
The output shaded raster accounts for both local illumination angles and shadows. Its values range from 0 to 255, with 0 representing shadowed areas and 255 the brightest. CGT sites had a high mean hill-shade value, consistent with the preference of CGTs for gentler slopes and near-ridge positions. We may therefore infer indirectly that CGTs tend to occur on sites facing solar illumination.
We used sampling design 1 (SD-1) as the base model against which the other sampling designs were compared. We overlaid six environmental factors: five topographic factors and a vegetation index derived from the SPOT-5 images. Because of the heavy computational load, we reduced the dimensionality to improve computational efficiency. Three of the predictive models were used to calculate the relative importance of the six predictor variables for predicting the potential habitat of CGTs, as a reference for screening effective variables. The results showed that three predictor variables (elevation, slope and terrain position) were the most important, so we used these three variables to build the models. The kappa values from testing the four modeling methods under each of the three sampling designs are shown in Table 3. In SD-1, the models were developed from the Tong-Feng training samples and tested on an independent Tong-Feng sample set. Under this base model, MAXENT had the highest kappa value (0.70), followed by DOMAIN (0.62) and GLM (0.59), with ANN (0.58) last. As shown in Figure 4, the MAXENT and DOMAIN models predicted high-potential areas of CGTs that reduced the area requiring field survey to less than 6% (1,028 ha) of the entire study area (17,136 ha). These two models were thus better suited for predicting the tree's potential habitat (see also Table 4).
Next, we discuss the extrapolation ability of these models (Table 3).
Starting from the base model, we extended predictions from one area to another to assess the robustness of the underlying relationships. When tested on independent Tong-Feng samples, the kappa values of the four models ranged from 0.58 to 0.70. In SD-2 and SD-3, these values declined sharply as the spatial distance increased. They fell to about 0.3 when the models were tested on Yo-Shan samples (about 0.5 km away) and to near zero when tested on Kuan-Dau samples (about 5.0 km away).
Consequently, the "Tong-Feng base models" built with the four algorithms failed validation with the Yo-Shan and Kuan-Dau test samples despite passing validation with the Tong-Feng test samples. Topographic variables are the most easily measured in the field and are widely used because they correlate well with observed species patterns at small spatial scales. Such variables serve as simple surrogates for a combination of different resources and direct gradients, such as climate and rainfall (Guisan et al., 1999). However, the models based only on topographic variables performed poorly when extrapolated from Tong-Feng to Kuan-Dau, because the topographic attributes of the two watersheds differ considerably. Therefore, models developed from topographic variables can be applied without significant error only within a limited geographical extent.

CONCLUSIONS
Building an ecological modeling framework can help tease apart complex species-environment relationships, permit more accurate descriptions of spatial patterns, and suggest directions for future research. This study is a comparative exploration of how organisms and processes respond to their environments and how these responses vary geographically.
In SD-1 (small spatial scale), method performance from highest to lowest was MAXENT, DOMAIN, GLM and ANN. MAXENT and DOMAIN were the two most capable models for predicting a single species. However, models based only on topographic variables performed poorly when extrapolated from Tong-Feng to Kuan-Dau. This is because the humidity and solar illumination of the two watersheds, shaped by their micro-terrain, differ considerably. Therefore, models developed from topographic variables can be applied without significant error only within a limited geographical extent. Future studies will use variables carrying species-related spectral information, extracted from remotely sensed data of high spatial and spectral resolution (especially hyperspectral imagery), so that the models can be applied at large spatial scales.

Figure 1. Location map of the study area

Figure 2. The structure of back propagation artificial neural network

Figure 3. Perspective-viewing map showing the Huisun Forest Station
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B8, 2012 XXII ISPRS Congress, 25 August-01 September 2012, Melbourne, Australia

Table 1. Microterrains of the two watersheds

Table 2. The statistics of environmental variables for CGTs in the two watersheds

Table 3. Comparison of the accuracies of four models for predicting CGT potential habitats with three sets of test data

Table 4. The distribution statistics of three models predicting the potential habitat of CGTs