THE UNCERTAINTIES ON THE GIS BASED LAND SUITABILITY ASSESSMENT FOR URBAN AND RURAL PLANNING

Most research on the uncertainties of spatial data and spatial analysis focuses on a specific data feature or analysis tool. Few studies have addressed the uncertainties of the whole process of an application such as planning, which leaves uncertainty research detached from practical applications. Based on a literature review, this paper discusses the uncertainties of geographical information system (GIS) based land suitability assessment in planning. The uncertainties considered range from index system establishment to the classification of the final result. The paper summarizes methods for reducing the uncertainties that arise from the discretization of continuous raster data and from index weight determination. It analyzes the merits and demerits of the “Natural Breaks” method, which is broadly used by planners. It also explores other factors that affect the accuracy of the final classification, such as the number of classes, the intervals and the spatial autocorrelation of the data. In the conclusion, the paper argues that machine learning methods should be modified to integrate the complexity of land suitability assessment. The work contributes to applying spatial data and spatial analysis uncertainty research to land suitability assessment, and strengthens the scientific basis of subsequent planning and decision-making.


INTRODUCTION
1.1 The research on spatial data and spatial analysis uncertainty
Owing to knowledge limitations, measurement inaccuracy and error propagation during information processing, geographical information always carries some degree of uncertainty (Goodchild and Gopal, 1989; Beard et al., 1991; Buttenfield, 1991; Zhang and Goodchild, 2002; Xiao et al., 2007). Data users should therefore be aware that "all spatial data are wrong, but some are useful (outside of some exceptions where the data becomes legally the reality)" (Devillers et al., 2010). Early computer mapping work at the Harvard Laboratory for Computer Graphics and Spatial Analysis in the late 1970s revealed that imperfection is inherent to spatial data and can directly influence the reliability of spatial analysis output (Chrisman, 2006). Work on spatial data uncertainty increased significantly with the arrival of GIS in the early 1980s, whose capability to integrate spatial and non-spatial data prompted the evaluation of specific data quality elements of vector, raster and Digital Elevation Model (DEM) data, as well as remote sensing images (Devillers et al., 2010). Wu (2002) established a framework for GIS uncertainty and summarized the methods for handling GIS data uncertainty. Emphasis should also be placed on the uncertainty of spatial analysis, since "all models are wrong, but some are useful" (Box, 1976). The national economic and social research council of America (ESRS) listed "The error propagation in geographic information systems" as one of its priority subjects, and the centre for spatial data analysis (nex-pri) in the Netherlands raised the question of "The theory of spatial analysis-error propagation in spatial analysis" when formulating its research plan (Wu et al., 2002). Shi (2015) modelled the uncertainties of overlay analysis and buffer analysis in order to control the propagation of spatial data uncertainty during spatial analysis.
The disconnect between uncertainty research on the one hand and user requirements and practical applications on the other has become one of the major problems demanding attention (Devillers et al., 2010). This phenomenon stems partly from a lack of uncertainty research that covers the whole process, from data acquisition and preparation through spatial analysis operations to the display of results, from the perspective of a specific application. The nature of the problem must be understood before the best possible solutions can be proposed (Devillers et al., 2010).

The deficiency of reliability assessment on the application of GIS in planning
GIS is the most important and fundamental information processing platform for digital planning, yet its application in planning lacks consideration of uncertainty. The application of ArcGIS has brought innovation to the quantitative analysis of planning and has had a huge impact on traditional practice. The underlying assumption behind the extensive application of GIS is that data processing capability and information availability are positively correlated with the quality of planning (Malczewski, 2004). Owing to the spatiotemporal complexity of the urban system and planners' limited knowledge of the scientific principles behind the methods and tools, spatial analysis is generally over-simplified, without reasonable consideration of uncertainty. Some sociologists even question the role of these technologies in planning and decision-making, arguing that planning involves a wide range of "untangle" activities such as advice giving, storytelling, myths, and other metaphors and rhetorical devices (Klosterman, 2001). Such activities are critical for achieving the best balance and the optimal benefits of planning. Sole reliance on information technology might have a negative impact on the equity, accuracy and quality of real life (Malczewski, 2004).

The overview of the GIS based land suitability assessment uncertainty
Uncertainty has been considered in land suitability assessment, but not sufficiently, especially in the field of planning. Land suitability assessment is an approach planners employ to identify the most appropriate spatial pattern for future land allocation according to specified requirements and preferences (Hopkins, 1977; Collins et al., 2001; Malczewski, 2004). GIS-based land suitability assessment can be traced back to the hand-drawn overlay techniques used by American landscape architects in the late 19th and early 20th centuries (Collins et al., 2001). Spatial weighted overlay analysis is regarded as the most significant application of GIS in planning and management (Hopkins, 1977; Brail and Klosterman, 2001; Collins et al., 2001; Malczewski, 2004). With the continued promotion of multiple-planning integration, land suitability assessment will play an increasingly fundamental role in multiple planning projects. The land suitability assessment procedure in planning generally employs the multiple criteria decision-making (MCDM) method. The flow chart, along with the possible uncertainties, is presented in Fig. 1. Errors in the data and discrepancies between methods inevitably lead to uncertainty in the final suitability result and in subsequent decision-making (Hu et al., 2007a). In this regard, uncertainty exists in data preparation, index standardization, weight determination and result visualization. Research has been done on the uncertainties of land suitability assessment and ecological sensitivity assessment, for instance on DEM resolution, interpolation methods and the scale effect of terrain description (Gao, 1997; Tang et al., 2003; Chen et al., 2005; Shi, 2015), index standardization (Bolliger and Mladenoff, 2005; Zhou et al., 2007; Zhang et al., 2010), weight determination (Hu et al., 2007b; Liu et al., 2012; Zhuo, 2012), and result classification and visualization (Hu et al., 2007a).
Some experts have focused on uncertainty analysis of the whole land assessment process (Hu et al., 2007a; Yu, 2010). However, this macro perspective neglects the specificity of planning and the practices of planners, making it difficult to achieve the anticipated effect in actual applications. Ignoring uncertainty will undoubtedly harm the scientific rigor and accuracy of subsequent planning. The uncertainty may be amplified as the planning procedure advances, resulting in frequent modification of planning schemes, illegal land use and deterioration of the ecological environment.
The paper establishes a framework for research on the uncertainty of GIS-based land suitability assessment, ranging from index system establishment (the uncertainties of the data themselves, the discretization of continuous raster data and index weight determination) to the classification of the final result.

THE UNCERTAINTY OF THE INDEX SYSTEM
2.1 The uncertainty of index data
2.1.1 The uncertainty of raw data:
The data collected for land suitability assessment carry a certain degree of uncertainty. This stems from the inherent uncertainty of the external world, which is a complex, multidimensional and nonlinear system, as well as from knowledge limitations, measurement inaccuracy and error propagation during data processing and analysis (Hu et al., 2007a; Longley et al., 2011; Shi, 2015).
The reliability of the assessment result depends on the quality of the data and on the depth of knowledge about land and land use (Rossiter, 1996). Land suitability assessment is a comprehensive multiple criteria analysis that considers soil conditions, hydrology, meteorology, geology, humanities, etc., and the reliability of each of these data sources deserves attention. For instance, elevation, slope and aspect derived from a DEM are commonly used in land suitability assessment to characterize topography, yet how reliably a DEM represents spatial information depends on the original sampling and on the interpolation method used to generate it (Shi, 2015).
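To illustrate how DEM resolution alone can change a derived index, the following Python sketch computes slope from a synthetic sinusoidal DEM sampled at 5 m and at 50 m. The terrain, the cell sizes and the use of `numpy.gradient` central differences (rather than a 3×3 window method such as Horn's) are illustrative assumptions, not the procedure of any study cited here.

```python
import numpy as np


def synthetic_dem(cell_size, extent=400.0, amplitude=50.0, wavelength=200.0):
    """Hypothetical terrain: a sine wave along x, sampled on a square grid (metres)."""
    n = int(round(extent / cell_size)) + 1
    coords = np.linspace(0.0, extent, n)
    x, _ = np.meshgrid(coords, coords)
    return amplitude * np.sin(2.0 * np.pi * x / wavelength)


def slope_degrees(dem, cell_size):
    """Slope in degrees from central differences (one-sided at the grid edges)."""
    dz_dy, dz_dx = np.gradient(dem, cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))


for cell in (5.0, 50.0):
    max_slope = slope_degrees(synthetic_dem(cell), cell).max()
    print(f"cell size {cell:>4.0f} m -> maximum slope {max_slope:.1f} deg")
```

On this hypothetical terrain the 50 m grid smooths the surface and reports a maximum slope of about 45°, against about 57° at 5 m, even though both grids sample the same landform.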

The uncertainty of applying raster data as the assessment object:
The raster data model has traditionally been regarded as the more appropriate approach for land-use suitability applications because the raster data structure is area-oriented (Malczewski, 2004). However, using raster data to represent spatial information introduces uncertainty, for example in interpreting attribute values and in identifying the location of a given point (Wang and Du, 2007).
Accuracy loss is also unavoidable during conversion from vector to raster. Some of the vector data collected for land suitability assessment, such as social and economic data and land use data, must be converted to raster for the spatial overlay analysis. The spatial resolution of a raster is determined by its grid cell size. Consequently, large grid cells risk losing the accurate location of spatial elements, while small ones avoid this problem but increase the number of grid cells and lengthen data processing time (Liu, 2005). Users generally choose different analysis scales according to their specific application requirements, so the loss of accuracy varies from case to case.
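The trade-off between cell size, positional accuracy and data volume can be made concrete with a small Python sketch. It rasterizes a hypothetical circular land parcel (radius 300 m) inside a 1 km square using a cell-centre rule, which is an assumed conversion rule since GIS packages offer several, and compares the raster area with the true area.

```python
import numpy as np


def rasterize_circle(radius, cell_size, extent):
    """Cell-centre rule: a cell belongs to the parcel if its centre lies inside the circle."""
    n = int(round(extent / cell_size))
    centres = (np.arange(n) + 0.5) * cell_size - extent / 2.0
    x, y = np.meshgrid(centres, centres)
    return x**2 + y**2 <= radius**2


def conversion_summary(cell_size, radius=300.0, extent=1000.0):
    """Return (relative area error, total number of grid cells) for one cell size."""
    mask = rasterize_circle(radius, cell_size, extent)
    raster_area = mask.sum() * cell_size**2
    true_area = np.pi * radius**2
    return (raster_area - true_area) / true_area, mask.size


for cell in (100.0, 25.0, 5.0):
    error, cells = conversion_summary(cell)
    print(f"{cell:>5.0f} m cells: area error {error:+.2%}, {cells} cells")
```

With 100 m cells the parcel is represented by 32 cells and its area is overstated by about 13%, while 5 m cells bring the error well below 1% at the cost of 40,000 cells instead of 100.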

The uncertainty on the discretization of continuous raster data
The discretization of continuous features can result in the loss of information about a feature's spatial distribution (Bolliger et al., 2005). In land suitability assessment, continuous features such as elevation, slope and aspect must be discretized into single-feature classes for the raster weighted overlay. However, the information lost in this process is generally ignored by planners. Take slope as an example: planners generally divide slope into five categories, with 0°-5° as the most suitable, 5°-8° as suitable, 8°-15° as marginally suitable, 15°-25° as not suitable and slopes greater than 25° as the least suitable. Under this scheme, 8.01° and 14.99° belong to the same level although they differ considerably, while 7.99° and 8.01° belong to two different levels even though the gap between them is very small (Zhou et al., 2007; Liu et al., 2012).
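The discontinuity described above is easy to reproduce. The Python sketch below applies the five-class slope scheme with `numpy.digitize`; treating a value that falls exactly on a break (for example 8.0°) as belonging to the upper class is an assumption, since the scheme does not specify it.

```python
import numpy as np

# Upper break values (degrees) of the five-class slope scheme described above.
SLOPE_BREAKS = [5.0, 8.0, 15.0, 25.0]
LABELS = {1: "most suitable", 2: "suitable", 3: "marginally suitable",
          4: "not suitable", 5: "least suitable"}


def slope_class(slope_deg):
    """Crisp reclassification into classes 1..5.

    np.digitize returns i with BREAKS[i-1] <= s < BREAKS[i], so a value lying
    exactly on a break is placed in the upper class (an assumed convention).
    """
    return np.digitize(slope_deg, SLOPE_BREAKS) + 1


for s in (7.99, 8.01, 14.99):
    print(s, LABELS[int(slope_class(s))])
```

The loop prints "suitable" for 7.99° but "marginally suitable" for both 8.01° and 14.99°: a 0.02° difference changes the class, while a difference of almost 7° does not.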
There are many approaches to exploring the uncertainties of continuous raster data discretization, such as the grey system method, rough sets and fuzzy mathematics. Zhou et al. (2007) adopted the grey system method to reduce the uncertainty, pointing out that sampling should be as balanced as possible and that the original data should be standardized before the correlation coefficient is calculated. The problem of information loss from discretizing continuous numeric features also arises in the straightforward application of machine learning methods such as rough sets. Hu et al. (2008) constructed a numeric feature selection algorithm based on the neighborhood rough set model that deals with numeric attributes directly, dispensing with discretization. To address the uncertainty, some experts apply the membership function of fuzzy mathematics to describe the degree to which continuous data belong to a class (Zhang et al., 2010; Bolliger et al., 2005). Fuzzy sets are usually combined with MCDM in the field of suitability analysis. Corona et al. (2008) combined fuzzy sets with the weighted linear combination multiple criteria method to conduct a land suitability assessment in southern Italy, expressing the suitability grades on a scale from 0 (not suitable) to 1 (very suitable). However, accurately determining the membership function remains a problem for the fuzzy logic method. The determination involves a degree of subjectivity, because different people perceive and understand the same fuzzy concept differently. Some studies have amended the model to address this problem and achieved higher reliability. Experts have applied the cloud model, which represents the fuzziness of quantitative data by treating the membership function as random (Hu et al., 2007b; Fan et al., 2008). The cloud model combines traditional fuzzy mathematics with probability statistics, integrating fuzziness and randomness to realize the conversion between qualitative concepts and quantitative values (Hu et al., 2007b).
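The contrast between crisp classes, fuzzy membership and the cloud model can be sketched in Python. The linear membership function (full suitability at or below 5°, none at or above 25°) and the cloud parameters below are hypothetical values chosen for illustration; they are not the functions used by Zhang et al. (2010), Corona et al. (2008) or Hu et al. (2007b). The cloud generator follows the standard forward normal cloud construction with expectation Ex, entropy En and hyper-entropy He: each drop draws a randomised entropy En' from N(En, He²), a value x from N(Ex, En'²), and receives membership exp(-(x - Ex)² / (2En'²)).

```python
import numpy as np


def linear_membership(slope_deg, full=5.0, zero=25.0):
    """Decreasing linear membership: 1 at or below `full`, 0 at or above `zero`."""
    s = np.asarray(slope_deg, dtype=float)
    return np.clip((zero - s) / (zero - full), 0.0, 1.0)


def forward_normal_cloud(ex, en, he, n_drops, seed=0):
    """Forward normal cloud generator returning cloud drops (x, membership).

    ex: expectation, en: entropy, he: hyper-entropy (spread of the entropy).
    """
    rng = np.random.default_rng(seed)
    en_prime = np.abs(rng.normal(en, he, n_drops))   # randomised entropy per drop
    x = rng.normal(ex, en_prime)                     # one drop position per entropy
    mu = np.exp(-((x - ex) ** 2) / (2.0 * en_prime**2))
    return x, mu


# Neighbouring slopes get nearly equal membership; distant slopes do not.
print(linear_membership([7.99, 8.01, 14.99]))

# Hypothetical qualitative concept "gentle slope" centred on 8 degrees.
drops, degrees = forward_normal_cloud(ex=8.0, en=2.0, he=0.2, n_drops=2000)
```

Unlike the crisp scheme, 7.99° and 8.01° both receive a membership of about 0.85, while 14.99° drops to about 0.50. In the cloud model, the same slope value receives slightly different membership degrees across drops, which is how it expresses randomness on top of fuzziness.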

The uncertainty of index weight determination
Land suitability assessment is shaped both by objective laws and by the development priorities of the region, which makes determining the index weights difficult and complicated. Research on the uncertainty of index weight determination is abundant. Hu et al. (2006) presented a method that integrates subjective and objective information to determine the final index weights, using an improved Delphi method to obtain the subjective weights and an improved correlation analysis method to obtain the objective weights. Huang et al. (2007) identified the defects of commonly applied comprehensive assessment methods such as the Analytic Hierarchy Process (AHP), the fuzzy comprehensive method and neural-network-based methods. They argued that these methods offer only a comprehensive assessment result without a measure of its certainty, and that they cannot deal with unknown factors. They therefore proposed a comprehensive assessment method based on Dempster-Shafer (D-S) evidence theory, which can manage two different kinds of uncertainty (inconsistency and nonspecificity), whereas probability theory can only manage randomness. Zhuo (2012) adopted the entropy method to calculate weights. Some studies apply the coefficient of variation to enlarge the differences among weights so as to increase the differentiation among classes (Liu et al., 2012). On the whole, the methods can be divided into subjective weighting methods (Delphi method, grey relational analysis, AHP, etc.) and objective weighting methods (mean square deviation algorithm, range method, entropy method, etc.) (Guo, 2002). Subjective weighting means asking experts with knowledge of the land to score the indexes and then calculating the weights, while objective weighting uses mathematical theory and methods to calculate the weights from observed data. A priority for land suitability assessment is to integrate the merits of the two approaches while overcoming their demerits: the fuzziness and randomness of experts' understanding of the indexes and weights on the one hand, and the over-emphasis on quantitative methods at the expense of decision-makers' subjective attitudes on the other (Hu et al., 2007a).
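Zhuo (2012) used the entropy method for objective weights. A common textbook form of that method, followed by a simple linear blend with subjective weights, is sketched below in Python. The criterion scores, the subjective weights (imagined as coming from AHP or a Delphi survey) and the blending coefficient `alpha` are hypothetical, and the linear blend is only one of several possible combination rules; it is not presented as the procedure of Hu et al. (2006).

```python
import numpy as np


def entropy_weights(scores):
    """Entropy weight method for an (m samples x n criteria) matrix of non-negative scores.

    Criteria whose values are nearly uniform carry little information (entropy
    close to 1) and therefore receive small weights.
    """
    x = np.asarray(scores, dtype=float)
    m = x.shape[0]
    p = x / x.sum(axis=0)                             # share of each sample per criterion
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)   # treat 0 * log(0) as 0
    entropy = -plogp.sum(axis=0) / np.log(m)
    divergence = 1.0 - entropy
    return divergence / divergence.sum()


def combine_weights(subjective, objective, alpha=0.5):
    """Linear blend of subjective and objective weights, renormalised to sum to 1."""
    w = (alpha * np.asarray(subjective, dtype=float)
         + (1.0 - alpha) * np.asarray(objective, dtype=float))
    return w / w.sum()


# Hypothetical standardized scores for 4 sample cells and 3 criteria
# (e.g. slope, distance to roads, soil quality).
scores = np.array([[0.9, 0.5, 0.2],
                   [0.8, 0.5, 0.9],
                   [0.7, 0.5, 0.4],
                   [0.6, 0.5, 0.8]])
w_obj = entropy_weights(scores)
w_final = combine_weights([0.5, 0.3, 0.2], w_obj, alpha=0.6)
print(w_obj.round(3), w_final.round(3))
```

The second criterion has the same score everywhere, so its objective weight is zero; the blend restores part of its importance through the subjective weight. This illustrates why purely objective weights can contradict expert judgment, and vice versa.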

THE UNCERTAINTY OF THE ASSESSMENT RESULT CLASSIFICATION
As the final step of land suitability assessment, the raster weighted overlay analysis is critical, because its result acts as a bridge between mappers and decision-makers. Uncertainty is inherent in map classification and can generate an unreliable spatial pattern, yet it is largely ignored in choropleth mapping (Koo et al., 2017). The classed map is the medium that displays the land suitability conditions of a region to urban planners, public officials and the public. Neglecting the uncertainty of the classification will inevitably affect the accuracy of the assessment and lead to decision risk.
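Because the map to be classified is itself the product of a weighted overlay, uncertainty in the weights carries straight into the classes. The Python sketch below computes a weighted linear combination of standardized criterion rasters, classifies it with fixed breaks, and then reports, for each cell, how often its class survives random perturbation of the weights. The layers, the breaks at 0.2, 0.4, 0.6 and 0.8, the perturbation size and the number of runs are hypothetical choices for illustration.

```python
import numpy as np


def weighted_overlay(layers, weights):
    """Weighted linear combination of standardized criterion rasters (values in [0, 1])."""
    layers = np.asarray(layers, dtype=float)        # shape (n_criteria, rows, cols)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, layers, axes=1)    # shape (rows, cols)


def classify(score, breaks=(0.2, 0.4, 0.6, 0.8)):
    """Fixed-break classification into classes 1..5."""
    return np.digitize(score, breaks) + 1


def class_stability(layers, weights, sd=0.05, runs=200, seed=0):
    """Share of runs in which each cell keeps its baseline class under weight noise."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    base = classify(weighted_overlay(layers, weights))
    same = np.zeros(base.shape)
    for _ in range(runs):
        noisy = np.clip(weights + rng.normal(0.0, sd, len(weights)), 1e-6, None)
        same += classify(weighted_overlay(layers, noisy)) == base
    return same / runs


rng = np.random.default_rng(42)
layers = rng.random((3, 50, 50))                    # three hypothetical criteria
stability = class_stability(layers, [0.5, 0.3, 0.2])
print(f"cells changing class in more than 10% of runs: {(stability < 0.9).mean():.1%}")
```

Cells whose overlay score lies close to a class break are the ones that flip, so a stability map of this kind shows where the classed map deserves the least confidence.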

The uncertainty of choropleth map classification
Maps construct a communication channel between mappers and users (Mark, 1974; Jenks and Caspall, 1971), which makes the reliability of a map significant, especially when the map is produced to help explore spatial data or construct spatial knowledge (Xiao et al., 2007). Harvey (1970) considers classification to be "perhaps the basic procedure by which we impose some sort of order and coherence upon the vast inflow of information from the real world". However, important information may be lost in this communication if the map is badly prepared (Jenks and Caspall, 1971; Traun and Loidl, 2012). The acceptable level of misclassification in a map is determined by its purpose (Xiao et al., 2007). As a result, the production of a choropleth map must pay close attention to its application purpose and its intended users (Mark, 1974). Several papers focus on the uncertainty of choropleth map classification and propose related solutions (Jenks and Caspall, 1971; Xiao et al., 2007). Data inaccuracy and error propagation during information processing certainly cast doubt on the reliability of the classification.
The reliability of GIS tools also deserves attention. Spatial analysis follows a series of well-defined stages: problem formulation, planning, data gathering, exploratory analysis, hypothesis formulation, modeling and testing, consultation and review, etc. However, GIS software tools address only the middle stages of this process. In addition, GIS software places almost no constraints on a user's choice of classification method, although classification should be interpreted in terms of purpose as well as method (Smith et al., 2015). As a result, some maps may fail to reflect data-inherent patterns properly, and some may even suggest spatial configurations that have no reliable statistical or contextual basis (MacEachren and Ganter, 1990).
Uncertainty also exists in the final classification of the land suitability assessment result. Multiple factors influence the certainty of the classification, such as the data type, the number of classes, the intervals and the spatial autocorrelation of the data (Slocum et al., 2005; Traun and Loidl, 2012; Smith et al., 2015). Planners should avoid relying on software defaults and instead select the most suitable classification method according to the characteristics of the data and the actual needs of planning.

The various methods for data classification based on ArcGIS:
ArcGIS provides a variety of classification methods for raster data, including manual, equal interval, defined interval, quantile, natural breaks, geometrical interval and standard deviation. Table 1 gives a brief introduction to each method (ESRI, 2014).

Manual
This method allows users to define their own classes: they can manually add class breaks and set class ranges that are appropriate for their data.

Equal interval
Equal interval divides the range of attribute values into equal-sized subranges.It is best applied to familiar data ranges, such as percentages and temperature.

Defined interval
Defined interval allows users to specify an interval size used to define a series of classes with the same value range.ArcMap will determine the number of classes based on the interval size and the range of all field values.

Quantile
Each class contains an equal number of features.The method is well suited to linearly distributed data.Quantile assigns the same number of data values to each class.

Natural breaks
Classes are based on natural groupings inherent in the data.Class breaks are identified that best group similar values and that maximize the differences between classes.It is data-specific classification and not useful for comparing multiple maps built from different underlying information.

Geometrical interval
This scheme creates class breaks based on class intervals that form a geometrical series. The algorithm was specifically designed to accommodate continuous data.

Standard deviation
The standard deviation classification method shows users how much a feature's attribute value varies from the mean. ArcMap calculates the mean and standard deviation.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W7, 2017 ISPRS Geospatial Week 2017, 18-22 September 2017, Wuhan, China

The differences among classification methods for univariate analysis are remarkable (Chang, 1978; Brewer and Pickle, 2002). Selecting appropriate classification methods is therefore essential. The natural breaks (Jenks) method is broadly used by planners as the default choice for classifying raster data (Claggett et al., 2004; Ren, 2012; Chen et al., 2013; Luo, 2016). The next part explores the applicability of the method to land suitability assessment.

The applicability of the natural breaks (Jenks):
The natural breaks method in ArcGIS refers to a classification method for choropleth map put forward by Jenks (Jenks and Caspall, 1971;Jenks, 1977).The method is equivalent to unconstrained clustering (Fisher, 1958).Breaks which are selected to separate values where large changes in value occur are typically uneven.The result can be significantly affected by the number of classes (Smith et al., 2015).The method is extensively applied in GIS packages such as ArcGIS (Dent, 1999;Koo et al., 2017).
Jenks Natural Breaks algorithm (Smith et al., 2015):
"Step 1: The user selects the attribute, x, to be classified and specifies the number of classes required, k;
Step 2: A set of k-1 random or uniform values are generated in the range [min{x}, max{x}]. These are used as initial class boundaries;
Step 3: The mean values for each initial class are computed and the sum of squared deviations of class members from the mean values is computed. The total sum of squared deviations (TSSD) is recorded;
Step 4: Individual values in each class are then systematically assigned to adjacent classes by adjusting the class boundaries to see if the TSSD can be reduced. This is an iterative process, which ends when improvement in TSSD falls below a threshold level, i.e. when the within class variance is as small as possible and between class variance is as large as possible. True optimization is not assured. The entire process can be optionally repeated from Step 1 or 2 and TSSD values compared."
Although the natural breaks (Jenks) method has been broadly applied in univariate classification, its applicability in the land suitability assessment in planning still requires verification.
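The heuristic quoted above does not guarantee the optimum ("True optimization is not assured"). For one-dimensional data, the optimum of the same TSSD criterion can be found exactly with Fisher's (1958) dynamic-programming formulation. The Python sketch below swaps in that exact method rather than reproducing the iterative boundary-shifting procedure. It also reports the goodness of variance fit, GVF = 1 - TSSD / SDAM, where SDAM is the sum of squared deviations from the overall mean, a statistic commonly used to judge Jenks classifications. The running time grows roughly with k·n², so for a large raster one would classify a sample of cell values.

```python
import numpy as np


def fisher_jenks(values, k):
    """Exact 1-D optimal classification into k classes (minimum total within-class SSD).

    Returns (upper bound of each class in ascending order, TSSD).
    """
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums give the SSD of any run x[i:j] in constant time.
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))

    cost = np.full((k + 1, n + 1), np.inf)      # cost[c, j]: best TSSD of x[:j] in c classes
    split = np.zeros((k + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            i = np.arange(c - 1, j)                                  # last class is x[i:j]
            ssd = (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)
            total = cost[c - 1, i] + ssd
            best = int(np.argmin(total))
            cost[c, j], split[c, j] = total[best], i[best]

    # Walk back through the split table to recover each class's upper bound.
    uppers, j = [], n
    for c in range(k, 0, -1):
        uppers.append(float(x[j - 1]))
        j = split[c, j]
    return uppers[::-1], float(cost[k, n])


def goodness_of_variance_fit(values, tssd):
    x = np.asarray(values, dtype=float)
    sdam = float(((x - x.mean()) ** 2).sum())
    return 1.0 - tssd / sdam


data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
breaks, tssd = fisher_jenks(data, 3)
print(breaks, tssd, round(goodness_of_variance_fit(data, tssd), 3))
```

For the toy data the method returns class upper bounds 3, 12 and 22 with a TSSD of 6. Because the criterion depends only on attribute values, nothing in it reflects the expectations built up by the earlier assessment steps.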
Research has shown that the method considers only the variance among attribute estimates while ignoring their uncertainties (Jenks, 1977; Koo et al., 2017). Outside the field of suitability assessment, the natural breaks (Jenks) method has generally been applied to classify primary data according to their patterns, as in mortality maps (Brewer and Pickle, 2002), population density maps (Xiao et al., 2007), rainfall maps (Golian et al., 2010) and median household income maps (Koo et al., 2017). Land suitability assessment, by contrast, is a comprehensive analysis that involves index standardization, weight determination and raster weighted overlay before the final classification, which makes the final classification much more complicated than in the applications above. The preceding procedures effectively build an expectation into the final result. Adopting the natural breaks (Jenks) method risks ignoring that expectation by focusing purely on the pattern of the data themselves.
Moreover, ESRI (2014) indicates that "natural breaks are data-specific classifications and not useful for comparing multiple maps built from different underlying information". In land suitability assessment, the assessment rules of different regions should not differ too much from each other, since planning site selection follows certain universal principles despite the peculiarities of each region. Adopting the natural breaks (Jenks) method might therefore produce distinctly different land suitability levels for the same area when it is analysed at two different scales. The paper applies the natural breaks (Jenks) method to Songbai, a town in Shennongjia Forest District, at both the forest district scale and the town scale. The results differ considerably, especially at the low level, where the difference in area percentage reaches 12.2% (Fig 4, Table 2).

… (Jenks and Caspall, 1971; Slocum et al., 2005; Traun and Loidl, 2012; Smith et al., 2015). For land suitability assessment in planning, there is no uniform standard for the number of classes. Existing studies use 3 classes (Corona, 2008; Liu et al., 2012; Ren, 2012), 4 classes (Chen et al., 1999; Chen et al., 2010) or 5 classes (Zhou et al., 2007; Zhuo, 2012). Research indicates that the number of classes should be odd, because an even number of classes lacks a central class. With fewer than 4 or 5 classes the level of detail obtained may be too limited, while more than 9 classes makes it difficult to distinguish key differences between zones (Smith et al., 2015). Integrating actual applications with expert advice, the paper applies 5 classes for the demonstration.
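The scale dependence of natural breaks can be reproduced with synthetic data. The Python sketch below generates hypothetical suitability scores for a whole district, treats one subset of cells as a town within it, and compares the town's class area percentages when the town is classified with district-scale breaks and with its own breaks. The score distributions are invented for illustration, and the natural breaks are computed with an exact Fisher-Jenks dynamic program, an assumption standing in for the ArcGIS implementation.

```python
import numpy as np


def jenks_upper_bounds(values, k):
    """Exact Fisher-Jenks class upper bounds (ascending) for 1-D data."""
    x = np.sort(np.asarray(values, dtype=float))
    n = len(x)
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x * x)))
    cost = np.full((k + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    split = np.zeros((k + 1, n + 1), dtype=int)
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            i = np.arange(c - 1, j)
            total = cost[c - 1, i] + (s2[j] - s2[i]) - (s1[j] - s1[i]) ** 2 / (j - i)
            best = int(np.argmin(total))
            cost[c, j], split[c, j] = total[best], i[best]
    uppers, j = [], n
    for c in range(k, 0, -1):
        uppers.append(x[j - 1])
        j = split[c, j]
    return np.array(uppers[::-1])


def area_percentages(values, uppers):
    """Percentage of cells per class; class c holds values in (uppers[c-1], uppers[c]]."""
    classes = np.minimum(np.searchsorted(uppers, values, side="left"), len(uppers) - 1)
    counts = np.bincount(classes, minlength=len(uppers))
    return 100.0 * counts / len(values)


rng = np.random.default_rng(7)
town = rng.beta(5, 2, 120)          # hypothetical town: mostly high scores
rest = rng.beta(2, 3, 280)          # hypothetical remainder of the district
district = np.concatenate([town, rest])

k = 5
from_district = area_percentages(town, jenks_upper_bounds(district, k))
from_town = area_percentages(town, jenks_upper_bounds(town, k))
print("district-scale breaks:", from_district.round(1))
print("town-scale breaks:    ", from_town.round(1))
print("largest difference (percentage points):",
      np.abs(from_district - from_town).max().round(1))
```

The same town cells receive different area percentages per class depending only on which data set the breaks were derived from, which is the effect the Songbai comparison reports.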

The location of intervals:
The location of the intervals, co-determined by the number of classes and the classification method, is also critical. Robinson (1960) regarded the selection of intervals as the most important problem. Research suggests that class breaks should be located at "critical" values derived from field observations or from a particular known or unknown bias held by the map-maker (Jones, 1930). Besides the methods provided by ArcGIS, various methods have been applied to classify land suitability assessment results, such as combining limiting conditions with the weighted index method (Chen et al., 1999), and machine learning methods such as neural networks (Jiao, 2004; Hu et al., 2005), K-Means clustering (Zhou et al., 2007) and ant colony optimization (Yu, 2010). The main challenge is to integrate the expectation carried over from the preceding procedures with the specific pattern of the data to achieve a better result.
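As one example of the clustering-based alternatives listed above, the following Python sketch derives class breaks with one-dimensional K-means (Lloyd's algorithm), placing each break halfway between adjacent cluster centres. The quantile-based initialisation and the midpoint rule are assumptions made here for a deterministic illustration, not the configuration used by Zhou et al. (2007).

```python
import numpy as np


def kmeans_1d_breaks(values, k, max_iter=100):
    """1-D K-means (Lloyd's algorithm); returns (interior breaks, sorted centres)."""
    x = np.asarray(values, dtype=float)
    # Deterministic start: centres at evenly spaced quantiles of the data.
    centres = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(max_iter):
        labels = np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)
        updated = np.array([x[labels == c].mean() if np.any(labels == c) else centres[c]
                            for c in range(k)])
        converged = np.allclose(updated, centres)
        centres = updated
        if converged:
            break
    centres = np.sort(centres)
    # Each break lies halfway between two neighbouring cluster centres.
    return (centres[:-1] + centres[1:]) / 2.0, centres


breaks, centres = kmeans_1d_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], k=3)
print(breaks, centres)
```

For the toy data the centres converge to 2, 11 and 21, giving breaks at 6.5 and 16. Like natural breaks, this is driven purely by the value distribution, which is why the text argues such methods need modification before use in suitability mapping.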

The spatial autocorrelation of geographical data:
Almost all geographical data are spatially autocorrelated, yet this property is commonly ignored because classification for choropleth maps is generally based on non-spatial attribute values (Traun and Loidl, 2012). According to Tobler's first law of geography (Tobler, 1970), locations that are close in space tend to have similar attribute values and are therefore more likely to fall into the same class interval (Mak and Coulson, 1991). Research has revealed that land-use spatial data follow the first law of geography and exhibit a degree of autocorrelation (Xie et al., 2006; Qiu et al., 2007; Gao et al., 2010). To ensure the accuracy of land suitability assessment, the data acquired should be as precise as possible, and elevation and slope data derived from a DEM should have as high a resolution as possible. As a consequence, however, the final assessment result may become overly fragmented in some areas (Fig 5), leaving erratic "islands" within a class simply because a cell value falls slightly above or below a class boundary. Traun and Loidl (2012) reviewed current spatially aware classification methods and their potential for improvement. They also presented a new approach named "Regioclassification", which adapts to the degree of spatial autocorrelation in the data by combining the Moran's I scatter plot with the Fisher-Jenks algorithm. According to their research, classifications of spatially autocorrelated data produced with "Regioclassification" are visually less complex than those produced by non-spatial classification approaches. The method could be further explored for the classification of land suitability.
The applicability of machine learning methods such as neural networks and K-means to classification also demands further investigation. As discussed above, land suitability assessment is much more comprehensive and complicated than pure data classification and clustering. Machine learning methods should therefore be modified to integrate the expectation carried over from the preceding procedures with the specific pattern of the data to achieve a better classification result. As a particular application field of spatial data and spatial analysis, planning possesses high spatiotemporal complexity. Information technology applications in this field, and research on land suitability assessment uncertainty alike, should integrate this complexity with actual needs so as to guarantee the equity and accuracy of the analysis.
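Two quantities from this discussion can be computed directly on a raster. The Python sketch below evaluates global Moran's I with binary rook (edge-sharing) contiguity weights, and counts single-cell "islands", defined here, as an assumption, as interior cells whose class differs from all four rook neighbours. The neighbourhood definition and the toy grids are illustrative; Regioclassification itself (Traun and Loidl, 2012) is not reproduced.

```python
import numpy as np


def morans_i_rook(grid):
    """Global Moran's I for a 2-D raster with binary rook-contiguity weights.

    I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i**2, where z are deviations from
    the mean and W is the sum of all weights (each adjacent pair counted twice).
    """
    z = np.asarray(grid, dtype=float)
    z = z - z.mean()
    horizontal = (z[:, :-1] * z[:, 1:]).sum()
    vertical = (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    cross = 2.0 * (horizontal + vertical)       # symmetric weights: both directions
    w_total = 2.0 * n_pairs
    return (z.size / w_total) * cross / (z**2).sum()


def count_islands(classes):
    """Interior cells whose class differs from all four rook neighbours."""
    c = np.asarray(classes)
    core = c[1:-1, 1:-1]
    isolated = ((core != c[:-2, 1:-1]) & (core != c[2:, 1:-1]) &
                (core != c[1:-1, :-2]) & (core != c[1:-1, 2:]))
    return int(isolated.sum())


ramp = np.add.outer(np.arange(10), np.arange(10)).astype(float)   # smooth surface
checker = np.indices((4, 4)).sum(axis=0) % 2                       # alternating 0/1
print(morans_i_rook(ramp), morans_i_rook(checker), count_islands(checker))
```

A smooth surface yields a strongly positive Moran's I, while the checkerboard yields -1 and consists entirely of islands. A classified suitability map of strongly autocorrelated data that nevertheless shows many islands is a hint that class breaks are cutting through locally homogeneous areas.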

Figure 1. The flow chart of the land suitability assessment procedure in planning
The paper takes the land suitability assessment result of Shennongjia Forest District in China to demonstrate the various methods provided by ArcGIS. Equal interval, quantile, natural breaks and geometrical interval allow users to define the number of classes. The paper divides the raster data into 5 classes, following the research conclusion discussed in 3.3.1. Fig 2 shows that the results differ considerably even though the same raster data are divided into the same number of classes. Defined interval and standard deviation automatically determine the number of classes, here 6 (Fig 3), and their classification results also differ from each other.
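The sensitivity of the classed map to the method can be reproduced outside ArcGIS. The Python sketch below implements simplified versions of several schemes from Table 1 and reports the share of cells per class for the same data. These are illustrative approximations, not ESRI's implementations: in particular, the geometric breaks are a plain geometric series between the minimum and maximum (positive data only), the defined-interval breaks start at the data minimum, and the standard deviation breaks sit at the mean plus whole multiples of one standard deviation, so the number of classes follows from the data in the last two cases. The sample scores are hypothetical.

```python
import numpy as np


def equal_interval(x, k):
    lo, hi = np.min(x), np.max(x)
    return lo + (hi - lo) * np.arange(1, k) / k


def quantile_breaks(x, k):
    return np.quantile(x, np.arange(1, k) / k)


def geometric_breaks(x, k):
    """Plain geometric series from min to max (requires strictly positive data)."""
    lo, hi = np.min(x), np.max(x)
    return lo * (hi / lo) ** (np.arange(1, k) / k)


def defined_interval(x, size):
    lo, hi = np.min(x), np.max(x)
    return np.arange(lo + size, hi, size)          # class count follows from the data


def std_dev_breaks(x, step=1.0):
    mean, sd = np.mean(x), np.std(x)
    lo, hi = np.min(x), np.max(x)
    n_below = int(np.floor((mean - lo) / (step * sd)))
    n_above = int(np.floor((hi - mean) / (step * sd)))
    breaks = mean + np.arange(-n_below, n_above + 1) * step * sd
    return breaks[(breaks > lo) & (breaks < hi)]


def class_shares(x, breaks):
    """Percentage of values per class; a value equal to a break goes to the upper class."""
    counts = np.bincount(np.digitize(x, breaks), minlength=len(breaks) + 1)
    return 100.0 * counts / len(x)


rng = np.random.default_rng(1)
scores = rng.lognormal(mean=0.0, sigma=0.6, size=10_000)   # hypothetical skewed scores
for name, b in [("equal interval", equal_interval(scores, 5)),
                ("quantile", quantile_breaks(scores, 5)),
                ("geometric", geometric_breaks(scores, 5)),
                ("defined interval (1.0)", defined_interval(scores, 1.0)),
                ("standard deviation", std_dev_breaks(scores))]:
    print(f"{name:<24}{len(b) + 1} classes  {class_shares(scores, b).round(1)}")
```

On skewed data like these, equal interval puts most cells into the lowest class while quantile spreads them evenly by construction, so the same raster can look predominantly unsuitable or evenly mixed depending only on the method chosen.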

Figure 5. The erratic "islands" in the land suitability map of Shennongjia Forest District
4. CONCLUSION AND DISCUSSION
Focusing on the MCDM approach that planners generally adopt for land suitability assessment, the paper discusses the uncertainties of index system establishment (the uncertainties of the data themselves, the discretization of continuous raster data and index weight determination) and of the classification of the final result. Further research can be carried out on the uncertainty of index selection, the quantification of the uncertainty at each step and uncertainty propagation through the whole process. The

Table 1. A brief introduction of classification methods provided by ArcGIS. (The content is derived from ArcGIS 10.2 Help: Classifying Numerical Fields for Graduated Symbology.)