MODELLING THE RELATIONSHIP BETWEEN LAND SURFACE TEMPERATURE AND LANDSCAPE PATTERNS OF LAND USE LAND COVER CLASSIFICATION USING MULTI LINEAR REGRESSION MODELS

The threat of the ailments related to urbanization like heat stress is very prevalent. There are a lot of things that can be done to lessen the effect of urbanization to the surface temperature of the area like using green roofs or planting trees in the area. So land use really matters in both increasing and decreasing surface temperature. It is known that there is a relationship between land use land cover (LULC) and land surface temperature (LST). Quantifying this relationship in terms of a mathematical model is very important so as to provide a way to predict LST based on the LULC alone. This study aims to examine the relationship between LST and LULC as well as to create a model that can predict LST using class-level spatial metrics from LULC. LST was derived from a Landsat 8 image and LULC classification was derived from LiDAR and Orthophoto datasets. Class-level spatial metrics were created in FRAGSTATS with the LULC and LST as inputs and these metrics were analysed using a statistical framework. Multi linear regression was done to create models that would predict LST for each class and it was found that the spatial metric “Effective mesh size” was a top predictor for LST in 6 out of 7 classes. The model created can still be refined by adding a temporal aspect by analysing the LST of another farming period (for rural areas) and looking for common predictors between LSTs of these two different farming periods.


INTRODUCTION
Land Surface Temperature (LST) is a very important measure in urban areas because as it increases, the possibilities of drought and heat stress also increase.It has been found that the temperatures are indeed greater in urban areas compared to rural areas (Sundara Kumar et al., 2012).
It is clear that the main cause of the increasing temperatures is urbanization or the transformation of natural surfaces to buildings and other structures.These natural surfaces are often composed of plants and soil so they release water vapour that keeps the air cool in the area.On the other hand, materials that are being used as roofs of buildings have different thermal properties which absorbs more of the sun"s energy thus results to high temperature in the urban area.
Heat stress is a very big issue especially in urban areas and also in places where drought could occur.Areas in and around cities are generally warmer than most rural areas which should alarm people in those areas of the possibility of heat stress and its advanced effects like heat stroke which sometimes could be fatal.Furthermore, continuous urban development reduces vegetative cover and adds more surfaces that absorb heat like rooftops and paved roads.This also adds to the already growing harmful effect of climate change.It is known that there is indeed a relationship between land use and LST.Therefore, it is also very important to have a way to model LST based on Land Use Land Cover (LULC) so this relationship can be quantified.This model can be included in urban planning especially in plans of major urbanization to see the effect it would have on the LST in the area.

OBJECTIVES
This study aims to derive land surface temperature from a single Landsat 8 image and examine the effects of landscape patterns from LULC classification on surface temperature and create a regression model to predict surface temperature from class-level spatial metrics.The study also aims to develop a statistical framework to be used in analysing these relationships.

Study Area
The study site that was selected for this study was an area in Talisay City, Negros Occidental, Philippines that showed the contrast between urban and rural land classes.The Landsat 8 image was taken during the period just before harvest season where crops are almost fully grown and ready to be harvested.The Landsat 8 image was taken around 02:05 AM local time and it was also made sure to have little to no cloud cover or haze in the study area so as not to distort any calculations for land surface temperature.

Land Surface Temperature (LST) Derivation
LST was derived from the Landsat 8 image.Only band 10 was used because for single channel methods, LST derived from band 10 gives higher accuracy than that of band 11 (Yu et al, 2014).The following equation was used to convert the digital number (DN) of Landsat 8 TIR band 10 into spectral radiance (USGS, 2015): where L λ = spectral radiance Q cal = digital number (DN).M L = radiance multiplicative scaling factor for the band (RADIANCE_MULT_BAND_n from the metadata) A L = radiance additive scaling factor for the band (RADIANCE_ADD_BAND_n from the metadata) The spectral radiance is then converted to brightness temperature, which is the at-satellite temperature (T B ) under an assumption of unity emissivity using the following equation (USGS, 2015): where T B = at-satellite brightness temperature in Kelvin (K) K1 and K2 = band-specific thermal conversion constants from the metadata Next, NDVI is computed using the NIR and Red band of the Landsat 8 image since it will be used in determining land surface emissivity (LSE).NDVI is calculated using the following equation (Tucker, 1979): (3) In order to get the LSE, the vegetation proportion is obtained using the following equation (Carlson & Ripley, 1997): where P V = vegetation proportion NDVI = normalised difference vegetation index NDVI min = minimum NDVI value NDVI max = maximum NDVI value LSE is then computed using the following equation (Sobrino et al., 2004): LST can now be computed (in degrees Celsius) using the atsatellite brightness temperature and the land surface emissivity in this equation (Artis & Carnahan, 1982): where LST = land surface temperature T B is the at-satellite brightness temperature is the wavelength of emitted radiance ( = 11.5µm)(Markham & Barker, 1985) ρ = h * c/σ (1.438 * 10 -2 m K) σ = Boltzmann constant (1.38 * 10 -23 J/K) h = Planck"s constant (6.626 * 10 -34 Js) c = velocity of light (2.998 * 10 8 m/s)

Land Use Land Cover Classification
Detailed land use land cover classes were obtained by using LiDAR data and orthophotos with an object-based image analysis (OBIA) approach.Different features from the LiDAR data and orthophotos were created and used to both segment and classify the study area.Using the pit-free canopy height model (CHM) (Khosravipour et al, 2014), the unclassified image was separated into ground and non-ground objects by using contrast split segmentation (Trimble, 2014).The ground and non-ground objects were further segmented using multiresolution segmentation (Trimble, 2014) and refinement was done by removing small and irrelevant objects.Training and validation points were collected by visual interpretation using the orthophoto and LiDAR derivatives like CHM and intensity.
After collection, the classes found in the study area were Sugarcane (SC), Rice (Ri), Mango, Fallow (Fa), Bare Land (BL), Builtup (Bu), Grassland (Gr), Road (Rd), Trees (Tree), and Water.A support vector machine (SVM) was used to classify the segmented image and the initial result was refined using neighbourhood features and geometric features like area to get a more homogenous output.Smaller insignificant objects were merged with their larger neighbours.Accuracy assessment was done using the collected validation points.

Spatial Metric Computation
The relationship of LST derived from Landsat 8 and land cover class-level spatial metrics were investigated using simple linear regression.Preprocessing of the land cover polygons was done in ArcMap 10.3.The study site with an area of 3480 x 3570 meters was divided into 30 x 30 meter grids.Each grid had a 2 meter buffer which was considered as its edge.The land cover polygons were clipped according to each grid and were rasterized afterwards.
The rasters including their edges were then used as input in FRAGSTATS, a spatial analysis tool (McGarigal et al, 2012).It was utilized to derive the class-level spatial metrics of the land cover polygons in the study area.A total of 73 spatial metrics were tested in FRAGSTATS.Table 2. Class-level spatial metrics derived from FRAGSTATS

Statistics
Before proceeding to any type of statistical analysis, checking for the presence of any missing data was done first.Should there be any missing values, the type of missingness of these values should also be known, whether they are either Missing Completely at Random (MCAR), Missing at Random (MAR) or Missing not at Random (MNAR).With this information, it will be known whether the missing values are informative or not.
Little"s MCAR Test was used to determine if the missing values are missing randomly or non-randomly (Little, 1988).Missing not at random (MNAR) implies that the missing values are informative.They carry important information about the data and thus not ignorable.Expectation Maximization (EM) is a method used to address missing data that are MNAR (Dempster et al, 1977).Before proceeding with Multiple Linear Regression, the data needs to satisfy the assumptions for it.The following are the assumptions that need to be satisfied before proceeding (Montgomery et al, 2012): Assumption 1: The dependent variable should be either an interval or ratio variable, that is, at continuous scale.Moreover, there should be two or more independent variables with either at continuous (interval or ratio) or categorical scale (nominal or ordinal) Assumption 2: The residuals should be independent, that is, errors are independent of one another.Independence of errors were checked using Durbin-Watson statistic.Assumption 3: The dependent variable (surface temperature) should be proportional to the independent variables (spatial metrics).Assumption 4: The data must show homoscedasticity.Assumption 5: No multicollinearity must occur.This means that no two or more independent variables must be highly correlated with each other.Assumption 6: There should be no significant outliers.Assumption 7: The residuals or the errors must be approximately normally distributed.
Since spatial data naturally doesn"t conform to three of these assumptions, namely homoscedasticity, multicollinearity, and outliers, these assumptions were ignored.After addressing the missing values, the data was inspected if it satisfied the assumptions for Multiple Linear Regression.When the four assumptions were met, Multiple Linear Regression was utilized.

Land Surface Temperature
Land Surface Temperature in degrees Celsius was derived from the Landsat 8 image.It is seen in the resulting image that the lower left area has generally higher temperatures than the rest of the image.Statistics from the LST image were also calculated and can be seen in Table 3.The LiDAR and orthophoto data was successfully classified with an accuracy of not less than 90% overall accuracy and 85% Kappa coefficient.The resulting classes were agricultural fields, bare land, built-up, grassland, roads, trees, and water.The study area is composed of 44.15% of agricultural fields, 17.15% of trees, 15.26% of grassland, 10.66% of water, 6.19% of builtup, 3.94% of bare land, and 2.65% of roads according to the results.It is observable that the lower left area of the image is highly urbanized as it is mostly classified as either built-up and roads.This corresponds to the same area in the LST image where the concentration of high temperatures is.

Statistical Analysis of Data
There  Using the Little"s MCAR Test, missing values were identified as missing randomly or non-randomly.Since p-value is less than 0.05, which means that it is not significant, we can conclude that the values are missing not at random (MNAR).
Figure 5. Result of Little"s MCAR test.Circled in red shows the significance to be .000which shows the data to be MNAR.
Upon knowing that the missing data is MNAR, Expectation Maximization was done to treat the data.After treating all the missing values the data was checked if it satisfied the assumptions of multiple linear regression.It appeared that more than one assumption was not met, which may result to incorrect or misleading analysis.Some of the predictor variables were not included in the analysis to solve this problem.These predictor variables were the ones which do not pose high importance in predicting the response variable: land surface temperature.
Table 4 shows which of the remaining assumptions (1, 2, 3, 7) are satisfied or not satisfied by each class.Including the said variables in the analysis would still violate the assumption that residuals are independent.This could be due to bias that is explainable by omitted variables.Therefore, we just ignore this violation.After doing Multi Linear Regression, the top predictors per class were determined.Using these top predictors, a linear model for each class was then created to predict LST.Table 5 shows the top predictors as well as the linear model created per class.
Table 5. Summary of all classes with their top predictors and linear models for LST It was found that except for the class Road, all other classes have Effective mesh size as one of their top predictors or their only top predictor.If it is not the only predictor, it still has the biggest weight among the other variables in the linear regression models in each class.This means that Effective mesh size is a very important variable of Land Use Land Cover Classification in predicting LST.Effective mesh size is calculated by the following equation (Jaeger, 2000): The advantage of utilizing effective mesh size in the model is that it is "area-proportionately additive" which means that it characterizes the subdivision of a landscape independently of its size (Jaeger, 2000).

CONCLUSION AND RECOMMENDATION
Land Surface Temperature was derived from a Landsat 8 image although it was not validated due to equipment limitations.LULC classification was also derived from LiDAR and Orthophoto data.The LULC classification resulted in 7 classes namely agricultural fields, bare land, built-up, road, trees, and water.Class-level spatial metrics were derived from the LULC classification.After statistical analysis of the class-level spatial metrics and their relationship with LST, it was found that the spatial metric "Effective mesh size" was the most common variable in predicting LST as it appeared as a top predictor for 6 out of 7 classes, with Road as the only class without "Effective mesh size" as a top predictor.
The thermal data used to derive LST was acquired during a specific period (just before the end of the growing season).This has an effect on the LST since the crops are at the peak of their growth which would imply that the area would generally have lower temperatures.Analysis of the LST of another farming period would provide a better analysis of the relationship between LST and LULC landscape patterns.Aside from checking if effective mesh size remains a predictor for LST for an LST in a different farming period, finding the common predictors for two LSTs from different farming periods will give a more generalized model for predicting LST.With a generalized prediction model of LST based on LULC, better urban planning can be done with respect to the predicted LST to reduce heat-related stress and ailments in the area.

Figure 1 .
Figure 1.Orthophoto of the study site taken in June 2014

Figure 3 .
Figure 3. Land Use Land Cover Classification image with corresponding legend are 10,990 cases with 73 spatial metrics as predictors of surface temperature.Approximately, 16.22% (12 out of 73) of the spatial metrics have incomplete data while approximately 83.78% (61 out 73) of the spatial metrics have complete data.Out of 813,260 values the data has, approximately 8.736% is missing as seen in Figure 4.

Figure 4 .
Figure 4. Distribution of complete and incomplete data in the variables, cases and values.
Table 2 shows the class-level spatial metrics that were derived.

Table 3 .
Statistics of the Land Surface Temperature image

Table 4 .
Summary of which classes satisfy each assumption