BUSH ENCROACHMENT MONITORING USING MULTI-TEMPORAL LANDSAT DATA AND RANDOM FORESTS

It is widely accepted that land degradation and desertification (LDD) are serious global threats to humans and the environment. Around a third of savannahs in Africa are affected by LDD processes that may lead to substantial declines in ecosystem functioning and services. LDD can be monitored indirectly using relevant indicators. The encroachment of woody plants into grasslands, and the subsequent conversion of savannahs and open woodlands into shrublands, has attracted considerable attention over recent decades and has been identified as a potential indicator of LDD. Mapping bush encroachment over large areas can only be done effectively using Earth Observation (EO) data and techniques. However, the accurate assessment of large-scale savannah degradation through bush encroachment with satellite imagery remains a formidable task, because vegetation variability in response to highly variable rainfall patterns might obscure the underlying degradation processes in the satellite data. Here, we present a methodological framework for monitoring bush encroachment-related land degradation in a savannah environment in the Northwest Province of South Africa. We utilise multi-temporal Landsat TM and ETM+ (SLC-on) data from 1989 until 2009, mostly from the dry season, and ancillary data in a GIS environment. We then use the machine learning classification approach of random forests to identify the extent of encroachment over the 20-year period. The results show that, in the study area, bush encroachment is as alarming as permanent vegetation loss. The classification of the year 2009 is validated, yielding low commission and omission errors and high k-statistic values for the grasses and woody vegetation classes. Our approach is a step towards a rigorous and effective savannah degradation assessment.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-2, 2014. ISPRS Technical Commission II Symposium, 6 – 8 October 2014, Toronto, Canada. This contribution has been peer-reviewed. doi:10.5194/isprsarchives-XL-2-29-2014


INTRODUCTION
Land degradation takes place in all agro-ecological zones, threatening some 1.5 billion people (Nater, 2011). Desertification, a specific kind of land degradation, occurs mainly, but not exclusively, in dryland regions and affects some 1.9 billion hectares of land worldwide and 250 million people (Mishaud, 2004). The various definitions of, and perspectives on, desertification will not be repeated here, but their sheer variety indicates that it is still difficult to identify a unifying explanation of the causes of desertification (e.g. Schwilch, 2011; Sommer, 2011; Reynolds et al., 2011; Verstraete et al., 2011).
One of the most seriously affected regions in the world is South Africa (SA), with ~440,000 km² vulnerable to some extent, according to the United States Department of Agriculture (Reich et al., 2001). A number of unsustainable land use practices that result in habitat and land degradation in the dryland areas can be blamed, including expansion of rain-fed cultivation onto unsuitable lands; soil mining and shortening of fallow periods; overgrazing; and uncontrolled harvesting of biomass. Through desertification, soil in SA has lost 25% or more of its fertility and the process is ongoing (De Beer et al., 2005); moreover, large-scale erosion and desertification have led to food insecurity in several areas. Other regional effects include disruption of the surface water balance; reduced carbon sequestration and release of carbon through soil erosion; and impacts on regional climate through changes in the evaporation ratio, roughness, albedo and increased atmospheric dust loads (Reynolds and Stafford Smith, 2002). Meadows and Hoffman (2003) identify six areas as the most severely degraded in the country, including large areas of the North West Province, and conclude that these are likely to become even more susceptible under predicted climate change scenarios.
In view of the far-reaching consequences of land degradation and desertification (LDD) in SA and the large areas that are said to be affected there (Reich et al., 2003), there is a need for inventories and monitoring at the regional to country scales using consistent, objective, repeatable and spatially explicit measures (Prince, 2004; Prince et al., 2009). Objective measurement of degradation for large areas has, however, proved extremely difficult, mainly due to multiple criteria and the lack of reliable methods (Prince, 2002; WMO, 2005). Existing maps, such as the USDA NRCS 1:35,000,000 Desertification Vulnerability map of Africa (Eswaran and Reich, 2003), or even the larger scale land degradation map of the 226 local municipalities of SA by Meadows and Hoffman (2003), all depend on coarse resolution soil maps and indicate vulnerability to degradation rather than actual degradation.
A particularly important issue in assessing and monitoring LDD is to gain an overview of affected areas, and to connect the large scale with regional and local processes. Earth observation (EO) data are of considerable value in the context of monitoring environmental processes. With the history of operational EO sensors reaching back over four decades, they allow retrospective analysis of the state and development of ecosystems at different scales and with different spatial coverage. Remote sensing data adhere to the principles of repetitiveness, objectivity and consistency, which are prerequisites in the frame of monitoring and surveillance (Hill et al., 2004). Consequently, observatories based on EO satellite data and additional information have repeatedly been suggested, with a view to serving the requirements of policy-making, planning and land management (Group of Earth Observation, 2005).
Remote sensing data, along with geocomputation techniques, have substantially contributed to correcting various 'myths' surrounding the desertification process and have provided tangible items, such as soil and vegetation properties (Hill et al., 2008). Nevertheless, there is no indicator of degradation that is directly inferable from satellite-based data. Suitable indirect indicators need to be defined, which can be related to processes of erosion, salinisation, increase of flammable vegetation volume, etc. (Perez-Trejo, 1994 and Verstraete, 1994). Secondly, land degradation essentially operates in the time dimension and can be conceptualized as a pathological process of multi-annual land cover dynamics (Prince, 2002). Correspondingly, indicators need to be derived for a sequence of time steps and the time dimension needs to be incorporated into the analysis (Gutman, 1999 and Lu et al., 2004). Moreover, the choice of LDD indicators should always take into account that they are scale-specific (Geeson et al., 2002) as well as site-specific, if they are to best describe the dynamics of the region in question.
In recent years, bush encroachment has increasingly been monitored using EO data. The vast majority of studies have employed Landsat data because the archive reaches back to the beginning of the 1970s. Classification techniques have varied from pixel-based (e.g. ML) to object classification (Vogel and Strohbach, 2009). More recently, machine learning classifiers have evolved that can achieve high land cover classification accuracies, such as classification trees (CT), artificial neural networks (ANN), support vector machines (SVM) and random forest (RF). RF is a machine learning classifier that is not commonly used in land remote sensing and has not been evaluated thoroughly by the remote sensing community compared to more conventional pattern recognition techniques (Rodriguez-Galiano et al., 2012). The most important advantages of RF are its non-parametric nature, its high classification accuracy and its capability to determine variable importance (Rodriguez-Galiano et al., 2012). Rodriguez-Galiano et al. (2012) applied RF to classify 14 different land categories in a Mediterranean environment (Spain) and concluded that RF is highly accurate and robust to training data reduction and noise. In a different study, Rodriguez-Galiano and Chica-Rivas (in press) evaluated the performance of four different machine learning classifiers, namely CT, ANN, SVM and RF, in south Spain and found that RF was the most accurate algorithm and the second most robust to noise and data reduction after SVM. Moreover, Mellor et al. (2013) tested the performance of RFs in an operational setting for large area (7.2 million hectares) sclerophyll forest classification in the state of Victoria, Australia, and found very high overall accuracy (96%) and kappa statistic (0.91) for a forest/non-forest classification. In African environments, RF classification and multi-temporal Landsat imagery have been successfully employed to map land cover in Zanzibar (Knudby et al., 2014) and Madagascar (Grinand et al., 2013). However, RF classification has not been employed so far to map southern African savannah land cover types.
Within this context, the present study aims to monitor bush encroachment-related land degradation in a savannah environment in the Northwest Province of South Africa. For this purpose, we employ random forests and multi-temporal Landsat data spanning 20 years to investigate the performance of the methodological approach and identify the extent of encroachment over the 20-year period from 1989 to 2009.

Study area
The study area is the part of the Landsat scene with path = 173 and row = 78 that falls within the Dr Ruth Segomotsi Mompati District Municipality (formerly known as Bophirima District Municipality), which is one of the four districts of the North West province of South Africa. Temperatures range from 17 °C to 31 °C in the summer and from 3 °C to 21 °C in the winter. Annual rainfall totals about 360 mm, with almost all of it falling during the summer months, between October and April (Wikipedia, 2014). The geology of the area consists mainly of sandy soils and the lithology consists of sedimentary rocks dating back to the Quaternary, sandstone, limestone, conglomerates and alluvium deposits (State of Environment Report, 2002).

Datasets
The Landsat data used are shown in Table 1. Orthorectification of the available images was first performed using the viewing-geometry and block adjustment model that implements Toutin's approach (Toutin, 1994).
In order to produce radiometrically consistent images that can be compared to each other, image calibration was then applied. The images were normalised to a reference image, as a reliable correction to absolute reflectance units is not possible. The following three calibration steps were undertaken in the radiometric correction procedure: 1) Top-Of-Atmosphere (TOA) reflectance calibration (also called sun angle and distance correction); 2) Bi-directional Reflectance Distribution Function (BRDF) calibration; and 3) terrain illumination correction (Wu et al., 2004).
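The first calibration step can be illustrated with the conversion commonly applied to Landsat TM/ETM+ data, in which at-sensor radiance L is scaled by the squared Earth-Sun distance d, the band's mean solar exoatmospheric irradiance ESUN and the cosine of the solar zenith angle: rho = pi * L * d^2 / (ESUN * cos(theta_s)). The Python sketch below is a minimal illustration under that assumption and is not the processing chain used in this study. The function names, the approximate Earth-Sun distance formula and the gain, bias, ESUN and sun elevation values in the usage lines are illustrative placeholders.

```python
import math

def earth_sun_distance(day_of_year):
    """Approximate Earth-Sun distance in astronomical units.

    Assumption: a simple cosine approximation; tabulated values
    would normally be preferred for operational work.
    """
    return 1.0 - 0.01672 * math.cos(math.radians(0.9856 * (day_of_year - 4)))

def dn_to_radiance(dn, gain, bias):
    """Convert a digital number to at-sensor spectral radiance (linear rescaling)."""
    return gain * dn + bias

def toa_reflectance(radiance, esun, sun_elevation_deg, day_of_year):
    """Top-of-atmosphere reflectance: sun angle and Earth-Sun distance correction."""
    d = earth_sun_distance(day_of_year)
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return math.pi * radiance * d ** 2 / (esun * math.cos(solar_zenith))

# Usage with placeholder values (not the coefficients of the scenes in this study).
radiance = dn_to_radiance(dn=120, gain=0.7628, bias=-1.52)
rho = toa_reflectance(radiance, esun=1957.0, sun_elevation_deg=35.0, day_of_year=200)
print(round(rho, 3))
```

Because the solar zenith angle enters through its cosine, the same radiance acquired under a lower sun yields a higher reflectance, which is why this step matters when mostly dry-season scenes from different dates are compared.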

Random forest classification
Ground-truth data showing the location and extent of representative bush and non-bush land cover classes, needed to train the land cover mapping process, were derived from the colour aerial imagery (75% used for training and 25% for validation). We followed the NGI national land cover mapping nomenclature to map 6 classes:
1. Shrubs and bushes
2. Graminoids (herbaceous)
3. Graminoids
4. Standing artificial water bodies
5. Non-perennial pans
6. Urban
The RF classification code used is written in R and is freely available from the Center for Biodiversity & Conservation (http://biodiversityinformatics.amnh.org/index.php?section=R_Scripts). The script reads an ESRI shapefile with training polygons and then randomly selects a user-determined number of samples from each land cover type. A multispectral image is also input. For each sample, the data values for that polygon are determined and these are then used to run the Random Forest model. After building the model, the multilayer image is read and the land cover type is predicted for each pixel. The output classified image is in GeoTIFF format.
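The sampling logic described above (a user-determined number of random samples per land cover type, with 75% of the reference data used for training and 25% for validation) can be sketched in Python as follows. This is an illustrative sketch only and not the R script used in the study. The function names, the per-class split and the synthetic records are assumptions introduced for the example, and the model fitting and per-pixel prediction steps are deliberately left out.

```python
import random
from collections import defaultdict

def sample_per_class(records, n_per_class, seed=0):
    """Randomly draw up to n_per_class (label, features) records from each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for label, features in records:
        by_class[label].append((label, features))
    sampled = []
    for label in sorted(by_class):
        pool = by_class[label]
        # A class with fewer records than requested contributes all of them.
        sampled.extend(rng.sample(pool, min(n_per_class, len(pool))))
    return sampled

def split_train_validation(samples, train_fraction=0.75, seed=0):
    """Split samples class by class so every class appears in both subsets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample[0]].append(sample)
    train, validation = [], []
    for label in sorted(by_class):
        pool = by_class[label][:]
        rng.shuffle(pool)
        cut = int(round(train_fraction * len(pool)))
        train.extend(pool[:cut])
        validation.extend(pool[cut:])
    return train, validation

# Synthetic records standing in for pixel values extracted from training polygons.
records = []
for label, count in (("shrubs", 40), ("graminoids", 30), ("urban", 8)):
    for i in range(count):
        records.append((label, (i, i + 1)))

sampled = sample_per_class(records, n_per_class=20)
train, validation = split_train_validation(sampled, train_fraction=0.75)
print(len(sampled), len(train), len(validation))
```

Splitting within each class rather than over the pooled samples is a design choice made here so that small classes such as urban are still represented in the validation subset.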
The classification process was an iterative one: the RF output images were reviewed and the algorithm was re-run with a new set of training sites when that was deemed necessary.

RESULTS
The resulting RF land cover classifications for the four scenes are shown in Figure 2. There is a steady increase in shrubs and bushes, especially in the western part of the study area, accompanied by a steady decline in graminoids. This is also demonstrated in the graph of Figure 3, which shows that over the 20 years of the study period:
• there has been a steady and rapid increase in the area covered by shrubs and bushes, from ~58% in 1989 to ~67% in 2009; and
• there has been a corresponding decrease in the area covered by graminoids, from ~41% to ~33%.
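Area shares of this kind can be derived from a classified raster by counting the pixels of each class and dividing by the total. The Python sketch below illustrates the arithmetic. The function names are hypothetical and the two 100-pixel label lists are toy data built to mirror the approximate percentages quoted above, not the actual pixel counts of the study.

```python
from collections import Counter

def class_area_percent(labels):
    """Percentage of pixels assigned to each class in a classified image."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.items()}

def change_in_points(earlier, later):
    """Change in class share between two dates, in percentage points."""
    classes = set(earlier) | set(later)
    return {cls: later.get(cls, 0.0) - earlier.get(cls, 0.0) for cls in classes}

# Toy classified images flattened to label lists (illustrative only).
pixels_1989 = ["shrubs"] * 58 + ["graminoids"] * 41 + ["other"] * 1
pixels_2009 = ["shrubs"] * 67 + ["graminoids"] * 33

pct_1989 = class_area_percent(pixels_1989)
pct_2009 = class_area_percent(pixels_2009)
change = change_in_points(pct_1989, pct_2009)
print(change["shrubs"], change["graminoids"])
```

Since every Landsat TM/ETM+ multispectral pixel covers 30 m by 30 m, the same counts multiplied by 0.09 ha give absolute areas when those are preferred to percentages.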
Figure 3. Change in area covered by bushes and grasses

The validation results in Table 2 show that the RF classifier yields high user's and producer's accuracies for all classes, as well as a high overall accuracy (91%) and k-statistic (k = 0.89). The only classes that are difficult to map accurately, due to their spectral similarity, are the non-perennial pans and the urban areas. However, this does not affect the findings of this study regarding the encroachment of woody plants into areas covered with graminoids.

DISCUSSION
Our results corroborate the findings of previous field studies in the Northwest Province region (Mampholo, 2006), which show that bush encroachment is as alarming as permanent vegetation loss. The accuracy assessment performed on the 2009 results shows high accuracy figures for all classes, with an overall accuracy of 89% and an overall kappa of 0.87. However, the accurate assessment of savannah degradation through bush encroachment using Earth Observation (EO) data and techniques remains a formidable task, because vegetation variability in response to highly variable rainfall patterns might obscure the underlying degradation processes in the satellite data (Vogel and Strohbach, 2009).

CONCLUSIONS
Land degradation and desertification are affecting large areas of savannah in South Africa and bush encroachment has been identified as one of the causes. EO data and techniques can be used to monitor woody plant encroachment, and here we have suggested a methodological framework for doing so, using an area in the Dr Ruth Segomotsi Mompati District Municipality of the Northwest Province as a case study. Using multi-temporal Landsat data and random forest classification, we found that, over the 20 years of the study period, woody plant encroachment increased steadily and rapidly at the expense of graminoids.
Further work is currently underway in order to:
• carry out extensive fieldwork to assist in the identification of specific types of encroaching bushes;
• use a fuller set of Landsat data consisting of at least one scene per 2 years; and
• extend the study area to cover the entire Northwest Province.

Figure 1. The study area, mainly within the Dr Ruth Segomotsi Mompati District Municipality of the Northwest Province

Figure 2. Land cover classifications of the four Landsat scenes

Table 1. Acquisition date, sensor and source of Landsat scenes

Validation was carried out for the 2009 classification, as this is the date that coincides with the aerial imagery. A total of 350 random points was distributed across the scene, with a minimum of 50 points allocated to the smallest class (i.e. urban) to ensure that an adequate number of samples was used for the assessment of every class. Contingency matrices, omission and commission errors, overall classification accuracies and overall kappa indices (Cohen, 1960) were estimated.
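The accuracy measures named above follow directly from the contingency matrix. The Python sketch below shows one way to compute them, assuming rows hold the classified labels and columns the reference labels. The function names and the two-class toy data are illustrative and are not taken from the validation points of this study.

```python
def contingency_matrix(reference, predicted, classes):
    """Rows are classified labels, columns are reference labels."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[pred]][index[ref]] += 1
    return matrix

def accuracy_report(matrix, classes):
    """Overall accuracy, Cohen's kappa and per-class commission/omission errors."""
    k = len(classes)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[r][c] for r in range(k)) for c in range(k)]
    overall = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (overall - expected) / (1.0 - expected)
    per_class = {}
    for i, cls in enumerate(classes):
        users = matrix[i][i] / row_totals[i] if row_totals[i] else float("nan")
        producers = matrix[i][i] / col_totals[i] if col_totals[i] else float("nan")
        per_class[cls] = {
            "commission_error": 1.0 - users,    # 1 - user's accuracy
            "omission_error": 1.0 - producers,  # 1 - producer's accuracy
        }
    return overall, kappa, per_class

# Toy validation points: 50 reference bush and 50 reference grass samples.
classes = ["bush", "grass"]
reference = ["bush"] * 50 + ["grass"] * 50
predicted = ["bush"] * 45 + ["grass"] * 5 + ["grass"] * 40 + ["bush"] * 10

matrix = contingency_matrix(reference, predicted, classes)
overall, kappa, per_class = accuracy_report(matrix, classes)
print(matrix, round(overall, 2), round(kappa, 2))
```

Kappa discounts the agreement expected by chance from the row and column totals, so it is always lower than the overall accuracy for the same matrix unless the classification is perfect.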

Table 2. Accuracy assessment for the RF classification of the 2009 image. S&B: Shrubs and bushes; G(H): Graminoids