ON THE DEVELOPMENT OF A NOVEL APPROACH FOR IDENTIFYING PERENNIAL DRAINAGE IN SOUTHERN BRAZIL: A STUDY CASE INTEGRATING SENTINEL-2 AND HIGH-RESOLUTION DIGITAL ELEVATION MODELS WITH MACHINE LEARNING TECHNIQUES

Riparian vegetation plays a key role in maintaining water quality and preserving the ecosystems along riverine systems, as it prevents soil erosion, retains water through increased infiltration, and acts as a buffer zone between rivers and their surroundings. Within urban spaces, these areas also play an important role in preventing illegal occupation of areas of hydrologic risk, such as floodplains. The goal of this research is to propose a framework for identifying areas of permanent protection associated with perennial drainage, utilizing satellite imagery and digital elevation models (DEM) in association with machine learning techniques. The specific objectives include the development of a decision tree to retrieve perennial drainage from high-resolution, 1-meter DEMs, and the development of a digital image processing workflow to retrieve surface water bodies from Sentinel-2 imagery. In-situ information on the perennial and ephemeral conditions of streams and rivers was collected in the first trimester of 2020 to validate our results. We propose a minimum of 7 days without precipitation prior to in-situ validation, in order to minimize the impact of surface runoff on the flow regime and allow a more accurate assessment of streamflow conditions. The proposed method will benefit decision makers by providing them with reliable information on the drainage network and its buffer zones, as well as detailed mapping of the areas of permanent protection that are key to urban planning and management.


Research Background
Riparian vegetation plays a key role in maintaining water quality and preserving the ecosystems along riverine systems, as it prevents soil erosion, retains water through increased infiltration, and acts as a buffer zone between rivers and their surroundings. Growing cities and urban sprawl can pose serious threats to the environment, especially in developing countries, where the demand for natural resources must coexist with sustainable development. Under the Brazilian Forest Code, a piece of legislation on land use and management first established in 1965 and redesigned in 2012, perennial drainage represented by springs, creeks, streams, and rivers must have its surrounding riparian vegetation preserved in order to maintain water quality, geologic stability and biodiversity, as well as to prevent soil erosion (Law 4771/65, 1965; Law 12651/12, 2012). This buffer zone is known as an Area of Permanent Protection (APP), and it is protected by federal law, thus its preservation is imposed by the constitution. Occupying such spaces is illegal, and altering their land cover can result in heavy fines as well as revocation of the right of occupation. Celesc S.A is an electricity utility company established in the 1950s in southern Brazil. To this day, Celesc S.A provides electricity to approximately 3 million homeowners in Santa Catarina, roughly 42% of the entire population of that state (Informa CELESC, 2020). Despite its long history and tradition in Santa Catarina, Celesc S.A faces ongoing challenges. In recent years, Celesc has faced legal issues related to providing electricity to new and established consumers in areas of permanent protection. For the most part, the protected areas in question are related to water resources, for which the Forest Code establishes permanent protection along perennial drainage. 
Intermittent and ephemeral water resources do not fall under the category of permanent protection. Federal prosecutors from Santa Catarina have accused Celesc of ignoring environmental laws, stating that it provides electricity to consumers who inhabit protected areas. However, to this day there is no cartographic base that represents all the APPs described in the Forest Code. The constitution is unclear about whose responsibility it is to identify and map these areas. In some developed cities, urban planners and GIS professionals have the materials and skills for mapping these areas; however, that is not the case in most municipalities in Santa Catarina, where almost no geospatial information on APPs exists. This situation led Celesc S.A to propose a research and development (R&D) project to develop a method for identifying and mapping APPs in Santa Catarina. This paper describes the methodology developed for identifying perennial streams and water bodies in Santa Catarina, utilizing remote sensing datasets and machine learning algorithms.

Research Problem
Identifying and mapping the perennial drainage network at a fine scale in a country of continental dimensions such as Brazil is a highly time-consuming task, and yet such mapping is essential for the sustainable development of the nation. This study focuses on four municipalities in the State of Santa Catarina, located in the Southern region of Brazil. Santa Catarina has 295 municipalities and is home to about 7 million people, with most of the population living near the Atlantic coastline. Most of Santa Catarina's cities do not have detailed digital cartographic information, relying, in most cases, on coarse, small-scale, and relatively old maps that were produced for the entire country.

Research Goal, Objectives
The goal of this research is to propose a framework for identifying areas of permanent protection associated with perennial drainage, utilizing satellite imagery and digital elevation models (DEM) in association with machine learning techniques. These data and techniques will be used to identify and map riverine buffer zones at high resolution in four municipalities in the state of Santa Catarina: Chapecó, Jaguaruna, Joinville, and Otacílio Costa. The specific objectives include the development of a decision tree to retrieve perennial drainage from high-resolution, 1-meter DEMs, and a workflow for digital image processing and spectral modelling to retrieve surface water bodies from Sentinel-2 imagery.

Significance of this Study
This study contributes to identifying landscape features that are subject to permanent protection under federal law, focusing on water resources. As a pilot project, its results and applicability have the potential to be upscaled to the entire state of Santa Catarina, as both the 1-meter resolution DEM and Sentinel-2 imagery are available for its whole area. Detailed information on APPs is crucial for sustainable socio-economic conditions.

Study Area
The study area comprises four municipalities in the state of Santa Catarina, located in southern Brazil. These cities were chosen based on the variety of landscape components that surround them, including mangroves and restinga vegetation for the cities located on the shoreline, and mountainous, rugged terrain in the hinterland. The municipalities are Chapecó, Jaguaruna, Joinville and Otacílio Costa (Figure 1). The state of Santa Catarina has 295 municipalities and hosts approximately 7 million people (Oliveira Rocha, Aurélio Leite, Agripino Sagaz, Machado Mibielli & Rotta Furlanetti, 2018). Most of the population in Santa Catarina lives in urban centers near the Atlantic coastline. Jaguaruna and Joinville are located within the Atlantic watershed, which drains eastward to the Atlantic Ocean. Chapecó and Otacílio Costa, meanwhile, are located within the Uruguay river basin, draining westward towards the River Plate. The climate is characterized by well-distributed rainfall year-round, with accumulated annual totals of approximately 1500 millimeters in Jaguaruna and 2000 millimeters in Chapecó. The four municipalities are heterogeneous and contrast with one another, from the vegetation to the form of the terrain. On the shoreline, the landscape is marked by broadleaf evergreen perennial rainforest, as well as Quaternary sedimentary plains, with the presence of natural lakes such as those seen in Jaguaruna. In the Uruguay river basin, the vegetation becomes semi-deciduous, partially losing its leaves in the autumn and winter seasons. The terrain becomes rugged, as rainfall events carve through the basalt underneath Chapecó and Otacílio Costa (Nunes, 2009).

Materials
This study utilizes Sentinel-2 imagery at 10-meter resolution, including 4 multispectral bands within the visible and near infrared spectrum: blue (centered at 490 nanometers), green (centered at 560 nanometers), red (centered at 665 nanometers), and near infrared (centered at 842 nanometers). It also makes use of a high-resolution DEM provided by the Secretaria de Estado de Desenvolvimento Econômico e Sustentável (SDS/SC), a branch of the state government of Santa Catarina responsible for the sustainable development and economic growth of that state. The following paragraphs describe the materials utilized and the methodology proposed.

Sentinel-2 Multispectral Satellite Imagery
Sentinel-2 is a constellation of two twin satellites carrying a multispectral payload that acquires 13 spectral bands at various resolutions, from visible and near infrared (VNIR) to shortwave infrared (SWIR). The constellation is operated by the European Space Agency (ESA), with the primary goal of land monitoring, focusing on vegetation, soil and coastal areas (Sentinel-2 - Missions - Sentinel Online, 2020). This study utilizes Sentinel-2 imagery obtained from the USGS EarthExplorer platform, provided with radiometric and geometric corrections along with orthorectification, which yields accurately geolocated imagery. The images are provided at Level-1C, meaning they are expressed as top-of-atmosphere reflectance. The Sentinel-2 imagery utilized in this study has 10-meter resolution, within the VNIR spectrum.

Digital Elevation Models
The DEM utilized in this research is a product derived from airborne digital photography acquired between 2010 and 2011, in an effort by the state government to map the State of Santa Catarina at a scale of 1:10,000. The photographs were processed as stereopairs in order to create DEMs and digital surface models (DSM). Given the photographs' pixel size of 0.39 metres, this process yielded a DEM and a DSM of 1-meter spatial resolution.

Waikato Environment for Knowledge Analysis - WEKA
We utilize the Waikato Environment for Knowledge Analysis, also known as WEKA, to train and execute a decision tree algorithm, which yields the classification scheme. WEKA is a suite of machine learning algorithms first created in 2000 by a team of researchers and scientists from the University of Waikato. WEKA is a popular platform for basic machine learning tasks, offering decision trees, naïve Bayes, support vector machines (SVM), neural networks, etc. (Witten, Frank and Hall, 2011). In this research, we utilize WEKA to train and execute the J48 algorithm, an open-source implementation of the canonical C4.5 developed by Quinlan and his associates (Quinlan, 1986).

ENVI 5.5 + IDL 8.7.2
ENVI + IDL software and the programming languages Python and IDL are used in this research to integrate machine learning libraries with raster processing workflows. ENVI and its IDL programming interface are used to classify raster data, as well as to execute decision tree classification schemes that transform continuous raster data into discrete classes.

METHODS
The methodology will be described in the following paragraphs.

Digital Image Processing of Sentinel-2 Imagery
This research utilizes spectral modelling of Sentinel-2 imagery to create customized spectral indices, designed to maximize the extraction of features such as perennial water bodies, streams and rivers. These indices were developed using traditional remote sensing methods and techniques and are based on canonical literature.

Spectral Indices
The normalized difference vegetation index (NDVI) first appeared in the scientific literature in 1973, in the work of Rouse et al. (1973) on monitoring vegetation conditions in the Great Plains of the United States of America. NDVI belongs to the vegetation indices (VIs), a group of spectral indices created to analyze and measure vegetation conditions from remotely sensed data. Other commonly used VIs include the Soil Adjusted Vegetation Index (SAVI) and the Enhanced Vegetation Index (EVI). For the most part, vegetation indices exploit the contrasting spectral signature of leaves in the red and near infrared wavelengths: in most cases, a healthy leaf tends to absorb incident red radiation and to reflect with higher intensity in the near infrared spectrum (Lillesand, Kiefer and Chipman, 2015). NDVI is utilized in this research to improve the extraction of open-sky water bodies. Formula 1 illustrates the equation used to produce NDVI from Sentinel-2 bands, where B4 is the red band and B8 is the near infrared band, both at 10 metre spatial resolution.
NDVI = (B8 - B4) / (B8 + B4)    (1)
The normalized difference water index (NDWI) was first introduced in remote sensing by Gao, in a novel approach for analyzing vegetation water content from remotely sensed observations (Gao, 1996). Gao's index models spectral responses at two near infrared channels, one centred at 0.86 µm and the other at 1.24 µm. A variation of Gao's index was proposed by McFeeters (1996), aiming to delineate water bodies from remotely sensed imagery. The difference between these two indices is that the first one is designed to analyze water content inside tree canopies, while the latter focuses on identifying open-sky water bodies such as lakes, rivers and streams. McFeeters' NDWI is the difference between green and near infrared reflectance divided by their sum. Formula 2 illustrates the equation used to produce NDWI from Sentinel-2 imagery, where B3 is the green band and B8 the near infrared band, both at 10 metre spatial resolution.
NDWI = (B3 - B8) / (B3 + B8)    (2)
The output of these formulas is a grey-scale raster of floating-point values ranging from -1 to 1. A coefficient of 0.13, determined empirically, is applied in MOD_NDWI, a modified version of NDWI, to remove noise such as shadows from trees and buildings. A combination of MOD_NDWI and NDVI can accurately and sharply identify water bodies as small as about 10 meters across, the limit imposed by Sentinel-2 spatial resolution.
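
As an illustration of the two index definitions above, the following minimal Python sketch computes NDVI and McFeeters' NDWI on small reflectance grids. The function names and the toy reflectance values are hypothetical. In particular, the way the 0.13 coefficient is used here (a simple cut-off on NDWI, combined with a negative-NDVI test) is an assumption made for the example, since the text does not specify how the coefficient enters MOD_NDWI.

```python
def normalized_difference(band_a, band_b):
    """Per-pixel (a - b) / (a + b); returns 0.0 where a + b is zero."""
    return [
        [(a - b) / (a + b) if (a + b) != 0 else 0.0 for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


def water_mask(ndwi_img, ndvi_img, ndwi_cutoff=0.13, ndvi_cutoff=0.0):
    """Flag a pixel as water when NDWI is above and NDVI is below its cut-off.

    ASSUMPTION: using 0.13 as an NDWI cut-off is illustrative only; the paper
    does not state how the coefficient is applied in MOD_NDWI.
    """
    return [
        [(w > ndwi_cutoff) and (v < ndvi_cutoff) for w, v in zip(row_w, row_v)]
        for row_w, row_v in zip(ndwi_img, ndvi_img)
    ]


# Toy 1 x 3 reflectance grids: open water, dense vegetation, dark shadow.
b3_green = [[0.08, 0.06, 0.030]]
b4_red = [[0.05, 0.04, 0.028]]
b8_nir = [[0.02, 0.40, 0.027]]

ndvi_img = normalized_difference(b8_nir, b4_red)    # Formula 1: (B8 - B4) / (B8 + B4)
ndwi_img = normalized_difference(b3_green, b8_nir)  # Formula 2: (B3 - B8) / (B3 + B8)
mask = water_mask(ndwi_img, ndvi_img)
print(mask)
```

In this toy case the shadow pixel has a slightly positive NDWI and a slightly negative NDVI, so it is only rejected by the NDWI cut-off, which is the kind of noise the empirical coefficient is meant to suppress.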

Digital Image Processing of Digital Elevation Models
Combined with the information obtained from processing Sentinel-2 imagery, we propose the utilization of DEMs and of by-products derived from them through image processing and modelling, which represent surface, morphometric, and hydrological processes. These by-products, here named attributes, contain valuable information that can be used by a classification scheme to identify given features on the surface of a study area.

Attribute Modelling
The following attributes were produced from the original DEM dataset for data mining, that is, for training the algorithm and generating a decision tree classification scheme. All attributes, both hydrological and morphometric, were produced with the TerraHidro plugin of the TerraView platform, an open-source GIS software library (Rosim et al., 2003).

Hydrologic Attributes
From a wide range of possible hydrological attributes, contributing area proved to be the most prominent, as it represents in a simple yet accurate manner the flow of water over a surface, from upstream to downstream. This attribute represents the total area of all the upstream cells that converge to a given cell downstream, following a flow-direction scheme represented as a local drainage direction (Rennó et al., 2008).
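
The notion of contributing area can be sketched with a simple steepest-descent (D8) routing scheme, in which each cell drains to its steepest downslope neighbour and passes its accumulated area downstream. This is only an illustration under stated assumptions. It is not the TerraHidro implementation, the function names are hypothetical, each cell counts itself in its own total, and pits and flat areas are left untreated.

```python
import math


def d8_receiver(dem, r, c):
    """Return (row, col) of the steepest downslope neighbour, or None if there is none."""
    rows, cols = len(dem), len(dem[0])
    best, best_slope = None, 0.0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                # Drop divided by distance (1 for edge neighbours, sqrt(2) for diagonals).
                slope = (dem[r][c] - dem[rr][cc]) / math.hypot(dr, dc)
                if slope > best_slope:
                    best, best_slope = (rr, cc), slope
    return best


def contributing_area(dem, cell_area=1.0):
    """Accumulate area downstream, visiting cells from highest to lowest elevation."""
    rows, cols = len(dem), len(dem[0])
    area = [[cell_area] * cols for _ in range(rows)]
    order = sorted(
        ((dem[r][c], r, c) for r in range(rows) for c in range(cols)), reverse=True
    )
    for _, r, c in order:
        receiver = d8_receiver(dem, r, c)
        if receiver is not None:
            area[receiver[0]][receiver[1]] += area[r][c]
    return area


# Toy 3 x 3 DEM (metres): a small valley draining to the bottom-centre cell.
dem = [
    [9.0, 8.0, 9.0],
    [7.0, 5.0, 7.0],
    [6.0, 2.0, 6.0],
]
area = contributing_area(dem)  # with 1 m cells, values are in square metres
for row in area:
    print(row)
```

Processing cells in descending elevation order guarantees that every cell has received all of its upstream area before it hands its own total to the cell below.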

Morphometric Attributes
Three morphometric attributes were chosen to integrate the stack of bands utilized in the data mining process described in Section 3.2.2. These attributes were chosen due to their significance in the classification produced by the decision tree algorithm. Formulas 3, 4, and 5 below illustrate their formulation.
These attributes were merged into a 4-band raster layer comprising the hydrological attribute contributing area and the three morphometric attributes described above. This 4-band raster forms the basis for the data mining, and its information is utilized by the decision tree algorithm to produce the classification scheme.
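
The merging of the four attributes into one multi-band layer, and the extraction of training samples from it, can be sketched as follows. This is a minimal illustration rather than the workflow actually used: the function names, the tiny rasters and the labelled field points are all hypothetical, and real rasters would normally be handled with a dedicated raster library.

```python
def stack_bands(*bands):
    """Merge equally sized single-band rasters into one raster of per-pixel tuples."""
    return [list(zip(*rows)) for rows in zip(*bands)]


def extract_samples(stack, labelled_points):
    """Pair the attribute vector found at each (row, col) with its field label."""
    return [(stack[r][c], label) for r, c, label in labelled_points]


# Toy 2 x 2 attribute rasters (invented values): contributing area and three curvatures.
ca = [[120.0, 5400.0], [300.0, 9800.0]]
mc = [[0.02, -0.15], [0.01, -0.22]]
pc = [[0.01, -0.08], [0.00, -0.11]]
vc = [[0.03, -0.05], [0.02, -0.09]]

stack = stack_bands(ca, mc, pc, vc)

# Hypothetical field observations: (row, col, "E" for ephemeral or "P" for perennial).
field_points = [(0, 0, "E"), (1, 1, "P")]
samples = extract_samples(stack, field_points)
print(samples)
```

Each sample is then a row in the training table handed to the decision tree learner, with the four attribute values as predictors and the field label as the class.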

Data Mining and Algorithm Training
We utilize the machine learning platform WEKA to create a decision tree with the J48 algorithm, designed to identify and map perennial streams, rivers and springs from the DEM provided by the State of Santa Catarina. Because two distinct data sources are combined, multispectral satellite imagery and high-resolution DEM, data mining can be split into two parts: the identification of open surface water bodies utilizing Sentinel-2 and modelled spectral indices, and the identification of perennial streams and springs utilizing DEMs and their by-products. Because the quality of the training samples is critical, they were obtained through image interpretation with the aid of georeferenced ground-truth data collected in situ during field campaigns. The aggregation of the spectral indices helped reduce the spectral correlation of the targets utilized in the training process, increasing the predictive power of the model and reducing error.
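
To give an idea of what a C4.5-style learner such as J48 does when it derives a threshold for a continuous attribute, the sketch below scans candidate cut points and scores each one by gain ratio. It is a simplified illustration and not WEKA's implementation: the use of the midpoint between neighbouring values as the threshold, the function names and the sample values are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain_ratio) of the best binary split 'value <= threshold'."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy(labels)
    best = (None, 0.0)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) / n) * entropy(left) - (len(right) / n) * entropy(right)
        split_info = entropy(["L"] * len(left) + ["R"] * len(right))
        ratio = gain / split_info
        if ratio > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, ratio)
    return best


# Invented training samples: contributing area (m2) with field labels.
ca_values = [120.0, 300.0, 450.0, 5000.0, 8000.0, 12000.0]
ca_labels = ["E", "E", "E", "P", "P", "P"]

threshold, ratio = best_threshold(ca_values, ca_labels)
print(threshold, ratio)
```

In a full tree learner this search is repeated for every attribute at every node, and the attribute with the best score is used to split the samples before recursing on each branch.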

RESULTS
By combining information from DEMs and Sentinel-2 imagery, we were able to obtain a representative drainage network that includes open surface water bodies as well as small streams and creeks. The latter would otherwise not be identified from the satellite imagery alone, given its spatial resolution (streams narrower than 10 meters) and the fact that streams may be hidden under tree canopies. Figure 2 illustrates a Sentinel-2 image modelled to MOD_NDWI and NDWI. The greyscale image in the background represents crude NDWI, whereas the images displayed in the red boxes depict an overlay of two layers: NDWI in shades of light blue, and MOD_NDWI in shades of yellow. Water bodies are clearly delineated using this technique, and the presence of noise is minimized when using empirical coefficients.

Classification of Perennial Stream Network
The decision tree produced by WEKA is illustrated in Figure 3. It is the result of the attribute modelling and data mining described in previous sections. The decision tree retains the attributes that best suit the classification scheme, i.e. it contains the threshold values identified as important for determining whether a stream is perennial or ephemeral. In the decision tree, E represents pixels that are to be classified as ephemeral and P as perennial, while CA, MC, PC and VC represent the attributes contributing area, mean curvature, plan curvature and vertical curvature, respectively. Arrows indicate differences existent in these two datasets.
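
Applying such a tree to the attribute raster amounts to walking each pixel from the root to a leaf. The sketch below shows one way this could be done. The tree structure and every threshold in it are hypothetical placeholders, not the values of Figure 3, and the function names are likewise assumptions.

```python
# HYPOTHETICAL tree, NOT the one in Figure 3. Each internal node is
# (attribute, threshold, branch if value <= threshold, branch otherwise);
# each leaf is "E" (ephemeral) or "P" (perennial). MC is unused in this toy tree.
TREE = ("CA", 2500.0,
        "E",
        ("PC", 0.0,
         "P",
         ("VC", 0.0, "P", "E")))


def classify_pixel(pixel, tree=TREE):
    """Follow the tree from the root until a leaf label is reached."""
    node = tree
    while not isinstance(node, str):
        attribute, threshold, low_branch, high_branch = node
        node = low_branch if pixel[attribute] <= threshold else high_branch
    return node


def classify_raster(raster, tree=TREE):
    """Turn a raster of continuous attribute values into discrete class labels."""
    return [[classify_pixel(pixel, tree) for pixel in row] for row in raster]


# Toy 1 x 3 raster; each pixel holds the four attribute values (invented).
raster = [[
    {"CA": 800.0, "MC": 0.01, "PC": 0.02, "VC": 0.01},
    {"CA": 9000.0, "MC": -0.20, "PC": -0.10, "VC": -0.05},
    {"CA": 4000.0, "MC": 0.10, "PC": 0.05, "VC": 0.04},
]]
classified = classify_raster(raster)
print(classified)
```

Keeping the tree as plain data means that the thresholds learned by the classifier can be substituted without touching the traversal code.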

Accuracy Assessment
The accuracy assessment for the estimated perennial drainage is illustrated below. The metrics are presented by means of confusion matrices, overall accuracy, and false positive and false negative testing. For clarification, SDS/SC represents the existing cartographic base for the State of Santa Catarina, which is regarded as the official state geographic base. Estimated drainage represents the drainage identified with the methodology described in this paper. Reference data was obtained from field work, observing a period of at least 7 days without precipitation to minimize the impact of surface runoff on the streams and rivers visited.
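
The 7-day dry-period rule for field visits can be checked against a daily rainfall record, as in the following minimal sketch. The function name, the rainfall values and the choice to treat a missing daily record as a failed check are assumptions made for this example.

```python
from datetime import date, timedelta


def is_valid_visit(visit_day, daily_rain_mm, dry_days=7, wet_threshold_mm=0.0):
    """True if each of the `dry_days` days before the visit had no recorded rain."""
    for k in range(1, dry_days + 1):
        day = visit_day - timedelta(days=k)
        if day not in daily_rain_mm:
            return False  # ASSUMPTION: a missing record cannot confirm a dry day
        if daily_rain_mm[day] > wet_threshold_mm:
            return False
    return True


# Invented rainfall record for February 2020: dry except for one event on the 10th.
rain = {date(2020, 2, d): 0.0 for d in range(1, 29)}
rain[date(2020, 2, 10)] = 12.5

ok_visit = is_valid_visit(date(2020, 2, 18), rain)     # checks 11 to 17 February
early_visit = is_valid_visit(date(2020, 2, 17), rain)  # window still includes the 10th
print(ok_visit, early_visit)
```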

Confusion Matrices
Results are presented in the plots below. Each count was obtained from points randomly distributed within the city limits. The determination of ephemeral or perennial is qualitative, i.e. whether there is streamflow or not. However, in many cases it was necessary to use an auger to verify streamflow underneath the topsoil. In other cases, streamflow was minimal to nonexistent upstream of a given point and resurged at another point downstream, revealing a perennial condition sustained by subsurface flow.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition)

Plotting Metrics: Assessment of Results
The overall accuracy, as well as an assessment of the false positives and false negatives identified for each municipality, is illustrated below.
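
For reference, the metrics named above can be derived from paired field and map labels as in the following sketch. The labels are invented, the function names are hypothetical, and the rate definitions used here (false positives over all reference-ephemeral points, false negatives over all reference-perennial points) are an assumption, since the text does not state which denominators were used.

```python
def confusion_counts(reference, predicted, positive="P"):
    """Count TP, FP, FN and TN, treating `positive` (perennial) as the positive class."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for ref, pred in zip(reference, predicted):
        if pred == positive:
            counts["TP" if ref == positive else "FP"] += 1
        else:
            counts["FN" if ref == positive else "TN"] += 1
    return counts


def summarize(counts):
    """Overall accuracy plus false positive and false negative rates."""
    total = sum(counts.values())
    negatives = counts["FP"] + counts["TN"]
    positives = counts["FN"] + counts["TP"]
    return {
        "overall_accuracy": (counts["TP"] + counts["TN"]) / total,
        "false_positive_rate": counts["FP"] / negatives if negatives else 0.0,
        "false_negative_rate": counts["FN"] / positives if positives else 0.0,
    }


# Invented validation points: field observation versus mapped class.
reference = ["P", "P", "P", "P", "P", "E", "E", "E", "E", "E"]
predicted = ["P", "P", "P", "P", "E", "E", "E", "E", "P", "P"]

counts = confusion_counts(reference, predicted)
metrics = summarize(counts)
print(counts)
print(metrics)
```

The same two functions can be run once with the estimated drainage and once with the SDS/SC base as the predicted labels, which gives directly comparable figures for each municipality.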

DISCUSSION
Having accurate, detailed information on the natural resources of an area under development is a top priority for decision makers and urban and territorial planners. For decades, remote sensing has provided a feasible yet accurate method for mapping in detail many aspects of the landscape, including land use and land cover, geology and mineralogy, water resources, vegetation cover and plant health. Machine learning can leverage data analysis and data extraction by combining powerful computational capacity with state-of-the-art algorithms designed to, among many other things, sort and classify data. This research proposes a method for identifying and mapping the perennial drainage network over four municipalities in Santa Catarina State. Perennial drainages are protected by law and have a buffer zone designed to maintain geologic stability and water quality; thus, it is vital that this type of drainage is mapped accurately throughout the State of Santa Catarina. The results indicate that the proposed methodology can be applied for mapping, with some degree of confidence, perennial drainage in the form of streams and water bodies, though not necessarily in more detail than the official base. The proposed methodology can be of great value for identifying and mapping water-related APPs in all 295 municipalities of Santa Catarina. This information would greatly benefit territorial and urban planners, helping to prevent illegal occupation of such areas and to preserve natural resources. The methodology is almost entirely based on freely available, existing datasets, such as Sentinel-2 imagery and the high-resolution DEM provided by SDS/SC. The existing cartographic base on water resources has a great deal of value, as it represents springs, streams, and rivers in detail. However, this base does not have the level of accuracy necessary for unrestricted use, as the in-situ validation shows. 
The biggest issue with the SDS/SC base lies in determining the flow regime of its mapped streams and springs, as shown by the high percentage of false negatives, which indicate perennial streamflow in locations where the official base did not map it. This methodology has limitations, however. First, it integrates two datasets of different nature: DEMs and satellite imagery. Although both are represented as raster surfaces, their spatial resolutions differ, so the final cartographic scale of the product would have to be that of the coarser one, the satellite imagery. Moreover, the DEM dataset is the product of an aerial survey carried out in 2010 and 2011, so there will be differences between the surface represented in the DEM and the real-world surface today, even though topography changes more slowly than, for instance, vegetation phenology.
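
The resolution mismatch can be made concrete with a small sketch that brings a fine-resolution drainage mask onto a coarser grid by block aggregation. This is one possible approach under stated assumptions and not part of the method described here: the function name and the rule that a coarse cell is flagged when any fine cell inside it is flagged are choices made for the example. A factor of 10 would correspond to moving from the 1-meter DEM grid to the 10-meter Sentinel-2 grid, while the toy case uses a factor of 2.

```python
def aggregate_any(mask, factor):
    """Coarsen a boolean raster: a block is True if any fine cell in it is True."""
    rows, cols = len(mask), len(mask[0])
    coarse = []
    for r0 in range(0, rows, factor):
        coarse_row = []
        for c0 in range(0, cols, factor):
            block = [
                mask[r][c]
                for r in range(r0, min(r0 + factor, rows))
                for c in range(c0, min(c0 + factor, cols))
            ]
            coarse_row.append(any(block))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 fine-resolution drainage mask (True = perennial drainage pixel).
fine_mask = [
    [False, True, False, False],
    [False, False, False, False],
    [False, False, False, False],
    [False, False, False, True],
]
coarse_mask = aggregate_any(fine_mask, 2)
print(coarse_mask)
```

The "any" rule preserves narrow streams at the cost of widening them to a full coarse cell, which is one reason the final product cannot claim a scale finer than that of the coarser dataset.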

CONCLUSIONS
This paper summarizes a pilot project developed between Celesc S.A and Caruso, a public-private partnership established by means of an R&D project designed to develop a new method for identifying perennial drainage, which is vital for determining areas of permanent protection under the Brazilian Forest Code (Law 12.651/12). We propose a method utilizing geographic information systems and machine learning algorithms to extract accurate, up-to-date cartographic information on water resources for the study area, which comprises four municipalities in Santa Catarina state. The research also evaluated and compared the results obtained from the proposed method with official cartographic bases, using in-situ validation. The proposed methodology could be an alternative for obtaining accurate information on water resources, which could be used to enrich the existing official cartographic base. Determining the streamflow regimes of water bodies, streams and rivers directly impacts land management and planning, such as deciding whether or not an area should be considered of permanent protection. Further research is necessary to identify threshold values for the decision tree classification scheme, as the performance of the classification will directly impact the existence and dimension of APPs. Data mining is an interactive process, and the better the miner, the better the results. Other attributes could also improve the classification accuracy by taking into consideration other aspects of the landscape. The DEM utilized in this research is a by-product of stereoscopic techniques applied to digital aerial photography, thus its representation may contain errors due to geometry and technical issues. A LiDAR (light detection and ranging) DEM is likely to yield better results, especially when generating hydrologic attributes, as LiDAR point clouds tend to better represent the imaged surface.