AUTOMATED ROAD BREACHING TO ENHANCE EXTRACTION OF NATURAL DRAINAGE NETWORKS FROM ELEVATION MODELS THROUGH DEEP LEARNING

High-resolution (HR) digital elevation models (DEMs), such as those at resolutions of 1 and 3 meters, have become increasingly available, along with lidar point cloud data. In a natural environment, a detailed surface water drainage network can be extracted from an HR DEM using flow-direction and flow-accumulation modeling. However, elevation details captured in HR DEMs, such as roads and overpasses, can form barriers that incorrectly alter flow accumulation models and hinder the extraction of accurate surface water drainage networks. This study tests a deep learning approach to identify the intersections of roads and stream valleys, so that valley channels can be burned through road embankments in an HR DEM for subsequent flow accumulation modeling and proper natural drainage network extraction.


INTRODUCTION
Aside from cartographic purposes, accurate hydrographic data are an important component of hydrologic modeling, ecosystem analysis, and flood forecasting, among several other important tasks (Poppenga, Gesch, and Worstell, 2013). The National Hydrography Dataset (NHD) is a comprehensive vector database of surface water features for the United States that is managed by the U.S. Geological Survey (USGS) and partner organizations, including the Environmental Protection Agency and various state and private organizations (U.S. Geological Survey, 2000). Within the conterminous United States, the high-resolution (HR) NHD is a multi-scale data set of hydrographic features compiled from the best available data sources at scales of 1:24,000 or larger. Where data have been collected recently, HR NHD content is accurate and up to date, but in many areas the content must be updated to better support hydrologic applications.
Improved computational power and geoprocessing capabilities are enabling a growing number of geospatial applications that use HR geospatial data, such as the NHD. Updating and maintaining the HR NHD to support such applications is a challenging task. This paper describes methods that are being tested to automatically update the HR NHD using HR data that are either already available or being collected for the entire country. HR data sets useful for updating the HR NHD include lidar point clouds, lidar-derived HR bare-earth DEM data at 1- to 3-meter (m) cell resolution, and HR image data.
Flow-direction and flow-accumulation modeling can furnish surface-water drainage lines from DEM data (O'Callaghan and Mark, 1984; Jenson and Domingue, 1988; Tarboton, Bras, and Rodriguez-Iturbe, 1991; Montgomery and Foufoula-Georgiou, 1993; Maidment, 2002; Passalacqua, Tarolli, and Foufoula-Georgiou, 2010; Passalacqua, Belmont, and Foufoula-Georgiou, 2012). Given proper validation, such methods can help update HR NHD content (Poppenga, Gesch, and Worstell, 2013). However, the added detail in HR DEM data includes features such as roads and bridges that obstruct flow accumulation routes and divert them from the natural drainage pattern (Poppenga et al., 2010; Lindsay and Duhn, 2015; Yuan et al., 2017). Existing vector transportation or other data can be used to automatically breach elevation models where embankments exist over features such as culverts or bridges, thereby improving subsequently derived drainage models (Wall et al., 2015; Maderal et al., 2016). However, these methods rely on accurate and complete transportation data, which are not available everywhere in the United States, particularly for unpaved roads in rural areas. Alternatively, automated methods exist to detect and breach infrastructure embankments in a DEM prior to extracting hydrographic features, but these methods are not exact and provide only a partial solution (Poppenga et al., 2010; Lindsay and Duhn, 2015).
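The blocking effect of an embankment can be illustrated with a minimal single-flow-direction (D8) accumulation sketch in the spirit of O'Callaghan and Mark (1984). This is not the software used in this study: the function names, the 7 x 5 synthetic DEM, and the one-cell "breach" are hypothetical simplifications, and the sketch does no depression filling or flat resolution. Each cell drains to its steepest-descent neighbor; a raised row of cells acting as a road embankment turns the valley floor upstream of it into a false sink, and lowering a single embankment cell back to the valley profile restores drainage to the outlet.

```python
import numpy as np

# D8 neighbor offsets (row, col): the eight cells surrounding a center cell
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_receivers(dem):
    """Flat index of each cell's steepest-descent neighbor, or -1 for a sink."""
    rows, cols = dem.shape
    recv = np.full(dem.size, -1, dtype=int)
    for r in range(rows):
        for c in range(cols):
            best = 0.0
            for dr, dc in OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    # Drop per unit distance (diagonal neighbors are sqrt(2) away)
                    slope = (dem[r, c] - dem[rr, cc]) / np.hypot(dr, dc)
                    if slope > best:
                        best = slope
                        recv[r * cols + c] = rr * cols + cc
    return recv


def flow_accumulation(dem):
    """Count of cells (including the cell itself) that drain through each cell."""
    recv = d8_receivers(dem)
    acc = np.ones(dem.size)
    # Visit cells from highest to lowest so every donor is final before its receiver
    for idx in np.argsort(dem, axis=None)[::-1]:
        if recv[idx] >= 0:
            acc[recv[idx]] += acc[idx]
    return acc.reshape(dem.shape)


# Hypothetical 7 x 5 DEM: a valley along column 2 that descends toward row 6
r, c = np.mgrid[0:7, 0:5]
valley = 10.0 - r + np.abs(c - 2)
blocked = valley.copy()
blocked[3, :] = 20.0             # road embankment crossing the valley
breached = blocked.copy()
breached[3, 2] = valley[3, 2]    # cut the channel back through the embankment

print(flow_accumulation(blocked)[6, 2])   # 20.0: the 15 upstream cells end in a false sink
print(flow_accumulation(breached)[6, 2])  # 35.0: the whole valley reaches the outlet
```

Real breaching tools choose the cut path and depth more carefully, but the toy case shows why an unbreached embankment truncates the upstream drainage network.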
Another possible solution is to extract road features from image or other data. Clode et al. (2007) demonstrated a workflow that uses a hierarchical classification technique to extract vector road features from lidar point cloud data, and reported classification accuracy values in the 75 to 85 percent range. Samadzadegan, Bigdeli, and Hahn (2009) tested a variety of combinations of classification techniques on lidar elevation and intensity data to delineate roads in an urban area; the best accuracy was about 87 percent. Sameen and Pradhan (2017) applied a deep convolutional neural network composed of two networks, an encoder and a decoder, to delineate road features in very high resolution (13-cm) 3-band orthophotos. In this case, the network used the Exponential Linear Unit (ELU) activation function and was optimized with the Stochastic Gradient Descent (SGD) algorithm, producing a road extraction accuracy of 88.5 percent (Sameen and Pradhan, 2017).
Deep learning, or hierarchical learning, is a class of machine learning that applies statistical techniques to learn feature patterns in data, such as multiple raster data layers, allowing subsequent identification of similar patterns in other areas within similar data sets. More specifically, deep learning is the application of an artificial neural network (ANN) that uses more than one hidden layer of nodes (or neurons) to recognize patterns in data (Buscombe and Ritchie, 2018). Wang et al. (2016) reviewed various techniques for extracting roads from remotely sensed images and described an ANN as a supervised classification method, inspired by biological neural systems, that uses a computational model composed of a network of connected nodes (or neurons). A full description of ANN methods is provided by Basheer and Hajmeer (2000).
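To make the notion of "more than one hidden layer" concrete, the following NumPy sketch passes a batch of inputs through a small fully connected ANN with two hidden layers of ReLU nodes and a softmax output. The layer sizes, the three output classes, and the random, untrained weights are illustrative assumptions only; this is not the CNN tested in this paper, which adds convolution and pooling layers and is trained with TensorFlow and Keras.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: passes positive values, zeroes negative ones."""
    return np.maximum(x, 0.0)


def softmax(z):
    """Convert each row of scores into class probabilities."""
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def forward(x, layers):
    """Pass inputs through the hidden ReLU layers, then a softmax output layer."""
    *hidden, (w_out, b_out) = layers
    for w, b in hidden:
        x = relu(x @ w + b)  # every node is connected to every node in the previous layer
    return softmax(x @ w_out + b_out)


rng = np.random.default_rng(0)
sizes = [4, 8, 8, 3]  # hypothetical: 4 inputs, two hidden layers of 8 nodes, 3 classes
layers = [(rng.normal(size=(m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]
probs = forward(rng.normal(size=(5, 4)), layers)
print(probs.shape)  # (5, 3): one probability per class for each of 5 inputs
```

Training would adjust the weight matrices so that these probabilities match labeled examples; that step is omitted here.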
In this paper, we test the use of deep learning methods to extract road and drainage valley features from elevation data. The learning algorithm is trained using a set of existing roads and stream valleys, and the model is then used to predict where all roads and stream valleys exist in the elevation data. If the results are satisfactory, the extracted intersections of roads and stream valleys could be used to breach embankments at these locations using a least-cost approach similar to the methods of Poppenga et al. (2010) and Lindsay and Duhn (2015). In addition, the extracted roads may be used to update the national transportation database.
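Locating candidate breach sites from predicted road and valley rasters can be sketched as an overlap test followed by connected-component grouping. In this minimal illustration the function names `dilate` and `crossing_sites`, the one-cell dilation tolerance, and 8-connectivity are all assumptions; it only returns the centroid of each road/valley crossing and does not perform the least-cost breach itself.

```python
from collections import deque

import numpy as np


def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) square window, using only NumPy."""
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    rows, cols = mask.shape
    out = np.zeros(mask.shape, dtype=bool)
    for dr in range(2 * radius + 1):
        for dc in range(2 * radius + 1):
            out |= padded[dr:dr + rows, dc:dc + cols]
    return out


def crossing_sites(road, valley, radius=1):
    """Return (row, col) centroids of 8-connected road/valley overlap patches."""
    # Dilate both masks slightly so thin or offset predictions still overlap
    overlap = dilate(road, radius) & dilate(valley, radius)
    seen = np.zeros(overlap.shape, dtype=bool)
    rows, cols = overlap.shape
    sites = []
    for r0, c0 in zip(*np.nonzero(overlap)):
        if seen[r0, c0]:
            continue
        # Breadth-first search collects one connected patch
        queue, cells = deque([(r0, c0)]), []
        seen[r0, c0] = True
        while queue:
            r, c = queue.popleft()
            cells.append((r, c))
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and overlap[rr, cc] and not seen[rr, cc]:
                        seen[rr, cc] = True
                        queue.append((rr, cc))
        rs, cs = zip(*cells)
        sites.append((float(np.mean(rs)), float(np.mean(cs))))
    return sites


road = np.zeros((11, 11), dtype=bool)
road[5, :] = True              # an east-west road
valley = np.zeros((11, 11), dtype=bool)
valley[:, 3] = True            # two north-south valleys
valley[:, 8] = True
sites = crossing_sites(road, valley)
print(sites)  # [(5.0, 3.0), (5.0, 8.0)]
```

Each returned centroid marks where a breach path across the embankment would be sought.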

METHODS
In this work, we apply a deep neural network approach using TensorFlow™, an open source machine learning software library released by Google for high-performance computational research. Several raster datasets derived from HR DEM data, including slope, aspect, curvature, and topographic position index (TPI), are being investigated, but the initial tests described in this paper focus on TPI. TPI is the difference between a cell's elevation and the mean elevation within a specified radius or surrounding window of cells (De Reu et al., 2013). TPI exaggerates local lows and highs in a DEM relative to nearby topographic features (valley cells take negative values and ridge cells positive values), and thus accentuates ridges and valleys. For this work, TensorFlow™ is accessed through Python and the Keras application programming interface (API). Data processing is completed on a 12-node Linux cluster, with each node having 20 processing cores and 128 gigabytes of RAM. Data are stored on a parallel shared Lustre file system on a high-speed InfiniBand network, which provides rapid access to files.
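The TPI definition can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the processing used in this study: the hypothetical `tpi` function uses a square window that includes the center cell and pads the raster edges by replicating border values, whereas implementations following De Reu et al. (2013) may use circular or annular neighborhoods. The default 9x9 window matches the one used for the Panther Creek TPI raster.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def tpi(dem, window=9):
    """Topographic position index: elevation minus the local mean elevation.

    Assumptions: square window x window neighborhood that includes the center
    cell; edges handled by replicating border values (np.pad mode="edge").
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centered on a cell")
    half = window // 2
    padded = np.pad(dem.astype(float), half, mode="edge")
    # Shape (rows, cols, window, window): one neighborhood per DEM cell
    neighborhoods = sliding_window_view(padded, (window, window))
    local_mean = neighborhoods.mean(axis=(-2, -1))
    return dem - local_mean


# Synthetic V-shaped valley: elevation rises away from column 15
x = np.arange(31, dtype=float)
v_valley = np.tile(np.abs(x - 15), (31, 1))
print(tpi(v_valley)[15, 15] < 0)  # True: the valley floor sits below its local mean
```

A nodata mask for cells outside the watershed would be needed in practice; it is omitted here for brevity.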

Study Area and Data
The Panther Creek watershed, NHD 10-digit Hydrologic Unit Code (HUC) watershed 0710000708, was selected for initial testing. This watershed encompasses roughly 170 square kilometers in a low-relief agricultural area in central Iowa, within the Eastern Great Plains Ecological Division (Comer et al., 2003). Topographic data for this study are derived from airborne lidar point cloud data from the USGS 3D Elevation Program (3DEP). Lidar data for this site are Quality Level 3 [> 0.5 aggregate nominal pulse (pls) density, in pls per square meter (m²); Heidemann, 2018] and were collected in 2008. A DEM with a 3-m nominal cell size was derived from the lidar data. Road embankments and bridges were manually breached to create continuous valleys in the DEM wherever such intersections could be clearly distinguished. Within the Panther Creek watershed, elevation ranges from 266.6 to 330.3 m, and slope has a mean of 3.7 percent rise and a standard deviation of 5.5 percent rise. A TPI raster dataset was computed from this DEM using a 9x9-cell window (Figure 1). TPI values within the watershed range from -4.47 to 3.54 m.
Training data for the neural network are generated from vector features from the USGS National Map. The Census roads from the transportation layer are used to train for road patterns, and the HR NHD flowline features were initially used to train for stream valley patterns (Figure 1). However, as can be seen in Figure 1, the HR NHD flowlines do not precisely follow the stream valleys in the elevation data, and imprecise selection of training patterns can adversely affect classification results. Therefore, a set of elevation-derived drainage lines was extracted from the 3-m DEM using the open source GeoNet tools (Passalacqua et al., 2010; Sangireddy et al., 2016), with a 50,000-cell threshold for forming the flow accumulation skeleton. As expected, the resulting network included some erroneous drainage lines caused by road and bridge embankments where these were not breached in the DEM. To eliminate improper drainage lines from the training vectors, the extracted drainage lines were automatically conflated to the HR NHD flowlines using a Coefficient of Line Correspondence (CLC) tool (Stanislawski et al., 2015). The CLC process uses a raster line-density differencing technique to estimate which linear features in one dataset match or mismatch the linear features in another dataset, where both datasets represent a similar set of features. In this case, matching lines are those within areas that are 95 percent likely to have the same line density in both 10-m resolution line-density raster datasets. Subsequently, the elevation-derived drainage lines that match the HR NHD flowlines were used to train for stream valleys (Figure 2). Although this step was not completed for this preliminary test, selection of the training set of drainage lines could be further improved by removing features within a buffer around road features, where erroneously extracted drainage lines may still exist.

Figure 1. Topographic Position Index for the Panther Creek watershed in central Iowa. Census road lines and high-resolution National Hydrography Dataset flowlines are overlain in red and bright cyan, respectively. In the large-scale panel on the right, roads are manually breached to connect valleys in the DEM.

Figure 2. Topographic Position Index for the Panther Creek watershed. Census road lines are overlain in red. Drainage lines derived from 3-m resolution elevation data that match the high-resolution NHD flowlines are overlain in bright cyan.

Deep Learning Test
An initial workflow for a convolutional neural network (CNN) was tested to learn and predict road and drainage valley patterns from a single layer: the TPI layer. The workflow begins by rasterizing the road and stream valley vectors to the same resolution as the TPI (3 m). Following this, 500 pixels are randomly selected from each of the road and stream raster datasets. An additional 500 pixels are randomly sampled from areas that correspond to neither roads nor streams. All sampled pixels are more than 20 pixels from the edge of the watershed.
For each sampled pixel, a 41x41-cell window, centered on the sample pixel, is extracted from the TPI layer to form a sample pattern. Three additional windows are generated by rotating the extracted window 90, 180, and 270 degrees. Thus, a total of 2,000 sample patterns are generated for each feature type (road, stream, and other), making a total of 6,000 sample patterns. The 41x41-pixel window was deemed an appropriate size to represent the target features based on visual interpretation. The 6,000 sample windows are used to train, validate, and test a CNN: two thirds of the samples are used for training, and the remaining third is split in half between validation and testing. Two CNNs are constructed, one for each relevant feature type (roads and valleys). These models consist of two convolution-pooling pairs, followed by a 1,024-node dense neural network layer and single-class classification via softmax, which is trained under binary cross-entropy. Convolution layers use a rectified linear unit (ReLU) activation function to identify smaller, more recognizable patterns in each window, and pooling layers aggregate the convolution outputs over neighboring cells. The validation patterns are used to tune the two CNNs. The accuracy of each CNN is measured by comparing predicted values with actual values for the 1,000 test patterns. After training and testing, the models are used to predict road, valley, and other pixels for the entire TPI dataset.

PRELIMINARY RESULTS AND DISCUSSION
Preliminary results predicting road and valley pixels for the study area from TPI using the tested CNN are shown in Figure 3. This model correctly predicts more than 90 percent of the test pixels. A visual comparison of the Census roads and NHD flowlines (Figures 1 and 2) to the predicted roads and valleys (Figure 3) generally indicates good agreement between the training networks and the predicted networks. Some obvious anomalies exist in the predicted data, such as the swaths of missing predicted cells in the southern section of the watershed and the scattered small clusters of road or valley cells away from the main paths. Furthermore, the road and valley paths in the predicted pixels are wider than in the training data, but a thinning and vectorization technique, such as that described by Zhan (1993), should furnish more precise feature delineations. This cursory review of the initial deep learning test has furnished promising results. However, testing of several model enhancements is underway, and the results will be thoroughly assessed.

Figure 3. Road and stream valley pixels for the Panther Creek watershed in central Iowa predicted from Topographic Position Index using a convolutional neural network. In the large-scale panel on the right, Census road lines and high-resolution National Hydrography Dataset flowlines are overlain in red and bright cyan, respectively.

Aside from automating the thinning and vectorization process, several tasks remain to be tested that could improve results. Adding data layers to the model, such as slope, aspect, curvature, and high-resolution image data, could substantially enhance these results. Slope, aspect, and curvature are easily derived from the DEM, and 1-m resolution National Agriculture Imagery Program (NAIP) images are readily available from USGS. Furthermore, adjustments to the configuration of the CNN could be tested, along with adjustments to the window size for training patterns and the use of additional techniques to refine the selection of training sample pixels.
Finally, the Panther Creek watershed covers roughly 170 square kilometers with varying conditions for road and valley patterns. Subdividing the watershed into smaller partitions could limit the variability of the data and produce training patterns that are more precise within each partition. A process is needed to subdivide the datasets in a manner that retains sufficient training data in each partition.

REFERENCES
... Korean Journal of Remote Sensing, 33(4), pp. 423-436. doi.org/10.7780/kjrs.2017.33.4.8.
Sangireddy, H., Stark, C.P., Kladzyk, A., and Passalacqua, P., 2016. GeoNet: An open source software for the automatic and objective extraction of channel heads, channel network, and channel morphology from high resolution topography data, ...
Tarboton, D.G., Bras, R.L., and Rodriguez-Iturbe, I., 1991. On the extraction of channel networks from digital elevation data. ...
... /nhd.usgs.gov/chapter1/chp1_data_users_guide.pdf.
Wall, J., Doctor, D.H., and Terziotti, S., 2015. A semi-automated tool for reducing the creation of false closed depressions from a filled LIDAR-derived digital elevation model. Proceedings of the 14th Multidisciplinary Conference on Sinkholes and the Engineering and Environmental Impacts of Karst, pp. 255-262, Rochester, MN, 5-9.
Wang, W., Yang, N., Zhang, Y., Wang, F., Cao, T., and Eklund, P., 2016. A review of road extraction from remote sensing images. ...
Yuan, F., Larson, P., Mulvihill, R., Libby, D., Nelson, J., Grupa, T., and Morre, R., 2017. Mapping and analysing stream network changes in Watonwan River watershed, Minnesota, USA. International Journal of Geo-Information, 6(369), 20 pp.
Zhan, C., 1993. A hybrid line thinning approach. Proceedings of Autocarto and American Society for Photogrammetry and Remote Sensing Conference, Bethesda, MD, ...