USING DEEP LEARNING AND HOUGH TRANSFORMATIONS TO INFER MINERALISED VEINS FROM LIDAR DATA OVER HISTORIC MINING AREAS

This paper presents a novel technique to improve geological understanding in regions of historic mining activity. This is achieved through inferring the orientations of geological structures from the imprints left on the landscape by past mining activities. Open source high resolution LiDAR datasets are used to fine-tune a deep convolutional neural network designed initially for Lunar LiDAR crater identification. By using a transfer learning approach between these two very similar domains, high accuracy predictions of pit locations can be generated in the form of a raster mask of pit location probabilities. Taking the raster of the predicted pit location centres as an input, a Hough transformation is used to fit lines through the centres of the detected pits. The results demonstrate that these lines follow the patterns of known mineralised veins in the area, alongside highlighting veins which are below the scale of the published geological maps.


INTRODUCTION
Detection of geological lineaments is a significant part of regional geological analysis, providing information on local geological structures. Lineaments are a broad category of features, corresponding to mappable linear surface features which may represent a subsurface phenomenon (O'Leary et al., 1976). Traditionally, lineaments were digitised manually from airborne and spaceborne optical imagery or airborne geophysics; however, these methods are time consuming, subjective and potentially unreliable (Masoud and Koike, 2017). In addition to the time and subjectivity issues, in many climates direct fault mapping is challenged by a lack of exposed surface rocks across large geographical extents (Yeomans et al., 2019). To address these issues, much research has been focused on developing semi-automatic methods for lineament detection, from early methods using potential field data (Blakely and Simpson, 1986) to modern MATLAB based toolboxes (e.g. TecLines; Rahnama and Gloaguen, 2014). Semi-automated methods historically have had difficulties with roads and field boundaries, along with vegetation obscuring the ground surface in optical imagery. Using LiDAR data instead of optical data can overcome some of these issues, as shown in Grebby et al. (2012).
In many areas of the world, particularly in post-industrialised nations, the marks of historic mining activity are still visible on the landscape. Rather than using the natural geomorphology to map the structural geology to infer the mineralisation, it may also be possible to infer the mineralisation directly from the mining remains. Furthermore, in some cases data on mine workings and mineralised structures may be lost; therefore, methods such as this can add value. This method also could be used to search along strike for potential shafts that may have been covered or undetected. This paper presents a novel methodology which uses deep learning to detect historic mining remains from LiDAR data, prior to semi-automatically fitting lineaments in the area to infer potentially mineralised features. Herein, we summarise related work that utilises semi-automatic lineament detection and deep learning methodologies. The geology and mining history of the study area in the Dartmoor National Park is briefly outlined prior to detailing the algorithm and processing steps, concluding with the results and recommendations for further work.

RELATED WORKS
Semi-automatic lineament detection approaches primarily follow a processing workflow of data representation, image enhancement, edge extraction and edge connection (Šilhavý et al., 2016, Masoud and Koike, 2017). The input data format can be an image from an optical multispectral satellite sensor (Soto-Pinto et al., 2013, Rahnama and Gloaguen, 2014), a multiview hillshade from a Digital Elevation Model (DEM) (Šilhavý et al., 2016, Masoud and Koike, 2017), a principal curvature image generated from a DEM (Bonetto et al., 2015) or a tilt derivative image generated from airborne geophysics and LiDAR data (Middleton et al., 2015, Yeomans et al., 2019). The input image is then pre-processed to improve its characteristics for edge detection. The techniques used here vary based on the input raster type. Linear features are detected using either object-based image analysis (Middleton et al., 2015, Yeomans et al., 2019), Canny edge detectors (Mallast et al., 2011), the Random Sample Consensus (RANSAC) algorithm (Bonetto et al., 2015) or variants of the Hough transform. The Hough transform is an image processing method for detecting lines, originally proposed by Hough (1962) and described in the context of lineament detection by Wang and Howarth (1990). It is robust to line gaps and noise, making it the algorithm of choice for lineament detection in many geological toolboxes such as ADALGEO (Soto-Pinto et al., 2013) and TecLines (Rahnama and Gloaguen, 2014). In general, following the line extraction, the approaches employ some form of post-processing to improve segment connectivity and reduce noise.
Historic mine workings can cause problems with traditional semi-automated methods due to the anthropogenic modification of the land surface and their lack of linearly connected features. A deep learning based method can therefore help to refine the lineament detection. Deep learning techniques for image processing have advanced rapidly in the last decade, fuelled by increases in processing power and available training datasets. A type of deep neural network, the Convolutional Neural Network (CNN), has become the dominant choice for most image processing tasks (Razavian et al., 2014). Ball et al. (2017) and Zhang et al. (2016) review the applications of these deep learning models to remote sensing problems. However, they conclude that applications using LiDAR data, either applied directly to the point cloud or using an image-like gridded representation, are less frequently studied than applications based on optical data. Many of the published LiDAR based remote sensing applications come from archaeology, where LiDAR is a widely used data source for both human interpretation of heritage landscapes (Bewley et al., 2005, Hesse, 2010, Moyes and Montgomery, 2019) and semi-automated site detection based on template matching or traditional "shallow" machine learning methods (Freeland et al., 2016, Sevara et al., 2016, Guyot et al., 2018). Despite the often simple geometry of the sites to be detected, the accuracies of these methods generally cannot approach human levels (Verschoof-van der Vaart and Lambers, 2019). To attempt to improve performance, several recent studies have begun to examine how deep learning methodologies could be used (Trier et al., 2019, Verschoof-van der Vaart and Lambers, 2019).
The primary difficulty encountered with most remote sensing deep learning studies is the lack of large domain specific datasets available for model training. Nogueira et al. (2017) give an overview of the relative merits of training a CNN model for remote sensing from scratch versus fine-tuning an existing model. Fine-tuning is most effective when the source and target domains are similar; therefore, a model which has been trained on the ImageNet database (Deng et al., 2009) of three channel colour images can successfully be fine-tuned for optical three channel colour satellite images (Ren et al., 2018). As LiDAR data is single channel height information, it can be more challenging to fine-tune successfully from models trained on colour imagery (Ball et al., 2017, Verschoof-van der Vaart and Lambers, 2019). A solution may be found in the planetary and space science field, where large LiDAR datasets such as those from the Lunar Reconnaissance Orbiter (Zuber et al., 2010) can be combined with existing human annotated crater catalogues to generate greater amounts of training data. Silburt et al. (2019) built and trained a successful U-Net based model from these datasets, and published the fully trained model, named DeepMoon, on GitHub. U-Net was designed originally for biomedical image segmentation by Ronneberger et al. (2015). U-Nets are a popular model architecture choice for problems with limited training data and have achieved good results on remote sensing problems (Bai et al., 2018, Zhao et al., 2019, Jeppesen et al., 2019).

MATERIALS
The case study area chosen for this research is Dartmoor National Park, an upland area of moorland in the southwest of the UK. The predominant vegetation cover is heather, fern, bracken, gorse and marsh grasses. The area has been mined for tin and copper almost continuously from the 12th to the 20th centuries and the remains are pervasive and visually striking throughout the landscape (Newman, 2010). The types of object to be detected were trial pits, mineshafts, and shallow pit workings. These mining remains are often overgrown and can pose a hazard to humans and livestock. Figure 1 shows how these objects present in the LiDAR data. Dartmoor National Park is underlain by the Dartmoor Granite pluton, which is the largest granite pluton exposed at surface (650 km²) within the Early Permian Cornubian Batholith (Scrivener, 2006). The granite is characterised by its peraluminous geochemistry and K-feldspar megacrysts (Simons et al., 2016). The area is variably mineralised and southern Dartmoor is known for tin veins of "black tin" or cassiterite (Dines, 1956). The test area for this study is focussed over the Hexworthy Mine (an amalgamation of Hootens Wheals and Hensroost mines) where the main vein structures trend approximately NNW and subordinate veins course ESE-WNW (Dines, 1956). The area shows demonstrable surface workings and provides an ideal case study site.
The open access LiDAR data used in this study was sourced from the Environment Agency (https://environment.data.gov.uk/). It has a resolution of 0.5 m and is provided as 1 km x 1 km ASCII grid tiles in either Digital Surface Model (DSM) or Digital Terrain Model (DTM) format (Environment Agency, 2009). The DSM was chosen instead of the DTM as upland short sward vegetation is difficult to distinguish from bare earth in 0.5 m LiDAR data (Luscombe et al., 2015), creating challenges for the filtering algorithms. The training, cross-validation and test datasets were generated from three geographically separate areas of Dartmoor National Park, each with high incidences of mining remains. Eleven 1 km² tiles were used for training, with a single tile each for cross-validation and testing. The study area and dataset extents are shown in Figure 2.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)

METHODS
The pipeline proposed in this research contains two modules: the first module detects mining pits using deep learning and the second module fits mineralisation trends to these detections using a Hough transform. Figure 3 shows the processing pipeline. The training data was generated by creating a multilayer GIS for each area which included historic maps, aerial imagery and multiple visualisations of the LiDAR data. These layers were used to aid the human interpreter in manually digitising a dataset of over 1,500 mining pits. The data was then exported as 256 x 256 pixel image patches with the pit locations as corresponding .xml labels. The images were exported with 52% overlap to avoid losing pits at the boundaries of images and to ensure every grid square had two prediction values. As the processing area is large and the elevation differences are often subtle, the images were initially exported with their full 16-bit float values for each pixel despite the CNN model requiring 8-bit inputs. The image patches were then rescaled individually to the 0-1 range before being remapped to 8-bit integer format. Different LiDAR data visualisations have been shown to greatly enhance human interpretation (Kokalj and Somrak, 2019). Following the workflows described in Kokalj and Hesse (2017), additional data representations were generated from the exported tiles using the Relief Visualisation Toolbox (Kokalj and Somrak, 2019): Simplified Local Relief Models (SLRM) and the measures of positive and negative relief openness, calculated as the angular size of a sphere looking either up or down at each pixel location (Doneus, 2013). Figure 4 shows how these different visualisations look to a human. Due to the limited training data for this specific problem, the deep learning strategy chosen is fine-tuning an existing model from a similar domain.
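The patch export and per-patch rescaling described above can be illustrated with a minimal sketch. It is not the export code used in this study: the function names `patch_origins` and `rescale_to_uint8` are hypothetical, patches are represented as plain nested lists, and the overlap fraction is left as a parameter. Only the Python standard library is assumed.

```python
from typing import List

Patch = List[List[float]]


def patch_origins(size: int, patch: int, overlap: float) -> List[int]:
    """Start offsets of patches along one axis of a tile of `size` pixels.

    `overlap` is the fraction of each patch shared with its neighbour.
    A final patch is placed flush with the tile edge so no pixels are lost
    (an assumption of this sketch, not a detail stated in the paper).
    """
    stride = max(1, int(round(patch * (1.0 - overlap))))
    last = max(size - patch, 0)
    origins = list(range(0, last + 1, stride))
    if origins[-1] != last:
        origins.append(last)
    return origins


def rescale_to_uint8(patch: Patch) -> List[List[int]]:
    """Stretch one patch to the 0-1 range, then map it to 0-255 integers.

    Rescaling each patch on its own keeps subtle local relief visible even
    when absolute elevations vary widely across the processing area.
    """
    lo = min(min(row) for row in patch)
    hi = max(max(row) for row in patch)
    span = hi - lo
    if span == 0:
        # Perfectly flat patch: nothing to stretch.
        return [[0 for _ in row] for row in patch]
    return [[int(round((v - lo) / span * 255)) for v in row] for row in patch]


# A 1 km tile at 0.5 m resolution is 2000 pixels wide.
origins = patch_origins(2000, 256, 0.5)
scaled = rescale_to_uint8([[300.0, 300.5], [301.0, 302.0]])
```

Because the stretch is applied per patch, a two metre range within one patch uses the full 8-bit range regardless of the absolute elevation of that patch.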
Initially, an object detection model based on the Inception architecture (Szegedy et al., 2016) pretrained on the Microsoft Common Objects in Context dataset (Lin et al., 2014) was chosen; however, detection rates remained below 40% across the fine-tuning hyperparameter range. This is hypothesised to be due to the greater differences between the source and target data types. A closer match can be found in the DeepMoon model, developed by Silburt et al. (2019), as detecting lunar craters from orbital LiDAR is a very similar task to detecting mining pits from aerial LiDAR. Alongside the dataset similarities, the model architecture is more appropriate, as very deep modern models such as Inception do not generally perform as well as simpler models such as U-Nets when training data is limited.
For the transfer learning strategy, fine-tuning the model whilst keeping the final layer intact was chosen, as the final segmentation categories are geometrically if not conceptually the same. When choosing the fine-tuning learning rate and number of epochs (full passes through the training dataset), multiple different models were generated and assessed against the cross-validation dataset. The best results were obtained when all model weights were unfrozen and the training was run over four new epochs, each containing 520 images, with a learning rate of 10⁻⁴. Between epochs, random mirroring, rotating and shifting augmentation transforms were carried out. All training was carried out in Python 3.6 using TensorFlow (Abadi et al., 2015) and Keras (Chollet, 2015), with code adapted from Silburt et al. (2019). Alongside the validation metrics output from the TensorFlow console, manual assessment of five particularly challenging cross-validation images was used to verify the epoch and learning rate choices. Figure 5 shows that the model clearly begins to overfit after four epochs.
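The augmentation step can be sketched as follows. This is an illustration only and is not the code adapted from Silburt et al. (2019): the helper names are hypothetical, rotations are restricted to 90 degree steps so that no interpolation is needed, and pixels shifted in from outside the patch are filled with zero. The point it shows is that the image patch and its label mask must receive the same randomly drawn transform so that the labels stay aligned with the pits.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]


def mirror(grid: Grid) -> Grid:
    """Flip the grid left to right."""
    return [row[::-1] for row in grid]


def rotate90(grid: Grid) -> Grid:
    """Rotate the grid 90 degrees clockwise."""
    return [list(col) for col in zip(*grid[::-1])]


def shift(grid: Grid, dy: int, dx: int, fill: int = 0) -> Grid:
    """Translate the grid by (dy, dx) pixels, padding exposed cells with `fill`."""
    h, w = len(grid), len(grid[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if ny in range(h) and nx in range(w):
                out[ny][nx] = grid[y][x]
    return out


def augment_pair(image: Grid, mask: Grid, rng: random.Random,
                 max_shift: int = 2) -> Tuple[Grid, Grid]:
    """Draw one random transform and apply it identically to image and mask."""
    flip = rng.choice([True, False])
    turns = rng.randrange(4)
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)

    def apply(grid: Grid) -> Grid:
        if flip:
            grid = mirror(grid)
        for _ in range(turns):
            grid = rotate90(grid)
        return shift(grid, dy, dx)

    return apply(image), apply(mask)


# Seeded generator so the example is repeatable.
patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
aug_image, aug_mask = augment_pair(patch, patch, random.Random(0))
```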
To determine the most appropriate data representation, the accuracies of the DSM, SLRM and openness visualisation types were examined using the cross-validation dataset, as shown in Figure 6. For the purpose of geological line fitting we hypothesise that precision should take precedence over recall, as noise from false positives may have a greater negative impact than missed detections. To test this hypothesis, the positive openness representation model and the DSM representation model were selected for further processing. The positive openness model has the highest precision and the second highest recall, whereas the DSM model has the second highest precision and the highest recall. The DSM model also exhibits a higher overall F1 score. The lower scoring representations of SLRM and negative openness were not processed further. The full area mask forms the input to the geological line fitting module. In this module the merged raster layer is pre-processed in Python using OpenCV (Bradski, 2000) to improve its characteristics for line fitting. A thresholding algorithm is applied to retain only the pixels with a probability above 0.6 of belonging to the pit class. This removes some of the artefacts at image boundaries and also limits the number of incorrect predictions and the noise shown in the image. As it is easier to fit lines to dots rather than rings, the background is filled with white using a simple flood filling algorithm, which colours all connected pixels with the specified new colour. This step removes the rings, leaving just the centres. For the final pre-processing step, the image is inverted back to a black background to maintain consistency. These pre-processing steps are shown in Figure 7. To fit the lines, an interactive Hough transform program was created to allow the user to control the parameters of the transformation whilst viewing the fitted lines.
This allows the settings for the Gaussian blur filter, the edge enhancement filter and the Hough transform itself to be varied and their effects visualised. The Hough transform is sensitive to the specific geometry of a dataset. Therefore, rather than setting the parameters by empirical assessment of each test image, as described in Rahnama and Gloaguen (2014), the interactive step allows the method to be easily used with multiple datasets of varying properties. This choice introduces compromises related to higher subjectivity and lower automation; however, it improves generalisation and usability at the proof of concept stage. As can be seen in Figure 8, the essential trends do not change despite different settings; only the number and density of the extracted segments differ. This allows the user to adjust the detection to noise ratio appropriately. After appropriate settings are chosen, the lines are converted from image to map coordinates and exported as georeferenced coordinate pairs. The lines can then be imported into a GIS software package for further visualisation and analysis such as bearing calculations.
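The line fitting and export steps can be illustrated with a minimal standard-library sketch. It is not the interactive OpenCV program described above: it applies the classical accumulator form of the Hough transform directly to a list of pit centre coordinates, omits the blur and edge enhancement filters, and returns infinite lines rather than segments. The function names, the raster origin and the assumption of a north-up raster referenced to its top-left corner are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def hough_lines(points: List[Point], theta_step_deg: float = 1.0,
                rho_step: float = 1.0,
                min_votes: int = 3) -> List[Tuple[int, float, float]]:
    """Vote in (theta, rho) space using rho = x*cos(theta) + y*sin(theta).

    Returns (votes, theta_deg, rho) tuples with at least `min_votes`,
    strongest first. Theta is the angle of the line normal.
    """
    n_theta = int(round(180.0 / theta_step_deg))
    accumulator: Dict[Tuple[int, int], int] = {}
    for x, y in points:
        for t in range(n_theta):
            theta = math.radians(t * theta_step_deg)
            rho = x * math.cos(theta) + y * math.sin(theta)
            key = (t, int(round(rho / rho_step)))
            accumulator[key] = accumulator.get(key, 0) + 1
    lines = [(votes, t * theta_step_deg, r * rho_step)
             for (t, r), votes in accumulator.items() if votes >= min_votes]
    lines.sort(key=lambda item: (-item[0], item[1], item[2]))
    return lines


def pixel_to_map(col: float, row: float, x_origin: float, y_origin: float,
                 cell_size: float = 0.5) -> Point:
    """Convert image (col, row) to map (easting, northing).

    Assumes a north-up raster whose origin is its top-left corner.
    """
    return (x_origin + col * cell_size, y_origin - row * cell_size)


def half_rose_bearing(start: Point, end: Point) -> float:
    """Bearing of a line in degrees clockwise from north, folded into 0-180."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    return math.degrees(math.atan2(dx, dy)) % 180.0


# Four collinear pit centres plus one stray detection (made-up values).
centres = [(10.0, 0.0), (10.0, 5.0), (10.0, 12.0), (10.0, 20.0), (3.0, 7.0)]
best = hough_lines(centres)[0]
corner = pixel_to_map(10, 20, 265000.0, 73000.0)  # hypothetical tile origin
```

Raising `min_votes` or coarsening `rho_step` plays a similar role to the interactive controls: fewer, stronger lines are returned while the dominant trend is unchanged.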

RESULTS AND DISCUSSION
Firstly, the result analysis evaluates the accuracy of the deep learning module for detecting the mining pits. A full description of the model's performance, evaluation criteria and results on multiple datasets can be found in Gallwey et al. (2019). The accuracy metrics for both the DSM data representation and the positive openness data representation are given in Table 1. These were ground-truthed during a field visit to the test area. It can be seen that whilst the F1 scores are higher for the model trained using the DSM data representation, the model trained using positive openness has higher precision, making it less noisy for line fitting. As the linear mineralised trends are typically made up of over 10 individual pits, the lower recall score may prove acceptable in this context. Figure 9 shows the predicted pit locations from the positive openness model, overlaid with the true pit locations. It can be seen that a cluster of pits left of centre has been missed by the algorithm; on the site visit these pits were the shallowest in the area, indicating that the model performs adequately for detecting the larger pits associated with more activity and therefore greater mineral concentrations.
Again, the differing data scales proved challenging, with scale related imprecisions noticeable in the BGS data when viewed at 1:5,000 due to a resolution of 50 m at 1 mm line thickness. Figure 11a shows the BGS data alongside higher resolution probable mineral vein locations, digitised manually from the LiDAR data. It can be seen that several smaller linear features are not present on the BGS layer, along with a deviation in angle on the southern end of the main north-south vein. Figure 11b shows the automatically extracted lines from the positive openness representation plotted against the BGS data.
It can be seen that the general trends are positive, with the algorithm picking up several line angles more precisely than the 1:50,000 layer, but that the lines do not extend far enough in many instances. For the additional mineral vein locations inferred in Figure 11a, two were picked up by the algorithm and two were missed. It is hypothesised that, as the algorithm is fitting lines to densities of detected pits, the shorter line segments are due to the CNN not detecting a large enough cluster of points at the extremities of the lines, leading to missed sections. This can be attributed to the lower recall of the positive openness predictions. Another factor is that neither CNN model was trained to detect trenches that do not contain pits; the two missed east-west veins are primarily trenches containing very few pits, which is likely the cause of the missed line detections. Figure 11c shows the results from the lines automatically extracted using the predictions from the DSM representation. There are many more detected lines and the result appears noisier than that shown in Figure 11b, although the more southerly east-west trench missed by the BGS data has been picked up.
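The trade-off between precision and recall discussed in this section follows from the standard definitions of the metrics, sketched below. The function name and the detection counts are invented for illustration and are not the values reported in Table 1.

```python
from typing import Tuple


def detection_scores(true_pos: int, false_pos: int,
                     false_neg: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from detection counts.

    precision = TP / (TP + FP): the share of predicted pits that are real.
    recall    = TP / (TP + FN): the share of real pits that were found.
    F1 is the harmonic mean of the two.
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical counts: a cautious model with few false positives versus
# a more permissive model that finds more pits but adds more noise.
cautious = detection_scores(true_pos=8, false_pos=2, false_neg=8)
permissive = detection_scores(true_pos=12, false_pos=6, false_neg=4)
```

In this invented case the permissive model has the higher F1 score, yet a third of its detections are false, which is the kind of noise that can generate spurious lines in the Hough accumulator.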

CONCLUSIONS
The geological lines generated using this technique correlate with the trends of the well-known lineaments in the Dartmoor area, both those semi-automatically extracted from LiDAR data by Yeomans et al. (2019) (Figure 8) and those published by the British Geological Survey (BGS) in their 1:50,000 mapping products (BGS, 1995). The results using the positive openness representation provide cleaner results when viewed on a map; however, the results from the DSM representation are more successful at detecting missed lines at high resolutions and show greater directional agreement on the half-rose plot. Further work to incorporate trench identification into the deep learning model would improve detection accuracy, alongside further refinements of the Hough transform parameter selection process. This preliminary work demonstrates that the lines produced from this technique can aid geological interpretation in regions of historic mining activity, particularly where records have been lost or are incomplete.