GIS-BASED RAPID EARTHQUAKE EXPOSURE AND VULNERABILITY MAPPING USING LIDAR DEM AND MACHINE LEARNING ALGORITHMS: CASE OF PORAC, PAMPANGA

Disaster risk reduction and management (DRRM) not only requires a thorough understanding of hazards but also knowledge of how much built-up structures are exposed and vulnerable to a specific hazard. This study proposed a rapid earthquake exposure and vulnerability mapping methodology using the municipality of Porac, Pampanaga as a case study. To address the challenges and limitations of data access and availability in DRRM operations, this study utilized Light Detection and Ranging (LiDAR) data and machine learning (ML) algorithms to produce an exposure database and conduct vulnerability estimation in the study area. Buildings were delineated through image thresholding and classification of the normalized Digital Surface Model (nDSM) and an exposure database containing building attributes was created using Geographic Information System (GIS). ML algorithms such as Support Vector Machine (SVM), logistic regression, and Random Forest (RF) were then used to predict the model building type (MBT) of delineated buildings to estimate seismic vulnerability. Results showed that the SVM model yielded the lowest accuracy (53%) while logistic regression and RF models performed fairly (72% and 78% respectively) as indicated by their F-1 scores. To improve the accuracy of the exposure database and vulnerability estimation, this study recommends that the proposed building delineation process be further refined by experimenting with more appropriate thresholds or by conducting point cloud classification instead of pixel-based image classification. Moreover, ground truth MBT samples should be used as training data for MBT prediction. For future work, the methodology proposed in this study can be implemented when conducting earthquake damage assessments.


INTRODUCTION
Disaster risk reduction and management (DRRM) requires an understanding of risk, hazards, and vulnerability and their interrelations amongst each other. Geographic Information System (GIS) provides a powerful tool that allows DRRM experts to map, visualize, and analyze the interrelations among the three elements of DRRM. However, DRRM-related mapping efforts will only be effective if sufficient and reliable information is available. It remains a challenge for concerned government agencies and organizations to collect this information. Apart from being time-consuming, the data collection phase of DRRM operations often entails large monetary costs and manpower requirements. Oftentimes, DRRM initiatives stop during data collection phase and do not carry on with the analysis of the data collected due to resource constraints; in some cases, data collection activities are terminated prematurely due to budget constraints (Torres et al., 2019).
The wide range of available remote sensing data at present allows the possibility to derive necessary information for disaster risk management in a more efficient and cost-effective manner, therefore eliminating the challenges mentioned above. Remote sensing has become an operational tool in DRRM due to the significant amount of data it can provide (van Westen, 2000). This study will use remote sensing tools and methods for applications in earthquake preparedness. For this purpose, Light Detection and Ranging (LiDAR) was used to generate data on building infrastructures for the conduct of a reliable seismic vulnerability assessment. Related studies on building detection often used remote sensing products such as Synthetic Aperture Radar (SAR) and high resolution aerial images (Tupin & Roux, 2003;Geiß et al., 2014). These data products can be challenging to process because they either require high processing power (as in the case of SAR) or are not readily accessible to the public as with high resolution aerial images. Torres et al. (2019) highlighted in their study that using LiDAR data has its advantages for the purpose of building detection since it has a higher spatial resolution compared to SAR and requires a much lower processing power. LiDAR data and open-source LiDAR processing tools are also becoming more accessible to the public. LiDAR serves as a useful remote sensing tool for capturing building footprint data-not just its location but its attributes as well. Moreover, machine learning methods provide a means to automate data creation. Related studies have demonstrated how machine learning can be used for seismic vulnerability estimation (Geiß et al., 2015;Torres et al., 2019;Ningthoujam & Nanda, 2018). With LiDAR and machine learning combined, DRRM operations can be streamlined so that the resourceintensive mapping process of exposed and vulnerable infrastructures may be partially automated. Thus, this paper aims to: (1) implement a rapid and highly automated method for earthquake exposure and vulnerability mapping using remote sensing and machine learning tools; and (2) to demonstrate the usefulness of LiDAR data in the DRRM field, specifically for earthquake preparedness.
LiDAR-derived surface models were used to map and describe exposed building infrastructures in the study area. Different machine learning techniques were then used to estimate the seismic vulnerability of each building footprint by classifying each according to model building type (MBT) (i.e., whether reinforced concrete or masonry). However, the unavailability of ground-truth MBT samples presented a limitation to this study as this affected the accuracy of the models produced. Nonetheless, the proposed methodology illustrates how LiDAR and machine learning techniques can streamline DRRM operations and help communities in the Philippines be more prepared for earthquakes.

Seismic vulnerability assessment
Availability of earth observation (EO) data has improved the quality of earthquake damage maps that can be produced. However, single-source EO data is often not sufficient for producing accurate maps. Dell'Acqua and Gamba (2012) noted that data fusion of a priori vulnerability models or geophysical information to EO data can increase the quality of generated maps. Furthermore, methods proposed for seismic vulnerability mapping often require considerable historical data which are challenging to collect and often limits the geographical scale of damage assessment to just a small study area (Polli et al., 2009).
A new trend of data collection in the DRRM field is through crowdsourcing and crowd mapping which involves volunteers manually digitizing building footprints. With the increasing number of areas being covered by satellite imaging platforms, crowdsourcing of exposure and vulnerability information has become the current trend of collecting granular data for DRRM applications. However, this is heavily reliant on public participation and requires costs on data distribution systems which eventually raises issues on timeliness, accuracy, and costeffectiveness (Dell'Acqua and Gamba, 2012). On the other hand, recent developments in remote sensing technologies provide a more cost-effective means of mapping built structures and are also able to collect data more frequently at larger extent. Therefore, remote sensing may help overcome data challenges and limitations in conducting seismic vulnerability assessments.

LiDAR for building detection
An up-to-date building footprint database is essential in earthquake risk management. Remote sensing data can aid in the regular updating of footprint databases to ensure accurate and timely outputs of risk assessments. Torres et al. (2019) used LiDAR data due to its high resolution and its capability of deriving information that are vital for seismic vulnerability assessments. For instance, obtaining building height information is a challenging task due to the limited availability of 3D data (Torres et al., 2019).
Processing tools for LiDAR datasets are becoming more accessible as well. Ramiya et al. (2017) developed a segmentation-based methodology for building detection using an open-source library and found that the results are comparable with the output from the Terrasolid commercial software. Machine learning algorithms also provide more efficient and cost-effective methods for LiDAR data processing. With machine learning, processing of large-scale point clouds can be done automatically. Zhou & Gong (2018) used deep neural networks to extract residential buildings from LiDAR point clouds. The increasing availability of open-source and automated LiDAR tools have also led to LiDAR's wider usage in the disaster risk studies. For instance, Sarp et al (2014) integrated high resolution orthophotos with a LiDAR-derived nDSM and conducted image classification to detect buildings and structural damages after the Van Erciş earthquake in Turkey. Similarly, Zhou, Gong, and Hu (2019) used LiDAR point clouds to detect building damages after a typhoon event.
To produce accurate footprint data from automated processes, multiple remote sensing datasets are often combined-fusion of SAR and optical imagery (Tupin & Roux, 2003;Geiß et al., 2014), multi-sensor multispectral data (Fan et al, 2019), or integration of LiDAR data and multispectral images. However, building detection solely based on LiDAR data is also possible. Huang et al. (2018) detected buildings using surface characteristics and penetrating capacities of buildings derived from LiDAR point clouds. Similarly, Ma (2005) processed LiDAR point clouds to generate surface models and detected buildings by extracting points having a specific height and belonging to a certain planar area. Otherwise, a highly accurate building detection is often achievable only with very highresolution imagery (VHRI). Femin & Biju (2020) were able to develop a highly accurate building detection method from satellite images from publicly available VHRI from SpaceNet; however, VHRIs are often commercial and not available from open-source platforms in the Philippines. Furthermore, building attributes would still need to be sourced elsewhere even when building areas are successfully detected from VHRI.

Building vulnerability estimation
A classification of structural building types is one indicator of seismic vulnerability for each building. When an exposure database containing footprint geometry is readily available, vulnerability estimation can be easily conducted with machine learning algorithms. Geiß et al. (2015) implemented a method for MBT estimation using a more rigorous manner by considering a wider range of variables such as spatial and spectral features from multi-sensor images and in situ observations. The number of these variables were then reduced using filter-based algorithms before they are used as variables in the classification algorithms. By using a set of multi-sensor images, more information can be captured regarding the characteristics of a building. However, this also requires more complex processing. Similarly, Torres et al. (2019) estimated the MBTs of buildings in Lorca, Spain based on calculated building attributes such as roof slope, building height, footprint area, and shape complexity. Instead of classifying buildings according to their MBTs, Ningthoujam & Nanda (2018) implemented a multiple regression analysis on the footprints in their test study area to estimate the damage grade of the existing reinforced concrete buildings which have already been mapped. The independent variables such as building age, number of storeys, construction quality, and maintenance condition were used to estimate the damage grade of each footprint which were then projected in a GIS to produce a vulnerability map. These related studies show the value of implementing machine learning techniques in seismic vulnerability assessments.

STUDY AREA
Earthquake exposure and vulnerability mapping was done to two selected barangays in Porac, Pampanga-Manibaug Paralaya and Manibaug Libutad. Porac was selected as a study area because of the recent earthquake event observed in the municipality. On April 22, 2019, a 6.1 magnitude earthquake shook the Central Luzon region (PHIVOLCS, 2019). The observed intensities of the earthquake varied per city/municipality and those located in the province of Pampanga significantly experienced many earthquake fatalities.
The municipality of Porac experienced an Intensity VI (very strong) earthquake then. Thus, the municipality is worth mapping for the purpose of this study. Barangays of Manibaug Paralaya and Manibaug Libutad were selected as areas of interest specifically because built-up structures are observed to be relatively more concentrated in these areas than in other barangays in the municipality. The relatively dense built-up area in these two barangays is assumed due to their proximity to the provincial capital (Angeles City, Pampanga).

METHODOLOGY
In line with the objective of this study, all data used are solely remote sensing data. Moreover, the processes were automated as much as possible to achieve a rapid exposure and vulnerability mapping process.

Data
The main datasets used in this study were the LiDAR-derived digital terrain model (DTM) and digital surface (DSM) model as collected from the Phil-LiDAR program. Additionally, orthophotos were made available as well. The Phil-LiDAR program is an expansion of the University of the Philippines Disaster Risk and Exposure Assessment for Mitigation (UP DREAM) program which conducted LiDAR data collection for flood hazard and natural resource mapping applications. The program was conducted in two phases (Phil-LiDAR 1 and Phil-LiDAR 2) from 2014 to 2017 and covered several provinces in the Philippines as study areas. Therefore, available LiDAR data from Phil-LiDAR was limited to the pre-2019 earthquake event in Porac. Figures 2a and 2b show the DTM and DSM of the study area.

Methods
The methodology as implemented in this study consists of two parts: exposure mapping and vulnerability estimation (Fig. 3).
Exposure mapping was conducted in a GIS environment while vulnerability estimation was done using WEKA 3.8.5, an opensource data mining and machine learning software.

Exposure mapping:
Initially, the DTM and DSM were used to generate a normalized DSM (nDSM) which is the main input for the building delineation process. The nDSM is calculated by subtracting the DTM from the DSM. The nDSM was then filtered twice using masks generated through thresholding-firstly, of DTM-derived terrain ruggedness index (TRI) raster and secondly, of nDSM-derived height values. Related studies also conducted rigorous thresholding of the nDSM using TRI and height for building extraction (Villanueva et al., 2015;Ekhtari, 2009). TRI, defined as the difference in elevation between a cell and those adjacent to it, was used to eliminate tree features since buildings generally have a flatter surface while trees have more rugged surface. The height filter was then applied to the TRI-filtered nDSM to eliminate the ground features such as roads, crop lands, and shrubs. These two filters significantly reduced the features in the nDSM which also reduced the probability for misclassification between buildings, vegetation, and ground classes.
The filtered nDSM was segmented to conduct an object-based image analysis (OBIA) and training samples of the three classes were collected. Support Vector Machine (SVM) was then used to classify the nDSM. The classification output was postprocessed using a Majority Filter to remove any single isolated pixels before converting it into vector polygons. Building footprints were then extracted from the vectorized classification output and were simplified and smoothened as deemed fit. Building attributes were then calculated for each footprint. These attributes calculated are relevant for vulnerability estimation and based on the attributes used in the study of Torres et al. (2019). The final output for this part is an earthquake exposure database.

Vulnerability estimation
: earthquake vulnerability of the study area was estimated by predicting the MBT of the delineated buildings. In this case, buildings were classified into two: masonry or reinforced concrete. Masonry buildings are more vulnerable to earthquakes while reinforced concrete buildings are more resilient. The MBT prediction was done using machine learning algorithms implemented through the WEKA software.
The training/test dataset was created through collection of MBT sample points using Google Street View. Since there were no means of conducting ground survey, MBT of sample points were identified mainly through visual inspection, thus, is largely prone to error. Moreover, other MBTs may exist in the study area which are not easily discernible from the Google Street View. The MBT samples were then spatially joined to the delineated building footprints so the MBT was added as a new attribute to the exposure database. Buildings with unidentified MBT are then predicted using machine learning techniques.
Prior to implementing the machine learning for MBT prediction, four highest ranked building attributes from the exposure database were selected using the ReliefF algorithm; the selected attributes were used as explanatory variables for the MBT prediction. Three machine learning algorithms were then implemented to create an MBT prediction model; SVM, logistic regression, and Random Forest (RF). Building features with known MBTs were used as training/test dataset and a 10 k-fold cross validation was used in running the algorithms. The model with the highest accuracy and reliability was then used for predicting the MBT of all building features in the exposure database. The MBT attribute field which previously only contained MBT classification of the sample points is then updated with the MBTs predicted by the machine learning model.

Exposure mapping
The nDSM was filtered twice using threshold values firstly of TRI and secondly of height. Thresholds are set after several trials of filtering the nDSM using different values. Finally, the combination of TRI and height filters yielded the best results. In other areas, different threshold parameters may be more applicable. TRI was used to eliminate trees and retain buildings which have a flatter surface, thus, only pixels with TRI<0.75 were retained. Figure 4a shows the TRI-filtered nDSM which significantly eliminated tree features. Figure 4b shows a closeup of how TRI was able to significantly reduce tree features.  The TRI-filtered nDSM was filtered further using height values. A one-storey building, by standard, is commonly about 2 to 3 meters high. Thus, a 2.5 m threshold value was set. Figure 5a shows the height-filtered nDSM which eliminated ground features and retained pixels with height>2.5 m. Ground features which now have pixel values of zero are colored in gray. Figure  5b shows croplands eliminated in the height-filtered nDSM. The filtered nDSM was used as input for image classification of building, vegetation, and ground classes. After the classification, building features were extracted, simplified, and smoothed.
There are a total of 2,125 building features delineated in the survey area. The building delineation process performed fairly in sparse areas and for buildings with large areas (see Figure 6a and 6b). It is also able to detect building structures covered by vegetation although buildings with highly irregular geometry was produced (see Figure 6d). The delineation method did not perform well in dense areas and tends to merge buildings located too closely to each other (see Figure 6c). Overall, the building delineation process was effective in detecting locations of infrastructures in an urban area but was not accurate enough for generating granular per footprint features.

Figures 6. Delineated building features.
After the building footprints have been delineated, building attributes were calculated using GIS tools to complete the exposure database. Geometry attributes-centroid XYs, area, and perimeter were easily calculated in GIS. The zonal statistics tool was used to obtain building heights from the nDSM. The number of storeys per building and their respective rise type were calculated and classified based on building heights. Lastly, the shape complexity was calculated using the Shape Complexity Index tool available from the Whitebox Tools GIS software package. Shape complexity values range from 0 to 1; as the shape of the polygon becomes more complex, the value approaches 1 (Lindsay, 2020). Table 1 lists the attributes calculated and the description of each field.

Vulnerability estimation
Earthquake vulnerability was estimated by predicting the MBT of the buildings in the study area. Geiß et al. (2015), Torres et al. (2019), and Ningthoujam & Nanda (2018) reported the capability of machine learning methods for this specific purpose. Following their works, this study has conducted MBT prediction by implementing SVM, logistic regression, and RF classification. Prior to creating a prediction model, a training/test dataset was created by collecting sample points of MBTs identified through visual inspection in Street View. Figure 7 shows a map of the collected sample points. A total of 262 sample points were identified which is composed of 168 masonry and 94 reinforced concrete buildings. However, only 158 of these sample points intersected and were spatially joined with the delineated buildings from the exposure mapping conducted. The final training/test dataset is composed of 98 masonry and 68 reinforced concrete buildings. Table 2 shows the distribution of the MBT samples collected.  The 158 buildings with known MBT were loaded into the WEKA software to create MBT prediction models. To reduce the dimension of available data which can be used as explanatory variables for MBT prediction, the ReliefF algorithm was used to select only the four highest ranked attributes from the eight attributes (excluding MBT) in the exposure database. Based on the results of the ReliefF attribute selection, building height, shape complexity, XY coordinates, and perimeter were the highest ranked and were therefore used as explanatory variables in the prediction models.

FIELD NAME DESCRIPTION
SVM, logistic regression, and RF were implemented using a 10 k-fold cross validation. F-1 measure and kappa statistics were computed to assess the accuracy of each model and compare the reliability of the three models amongst each other, respectively (see Table 3). McHugh's (2012) interpretation of kappa values was used as the basis for interpreting the kappa measures of the three models.  Table 3. F-1 and kappa statistics of three MBT prediction models.

ML
SVM is the least accurate model with a 0.53 F-1 score and a 0.081 kappa which implies that there is no agreement between the training and test data. Logistic regression performed relatively well with a 0.724 F-1 score and a 0.409 kappa which can be interpreted as a weak level of agreement. Among the three, the RF model yielded the highest F-1 measure of 0.779 with a fair kappa statistic of 0.525. Furthermore, the confusion matrices of the three models (see Tables 4a, 4b, and 4c) were also analyzed to assess the performance of each model. A comparison of the confusion matrices of logistic regression and the RF models shows that the latter performed better than the former. Since the RF model produced the highest accuracy and performed best out of the three models, it was then used to predict the MBTs of all 2,125 delineated buildings in the exposure database. Figure 8 shows the map of predicted MBTs in the study area. Based on the results of the MBT prediction, the model not only predicted the MBTs of buildings with unknown MBT classification but also detected misclassifications from the training/test dataset. As expected, most buildings in the two barangays are masonry. Clustering of reinforced concrete buildings was also observed in the central part of the study area.

Masonry
It should be noted that since the training/test dataset used contains a high level of uncertainty, the prediction results must be further validated. Results of the model can be further improved if: (1) the training/test dataset used were more accurate (e.g. sourced from authoritative agencies or collected through ground survey); and (2) building geometries were accurately delineated since shape complexity is one of the explanatory variables used in the models. Nonetheless, the rapid exposure and vulnerability mapping procedure implemented in this study proved that it could address challenges in data collection and risk assessments in the DRRM field, specifically for earthquake preparedness.

CONCLUSION
This study was able to successfully implement a rapid and highly automated methodology for exposure and vulnerability mapping. The whole process, excluding the time spent for experimentation, can be done by one personnel in less than 2 days. Implementing the proposed methodology in a larger area may require a few more days of processing. The study was also able to show how remote sensing and machine learning tools can help in solving problems and challenges on data availability and collection.
In this study, LiDAR data was used as a primary dataset which was proven useful in detecting buildings and was faster and less tedious to process than when optical and radar RS were utilized instead. LiDAR was also very useful in deriving relevant information on earthquake exposure and vulnerability.
Inaccuracies in the generated exposure database and vulnerability maps are apparent. Nonetheless, outputs still provided a good generalization of earthquake exposure and vulnerability of the study area. While results may not completely replace the level of accuracy of ground surveys, the methodology proposed in this study will significantly aid in streamlining data collection processes in the DRRM projects. The exposure and vulnerability maps produced through the rapid method proposed in this study may also be used to identify areas that would need further onsite field investigation.

RECOMMENDATIONS FOR FUTURE WORK
There are many aspects in the rapid mapping methodology developed in this study that need to be further improved. Firstly, to improve the accuracy of the exposure database and the results of the vulnerability estimation, the building delineation process must be refined. It is recommended that future work also use a Normalized Difference Vegetation Index (NDVI) threshold to better eliminate tree features in the nDSM. If point cloud data are available, point cloud classification instead of image classification may also be done for building detection to produce more regularized footprint geometries. MBT samples are also preferably collected through ground surveys to prevent misclassification in the test/training dataset and create a more reliable prediction model. Fieldwork may also be conducted to validate the results of the vulnerability estimation. Lastly, the rapid exposure and vulnerability methodology may be implemented for pre-and post-earthquake events to conduct damage assessments. This will be helpful in streamlining DRRM activities and speeding up earthquake disaster response.