POTENTIAL OF GEOLOCATED CROWDSOURCED IMAGE POSTS IN PREDICTING AN EARLY ESTIMATE OF THE PATTERNS OF STRUCTURAL DAMAGE FOLLOWING A HURRICANE

During a disaster, the activity of the crowd is a very valuable source of information on the on-the-ground conditions, shared by the affected citizens. The approach presented in this paper explores the relationship between the spatial distribution of crowdsourced image posts and that of damaged buildings, in order to understand the potential of modelling the spatial distribution of damaged buildings from geolocated images. The posts related to hurricane Michael, which hit the United States in October 2018, show the building damage in Panama City and were collected by the NAPSG Foundation and GISCorps volunteers. The building damage assessment, based on the analysis of high-resolution post-event imagery, was performed by FEMA. For the two independent point datasets, the spatial pattern of each dataset was analysed and the spatial relationship between them was explored. A set of spatial statistics was computed with R software, using distance-based methods, which consider the mutual position of points to describe the patterns. The results show the spatial relationship between the crowdsourced photos and the different damage types. Furthermore, the potential of crowdsourced images for improving the awareness of structural damage after the hurricane is discussed.


INTRODUCTION
Mobile technologies, web-based platforms, and social media have made it possible to easily exchange information on any topic, including natural and human-driven disasters. Focusing on disasters, Poblet et al. (2018) have shown that these sources have changed the landscape of disaster management. Damage assessment after a disaster event, such as a hurricane, plays a very important role, since it helps to understand the nature and size of the event, on which the subsequent emergency response and recovery activities depend. The term crowdsourcing was introduced by Jeff Howe (2008), who defined it as "the act of taking a job traditionally performed by a designated agent and outsourcing it to an undefined, generally large group of people in the form of an open call". By combining mobile technology and crowdsourcing methods, new forms of contribution to disaster management have been created. According to the research of Roberts and Doyle (2017), crowd engagement during disasters is constantly growing. It could represent a very valuable source of information on the on-the-ground conditions shared by the affected citizens. If this type of source is considered as real-time crowdsourcing of crisis information, the spatial distribution of geolocated images related to an event could represent an early indicator of the severity of its impact (Spasenovic et al., 2019). In light of these considerations, the following question arises: would it be possible to estimate the building damage distribution by exploiting crowdsourced information?
Over the past years, a growing body of research has focused on crowdsourced information related to disasters. Corbane et al. (2012) explored the relationship between the spatial pattern of SMS messages and building damage in the Haiti disaster of 2010. Triglav-Cekada and Radovan (2013) explored the potential of crowdsourced information for mapping the flood that happened in Slovenia in November 2012. Albuquerque et al. (2014) compared the spatial distribution of social media with authoritative data in order to identify useful information for disaster management.
The work presented in this paper explores the spatial relationship between two datasets that describe the damage to buildings after a hurricane. Crowdsourced image posts were compared with the building damage map to understand the potential of modelling the spatial distribution of damaged buildings from geolocated images. The case study event is hurricane Michael, which hit the Florida coast of the United States of America from 7th October to 11th October 2018. According to the report from the National Oceanic and Atmospheric Administration (NOAA, Beven et al. 2019), Panama City was one of the most affected areas. The available crowdsourced images describe the damage to buildings in Panama City. The dataset was provided by the National Alliance for Public Safety GIS (NAPSG) Foundation and is publicly accessible. The NAPSG Foundation, with the help of GISCorps volunteers, collected one hundred and nine crowdsourced photos relevant to the hurricane impact over the area of interest. The building damage dataset, on the other hand, was created by the Federal Emergency Management Agency (FEMA), based on the analysis of high-resolution post-event imagery performed by professional mappers. For the two independent point datasets, the spatial pattern of each dataset was analysed and the spatial relationship between them was explored. A set of spatial statistics was computed with R software, using distance-based methods, which consider the mutual position of points to describe the patterns. More precisely, the positional relationship between the crowdsourced images and the affected buildings was recorded. The results are presented and discussed, showing the potential of crowdsourced images for improving the awareness of structural damage after the hurricane.
The NAPSG Foundation, with the help of twenty GISCorps volunteers, in four days collected nearly 600 geolocated photos relevant to the hurricane. The photos were located over the areas hit by the hurricane, showing its impact on the structures, nature, and community.
For the purpose of this research, photos describing the building damage over Panama City were selected from the original dataset. In total, one hundred and nine crowdsourced photos showing the damaged buildings of Panama City were selected. The selected photos were grouped into two categories: the first group of 104 photos (95.5%) shows affected buildings and the second group of 5 photos (4.5%) shows collapsed buildings. The first images were collected on the platform three days after the landfall of hurricane Michael. The complete dataset was ready in seven days; the exact day of the platform activation is not known. As can be seen in Figure 1, the highest number of photos representing building damage in Panama City was collected over a period of two days; the most productive was the fifth day, with a total of 67 photos (62%). The images were collected from social media and news outlets. The georeferencing of the image posts was done by volunteers, who looked for clue objects in the pictures that were useful for defining the location. This is an important fact, because it supports the hypothesis that the defined location of the collected image posts has high precision. It is useful to point out that all GISCorps volunteers have knowledge of Geographic Information Systems (GIS), according to which the tasks were assigned. This supports the reliability of the provided data.

Building damage assessment
The FEMA Mapping and Analysis Centre (MAC) created an open access repository, the Historical Damage Assessment Database (source: http://disasters.geoplatform.gov/publicdata/National/Data/HistoricalDamageAssessmentDatabase/), that contains geospatial damage assessments from past national disaster events in the United States. For this research, a point dataset showing the buildings of Panama City that were damaged by hurricane Michael was downloaded from the repository. The building damage assessment was based on the analysis of high-resolution post-event imagery performed by professional mappers. Two damage categories were present in the dataset: Affected and Destroyed. The "Affected" label corresponds to buildings which, in the post-event imagery, were missing roof segments, presented failure of structural elements and/or had visible damage. The "Destroyed" label corresponds to buildings which collapsed. It is important to point out that the visual imagery assessment was done using nadir imagery, so damage to the sides of buildings was not evaluated. In order to have a more complete building dataset, which also includes undamaged buildings, the OpenStreetMap (OSM) buildings of Panama City and the surrounding area were exported and compared with the FEMA dataset. In the OSM dataset, buildings are mapped as polygons; in order to have data of the same type, the centroid of each OSM building polygon was considered. Buildings coming from OSM that were not mapped in the FEMA dataset were marked as "Not Affected" and added to the damage assessment point dataset. All in all, the dataset contained 38,594 points (Figure 2), of which 25,293 (65.5%) were labelled as "not affected" buildings, 13,044 (33.8%) as "affected" buildings and 267 (0.7%) as "destroyed" buildings.
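The merging step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say how OSM buildings were matched against FEMA points, so the sketch assumes that a footprint counts as "mapped by FEMA" when at least one FEMA point falls inside it. The function names and the point-in-polygon matching rule are hypothetical, not the authors' procedure.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple polygon (shoelace formula)."""
    a2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(ring)
    for k in range(n):
        x0, y0 = ring[k]
        x1, y1 = ring[(k + 1) % n]
        cross = x0 * y1 - x1 * y0
        a2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if a2 == 0.0:
        raise ValueError("degenerate polygon")
    return cx / (3.0 * a2), cy / (3.0 * a2)


def contains(ring: List[Point], p: Point) -> bool:
    """Ray-casting point-in-polygon test."""
    x, y = p
    inside = False
    n = len(ring)
    for k in range(n):
        x0, y0 = ring[k]
        x1, y1 = ring[(k + 1) % n]
        if (y0 > y) != (y1 > y):  # edge crosses the horizontal line through p
            x_cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < x_cross:
                inside = not inside
    return inside


def not_affected_centroids(osm_footprints: List[List[Point]],
                           fema_points: List[Point]):
    """Centroids of OSM footprints containing no FEMA point (assumed rule)."""
    result = []
    for ring in osm_footprints:
        if not any(contains(ring, p) for p in fema_points):
            result.append((polygon_centroid(ring), "Not Affected"))
    return result
```

With real data, coordinates would need to be in a projected (metric) reference system before centroids are computed, and a spatial index would replace the nested loop.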

Quantitative analysis: kernel density estimator
An important descriptor for the purpose of this research is the data distribution. Density estimates are well suited to this purpose, because they provide an easily understandable quantitative analysis. The kernel density estimator, as described by Silverman (1986), estimates the probability density function of a univariate random variable of which a sample of n observations is known. In the one-dimensional case, given the sample of n observations X_1, …, X_n, the kernel density estimator is defined as:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right) \tag{1}$$

where h = bin width, also called bandwidth
x = origin (the point at which the density is evaluated)
K = symmetric probability density function which satisfies the condition (Silverman, 1986):

$$\int_{-\infty}^{\infty} K(x)\,dx = 1 \tag{2}$$

It is possible to implement algorithms that extend the kernel density estimation to two-dimensional variables, producing "heatmaps" that can be used to represent the results of the kernel density estimates.
Kernel density heatmaps can help to better understand the data and to decide the direction of further analysis. For this reason, they are appropriate for comparing the building damage types with the corresponding crowdsourced photos.
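As an illustration of the estimator, the following minimal Python sketch evaluates a kernel density estimate in one and two dimensions. The Gaussian kernel and the product form used for the two-dimensional case are assumptions made for the example; the paper does not state which kernel was used, and the function names are hypothetical.

```python
import math


def gaussian_kernel(u: float) -> float:
    """Standard normal density: symmetric and integrates to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kde_1d(x: float, sample, h: float) -> float:
    """f_hat(x) = 1/(n h) * sum_i K((x - X_i) / h)."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)


def kde_2d(x: float, y: float, points, h: float) -> float:
    """Two-dimensional extension with a product Gaussian kernel and the
    same bandwidth h along both axes (an assumption of this sketch)."""
    n = len(points)
    total = sum(
        gaussian_kernel((x - px) / h) * gaussian_kernel((y - py) / h)
        for px, py in points
    )
    return total / (n * h * h)
```

Evaluating `kde_2d` on a regular grid of (x, y) locations yields the matrix that a heatmap displays. Using the same bandwidth for two datasets of very different size, as done in this study, keeps the smoothing scale of the two heatmaps comparable.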

Results of kernel density
The kernel density was calculated to compare the spatial distribution of the two a priori different datasets. Even though the datasets are different, they both represent building damage types. The two categories of crowdsourced photos were analysed separately and compared with the heatmaps of the corresponding building damage type. The heatmap of crowdsourced images representing affected buildings was compared with the heatmap of the affected buildings mapped by FEMA (Figure 3). The heatmap of crowdsourced images representing destroyed buildings was compared with the heatmap of the destroyed buildings (Figure 4). As presented in Section 2, the size of the two datasets is very different. For ease of comparison, the same bandwidth was selected for the compared datasets. A bandwidth of 1 km was selected, according to the methodology of Sheather and Jones (1991). All analyses were performed in R software using the stats package (R Core Team, 2020). Figure 3a shows the kernel density of affected buildings mapped by FEMA. The heatmap shows that affected buildings were registered in almost all parts of Panama City, especially in the south and south-east. The density of crowdsourced images of affected buildings (Figure 3b) shows that the images were taken in the same parts of the city where the affected buildings were registered. This observation suggests that the distribution of crowdsourced photos of affected buildings closely resembles the distribution of affected buildings. Moreover, a notable density of 1.2 × 10⁻⁶ was registered for the collected images in the southern part of the city. In the south-east part a slightly lower density was registered: collected photos are present in this area, but with a smaller concentration than in the southern part of the observed area. Figure 4a shows the kernel density of destroyed buildings mapped by FEMA.
The heatmap shows that a significant concentration of destroyed buildings was recorded in the south-east part of the city. The kernel density map of crowdsourced images of destroyed buildings (Figure 4b) shows different results. The crowdsourced images of destroyed buildings were taken in the south-west part of the city. For the selected radius of 1 km, the maximum density is 1.2 × 10⁻⁷, meaning that few points are present. In the south-west part of the city FEMA mapped destroyed buildings, but with a low density (light blue spots in Figure 4a). The collected images of the destroyed buildings are not analysed further, since they are not located in the most affected area and their number is low.

Spatial pattern analysis
To better understand the pattern of the spatial distribution of the point dataset, a distance-based method, Ripley's K-function, was applied. The main characteristic of distance-based methods, in general, is that they consider the spacing of the points to describe the pattern (Moller and Waagepetersen, 2007). Spatial pattern analysis based on Ripley's K-function is a second-order analysis of point patterns in a two-dimensional space (Ripley, 1976). A point process can be considered as a probabilistic model of phenomena or objects representable as a finite set of points in an observation window W (Diggle, 1986). Ripley's K-function can be calculated in a univariate form, taking into consideration points of the same type. Given the location of all points of the same type within the study area, Ripley's self-K function (Ripley, 1976) is calculated as:

$$K(r) = \lambda^{-1}\, E[\text{number of other events within distance } r \text{ of a randomly chosen event}] \tag{3}$$

where r = radius
λ = density (number per unit area) of events
E = expected number of events within the radius r from a randomly chosen event.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2020, 2020 XXIV ISPRS Congress (2020 edition)
The idea of the self-K function is to estimate the number of other points lying within the distance r of a randomly chosen point of the same type. In general, to evaluate the clustering or dispersion of the point pattern, Ripley's K(r) function should be tested against complete spatial randomness (CSR) (Ripley, 1979). This can be done by comparing the observed values with those of a homogeneous Poisson process, in which the points are distributed independently, without any interaction. In practice, it is common to use the Besag L-function, a transformation of the K-function (Besag and Clifford, 1989), because its variance is approximately constant under CSR (Ripley, 1979):

$$L(r) = \sqrt{\frac{K(r)}{\pi}} - r \tag{4}$$

In this form the L-function has zero expectation under CSR. Ripley's K-function and the L-function have been widely used for defining and understanding the relationship between points of a pattern for data of the same type. They are hence suitable for exploring the pattern of the crowdsourced photos and of each type of building damage.
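The definition of the self-K function and its L transformation can be illustrated with a minimal Python sketch. It is a naive estimator without edge correction, written only to make the definition concrete; the analyses in the paper were run with the spatstat package, which applies edge corrections. The centred form L(r) = sqrt(K(r)/π) − r, with zero expectation under CSR, is assumed here, and the function names are hypothetical.

```python
import math


def ripley_k(points, r: float, area: float) -> float:
    """Naive estimate of K(r): mean number of other points within
    distance r of a point, divided by the intensity lambda = n / area.
    No edge correction is applied."""
    n = len(points)
    lam = n / area
    pairs = 0
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= r:
                pairs += 1
    return (pairs / n) / lam


def besag_l(points, r: float, area: float) -> float:
    """Centred L-function: positive values suggest clustering at scale r,
    negative values suggest dispersion (assumed centred form)."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r
```

For example, four points on the corners of a unit square inside a window of area 100 give K(1) = 50, far above the CSR value π·1² ≈ 3.14, so the centred L is positive and the pattern reads as clustered at that scale.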

Results of the self-correlations for damaged buildings and crowdsourced photos
In order to understand the spatial distribution of crowdsourced images and of buildings with different damage grades, the self-function Lii(r) was applied. Each point type of the datasets was observed individually: crowdsourced images, not affected buildings, affected buildings and destroyed buildings. All analyses were performed in R software using the spatstat package (Baddeley and Turner, 2005; Baddeley, 2008). Figure 5 shows the results of the computed self-functions Lii for not affected buildings (Figure 5a), affected buildings (Figure 5b), destroyed buildings (Figure 5c) and crowdsourced images (Figure 5d), under the hypothesis of CSR. The study area was defined by an observation window corresponding to the boundary map of Panama City. To determine the statistical significance of the results, the Monte Carlo method was applied and the corresponding confidence intervals were obtained. In the context of spatial pattern analysis, the Monte Carlo method simulates randomly generated distributions of the same dimensions as the observed point pattern (Haase, 1995). The number of simulations was set to 99, from which the 99% confidence envelope was computed (Leemans, 1991).
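The Monte Carlo procedure can be sketched as follows. This Python example simulates CSR patterns with the same number of points as the observed one and keeps, for each radius, the minimum and maximum of the simulated L values as the envelope. It is a simplified illustration: the observation window is assumed to be a rectangle rather than the city boundary, no edge correction is applied, and the function names are hypothetical.

```python
import math
import random


def l_centered(points, r: float, area: float) -> float:
    """Naive centred L(r) = sqrt(K(r) / pi) - r, without edge correction."""
    n = len(points)
    pairs = sum(
        1
        for i, (xi, yi) in enumerate(points)
        for j, (xj, yj) in enumerate(points)
        if i != j and math.hypot(xi - xj, yi - yj) <= r
    )
    k = area * pairs / (n * n)
    return math.sqrt(k / math.pi) - r


def csr_envelope(n: int, width: float, height: float, radii,
                 nsim: int = 99, seed: int = 0):
    """Pointwise min/max envelope of L(r) over nsim CSR simulations of
    n uniform points in a width x height rectangle (assumed window)."""
    rng = random.Random(seed)
    area = width * height
    lo = [math.inf] * len(radii)
    hi = [-math.inf] * len(radii)
    for _ in range(nsim):
        sim = [(rng.uniform(0.0, width), rng.uniform(0.0, height))
               for _ in range(n)]
        for k, r in enumerate(radii):
            value = l_centered(sim, r, area)
            lo[k] = min(lo[k], value)
            hi[k] = max(hi[k], value)
    return lo, hi
```

An observed L(r) above the upper envelope at a given radius is read as clustering at that scale, and a value below the lower envelope as dispersion. Because edge effects are ignored here, the simulated values are biased slightly downwards at larger radii.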

Figure 5. Self-function Lii for a) not affected buildings, aa) zoom on the distribution for small distances <20m, b) affected buildings, bb) zoom on the distribution for small distances <40m, c) destroyed buildings, d) crowdsourced images. Legend: observed value, theoretical value, confidence interval.

The deviation of the sample statistic from the zero expectation (red line) is positive and above the upper limit of the confidence interval (upper green line) for all three building damage grades and for the crowdsourced images. The magnitudes of the deviations from CSR are high, so it is possible to say that, at both small and large distances, the distributions are clustered for crowdsourced photos as well as for all levels of building damage. According to the obtained results, as expected, the correlation is higher for building damage of the same grade at small distances. The same can be said for crowdsourced images that were taken from neighbouring locations. Taking a closer look at Figure 5a and b, Figure 5aa and bb show that the deviation trend for very small distances is under the zero expectation or inside the confidence interval. Therefore, for very small distances (in particular, less than 7 m for not affected buildings and less than 15 m for affected buildings) the spatial distribution can be considered random. This could be explained by the fact that the points correspond to the centroids of buildings and that, in most cases, buildings are surrounded by a garden. The nugget effect that is present in Figure 5d is caused by the spatial pattern of the crowdsourced images, since more than one point shares the same location. This is very useful information about the quality of the data collected from the crowdsourcing platform.
Checking the images with the same location, it was found that two of the twelve pictures were taken from the same location in different directions, while the remaining ten images show the damage of the same object (hospital and school) but from different angles (different locations).

Spatial correlation of crowdsourced photos and building damage map
The K-function can be used not only to summarize a point pattern (as presented in Section 3.3), but also to explore the relationship between points of different types. The previous analyses considered only the location of the points and ignored any other information. Point patterns can, however, carry additional information about each point. For example, the points of a pattern could represent different types of objects (in our case, the points represent crowdsourced posts and damaged buildings). In addition, each point u of a point process X can be associated with a random variable m_u, called a mark. These data are called multivariate spatial point patterns (Stoyan and Stoyan, 1996). In this case, the generalization of K(r) to more than one type of point is called the cross-K function and is computed as follows:

$$K_{ij}(r) = \lambda_j^{-1}\, E[\text{number of type } j \text{ events within distance } r \text{ of a randomly chosen type } i \text{ event}] \tag{5}$$

where r = circle radius
λ_j = density (number per unit area) of type j events
i ≠ j = types of the events
E = expected number of events within the distance r of a randomly chosen event.

The idea of the cross-K function is to estimate the number of type j points within distance r of a randomly chosen type i point. The obtained value is compared with the theoretical one, which represents the absence of attraction or repulsion between data points of different types (Lotwick, 1984). For this purpose, the hypothesis of independence of the populations was used. As for the self-function (paragraph 3.3), it is also possible to define a transformed cross-L function.
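A minimal Python sketch of the cross-K function and of a transformed cross-L function is given below. As in the univariate case, it is a naive estimator without edge correction, and the centred form sqrt(K_ij(r)/π) − r is an assumption of the example. The function names are hypothetical; in spatstat the corresponding analysis is carried out on a marked point pattern.

```python
import math


def cross_k(points_i, points_j, r: float, area: float) -> float:
    """Naive estimate of K_ij(r): mean number of type j points within
    distance r of a type i point, divided by the intensity of type j.
    No edge correction is applied."""
    lam_j = len(points_j) / area
    count = sum(
        1
        for (xi, yi) in points_i
        for (xj, yj) in points_j
        if math.hypot(xi - xj, yi - yj) <= r
    )
    return count / (len(points_i) * lam_j)


def cross_l(points_i, points_j, r: float, area: float) -> float:
    """Centred cross-L: positive values suggest attraction between the
    two types at scale r, negative values suggest repulsion."""
    return math.sqrt(cross_k(points_i, points_j, r, area) / math.pi) - r
```

In the setting of this paper, `points_i` would hold the crowdsourced image locations and `points_j` the buildings of one damage type, with both sets expressed in the same projected coordinate system.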

Results of the cross-correlations
The cross-function Lij(r) has been applied in order to understand the spatial interaction between the two a priori different datasets of crowdsourced photos and building damage. Each building damage type has been considered separately and analysed with respect to crowdsourced images. In this way, it has been possible to study the type (namely, attraction or repulsion) of the different interactions and their intensity. It was possible to evaluate whether one type of point tends to be surrounded by points of the other type. Figure 6 shows the results of the computed cross-functions Lij describing the relationship of crowdsourced photos and not affected buildings (Figure 6a), of crowdsourced photos and affected buildings (Figure 6b), of crowdsourced photos and destroyed buildings (Figure 6c), under the hypothesis of independence of population (red line in the plots). To determine the statistical significance of the results, the Monte Carlo method was applied, and 99% confidence intervals were computed (green line in the plots).

Figure 6. Cross-function Lij for a) crowdsourced images and not affected buildings, b) crowdsourced images and affected buildings, c) crowdsourced images and destroyed buildings. Legend: observed value, theoretical value, confidence interval.

The results show that the cross-correlation between crowdsourced images and not affected buildings (Figure 6a) is positive and lies above the Monte Carlo simulation envelope (99% confidence interval). The large deviation above the upper boundary of the confidence interval indicates significant interactions between crowdsourced images and not affected buildings for all distance ranges. A somewhat different distribution of the cross-correlation was recorded between crowdsourced images and affected buildings (Figure 6b). The computed Lij is negative and inside the confidence interval for small distances (r < 500 m approximately), indicating no significant interaction. For distances between 0.5 km and 4 km, the distribution is positive and above the confidence interval, indicating attraction between the crowdsourced images and the affected buildings. This means that, for radii between 0.5 km and 4 km, the number of crowdsourced images around a randomly chosen affected building is greater than would be expected if the two patterns were independent. For distances of 4 km or more, repulsion between the two observed objects was found. Analysing the relationship between the crowdsourced photos and the destroyed buildings (Figure 6c), the distribution of the cross-correlation lies inside the envelope for distances smaller than 2 km and for distances larger than 7 km, suggesting no significant interaction between crowdsourced photos and destroyed buildings. For the remaining distances (2 km < r < 7 km approximately), repulsion was found.

DISCUSSION AND CONCLUSION
The work presented in this paper explores the relationship between geolocated crowdsourced photos of damaged buildings and building damage obtained from remote sensing data. The aim of this work is to explore spatial point processes in order to understand whether crowdsourced data can provide an early estimate of the structural damage patterns following a hurricane. The first analysis used the kernel density function to build heatmaps. This fast and easy quantitative analysis highlighted the spots with a significant concentration of the observed point datasets. A simple visual inspection of the obtained results confirmed the agreement between the distribution of crowdsourced photos of affected buildings and the distribution of affected buildings mapped by FEMA. In contrast, the distribution of collected images of destroyed buildings did not match the distribution of mapped destroyed buildings, and their total number was small. This observation led to the conclusion that the crowdsourced photos of destroyed buildings could not be considered a representative sample. For this reason, they were excluded from further analyses.
The self-correlation analysis demonstrated the presence of clusters in the distribution patterns of the different building damage types and of the crowdsourced images. These results indicated that building damage of the same type was registered at nearby locations. Similarly, crowdsourced photos were taken in neighbouring locations. The cross-correlation analysis highlights the spatial attraction between geolocated crowdsourced photos and the affected buildings. The collected photos of affected buildings were taken near affected buildings extracted from remote sensing data. The result did not show significant attraction at small distances, which can be explained by the fact that people were taking pictures from safe, not affected places. In fact, at small distances significant attraction was recorded between not affected buildings and crowdsourced photos.
The main limitation of the analysis is that it is strongly data driven: the reliability of the results depends on the quality of the input data. The crowdsourced images were geotagged manually by searching for clue objects in the pictures that were useful for defining the location. In some cases, the defined location was found not to be precise. It turned out that images sharing the same capture position were showing the damage of the same building but from different angles. For some buildings, contradictions between the two sources were found in the assessed damage type (see Appendix). The source of the pre- and post-hurricane imagery used by FEMA is not known, and in some cases the reason for the inconsistency could not be explained.
To conclude, this work has shown that real-time geolocated crowdsourced photos have potential as early indicators of the patterns of structural damage caused by a hurricane. Yet, it is necessary to apply the proposed analyses to other test cases in order to better understand the relationship between crowdsourced reports and damaged buildings. Analysing larger datasets will help to better assess the parameters needed for the spatial modelling of structural damage patterns following a hurricane. Crowdsourcing platforms and social media are a very powerful source of information, and by increasing the awareness of their role in disaster management, valuable information could be obtained.