THE EMERGENCE OF SOCIAL MEDIA FOR NATURAL DISASTERS MANAGEMENT: A BIG DATA PERSPECTIVE

Social media is rapidly emerging as a potential source of information capable of supporting natural disasters management. Despite the growing research interest in using social media during natural disasters, many challenges remain in handling the 'big data' problem: huge amounts of geo-social data are available, in different formats and of varying quality, that must be processed quickly. This article presents a state-of-the-art approach towards the enhancement of decision support tools for natural disaster management with information from the Twitter social network. The novelty of the approach lies in the integration of Geographic Information Systems (GIS) modeling outputs with real-time information from Twitter. A first prototype has been implemented that integrates geo-referenced Twitter messages into a Web GIS for wildfire risk management and real-time earthquake monitoring. Following a highly scalable architecture that relies on big data components, the proposed methodology can be applied to different geographical areas, different types of social media and a variety of natural disasters. The article aims at highlighting the role of social big data towards a more sophisticated transfer of knowledge among civil protection agencies, emergency response crews and the affected population.


INTRODUCTION
The influence of climate change trends and anthropogenic causes has radically increased the number of natural disasters over the last decades (Pausas et al., 2008). The concern is growing worldwide, despite the wide utilization of decision support systems for the confrontation of natural disasters such as forest fires (San-Miguel-Ayanz et al., 2002; Noonan-Wright et al., 2011; Gumusay and Sahin, 2009; Kalabokidis et al., 2016), earthquakes (Strasser et al., 2008; Avvenuti et al., 2014; Clinton et al., 2016) and floods (Bartholmes et al., 2009; Addor et al., 2011; Shiravale et al., 2015). The majority of the aforementioned disaster management systems contain crucial functionalities based on the integration of different models, tools and data analysis, providing timely and cost-effective assistance and information for disaster management (Kalabokidis et al., 2013; 2016).
Unpredictable factors during a natural disaster, such as sudden changes of wind during a wildfire, the collapse of a building right after an earthquake, or human actions during the emergency, are too difficult to evaluate in the current decision support systems. Real-time information collected near the disaster location might be useful when dealing with disaster situations during or immediately after an emergency (Landwehr et al., 2014; De Albuquerque et al., 2015).
At the same time, the growing use of electronic devices equipped with Global Positioning System (GPS) receivers has increased the amount of geoinformation available in social media platforms (e.g. blogs, chat rooms, discussion forums, wikis, YouTube channels, LinkedIn, Facebook and Twitter) and transformed them into location-based social networks (Roick and Heuser, 2013; Houston et al., 2015; Beigi et al., 2016). Social media messages with geographic reference can be described by the terms Volunteered Geographic Information (VGI) (Goodchild, 2007; Sui and Goodchild, 2011; Stefanidis et al., 2013), Neogeography (Hudson-Smith et al., 2009; Goodchild and Glennon, 2010) and Crowdsourcing (Zook et al., 2010; Gao et al., 2011).
Social media has been used to disseminate a wide range of public safety information before, during and after natural disasters, supporting the establishment of situational awareness (Herfort et al., 2014). Before an incident, social media can be utilized by emergency management organizations to provide citizens with preparedness and readiness information. Social media can be used to send and receive disaster preparedness information, disaster warnings and detected disaster signals (Mathbor, 2007). During the event, it can be utilized for sending or receiving requests for assistance, as well as for reporting the location and the conditions of disaster-affected populations (Alexander, 2014). Obtaining real-time information as an incident unfolds can help first responder organizations to determine where people are located, assess victim needs, and alert citizens and first responders to changing conditions and new threats. In post-event phases, social media can provide and receive information about disaster response and recovery (Houston et al., 2015).
Despite the intensive research activities regarding the contribution of social media in natural disasters management (De Longueville et al., 2010; Vieweg et al., 2010; Fuchs et al., 2013; Crooks et al., 2013; Croitoru et al., 2013; Herfort et al., 2014; De Albuquerque et al., 2015; Schade et al., 2013), many challenges are still open. The form of geo-social media is highly unstructured and thematically diverse, while valuable knowledge is often implicit and cannot be easily processed through automation (Sahito et al., 2011). Such heterogeneity in data structure has a direct impact on the ability to store, manage or process the data effectively. Apart from data diversity, there is an enormous volume of social geospatial data, especially during emergencies, that must be analyzed as soon as possible. Given the high volume, high speed and varied structure of social media content, one significant challenge is to deal with this 'big data' problem (Athanasis et al., 2017). Even though the amount of available data is huge, reliable information is rare and difficult to locate within the enormous pool of social media postings (Landwehr and Carley, 2014). It is easier than ever to search the web for information, but filtering out falsehood and off-topic discussions from the huge online content still remains difficult (Herfort et al., 2014). Thus, it is still an open research question how emergency management agencies and the public can capitalize on the abundance of geo-social media by reducing the volume to credible and relevant content (Spinsanti and Ostermann, 2013).
This article presents a state-of-the-art approach towards an enhancement of decision support tools for natural disaster management with social media. Its novelty lies in the enrichment of geospatial content retrieved from Geographic Information Systems (GIS) modeling outcomes with real-time disaster-related information from social media during an emergency incident. Instead of solely relying on social media sources or 'a posteriori' analysis through classification (Lofi et al., 2012) or machine learning approaches (Terpstra et al., 2012), the applied methodology is based on the combination of spatial danger rating models with geo-social tweet messages. As a result, the volume of the underlying tweet messages that have to be processed is significantly reduced and the possibility of including erroneous messages is minimized. By using existing and well-studied geographical models for danger rating, the open problem of handling social media information during the occurrence of natural disasters can be tackled.
A wide range of studies highlight the contribution of social media to natural disasters management. A survey of how individuals and organizations use social media in disaster events is given in Landwehr and Carley (2014). Social media visualization in location-based knowledge discovery has been analyzed in MacEachren et al. (2011) and Terpstra and deVries (2012). Vieweg et al. (2010) focus on the analysis of Twitter data during the Spring 2009 Red River Floods and Oklahoma grass fires events; they identified features of information generated during emergencies and described how Twitter can contribute to enhancing situational awareness. Spinsanti and Ostermann (2013) enrich VGI with geographic context found in Spatial Data Infrastructures (SDI) or other databases with a geographic component; they present a system designed to retrieve, process, analyze and evaluate social media content on forest fires by integrating authoritative data sources with VGI.
Regarding the contribution of social media in flooding management, Vieweg et al. (2010) and Starbird et al. (2010) analyze Twitter messages during the flooding of the Red River Valley in the United States and Canada in 2009, seeking to discern activity patterns and extract useful information. A similar approach is followed in Herfort et al. (2014a; 2014b) and De Albuquerque et al. (2015), who explore the relations between spatial information from social media messages and geographic information retrieved from hydrological data and official sensor data. Triglav-Čekada and Radovan (2013) show how volunteered geographical information has been used to map serious floods in Slovenia in 2012. Fuchs et al. (2013) followed a visual exploration and analysis methodology for a set of geolocated tweets from Germany regarding the severe flooding throughout Germany in the summer of 2013.
Regarding the analysis of social media in earthquake response, Sakaki et al. (2010) and Crooks et al. (2013) investigated the use of Twitter for detecting and estimating the trajectory of earthquakes in real time. Acar and Muraki (2011) applied open-ended questionnaires to selected Twitter users and also analyzed the tweets sent in response to the Tohoku earthquake and the consequent tsunami in Japan. Earle et al. (2010) describe the contribution of Twitter in earthquake response.
A large number of related approaches focus on the contribution of VGI to disaster-related information, but examine social media as a stand-alone information source (e.g. in Acar and Muraki, 2011; Xu et al., 2014; Houston et al., 2015). In contrast, Tomaszewski et al. (2014) combine Federal Emergency Management Agency reports with related terms found in tweet messages. Spinsanti and Ostermann (2013) follow a ranking and clustering methodology of tweets by enriching the tweet context with different geospatial characteristics found in related SDIs. De Albuquerque et al. (2015) follow a similar approach to enhance the identification of relevant messages by combining geo-social media as VGI with geographic features of flood phenomena derived from authoritative data (sensor data, hydrological data and digital elevation models).
Even in the cases where the geo-social content is combined with GIS data from diverse sources, this requires 'a posteriori' analysis of the messages, mostly through classification (Gao et al., 2011; Rogstadius et al., 2011; De Albuquerque et al., 2015), machine learning (Sakaki et al., 2010) or natural language processing methods (Corvey et al., 2010). This analysis, however, adds crucial time overheads that hinder the timely and effective response to an emergency.
Our approach is based on the enrichment of geospatial modeling results with real-time disaster-related information from social media during an emergency incident. Compared to similar studies, the added value lies in combining wildfire behavior modeling outputs with tweet messages in order to increase the accuracy and efficiency with which tweets are used during an emergency.

METHODOLOGY
The first pillar of the proposed methodology (Fig. 1) consists of the GIS modeling component, which utilizes the Minimum Travel Time (MTT) algorithm (Finney, 2002) as the fire behavior prediction system inside a prototype web-based GIS system called GATES. By running fire simulations through the MTT algorithm, parameters such as major flow paths, spread rate, time of arrival and fireline intensity can be calculated. The perimeter polygon of the simulated fire delimits the area within which tweet messages are filtered before being shown in the web-based GIS system. The geographic location of any tweet message is described in the metadata field 'coordinates', which is also known as the geotag. In general, users can geo-reference messages in Twitter either manually (e.g. by entering the name of a city in the field 'location') or automatically, when a client application has access to the coordinates of a GPS receiver. Because in most situations only a small fraction of tweets are geo-referenced by users, an external gazetteer component of the ESRI geocoding API is used. The component searches the tweet messages for place names (toponyms) and assigns coordinates if a toponym is found.
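The two steps described above (resolving a tweet's location, then testing it against the simulated fire perimeter) can be sketched as follows. This is a minimal illustration under stated assumptions, not the GATES implementation: tweets are modeled as plain dictionaries whose 'coordinates' value is simplified to a (longitude, latitude) pair, the tiny `GAZETTEER` dictionary with approximate coordinates is a hypothetical stand-in for the external geocoding service, and a standard ray-casting test replaces whatever spatial engine the prototype actually uses.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)

# Hypothetical toy gazetteer with approximate, illustrative coordinates.
# The paper relies on an external geocoding service instead.
GAZETTEER = {
    "mytilene": (26.555, 39.105),
    "vrissa": (26.20, 39.04),
}


def resolve_location(tweet: dict) -> Optional[Point]:
    """Return the geotag if present, else the first gazetteer toponym in the text."""
    coords = tweet.get("coordinates")
    if coords is not None:
        return (coords[0], coords[1])
    text = tweet.get("text", "").lower()
    for name, point in GAZETTEER.items():
        if name in text:
            return point
    return None  # no geotag and no known toponym


def point_in_polygon(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: toggle 'inside' at each edge crossed by a ray going east."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def filter_by_perimeter(tweets, ring):
    """Keep only tweets whose resolved location falls inside the perimeter ring."""
    kept = []
    for tweet in tweets:
        location = resolve_location(tweet)
        if location is not None and point_in_polygon(location, ring):
            kept.append(tweet)
    return kept
```

Tweets with neither a geotag nor a recognizable toponym are dropped in this sketch; whether the prototype discards or retains such messages is not stated in the text.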
To identify messages containing information relevant to the incident, Twitter messages are filtered based on specific keywords, which is common practice in the analysis of Twitter messages (Vieweg et al., 2010; Graham et al., 2012; Kongthon et al., 2012). Tweets containing the Greek keywords 'photia' or 'pyrkaya' (meaning 'fire') or 'sismos' (meaning 'earthquake') are retained. By following the aforementioned approach, tweet messages can be visualized on top of a Web-GIS system.
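A minimal sketch of such keyword filtering is given below. It uses the three keywords exactly as transliterated in the text; the grouping into event types, the function name `matched_events` and the choice of whole-word, case-insensitive matching are assumptions made for illustration, and real tweets may additionally need Greek-script or inflected variants, which this sketch does not cover.

```python
import re

# Keywords as transliterated in the text; the event labels are assumed names.
KEYWORDS = {
    "wildfire": ("photia", "pyrkaya"),
    "earthquake": ("sismos",),
}


def matched_events(text: str) -> set:
    """Return the event types whose keywords occur as whole words in the text."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return {
        event
        for event, words in KEYWORDS.items()
        if any(word in tokens for word in words)
    }


def is_relevant(text: str) -> bool:
    """A tweet is retained if it matches at least one event keyword."""
    return bool(matched_events(text))
```

Whole-word matching avoids retaining tweets in which a keyword merely appears inside a longer unrelated word, at the cost of missing inflected forms.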
At the core of the proposed architecture lies the Apache Kafka component, an open-source stream processing platform (Fig. 2). It is a highly scalable message queue store, capable of processing streaming data such as tweet messages. Apache Kafka works together with the Apache Hadoop framework. The open-source Hadoop framework provides tools for organizing, managing and transforming large-scale data. Hadoop includes the Hadoop Distributed File System (HDFS), a distributed file system designed to run on commodity hardware. Inside the big data cluster, virtual machines called workers receive the Twitter messages and distribute them to the brokers, which are responsible for replicating the messages.
Tweet messages are retrieved from the Twitter source by utilizing the Twitter API and stored in Kafka topics. The Kafka Connect API is utilized to receive messages from any source (such as Twitter) and redirect them into related sinks (e.g. Cassandra, PostgreSQL, ElasticSearch). In the proposed methodology, tweet messages are retrieved from the Twitter API (which is used as a source) and stored in a Kafka topic through a TwitterSourceConnector. The Producer API is used to connect the source (i.e. Twitter) to any Kafka topic as a stream of records for a specific category (i.e. a specific natural disaster event). From there, the Consumer API is used to move the tweet messages from the Twitter topics into ElasticSearch, a distributed big data search and analytics engine suitable for near real-time use cases. Fig. 3 describes how ElasticSearch is used for filtering out the off-topic tweets.
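The source, topic and sink flow described above can be mimicked in memory to make the data path concrete. The sketch below does not use Kafka or any of its client libraries: `InMemoryTopicLog` is a hypothetical stand-in that models only two ideas, an append-only log per topic and a read offset per consumer group, and the topic name, group name and list-based sink are assumptions.

```python
from collections import defaultdict


class InMemoryTopicLog:
    """Toy stand-in for a broker: append-only logs plus per-group read offsets."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> list of records
        self._offsets = defaultdict(int)  # (group, topic) -> next index to read

    def produce(self, topic, record):
        """Append a record to the end of the topic's log."""
        self._logs[topic].append(record)

    def poll(self, group, topic, max_records=100):
        """Return the next unread records for this group and advance its offset."""
        start = self._offsets[(group, topic)]
        batch = self._logs[topic][start:start + max_records]
        self._offsets[(group, topic)] = start + len(batch)
        return batch


def run_pipeline(source_tweets, log, topic, group, sink):
    """Push tweets from the source into the topic, then drain them into the sink."""
    for tweet in source_tweets:
        log.produce(topic, tweet)
    while True:
        batch = log.poll(group, topic)
        if not batch:
            break
        sink.extend(batch)  # in the paper's setup the sink is the search engine
    return sink
```

Because records stay in the log after being read, a second consumer group can replay the same topic independently, which is the property that lets several sinks share one stream of tweets.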
The area of interest for the specific wildfire (i.e. the arrival time based on the MTT fire behavior modeling) is retrieved in JSON format from the ArcGIS Server that is used to store all output results of the fire simulations. The polygon of the area of interest is used as a GeoPolygon Query inside the ElasticSearch big data store, to exclude the off-topic messages and visualize the meaningful messages through the web-based GIS visualization platform.
Fig. 3 - Filtering on/off-topic tweet messages in the Big Data cluster
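As an illustration of the filtering step, the following sketch turns a polygon in Esri JSON form into the body of a geo-polygon filter. It only builds the request body and does not contact a server. Several points are assumptions rather than details from the paper: the polygon is taken to have two-dimensional rings in longitude/latitude order (WGS84), only the outer ring is used, and the tweets are assumed to be indexed with a geo-point field named `location`. Newer ElasticSearch releases favor the `geo_shape` query for this purpose, so the body may need adapting to the version in use.

```python
import json


def geo_polygon_query(esri_polygon_json: str, field: str = "location") -> dict:
    """Build a geo_polygon filter body from an Esri JSON polygon string.

    The field name is an assumed mapping, not one taken from the paper.
    """
    geometry = json.loads(esri_polygon_json)
    outer_ring = geometry["rings"][0]  # ignore holes and additional rings
    points = [{"lon": x, "lat": y} for x, y in outer_ring]
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_polygon": {field: {"points": points}}
                }
            }
        }
    }


if __name__ == "__main__":
    # Illustrative perimeter only; not an actual simulation output.
    perimeter = json.dumps({
        "rings": [[[26.0, 39.0], [26.4, 39.0], [26.4, 39.2], [26.0, 39.0]]],
        "spatialReference": {"wkid": 4326},
    })
    print(json.dumps(geo_polygon_query(perimeter), indent=2))
```

Placing the polygon test in a filter clause rather than a scoring clause reflects that the perimeter acts as a yes/no criterion: a tweet is either inside the area of interest or excluded.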

RESULTS AND CONCLUSION
The present work aims to highlight the role of social big data towards a more sophisticated transfer of knowledge among the civil protection authorities, emergency response crews and the affected population. The results from our case studies show that social media content contains potentially useful information and can act as an additional communication channel for citizens who have been affected by a disaster.
An obstacle to the efficient use of the GATES platform is the sheer volume and unstructured nature of social media content. Very often, neither the available hardware nor the software allows citizens to search social media content efficiently and to ensure that all important information is received and read. Therefore, the integration and dissemination of social media content is an important and valuable contribution to the overall disaster management effort. The results of this paper show that focusing on the geographic context of the VGI provides a useful approach to dealing with information overload, by filtering and assessing the social media content based on credible and authoritative spatial information.
Our approach follows a 'big data' architecture to cope with the challenges of huge amounts of data, in different formats and of varying quality, that must be processed quickly. Big Data technology emerges as a technology capable of successfully addressing contemporary digital challenges. Big Data are high-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.
The proposed highly scalable architecture relies exclusively on big data components; thus, it can be applied to different geographical areas, to different types of social media and to a variety of natural disasters. Even though the Big Data ecosystem integrates many platforms and software components, it is mainly based on distributed storage and processing of very large datasets on computer clusters.
A first prototype of the proposed approach has been tested by the local civil authorities in Lesvos Island, Greece, throughout the 2017 season. The prototype included real-time tweet analysis for earthquake events. On June 12, 2017 (12:28 GMT), a strong earthquake (Mw 6.3) struck Lesvos Island (Northeastern Aegean, Greece). Based on the preliminary seismological data provided by the University of Athens (Papadimitriou et al., 2017), the earthquake epicenter was located offshore southeastern Lesvos. The main shock was at a depth of about 13 km and the fault plane solutions demonstrated a NW-SE striking and SW-dipping normal fault that constitutes the northern margin of the offshore Lesvos basin. In the village of Vrissa, 113 buildings were characterized as inhabitable, while 472 structures were characterized as uninhabitable, including 408 residential buildings, 25 business premises, 6 churches and public buildings, and 33 warehouses. Fig. 4 shows the graphical user interface of the Geosocial Tweet System (GATES), which is the front end of the proposed methodology, for this earthquake. In the upper part of the GATES interface, the locations of the on-topic tweets are shown overlaid on a web map. In the bottom part of the GATES interface, the users can see the content of each related tweet.
Our goal is to extend the functionalities of the GATES system and apply it to different geographical areas, to different types of social media and to a variety of natural disasters. The extended GATES system is to be tested by local fire authorities in Lesvos Island, Greece, throughout the next wildfire season. The objective is to avoid situations similar to a forest fire in Lesvos that burned approximately 550 ha on 30 August 2015; a clear wind direction shift occurred when a fast-moving front inside a gully surpassed the mountain ridge after the first hours of fire spread (Kalabokidis et al., 2016). The MTT fire prediction simulations were conducted for wildfire propagation, but the model did not take into consideration unpredictable factors such as the wind shift. The utilization of the GATES system for wildfire management may not only support civil protection and fire control services in the organization of effective wildfire management and control, but also contribute to the immediate and massive alert of firefighters and/or people who are at risk during a fire outbreak.
Kalabokidis, K., Athanasis, N., Gagliardi, F., Karayiannis, F., Palaiologou, P., Parastatidis, S. and Vasilakos, C., 2013
For wildfires, De Longueville et al. (2010) use location-based social networks as a reliable source of spatiotemporal information, by analyzing the temporal, spatial and social dynamics of Twitter activity during a major forest fire event. Sinnappan et al. (2010) categorize tweets during the 2009 Black Saturday bushfires in Australia, while Sutton et al. (2008) describe the analysis of tweets in California during the wildfires of 2007.

Fig. 1 - Conceptual design of the proposed methodology