CONCEPTS – LOCATIONS – EMOTIONS: SEMANTIC ANALYSIS AND VISUALIZATION OF CLIMATE CHANGE TEXTS

Research on knowledge discovery in the geospatial domain currently focuses on semi-structured and even unstructured content rather than fully structured content. Attention has turned to the plethora of resources on the Web, such as HTML pages, news articles, blogs, and social media. In geospatial-oriented approaches, the extracted semantic information is further used for semantic analysis, search, and retrieval. The aim of this paper is to extract, analyse and visualize geospatial semantic information and emotions from texts on climate change. A collection of articles on climate change is used to demonstrate the developed approach. These articles describe environmental and socio-economic dimensions of climate change across the Earth and include a wealth of information on environmental concepts and the geographic locations affected by it. The results are analysed in order to understand which specific human emotions are associated with environmental concepts and/or locations, as well as which environmental terms are linked to locations. To support the understanding of this information, semantic networks are used as a powerful tool for visualizing the links among concepts – locations – emotions.


INTRODUCTION
Nowadays, data and information are considered valuable "assets", as humans have realised their importance in terms of solving complex problems and facilitating the decision-making process. However, there is a major challenge: the majority of data and information are stored in unstructured formats, which are difficult to deconstruct, process, and analyse since they have no pre-defined model. Some characteristic examples of such data are text, video, audio, and satellite imagery (Gupta, Bhattacharyya, Khanna, & Sagar, 2020).
Consequently, tools and techniques have been developed in order to extract valuable information from unstructured data. Information extraction processes transform unstructured texts into structured data in order to derive meaningful information from these texts and enable other related processes such as semantic annotation, analysis, and visualization (Martinez-Rodriguez et al., 2018).
The wealth of unstructured data presents both a major challenge and an opportunity for the geospatial domain, since these data contain valuable information in terms of geospatial concepts, locations, events, phenomena, and activities occurring in space. There is a need to explore and develop methods in order to extract such information from unstructured data and link it to geographical locations. This would enable the search of such data using spatial criteria, as well as a better understanding of complex, interconnected, and interacting environmental and socio-economic challenges. Going a step further, analysing sentiments related to environmental phenomena and problems could help us gain a better comprehension of people's concerns and opinions on the greatest risks facing the planet, such as climate change.
In this paper, we combine information extraction and sentiment analysis techniques to extract environmental terms, locations and climate-related emotions from online articles about climate change. The semantic links between the extracted elements are also identified in order to support their semantic visualization and exploration.
The paper is organized as follows: Section 2 presents related work on information extraction and semantic analysis, mostly focusing on the geospatial domain. Section 3 explains the methodology, Section 4 details the results of the information extraction framework, and Section 5 presents the visualization of the extracted environmental terms, locations, and emotions. Finally, Section 6 draws conclusions and discusses future work.

RELATED WORK
Natural Language Processing (NLP) is an automated process for understanding and analysing natural human languages by simulating the human ability to understand them. It is a subfield of Linguistics, Computer Science, Information Engineering, and Artificial Intelligence (AI), and is driven by advances in Machine Learning (ML). The applications of NLP include, inter alia, Information Extraction and Sentiment Analysis (or Opinion Mining). Both applications are combined in this paper and are thus briefly presented in what follows.
Information Extraction (IE) aims at analysing either semi-structured (HTML pages) or unstructured content (reports, articles, blogs) and at identifying information therein, such as places, concepts, and the relations among them.
Ontology-Based IE (OBIE) is a subfield of information extraction which employs ontologies to guide the information extraction process over unstructured corpora (Wimalasuriya & Dou, 2010). Ontologies provide a formal representation of domain knowledge in terms of concepts, properties, and relations and may significantly improve the traditional IE process. For example, an environmental ontology which defines concepts such as climate change, greenhouse phenomenon, sea-level rise, etc. can be used to guide an IE process over an environmental corpus.
Sentiment Analysis (SA) extracts opinionated sentences, categorises their polarity (positive, negative or neutral) and identifies emotions, opinion targets and authors where appropriate (Liu, 2012). In recent years, a lot of research has been conducted on people's opinions and sentiments. In almost every domain, there is an interest in, and even a need for, analysing human emotions and points of view. The majority of SA techniques make use of machine learning (ML), but this approach usually works better when large amounts of training data are available. Moreover, ML approaches are less effective on domain-specific texts, for instance texts on environmental or socio-economic topics (Aue & Gamon, 2005). As a result, an alternative, knowledge-based approach has been developed. This kind of approach is suitable for SA applications that focus on identifying and extracting opinion targets, opinion holders and opinion types, rather than merely spotting the polarity (positive, negative or neutral) of a text (Maynard, 2016).
In the geospatial domain, information extraction processes have been used to elicit various types of information. Special emphasis has been put on extracting place names (Vasardani et al., 2013), locative expressions (Liu et al., 2014), and spatial relations (Zhang et al., 2009). The extracted information has supported various additional processes such as semantic annotation, search, or analysis of texts, as well as spatialization.
Lately, research has also focused on extracting other semantic information besides places and spatial relations. Wang and Stewart (2015) implemented a rule-based approach for the extraction of natural hazards-related concepts as well as spatial and temporal information from news articles on the web, using a hazards ontology. The annotation systems DBpedia Spotlight (Mendes et al., 2011) and OpenCalais (Butuc, 2009) have been used to extract entities and classes from maps in the approach by Hu et al. (2015), who developed an ontology of the ArcGIS Online schema. Ballatore and Adams (2015) implemented an NLP technique based on a vocabulary describing natural and built-up places and the emotion vocabulary WordNet-Affect (Strapparava and Valitutti, 2004) to extract emotions related to places from posts on travel blogs.
The present paper focuses on extracting environmental phenomena, other concepts and emotions, and linking them to locations. The semantic links are subsequently visualized to explore and understand which specific human emotions are associated with environmental concepts and/or locations, as well as which environmental terms and phenomena are linked to specific locations. To support the understanding of this information, semantic networks are used as a powerful tool for visualizing the links among concepts – locations – emotions.

METHODOLOGY
The work focuses on the extraction of the following types of semantic information: locations, terms representing environmental concepts, and emotions. The EU-funded research project DecarboNet (2021), which investigates the potential of social platforms in mitigating climate change, has implemented Information Extraction services which involve GATE (General Architecture for Text Engineering, 2019), a free, open-source software for text processing. The DecarboNet Environmental Annotator, a GATE Cloud web service, was used in the present work. Environmental terms are extracted using an ontology-based information extraction approach based on GEMET, the GEneral Multilingual Environmental Thesaurus (GEMET, 2021). The thesaurus defines general terminology for the environment and currently includes more than 5,000 descriptors (environmental terms).
The service distinguishes between the one who holds the opinion ("opinion holder", e.g. the scientist) and the object about which this opinion is expressed ("opinion target", e.g. climate change). This cloud service is suitable for processing large volumes of data. It takes txt, html, xml, json and other file formats as input and exports json or xml files.
For the purpose of this paper, 52 online articles describing environmental and socio-economic dimensions of climate change across the Earth were semantically annotated. The workflow of the approach is shown in Figure 1 and involves the following steps: (1) pre-processing, (2) natural language processing, (3) Named Entity Recognition, (4) terminology extraction, and (5) sentiment analysis.

Extraction of Environmental Terms and Locations
The NLP and IE tasks have been undertaken using GATE ANNIE tools and involve the following steps:
• Pre-processing: converting texts of different formats (.txt, .html, .pdf, etc.) to plain text and preparing them for the next steps, a process called cleansing. In our case, a collection of articles from the web was used, thus the cleansing process consisted of removing all markup tags from the html texts.
• NLP task: the cleaned text has to be prepared in a way that allows computers to find patterns and make deductions easily. This is usually done by associating a label (metadata) with specific content in a dataset. Any metadata tag used to mark up elements of the dataset is called an annotation. The NLP process therefore includes some initial steps which, as far as this paper is concerned, are the following:
  o Tokenization: the process of breaking strings into tokens (words, punctuation marks, numbers, etc.).
  o Sentence Segmentation: the process of detecting the beginning and the end of sentences in a text.
  o Part-Of-Speech (POS) tagging: the process of attributing a part of speech to each token in a given sentence (e.g., noun, verb, adjective, adverb, pronoun, preposition, etc.).
  o Lemmatization: the process of converting inflected forms of words into their base form.
  o Shallow Structural Parsing (constituency): the process which analyses the syntax of a sentence or a string of words and creates a syntactic tree which groups tokens into specific categories based on their grammatical roles, namely noun phrases or verb phrases.
Noun phrases and verb phrases play a key role later in the opinion mining process, as they help to identify the correct opinion holder and opinion target.
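The first steps listed above (cleansing, sentence segmentation and tokenization) can be illustrated with a minimal Python sketch. It uses only the standard library and is not the GATE ANNIE implementation; the function names and the regular expressions are assumptions chosen for illustration, and the naive sentence splitter would, for instance, break wrongly on abbreviations such as "U.S.".

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, skipping scripts and styles."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def cleanse(html_text):
    """Pre-processing: strip markup tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def split_sentences(text):
    """Naive sentence segmentation: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence):
    """Naive tokenization into words (keeping internal hyphens/apostrophes) and punctuation."""
    return re.findall(r"\w+(?:[-']\w+)*|[^\w\s]", sentence)
```

POS tagging, lemmatization and shallow parsing require linguistic resources and are left out of this sketch; each step would add further annotations on top of the tokens and sentences produced here.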

Figure 1. Workflow of the proposed approach.

Sentiment analysis
Sentiment analysis using DecarboNet follows a rule-based approach (Maynard, 2016). It is based on Flexible Gazetteer Lookup, which allows the detection of emotion words in different lexicalisations; Regular Gazetteer Lookup, which allows the detection of emotion words in the same lexicalisation; and Sentiment Grammars, i.e., JAPE (Java Annotation Patterns Engine) rules (GATE, 2021), which annotate emotions and link them to the relevant targets and opinion holders.
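The difference between the two lookup modes can be sketched as follows. This is a toy Python illustration, not the GATE gazetteer components: the gazetteer entries and the lemma table are hypothetical, and a real system would obtain lemmas from a morphological analyser.

```python
# Hypothetical gazetteer: base form of an emotion word -> emotion label.
EMOTION_GAZETTEER = {
    "fear": "fear",
    "worry": "fear",
    "happy": "happiness",
    "angry": "anger",
}

# Toy lemma table standing in for a real morphological analyser (assumption).
LEMMAS = {
    "fears": "fear",
    "feared": "fear",
    "worried": "worry",
    "worries": "worry",
    "happier": "happy",
}


def regular_lookup(tokens):
    """Match only tokens whose surface form is listed in the gazetteer."""
    return [
        (tok, EMOTION_GAZETTEER[tok.lower()])
        for tok in tokens
        if tok.lower() in EMOTION_GAZETTEER
    ]


def flexible_lookup(tokens):
    """Match tokens whose lemma is listed, so inflected forms are found as well."""
    hits = []
    for tok in tokens:
        lemma = LEMMAS.get(tok.lower(), tok.lower())
        if lemma in EMOTION_GAZETTEER:
            hits.append((tok, EMOTION_GAZETTEER[lemma]))
    return hits
```

For the tokens of "Scientists worried about floods and fear droughts", the regular lookup finds only "fear", whereas the flexible lookup also finds "worried" through its lemma "worry".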
It is worth mentioning that the sentiment analysis performed by the DecarboNet Environmental Annotator extracts both polarity and sentiments (emotions) on a per-sentence basis, as displayed in Table 1. The DecarboNet Environmental Annotator also uses a "neutral sentiment" for the case where a sentence contains an equal number of positive and negative elements and for the case where the text expresses no emotion at all (Maynard, 2016).

The next phase involves identifying the correct connection between each extracted emotion and its possible "opinion target". All locations and terms ("topics" in DecarboNet's terminology) are candidate "opinion targets". Generally, opinion mining applications connect an emotion with the closest topic in the same sentence. If a sentence contains several emotions or topics, each topic is linked to the emotion nearest to it and all other emotions are ignored. However, the DecarboNet Environmental Annotator implements a different procedure for opinion targets, based on the so-called "context algorithm":
• If there are one or more topics and emotions with the same polarity (positive or negative) in a sentence, then a "SentenceSentiment annotation" is created with the emotion that records the highest "score". Note that the "score" of each emotion is based on that of the gazetteer list for the respective emotion. If, for example, the same sentence contains both "love" and "joy" as emotions (positive polarity), then "love" is rated higher than "joy", so it becomes the dominant emotion of that sentence.
• If there are one or more topics and emotions with different polarity (positive or negative) in a sentence, then a "topic context" is created. The topics of each sentence are divided, creating different "topic contexts". Thus, each "topic context" is connected with the emotion closest to it or with the highest "score".
We should underline that the "score" leads to the final decision on sentiment polarity and may be altered by various contextual clues. Commonly, a negative word found in a linguistic association with a sentiment word reverses the polarity from positive to negative and vice versa. Negative words are detected via the Verb Phrase Chunker (e.g., "didn't") and via a list of negative terms which forms another gazetteer list (e.g., "not", "never").
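The two cases of the "context algorithm" and the polarity reversal by negation can be sketched in Python as follows. This is a simplified reading of the description above, not the JAPE grammars of the DecarboNet Environmental Annotator: the data classes, the scores and the use of character offsets for "closeness" are assumptions, and ties in distance are broken by the higher score.

```python
from dataclasses import dataclass


@dataclass
class Emotion:
    label: str
    polarity: str        # "positive" or "negative"
    score: float         # hypothetical gazetteer score
    offset: int          # character position in the sentence
    negated: bool = False


@dataclass
class Topic:
    text: str
    offset: int


def effective_polarity(emotion):
    """A negation word linked to the emotion word reverses its polarity."""
    if not emotion.negated:
        return emotion.polarity
    return "negative" if emotion.polarity == "positive" else "positive"


def annotate_sentence(topics, emotions):
    """Apply the two cases described above to a single sentence."""
    result = {"sentence_sentiment": None, "topic_contexts": {}}
    if not emotions:
        return result
    polarities = {effective_polarity(e) for e in emotions}
    if len(polarities) == 1:
        # Same polarity: the highest-scoring emotion dominates the sentence.
        result["sentence_sentiment"] = max(emotions, key=lambda e: e.score).label
        return result
    # Mixed polarity: each topic context takes the closest emotion,
    # preferring the higher score when two emotions are equally close.
    for topic in topics:
        nearest = min(
            emotions,
            key=lambda e: (abs(e.offset - topic.offset), -e.score),
        )
        result["topic_contexts"][topic.text] = nearest.label
    return result
```

With "love" (score 0.9) and "joy" (score 0.6) in the same sentence, the sketch returns "love" as the sentence sentiment; with one positive and one negative emotion, each topic is instead paired with the emotion nearest to it.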
The figures below (Figs. 2 and 3) show an example of how the DecarboNet Environmental Annotator works: text input with annotation selection, and text output with annotations, respectively.
Clicking on a sentence with an extracted sentiment (e.g., "The effects of a warming planet are likely to be vast and varied - ranging from increased droughts and coastal flooding to reductions in snow and ice") displays the following results.
Table 2. Annotations on a sentence with an extracted sentiment.
It is clear that, although the sentence includes three terms ("warming planet", "drought" and "coastal flooding"), the "context algorithm" identified the term "warming planet" as the "opinion target" (target_string).

Identification of Semantic links
The aforementioned tasks result in the following 4 elements:
As mentioned, our aim is to link emotions with terms and/or locations, as well as terms with locations. However, the DecarboNet Environmental Annotator links emotions with terms ("opinion target"), but does not link emotions with locations or terms with locations. In order to address this problem, we applied XML parsing, which enabled us to link:
• emotions with locations: if the start and end nodes of a location fall within the start and end nodes of a sentiment sentence, then the location is linked with the dominant emotion of the sentence, as described earlier.
• terms with locations: if the start and end nodes of a location fall within a sentence which includes terms, then the location is linked to every term of the sentence.
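The two linking rules above amount to an offset-containment test, which can be sketched in Python as follows. The sketch assumes that the annotations have already been parsed from the XML output into dictionaries with "start" and "end" offsets; these field names and the function names are hypothetical and do not reflect the actual schema of the service output.

```python
def contains(outer, inner):
    """True if the span of `inner` lies within the span of `outer`."""
    return outer["start"] <= inner["start"] and inner["end"] <= outer["end"]


def link_annotations(sentences, terms, locations):
    """Link emotions with locations and terms with locations, sentence by sentence.

    sentences: dicts with "start", "end" and "emotion" (None if no sentiment)
    terms, locations: dicts with "text", "start" and "end"
    """
    emotion_location = []
    term_location = []
    for sentence in sentences:
        locs = [loc for loc in locations if contains(sentence, loc)]
        trms = [term for term in terms if contains(sentence, term)]
        # Rule 1: a location inside a sentiment sentence takes its dominant emotion.
        if sentence.get("emotion"):
            emotion_location += [(sentence["emotion"], loc["text"]) for loc in locs]
        # Rule 2: a location is linked to every term of the same sentence.
        term_location += [(term["text"], loc["text"]) for term in trms for loc in locs]
    return emotion_location, term_location
```

Each returned pair corresponds to one co-occurrence, so repeated pairs can later be counted to obtain link frequencies.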

RESULTS
From the tested corpus of 52 online articles, 529 emotions, 383 terms and 394 locations were extracted. Table 3 shows the specific emotions that were identified and extracted from the corpus.

           Negative    Positive     Neutral
Emotions   anger       joy
           disgust     surprise
           fear        happiness
           sadness     good
           swearing
           bad
Table 3. Polarity and emotions that the GATE Cloud service "DecarboNet Environmental Annotator" extracted from 52 online articles concerning climate change.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2021 XXIV ISPRS Congress (2021 edition)
Out of the extracted terms, 294 were linked with an emotion (this corresponds to a percentage of 77%). Respectively, 235 locations were associated with an emotion (74% of the extracted ones). The emotion with the highest percentage related to both environmental terms and locations is "bad" (30% for terms and 32% for locations). This result was expected as the majority of articles were referring to climate change and its negative consequences. On the other hand, "happiness" and "sadness" are the emotions that correspond to the second highest percentage concerning terms and locations, respectively.
Regarding polarity, Table 4 depicts the percentage of negative and positive emotions associated with terms and locations. 187 negative emotions were linked with terms (64%) and 175 were connected to locations (74%). It is worth mentioning that the difference between negative and positive emotions is larger for locations than for terms.

VISUALIZATION
The final step is the visualization of the extracted data. For this purpose, we use Gephi (Gephi - The Open Graph Viz Platform, 2021), a free, open-source software which supports visualization and exploration for all types of graphs and networks. Gephi is able to visualize complex graphs and networks with a plethora of nodes and edges and to optimise graph readability. It also enables the user to interact with the visualization in order to reveal hidden patterns. Depending on the purpose of a graph or network, different layout algorithms may be applied.
Graph-based visualizations of the extracted information were created, where nodes correspond to the extracted environmental terms, locations, and emotions, while edges refer to the links among them. The size of each node is based on the total number of times that the node appears in the corpus (how many times it has been identified by the IE process).
Similarly, edges' thickness depends on how many times this link appears in the corpus. For example, the term "carbon" is associated with the feeling "bad" six times.
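The node sizes and edge thicknesses described above are simple frequency counts, which can be sketched in Python as follows. The function names and the sample data are hypothetical; the "Source", "Target" and "Weight" column names follow the edge-table convention that Gephi's spreadsheet import is assumed to recognise.

```python
import csv
import io
from collections import Counter


def build_graph(mentions, links):
    """Count node occurrences and link occurrences.

    mentions: one entry per extracted term, location or emotion occurrence
    links: one (source, target) pair per identified link
    """
    node_sizes = Counter(mentions)
    edge_weights = Counter(links)
    return node_sizes, edge_weights


def edges_to_csv(edge_weights):
    """Serialise weighted edges as a CSV edge table (Source, Target, Weight)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edge_weights.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

If the pair ("carbon", "bad") occurs six times in `links`, the resulting edge has weight 6, matching the example in the text.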
The visualizations aim to highlight: (1) which emotions are associated with environmental terms and locations and (2) which environmental terms are related to which locations.
After testing the available layout algorithms, we concluded that the Yifan Hu algorithm (Hu, 2005) produces the most readable network and best serves both of our objectives.
Colours play a crucial role in visualization in terms of visual perception and semantics. For visualizing emotions, the Plutchik wheel of emotions was used (Plutchik, 2001), and a different colour was chosen for each emotion, as presented in Table 5. The positive/negative polarity of emotions is visualized using two hues (red/orange for negative, blue/green for positive), as is usually done in work on visualizing sentiment analysis results (Kucher et al., 2018). The colours chosen for the negative and positive polarities were assigned to the generic emotions "bad" and "good", respectively.
Figures 4 and 6 represent the networks of environmental terms and of locations, respectively, in terms of positive-negative emotion polarity. Figure 5 shows the network of environmental terms and emotions for each emotion identified. Accordingly, Figure 7 shows the network of locations and emotions. Finally, Figure 8 represents the network of "Terms" and "Locations".
The polarity networks (Figs. 4 and 6) make clear that the majority of environmental terms and locations are associated with negative-polarity emotions. The most frequently extracted environmental term is "climate change". Other frequently mentioned environmental terms are "climate", "global warming", "greenhouse gases", "greenhouse effects", "sea level rise", "emissions" and "warming". The most frequently mentioned locations are "Alaska", "Antarctica", "Arctic", "California", "Europe", "Greenland", "UK", "U.S.", "Arctic sea" and "Australia".
Figure 8 shows which locations are associated with which environmental terms. For example, the location "U.S." is linked to the terms "greenhouse effects", "National Climate Assessments" and "global temperature", among others. The thicker the edge, the more times a term is linked to a location.

CONCLUSIONS AND FUTURE WORK
In this paper, we explored information extraction and sentiment analysis to identify environmental terms, locations and climate-related emotions from online articles about climate change.
Regarding information extraction, future research could focus on (a) validating the IE results of the presented approach and (b) extracting cause-and-effect relationships among different The second aim constitutes a more research-demanding endeavour and different Relation Extraction techniques may be explored, such as non-statistical techniques and statistical machine learning techniques. The former derive relations with the help of rules, taking into account prepositions, verbs and/or nouns, and even expressions that indicate cause-effect relations (Asghar, 2016). The latter use statistical methods, especially machine learning, which takes advantage of the abundance of data available online to train algorithms that detect and extract semantic relationships from texts written in natural language.
Finally, it would be interesting to compare the IE and SA results of the "DecarboNet Environmental Annotator" (GATE Cloud service) with those of other open-source NLP tools such as Stanford CoreNLP. Such a comparison may lead to new insights.