Volume XL-3/W3
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XL-3/W3, 629-634, 2015
https://doi.org/10.5194/isprsarchives-XL-3-W3-629-2015
© Author(s) 2015. This work is distributed under
the Creative Commons Attribution 3.0 License.

  03 Sep 2015

A TOPIC MODELING BASED REPRESENTATION TO DETECT TWEET LOCATIONS. EXAMPLE OF THE EVENT “JE SUIS CHARLIE”

M. Morchid1, D. Josselin1,2, Y. Portilla1,3, R. Dufour1, E. Altman1,3, and G. Linarès1
  • 1Laboratoire Informatique d’Avignon (LIA), University of Avignon, France
  • 2UMR ESPACE 7300, CNRS, UNSA, France
  • 3INRIA, B.P 93, 06902 Sophia Antipolis Cedex, France

Keywords: Tweets location, Topic modeling, Author topic model, Twitter

Abstract. Social networks have become a major channel for information propagation. On the popular Twitter platform, mobile users post or relay messages from different locations. The content, meaning, and location of tweets show how an event, such as the bursty “JeSuisCharlie” event that took place in France in January 2015, is perceived in different countries. This research aims to cluster tweets according to the co-occurrence of their terms, including the country, and to predict the probable country of a non-located tweet from its content. First, we present the process of collecting a large quantity of data from the Twitter website, which yields a set of 2,189 located tweets about “Charlie” posted from 7 to 14 January. We then describe an original method adapted from the Author-Topic (AT) model, which is itself based on Latent Dirichlet Allocation (LDA). We define a homogeneous space containing both lexical content (words) and spatial information (country). During training on one part of the sample, the model produces a set of clusters (topics) based on statistical relations between lexical and spatial terms. We then evaluate the method on the rest of the sample in a clustering task, where it reaches up to 95% correct assignments. This shows that, after a learning process, our model is suited to predicting tweet locations.
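
The idea of a homogeneous space, where a country is treated as one more term alongside the words of a tweet, can be illustrated with a minimal Python sketch. This is not the AT/LDA model described in the abstract: it substitutes plain smoothed co-occurrence counts (a naive Bayes style score) for topic inference, only to show how lexical and spatial terms can share one token space. The toy tweets, the `__country__` prefix, and all function names are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import Counter, defaultdict

COUNTRY_PREFIX = "__country__"  # hypothetical marker for spatial tokens

# Invented toy tweets for illustration only; not taken from the paper's corpus.
TOY_TWEETS = [
    ("je suis charlie liberte", "FR"),
    ("nous sommes charlie paris", "FR"),
    ("i am charlie freedom of speech", "US"),
    ("solidarity with paris from new york", "US"),
]


def to_joint_tokens(text, country=None):
    """Return lowercase word tokens, plus a country pseudo-token if known."""
    tokens = text.lower().split()
    if country is not None:
        tokens.append(COUNTRY_PREFIX + country)
    return tokens


def train_cooccurrence(located_tweets):
    """Count how often each word co-occurs with each country token."""
    counts = defaultdict(Counter)
    vocab = set()
    for text, country in located_tweets:
        tokens = to_joint_tokens(text, country)
        spatial = [t for t in tokens if t.startswith(COUNTRY_PREFIX)]
        lexical = [t for t in tokens if not t.startswith(COUNTRY_PREFIX)]
        vocab.update(lexical)
        for s in spatial:
            counts[s].update(lexical)
    return counts, vocab


def predict_country(text, model, alpha=1.0):
    """Pick the country token whose smoothed word counts best fit the text."""
    counts, vocab = model
    words = to_joint_tokens(text)
    best_country, best_score = None, -math.inf
    for spatial in sorted(counts):
        total = sum(counts[spatial].values())
        # Additive (Laplace) smoothing so unseen words do not zero the score.
        score = sum(
            math.log((counts[spatial][w] + alpha) / (total + alpha * len(vocab)))
            for w in words
        )
        if score > best_score:
            best_country, best_score = spatial[len(COUNTRY_PREFIX):], score
    return best_country


model = train_cooccurrence(TOY_TWEETS)
print(predict_country("liberte paris charlie", model))
print(predict_country("freedom of speech", model))
```

In the approach the abstract describes, the count table would be replaced by topics learned over the joint vocabulary, so that a non-located tweet is assigned a country through the topics its words activate rather than through direct word counts.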