The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Articles | Volume XLVI-4/W5-2021
https://doi.org/10.5194/isprs-archives-XLVI-4-W5-2021-245-2021
23 Dec 2021

EXTRACTING TOPICS FROM A TV CHANNEL'S FACEBOOK PAGE USING CONTEXTUALIZED DOCUMENT EMBEDDING

N. Habbat, H. Anoun, and L. Hassouni

Keywords: AraBERT, ELMo, Neural topic model, LDA, ProdLDA, Topic coherence

Abstract. Topic models extract meaningful words from a text collection, allowing for a better understanding of the data. However, the resulting topics are often not coherent enough and are therefore harder to interpret. Adding more contextual knowledge to the model can improve coherence. In recent years, neural network-based topic models have become available, and their quality has improved thanks to BERT-based representations. In this study, we propose a model that extracts news topics from the Aljazeera Facebook page. Our approach combines a neural topic model (ProdLDA) with AraBERT, a BERT transformer model pre-trained on Arabic. The proposed model produces more expressive and consistent topics than ELMo across the topic model algorithms tested (ProdLDA and LDA), reaching a topic coherence of 0.883.
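The abstract reports results as topic coherence but does not state which coherence measure produced the 0.883 figure. As a minimal sketch, assuming the common normalized pointwise mutual information (NPMI) variant estimated from document co-occurrence, the coherence of a topic's top words can be computed as below. The function name `npmi_coherence` and the toy corpus are hypothetical illustrations, not the authors' evaluation code; libraries such as gensim provide several coherence measures (including NPMI and C_v) that may differ from this simplified version.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of a topic's top words (hypothetical helper).

    Probabilities are estimated from document co-occurrence: P(w) is the
    fraction of documents containing w, and P(wi, wj) is the fraction
    containing both words.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every word in `words`.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for wi, wj in combinations(topic_words, 2):
        p_i, p_j, p_ij = prob(wi), prob(wj), prob(wi, wj)
        if p_ij == 0.0:
            # The words never co-occur: NPMI takes its minimum value, -1.
            scores.append(-1.0)
        elif p_ij == 1.0:
            # Both words appear in every document: maximum value, 1.
            scores.append(1.0)
        else:
            pmi = math.log(p_ij / (p_i * p_j))
            scores.append(pmi / -math.log(p_ij))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical tokenized posts standing in for a real corpus.
    docs = [
        ["match", "goal", "team"],
        ["match", "team", "coach"],
        ["election", "vote", "party"],
        ["vote", "party", "parliament"],
    ]
    print(npmi_coherence(["match", "team"], docs))
    print(npmi_coherence(["match", "vote"], docs))
```

In this toy corpus, `match` and `team` always appear together, so their NPMI is 1, while `match` and `vote` never co-occur and score -1. A topic whose top words tend to appear in the same posts therefore scores closer to 1, which is the sense in which higher coherence indicates more interpretable topics.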