The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
Publications Copernicus
Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XLIII-B4-2022, 91–98, 2022
https://doi.org/10.5194/isprs-archives-XLIII-B4-2022-91-2022
01 Jun 2022

TOWARDS AN OPEN SOURCE PYTHON LIBRARY FOR AUTOMATED EXPLORATORY SPATIAL DATA ANALYSIS

N. de Kock¹, V. Rautenbach¹, and I. Fabris-Rotelli²
  • ¹ Department of Geography, Geoinformatics and Meteorology, University of Pretoria, Pretoria, South Africa
  • ² Department of Statistics, University of Pretoria, Pretoria, South Africa

Keywords: Open Source, Python library, Spatial Statistics, ESDA

Abstract. Exploratory spatial data analysis (ESDA) refers to the use of various functions, including measures of spatial heterogeneity and spatial autocorrelation, to gain an initial understanding of a spatial dataset. Currently, the ESDA process is repetitive and time-consuming. While the results differ between datasets, the way in which they are generated does not change significantly. Results are also generated individually for each variable, which means that they cannot easily be compared or shared.

Automating the ESDA process would therefore have multiple benefits: it would not only save time but also allow the data analyst to keep up with the rapid rate at which data are generated. This paper introduces the first iteration of autoESDA, a Python library capable of automating the ESDA process by summarising the results in a single report.
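To make the idea concrete, the following is a minimal sketch of what "the same statistic, repeated per variable, collected into one summary" can look like. It is not autoESDA's API: the function names (`rook_weights`, `morans_i`, `summarise`), the toy 4×4 grid, and both variables are assumptions made purely for illustration, and global Moran's I stands in for the broader set of spatial autocorrelation measures mentioned above.

```python
import numpy as np


def rook_weights(n_rows, n_cols):
    """Binary rook-contiguity weights for a regular grid (row-major cell order)."""
    n = n_rows * n_cols
    w = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            if r + 1 < n_rows:  # neighbour directly below
                j = (r + 1) * n_cols + c
                w[i, j] = w[j, i] = 1.0
            if c + 1 < n_cols:  # neighbour directly to the right
                j = r * n_cols + c + 1
                w[i, j] = w[j, i] = 1.0
    return w


def morans_i(values, w):
    """Global Moran's I: (n / S0) * (z' W z) / (z' z), z = mean-centred values."""
    z = np.asarray(values, dtype=float) - np.mean(values)
    return (len(z) / w.sum()) * (z @ w @ z) / (z @ z)


def summarise(variables, w):
    """Apply the same statistic to every variable and return one combined summary."""
    return {name: morans_i(vals, w) for name, vals in variables.items()}


# Hypothetical toy data on a 4x4 grid (not data from the paper).
w = rook_weights(4, 4)
gradient = np.arange(16.0)  # smooth trend across the grid
checker = (np.indices((4, 4)).sum(axis=0).ravel() % 2).astype(float)  # alternating 0/1

report = summarise({"gradient": gradient, "checker": checker}, w)
for name, stat in report.items():
    print(f"{name}: Moran's I = {stat:.3f}")
```

In this toy case the smooth gradient yields a positive Moran's I, while the checkerboard pattern yields exactly -1 because every pair of neighbouring cells lies on opposite sides of the mean. The point of the sketch is the `summarise` step: the computation is identical for every variable, so it can be looped and the results placed side by side for comparison.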

In this paper, we present the high-level requirements defined for the implementation of autoESDA. Various dependency libraries are discussed and a high-level overview of the autoESDA workflow is given. The library is then evaluated against these requirements. Semi-structured interviews with participants yielded a wealth of feedback and suggestions on how the output report could be improved. Finally, a roadmap of proposed further developments and improvements is discussed.

The first version demonstrates that the automation of ESDA is possible and lays the foundation for further development. This is an important contribution to understanding spatial data, as it enables the data analyst to keep up with the volume of data that is generated on a daily basis.