ISSUES OF GEOGRAPHIC INFORMATION SYSTEMS AND THEMATIC MAPPING APPLICATION TO ANALYSIS OF EPIDEMIOLOGICAL SITUATION IN LARGE CITIES

The paper summarises and discusses the experience of medical statistics data processing and mapping gained in 2019-2020 within a study devoted to tuberculosis infection mapping. Based on this experience, we have formalised a set of research issues that were elaborated and clarified at the previous stages but demand additional investigation. Additionally, the paper summarises the results of the design and prototyping of a Web mapping interface implemented as a part of the developed medical Geographic Information System (GIS). The developed GIS is aimed at mapping and analysis of tuberculosis infection data. The overall structure of the elaborated GIS is also covered with respect to the detected research issues.


INTRODUCTION
More than 144 million people have been infected worldwide, and more than 3 million of them have died (as of mid-April 2021), during the Coronavirus (SARS-CoV-2) pandemic of 2019-2021 (according to the Johns Hopkins University, https://origincoronavirus.jhu.edu). The pandemic highlights the need to implement geospatial analysis (Franch-Pardo et al., 2020) in order to observe and account for epidemic development and to prevent its spread.
As the mobility of people has grown in recent decades, monitoring and study of infection spread is applicable (and has to be applied) at different scales, from global to city scale. Geospatial analysis and mapping of the infection spread and its spatial dynamics (Chistobayev, Semenova, 2013; Gatrell, Bailey, 1996; Gordon, Womersley, 1997; Mayer, 1983; Wei et al., 2020) are relevant and significant supporting technologies in such a context. In 2019-2021 we conducted a set of case studies in the domain of GIS-based processing and mapping of infectious disease statistics. The studies were aimed at elaborating approaches to support medical administration and planning of infectious disease accounting and control. The studies were performed on the example of St. Petersburg, the second largest city in Russia, located in the north-west of the country. Medical statistics data were obtained from the medical administration of the city. The data were used in depersonalized form in accordance with the Russian law on personal data (Federal law, 2006).

METHODOLOGY
Within the study framework, we applied GIS software to support and automate structuring, processing, mapping and analysis of initial medical statistics data on socially valuable diseases collected in St. Petersburg (by city tuberculosis dispensaries) during the last decades. The list of socially valuable diseases in Russia is established by a Russian Government order (Government of the Russian Federation, 2004) and incorporates 16 items (including SARS-CoV-2, cholera, malaria, etc.). In our studies, we operated with medical statistics data on tuberculosis (the main object of interest), hepatitis and human immunodeficiency virus (HIV). During 2019-2020 we designed a methodology for preprocessing of initial medical statistics data, its geocoding and its representation in a geospatial database implemented in QGIS (https://qgis.org), an open source desktop Geographic Information System. Basic on-the-map visualisation techniques were explored as well.
Since our focus is on supporting infectious disease monitoring and analysis carried out by medical professionals and administrators, we are now working on a Web-based GIS interface for operating with the collected data and maps. Taking into account that working with GISs and maps is not a standard competence of medical staff and administrators, we consider that the Web interface has to be as user friendly and intuitive as possible. Thus, the Web interface has to eliminate the interdisciplinary barrier for medical colleagues, while the desktop GIS remains the interface for the involved GIS professionals. Currently we are working on Web interface prototyping. The interface is developed according to the dashboard development methodology (Dong et al., 2020; Wei et al., 2020; Martorell-Marugán et al., 2021), which is popular in projects supporting Coronavirus studies. This methodology assumes complex but intuitively perceptible visualisation of medical statistics data using maps and infographics.
To illustrate the approach, such world-famous examples as the Johns Hopkins University COVID-19 dashboard (https://coronavirus.jhu.edu/map.html, Figure 1) and the dashboard of the Regional Office for Europe of the World Health Organization (https://who.maps.arcgis.com/apps/opsdashboard/index.html#/a19d5d1f86ee4d99b013eed5f637232d) can be mentioned, as well as the COVID-19 dashboard of Yandex, a Russian IT giant, which is well known in Russia (https://datalens.yandex/7o7is1q6ikh23, Figure 2).

EPIDEMIOLOGICAL MAPPING ISSUES
While applying GIS-based automation of data processing and mapping, we also faced a number of issues connected with the character of the data we are dealing with. The first of the most significant issues we detected is the accuracy of postal address geocoding. We developed the GeoMedica geocoder application (built upon the OpenStreetMap Nominatim geocoding engine) to map initial medical statistics data georeferenced by postal addresses. The geocoding accuracy we achieved was above 78% (Figure 3), which the medical professionals involved in the project estimate as appropriate. However, reaching this accuracy required manual correction of those geocoding errors that could be corrected, which is very time consuming. Moreover, a part of the geocoding errors registered during the geocoding of initial medical statistics data cannot be resolved even manually due to the fundamental incorrectness of the postal address (a missing part of the address record or a fictitious address).
The federal law (Federal law, 2020) states that informing local authorities and citizens about the activities undertaken against biological hazards (including fighting diseases) is the prerogative of regional authorities, while the same law establishes the need to organise scientific activities in the domain of biological safety. Moreover, the presidential decree (President of the Russian Federation, 2019) denotes the need to develop geographic information systems to support rapid response to biological threats.
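The geocoding step discussed in this section can be illustrated with a minimal sketch. It is not the GeoMedica implementation: it only shows how a free-form query to the public Nominatim search endpoint could be formed and how a match rate could be computed. Treating the share of addresses that received coordinates as the "geocoding accuracy" is an assumption made here for illustration, as are the function names and the sample address.

```python
from urllib.parse import urlencode

# Public Nominatim search endpoint; a project may run its own instance instead.
NOMINATIM_SEARCH = "https://nominatim.openstreetmap.org/search"


def build_search_url(address, country_code="ru"):
    """Build a Nominatim free-form search URL for one postal address."""
    params = {
        "q": address,                  # free-form address string
        "format": "jsonv2",            # JSON output
        "limit": 1,                    # keep only the best candidate
        "countrycodes": country_code,  # restrict the search to one country
    }
    return NOMINATIM_SEARCH + "?" + urlencode(params)


def geocoding_accuracy(results):
    """Share of records that received coordinates.

    `results` holds one entry per address: a (lat, lon) tuple when the
    geocoder returned a match, or None when it did not (assumed convention).
    """
    if not results:
        return 0.0
    matched = sum(1 for r in results if r is not None)
    return matched / len(results)


url = build_search_url("Nevsky prospekt 1, Saint Petersburg")
share = geocoding_accuracy([(59.9, 30.3), None, (59.8, 30.2), (60.0, 30.4)])
```

A match rate of this kind does not show whether a matched point is the correct building, so manually verified samples would still be needed to assess positional correctness.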

RESULTS
The data we operate with in the GIS are derived through geocoding from initial medical statistics data collected by city tuberculosis dispensaries (each serves one of the administrative districts of the city). All database records are related to administrative districts through the geocoded postal address and are equipped with time marks denoting the period of infection case observation. These features of the database make it possible to estimate and map 9 parameters of tuberculosis infection spread, which are used by the city tuberculosis dispensary in its operational activities:
1. The population of the administrative district (at the end of the calendar year), taken from the Russian federal statistics agency (Rosstat)
2. The tuberculosis infection rate (a relative indicator that reflects the number of newly detected tuberculosis infection cases per 100,000 citizens; cases newly detected during treatment or after the death of the patient are counted)
3. The tuberculosis infection spread rate (a relative indicator that reflects the number of tuberculosis infection cases under the supervision of dispensary medical staff per 100,000 citizens)
4. The death rate due to the tuberculosis infection (a relative indicator that reflects the number of deaths due to the tuberculosis infection per 100,000 citizens)
5. The number of newly detected cases of tuberculosis infection
6. The number of citizens detected as newly infected by tuberculosis (including posthumous cases)
7. The number of citizens detected as infected by tuberculosis and HIV
8. The number of patients with low drug resistance
9. The epidemiological burden (the rank of epidemiological danger assigned to the observed territory; it is formed on the basis of the ranks of all the above mentioned indicators)
The geospatial database is operated in QGIS by a GIS professional and is used basically for mapping purposes, to visualise infection spread and dynamics by administrative districts in a selected time period (Figures 6, 7). To ensure Web compatibility of the formed geospatial database, we used the NextGIS cloud platform (https://nextgis.com), which makes it possible to visualise maps and separate map layers and to publish visualised maps on the Web. In our study, NextGIS has established itself as an excellent QGIS-compatible tool for the development and implementation of Web GIS (the Web component of the elaborated GIS). NextGIS is a Russian commercial company that builds its business in the geospatial field upon open source software, data and methods. The company is also one of the QGIS contributors, which is why the NextGIS software is QGIS compatible and could be easily integrated within our research. Previously, we implemented a pilot Web mapping project, entitled FEBRIS GIS, using the data collected for the Moskovsky administrative district of St. Petersburg. The Web map incorporated not only data about tuberculosis infection cases in the Moskovsky district area, but also information about the subdivision of the district and about tuberculosis clinics (Figure 8).
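The relative indicators in the list of parameters above (rates per 100,000 citizens) and the rank-based epidemiological burden can be illustrated with a short sketch. The paper does not specify how the indicator ranks are combined, so summing the ranks is an assumption made here; the district labels and all figures are hypothetical and are not taken from the study.

```python
def rate_per_100k(cases, population):
    """Relative indicator: number of cases per 100,000 citizens."""
    return cases * 100_000 / population


def rank_descending(values):
    """Rank districts by one indicator: rank 1 is the highest value.

    Districts with equal values share the same rank.
    """
    ordered = sorted(values.values(), reverse=True)
    return {district: ordered.index(v) + 1 for district, v in values.items()}


def burden_score(indicators):
    """Sum the per-indicator ranks of every district (an assumed aggregation).

    A lower total means the district ranks near the top on more indicators.
    """
    totals = {}
    for per_district in indicators.values():
        for district, rank in rank_descending(per_district).items():
            totals[district] = totals.get(district, 0) + rank
    return totals


# Hypothetical figures for three districts, not taken from the study.
population = {"A": 200_000, "B": 100_000, "C": 400_000}
new_cases = {"A": 60, "B": 40, "C": 80}
deaths = {"A": 4, "B": 1, "C": 8}

indicators = {
    "infection_rate": {d: rate_per_100k(new_cases[d], population[d]) for d in population},
    "death_rate": {d: rate_per_100k(deaths[d], population[d]) for d in population},
}
scores = burden_score(indicators)
```

With these figures district A receives the lowest rank sum, that is, the highest burden under the assumed aggregation. Other schemes (weighted ranks, or ranking the rank sums again) would fit the same description.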
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2021 XXIV ISPRS Congress (2021 edition)

Figure 8. NextGIS/FEBRIS GIS Web interface.
Due to the issue of analysis methodology development and redesign (mentioned in section 3 of the paper), we selected a dashboard approach for the Web representation of the operated data at the next stage of our study. To ensure Web representation of related information in addition to the map layers themselves, we used the Tilda (https://tilda.cc) Web platform and elaborated a prototype of the FEBRIS GIS Web site. Tilda is a commercial platform that provides virtual Web hosting and a WYSIWYG Web site constructor. It was selected among similar platforms as a result of comparing the available functionality and service costs. The Web site was designed to have 5 sections (pages) representing all the necessary information about the project. Since the site editor is block-based, we used blocks for each section. A significant feature of Tilda is the single-page layout of the developed Web sites (Tilda was developed for small companies and for landing Web pages used to collect contact information, not for full-fledged Web sites). Nevertheless, Tilda supports zero blocks (empty blocks that allow a page to be designed from scratch and necessary external components to be embedded into it), has convenient technical documentation, and is equipped with easily customizable Search Engine Optimization tools. These were the arguments in favour of selecting it as the Web site engine. With the new Web interface implemented, the data representation chain in the developed GIS is composed of desktop preprocessing and loading into the database, uploading to the Web GIS and Web map visualisation, and visualisation and representation of additional data on the Web site (http://febrisgis.ru). According to the dashboard paradigm, the additional data incorporate lists, tables, diagrams and detailed map legends. However, since the work is currently at the prototyping stage, we implemented a static representation of all the additional data in order to elaborate the approach to data composition design (Figure 9).
Map visualisation is integrated into the Web page as an external resource (derived from NextGIS/FEBRIS GIS).

Figure 9. Example of data representation on febrisgis.ru.
It was established that, due to the legislation issue, only the first four of the above mentioned indicators are currently presented on the Web site (population, tuberculosis infection rate, tuberculosis infection spread rate, tuberculosis infection death rate). These indicators can also be perceived easily, with minimal explanation, by an unqualified user. Visualisation of the remaining indicators is currently available in the NextGIS/FEBRIS GIS Web interface. Thus the list of FEBRIS GIS application domains is composed mostly of medical professional activities, and medical professionals are able to explore all the available data to pursue their professional aims (Figure 10). An ordinary user can explore basic map visualisations in the prototyped Web interface.

CONCLUSIONS
As a current result of the work, a Web site was developed and incorporated as a part of the developed Geographic Information System. With the help of this part, it became possible to obtain spatial information about the epidemic situation in St. Petersburg through an easily perceptible interface. The main requirements for the data visualisation were met successfully:
1. The Web interface allows the user to assess the overall epidemic situation in the city using thematic maps (map layers) and additional data
2. The Web interface provides the user with the necessary information about the essence of our research project
The following project stages were completed:
1. The initial data on the epidemic situation in the city were processed; all the data are provided in an anonymized aggregated form
2. Map layers representing medical statistics data have been created
3. A NextGIS-based Web GIS was developed, filled with data, and configured
4. A Web interface has been prototyped that allows the elaborated maps to be visualised on the Web alongside the descriptive information and other accompanying resources
5. The project Web site was published on the Web under the permanent URL http://febrisgis.ru
Development of similar Web interfaces generally contributes to the expansion of knowledge about socially valuable diseases and supports the fight against these diseases. The system (Web interface) is expected to be integrated into the Web site of the relevant medical organization. The software selected as the basis of our study makes this possible. The developed system provides an excellent basis for interdepartmental cooperation and allows building a high-quality dialogue between society and the government.
Based on the conducted work and on analysis of the results gained, we are able to formalise conclusions on the additional elaborations currently needed in the explored domain. These conclusions form our research plan for the near future, and can also be used by other researchers involved in the domain to define research aims.
In terms of geocoding facilities, we can state that extra conversions of the initial medical data need to be avoided. The key task is the elimination of intermediaries when transferring information from the medical databases to the map. This can be realised through the integration of the GIS subsystem into the medical data management system, and through the application of the GIS subsystem to postal address control at the initial stages of medical statistics data formation.
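As an illustration of postal address control at the data entry stage, a completeness check could flag records before they reach the geocoder. The sketch below is only one possible form of such control; the field names and the sample records are hypothetical and do not describe any existing medical data management system.

```python
# Hypothetical field names; a real medical data management system will differ.
REQUIRED_FIELDS = ("city", "street", "house")


def missing_address_fields(record):
    """Return the required address fields that are absent or blank."""
    return [f for f in REQUIRED_FIELDS if not str(record.get(f) or "").strip()]


def split_for_review(records):
    """Separate records ready for geocoding from those needing correction."""
    ready, needs_review = [], []
    for record in records:
        target = needs_review if missing_address_fields(record) else ready
        target.append(record)
    return ready, needs_review


# Hypothetical records, not taken from the study.
records = [
    {"city": "St. Petersburg", "street": "Moskovsky prospekt", "house": "10"},
    {"city": "St. Petersburg", "street": " ", "house": None},
]
ready, needs_review = split_for_review(records)
```

A check of this kind catches incomplete address records only. Fictitious but well-formed addresses would still have to be detected by the geocoder or by manual verification.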
In terms of methodology development, we can conclude that new metrics for the spatial distribution of infectious diseases need to be explored in order to represent this phenomenon as multidimensional, since location-only data are not enough to estimate the spatial pattern of infection. For example, including such a parameter as apartment type in the geospatial database will probably ensure wider capabilities for detecting spatial patterns. Finally, in terms of legislation, we can state the need to propose the mapping products produced as a result of our studies as a material for national-level discussion on the standardization and (or) regulation of medical maps, if needed.