REMOTE SENSING AND MODELING TOOLS EXPLORATION FOR HABITAT DELIMITATION OF LEISHMANIASIS TRANSMITTING VECTORS

Leishmaniasis encompasses a group of vector-borne parasitic diseases, characterized by their diversity and complexity, that affect both humans and other vertebrates. They are caused by different species of parasites of the genus Leishmania, which are transmitted by the bites of hematophagous female sandflies. This work set out to model the occurrence probability of five sandfly species of sanitary interest for South America, based on a bibliographic compilation of records from the last 10 years. The model was developed with the free software MaxEnt. This exploratory analysis made it possible to visualize the areas where the species are distributed. In addition, we analyzed vegetation changes around a town in Jujuy province, Argentina, where a leishmaniasis outbreak occurred during 2017 and 2018. For this, Sentinel-2 images were used, and a change vector was calculated as the difference between the Normalized Difference Vegetation Index (NDVI) of two dates. This part of the work was carried out using SNAP software for image pre-processing, Python to obtain the change vector and QGIS for post-processing of the result. The exploration of MaxEnt allowed us to identify the most suitable areas for the distribution of the five most important species in the study region and, therefore, to inform future decision-making for the prevention and control of leishmaniasis transmission. It also provided an approximation of how anthropogenic activities, such as deforestation, can influence specific outbreaks of leishmaniasis transmitted by these species. Finally, the exploration of the different tools used in this work highlights the importance of validation with field data for generating accurate analyses and predictions. This implies that more data collection is necessary to validate the models and analyses generated, and to guarantee the contribution of these tools to macro-ecological studies of species linked to disease transmission.


INTRODUCTION
Leishmaniasis encompasses a group of vector-borne parasitic diseases, characterized by their diversity and complexity, that affect both humans and other vertebrates. They are caused by different species of parasites of the genus Leishmania, which are transmitted by the bites of hematophagous female sandflies of the genus Phlebotomus in Africa, Asia and Europe, and Lutzomyia in America (Okwor et al., 2012; WHO, 2010). The World Health Organization includes leishmaniasis in the category of re-emerging and uncontrolled diseases (WHO, 2010), and it constitutes a growing public health problem worldwide, due to the increase in the number of people affected as a result of their greater exposure to disease vectors. In the last decade, the use of algorithms to model ecological niches has been increasing in different applications (Quintana et al., 2013). These models are useful in landscape epidemiology to approximate the ecological niche of disease vector species, that is, the environmental conditions suitable for their development, in areas where their presence or absence cannot currently be confirmed. At the same time, knowing these conditions allows a better categorization of the risk associated with the transmission of the disease and the implementation of measures for its prevention and control (Quintana et al., 2013; Meneguzzi et al., 2016). MaxEnt is a very popular software package for modeling environmental niches to estimate the potential occurrence of species (Phillips and Dudík, 2008; Merow et al., 2013). It applies the maximum entropy principle, which states that, among all distributions consistent with the available constraints, the one with the maximum entropy should be chosen (Wachtel et al., 2018). It uses sampling data of species locations, together with relevant environmental variables, to model the potential distribution of the species over a larger geographic extent (Muttaqin et al., 2019).
Cutaneous leishmaniasis is associated with changes in land use in tropical areas, where humans intervene in nature through deforestation and other activities (Quintana et al., 2013; Germano et al., 2019) and link the wild and the domestic cycles of the disease, expanding it (Olivera Mesa et al., 2017). In this sense, remote sensing-based methods have proven to be an effective tool for detecting changes in land use and in the physical environment, providing a comprehensive view of the spatio-temporal dynamics of land cover and land use patterns (Treitz and Rogan, 2004). In short, geomatics techniques help to detect changes in land use in a timely and accurate manner, and to understand the relationship and interaction between humans and natural phenomena (López Granados et al., 2001; Eastman, 2001; Lu et al., 2004), which allows the necessary measures to be taken to counter negative impacts. In this context, this work set out to model the occurrence probability of five sandfly species of sanitary interest for South America, using a bibliographic compilation of sandfly records from the last 10 years. In addition, we analyzed the changes that occurred in the vegetation around a town in the province of Jujuy, Argentina, where a leishmaniasis outbreak occurred during 2017 and 2018.

Study area
The occurrence probability models were generated for South America, due to the availability of species presence data in different countries. However, the study area of this work corresponds to Argentina, more precisely, to the northern and central region of the country (Figure 1), where the sandfly species considered in this study have been detected, as well as leishmaniasis cases (Salomón et al., 2011). This region is characterized by a heterogeneous relief, with areas of plains, mountains and rivers, and abundant precipitation. Some environmental problems are observed, such as deforestation and flooding (Link). Also, the first Argentinean cases of cutaneous leishmaniasis were reported in this region in the 20th century and, since then, epidemic foci have been registered in the transmission area (Salomón et al., 2016). In addition, an area of particular interest was chosen around the town of Caimancito (23°44'28" South, 64°35'33" West), where there was a leishmaniasis outbreak in 2017-2018. Caimancito is a town in the southeast of the Argentine province of Jujuy, in the north of the country, and this area was selected to analyze and try to detect land cover changes that could be associated with the leishmaniasis outbreak.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W2-2021, FOSS4G 2021 - Academic Track, 27 September-2 October 2021

[...], 1996). The CHELSA climatic variables have a spatial resolution of 30 arc seconds (0.93 × 0.93 = 0.86 km², approximately 1 km at the Equator), are freely accessible and were generated from the interpolation of climatic data for the period 1979-2013 (Phillips and Dudík, 2008).
The temperature layers are in units of °C × 10, precipitation layers in millimetres (per year, month, or quarter, depending on the variable), the seasonality of temperature (BIO4) is represented by standard deviations, the seasonality of precipitation (BIO15) by a coefficient of variation, and the isothermality layer (BIO3) is dimensionless. The GTOPO model covers all of South America, its units are meters and it also has a spatial resolution of 30 arc seconds. This last layer was processed to match the extent of the climate data layers.

2.2.2 Remote sensing data: To analyze the land cover changes around the town of Caimancito, a change vector analysis was performed. Sentinel-2 images (10 m resolution) (ESA, 2015) were used, obtained through the Copernicus platform (Copernicus Open Access Hub). The images were pre-processed to bottom-of-atmosphere (BOA) reflectance using SNAP software. The image dates were December 30, 2015 (before the leishmaniasis outbreak) and December 14, 2018 (after the outbreak). For both images, NDVI was calculated using bands 4 (red) and 8 (near infrared). The study area covered approximately 5834 km², with Caimancito located near its center (Figure 4).

To model the occurrence probability, the MaxEnt software (Phillips and Dudík, 2008) was used. The software is based on presence data (or presence-absence data) of the species, and estimates an occurrence probability distribution. It requires high-resolution climate data that allow determining whether or not an area may be suitable for the development of a species (Meneguzzi et al., 2016). Following Merow and collaborators (Merow et al., 2013), and since this is an exploratory study of the occurrence of the selected sandfly species, most of the parameters that must be configured in the software were left at their default values. A logistic output was requested, product features (the "Product features" option) were excluded, 10,000 pseudo-absence samples were generated by default, and cross-validation with ten replicates was requested.
For the analysis of land cover changes related to the leishmaniasis outbreak, change detection by image differencing was used. The NDVI difference between the two Sentinel-2 images (before and after the leishmaniasis outbreak) was used to detect changes in land cover, such as deforestation. In addition, a time series analysis using NDVI from MODIS images (MOD13Q1) (USGS, 2000) was carried out with a non-parametric local regression method, LOESS (locally estimated scatterplot smoothing). This method provides a smoother visualization of the time series and helps to detect drops or peaks in the variable. The MOD13Q1 series were obtained from the AppEEARS website.
This analysis was performed in an area where the change vector detected an NDVI decrease related to deforestation, and the result was compared with the time series of an area without land cover change (no deforestation). The change vector technique was implemented in Python on Google Colab, and the post-processing was done with QGIS 3.16.4. The LOESS local regression was performed in R, version 4.1.
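As an illustration of the NDVI differencing step described above, the following minimal sketch computes NDVI from red (band 4) and near-infrared (band 8) reflectance for two dates and subtracts them. It is not the code used in this work: the function names `ndvi` and `ndvi_difference` and the toy reflectance values are hypothetical, and the sign convention (post-outbreak minus pre-outbreak, so that vegetation loss is negative) is an assumption.

```python
import numpy as np

def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red); zero-denominator pixels become NaN."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out

def ndvi_difference(red_pre, nir_pre, red_post, nir_post):
    """Later-date NDVI minus earlier-date NDVI (assumed sign convention):
    negative values indicate a decrease in vegetation."""
    return ndvi(red_post, nir_post) - ndvi(red_pre, nir_pre)

# Toy 2x2 BOA reflectance arrays (hypothetical values, not from the study)
red_2015 = np.array([[0.05, 0.05], [0.05, 0.05]])
nir_2015 = np.array([[0.45, 0.45], [0.45, 0.45]])
red_2018 = np.array([[0.05, 0.20], [0.05, 0.05]])
nir_2018 = np.array([[0.45, 0.30], [0.45, 0.45]])

diff = ndvi_difference(red_2015, nir_2015, red_2018, nir_2018)
print(diff)  # only the upper-right pixel shows a marked NDVI decrease
```

In practice the band arrays would be read from the pre-processed Sentinel-2 products; the reading step is omitted here because it depends on the file format exported from SNAP.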

Occurrence probability models
The occurrence probability maps obtained as a result of the MaxEnt modelling are shown in Figure 2. For four of the analyzed species, the Evandromyia cortelezzii-sallesi complex, Migonemyia migonei, Nyssomyia neivai and Lutzomyia longipalpis, the occurrence probability was distributed in the northern region of Argentina. The first three showed a higher probability of occurrence in the centre and northwest, while L. longipalpis is concentrated in the northeast. In contrast, Nyssomyia whitmani has a low occurrence probability in Argentina, with values of 0.1 and 0.2 in the northeast. However, a higher occurrence probability is detected in the rest of South America, particularly in Brazil, where more records were obtained.

It was also observed that the variables wettest quarter mean temperature (BIO8), warmest quarter precipitation (BIO18), coldest month minimum temperature (BIO6) and mean diurnal range (BIO7) made important contributions to the models. The models of Nyssomyia neivai and the Evandromyia cortelezzii-sallesi complex had the greatest contribution from BIO4, followed by BIO18. As mentioned above, Lutzomyia longipalpis had the highest contribution from elevation and BIO4, in that order. Nyssomyia whitmani had the highest contribution from BIO4, followed by the mean diurnal range (BIO7), and finally, for Migonemyia migonei, BIO4 was followed by the coldest month minimum temperature (BIO6). On the other hand, some variables, such as isothermality (BIO3), coldest quarter mean temperature (BIO11), wettest month precipitation (BIO13) and precipitation seasonality (BIO15), did not contribute to the models for any of the five species.

Model evaluation:
To evaluate the MaxEnt results, the sensitivity vs. 1-specificity analysis (Figure 3) was considered. It allowed determining the overall accuracy of the occurrence probability model for each species through the cross-validated ROC (receiver operating characteristic) curve and the average AUC (area under the curve) values. The AUC is a threshold-independent measure of the probability that a randomly chosen presence location will be ranked higher than a randomly chosen background point (pseudo-absence) (Merow et al., 2013). The highest value was obtained for Nyssomyia neivai, with an AUC of 0.987 and a standard deviation of 0.008, and the species with the lowest AUC was Nyssomyia whitmani, with 0.858 and a standard deviation of 0.015. The models for the five species presented acceptable to high performance according to this metric (West et al., 2016). In turn, Lutzomyia longipalpis presented the highest variability between the models (0.234), as shown in Figure 3.
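The ranking interpretation of the AUC given above can be made concrete with a small sketch. It is only an illustration and not MaxEnt's internal computation: the function name `auc` and the example scores are hypothetical, and ties are counted as one half, following the usual Mann-Whitney formulation.

```python
def auc(presence_scores, background_scores):
    """Probability that a randomly chosen presence score is ranked above
    a randomly chosen background score; ties count as one half."""
    if not presence_scores or not background_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))

# Hypothetical model outputs at presence and background locations
presence = [0.9, 0.8, 0.4]
background = [0.7, 0.3, 0.2, 0.1]
auc_value = auc(presence, background)
print(round(auc_value, 3))  # 11 of the 12 presence/background pairs are ranked correctly
```

Because the comparison is made against background points rather than confirmed absences, a high value of this statistic does not by itself demonstrate that the model discriminates presences from true absences.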

Model validation:
With presence-only models, it is possible to calculate the proportion of observed occurrences that are correctly predicted through the sensitivity statistic (Figure 3), or "fraction of true positives". The complementary statistic, the omission rate (1 - sensitivity), indicates the "fraction of false negatives", that is, observed presences that the model fails to predict. The sum of both measures equals one. The closer the sensitivity (1 - omission rate) is to 1, the better the model identifies true positives and avoids false negatives. The sensitivity values are high in all the models, except for Nyssomyia whitmani, which only reaches a high sensitivity value at a fractional predicted area of 0.5. Also, the prediction/omission graph (not shown here) for this species showed a high cumulative threshold, which indicates greater omission in the predicted area. This can be observed in the Nyssomyia whitmani model for Argentina (Figure 2). In the north of the country, presence points of the species were recorded in the eastern part, but the occurrence probability assigned by the model was very low. In this case, true positives are being missed.
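The relationship between sensitivity and omission rate can be sketched as follows. This is an illustration under stated assumptions, not the procedure implemented in MaxEnt: the function name, the example scores and the threshold of 0.5 are hypothetical, and a presence record is treated as correctly predicted when its model output is at or above the threshold.

```python
def sensitivity_and_omission(presence_scores, threshold):
    """Return (sensitivity, omission rate) for presence records at a threshold.
    Sensitivity is the fraction of presences predicted as suitable;
    the omission rate is the fraction of presences that are missed."""
    if not presence_scores:
        raise ValueError("presence_scores must be non-empty")
    predicted = sum(1 for s in presence_scores if s >= threshold)
    sensitivity = predicted / len(presence_scores)
    return sensitivity, 1.0 - sensitivity

# Hypothetical model outputs at four known presence locations
scores = [0.05, 0.15, 0.60, 0.80]
sens, omis = sensitivity_and_omission(scores, threshold=0.5)
print(sens, omis)  # half of the presences fall below the threshold and are omitted
```

A situation like the one described for Nyssomyia whitmani in northern Argentina corresponds to presence records with low model output, which raise the omission rate at any reasonable threshold.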

Change detection analysis
To detect the most important changes that occurred between the two dates (December 2015 and December 2018), the mean and n times the standard deviation of the NDVI difference were used to define threshold values. Different values of the coefficient n were tested, generating different candidate thresholds to delimit the increase and decrease of the index. The value n = 1.75 gave the best change threshold, allowing the detection of changes between 2015 and 2018.

3.2.1 Temporal analysis: Figures 5 and 6 show the result of the non-parametric LOESS local regression method. When applying LOESS smoothing for the period 2010-2018, the trend for the deforested area (Figure 5) indicates a decrease in NDVI values towards the end of the time series, which is not observed for the non-deforested area (Figure 6). However, this methodology does not seem to be the most appropriate to identify this type of event.
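The thresholding rule used for change detection can be sketched as below. This is a minimal illustration that assumes the thresholds are the mean of the NDVI difference plus and minus n standard deviations, which is one common reading of the rule stated at the start of this section; the function name `classify_change` and the synthetic data are hypothetical.

```python
import numpy as np

def classify_change(ndvi_diff, n=1.75):
    """Label pixels as decrease (-1), no change (0) or increase (+1).
    Assumed thresholds: mean - n*std and mean + n*std of the NDVI difference."""
    diff = np.asarray(ndvi_diff, dtype=float)
    mean = np.nanmean(diff)   # NaN pixels (e.g. masked areas) are ignored
    std = np.nanstd(diff)
    lower = mean - n * std
    upper = mean + n * std
    classes = np.zeros(diff.shape, dtype=np.int8)
    classes[diff < lower] = -1   # marked NDVI decrease
    classes[diff > upper] = 1    # marked NDVI increase
    return classes, lower, upper

# Synthetic 10x10 NDVI difference: stable background with two changed pixels
diff = np.zeros((10, 10))
diff[0, 0] = -0.6   # strong decrease
diff[9, 9] = 0.6    # strong increase
classes, lower, upper = classify_change(diff, n=1.75)
print(lower, upper)
print(int(np.count_nonzero(classes)))  # number of pixels flagged as changed
```

Lower values of n flag more pixels as changed and higher values flag fewer, which is why several coefficients are usually compared before fixing the threshold.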

CONCLUSIONS
Given the importance of leishmaniasis for public health worldwide and, in particular, in our region, we consider the distribution maps of the five most important phlebotomine sandfly species in Argentina to be a relevant and necessary contribution. Characterising the distribution of these species is useful in determining the potential risks of vector-borne diseases. According to the results obtained, L. longipalpis, the main vector of visceral leishmaniasis (VL) (Pires et al., 2017), is mainly distributed in the northeastern region of the country, which coincides with most of the VL case reports since the first detected case (Gould et al., 2013). In contrast, the E. cortelezzii-sallesi complex, M. migonei and N. neivai are distributed in central and northwestern Argentina and are considered to be the main vectors of cutaneous leishmaniasis. Regarding the use of MaxEnt, we recommend implementing other metrics, in addition to AUC, to evaluate the model, since AUC serves as an evaluation measure in terms of sensitivity but not in terms of specificity (Merow et al., 2013), because the specificity evaluation relies on background points rather than on true absences. This type of model tends to overestimate the commission error rate and results either in models with a large distribution probability but a low AUC, which would indicate a bad model, or in models with a high AUC and a high distribution probability, but restricted to a small region ("inflated model") (Yackulic et al., 2013; Lobo et al., 2008). This complication can be corrected by relying on absence data. However, such data are often not available, especially when working in a very large region. On the other hand, this work confirms what has been shown in previous studies: MaxEnt works best with small samples. The N. whitmani model, with the largest amount of data, presented the worst metric values.
Although some papers implement environmental variables other than those used here, such as NDVI, EVI and LST (Rodgers et al., 2019), climatic variables make the greatest contribution to the modelling of species occurrence. However, including elevation and, as observed in a study carried out in Brazil (Meneguzzi et al., 2016), slope may allow better modelling of the occurrence of certain species, especially in areas with high variability in relief.
On the other hand, the local analysis of land cover change detection in a specific region where a disease outbreak occurred allowed the exploration and application of different remote sensing tools, such as Sentinel-2 images for the change vector and MODIS products for time series analysis. The analysis for the town of Caimancito (Jujuy) allowed us to associate a disease outbreak between 2017 and 2018 with a decrease in NDVI, possibly resulting from a deforestation event and agricultural expansion. However, this result needs to be complemented with local information on the movements and activities of people from Caimancito, to determine whether the outbreak was really related to the land change detected. Also, these same techniques can be implemented in other software that provides more tools to optimise the process, with the possibility of using images from different providers and with different resolutions that allow better detection of land cover types. The time series analysis proved not to be a good tool for this kind of detection in a tropical area. Finally, the exploration of the different tools used highlights the importance of validation with field data for generating accurate analyses and predictions. This implies that more data collection is necessary to validate the models and analyses generated, and to guarantee the contribution of these tools to macro-ecological studies of species linked to disease transmission.