BENCHMARKING OF HIGH-RESOLUTION LAND COVER MAPS IN AFRICA

This paper addresses the increased validation demands caused by the growing production of land cover (LC) maps, especially those with large coverage and high resolution. Two high-resolution LC (HRLC) maps, GlobeLand30 for the year 2015 (GL30-2015) and the S2 Prototype LC 20m map of Africa for 2016 (CCI Africa Prototype), were inter-compared to estimate the degree to which they share information, as this can serve as a benchmark of their accuracy. Since the two maps were classified independently, there is a higher probability that areas where they share information are correctly classified. CCI Africa Prototype and GL30-2015 have not yet been validated for the whole of Africa, and therefore the benchmark accuracy can be used to better design the validation and to make it more efficient. Based on the pixel-by-pixel comparison of GL30-2015 and CCI Africa Prototype, the error matrix and accuracy indexes (Overall, User’s and Producer’s accuracy) were derived. Overall accuracy at the continent level is estimated to be around 66%, which is not considered satisfactory. The low value of overall accuracy is mostly due to the low accuracy of the classes Shrubland, Wetland, and Permanent snow and ice, as their User’s and Producer’s accuracies are below 0.4. In contrast, benchmark accuracy is fairly high for Forest (0.68), Water bodies (0.86) and Bareland (0.93). Nevertheless, class benchmark accuracies differ from country to country, as does the Overall accuracy. Benchmark accuracy was not estimated for the Cultivated land, Grassland and Artificial surface classes due to the large difference between their User’s and Producer’s accuracies.


INTRODUCTION
Land cover (LC) information is the key element for many models including those for climate change predictions (Bontemps et al., 2013), biodiversity and ecological processes (Pfeifer et al., 2012), natural resource management (Cui et al., 2011), etc. Capacities of satellite systems that support global LC observations are constantly improving regarding spatial resolution, revisiting time, accessibility, etc. (Belward and Skøien, 2015). Hence, the LC maps, which rely on these capacities, are improved in terms of spatial resolution, periodical updates, large spatial coverage, etc.
Thanks to the availability of quality data, the production of LC maps has increased. Consequently, the validation effort required to confirm the reliability of the information in such maps has grown. Every validation requires reference ("ground truth") data. The reference data can be collected in the field, by photointerpretation (Congalton, 2001), by comparison with existing higher accuracy LC maps (Strahler et al., 2006), etc.
The aim of this paper is to support the quality assessment of two high-resolution LC (HRLC) maps in Africa by determining the similarity between them, which can serve as a benchmark for their accuracy. Although similarity is not reliable information about accuracy, it can be meaningful for delineating areas and classes in which accuracy might be low, so that these areas can be investigated further (Strahler et al., 2006). Two recent HRLC maps are considered for inter-comparison: GlobeLand30 for the year 2015 (GL30-2015) and the S2 Prototype LC 20m map of Africa for 2016 (CCI Africa Prototype). The accuracy of these HRLC maps has not been estimated for the African continent before, although the accuracy of the CCI Africa Prototype has been assessed for several countries (Lesiv et al., 2019). The inter-comparison relies on the assumption that the probability of the two maps being accurate is higher at the locations where they share the same information, as they have been produced independently of each other. In other words, the better they represent reality, the more similar they will be.
The amount of information shared between the two maps (the benchmark accuracy) was estimated by means of accuracy indexes based on the error (confusion) matrix: Overall accuracy (OA), Producer's accuracy (PA) and User's accuracy (UA) (Congalton, 2004). OA at the continent level is estimated to be around 66%, while the same index varies among the countries (e.g. 28% for Djibouti and Botswana and 100% for Ma'tan al-Sarra). The OA results obtained by our approach were compared with the official validation results in the countries in which CCI Africa Prototype validation was performed: Ivory Coast, Gabon, Kenya and South Africa. Except for the Ivory Coast, the results are similar. Class accuracies were not analysed for individual countries, but only for the whole of Africa, together with their variation among the countries. Benchmark accuracies for individual classes were determined only for the classes Forest, Water bodies and Bareland: 0.68, 0.86 and 0.93 respectively. These classes were selected because their UA and PA at the continent level, as well as the summary statistics of UA and PA at the country level, are comparable. This indicates that the two maps share the same information regarding these classes. In contrast, for the other classes the results of UA and PA are disproportionate, which indicates that the information shared between the two maps is not consistent, even if it is unknown which of them is correct. Although inconsistent, the UA and PA statistics of Shrubland, Wetland and Permanent snow and ice show very low agreement between GL30-2015 and CCI Africa Prototype with respect to these classes. Moreover, variation of PA and UA was observed for every class, being most evident for the Bareland class.
Inter-comparison of the two maps (CCI Africa Prototype and GL30-2015) for the year 2015 was done before, but it was limited to Rwanda and focused on the analysis of disagreement patterns (Bratic et al., 2019).
The remainder of the paper is organized as follows. Section 2 contains the description of the HRLC datasets used in this work. In Section 3 the methodology of data processing is described. Results are reported in Section 4. Section 5 is dedicated to the discussion of the results, while conclusions are drawn in Section 6.

DATASETS
Two LC datasets, GL30-2015 and the CCI Africa Prototype, were compared in order to verify the consistency of the information they provide. Although the former dataset has global coverage, the inter-comparison was done for the African continent because the latter dataset is limited to that area. The analyses were also done at the country level to observe possible spatial variation. Since the original datasets are not partitioned by country, an auxiliary vector file of African countries was used to support the partitioning.

CCI Africa Prototype
CCI Africa Prototype is an HRLC prototype map at 20m resolution. It is one of the products of the Climate Change Initiative Land Cover (CCI-LC) project of the European Space Agency (ESA). It is based on Sentinel-2A satellite imagery (ESA, 2015) acquired in the period from December 2015 to December 2016. To derive CCI Africa Prototype, cloud-free reflectance composites were classified separately by two supervised classification algorithms. The outputs of the two classifications were fused into a single LC map. CCI Africa Prototype can be downloaded and/or visualised on the official web site (ESA CCI Team, 2017) free of charge. The data are provided in the WGS84 (EPSG:4326) coordinate reference system. The legend of the CCI Africa Prototype consists of 10 classes (Table 1). The legend was formed after the revision of the classification system guide for LCCS (Land Cover Classification System) and LCML (Land Cover Meta Language), and of the legends of other global and national LC maps (e.g. GLC-share, GlobeLand30, Africover, SERVIR-RMCD, etc.).

Code | Class
1 | Trees cover areas
2 | Shrubs cover areas
3 | Grassland
4 | Cropland
5 | Vegetation aquatic or regularly flooded
6 | Lichen mosses / Sparse vegetation
7 | Bare areas
8 | Built up areas
9 | Snow and/or ice
10 | Open water
Table 1. Legend of the CCI Africa Prototype

... is the operational approach used for classification of Landsat 7 (NASA, 1999) and HJ-1 (Huan Jing-1) (NDRCC/SEPA, 2008). These data are organized in 853 tiles in the UTM/WGS84 coordinate reference system. The accuracy of GL30-2010 is estimated to be around 80% on average (Chen et al., 2017) at regional (Manakos et al., 2014), national (Brovelli et al., 2015; Yang et al., 2017), and subcontinental level (Jacobson et al., 2015); an exception is recorded in Central Asia, where the accuracy is 46% (Sun et al., 2016). The legend of GL30 has 10 classes, as shown in Table 2.
The new version of GL30 for the year 2015 (GL30-2015) has been developed with a combined image source of 30m resolution Landsat 8 (NASA, 2013) and 16m resolution China's GF1 (Gaofen-1) (HDEOS-CNSA, 2013) images. Based on the change vector analysis in posterior probability space methodology, an improved split-and-merge strategy was adopted to update previous versions of GL30. The methodology divides the ten classes into three levels. The first level contains class types that are easy to extract automatically, like Water bodies and Forest. The second level contains class types that are relatively easy to extract automatically when information about regional class characteristics is introduced, such as Cultivated land and Artificial surface. The third level consists of the class types that are easy to confuse and depend more on characteristics other than spectral and textural ones, like Wetland. Following the sequence of the levels, each class type is split, updated and merged with the previously classified types. The procedure is repeated for each type until all of them are updated. The legend of GL30-2015 is unchanged with respect to the previous versions of GL30 (Table 2). As of April 2020, it has not been released for public access. Moreover, the accuracy of GL30-2015 has not been estimated.

Vector of African countries
The vector of African countries is a shapefile whose features represent the boundaries of African countries. 55 features of this dataset were used, as they coincide with our region of interest. The dataset was downloaded from the openAFRICA web site (https://open.africa/dataset).

METHODOLOGY
Derivation of accuracy metrics was based on the pixel-by-pixel comparison of the GL30-2015 and CCI Africa Prototype. For such inter-comparison it was necessary to adapt the data to have the same coordinate reference system (CRS), class values and pixel size.

Data processing
GRASS (Geographic Resources Analysis Support System) software and GDAL (Geospatial Data Abstraction Library) tools were used to perform the data preparation and the computation of accuracy indexes. The volume of the data to be processed was rather large: 3 GB for GL30-2015 and 6 GB for CCI Africa Prototype. For this reason, GRASS was run on the HPC (high-performance computer) system GALILEO of CINECA (http://www.hpc.cineca.it/) and was automated using the Python programming language. GDAL, in contrast, was used through the OSGeo4W Shell (https://trac.osgeo.org/osgeo4w/) on a regular desktop computer (Intel Core i5-7500 processor, 12.0 GB of installed physical memory (RAM)).

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2020, 2020 XXIV ISPRS Congress (2020 edition)

Code | Class | Definition
10 | Cultivated land | Lands used for agriculture, horticulture and gardens, including paddy fields, irrigated and dry farmland, vegetation and fruit gardens, etc.
20 | Forest | Lands covered with trees, with vegetation cover over 30%, including deciduous and coniferous forests, and sparse woodland with cover 10-30%, etc.
30 | Grassland | Lands covered by natural grass with cover over 10%, etc.
40 | Shrubland | Lands covered with shrubs with cover over 30%, including deciduous and evergreen shrubs, and desert steppe with cover over 10%, etc.
50 | Wetland | Lands covered with wetland plants and water bodies, including inland marsh, lake marsh, river floodplain wetland, forest/shrub wetland, peat bogs, mangrove and salt marsh, etc.
60 | Water bodies | Water bodies in the land area, including river, lake, reservoir, fish pond, etc.
70 | Tundra | Lands covered by lichen, moss, hardy perennial herb and shrubs in the polar regions, including shrub tundra, herbaceous tundra, wet tundra and barren tundra, etc.
80 | Artificial surfaces | Lands modified by human activities, including all kinds of habitation, industrial and mining area, transportation facilities, and interior urban green zones and water bodies, etc.
90 | Bareland | Lands with vegetation cover lower than 10%, including desert, sandy fields, Gobi, bare rocks, saline and alkaline lands, etc.
100 | Permanent snow and ice | Lands covered by permanent snow, glacier and ice cap.
Table 2. Description of the GL30-2015 classes
Data preparation started with the reprojection of GL30-2015 from UTM to the WGS84 coordinate reference system, so that it has the same coordinate reference system as CCI Africa Prototype. 138 tiles of size 5° latitude × 6° longitude were reprojected with the gdalwarp utility before being imported into GRASS. The reprojection is the only operation not done in GRASS, because the GDAL tool takes significantly less time to reproject the data.
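The reprojection step can be sketched in Python as a thin wrapper around the gdalwarp command line. This is only an illustration under stated assumptions: the folder layout, the function names and the nearest-neighbour resampling option (chosen here so that class codes are never interpolated) are not taken from the paper.

```python
import subprocess
from pathlib import Path
from typing import List


def build_gdalwarp_command(src: str, dst: str,
                           target_srs: str = "EPSG:4326") -> List[str]:
    """Return the gdalwarp argument list that reprojects one tile."""
    # -t_srs sets the target CRS; -r near (assumption) keeps the
    # categorical class codes unchanged during resampling.
    return ["gdalwarp", "-t_srs", target_srs, "-r", "near", src, dst]


def reproject_tiles(tile_dir: str, out_dir: str) -> None:
    """Reproject every GeoTIFF found in tile_dir (hypothetical layout)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for tile in sorted(Path(tile_dir).glob("*.tif")):
        target = Path(out_dir) / tile.name
        # check=True raises an error if gdalwarp fails on a tile.
        subprocess.run(build_gdalwarp_command(str(tile), str(target)),
                       check=True)
```

Calling reproject_tiles on a folder of UTM tiles would write one WGS84 tile per input file, ready to be imported into GRASS.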
The remaining data processing was done in GRASS iteratively for each of the 55 countries within the region of interest. In every iteration, operations were first limited to the extent of a country (r.region module). The vector representing the country boundary was used to create a mask layer that excludes the areas not belonging to the country (r.mask module). Then the GL30-2015 tiles covering the country were imported (r.import module) and merged into a single raster for the country (r.patch module). CCI Africa Prototype, in contrast, is a single dataset for the whole continent, so it was clipped to the extent of the country. The clipping was done while importing the dataset into GRASS: first the region was bounded by the country boundaries (r.region module), and then only the portion of the dataset within the defined region was imported (extent=region option of r.import). Finally, it was reclassified to match the classes of GL30-2015 (r.reclass).
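The rules for reclassification are shown in Table 3.

CCI code | CCI Africa Prototype class | GL30 code | GL30-2015 class
... | ... | ... | ...
9 | Snow and/or Ice | 100 | Permanent snow and ice
10 | Open Water | 60 | Water bodies
Table 3. Link between the classes of CCI Africa Prototype and GL30-2015.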
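As an illustration of the reclassification step, the following sketch writes rules in the "old = new label" form accepted by r.reclass. Only the three class links that can be read from this paper are listed (Snow and/or ice, Open water, and Lichen mosses / Sparse vegetation, which the discussion states was merged into Bareland); the remaining links would have to be added before real use, and the function name is hypothetical.

```python
# Links recoverable from the text: CCI code -> (GL30 code, GL30 label).
# The other CCI classes are deliberately left out rather than guessed.
KNOWN_LINKS = {
    6: (90, "Bareland"),                 # Lichen mosses / Sparse vegetation
    9: (100, "Permanent snow and ice"),  # Snow and/or ice
    10: (60, "Water bodies"),            # Open water
}


def reclass_rules(links):
    """Build the text of an r.reclass rules file, one rule per line."""
    lines = []
    for cci_code in sorted(links):
        gl30_code, label = links[cci_code]
        lines.append(f"{cci_code} = {gl30_code} {label}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(reclass_rules(KNOWN_LINKS), end="")
```

The resulting text could be saved to a file and passed to r.reclass through its rules option.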
Finally, the two datasets were compared pixel-by-pixel (r.kappa) to derive the error matrix and the accuracy indexes. Note that the resolution of GL30-2015 was adapted to the resolution of CCI Africa Prototype (approx. 20m) "on-the-fly", since the region (r.region) was defined according to the latter dataset.
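The pixel-by-pixel comparison itself was done with r.kappa; the sketch below only illustrates the underlying counting on two small, already aligned pixel sequences. It assumes that both maps use the GL30 class codes after reclassification and that pixels with any other value (e.g. masked areas) are skipped; the function name is hypothetical.

```python
GL30_CODES = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]


def error_matrix(cci_pixels, gl30_pixels, codes=GL30_CODES):
    """Count class co-occurrences of two aligned pixel sequences.

    Rows follow the CCI Africa Prototype class, columns the GL30-2015
    class, matching the layout described in this paper.
    """
    index = {code: i for i, code in enumerate(codes)}
    size = len(codes)
    matrix = [[0] * size for _ in range(size)]
    for cci, gl30 in zip(cci_pixels, gl30_pixels):
        # Skip pixels without a valid class in either map.
        if cci in index and gl30 in index:
            matrix[index[cci]][index[gl30]] += 1
    return matrix


if __name__ == "__main__":
    # Four hypothetical pixels: three agree, one is Forest vs Grassland.
    m = error_matrix([20, 20, 60, 90], [20, 30, 60, 90])
    print(m[1][1], m[1][2], m[5][5], m[8][8])
```

Diagonal cells count pixels on which the two maps agree, while off-diagonal cells count disagreements.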

Accuracy indexes
An error matrix was derived for each country to compute OA, PA and UA per country. In addition, the error matrices were summed up to obtain the error matrix and the accuracy indexes for the whole of Africa.
The error matrix in this work was set up so that columns represent counts of GL30-2015 pixels and rows represent counts of CCI Africa Prototype pixels. Therefore, compared to the traditional error matrix setup (Congalton, 2004), GL30-2015 took the place of the reference data, but not its role. Consequently, the meaning of the accuracy indexes in this work differs slightly from the original: they have the role of comparative metrics or a benchmark, i.e. they represent the extent to which the information in the two datasets of interest is in agreement.
OA is an overall metric which in this particular case shows the extent to which the classifications of GL30-2015 and CCI Africa Prototype are coherent. It is the number of pixels that have the same class at the same location in both maps divided by the total number of pixels. PA for a class is the number of pixels classified in that class by both maps with respect to the number of pixels classified in that class by GL30-2015. Similarly, UA is the proportion of pixels classified by both maps in a certain class with respect to the total number of pixels of that class in CCI Africa Prototype.
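The definitions above can be written out as a short sketch. It assumes an error matrix stored as a list of rows, with rows for CCI Africa Prototype and columns for GL30-2015; returning None for a class that has no pixels in the corresponding map is a choice made here for illustration, not something stated in the paper.

```python
def sum_matrices(matrices):
    """Add country error matrices cell by cell (continental matrix)."""
    size = len(matrices[0])
    total = [[0] * size for _ in range(size)]
    for matrix in matrices:
        for i in range(size):
            for j in range(size):
                total[i][j] += matrix[i][j]
    return total


def accuracy_indexes(matrix):
    """Return (OA, PA per class, UA per class) for one error matrix."""
    size = len(matrix)
    agree = [matrix[i][i] for i in range(size)]
    # Row totals: pixels per class in CCI Africa Prototype.
    cci_totals = [sum(matrix[i]) for i in range(size)]
    # Column totals: pixels per class in GL30-2015.
    gl30_totals = [sum(matrix[i][j] for i in range(size))
                   for j in range(size)]
    n_pixels = sum(cci_totals)
    oa = sum(agree) / n_pixels if n_pixels else None
    pa = [a / t if t else None for a, t in zip(agree, gl30_totals)]
    ua = [a / t if t else None for a, t in zip(agree, cci_totals)]
    return oa, pa, ua
```

For a two-class matrix [[8, 2], [1, 9]], OA is 17/20, UA is 8/10 and 9/10, and PA is 8/9 and 9/11.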

RESULTS
The main outcomes of this work are accuracy metrics for each African country and for Africa as a whole. Table 4 is summarized in Table 5 using statistics such as minimum, maximum, median, mean, weighted mean and standard deviation. Table 5 also contains the value of OA for the whole African continent, since it is the mean of the OA scores of the individual countries weighted by the proportion of their areas in the total area of the continent. In addition, the standard deviation weighted by country area was computed and included in the table of summary statistics.
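A minimal sketch of the area-weighted statistics follows. The paper does not give the formula used for the weighted standard deviation, so the population form (weighted mean of squared deviations from the weighted mean) is an assumption, and the numbers in the example are hypothetical rather than values from Table 4.

```python
import math


def weighted_mean(values, weights):
    """Mean of values weighted by, for example, country area."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total


def weighted_std(values, weights):
    """Weighted standard deviation (population form, an assumption)."""
    mean = weighted_mean(values, weights)
    total = sum(weights)
    variance = sum(w * (v - mean) ** 2
                   for v, w in zip(values, weights)) / total
    return math.sqrt(variance)


if __name__ == "__main__":
    # Hypothetical OA scores and relative areas of two countries.
    oa_scores = [0.5, 1.0]
    areas = [1.0, 3.0]
    print(weighted_mean(oa_scores, areas), weighted_std(oa_scores, areas))
```

With these inputs the larger country dominates, so the weighted mean (0.875) lies above the plain mean (0.75).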
Furthermore, the OA values of the different countries are shown on a map (Figure 1) to depict the spatial distribution of OA. Since there are many countries, and UA and PA are computed for each class, the results obtained are numerous (Appendix 1: Table 8 and Appendix 2: Table 9). Therefore, the indexes are not analyzed individually, but through summary statistics: minimum, maximum, mean, standard deviation, weighted standard deviation, and median. UA and PA for the whole of Africa are also included in the summary statistics, as they are simply the weighted means of these indexes. Weights for the weighted standard deviation and weighted mean are based on class area. If a weighted statistic is related to PA, it is weighted by the class area in a country with respect to the total class area in Africa according to the GL30 classification. Weights of the UA statistics are determined in the same way, but rely on the CCI Africa Prototype classification.
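The statement that PA for the whole of Africa is a weighted mean of the country PAs can be checked with a small sketch. It uses pixel counts as a stand-in for class area, which is an assumption (pixel area in a geographic coordinate reference system varies with latitude), and the two country matrices are hypothetical.

```python
def column_total(matrix, j):
    """Pixels of class j in GL30-2015 (column total)."""
    return sum(row[j] for row in matrix)


def pa_weighted_mean(country_matrices, j):
    """PA of class j as a class-area-weighted mean of country PAs."""
    totals = [column_total(m, j) for m in country_matrices]
    class_area = sum(totals)
    result = 0.0
    for matrix, total in zip(country_matrices, totals):
        if total == 0:
            continue  # class absent from this country in GL30-2015
        weight = total / class_area
        result += weight * (matrix[j][j] / total)
    return result


def pa_from_summed_matrix(country_matrices, j):
    """PA of class j computed from the summed (continental) counts."""
    agree = sum(m[j][j] for m in country_matrices)
    total = sum(column_total(m, j) for m in country_matrices)
    return agree / total


if __name__ == "__main__":
    countries = [[[8, 2], [2, 8]], [[1, 0], [3, 6]]]
    print(pa_weighted_mean(countries, 0), pa_from_summed_matrix(countries, 0))
```

Both functions give 9/14 for the first class, because the weights cancel the country totals. The same reasoning applies to UA with row totals of CCI Africa Prototype.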
Summary statistics of PA are shown in Table 6. PA Africa (weighted mean), mean, median, standard deviation and weighted standard deviation are shown in Figure 2, as they can be useful for drawing conclusions about the variation of the results.
Table 6. Summary statistics for PA (class labels corresponding to the class codes in the header can be seen in Table 2)
As for PA, summary statistics were also computed for UA (Table 7). Again, UA Africa (UA weighted mean), mean, median, standard deviation and weighted standard deviation are plotted (Figure 3).
Based on the previously shown results, the classes with similar UA and PA results were selected for the analysis of spatial distribution. Therefore, values of PA are displayed for Bareland in Figure 4 and for Forest in Figure 5, while values of UA are shown for the Water bodies class in Figure 6.

DISCUSSION
The results discussed below refer to the benchmark accuracy of both datasets, CCI Africa Prototype and GL30-2015. The discussion is split into two sections: Overall accuracy, and Producer's and User's accuracy.

Overall accuracy
OA for the whole of Africa is 0.66. It means that 66% of the area of the continent is classified in the same way in the CCI Africa Prototype and GL30-2015. If this were considered a regular OA, it would not be satisfactory, as it is below 0.85 (Thomlinson et al., 1999).
OA is not equal for all African countries. This is evident both from the records of OA for individual countries (Table 4) and from the standard deviation of these records (0.19) (Table 5). The mean (0.62) and median (0.60) of OA in different countries are relatively close to the OA for the whole continent (0.66). In combination with the standard deviation, the mean and median indicate that values of OA vary among countries, but that OA values larger and smaller than the mean are balanced in terms of occurrence and intensity. This can also be seen from the map of OA values in African countries (Figure 1). There are several countries in the north of Africa with very high values (>0.8), but also several countries in the south with very low values (<0.4).
Lesiv et al. (2019) suggest that high accuracy in Gabon is related to the prevalence of the Forest class in the country's landscape. This could also explain the high OA values obtained through the inter-comparison in Liberia and Equatorial Guinea. Applying the same reasoning, the high OA in the northern parts of Africa is most probably due to the prevalence of the Bareland class. Moreover, Bareland in this area consists mostly of sand, which has a specific spectral signature in satellite imagery that simplifies its discrimination from other classes during classification.

Producer's and User's accuracy
The interpretation of the results for UA and PA is not as straightforward as for OA. UA depends on the classification of the CCI Africa Prototype map, while PA depends on the classification of the GL30-2015 map. As the validation of these maps has not been done for Africa, their reliability is unknown, and consequently the indexes depending on one or the other map are not reliable. For this reason, we considered that when the two indexes have similar values for the same class, this can be an indicator that the two compared maps show consistent information regarding that class. Thus, for classes with similar UA and PA it is possible to draw a conclusion about benchmark accuracy. Since we analysed only summary statistics of UA and PA, and not the records of individual countries, only the benchmark accuracy at the continental level and the variation of values among the countries were observed.
Comparison of PA (Table 6 and Figure 2) and UA (Table 7 and Figure 3) in terms of the similarity of their summary statistics showed that they are similar for the Forest, Water bodies and Bareland classes. The benchmark accuracy of these classes is the average of their UA and PA for Africa (UA and PA weighted mean), since these two indexes are not exactly equal. More specifically, the benchmark accuracy is 0.68 for Forest, 0.86 for Water bodies, and 0.93 for Bareland.
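The selection rule described above can be sketched as follows. The paper does not state a numeric threshold for "similar" UA and PA, so the max_gap parameter and its default value are hypothetical, as are the input values in the example.

```python
def benchmark_accuracy(ua, pa, max_gap=0.1):
    """Average UA and PA when they are similar, otherwise return None.

    max_gap is a hypothetical tolerance; the paper judges similarity
    from summary statistics rather than from a fixed threshold.
    """
    if abs(ua - pa) > max_gap:
        return None  # UA and PA disagree: no benchmark accuracy
    return (ua + pa) / 2


if __name__ == "__main__":
    # Hypothetical continental UA and PA values for two classes.
    print(benchmark_accuracy(0.80, 0.84))
    print(benchmark_accuracy(0.90, 0.30))
```

A class with UA 0.80 and PA 0.84 gets a benchmark accuracy of about 0.82, while a class with UA 0.90 and PA 0.30 gets none.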
The standard deviation showed that class benchmark accuracies can vary largely from country to country, but significantly less variation is observed for the Forest and Water bodies classes than for the Bareland class. For all three classes the weighted standard deviation is smaller than the regular standard deviation, which indicates that the class accuracy is affected by the abundance of the class in the countries. The effect of class abundance is also evident from the comparison of the mean and the weighted mean of each class. The weighted mean is larger than the mean, which indicates that higher accuracies are concentrated in the countries where the area of a class is larger.
In particular, the effect of class abundance is significant for Bareland. This is confirmed by Figure 4, where the PA of Bareland is shown for every African country. It is evident that values of PA are either above 0.8 or below 0.2. Just a few countries have a PA for Bareland between 0.2 and 0.8. High values are mostly related to the countries in the region of the Sahara desert, where Bareland is the dominant LC class. The composition of the Bareland class is probably another factor determining its PA and UA values. In the north it is mostly pure sand, while in the southern parts it can also include bare soil with a small amount of vegetation that is harder to distinguish from other classes. In addition, while matching the classes of CCI Africa Prototype and GL30-2015, we merged the class Lichen mosses / Sparse vegetation into the Bareland class, which may contribute to the lower values of its PA and UA. According to the summary statistics, the variation of accuracy for Forest and Water bodies is not as extreme as the one for Bareland. This is exemplified in Figure 5 for Forest through PA values per country, while for Water bodies UA values per country are shown in Figure 6. The choice of showing the UA or the PA index was arbitrary, since they are relatively similar for each of the three classes in the context of spatial variation. Values of PA for Forest have a wide range, but they are evenly distributed among countries. The variation of UA for Water bodies is evident, but slightly smaller than for Forest PA, and the accuracy score is on average higher than for Forest.
Statistics for the Shrubland and Wetland classes are not as consistent as for the previously mentioned classes, thus an exact number for benchmark accuracy cannot be given. In any case, the weighted mean, mean and median values of these classes are below 0.4, which is far below the minimum threshold of 0.7 suggested by Thomlinson et al. (1999). Thus, the agreement between CCI Africa Prototype and GL30-2015 regarding these classes is not satisfactory.
The class Permanent snow and ice is in an unusual situation, because its summary statistics of UA and PA are equal, as in the case of Forest, Water bodies and Bareland, but they are equal to zero. That is because this class is absent from the CCI Africa Prototype, while it exists in GL30-2015. It means that it is either wrongly excluded from CCI Africa Prototype, or wrongly included in GL30-2015. As there are some high mountain peaks in Africa where permanent snow and ice are expected to be found, there is a high probability that this is an omission error in CCI Africa Prototype.

CONCLUSIONS
The production of LC maps is growing, which also increases the need for validation exercises. The main objective of our work is to compare two HRLC maps in Africa in order to benchmark areas and/or classes with a high probability of being correct or incorrect. The outcomes could be useful to better structure the validation and thus reduce validation efforts, by focusing more on the areas and/or classes with a high probability of being erroneous, and less on the areas where accuracy is potentially higher. The datasets considered in this work are CCI Africa Prototype and GL30-2015. The former map has continental coverage, while the latter has global coverage. The inter-comparison was bounded by the area of the smaller dataset. Nevertheless, the area concerned is the African continent, which makes the area of interest rather large. This amplifies the importance of properly structuring the validation to obtain the best results at the minimum possible cost.
The two maps, CCI Africa Prototype and GL30-2015, were inter-compared pixel by pixel, and the error matrix and accuracy indexes were derived from this comparison. The computed indexes are OA, PA and UA, and they were computed at the continent level, as well as for the countries of the continent. The main tools for the inter-comparison were Free and Open Source Software (FOSS): GDAL, GRASS and Python. The data were handled on the HPC system GALILEO, as their volume was large.
The obtained OA values represent the degree of agreement between the information contained in CCI Africa Prototype and in GL30-2015, and they range from 0.28 in Djibouti and Botswana to 1.00 in Ma'tan al-Sarra. The OA estimated for the whole of Africa was 0.66, and it is considered the benchmark accuracy of both CCI Africa Prototype and GL30-2015. These results were compared with the results of the validation of CCI in several African countries by IIASA, and the results were similar in three out of four countries, which suggests that this type of inter-comparison might be useful as a preliminary accuracy assessment for both maps. UA and PA were not analysed for individual countries, but their values at the continent level were analyzed in addition to the variation of scores among countries. Also, benchmark accuracy was delineated only for the classes for which UA and PA were similar. These classes are Forest, Water bodies and Bareland, and all of them had satisfactory benchmark accuracy (0.68, 0.86, and 0.93 respectively). In contrast, it has been observed that the accuracy of Shrubland, Wetland and Permanent snow and ice is low, although a value of benchmark accuracy could not be derived. For Cultivated land, Grassland and Artificial surface no conclusions about accuracy were made, due to the difference between the UA and PA scores.
The results might be useful feedback for the producers of CCI Africa Prototype and GL30-2015, as they highlight the classes and countries in which there is a high probability of errors. Similarly, users of these HRLC maps can understand in which countries or for which classes these maps can be reliably used as they are now.
Table 9. UA in African countries (class labels corresponding to the class codes in the header can be seen in Table 2)