DECISIONAL TREE MODELS FOR LAND COVER MAPPING AND CHANGE DETECTION BASED ON PHENOLOGICAL BEHAVIORS. APPLICATION CASE: LOCALIZATION OF NON-FULLY-EXPLOITED AGRICULTURAL SURFACES IN THE EASTERN PART OF THE HAOUZ PLAIN IN THE SEMI-ARID CENTRAL MOROCCO

Great effort has been recently employed for the development of a modern and competitive agriculture in Morocco, growth in the agricultural sector is determined largely through the realization of thousands of new projects, and the support of the smallholder farmers at a national scale. Modernization of irrigation systems, and enlargement of the extent and spatial distribution of irrigated areas holds the key to increase annual productions. In this context, we established a unique procedure for monitoring the agricultural surfaces not fully exploited in terms of potential and production, in the semi-arid zone of the Haouz plain, central Morocco. We derived Normalized Difference Vegetation Index (NDVI) time series from Sentinel-2 (S2) and Landsat 8 (L8) high spatial resolution satellite images from 2016 to 2018. Seasonal phenological changes and land-cover dynamics, in addition to elevation models and landscape slopes, helped determine periods and thresholds suitable for classes separability, and establish a set of rules to be implemented in a Decision tree classifier model for a detailed land-cover mapping of the last three years. The agricultural zone was successfully separated from mountains and hills, and the derived maps of the three years yielded satisfying result with an OAthat reached above 91% for quite detailed landscape-type information. The outputs of this work hold promise to provide valuable information for planners, decision-makers and regional offices, to help smallholder farmers. Although this approach has been developed at regional-scale, it holds the potential to be adapted to larger scales, with the appropriate selection of land-cover types, and carful adjustment in the threshold values.


INTRODUCTION
The agricultural sector in Morocco is one of the main centers of the economy. It is characterized by a great annual and spatial variability, as it follows strong seasonal lifecycles, as well as unstable climatic conditions and diverse agricultural management practices. Irrigated agriculture occupies only about 15% of the overall cultivated area, yet the weight of its contribution is very important, especially during the drought years when the production of Bour (rainfed) zones is severely affected. As part of the implementation of the Green Morocco Plan, one of the main objectives of the state is to improve the living conditions of the smallholder farmers and to increase agricultural incomes in the most vulnerable areas, financial aid is granted to encourage agricultural investments. In this context, demands for determining reliable, cost efficient methodologies for assessing and evaluating vegetation changes are increasing. Remote sensing has become a significant contribution mean for providing a timely and accurate description of the agricultural sector, as it is presenting the advantage of gathering information over immense territories with high temporal repetitivity (Atzberger 2013), and reducing the cost of the field investigations. Many applications of remote sensing for vegetation monitoring can be found in literature: (Matton et al. 2015, Sakamoto et al. 2006, Benhadj et al. 2012, Wang et al. 2016, Er-raki et al. 2016, Moumni et al. 2019, Moumni et al 2020. From various methodologies and approaches, the Normalized Difference Vegetation Index (NDVI) (Tucker 1979) represents the most common and popular technique used to indicate and analyze the greenness of the earth's surface (Omdi, Daoudi and Adiri, 2017), and to fully exploit its usefulness, further consideration and analysis of time-series are required, especially when the study area is highly heterogeneous (Ghorbani et al. 2012). By the use of the multi-temporal aspect, remote sensing means can also be employed to assess change on earth's surface, particularly land-use/land-cover changes, and shifting agricultural patterns. A variety of reviews regarding techniques assessing change and their applications can be found in Lu et al. (2003), Coppin et al. (2004), Halmy et al. (2004), and Alqurashi and Kumar (2013).The classification of the remotely sensed products is known to be effectuated by the mean of a selection of algorithms. The ultimate goal of researchers in this field, is to find an automatic and robust procedure that does all the work.
The current study is an example of improvisation, for the fact that we were faced with several constraints and found good alternative solutions. In the study area, in central Morocco, there is mainly tree different areas, mountainous, Jebilat hills and between the two, the agricultural zone where the crops are mainly located. For an efficient land-cover mapping and due the topographical complexity, we were brought to adapt and provide for the classifier -in addition to high Spectro-temporal information-Digital Elevation Models and slopes data, as for example it was the only way to avoid confusion between forest trees in the mountains and crop trees in the agricultural zone, despite the classification algorithm used. The main objectives of this study are to exploit the means provided by satellite imagery for the creation of detailed and accurate maps of the complex study area, and then, to locate and map 'Non-Fully-Exploited Agricultural Surfaces (NFEAS)' in order to present to the public authorities, and more specifically the regional offices, useful documents for schemes of development of the most vulnerable zones, to improve access for smallholder farmers to state aid, and help augment the annual yields and incomes.

Study area
The study area is located 40 km south-east of Marrakech city in the eastern part of the plain of Haouz (about 2800 km²) in the center of Morocco. It extends between the mountain ranges of the High Atlas in the south, and the Jebilat hills in the northwest, with an altitude varying from 466m to 700m in the central area (slopes not exceeding 6º), and up to 1588m in the mountains. The semi-arid climate of the region is characterized mostly by absence of rainfall during the summer except for some storms, and irregular falls during the rest of the year with an annual average of 240mm and highly inter-and intra-annual variability (Tensift Water Basin Agency data). Statistics and field campaigns carried out by the regional public agency responsible for agricultural development in the Haouz plain (ORMVAH), show that the main production sectors are cereals, mainly wheat and barley, and tree plantations: olives (about 85 %), citrus and apricot.

Satellite imagery
All the available S2 atmospherically and geometrically corrected, cloud free and 10 m multispectral images, since 2016, were downloaded from the Theia land data center. L8 OLI (Operational Land Imager) 30 m multispectral, on demand level-2 cloud free pre-processed data (bands 1-7), L8 products represent about 18% of all the used here in satellite data, they were added to cover some particular dates where S2 have cloudy scenes, and in order to complete a time series that is well ranged over the seasons, with an average of two scenes per month when possible. The images were taken between January 2016 and December 2018 and were projected in WGS 84 UTM, zone 29N. From 60 (49 Sentinel-2 and 11 Landsat-8 images) cloud free high spatial resolution satellite images (Table 1), we derived NDVI scenes, before laying over ground truth information, and starting the extraction of the NDVI profiles. Table 2. Size and source of the used ground truth references, for the different classes training and testing over the last three years.

Ground truth data
Ground data collection was done by field campaigns carried out during 2018, and by the use of Google Earth's 10m to around 0.5m very high spatial resolution imagery for 2016 and 2017 ( Table 2). The latter source was also used to extract samples from inaccessible/hard to reach areas (e.g. mountain and hills as field investigation were mainly conducted in the agricultural zone). We focused on 8 main land-cover types in the agricultural zone: irrigated cereals (IC), olive trees (Ol), agricultural zone). We focused on 8 main land-cover types in the agricultural zone: irrigated cereals (IC), olive trees (Ol), citrus trees (Ci), prunus trees (Pr) (mostly apricot, and peach, almond, cherry, plum…), autumn crops (AC) (potato, bean, pea…), water bodies (WB), fallow or Bour (Fa/Bo) (cereals depending on rainfall) and bare soil (BS) (the latter two classes were used to identify the NFEAS). The mountainous area is known for vegetation diversity patterns divided into forested vegetation and non-forested vegetation For the purpose of clarity, a relatively simple separation was set for the mountain's land-cover types, and three classes were chosen: dense forests (DF), sparse forests (SF) (forests with lower canopy cover of trees), and mountain vegetation (MV). We also added a particular class that we called 'Jebilat' (Je), to take into account hills crowns with high slopes (>20%), as those lands are not feasible for agriculture.

Methodology
The workflow of the methodological approach used in the present work and the main steps are illustrated in figure 1. Figure 1. The workflow of the approach employed for landcover mapping and NFEAS localization.

Data processing and NDVI profiles extraction:
Analyzing land-cover dynamics over a 3-years period is of great importance for sufficiently understanding the general spectral behaviors of the land-cover types. We started processing data by the creation of NDVI time-series and the generation of landcover samples from ground observations. Yearly statistical measures of vegetative classes, and non-vegetative land-covers, namely the maximum, minimum, mean and standard deviation values were extracted from the training samples overlaid on the fused NDVI data, and the resulting averages were used to plot continuous phenological or dynamical patterns from 2016 through 2018 ( Figure 2). The chosen land-cover classes compared to each other, apart from crop trees (olive and citrus) and forest trees, follow different NDVI temporal patterns, yet, each land-cover type still particularly follows a general trend throughout the years, although values of the same periods from different years are not always identical and the reasons can differ from water availability, to cultural calendar, to acquisition dates... Olive and citrus trees have evergreen vegetation throughout the years, with decreasing trend of NDVI values in July and August (Figure 2 graphs a and c) being very dry months, with precipitation not exceeding 2 mm. Nevertheless, citrus trees biomass reflects higher in infra-red than olives, hence have higher NDVI values. Figure 2, graphs d and f show almost similar general trend between irrigated cereals and fallow or Bour. The sowing of the cereal crops generally starts early in November, reach their maximum in Mars through April, and are harvested at the beginning of June. Fallow compared to cereals, emerges and become quite detectable at the same period of Mars to April, but have most of the time lower NDVI maximum values than healthy cereals getting enough water amounts. Furthermore, these natural vegetation types are most of the time hard to separate from rainfed cereals, as their spectral signatures overlap during all the growing season. Although this may m like a land-cover mapping problem, fortunately it has on the contrary eased the process of achieving the final purpose (NFEAS detection), as we were targeting agricultural lands lacking water and means. The latter facts were the reason why those two classes (Fallow and Bour) were originally combined together as one. Autumn crops, as shown in Figure 2 graph e, reflect a unique agricultural calendar making them relatively easy to spot, they are normally sowed in autumn and reach a maximum NDVI value in January. The phenological patterns of the 'Prunus' class ( Figure 2 graph b), show low NDVI values during most of the autumn and winter and high values in spring through summer, for the fact that, every year, these deciduous tree crops follow the cycle of shredding leaves every autumn and growing them in late winter to early spring.  For overlaying and reading problems mountainous classes profiles were plotted apart, also 'Bare soil' and 'Water bodies' classes were not presented in the graphs, for being relatively easy to spot compared to the rest throughout the year, as they respectively present near null and negative NDVI values. When looking at the annual evolution of the curves, we notice the presence of three critical periods for the Spectro-temporal separation of the classes, and the set of the optimal thresholding values: Figure 3. 2018 NDVI profiles extracted from the agricultural zone on the left, and mountainous area on the right. • Late Mars to early April: this period benefits from great amounts of precipitations in terms of annual distribution. Thus, natural vegetation and cereal crops reach full development. As the two classes are different in terms of density and reflectance, the fallow or Bour attain lower NDVI averages (from 0.40 to 0.63) than irrigated cereals (from 0.79 to 0.82), and a threshold value based on that difference makes this period suitable for their discrimination.
The same temporal analysis stays valid for the separation of mountainous area's classes as they spectrally and temporally behave like citrus, olive and cereals. The resemblance could of have created the problem of those groups' confusion, except that forest and mountain vegetation are located at high altitudes, and as mentioned in the long-term analysis, the addition of the DEM information was helpful to resolve this issue. On the other hand, we couldn't apply the elevation criteria to exclude 'lands not suitable for agriculture' north-west in the Jebilat hills, for being not too high, but the derived slope raster, was sufficient to detect and locate the crowns of those hills using the description 'high slopes (>20%) and low altitudes'.
Reduced-data time series are designed to reduce the dimensionality of a dataset while maintaining a multi-raster temporal sequence (Bunker, Tullis, et al., 2016). For this case, each full yearly NDVI time-series was reduced to the cited three critical dates. This reduction was sufficient in terms of information needed for the phenological separation, hence for the classification process.

Decision tree modelling and classification:
The previously described separation strategy is quite particular, we therefore needed an algorithm capable of understanding the processing structure and chronology, and that was the reason behind the use of the Decision Tree classifier. This machine learning technique, is an approach to multistage decision making (tree-like structure), as basically it turns a complex decision into a union of several simpler ones, to obtain a final solution that resembles the desired one (Safavian and Landgrebe 1991). We constructed a software-based Decision Tree model using ENVI (Figure 4). For feeding the nodes, small changes from one year to another should be taken into account, and the NDVI threshold values should be slightly adjusted. Those values are usually empirically identified on a trial and error basis, and normally we start by correctly classifying as much of the training sample as possible, then we generalize beyond to validation samples so that they could be classified with as high of an accuracy as possible (Safavian and Landgrebe 1991). But for a matter of consistency and robustness, and instead of determining the threshold values iteratively, we built a set of rules based on the NDVI separation analysis and statistical distribution assumptions of these data, which made it easier for adapting the tree to the previous years (2016 and 2017) or even coming years(future studies).

Change detection and NFEAS localization:
For the purpose of our study, we focused on the spatial distribution of the 'Bare soil' and 'Fallow/Bour' classes over the last three years, as they were the key to successfully detect our targeted class. We couldn't assign a pixel to NFEAS for just being bare soil, fallow or Bour for only one year, as the intra-annual state of a land-cover type is not sufficient for change detection and its dynamic is susceptible of changing the year after that. Meanwhile, inter-annual comparison is better suited, as it has been shown to be efficient to many socioeconomic and natural processes (Campbell 2011). In a similar process from the definition of FAO and Pointereau et al. (2008) which used a minimum of four fallow years in five consecutive years to label a field as abandoned, and in addition to our knowledge of the area, we located pieces of land that has been mapped as 'Bare soil', 'Fallow/Bour' or a combination of both for three consecutive years, as for sure they contain the NFEAS. Yet, each of those detected pieces of land, could present different possible scenarios, hence, further examination could reflect the level of absence of management or water in this area, and highlight the degree of change. A set of the different possible change combination levels is constructed as table 4 Summarizes.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-4/W3-2020, 2020 5th International Conference on Smart City Applications, 7-8 October 2020, Virtual Safranbolu, Turkey (online) Figure 5. The hierarchical structure of the change detection Decision Tree model.
To our knowledge, Decision Trees have not been used for comparison of the post-classification results to detect change. Another Decision Tree model was built for the purpose. As it is shown in Figure 5, we exploited the post-classification resulting maps of 2016, 2017 and 2018 as input variables, in addition to the DEM raster to mask the mountainous area in the root node. We used the descriptions and combinations listed in the table above to determine splitting rules in the internal nodes, and for each year, classes other than 'Bare soil' or 'Fallow/Bour' were eliminated. Table 4. Levels representing the type and degree of change as the different possible combinations over the last three years.

Land-cover mapping
The implementation of the phenology-DEM-slope-based Decision Tree land-cover mapping model resulted in 3 maps showing the spatial distribution of the different classes in the study area. Table 5 shows the percent correct of all the classes, the OA and K statistics of the land-cover classification over the three years (2016)(2017)(2018). The model performed greatly, and all the overall classification accuracies exceeded 91% (Kappa 0.90). The greatest value was recorded for 2018, with OA of 95% (Kappa 0.94). When using 3-years average of the threshold values, the results stayed quite impressive, which was expected as all the standard deviations of the yearly threshold values do not exceed 4%. The accuracies have slightly decreased and lowest OA of about 88 % (Kappa 0.87) was recorded for 2017 where obviously the threshold values are the highest in terms of mean deviation. When assessed using calibration samples, the model gives slightly better results than when validated with validation samples. As it was expected, the classes that were clearly identified for all the years are 'Bare soil' and 'Water Bodies' in addition to 'Jebilat', One noteworthy mention to 'Prunus' class for being poorly classified, which can be explained -based on how our model is constructed-by the facts that either there was actually olive and citrus before they were replaced by a prunus specie, or the latter had younger trees with some annual crops on the understory. In fact, the inclusion of vegetation understories beneath some tree plantations, along with trees density and age, are the main reasons of confusion between annual vegetation and orchards, as it has been discussed in Benhadj et al. (2007). Detailed confusion matrices, indicate also the presences of three other confused groups: 1) between citrus trees and olive trees, 2) between dense forests and sparse forest, and 3) between Irrigated cereals and fallow/Bour. The belonging of those groups' classes is sometimes misjudged by the algorithm, not mistakenly, but by the overlapping of signatures between them, as each pair of those groups have the closest properties in terms of spectral reflectance. Like Otukei and Blaschke (2010), for a qualitative assessment of our model, we first tried to compare it to two of the most Support Vector Machine-based techniques. However, famous classification algorithms: Maximum Likelihood and Support Vector Machine-based techniques. However, unfortunately the latter two classifiers couldn't identify mountains and hills, even after multiple trials, thus the three algorithms performances couldn't be compared for all the classes at the same time. Nevertheless, we separated the mountainous area from the agricultural zone, then classified the two zones using both Maximum Likelihood (ML) and Support Vector Machine (SVM), and even when using full yearly NDVI time-series (and not only the previously described separation periods), the OA have not exceeded 87% for all the simulations. We furthermore tested the Random Forest (RF) classifier, which is an increasingly gaining-popularity ensemble learning technique applied in land-cover classification suggested by Breiman as a new and promising classifier in (2001). The results showed that the differences in stability after 100 trees are very small and the computation time increases for larger ranges of possible values of the number of trees. Numerous RF models were created with different number of trees (100, 150, 160, 180, 200 and 1000 Table 5. Percent correct of all the classes using calibration (Cal) and validation (Val) samples as ground reference data for evaluating classifications.
The RF showed better results compared to the ML and the SVM classifiers, as it was able to accurately detect the mountainous area. Yet, the OA showed slight performance inferiority when compared to our made Decision tree model, ranging between 88% and 91%. As the RF shows normally better results than the Decision Tree, a further investigation and analysis of the Random forest classification results were carried out and showed aberrations regarding Prunus and Jebilat classes. The Prunus class was mostly confused with Olive class, while sparse Jebilat pixels were detected all over the agricultural zone.

NFEAS localization
Over the three years, 'Bare soil' and 'Fallow or Bour' classes have been satisfactorily classified, from 2653 calibration and validation of 'Fallow or Bour' pixels, 2393 were correctly identified (percent correct of 90.20%), and from an average of 1720 'Bare soil' pixels, 1759 were well detected (percent correct of 97.78%). Those statistics were a good indicator, before even feeding the resulting maps to the change detection Decision Tree model, as in the post-classification approach, we rely on the quality of the classified images from each date obtain a good final accuracy. The model has yielded the final map of our work, showing the spatial distribution of the NFEAS along with levels 1 and 2 classes. Confusions were observed in the irrigated sectors, regarding unpaved roads that should normally belong to Level 1 group, yet were detected as NFEAS, which can be due to the fact that, with the presence of water (rainfalls, leaks after flood irrigation…), vegetation has grown on them. The confusion matrix presented in Table 6 shows details of the percent correct/incorrect of the different groups. The resulting map shows that about more than 70% of the NFEAS are located in the agricultural zone. Their spatial distribution indicates high concentration in three parts: south of the Jebilat area, north-east of the agricultural zone and central-south around a small city called Sidi Rahhal, which are known to be rainfed and sometimes flood irrigated. Level 1's permanent buildings, dry river beds, paved roads and abandoned lands are sparsely distributed over the study area. The validation was done by extracting samples from each type of those sub-classes. Well known urban zones were identified, namely Chouiter, Ait Ourir and Sidi Rahhal villages, in addition to river Tensift's bed, and paved roads larger enough to be detected from the 10m square pixels. For the abandoned lands validation, we explored Google earth's images archives for product from the period 2016-2018, and although the acquisition dates of those images don't follow regular time steps, fortunately those from the growing seasons were most of the time available and sufficient enough to detect sign of emptiness in those deserted lands over the years. Level 2 areas are mostly located in Jebilat hills, this type of land-cover reflects important shortage of water resources and almost impossible feasibility of agriculture, as obviously signs of vegetation weren't apparent until 2018, being the water surplus year amongst the three. In an attempt of furthermore ensuring the accurateness of our final map by implicitly evaluating the resulting land-cover maps, we layered out the resulting classes (levels 1-4) on a continuous NDVI time series from 2016 to 2018, in order to check the spectral dynamics of those lands. Figure 6 shows the NDVI profile of one random representative sample from each class. The continuous monitoring of the profiles of the randomly chosen samples translate very precisely classes descriptions in table 4. The visual interpretation of the dynamic of those samples is acceptably reliable, and the resulting profiles are interpretable, as it does not require great expertise nor the use of an algorithm to judge the absence of irrigated crops or tree plantations and the presence of only bare soils and fallow or Bour.

CONCLUSION
We followed an unusual but effective procedure for mapping land-cover and cover's change using Decision Tree models in the study area. Firstly, this approach provides a detailed description of land-cover's spatial distribution of twelve classes (Citrus, Prunus, Olive, Irrigated cereals, autumn crops, Fallow/Bour, Dense forests, Sparse forests, Mountain vegetation, Bare soil and Water bodies) on a yearly basis from 2016 through 2018. Secondly, in the context of land-cover change, it exploits the post classification resulting maps to locate potential agricultural lands that are not fully exploited. S2 and L8 imagery were used to extract 3-years NDVI profiles of each of the most present land-cover types in the study area, in order to analyze the vegetation phenology and land dynamics. The profiles matched with the pre-known phenological features, and the inter-annual stability of NDVI trends was quite apparent as classes profiles generally slightly differ from one year to another. A 1-year analysis (2018) was then carried out to determine Spectro-temporal separation periods and figure out discrimination rules for the classification. This analysis shows that only three critical periods were needed for the separation, namely mid-August, mid-January, and late Mars to early April. The presence of the mountains and hills has led to the use of DEMs and slopes data for a topographic separation. Based on the yearly profiles' analysis, elevation and slopes information, we implemented a threshold values rule in a Decision Tree classifier to map land-cover types from 2016 through 2018. The interpretations and analysis of the land-cover maps derived for three consecutive years (2016)(2017)(2018), confirm the performance of the model. Compensations between annual crops and barren lands spatially and temporally reflect high coherence with the water availability statistics and crop rotation strategies, while stability of permanent classes' spatial distribution and extent shows logical long duration space-time consistence. The OA and kappa of the model for the 3-years classifications reached above 91% and 0.90 respectively. The model is robust since even when not taking into account inter-annual adjustments, and implementing the averaging threshold values in the Decision Tree classifier, it's application over the three years, still gives satisfying results, and slight decreases in the OA and Kappa are noticed, with a minimum value recorded of 87.81% (Kappa 0.87).
We used the post-classification resulting maps to monitor the change. As we were targeting lands affected by climate fluctuations and drought that we called NFEAS (Non-Fully Exploited Agricultural Surfaces), we constructed another Decision Tree model to identify the lands where only interannual combinations of "Bare soil" and "Fallow/Bour" has been detected on the three maps. We carefully described the different combinations possible, to exclude some particular cases. For example, three consecutive year of barren land would indicate the presence of build-up areas, river beds, paved roads or deserted lands.
The results show sparse distribution of the NFEAS in the agricultural zone except for three locations: south of the hills, north-east of the study area, and the surroundings of a village called Sidi Rahhal. In coherence with the qualitative knowledge of the study area, those parts are known to be productive of rainfed cereals which helped validate the results. The evaluation of the results was done using two methods: 1) quantitative and qualitative comparison to Google Earth's very high spatial resolution images, and 2) random independent extraction of the spectral profiles of those lands from a continuous 3-years NDVI time-series. Overlaying the resulting map to the real images showed good similarities, and the generation of samples yielded in an OA of 87.07%. Meanwhile, the random profiles extraction from 3 consecutive years NDVI time-series from representative samples of the different change levels exactly matched their descriptions.

ACKNOWLEDGMENT
This study was conducted within the laboratory of LMME in the faculty of science Semlalia Marrakesh. We would like to thank the staff and the directors of the international joint laboratory TREMA for their technical assistance in the collection of the data, and for their scientific help during the course of this study.