SPATIO-TEMPORAL OBJECT STABILITY FOR MONITORING EVOLVING AREAS IN SATELLITE IMAGE TIME SERIES

Monitoring observable processes in Satellite Image Time Series (SITS) is one of the crucial ways to understand the dynamics of our planet, which is facing unexpected behaviors due to climate change. In this paper, we propose a novel method to assess the evolution of objects (and especially their surface) through time. To do so, we first build a space-time tree representation of the image time series. The so-called space-time tree is a hierarchical representation of an image sequence as a nested set of nodes characterizing the observed regions at multiple spatial and temporal scales. Then, we measure for each node the spatial area occupied at each time sample, and we focus on its evolution through time. We thus define the spatio-temporal stability of each node. We use this attribute to identify and measure changing areas in a remotely sensed scene. We illustrate our method with experiments in a coastal environment using Sentinel-2 images, and in a flooded area using Sentinel-1 images.


Motivation
Due to climate change, it is crucial to regularly acquire information about Earth dynamics. For instance, sea level is changing rapidly and up-to-date information about our Earth is required. Sea level change may be caused by floods or tides, and the capability to observe coastal areas is increasing with data availability (Salameh et al., 2019). Remote sensing images enable us to observe our Earth with improved technology. New satellite missions such as Sentinel provide a high temporal resolution, with a revisit time of approximately 5 days.
Unsurprisingly, satellite image time series (SITS) analysis continues to gain popularity in the literature. Such data combine spatial and temporal dimensions, and they can be used for many problems such as assessing the temporal evolution of phenomena or objects (Méger et al., 2019). Supervised classification algorithms currently represent the state of the art in automatic mapping and monitoring of our planet, e.g. with deep neural networks that have reached very good performance on some well-defined scenarios. However, because training data are difficult to collect for each time sample, these methods might look less appealing than unsupervised algorithms (Petitjean et al., 2012). Indeed, although several reference datasets are provided by industrial or institutional players, such data remain very expensive and time-consuming to produce. Accurate unsupervised methods are thus needed to understand our entire Earth without requiring reference data.

Related work
This paper aims at assessing evolving objects or phenomena in an unsupervised fashion. More precisely, we consider two use cases where observation of water areas is crucial: tides in Sentinel-2 images and floods in Sentinel-1 images. To do so, we propose a novel methodology relying on morphological hierarchies. In this section, we review some related works, from both an application and a methodological point of view.
Tide and flood mapping. Monitoring natural disasters is an important task for post-disaster management. Flood is one of the most common disasters and it is necessary to distinguish changes related to water from the other changes in order to correctly find affected regions. Another natural event caused by water activity occurs in intertidal zones. Both events result from water mobility. According to (Salameh et al., 2019), availability of multitemporal images can provide an optimal solution for beach topography monitoring.
On the tide observation side, a k-means clustering is used in (Soares et al., 2012) in order to monitor intertidal zones. As a postprocessing, the authors apply morphological filtering (closing) to the SAR images. Their work relies on bitemporal analysis, thus making limited use of the temporal information. Pixelwise monitoring of intertidal zones with multitemporal Synthetic Aperture Radar (SAR) images is proposed in (Catalao, Nico, 2016) with a method to analyse pixelwise intensity variations. Although their method is promising, object-based analysis is known to be more efficient than pixel-based analysis for wetland landscape applications (Berhane et al., 2018). In (Gonçalves, Henriques, 2015), the authors used optical airborne images to extract a Digital Surface Model (DSM) of coastal areas. Nevertheless, such DSMs are not always available.
As far as flood detection is concerned, Synthetic Aperture Radar (SAR) is known to be a valuable data source (Martinis et al., 2015, Tang et al., 2018). The SAR backscatter depends on the physical properties of the objects and, since water reflects less than other materials (Tang et al., 2018), it is a discriminative feature for water detection. Therefore, radiometric thresholding of backscatter is known as an efficient way to extract water areas (Giustarini et al., 2013). However, such a method is very sensitive to noise, and spatial regularization thus appears as a relevant strategy to improve robustness. Recently, flood detection was addressed with hierarchical representations (Tuna et al., 2019). A min-tree was used to find an optimal threshold and extract water regions. Although thresholding is an effective method, it struggles when the scene includes moist soil. Computation of the Normalized Difference Flood Index (NDFI) from SAR images was recently introduced in (Cian et al., 2018), but its effective usage requires long time series. Our aim is to propose an unsupervised method that extracts continuous spatio-temporal information from SITS, since water shows a highly active behavior.
Morphological hierarchies and stable features. We aim at proposing a new, unsupervised method which is able to conduct an object-based analysis of the images in the temporal domain. Morphological hierarchies are multiscale representations of an image that provide access to the objects it contains at various scales-of-interest. More specifically, the question of building a morphological hierarchy for modelling a time series was addressed in our previous work (Tuna et al., 2020). We have used a component tree to observe the spatial structures in the time domain through temporal connectivity. As a related object-based temporal analysis work, we can mention the use of Maximally Stable Extremal Region (MSER) for video sequences in (Donoser et al., 2010). A similar methodology was used for real-time text detection in (Gómez, Karatzas, 2014). They did not use temporal connectivity, but instead built a hierarchical representation for each time sample. To monitor the temporal evolution, a graph-based hierarchical representation was also used in (Khiali et al., 2019). The authors built a graph from predefined objects and then analyzed the evolution of an object through time. They aim to find objects which share a similar evolution. We also focus on such evolution but propose to do so through the definition of a novel spatio-temporal stability measure computed from a morphological hierarchy. Indeed, during the last decades, morphological hierarchies have been used for many applications (see (Bosilj et al., 2018) for a recent survey). However, to the best of our knowledge, such a spatiotemporal attribute was never used jointly with a morphological hierarchy, while such a definition would allow us to rely on efficient and scalable algorithms and process large satellite image time series.

Contributions
In this paper, we draw inspiration from the MSER concept, built through a component tree approach, to measure stability in the temporal domain. Our contributions are twofold: i) we propose a new attribute called spatio-temporal stability to describe the evolution of regions through time; ii) we demonstrate the practical interest of this attribute for Earth Observation with two applications related to the monitoring of water areas: flood mapping from Sentinel-1 and intertidal monitoring with Sentinel-2. Let us emphasize that our method is unsupervised, so it does not require training samples to learn the observed phenomena.
Organization of our paper is as follows. Section 2 provides some brief information about morphological representations usually built from still images but recently extended to satellite image time series. The spatio-temporal stability attribute and its post-processing will be presented in Section 3. Experimental results will be discussed in Section 4, before we conclude our paper in Section 5.

TREE REPRESENTATION
We recall here how to build a morphological hierarchy, that represents an image through a tree structure. Furthermore, we describe the space-time tree model recently proposed in (Tuna et al., 2020) that we use to derive our spatio-temporal stability attribute.

Component tree
As a representative example of morphological hierarchies, we consider in this paper the component trees (a.k.a. max- and min-trees). They were introduced by (Jones, 1999) as hierarchical image representation structures based on connected components. A tree consists of vertices and edges, i.e. T = (V, E). We recall here their definition using the following notations. Let I be a grayscale image defined on the domain Ω ⊂ ℕ² and taking values in V ⊂ ℤ, i.e. I : Ω → V, x ↦ I(x) = v, with x and v denoting respectively the 2D pixel coordinates and intensity. The multi-scale representation is built using successive thresholdings with a threshold λ ∈ ℤ, the lower and upper threshold sets being defined as
\[ L_{\lambda}(I) = \{ x \in \Omega \mid I(x) \leq \lambda \}, \qquad U_{\lambda}(I) = \{ x \in \Omega \mid I(x) \geq \lambda \} \]
where λ is an intensity threshold taking values within the image intensity range. From these lower and upper sets, it is possible to extract the set of connected components C(I) (given a predefined connectivity, e.g. 4- or 8-connectivity in images). These components are also called nodes, and the node set of a tree is the union of its components:
\[ C(I) = \bigcup_{\lambda} \bigcup_{k} \{ C_{\lambda}^{k}(I) \} \]
where C denotes a connected component and k its index in the level λ of the image I. For the sake of simplicity, we will omit the k and I notations in the sequel. The tree is created by finding the parent-child relationships of each node. The root is the only node which has no parent, and it covers the whole image. Leaves of the component tree include maximum (i.e. brighter objects) or minimum (i.e. darker objects) values of the image, for the max-tree and min-tree respectively.
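The thresholding step above can be sketched in a few lines of Python. This is only an illustrative, naive enumeration of max-tree nodes (upper threshold sets split into 4-connected components), not an efficient tree construction algorithm; the function names `upper_threshold_set` and `connected_components` and the toy image are assumptions introduced for the example.

```python
from collections import deque

def upper_threshold_set(image, lam):
    """Pixels (row, col) whose intensity is >= lam."""
    return {(r, c)
            for r, row in enumerate(image)
            for c, v in enumerate(row) if v >= lam}

def connected_components(pixels):
    """Split a pixel set into 4-connected components (breadth-first flood fill)."""
    remaining = set(pixels)
    components = []
    while remaining:
        seed = remaining.pop()
        component, queue = {seed}, deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return components

# Toy grayscale image with two bright objects on a dark background.
image = [
    [0, 0, 0, 0],
    [0, 3, 0, 2],
    [0, 3, 0, 2],
    [0, 0, 0, 0],
]

# One list of candidate nodes per intensity level present in the image.
levels = sorted({v for row in image for v in row})
nodes = {lam: connected_components(upper_threshold_set(image, lam))
         for lam in levels}
```

At the lowest level the single component covers the whole image (the root), while the level 2 set splits into two components. An actual max-tree would additionally link each component to its parent and store a component that persists over several levels only once.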

Space-time tree
While morphological hierarchies have been built from still images for decades, their definition over spatio-temporal data such as satellite image time series is more recent. In (Tuna et al., 2020), several strategies to build such a hierarchy have been introduced, and we focus here on the space-time tree. It assumes that the SITS is seen as a spatio-temporal cube, where the two usual spatial dimensions are completed by a third dimension related to time. From this cube it is possible to build a single tree T(I_1, ..., I_n) where n is the length of the time series. The nodes of this space-time tree contain elements from multiple time stamps (i.e. pixels from multiple images in the series). If we denote the time dimension as a subscript of a node C_{λ,t}, space-time tree nodes can be represented as the union of connected components from each time stamp separately:
\[ C_{\lambda} = \bigcup_{t=1}^{n} C_{\lambda,t} \]
Figure 1 shows a space-time max-tree example with a simple 4 × 4 × 3 matrix and the 6-connectivity rule, which is created by gathering the spatial 4-connectivity and the temporal connectivity (i.e. two temporal neighbors, previous and next). Each color represents a specific image and each arrow shows one node of the tree. The 6-connectivity rule can be formulated as
\[ N_6(x, t) = \{ (x', t') \mid \lVert x - x' \rVert_1 + \lvert t - t' \rvert = 1 \} \]
It should be noted that, if a pixel faces some intensity change in the series (i.e. I_i(x) ≠ I_{i+1}(x)), there will be no temporal connectivity, which will result in two different nodes in the tree. We recall that the root of the tree covers the whole time series and includes all three images. We can see that some nodes include pixels from every image/date, while some others include pixels from only one. Additionally, a real (but still simple) example is provided in Figure 2. We selected a few nodes from the whole tree for the sake of visualization. As we can see, such a model allows us to deal with spatio-temporal patterns present in the satellite image time series.
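As a minimal sketch of the 6-connectivity described above, the following Python code extracts the space-time connected components of one upper threshold set of a small cube and then counts, for each component, the pixels it occupies at each date. The helper names `space_time_components` and `areas_per_date`, the `(t, row, col)` indexing and the toy cube are assumptions made for illustration; they are not taken from the reference implementation.

```python
from collections import deque

# The six space-time neighbours: four in space, two in time (previous and next date).
OFFSETS = ((0, -1, 0), (0, 1, 0), (0, 0, -1), (0, 0, 1), (-1, 0, 0), (1, 0, 0))

def space_time_components(cube, lam):
    """6-connected components of the voxels (t, r, c) with cube[t][r][c] >= lam."""
    remaining = {(t, r, c)
                 for t, img in enumerate(cube)
                 for r, row in enumerate(img)
                 for c, v in enumerate(row) if v >= lam}
    components = []
    while remaining:
        seed = remaining.pop()
        comp, queue = {seed}, deque([seed])
        while queue:
            t, r, c = queue.popleft()
            for dt, dr, dc in OFFSETS:
                nb = (t + dt, r + dr, c + dc)
                if nb in remaining:
                    remaining.remove(nb)
                    comp.add(nb)
                    queue.append(nb)
        components.append(comp)
    return components

def areas_per_date(component, n_dates):
    """Number of pixels of a space-time component at each date."""
    areas = [0] * n_dates
    for t, _, _ in component:
        areas[t] += 1
    return areas

# Toy series of three 2 x 3 images.
cube = [
    [[1, 1, 0], [0, 0, 0]],   # first date
    [[1, 1, 1], [0, 0, 0]],   # second date
    [[0, 0, 0], [0, 0, 1]],   # third date
]
components = space_time_components(cube, lam=1)
areas = sorted(areas_per_date(c, len(cube)) for c in components)
```

Here the bright region of the first two dates forms a single space-time component with per-date areas `[2, 3, 0]`, while the isolated pixel of the third date forms its own component with areas `[0, 0, 1]`.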

METHOD
We now explain how we build the spatio-temporal stability attribute from the tree. Then, we will use this attribute to select some specific nodes of interest in the tree. Finally, an optional step consisting of area filtering will be discussed.

Spatio-Temporal Stability
As already stated, we propose to extend the stability concept used in the well-known MSER method to deal with the temporal dimension. Thus, we define the spatio-temporal stability of each node from the area ratio of its successive connected components through time. Here the area refers to the number of pixels in the node (Marcotegui et al., 2017) and we denote the area of a node as A(C). We formulate the stability attribute St as
\[ St(C_{\lambda}) = \frac{1}{n-1} \sum_{t=1}^{n-1} \frac{\min\big(A(C_{\lambda,t}), A(C_{\lambda,t+1})\big)}{\max\big(A(C_{\lambda,t}), A(C_{\lambda,t+1})\big)} \]
where A represents the area attribute of the relevant C. The root node covers the whole spatial support all along the time series and thus has a spatio-temporal stability of 1, independently of the length n of the series. Figure 3 illustrates the spatio-temporal stability attribute of the nodes from Figure 2. For each node of the tree, we provide the area occupied at each time sample, as well as the spatio-temporal stability. The spatio-temporal stability attribute of the root node equals 1, as expected. As mentioned before, some nodes may cover a single date, and thus have a null area in the other images. We set the stability attribute to 0 for these nodes.
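To make the attribute concrete, here is a small Python sketch. It assumes one reading of the area-ratio definition above, namely that the stability is the mean, over consecutive dates, of the ratio between the smaller and the larger of the two areas; a pair of dates where the node is absent from both images contributes 0. The function name `spatio_temporal_stability` is hypothetical.

```python
def spatio_temporal_stability(areas):
    """Mean min/max area ratio over consecutive dates (assumed definition).

    `areas` holds the number of pixels of one node at each date of the series.
    """
    n = len(areas)
    if n < 2:
        return 0.0  # a single date gives no temporal information
    total = 0.0
    for a, b in zip(areas, areas[1:]):
        largest = max(a, b)
        if largest > 0:          # absent at both dates: contributes 0
            total += min(a, b) / largest
    return total / (n - 1)

root = spatio_temporal_stability([16, 16, 16])      # constant area
single_date = spatio_temporal_stability([0, 5, 0])  # visible at one date only
growing = spatio_temporal_stability([4, 8, 8])      # doubles, then stays
```

Under this assumption, a node with constant area (such as the root) gets a stability of 1, a node visible at a single date gets 0, and the growing node gets (0.5 + 1) / 2 = 0.75.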

Node Selection
The stability measure can then be used to identify evolving or non-evolving objects in SITS. One way to extract information from trees is filtering. Filtering a tree consists in pruning the nodes according to some predefined criteria, usually by comparing node attributes to some threshold. If our interest relates to evolving or unstable objects, nodes with low stability will be sought. To do so, we first retain all nodes having a stability lower than a given threshold h. We then assign each pixel to the level of the retained node that is closest to the root (i.e. with the lowest λ value in case of a max-tree) to build the reconstructed image I′:
\[ I'_t(x) = \min \{ \lambda \mid x \in C_{\lambda,t},\ St(C_{\lambda}) < h \} \]
and we set the remaining pixels to 0. Let us note that a first pruning step is systematically applied to remove all nodes with stability equal to 0, which correspond to noisy regions appearing at a single time stamp.
To illustrate, let us consider the tree in Figure 3a. Since we aim to find unstable regions, we should select some relatively low threshold, e.g., h = 0.5. We then reconstruct the image (Fig. 3b) as usually done with tree-based filtering approaches.
An additional example is provided in Figure 4 to show the behavior of our approach. We can see a series of synthetic images with intensities that evolve through time, together with the result of unstable region detection (or stable region filtering). As we can notice, the spatio-temporal stability allows us to distinguish changing objects from static ones despite intensity variations. We used the same threshold value h = 0.5.
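The selection and reconstruction steps of this section can be sketched as follows. This is a simplified illustration that assumes the nodes are already available as plain dictionaries with a `level`, a set of `(t, row, col)` `voxels` and a precomputed `stability`; this data layout and the function name `reconstruct_unstable` are assumptions made for the example, not the data structure of an actual tree library.

```python
def reconstruct_unstable(nodes, shape, h):
    """Keep nodes with 0 < stability < h and paint each pixel with the level
    of the retained node closest to the root (lowest level in a max-tree)."""
    n_dates, n_rows, n_cols = shape
    out = [[[0] * n_cols for _ in range(n_rows)] for _ in range(n_dates)]
    retained = [nd for nd in nodes if 0.0 < nd["stability"] < h]
    assigned = set()
    # Visiting retained nodes by increasing level means a pixel is painted
    # by the retained node that is closest to the root.
    for nd in sorted(retained, key=lambda nd: nd["level"]):
        for t, r, c in nd["voxels"]:
            if (t, r, c) not in assigned:
                out[t][r][c] = nd["level"]
                assigned.add((t, r, c))
    return out

# Toy tree over two dates of a 1 x 4 image.
everything = {(t, 0, c) for t in range(2) for c in range(4)}
nodes = [
    {"level": 0, "voxels": everything, "stability": 1.0},          # root, stable
    {"level": 2, "stability": 0.25,                                 # areas 1 then 4
     "voxels": {(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 0, 2), (1, 0, 3)}},
    {"level": 3, "stability": 1 / 3,                                # areas 1 then 3
     "voxels": {(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 0, 2)}},
    {"level": 5, "voxels": {(1, 0, 0)}, "stability": 0.0},          # single date
]
filtered = reconstruct_unstable(nodes, shape=(2, 1, 4), h=0.5)
```

With h = 0.5, the root and the single-date node are discarded, and every pixel of the two nested unstable nodes receives level 2, the level of the one closest to the root; all other pixels are set to 0.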

EXPERIMENTS
To illustrate the practical interest of our method, we have conducted two sets of experiments. The first aims to detect floods in Sentinel-1 SITS, while the second is focused on intertidal monitoring in Sentinel-2 SITS.

Flood mapping from Sentinel-1
For this first use case, we consider a series of Sentinel-1A images acquired over Montmirail, in the north of France, east of Paris. More precisely, the data in use come with a spatial resolution of 10 m, and consist of Ground Range Detected products with the Interferometric Wide Swath (IW) mode and VV polarization. The SITS is made of 3 images (892 × 1941) as detailed in Table 1.
Table 1. Acquisition dates of Sentinel-1 images.
In order to assess the ability of our method to achieve flood detection, we use the Copernicus Emergency Management Service flood mapping shape files as reference. A flooding event occurred on 22 January 2018, i.e. between the second and third images of the series, as shown in Figure 5. Flood mapping can be considered as a specific application of change detection focused on water areas. We therefore compare the flooded scene with a non-flooded one (e.g. before the flood event) to identify, among water regions, those that are actually changing (i.e., floods), which we distinguish from permanent water bodies. We then end with a difference image I_f between the image at time i, when flooding is visible, and the non-flooded one. Since constant waters have the same location in successive images, they are removed in I_f. Finally, we build a binary map through a simple threshold that discards null values. After finding the unstable nodes with a threshold empirically set to h = 0.2, some artifacts can remain (Cian et al., 2018) due to the double bounce effect, the backscatter similarity of dry soil, etc. In order to overcome these small errors, we post-process the binary change detection map I_f with a small area filtering (i.e., with an area threshold h_a = 20).

Method                 F1
Proposed               0.78
(Tuna et al., 2019)    0.73
(Cian et al., 2018)    0.60
Table 2. Quantitative evaluation of flood detection results for different methods using the F1 measure.
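The post-processing chain described here (difference with a non-flooded date, removal of null values, small area filtering) could look like the sketch below. It rests on several assumptions: the difference is a plain pixelwise subtraction between two images given as nested lists, regions are 4-connected, and regions with fewer than `min_area` pixels are dropped. The function name `flood_map` and the toy data are hypothetical, and the toy threshold is much smaller than the value of 20 used in the experiment.

```python
from collections import deque

def flood_map(flooded, reference, min_area):
    """Binary change map: non-null difference, then removal of small regions."""
    rows, cols = len(flooded), len(flooded[0])
    # Pixels whose value differs between the flooded and the non-flooded date.
    changed = {(r, c) for r in range(rows) for c in range(cols)
               if flooded[r][c] - reference[r][c] != 0}
    out = [[0] * cols for _ in range(rows)]
    while changed:
        seed = changed.pop()
        region, queue = [seed], deque([seed])
        while queue:
            r, c = queue.popleft()
            for nb in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if nb in changed:
                    changed.remove(nb)
                    region.append(nb)
                    queue.append(nb)
        if len(region) >= min_area:   # area filtering: keep large regions only
            for r, c in region:
                out[r][c] = 1
    return out

# Toy example: a permanent water body (value 7) and a new flooded region (value 5).
reference = [[7, 7, 0, 0, 0],
             [7, 7, 0, 0, 0],
             [0, 0, 0, 0, 0]]
flooded = [[7, 7, 5, 5, 0],
           [7, 7, 5, 0, 0],
           [0, 0, 0, 0, 4]]
result = flood_map(flooded, reference, min_area=2)
```

The permanent water body disappears in the difference, the three-pixel flooded region is kept, and the isolated changed pixel is removed by the area filter.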
We report in Table 2 a quantitative evaluation based on the F1 score, i.e. the harmonic mean of the precision and recall measures, defined as 2TP/(2TP + FP + FN), where TP denotes the true positives (i.e., flooded pixels that were correctly detected by the method), FP the false positives (i.e., non-flooded regions detected as flooded) and FN the false negatives (i.e., flooded regions that have not been detected). We compare our results with those provided by two existing methods: the min-tree based radiometric thresholding (Tuna et al., 2019), which also relies on spatial attributes extracted from morphological hierarchies, and the Normalized Difference Flood Index (NDFI) thresholding approach (Cian et al., 2018).
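For reference, the F1 definition above translates directly into code. The sketch below assumes that both the detection result and the reference are binary maps stored as nested lists of 0/1 values; the function name `f1_score` and the toy maps are illustrative only and unrelated to the values reported in Table 2.

```python
def f1_score(predicted, reference):
    """F1 = 2TP / (2TP + FP + FN) for two binary maps of identical size."""
    tp = fp = fn = 0
    for p_row, r_row in zip(predicted, reference):
        for p, r in zip(p_row, r_row):
            if p and r:
                tp += 1          # flooded pixel correctly detected
            elif p and not r:
                fp += 1          # non-flooded pixel detected as flooded
            elif r and not p:
                fn += 1          # flooded pixel that was missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0

predicted = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
score = f1_score(predicted, reference)
```

In this toy case TP = 2, FP = 1 and FN = 1, so the score is 4 / 6, i.e. about 0.67.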

Intertidal monitoring with Sentinel-2
We used Sentinel-2 images around Morbihan, France. Morbihan and the Brittany region are well known for their pronounced tidal behaviour. We limit ourselves to a sample SITS made of small extracts (632 × 927 px) to ease visualization, considering 5 images acquired in 2018 with a spatial resolution of 10 m. We selected only cloud-free images in this illustrative example. We used level 2A products provided by the THEIA land data center. Since there is no ground truth data for this application, we only report a visual assessment of our results. Acquisition dates of these images are given in Table 3.
Table 3. Acquisition dates of Sentinel-2 images.
Since our focus in this paper is on the spatio-temporal nature of the data, we simplify each multispectral image into a grayscale one by computing the Normalized Difference Water Index (NDWI) (McFeeters, 1996) for each pixel. We recall that NDWI can be calculated from the green (G) and near-infrared (NIR) bands of an image as NDWI = (G − NIR)/(G + NIR). Since water pixels appear brighter than other classes on the NDWI image, we used the max-tree for this experiment. Original images can be seen in the first row. We set the spatio-temporal stability of each pixel according to the stability of the deepest node it belongs to (i.e., the one with the highest intensity value). More explicitly, water areas show higher values than the other parts of the images. We provide the spatio-temporal stability images I^st_i in the second row. The third row is the result obtained after filtering the space-time tree nodes with a spatio-temporal stability threshold empirically set to h = 0.4 and reconstructing the filtered SITS. We can notice some overlap between the detected regions and the water regions visible in the original images. As expected, no land or sand regions are detected by our method. For instance, I_3 shows some sandy areas on the water at the top left of the image, and these are not detected as water in I′_3.
We have also added a red rectangle in I′_3 to emphasize where the temporal connectivity has helped to ensure a correct detection. Although there is a gap in the spatial domain, this region still belongs to a node with high stability thanks to the successive images. Another red rectangle is given for I′_5. This part is growing dramatically at that time and is correctly detected as an intertidal area by our method. Even if there is no temporal relationship, water pixels are connected in the spatial domain. For the sake of comparison, we provide in the last row the result of a pixel-based variation analysis I^var obtained with the method from (Catalao, Nico, 2016). We can see in this last image that the tide areas are detected, but the method does not provide a result per date. Besides, there are wrongly detected areas caused by intensity changes through time.
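The NDWI conversion used in this experiment can be sketched as below. The band arrays are assumed to be nested lists of reflectance values of identical size. Because a component tree operates on integer levels, the sketch also maps NDWI from [-1, 1] to 0..255; this quantization step, like the function names `ndwi` and `quantize`, is an assumption added for illustration and is not described in the text.

```python
def ndwi(green, nir):
    """NDWI = (G - NIR) / (G + NIR), computed pixel by pixel."""
    return [[(g - n) / (g + n) if (g + n) != 0 else 0.0
             for g, n in zip(g_row, n_row)]
            for g_row, n_row in zip(green, nir)]

def quantize(index, levels=256):
    """Map values in [-1, 1] to integer levels 0 .. levels - 1 (assumed step)."""
    return [[round((v + 1) / 2 * (levels - 1)) for v in row] for row in index]

# Toy bands: the first pixel behaves like water (green > NIR), the second like land.
green = [[300, 100]]
nir = [[100, 300]]
index = ndwi(green, nir)
gray = quantize(index)
```

The water-like pixel gets an NDWI of 0.5 and the land-like pixel -0.5, so water ends up brighter in the grayscale image, which is why a max-tree is the natural choice here.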

CONCLUSION
In this paper, we have introduced a new spatio-temporal stability attribute that can be efficiently measured from a space-time tree, i.e. a multiscale, hierarchical representation of a satellite image time series. This attribute relies on the size variability of the tree nodes in the temporal domain. We used this attribute to monitor dynamic water regions such as floods in Sentinel-1 and intertidal zones in Sentinel-2. Since component trees are limited to analyzing objects that are brighter or darker than their surroundings, future work will include using this attribute with other morphological hierarchies (e.g. tree of shapes, multiscale segmentations).