SAR-based change detection using hypothesis testing and Markov random field modelling

The objective of this study is to automatically detect changed areas caused by natural disasters from bi-temporal co-registered and calibrated TerraSAR-X data. The technique in this paper consists of two steps: Firstly, an automatic coarse detection step is applied based on a statistical hypothesis test for initializing the classification. The original analytical formula as proposed in the constant false alarm rate (CFAR) edge detector is reviewed and rewritten in a compact form of the incomplete beta function, which is a built-in routine in commercial scientific software such as MATLAB and IDL. Secondly, a post-classification step is introduced to optimize the noisy classification result of the previous step. Generally, an optimization problem can be formulated as a Markov random field (MRF) on which the quality of a classification is measured by an energy function. The optimal classification based on the MRF corresponds to the lowest energy value. Previous studies provide methods for the optimization problem using MRFs, such as the iterated conditional modes (ICM) algorithm. Recently, a novel algorithm was presented based on graph-cut theory. This method transforms a MRF to an equivalent graph and solves the optimization problem by a max-flow/min-cut algorithm on the graph. In this study the graph-cut algorithm is applied iteratively to improve the coarse classification. At each iteration the parameters of the energy function for the current classification are set by the logarithmic probability density function (PDF). The relevant parameters are estimated by the method of logarithmic cumulants (MoLC). Experiments are performed on two flood events in Germany and Australia in 2011 and a forest fire on La Palma in 2009, using pre- and post-event TerraSAR-X data. The results show convincing coarse classifications and considerable improvement by the graph-cut post-classification step.


INTRODUCTION

Problem Description
Natural disasters can affect human settlements, cause economic damage and even take thousands of lives every year. It is essential to monitor natural disasters and to provide valid near real-time crisis information, so that rescue work can be conducted in time and losses are kept as low as possible. In this context Synthetic Aperture Radar (SAR) turns out to be an excellent sensor for disaster monitoring because of its all-weather data acquisition capability. In view of this virtue it is valuable to develop appropriate algorithms to extract more information from SAR data. This study is dedicated to one of these tasks, called change detection.
Change detection is the process of identifying differences in the state of an object or phenomenon by observing it at different times (Lu et al., 2004). The goal of change detection is to detect "significant" changes while rejecting "unimportant" ones (Radke et al., 2005). In general significant changes may reflect natural phenomena or anthropogenic activities on the Earth's surface. A typical application of change detection is to monitor abrupt changes caused by natural disasters and extract relevant crisis information to support crisis response and rescue work.

Related Works
There are many change detection techniques in the literature. Lu et al. (2004) and Radke et al. (2005) provide comprehensive reviews of the existing methods. The following categories of change detection are summarised in (Lu et al., 2004): 1. Algebraic methods such as image differencing, image ratioing, image regression, vegetation index differencing, change vector analysis (CVA) and background subtraction. 2. Transformation methods like principal component analysis (PCA), Gramm-Schmidt and Chi-square transformations. 3. Classification methods for multi-spectral images such as post-classification comparison, spectral-temporal combined hybrid change detection and artificial neural networks (ANN). 4. Advanced models, for example a biophysical model related to the scattering process of vegetation to find changed areas.
There is no ideal algorithm for all problems. Algebraic methods are efficient and easy to implement, but it is hard to find a suitable threshold that yields an optimal classification. The other methods in the list above outperform the algebraic methods, but they are complex and time-consuming. This paper focuses on the application of SAR data in the field of disaster monitoring. For this purpose efficient algebraic methods are preferred in order to obtain near real-time information as soon as possible.
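As an illustration of the algebraic category used later in this study, the following minimal sketch (not from the paper) computes a pixel-wise intensity ratio image between two co-registered acquisitions; the small constant `eps` is an assumption added to avoid division by zero:

```python
import numpy as np

def ratio_image(post, pre, eps=1e-10):
    """Pixel-wise intensity ratio q = post/pre of two co-registered images."""
    post = np.asarray(post, dtype=float)
    pre = np.asarray(pre, dtype=float)
    # eps guards against zero-intensity pixels; it barely affects valid ratios
    return (post + eps) / (pre + eps)

pre = np.array([[1.0, 2.0], [4.0, 1.0]])
post = np.array([[2.0, 2.0], [1.0, 3.0]])
q = ratio_image(post, pre)
# q > 1 indicates an intensity increase, q < 1 a decrease
```

Thresholding such a ratio image is exactly the step for which the hypothesis test below supplies principled thresholds.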
Several studies discuss methods to determine the optimal threshold for SAR change detection. Moser et al. (2006) derive the optimal threshold based on Bayes' rule. Bazi et al. (2005) exploit a similar idea but utilize a generalised Gaussian probability density function (PDF). Instead of using Bayes' rule, Oliver et al. (1996) discuss a simple classification method based on a hypothesis test which dates back to the constant false alarm rate (CFAR) edge detector (Touzi et al., 1988).
Despite the convincing experimental results in these papers, some problems that are inevitable for applications of SAR change detection in natural disaster monitoring have not been discussed so far: 1. Most of these methods only address a two-class problem, i.e. they classify a pair of images into a changed and an unchanged set of pixels. In general there are three classes in change detection: positive change, negative change and no change. For example, in flood monitoring, flooded and receding water areas should be classified into the negative- and positive-change class, respectively. In the case of partially submerged vegetation there are also frequently positive- and negative-change classes to distinguish. 2. Most of the discussions in the literature are restricted to pixel-based thresholding algorithms. Because of speckle noise in SAR images, the effect of a pixel-based thresholding approach is limited regardless of which criterion defines the optimal threshold. Most methods filter SAR images before classification in order to reduce the speckle noise. However, many details in SAR images can be removed or modified by this processing step. In general, a post-classification step can be used to improve the coarse classification result obtained by thresholding (Radke et al., 2005).

Organisation of the Paper
In this paper we inherit the key ideas from the literature mentioned in subsection 1.2 to solve these two problems. Section 2 describes the proposed change detection method that consists of a coarse classification and a post-classification. In subsection 2.1 the hypothesis test for SAR-based change detection is first reviewed and rewritten in a compact form using the incomplete beta function. In subsection 2.2 the concept of Markov random fields (MRF) is introduced to formulate the post-classification problem. Afterwards a graph-cut algorithm is applied to solve this general optimization problem. The concrete energy functions for this algorithm are defined in this subsection. Section 3 reports on the results of applying the proposed automatic method to SAR images acquired in several different disaster scenarios. Finally, conclusions and outlook are drawn in Section 4.

METHODOLOGY
Before discussing the proposed method, the notations used in this article are summarized in Table 1 to provide an overview of the physical variables involved.

Review of hypothesis test for SAR change detection:
The one-sided hypothesis test for positive change detection is generally formulated as

H_0: r = 1 versus H_1: r > 1, (1.1)

where r is the true ratio of intensity mean values of a homogeneous area in the two co-registered SAR images, as listed in Table 1. The constant false alarm rate α_U describes the probability of rejecting H_0 given that it is true. It can be formulated as

α_U = P(q > q_U | H_0). (1.2)

Introducing an exceedance probability function defined by

P_E(q) = P(Q > q), (1.3)

the constant false alarm rate α_U can further be related to the exceedance probability function by rewriting the quantile as

q_U = P_E^{-1}(α_U). (1.4)

The formulation above also applies to the corresponding one-sided hypothesis test for negative change detection. That provides

q_L = P_E^{-1}(1 - α_L). (1.5)

Equations 1.4 and 1.5 enable us to calculate the corresponding quantiles with respect to given constant false alarm rates α_U, α_L once an exceedance probability function P_E(·) is defined. An example of P_E(·) formulated by a series expansion involving the Gamma function Γ(·) is described in Oliver et al. (1996). One problem of this formula is that it is restricted to the two-class problem. In the following we review this formula from the viewpoint of some properties of the Gamma and Fisher distribution functions.
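The quantile relation q_U = P_E^{-1}(α_U) can be sketched numerically for any exceedance function. The snippet below is an illustration only: it uses a unit-mean exponential as a placeholder P_E (not the paper's SAR model) and inverts it by root finding:

```python
import numpy as np
from scipy import optimize

def p_exceed(q):
    """Placeholder exceedance probability P_E(q) = P(Q > q) (unit exponential)."""
    return np.exp(-q)

def quantile(alpha):
    """Invert P_E numerically; for the exponential the closed form is -ln(alpha)."""
    return optimize.brentq(lambda q: p_exceed(q) - alpha, 1e-12, 1e3)

q_U = quantile(0.01)      # upper threshold for alpha_U = 1%
q_L = quantile(1 - 0.01)  # lower threshold for alpha_L = 1%
```

Any other exceedance function, including the incomplete-beta form derived below, can be substituted for `p_exceed` without changing the inversion step.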

Ratio of Gamma random variables:
Assuming that the SAR images have been segmented into many homogeneous areas with N_i pixels each, the pixels y_1 in each area of the first image follow the same Gamma distribution:

y_1 ~ Gamma(α_1, β_1), (1.7)

where α_1, β_1 are the parameters of the Gamma distribution (Lindgren, 1993) that are constant for all pixels in a homogeneous area of the first SAR image. For a homogeneous area in the second image the situation is similar:

y_2 ~ Gamma(α_2, β_2). (1.8)

The parameterization of a Gamma distribution differs between statistics and SAR data processing. The parameters in Equations 1.7 and 1.8 are related to the parameters in Table 1 by

α_i = L, β_i = σ_i / L, i = 1, 2, (1.9)

where L is the number of looks and σ_i the radar cross section. If an area of pixels has changed, there must also be a difference between the mean values. The task is now to find the PDF of the mean value of N_i Gamma distributed random variables. This can be solved based on three properties of the Gamma distribution (Lindgren, 1993) listed in Table 2. The summation property and the scaling property with c = 1/N_i provide

ȳ_i = (1/N_i) Σ y_i ~ Gamma(N_i L, σ_i / (N_i L)). (1.10)

Using the scaling property with c = 2 N_i L / σ_i for i = 1, 2, respectively, provides

2 N_i L ȳ_i / σ_i ~ Gamma(N_i L, 2) = χ²(2 N_i L), (1.11)

where the last equation uses the third property specified in Table 2. In addition, the χ² distribution is related to the Fisher distribution by the following property: if X_1 ~ χ²(d_1) and X_2 ~ χ²(d_2) are independent, then

(X_1 / d_1) / (X_2 / d_2) ~ F(d_1, d_2). (1.12)

Applying Equation 1.12 to Equation 1.11 provides

(ȳ_2 / σ_2) / (ȳ_1 / σ_1) ~ F(2 N_2 L, 2 N_1 L). (1.13)

With Equation 1.9 and the relations in Table 1, Equation 1.13 is equal to

(ȳ_2 / ȳ_1) · (σ_1 / σ_2) ~ F(2 N_2 L, 2 N_1 L). (1.14)

Using the true and observed ratios specified in Table 1 to rewrite Equation 1.14 gives

q / r ~ F(2 N_2 L, 2 N_1 L). (1.15)

According to Pearson (1968), the cumulative distribution function (CDF) of the F-distribution F(q; d_1, d_2) can be formulated as

P_F(q; d_1, d_2) = I_x(d_1/2, d_2/2), x = d_1 q / (d_1 q + d_2), (1.16)

where I_x(a, b) is the incomplete beta function, a commonly available built-in function of MATLAB or IDL. The exceedance probability function can then be described as

P_E(q) = P(Q > q) = 1 − I_x(N_2 L, N_1 L), x = N_2 q / (N_2 q + N_1 r). (1.17)

Combining Equations 1.4, 1.5 and 1.17 provides

α_U = 1 − I_{x_U}(N_2 L, N_1 L), α_L = I_{x_L}(N_2 L, N_1 L), x_{U,L} = N_2 q_{U,L} / (N_2 q_{U,L} + N_1), (1.18)

with r = 1 under H_0. Given two constant false alarm rates α_L, α_U, the quantiles q_L, q_U can be calculated from Equation 1.18. Then the decision rules for three-class change detection are:

Table 3. Decision rules for three-class discrimination: q > q_U → positive change; q_L ≤ q ≤ q_U → no change; q < q_L → negative change.
There are many ways to apply Equation 1.18 and the decision rules in Table 3. If N_1 = N_2 = 1, the method degenerates into a pixel-based one. It can be used to perform a quick coarse classification that provides a seed for the subsequent post-classification. If N_1 > 1 and N_2 > 1, the method is region-based. If a segmentation mask is available, this region-based method provides a way of direct classification without the need of any post-processing. The region-based version can also be applied to averaged and subsampled SAR images in order to reduce the processing time. In this study we only use the hypothesis test as a pixel-based method to provide a seed for the post-classification.
The purpose is to demonstrate the effect of the graph-cut algorithm on this coarse initial result.
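The pixel-based coarse classification can be sketched as follows, assuming that under H_0 the ratio q follows an F(2L, 2L) distribution as derived above (pixel-based case N_1 = N_2 = 1); the number of looks L = 4 and the false alarm rates are illustrative choices, not values from the paper:

```python
import numpy as np
from scipy import stats

L = 4                           # assumed number of looks (illustrative)
alpha_U = alpha_L = 0.01        # constant false alarm rates (illustrative)

F = stats.f(2 * L, 2 * L)       # F-distribution of q under H0 (r = 1)
q_U = F.isf(alpha_U)            # upper quantile: P(Q > q_U | H0) = alpha_U
q_L = F.ppf(alpha_L)            # lower quantile: P(Q < q_L | H0) = alpha_L

def classify(q):
    """Three-class rule: +1 positive change, -1 negative change, 0 no change."""
    q = np.asarray(q, dtype=float)
    return np.where(q > q_U, 1, np.where(q < q_L, -1, 0))

labels = classify([0.1, 1.0, 8.0])
```

Note that `scipy.stats.f` internally evaluates the same incomplete beta function mentioned in the text; the equivalent MATLAB/IDL routines could be used in the same way.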

Post-classification based on Markov Random Fields and Graph-Cuts
Classification results based on thresholding are generally liable to speckle noise in SAR images. This type of noise is an effect of most coherent imaging systems and results in a "dirty" classification. A post-processing step is necessary to reduce the randomness of the classification results due to the speckle noise. As a classification result can be considered as a mask in which each pixel is labeled with its class ID, post-processing can be seen as a correction of the initial classification of the image. In the literature this task is formulated in the framework of Markov random fields (MRF) (Li, 1995). If the pixels labeled as one class have similar values, they receive low data energy values, and vice versa. In this way the data energy quantifies the quality of a given classification. The smoothness energy function, on the other hand, takes the neighborhood of pixels into consideration. Classifications with scattered regions of pixels have a higher smoothness energy than those with connected regions. In this way the smoothness energy describes whether the pixels of one class cluster together or scatter randomly.
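The data-energy idea can be sketched as follows, assuming a Gamma intensity PDF whose parameters are estimated by the method of logarithmic cumulants (MoLC), as mentioned in the abstract; the sample sizes and true parameters below are illustrative assumptions:

```python
import numpy as np
from scipy import special, optimize

def molc_gamma(samples):
    """Estimate (L, sigma) of Gamma(L, sigma/L) from the first two log-cumulants."""
    logs = np.log(samples)
    k1, k2 = logs.mean(), logs.var()
    # MoLC equations: k1 = psi(L) + ln(sigma/L), k2 = psi'(L);
    # invert the trigamma function psi'(L) = k2 numerically
    L = optimize.brentq(lambda x: special.polygamma(1, x) - k2, 1e-3, 1e3)
    sigma = L * np.exp(k1 - special.polygamma(0, L))
    return L, sigma

def data_energy(y, L, sigma):
    """Negative log of the Gamma PDF p(y; L, sigma) used for SAR intensities."""
    return -(L * np.log(L / sigma) - special.gammaln(L)
             + (L - 1) * np.log(y) - L * y / sigma)

rng = np.random.default_rng(0)
y = rng.gamma(shape=4.0, scale=2.0, size=20000)  # simulated class: L = 4, sigma = 8
L_hat, sigma_hat = molc_gamma(y)
```

In the iterative scheme of the paper, such parameters would be re-estimated from the current class samples at every iteration before the energies are updated.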

Markov Random Fields: Post-processing usually integrates a priori knowledge into the classification step to improve the classification. One of the assumptions in a MRF is that real objects are homogeneously distributed; objects of the same category should therefore largely cluster together in images. Based on this consideration the expected classification result should consist of areas or blocks instead of isolated pixels. This idea of MRFs is formulated by two functions: a data energy function and a smoothness energy function. The data energy function focuses on the similarity of pixel values within each class.
Mathematically the set of pixels in an image is denoted as P_0. A classification or labeling of this image is denoted as f = {f_p, ∀p ∈ P_0}, where f_p is the class label of pixel p ∈ P_0. The set of all possible class labels is described as ℒ; the labeling f can thus be considered as a mapping from P_0 to ℒ. If pixel p is classified to class f_p, its data energy is denoted as D_p(f_p); the smoothness energy of two pixels p and q in a neighborhood system N is denoted as V_pq(f_p, f_q) and is also called potential. With these notations the whole energy of the MRF under the labeling f is described as

E(f) = Σ_{p ∈ P_0} D_p(f_p) + Σ_{(p,q) ∈ N} V_pq(f_p, f_q). (2.1)

The problem is to find the labeling f that minimizes E(f) globally, provided that the data and smoothness energy functions are suitably defined. In the literature there are many algorithms to solve this general optimization problem, such as iterated conditional modes (ICM) (Besag, 1986) and simulated annealing (SA) (Geman et al., 1984). In the following an effective algorithm based on graph-cuts is applied to solve the optimization problem on a MRF.
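The energy E(f) can be sketched on a toy image. The Potts potential V = λ·[f_p ≠ f_q] and the squared-difference data term below are illustrative assumptions (the paper's actual data energy is the negative logarithmic PDF), chosen only to show how a smoother, better-fitting labeling receives a lower energy:

```python
import numpy as np

def mrf_energy(labels, image, class_means, lam=1.0):
    """E(f) = sum_p D_p(f_p) + lam * sum_{(p,q) in N} [f_p != f_q]
    with a 4-connected neighborhood and a Potts smoothness term."""
    labels = np.asarray(labels)
    image = np.asarray(image)
    data = np.sum((image - class_means[labels]) ** 2)     # data energy
    smooth = np.sum(labels[1:, :] != labels[:-1, :]) \
           + np.sum(labels[:, 1:] != labels[:, :-1])      # Potts pair count
    return data + lam * smooth

img = np.array([[0.1, 0.2], [0.9, 1.0]])
means = np.array([0.0, 1.0])                # assumed class means
noisy = np.array([[0, 1], [1, 1]])          # one mislabeled pixel
clean = np.array([[0, 0], [1, 1]])
# the smoother, better-fitting labeling has the lower energy E(f)
```

Minimizing this quantity over all labelings is exactly the optimization problem the graph-cut algorithm of the next subsection solves.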

Graph-cut Algorithm:
A graph is composed of nodes and edges. Each edge in the graph can transport an amount of flow between its two nodes. The maximum flow is limited by the capacity of the edges. If the flow in an edge has reached its capacity, this edge is blocked and cannot transport any more flow. A connection between the source and the sink node through edges forms a path for flow transport. The maximum flow in this path (max-flow) is reached if the capacity of one edge in the path has been fully used. In this situation the path is called blocked. In order to find the max-flow from source to sink, every possible path should be fully used to transport flow until all paths are blocked. The sum of the flow in these paths is then the maximal flow.
Another interesting viewpoint is given by Ford et al. (1956).
According to their theorem the maximal flow in each path is determined only by the edges with minimal capacities, i.e. the blocked edges. Hence the max-flow in the whole graph is equal to the sum of the capacities of the blocked edges. In addition, these blocked edges cut the graph into two parts: one part connected with the source, denoted as S, the other connected to the sink, denoted as T. The blocked edges are the channel between these two parts. In the case of max-flow, the capacities of the blocked edges are fully used and determine the maximal flow.
From this viewpoint Ford et al. (1956) equate the max-flow with the best cut of a graph. Since there are many possible cuts which divide the graph into the two parts S, T, the best cut is defined such that the sum of the capacities of the cut edges is minimal (min-cut). These cut edges are equivalent to the blocked edges in the max-flow problem. Kolmogorov et al. (2004) transform the optimization problem on a two-class MRF into a max-flow/min-cut problem on a graph. The purpose of this transformation is to benefit from global optimization methods in graph theory. Boykov et al. (2004) apply the graph construction method of Kolmogorov et al. (2004) to a multi-label graph and propose a novel algorithm to perform the max-flow optimization. According to their experiments the new method significantly outperforms other standard algorithms. In this study we use the graph-cut and max-flow methods by Kolmogorov et al. (2004) and Boykov et al. (2004) to solve the optimization problem on a MRF. Their algorithms have been implemented in C++ (Boykov et al., 2011). Two functions in their code are used iteratively to improve the coarse classification from the hypothesis test in Section 2.1: a graph constructor function and an α-β-swap function. In addition, a data energy function and a smoothness energy function have to be defined to initialize the optimization process. In the following more details are given about these two energy functions.
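The augmenting-path idea described above can be sketched with a minimal Edmonds-Karp implementation (BFS augmenting paths); this is a didactic stand-in, not the optimized C++ code of Boykov et al. used in the study:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow on an adjacency-matrix graph."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]
    total = 0
    while True:
        # BFS for a source-sink path with remaining residual capacity
        parent = [-1] * n
        parent[source] = source
        queue = deque([source])
        while queue and parent[sink] == -1:
            u = queue.popleft()
            for v in range(n):
                if parent[v] == -1 and capacity[u][v] - flow[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[sink] == -1:       # every path is blocked: max-flow reached
            return total
        # bottleneck = minimal residual capacity along the found path
        v, bottleneck = sink, float("inf")
        while v != source:
            u = parent[v]
            bottleneck = min(bottleneck, capacity[u][v] - flow[u][v])
            v = u
        # push the bottleneck flow along the path (and store reverse flow)
        v = sink
        while v != source:
            u = parent[v]
            flow[u][v] += bottleneck
            flow[v][u] -= bottleneck
            v = u
        total += bottleneck

# toy graph: node 0 = source, node 3 = sink; the min-cut capacity here is 4
cap = [[0, 3, 2, 0],
       [0, 0, 1, 2],
       [0, 0, 0, 2],
       [0, 0, 0, 0]]
```

When the loop terminates, the saturated ("blocked") edges separating the source side from the sink side form exactly the min-cut of the Ford-Fulkerson theorem.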

Data energy and Smoothness energy:
As pixels of the same class have similar intensity values, an appropriate PDF can be used to measure how similar the pixel intensities are. Given an initial classification, the parameters of each class can be estimated from the current samples of that class. With these parameters the PDF value of every pixel in a class can be calculated and used to quantify the quality of the current classification in an iterative process. The following equation is used to set the data energy values:

D_p(f_p) = − ln p(y_p; θ_{f_p}),

where θ_{f_p} denotes the distribution parameters of class f_p, estimated by the method of logarithmic cumulants (MoLC).

CONCLUSIONS AND OUTLOOK

The post-processing by the graph-cut algorithm makes appropriate corrections and demonstrates an obvious improvement compared with the coarse classification by the hypothesis test.
There is also a high correlation between the results obtained by the semi-automated object-based algorithm of Bernhard et al. (2011) and the graph-cut algorithm.
The remaining problem for change detection is the ambiguous interpretation of the detected changes. Further study will concentrate on the integration of textural information, auxiliary data and other biophysical models to support the interpretation of the different detected changes.

Figure 1: Study area of Queensland, Australia. Subsets of TerraSAR-X amplitude data (© DLR 2013) of a) 05/01/2011 and b) 21/01/2011, c) intensity ratio image between b) and a), d) classification based on the hypothesis test, e) classification based on the hypothesis test and the graph-cut algorithm. In d) and e) positive- and negative-change are shown in red and blue colors, no-change is transparent.

Figure 2: Study area of Halle, Germany. Subsets of TerraSAR-X amplitude data (© DLR 2013) of a) 17/01/2011 and b) 08/02/2011, c) intensity ratio image between b) and a), d) co-registered aerial RGB image data, e) classification based on the hypothesis test and the graph-cut algorithm, f) classification with the binary mask derived from d). In e) and f) positive- and negative-change are shown in red and blue colors, no-change is transparent.

Figure 3: Study area of Halle, Germany. Subsets of TerraSAR-X amplitude data (© DLR 2013) of a) 17/01/2011 and b) 08/02/2011, c) intensity ratio image between b) and a), d) classification based on the hypothesis test and the graph-cut algorithm; positive- and negative-change are shown in red and blue colors, no-change is transparent.

Table 1 (excerpt). σ_i: radar cross section of a homogeneous area in the i-th SAR image.