A CLASS-OUTLIER APPROACH FOR ENVIRONMENTAL MONITORING USING UAV HYPERSPECTRAL IMAGES

In several remote sensing applications, detecting exceptional or irregular regions (i.e., pixels) with respect to the homogeneity of the whole dataset is a topic of considerable interest. Currently, such detection is limited to the pre-processing step, where it serves to eliminate cloudy or noisy pixels. In this paper, we propose to extend its scope and to tackle this issue by regarding the irregular/exceptional pixels as outliers. The main purpose is to adapt the class outlier mining concept in order to find abnormal and irregular pixels in hyperspectral images, taking into account the class labels and the relative uncertainty of the collected data. To reach this goal, the Class Outliers: Distance-Based (CODB) algorithm is enhanced to handle multivariate high-dimensional data and the partially available knowledge that accompanies our data. This is mainly done by using belief theory and a learnable task-specific similarity measure. To validate our approach, we apply it to vegetation inspection and normality monitoring. For experimental purposes, the Airborne Prism Experiment (APEX) data set, acquired during an APEX flight campaign in June 2011, was used. Moreover, a collection of simulated hyperspectral images and spectral indices, providing a quantitative indicator of vegetation health, was generated for this purpose. The encouraging results obtained can be used to monitor areas where vegetation may be stressed, as a proxy for detecting potential drought.


INTRODUCTION
Recently, hyperspectral sensors deployed on UAVs (Unmanned Aerial Vehicles) have been emerging as a highly effective means for earth observation and environmental degradation monitoring. This evolution leads to a refined aerial recovery of all spectral and spatial features within the site of interest. Nevertheless, spatially uncorrelated pixels remain a rather challenging issue in statistical and cognitive research. This is due to the large intra-class variability and to viewpoint dependence. Efficient surveying implies not only detecting spectrally homogeneous/heterogeneous regions, but also properly separating noise from outliers and then inferring the origins of suspicious areas. Therefore, approaches for modeling and detecting such outliers are drawing growing attention (Hodge and Austin, 2004, Zimek et al., 2012). Generally, outlier mining is the problem of identifying rare events, irregular individuals, and exceptions. Nowadays, it is seen as an emerging form of data interpretation which arouses great interest in diverse application areas. Lately, several works have been devoted to this problem and have tried to design effective techniques for outlier detection. It concerns many fields, such as fraud detection (Konijn and Kowalczyk, 2011), network security (Dogan and Dalkilic, 2010), data mining (Agyemang, 2006), etc.
Formally, an outlier is an object that deviates considerably from other objects. This fact raises the suspicion that it possesses a different structure, engendered by a divergent mechanism (Hodge and Austin, 2004) or generated by a different distribution. An outlier is thus a sample that does not adhere to the general nature of the data (it is mainly diagnosed as noise or as an exception), which makes it considerably useful for remote sensing data analysis. Overall, methods devoted to outlier detection can be mainly categorized into statistical-based, depth-based, distance-based and density-based methods (Pasha and Umesh, 2013, Bakar et al., 2006). A successful way to detect outliers is to explore the distance between a sample and its nearest neighbors (Figure 1) (Hewahi and Saad, 2007). Recent comparative studies have shown the noteworthy effectiveness of distance-based approaches. We discuss these methods in detail in this paper because of their close relation to the proposed approach. If the neighboring samples are close, the sample is seen as regular. Otherwise (i.e., the neighboring data are spaced apart), the sample is seen as irregular. The advantage of distance-based methods is that no explicit distribution needs to be specified to characterize irregularity. Therefore, such methods are well suited to any feature space equipped with a reliable distance metric (Hodge and Austin, 2004). In this paper, we tackle this issue by detecting regions that contain heterogeneously classified pixels using an adapted class-outlier detection algorithm. While conventional approaches aim to detect exceptional cases in the scene independently of their class labels, our first contribution intends to find suspicious regions (pixels) by taking the class label into account.
For this purpose, the Class Outliers: Distance-Based (CODB) algorithm is enhanced to take into account the multivariate high-dimensional data and the partially supervised nature of our data.

PROBLEM STATEMENT AND RELATED WORKS
Pixels or regions which can be observed as "outliers" are anomalous for different reasons (e.g. climate change, natural disaster, epidemic, etc.). After detection, anomalous points can be retained because they contain interesting information, or they can be discarded. Outlier detection methods are mostly used to reduce the impact of outliers in the final stage of the proposed model, or as a pre-processing step before the data is processed. In more interesting applications, such as change detection and anomaly detection, the concept of outliers is more attractive and helps to identify abnormal regions, so outlier detection algorithms should be upgraded to properly locate them. Once the aim of outlier processing is updated in this way for remotely sensed images, traditional approaches are often not suitable for hyperspectral data. Hence, recent research has focused on outlier detection adapted to this kind of information. In particular, a significant number of contributions based on artificial intelligence and image processing have been proposed in order to develop innovative approaches that are more suitable for different application cases.
Alonso and Malpica (2009) propose an innovative technique for outlier detection in hyperspectral images. As is well known, each pixel of the hyperspectral cube is associated with a spectral vector sampled across the electromagnetic spectrum. The authors develop an approach based on Projection Pursuit (PP) to detect potential anomalies. It relies on linear combinations of the original features, chosen to maximize an index representing an interestingness measure. The results show that the PP technique can detect groups of outliers as well as isolated outliers; the proposed algorithm was applied to AHS and HYDICE hyperspectral imagery. The first issue affecting the reviewed works is that the outlier identification process depends on the underlying distribution of the dataset. Thus, this field has become a productive area of applied statistical research. One solution is to assume that the distribution is univariate (following an approximately normal distribution) (Hodge and Austin, 2004). Nevertheless, with real multivariate hyperspectral datasets, this hypothesis is not satisfied, and the outlier identification process will be guided by the type of the data rather than by the presence of an outlier. In fact, due to the high number of bands, a large part of the data can be redundant, and the most interesting information is difficult to extract because of the high dimensionality of the data themselves. Smetek and Bauer (2007) introduce a multivariate outlier detection approach for detecting anomalies in hyperspectral image data. They demonstrate the insufficiency of statistical methods for this purpose. Liu et al. (2014) adopt the outlier detection concept to detect small targets in hyperspectral images.

PROPOSED APPROACH
Let Z denote a hyperspectral image composed of N pixels (samples). Each sample is assumed to belong to one of C classes. The learning set ζ is then defined as follows: each sample is characterized by an attribute vector x ∈ R^p and its similarity measure to all other samples (proximity data).
The class membership of each object may be:
• Completely known, described by class labels (supervised learning);
• Completely unknown (unsupervised learning);
• Known for some objects, and unknown for others (semi-supervised learning).
A successful strategy for detecting outliers is to consider the distances from an example to its nearest neighbors (Knorr et al., 2000). In this approach, we examine the local neighborhood of an object, usually defined by its K nearest examples. If the neighboring points are close, the object is seen as regular; if they are far away, it is seen as irregular.
The distance-based outlier approach was introduced by Knorr and Ng (Knorr and Ng, 1998), where an outlier is defined as follows: "An object O in a dataset T is a DB(p, D)-outlier if at least fraction p of the objects in T lie at a distance greater than D from O", where D is the distance threshold that delimits the neighborhood of O, and p is the minimum fraction of objects that must lie outside this neighborhood. In most cases, the Mahalanobis distance is used to measure the outlying degree.
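The DB(p, D) definition quoted above can be illustrated with a minimal sketch. It assumes a plain Euclidean distance rather than the Mahalanobis distance, and it counts the fraction over the other objects only (the object itself is excluded); the function name `db_outliers` and the toy data are hypothetical and serve only as an illustration.

```python
import numpy as np

def db_outliers(X, p, D):
    """Return a boolean mask marking DB(p, D)-outliers among the rows of X."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances (assumption: Euclidean, not Mahalanobis).
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Fraction of the other objects lying farther than D from each object.
    # The self-distance is 0, so an object never counts itself as "far".
    far_fraction = (dist > D).sum(axis=1) / (n - 1)
    return far_fraction >= p

# Toy data: a tight cluster of four points plus one distant point.
points = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
mask = db_outliers(points, p=0.9, D=1.0)
```

With these toy values, only the distant point has at least 90% of the other objects farther than D = 1.0, so it is the only one flagged.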
In this paper, we investigate the adaptation of Class Outlier Mining, formulated here as follows: given a set of hyperspectral pixels with class information, detect those that arouse suspicion, considering the neighborhood classes and the related spectral indices.
Based on the Class Outliers: Distance-Based (CODB) algorithm (Hewahi and Saad, 2007), the irregular pixels are those satisfying the following criteria:
1. the pixel has the minimum distance to its K nearest neighbors;
2. the pixel has the largest deviation;
3. the class label of the pixel differs from the class of its K nearest neighbors.
The originality of this algorithm is that it considers it judicious to retain samples whose class label differs from that of the majority of their K nearest neighbors, ranked using the Class Outlier Factor (COF) of a sample X, defined by Equation (2), where:
• PCL(X, K) is the probability of the class label of the instance X with respect to the class labels of its K nearest neighbors;
• Deviation(X) is the degree to which the sample X deviates from the data of the same class;
• KDist(X) is the sum of the distances between X and its K nearest neighbors;
• α and β are parameters that control the effect of Deviation(X) and KDist(X).
In the real world, only limited knowledge of the class information is available. This situation is intermediate between supervised and unsupervised learning and is known as partially supervised learning. In this case, the class membership is commonly predicted with uncertainty, and the probability PCL(X, K) may not be suited to hyperspectral data.
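For reference, the crisp-label COF described above can be sketched as follows. The additive form used here, COF(X) = K·PCL(X, K) + α/Deviation(X) + β·KDist(X), with low values indicating suspicious samples, is an assumption of this sketch and should be checked against Equation (2). Further assumptions: Euclidean distances, neighbors that exclude the sample itself, and Deviation(X) taken as the summed distance to all samples of the same class. The function name `class_outlier_factor` and the toy data are hypothetical.

```python
import numpy as np

def class_outlier_factor(X, labels, K, alpha=1.0, beta=1.0):
    """Assumed COF form: K * PCL + alpha / Deviation + beta * KDist (low = suspicious)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    n = X.shape[0]
    cof = np.empty(n)
    for i in range(n):
        # Indices of the K nearest neighbors, excluding the sample itself.
        order = np.argsort(dist[i])
        neighbors = order[order != i][:K]
        # PCL: share of neighbors carrying the same label as sample i.
        pcl = np.mean(labels[neighbors] == labels[i])
        # KDist: summed distance to the K nearest neighbors.
        kdist = dist[i, neighbors].sum()
        # Deviation: summed distance to all samples of the same class.
        deviation = dist[i, labels == labels[i]].sum()
        # Guard against a zero deviation (e.g. a class with a single member).
        inv_dev = 1.0 / deviation if deviation > 0 else 0.0
        cof[i] = K * pcl + alpha * inv_dev + beta * kdist
    return cof

# Toy data: two clusters, plus one "b"-labeled sample lying inside the "a" cluster.
X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2],
              [5.0, 5.0], [5.2, 5.0], [5.0, 5.2], [5.2, 5.2],
              [0.1, 0.1]])
labels = np.array(["a", "a", "a", "a", "b", "b", "b", "b", "b"])
cof = class_outlier_factor(X, labels, K=3)
suspect = int(np.argmin(cof))  # index of the most suspicious sample
```

In this toy example the last sample combines the three criteria (close neighbors, large deviation from its own class, differing neighbor labels) and receives the lowest COF.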
To overcome this problem, we propose in this paper to adapt the classical CODB algorithm by using the theory of belief functions, which is suitable for modeling the partially supervised learning problem and for better handling uncertain and imprecise class information (Côme et al., 2009). In fact, the theory of belief functions has shown major potential as a reliable framework for modeling uncertain and imprecise class information in related fields such as classification, unmixing and feature combination (Hemissi et al., 2012).
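To make the belief-function idea concrete, the sketch below computes the neighbor masses of Equation (4), m_i({c_i}) = α·φ(d(x, x_i)) and m_i(Ω) = 1 − α·φ(d(x, x_i)). Two points are assumptions of this sketch and are not stated in the text: the choice φ(d) = exp(−γ·d²), and the pooling of the neighbor masses with Dempster's rule of combination. The function names, the parameter `gamma` and the toy values are hypothetical.

```python
import numpy as np

def neighbor_masses(distances, alpha=0.95, gamma=1.0):
    """Mass each neighbor assigns to its own class, and to Omega (ignorance)."""
    # Assumed decreasing function: phi(d) = exp(-gamma * d^2), with phi -> 0 as d grows.
    phi = np.exp(-gamma * np.asarray(distances, dtype=float) ** 2)
    m_class = alpha * phi
    m_omega = 1.0 - m_class
    return m_class, m_omega

def combine(distances, neighbor_labels, classes, alpha=0.95, gamma=1.0):
    """Pool neighbor evidence with Dempster's rule (assumed combination rule)."""
    _, m_omega = neighbor_masses(distances, alpha, gamma)
    neighbor_labels = np.asarray(neighbor_labels)
    # Per class: product of the Omega masses of the neighbors supporting it
    # (an empty product is 1.0, i.e. no evidence for that class).
    omega_q = np.array([m_omega[neighbor_labels == c].prod() for c in classes])
    # Unnormalized mass on {c}: c is supported while every other class stays ignorant.
    unnorm = np.array([(1.0 - omega_q[j]) * np.prod(np.delete(omega_q, j))
                       for j in range(len(classes))])
    total_omega = omega_q.prod()
    # Normalization removes the conflicting mass.
    norm = unnorm.sum() + total_omega
    return unnorm / norm, total_omega / norm

classes = ["vegetation", "soil"]
class_mass, omega_mass = combine(
    distances=[0.1, 0.2, 0.3],
    neighbor_labels=["vegetation", "vegetation", "soil"],
    classes=classes,
)
```

Under these assumptions, the combined mass on the class of the pixel under test could play the role of PCL(X, K) when labels are uncertain, while the mass left on Ω quantifies the remaining ignorance.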

Real Data
The airborne hyperspectral image was acquired in the vicinity of Baden, Switzerland. The study area (Figure 3) is on the banks of the Limmat River. The APEX OSD data is provided along with ground truth information for 6 classes through a SPECCHIO spectral database (Kallepalli, 2014, Schaepman et al., 2015). The proposed approach is compared with the CODE and ISODepth algorithms. To better evaluate our approach, the receiver operating characteristic (ROC) curve is generated by varying the distance threshold. The area under the curve (AUC) is then used for accuracy assessment. Table 4.2 reports the AUC results for the proposed approach compared to the chosen baseline methods. The proposed approach performs quite well compared to the other techniques. The problem of interpreting the outlier abnormality remains open. Figure 4 shows the performance for different values of k. Besides the outlier detection accuracy, the run time of the algorithms is also an important factor to take into account.
The mean execution time was measured on an Intel Xeon Nehalem processor. The proposed approach took 21.7 ms on average and the reference method took 28.0 ms to process the real data set.
In general, all nearest-neighbor methods perform very similarly, since the worst-case cost of these algorithms is dominated by the nearest-neighbor search, O(n^2). If the mass functions are computed beforehand, we noticed that the proposed approach is much faster than standard methods, especially on large data sets.
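The ROC-based accuracy assessment used in this section can be sketched as follows. The sketch assumes that each pixel receives an outlier score for which higher values mean "more outlying" (a COF-like score for which low values are suspicious would be negated first) and that a binary ground truth is available; the function name `roc_auc` and the toy scores are hypothetical and are not the experimental data.

```python
import numpy as np

def roc_auc(scores, is_outlier):
    """Sweep a decision threshold over the scores; return FPR, TPR and the AUC."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(is_outlier, dtype=bool)
    # Thresholds from the highest score downwards, so the curve starts at (0, 0).
    thresholds = np.sort(np.unique(scores))[::-1]
    tpr, fpr = [0.0], [0.0]
    for t in thresholds:
        flagged = scores >= t
        tpr.append((flagged & truth).sum() / truth.sum())
        fpr.append((flagged & ~truth).sum() / (~truth).sum())
    fpr, tpr = np.array(fpr), np.array(tpr)
    # Trapezoidal area under the (FPR, TPR) curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Toy example: one of the two true outliers is ranked below a regular sample.
toy_scores = [0.9, 0.2, 0.8, 0.1]
toy_truth = [True, True, False, False]
fpr, tpr, auc = roc_auc(toy_scores, toy_truth)
```

The AUC equals the probability that a randomly chosen true outlier is scored above a randomly chosen regular sample, which makes it convenient for comparing methods independently of any single threshold.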

DISCUSSIONS AND FUTURE DIRECTIONS
Outlier detection algorithms are of interest in several remote sensing applications. Outlier analysis offers tremendous scope for research, especially in the area of structural and multivariate analysis. In this paper, we stated that the essence of all outlier detection algorithms is the creation of a density, statistical or algorithmic model which describes the natural behavior of the data. Deviations from this model are considered outliers and must be interpreted to identify the causes of the irregularity.
We have also discussed the limited way in which the problem has been addressed in the literature. Every unique problem formulation has different specifications and requires an adapted approach, resulting in a large variety of algorithms. Each of these algorithms has been proposed to target a particular application domain. This survey can hopefully indicate some ways to map existing approaches to other application domains. We also concluded that valuable domain-related knowledge of the data distribution and model is often important in order to design efficient and accurate approaches which do not overfit the underlying data.
When dealing with hyperspectral images, the question of outlier detection becomes notably challenging. The main issues are the high dimensionality, the significant relationships among pixels and the uncertainty related to class labels. Therefore, the modeling of the learning set and the choice of an adaptable distance metric play the key role in defining the outliers.
After outlier detection, the future direction of our work is to develop a knowledge-oriented process for identifying the sources of irregularity. This amounts to giving useful answers to the question: "What are the causes of abnormality?". In fact, limited attention has so far been paid to the problem of interpreting the causes of abnormality; most related works focus on detecting and eliminating outliers. The main contribution will therefore concern the problem of discovering the set(s) of attributes that account for the abnormality of a pixel belonging to a class within a given land-cover type. This means finding the minimal subset of features that explains the outlierness of a pixel, i.e., in which the pixel is still a doubtful observation. This will be achieved by proposing a knowledge discovery schema using the outlying subsets search algorithm for a class outlier (OSSA). This investigation can help the decision maker to narrow down the causes of abnormality.

CONCLUSION
Outlier detection is a fundamental issue with direct application in a wide variety of remote sensing fields. A preliminary observation is that outlier detection is not yet a well-explored and well-formulated problem for remotely sensed images. The approach proposed in this paper alleviates the drawbacks of the "curse of dimensionality" when processing hyperspectral data, where classical distance-based approaches often fail to achieve good accuracy. Relative to the basic CODB algorithm, we proposed two contributions: a learned metric for distance computation, which is more suitable for high-dimensional data sets, and a belief function for the class label, which is more suitable for the partially supervised learning problem and also for high-dimensional data. In a thorough evaluation, we demonstrate the effectiveness of our new approach in detecting the right outliers with high precision and recall. Furthermore, the evaluation discusses efficiency issues and explains the factors that influence the run time.

Figure 1: Difference between the distance-based approach (left) and the density-based approach (right).

Let $\Omega$ now denote the set of classes. The learning dataset becomes:

$$\zeta = \{(x_i, m_i),\ i = 1, \ldots, N\} \qquad (3)$$

where $x_i$ is the attribute vector of object $i$, $m_i$ is the mass function describing the available knowledge about its class, and $c_i \in \Omega$. A potential outlier sample $x$ is classically assigned to the majority class in $\Omega_k(x)$, where $\Omega_k(x)$ is the set of the $k$ nearest neighbors of $x$ in $\zeta$. Each sample $e_i = (x_i, m_i) \in \Omega_k(x)$ is seen as a piece of evidence regarding the class of the sample $x$. The reliability of this evidence depends on the distance between $x$ and $x_i$, as expressed by the following equations:

$$m_i(\{c_i\}) = \alpha\,\varphi(d(x, x_i)), \qquad m_i(\Omega) = 1 - \alpha\,\varphi(d(x, x_i)) \qquad (4)$$

where $\alpha$ is a constant and $\varphi$ is a decreasing function from $\mathbb{R}^+$ to $[0, 1]$ such that $\lim_{d \to +\infty} \varphi(d) = 0$ (Côme et al., 2009). The proposed version of the CODB algorithm is given in Algorithm 1.

Algorithm 1 Pseudo-code of the modified CODB algorithm
Require: $\zeta = \{(x_i, m_i),\ i = 1, \ldots, N\}$; $K$: number of neighbors.
for $i = 1, \ldots, N$ do
    Compute $m_i(\{c_i\})$ and $m_i(\Omega)$ using Equation (4)
    Compute the COF of sample $x_i$ using Equation (2)
end for
return the top P list, sorted according to COF value.

The second contribution is to adapt the classical distance metric in order to take into account the dimensionality of the hyperspectral data. This distance is computed from spectral indices related to vegetation properties. The definition of spectral indices began with the Simple Ratio (SR) of bands. One of the most widespread vegetation indices is the Normalized Difference Vegetation Index (NDVI), which uses the reflectance in the near-infrared and red regions to reveal the presence of vegetation in the study zone. The retained spectral indices of our research are synthesized by

Figure 2: Results of applying PA to the synthetic images.

Figure 3: Study area near Baden, Switzerland, on the banks of the Limmat River. The originality of this site is the diversity of natural and man-made covers (vineyards, pastures, forested regions, buildings, railways, roads, highways, etc.).

Table 4.2: Comparing PA performance (AUC) with baseline methods. PA(a): with Mahalanobis distance; PA(b): with learned distance metric.