A TWO-STEP DECISION FUSION STRATEGY: APPLICATION TO HYPERSPECTRAL AND MULTISPECTRAL IMAGES FOR URBAN CLASSIFICATION

Very high spatial resolution multispectral images and lower spatial resolution hyperspectral images are complementary sources for urban object classification. The former enable a fine delineation of objects, while the latter can better discriminate classes and account for richer land cover semantics. This paper presents a decision fusion scheme that takes advantage of the classification maps of both sources to produce a better classification map. The proposed method aims at dealing with both semantic and spatial uncertainties and consists of two steps. First, class membership maps are merged at the pixel level; several fusion rules are considered and compared in this study. Second, the classification is obtained from a global regularization of a graphical model, involving a fit-to-data term related to class membership measures and an image-based, contrast-sensitive regularization term. Results are presented on three datasets: the classification accuracy is improved by up to 5% compared with the best single-source classification accuracy.


INTRODUCTION
Mapping urban environments requires very high spatial resolution (VHR) optical images (<5 m). Indeed, such a spatial resolution is necessary to individualize and precisely delineate urban objects and to capture sharper geometrical details (e.g. Herold et al., 2003; Cleve et al., 2008). However, high spatial resolution sensors generally have a poor spectral configuration (usually three or four bands, i.e. RGB or RGB-NIR), which limits their ability to discriminate fine classes and thus their classification performance (Thomas et al., 2003; Carleer, 2005; Yu et al., 2006) compared to superspectral or hyperspectral (HS) sensors. Unfortunately, the latter generally exhibit a lower spatial resolution. To overcome the weaknesses of both sensors, multispectral (MS) and HS imagery can be jointly integrated to benefit from their complementary characteristics, so as to obtain: 1) rich geometrical and textural details to finely delineate objects, and 2) rich spectral information to efficiently separate the classes. Thus, the fusion of such sensors should enhance the classification performance at the highest spatial resolution. However, such a fusion scheme has to cope with both spatial and semantic uncertainties. The fusion of heterogeneous datasets (i.e. with different spatial and spectral resolutions) has been widely investigated in the literature (e.g. Pohl and Genderen, 1998; Schmitt and Zhu, 2016). The fusion can be carried out at three distinct levels. 1) The observation level: the most popular method for that purpose is pan-sharpening, which merges a high resolution panchromatic (PAN) image with a low resolution MS image to produce a high resolution MS image; a review of such methods can be found in (Loncan, 2015). 2) The feature level (e.g. Fauvel et al., 2007; Wegner et al., 2011; Ban and Jacob, 2013), which consists in applying a single classification to features extracted from both sources. For instance, Wegner et al. (2011) proposed a conditional random field (CRF) model for building detection using InSAR and orthophoto features. 3) The decision level, which combines the outputs of classifications (see below).
This paper aims at designing a generic fusion method that could be applied to sources other than HS/MS, combining a sensor with poor spatial resolution but rich semantic information and an MS sensor with very high spatial resolution. Thus, fusion at the observation level is not conceivable, and only fusion at the decision level is investigated in this paper. The challenge here is to merge complementary single-source expert classifications in order to enhance them, in a case where training datasets are quite poor.
Fusion methods operating at the decision level can be applied in two situations, depending on whether they merge multiple classifiers applied to the same dataset or to multiple datasets. In the second case, the aim is to take advantage of the information provided by multiple data sources. Fusion methods that use all the probabilistic outputs of several single-source classifications as a new feature set for a classifier, as in (Ceamanos et al., 2010), must be avoided here because of their strong training data requirements to prevent overfitting. Thus, this study rather considers decision fusion rules, such as those based on probabilistic, fuzzy or possibilistic fusion, or on evidence theory. Benediktsson and Kanellopoulos (1999) combined neural and statistical Maximum Likelihood classifiers using several consensus theory rules (i.e. majority voting, complete agreement, CONSNN-NN) to classify MS and HS images. Tupin et al. (1999) characterized the spatial organization of elements in SAR images by merging the responses of multiple low-level detectors applied to the same image, the fusion being carried out with Dempster-Shafer (Shafer, 1979) evidence theory rules. Fauvel et al. (2006) investigated the use of fuzzy decision rules to combine the classification results of a conjugate gradient neural network and a fuzzy classifier over an IKONOS image. Paisitkriangkrai et al. (2015) combined convolutional neural networks and random forest classifiers using a multiplication scheme, and the final result was regularized by a conditional random field (CRF). Jeon (1999) investigated the fusion of multi-temporal Thematic Mapper images using decision fusion methods (i.e. joint likelihood and weighted majority fusion). Waske and Benediktsson (2007) applied SVMs to classify SAR and MS images independently; the probability maps were then fused using different strategies (i.e. SVMs, majority voting, and the absolute maximum rule). Hervieu et al. (2016) combined the classification results of HS and MS images within a graphical CRF energy model optimized using a graph-cut algorithm.
This paper proposes a fusion scheme at the decision level to merge a low-resolution HS (LR-HS) image and a VHR-MS image. The method aims at dealing with both semantic and spatial uncertainties and is based on two steps: decision fusion at the pixel level, and classification optimization through a global regularization framework. Several decision fusion rules are tested, including fuzzy, Bayesian, margin-based and Dempster-Shafer based rules. The fusion result is then optimized in a second step by a graph-cut algorithm including a contrast-sensitive spatial regularization term.

PROPOSED METHOD
The method is based on three main steps (Figure 1): a) classification of the HS and MS images and generation of the posterior probabilities, b) fusion of the posterior probabilities at the decision level, and c) classification optimization. First, the two input images are classified. A Gaussian kernel SVM classifier (Vapnik, 1999) was used here, but other supervised classifiers could be used (e.g. random forests). The posterior probabilities are retrieved with Platt's technique (Platt, 2000), so that a posterior class probability map is generated for each classification map. Secondly, a decision fusion is applied to these posterior class probability maps. A variety of rules were tested; they can be divided into four groups: fuzzy decision rules (i.e. Min, Max, Compromise, Prioritized, Accuracy Dependent), Bayesian combinations (i.e. Sum and Product rules), evidence theory (i.e. the Dempster-Shafer rule), and margin theory (i.e. the Margin-Max rule). These decision fusion rules are detailed hereafter. They combine two class probability maps into a more accurate one at the highest resolution. The last step consists in performing a global regularization of the classification map obtained at step b), so as to deal with spatial uncertainties between both sources. A graphical model is used, involving a fit-to-data term and a contrast-sensitive regularization term. This formulation has been used successfully for many purposes related to image fusion (e.g. Kolmogorov and Zabih, 2004).
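As an illustration of step a), the sketch below shows how Platt's technique maps an SVM decision value f to a probability through the sigmoid 1 / (1 + exp(A f + B)). It is a minimal sketch, assuming the sigmoid parameters have already been fitted for each class and that one-vs-rest outputs are simply renormalized to sum to one; the function names and the numbers are hypothetical, and the paper does not state how its multi-class probabilities were coupled.

```python
import math


def platt_probability(decision_value, a, b):
    """Platt's sigmoid: P(y = 1 | f) = 1 / (1 + exp(a * f + b)).

    `a` and `b` are assumed to have been fitted beforehand on held-out
    decision values (a is usually negative, so larger SVM scores map to
    higher probabilities).
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def one_vs_rest_posteriors(decision_values, sigmoid_params):
    """Turn per-class one-vs-rest SVM scores into a membership vector.

    Simplifying assumption: each class has its own (a, b) pair and the
    sigmoid outputs are renormalized to sum to one. LIBSVM's multi-class
    probabilities use pairwise coupling instead; this is only a sketch.
    """
    raw = [platt_probability(f, a, b)
           for f, (a, b) in zip(decision_values, sigmoid_params)]
    total = sum(raw)
    return [p / total for p in raw]


# Hypothetical scores for one pixel and three classes, same sigmoid for all.
memberships = one_vs_rest_posteriors([1.2, -0.3, -1.5], [(-2.0, 0.0)] * 3)
print([round(p, 3) for p in memberships])
```

Applied to every pixel of both images, this yields one posterior class probability map per source, which is the input of the fusion rules below.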

DECISION FUSION
The decision fusion rules used in this paper are exclusively based on the class membership probabilities generated at pixel level by the classifier.The fusion is carried out pixel by pixel, combining the class membership probabilities of each source.For this study, ten decision rules have been tested; these rules are derived from different probabilistic, possibilistic and evidential theories.

Fuzzy rules
Introduced by Zadeh (1965), fuzzy rules have become popular tools for processing uncertain data (e.g. Dubois, 1992; Fauvel, 2006).

Theory and general properties:
Let us consider a reference set $\mathcal{L}$ of classes. A fuzzy set $P_A$ in $\mathcal{L}$ is a set of ordered pairs:
$$P_A = \{(x, \mu_A(x)) \mid x \in \mathcal{L}\} \quad (1)$$
where $\mu_A : \mathcal{L} \to [0, 1]$ is called the membership probability (also called membership function) of $x$ in $P_A$. More generally, a membership function takes nonnegative real values whose supremum is finite. The membership probability of a fuzzy set is a crisp (i.e. real-valued) function. The intersection of two fuzzy sets $P_A$ and $P_B$ is given by the minimum of their membership probabilities:
$$\mu_{A \cap B}(x) = \min\left(\mu_A(x), \mu_B(x)\right) \quad (2)$$
The union of two fuzzy sets $P_A$ and $P_B$ is given by the maximum of their membership probabilities:
$$\mu_{A \cup B}(x) = \max\left(\mu_A(x), \mu_B(x)\right) \quad (3)$$
The complement of a fuzzy set $P_A$ is given by:
$$\mu_{\bar{A}}(x) = 1 - \mu_A(x)$$

Measure of conflict between two sources:
Let us consider two sources $A$ and $B$, with corresponding membership probabilities $\mu_A$ and $\mu_B$. The conflict between these sources is quantified by the Dubois and Prade measure (Dubois and Prade, 1994) as $1 - K$, where:
$$K = \sup_{x \in \mathcal{L}} \min\left(\mu_A(x), \mu_B(x)\right)$$
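The fuzzy operators and the Dubois and Prade conflict measure described above can be sketched for a single pixel, each source being represented by a list of class memberships. This is a minimal sketch; the function and variable names are hypothetical, and the supremum is taken as a maximum because the class set is finite.

```python
def fuzzy_intersection(mu_a, mu_b):
    """Equation (2): class-wise minimum of two membership vectors."""
    return [min(a, b) for a, b in zip(mu_a, mu_b)]


def fuzzy_union(mu_a, mu_b):
    """Equation (3): class-wise maximum of two membership vectors."""
    return [max(a, b) for a, b in zip(mu_a, mu_b)]


def fuzzy_complement(mu_a):
    """Complement of a fuzzy set with memberships in [0, 1]."""
    return [1.0 - a for a in mu_a]


def agreement(mu_a, mu_b):
    """K = sup over classes of min(mu_a, mu_b); a max over a finite class set."""
    return max(fuzzy_intersection(mu_a, mu_b))


def conflict(mu_a, mu_b):
    """Dubois and Prade conflict measure 1 - K."""
    return 1.0 - agreement(mu_a, mu_b)


mu_hs = [0.7, 0.2, 0.1]  # hypothetical memberships of one pixel, source A
mu_ms = [0.1, 0.3, 0.6]  # same pixel, source B
print(fuzzy_intersection(mu_hs, mu_ms), agreement(mu_hs, mu_ms))  # [0.1, 0.2, 0.1] 0.2
```

Here the two sources favor different classes, so their agreement K is low (0.2) and the conflict 1 − K is high (0.8).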

Confidence measure:
To reduce the influence of unreliable information within each fuzzy set, a weight called pointwise accuracy, as proposed by Fauvel et al. (2006), is used. In a multi-image classification case, let us consider the fuzzy sets $\pi_i$, $i = 1, \dots, n$, with $n$ the number of sources (classification images), and a pixel $x$ classified by source $i$. Intuitively, a classifier is considered reliable at pixel $x$ if one class has a high membership and the others are low. Conversely, if more than one class has a high membership, the fuzzy set presents a high degree of fuzziness, and the classifier is considered unreliable at pixel $x$. Based on this assumption, each fuzzy set can be weighted by $w_i$ to reduce the influence of unreliable information as follows:
$$w_i = \frac{\sum_{k=1, k \neq i}^{n} H(\pi_k)}{(n-1)\sum_{k=1}^{n} H(\pi_k)}$$
where $n$ is the number of sources and $H(\pi_i)$ the fuzziness degree of source $i$.
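Let us consider two sources $A$ and $B$ with class membership probabilities $\mu_A$ and $\mu_B$. The pointwise measure is integrated in the calculation of the fusion rules to favor the most reliable source: each membership is multiplied by the pointwise measure as follows: $\mu_A = w_A \cdot \tilde{\mu}_A$, $\mu_B = w_B \cdot \tilde{\mu}_B$, where $\tilde{\mu}_A$, $\tilde{\mu}_B$ are the original membership probabilities, and $w_A$, $w_B$ are the corresponding pointwise measures. The fuzzy rules are then applied to these weighted memberships: 1) the Min operator, i.e. the intersection (Equation 2), and 2) the Max operator, i.e. the union (Equation 3).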
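The pointwise weighting can be sketched as follows for one pixel, using the pointwise accuracy of Fauvel et al. (2006), w_i = Σ_{k≠i} H_k / ((n − 1) Σ_k H_k). The fuzziness degree H used here (a normalized quadratic index, 4μ(1 − μ) averaged over the classes) is an assumption standing in for the actual fuzziness measure, and the equal-weight fallback for fully crisp sources is also an assumption; all names are hypothetical.

```python
def fuzziness(memberships):
    """Hypothetical fuzziness degree H: mean of 4 * mu * (1 - mu).

    It is 0 for a crisp vector (memberships 0 or 1) and 1 when every
    membership equals 0.5. This is an assumed stand-in, not necessarily
    the measure used by Fauvel et al. (2006).
    """
    return sum(4.0 * m * (1.0 - m) for m in memberships) / len(memberships)


def pointwise_weights(sources):
    """w_i = sum_{k != i} H_k / ((n - 1) * sum_k H_k), for n >= 2 sources."""
    h = [fuzziness(s) for s in sources]
    n, total = len(h), sum(h)
    if total == 0.0:
        # All sources are crisp at this pixel: equal weights (assumption).
        return [1.0 / n] * n
    return [(total - h_i) / ((n - 1) * total) for h_i in h]


def weight_memberships(sources):
    """mu_s = w_s * tilde_mu_s for every source s at one pixel."""
    weights = pointwise_weights(sources)
    return [[w * m for m in s] for w, s in zip(weights, sources)]


confident = [0.9, 0.05, 0.05]  # hypothetical: one dominant class
fuzzy = [0.4, 0.35, 0.25]      # hypothetical: ambiguous source
print(pointwise_weights([confident, fuzzy]))
```

The weights sum to one, and the less fuzzy source receives the larger weight, which is the intended behavior of the pointwise accuracy.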
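3) Compromise operator: its behavior depends on the conflict $1 - K$ between $A$ and $B$:
- When the conflict between $A$ and $B$ is low (i.e. $1 - K \approx 0$), the operator behavior is conjunctive.
- When the conflict between $A$ and $B$ is high (i.e. $1 - K \approx 1$), the operator behavior is disjunctive.
- When the conflict is partial (i.e. $0 < 1 - K < 1$), the operator behaves in a compromise way.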
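Let us consider the Compromise rule:
$$\mu_C(x) = \max\left(\frac{\min(\mu_A(x), \mu_B(x))}{K},\ \min\left(\max(\mu_A(x), \mu_B(x)),\ 1 - K\right)\right) \quad (9)$$
where $\min(\mu_A, \mu_B)$ is normalized to a membership in $[0, 1]$ (i.e. divided by $K$), unlike $\min(\max(\mu_A, \mu_B), 1 - K)$. Consequently, the normalized conjunctive term tends to be favored at the decision level. To deal with this, the following alternative Compromise rule was proposed: considering the highest membership over $\mathcal{L}$ and the highest membership over $\mathcal{L}$ deprived of the corresponding class (i.e. the second-highest membership), the intra-membership conflict is measured as their difference, and a threshold $t_c = 0.25$ is proposed for the decision process.
4) Prioritized operators (Dubois and Prade, 1994): the Prior 1 (Equation 10) and Prior 2 (Equation 11) rules are considered:
$$\mu_{P1}(x) = \min\left(\mu_A(x), \max(\mu_B(x), 1 - K)\right) \quad (10)$$
$$\mu_{P2}(x) = \max\left(\mu_A(x), \min(\mu_B(x), K)\right) \quad (11)$$
For both operators, when the conflict between $A$ and $B$ is high (i.e. $K \approx 0$), $\mu_B$ contradicts $\mu_A$, and only the information provided by $\mu_A$ is taken into consideration; $\mu_B$ is considered as a specific piece of information. Context-dependent operators are better suited to conflicting situations than conjunctive and disjunctive operators (e.g. Bloch, 1996; Fauvel, 2006), which are ill-suited to handle conflict.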
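The Compromise and prioritized operators can be sketched for one pixel as below, assuming the standard Dubois and Prade forms: Compromise max(min(μA, μB)/K, min(max(μA, μB), 1 − K)), Prior 1 min(μA, max(μB, 1 − K)) and Prior 2 max(μA, min(μB, K)), with A as the priority source. Falling back to the Max operator when K = 0 is an assumption consistent with the disjunctive behavior described for high conflict; the function names are hypothetical.

```python
def agreement(mu_a, mu_b):
    """K = max over classes of min(mu_a, mu_b) (sup over a finite class set)."""
    return max(min(a, b) for a, b in zip(mu_a, mu_b))


def compromise(mu_a, mu_b):
    """Compromise rule: max(min(a, b) / K, min(max(a, b), 1 - K)) per class."""
    k = agreement(mu_a, mu_b)
    if k == 0.0:
        # Total conflict (1 - K = 1): assumed purely disjunctive, i.e. the Max operator.
        return [max(a, b) for a, b in zip(mu_a, mu_b)]
    return [max(min(a, b) / k, min(max(a, b), 1.0 - k))
            for a, b in zip(mu_a, mu_b)]


def prior1(mu_a, mu_b):
    """Prior 1 with A as priority source: min(a, max(b, 1 - K))."""
    k = agreement(mu_a, mu_b)
    return [min(a, max(b, 1.0 - k)) for a, b in zip(mu_a, mu_b)]


def prior2(mu_a, mu_b):
    """Prior 2 with A as priority source: max(a, min(b, K))."""
    k = agreement(mu_a, mu_b)
    return [max(a, min(b, k)) for a, b in zip(mu_a, mu_b)]


# Fully conflicting sources: Compromise keeps both, the prioritized rules keep A.
print(compromise([1.0, 0.0], [0.0, 1.0]))  # [1.0, 1.0]
print(prior1([1.0, 0.0], [0.0, 1.0]), prior2([1.0, 0.0], [0.0, 1.0]))  # [1.0, 0.0] [1.0, 0.0]
```

When the sources fully agree on one class (K = 1), the Compromise rule reduces to the Min operator, i.e. the conjunctive behavior listed above.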

5) Accuracy Dependent (AD) operator (Fauvel, 2006): this operator integrates both local and global confidence measurements. For each source $i$ and class $j$, it combines the class membership of source $i$, the global confidence of source $i$ regarding class $j$, and a normalization factor (see section 3.1.3). It ensures that only reliable sources are taken into consideration for each class, via the predefined global confidence coefficients. The idea is appealing; nevertheless, the final result depends on the classifier reliability and on the availability of ground truth, which is necessary to compute the global confidence terms.

Bayesian combination
In addition to fuzzy rules, we used simpler Bayesian Sum and Product combinations of membership probabilities (e.g. Bloch, 2006). As for the fuzzy rules, each membership is multiplied by the pointwise measure (see section 3.1.3). This allows evaluating the limits of such operators compared to more complex combinations. The fusion is carried out using the Bayesian Sum (Equation 13) and Product (Equation 14) operators as follows:
$$\mu_{Sum}(x) = \mu_A(x) + \mu_B(x) \quad (13)$$
$$\mu_{Prod}(x) = \mu_A(x)\,\mu_B(x) \quad (14)$$
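A minimal sketch of the Bayesian Sum and Product rules for one pixel, assuming the memberships have already been multiplied by the pointwise measure; the names are hypothetical. The usage example shows the main behavioral difference between the two rules: a zero membership in one source vetoes a class under the Product rule but not under the Sum rule.

```python
def bayesian_sum(mu_a, mu_b):
    """Class-wise sum of the (weighted) memberships of two sources."""
    return [a + b for a, b in zip(mu_a, mu_b)]


def bayesian_product(mu_a, mu_b):
    """Class-wise product of the (weighted) memberships of two sources."""
    return [a * b for a, b in zip(mu_a, mu_b)]


def decide(memberships):
    """Index of the class with the highest fused membership."""
    return max(range(len(memberships)), key=memberships.__getitem__)


mu_a = [1.0, 0.3]  # hypothetical weighted memberships, source A
mu_b = [0.0, 0.3]  # source B rules out class 0 entirely
print(decide(bayesian_sum(mu_a, mu_b)), decide(bayesian_product(mu_a, mu_b)))  # 0 1
```

This veto effect is why the Product rule degrades when the initial maps are highly conflicting: a single wrong zero from one source removes the correct class.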

Margin-based rule (Margin-Max)
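Let us consider two sources $A$ and $B$, with $S = \{A, B\}$, and a set of classes $\mathcal{L}$. Let $\mu_s^c(x)$ be the pointwise membership probability of pixel $x$ to a class $c \in \mathcal{L}$ according to a source $s \in S$. The margin of source $s$ at pixel $x$ is:
$$m_s(x) = \mu_s^{c_1}(x) - \mu_s^{c_2}(x)$$
where $c_1$ and $c_2$ are the classes with the highest and second-highest memberships for source $s$. Once the margin is defined, different fusion rules can be built on it. In this study, a Max-Margin fusion method is tested: the aim is to combine the sources by keeping, at each pixel, the most confident source. The combined membership probabilities of the two sources $A$ and $B$ for the different classes $c \in \mathcal{L}$ are:
$$\mu^c(x) = \mu_{s^*}^c(x), \quad s^* = \arg\max_{s \in S} m_s(x)$$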
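The Margin-Max rule can be sketched as follows for one pixel, assuming the margin of a source is the difference between its highest and second-highest class memberships; the function names are hypothetical.

```python
def margin(memberships):
    """Gap between the highest and second-highest memberships (needs >= 2 classes)."""
    top, second = sorted(memberships, reverse=True)[:2]
    return top - second


def margin_max(sources):
    """Keep, at one pixel, the membership vector of the source with the largest margin."""
    return list(max(sources, key=margin))


hs_pixel = [0.5, 0.4, 0.1]  # hypothetical, hesitant source (margin 0.1)
ms_pixel = [0.8, 0.1, 0.1]  # hypothetical, confident source (margin 0.7)
print(margin_max([hs_pixel, ms_pixel]))  # [0.8, 0.1, 0.1]
```

Unlike the Sum or Product rules, this rule never mixes the two sources at a given pixel; it simply switches between them.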

Dempster-Shafer (DS) evidence theory based rule
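In the DS formalism, the information provided by a source $s$ for a class $C$ is represented by a mass function $m_s(C) \in [0, 1]$ (Shafer, 1976). Evidence theory implies the use of simple classes $C \in \mathcal{L}$ as well as compound classes (i.e. $C' = C_1 \cup \dots \cup C_k$). Compound classes were here limited to the union of at most two simple classes (i.e. $C' = C_i \cup C_j$); let $\mathcal{L}'$ denote the resulting set of classes. The masses are then normalized so that $\sum_{C \in \mathcal{L}'} m_s(C) = 1$. The DS conflict measure between two sources $A$ and $B$ is computed as follows:
$$K_{DS} = \sum_{C_1 \cap C_2 = \emptyset} m_A(C_1)\, m_B(C_2)$$
where $C_1, C_2 \in \mathcal{L}'$ are (possibly compound) classes with $C_1 \cap C_2 = \emptyset$. Finally, the probability masses are merged as follows:
$$m(C) = \frac{1}{1 - K_{DS}} \sum_{C_1 \cap C_2 = C} m_A(C_1)\, m_B(C_2), \quad C \in \mathcal{L}',\ C \neq \emptyset$$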
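Dempster's rule of combination can be sketched with mass functions stored as dictionaries keyed by frozensets of class names, so that a compound class such as {road, roof} is simply a two-element set. This is a minimal sketch: the generation of compound-class masses is not reproduced, so the masses in the example are hypothetical, and taking the simple class with the largest combined mass as the final decision is an assumption.

```python
def dempster_combine(m_a, m_b):
    """Combine two mass functions (dict: frozenset of classes -> mass) with Dempster's rule.

    Returns the normalized combined masses and the conflict K_DS, i.e. the
    total mass of the pairs (C1, C2) whose intersection is empty.
    """
    combined = {}
    conflict = 0.0
    for set_a, mass_a in m_a.items():
        for set_b, mass_b in m_b.items():
            inter = set_a & set_b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
            else:
                conflict += mass_a * mass_b
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}, conflict


def best_simple_class(masses):
    """Assumed decision: the simple (single-class) focal set with the largest mass."""
    singletons = {s: v for s, v in masses.items() if len(s) == 1}
    return next(iter(max(singletons, key=singletons.get)))


ROAD, ROOF = frozenset({"road"}), frozenset({"roof"})
m_a = {ROAD: 0.6, ROOF: 0.1, ROAD | ROOF: 0.3}  # hypothetical masses, source A
m_b = {ROOF: 0.5, ROAD | ROOF: 0.5}             # hypothetical masses, source B
fused, k_ds = dempster_combine(m_a, m_b)
print(best_simple_class(fused))  # road
```

The compound class {road, roof} lets source B express hesitation without contradicting source A, so only the pair (road, roof) contributes to the conflict here.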

Global regularization
This section is dedicated to the global regularization model used as a post-processing step to enhance the classification fusion performance. The problem is expressed using an energy-based graphical model and solved as a min-cut problem.
The global energy is minimized using a graph-cut method named quadratic pseudo-boolean optimization (QPBO) (Boykov and Kolmogorov, 2004), associated with an α-expansion routine to deal with the multi-class problem.

Model definition:
The energy is composed of a data term $E_{data}$ and a regularization term $E_{reg}$; the model of Hervieu et al. (2016) was adapted to deal only with classification rectification instead of fusion. It uses a graphical model in which the energy is a function of the posterior probability $P$. For a classification map $C$, the energy is written as follows:
$$E(C) = \sum_{x} E_{data}\left(C(x)\right) + \lambda \sum_{x} \sum_{y \in \mathcal{N}_8(x)} E_{reg}\left(C(x), C(y)\right) \quad (20)$$
where $\lambda \in [0, \infty[$ is a tradeoff parameter between the data and regularization terms, and $\mathcal{N}_8(x)$ is the 8-connected neighborhood of $x$.
$E_{data}$ is a fit-to-data term, a function of the probability map $P$ that models the result of the classification fusion, defined through a function $f$ as follows:
$$E_{data}\left(C(x)\right) = f\left(P(C(x))\right)$$
The function $f$ ensures that if the probability for a pixel to belong to a class is close to 1, $E_{data}$ will be small and will not impact the total energy $E$. Conversely, if the probability for a pixel to belong to a class is low, $E_{data}$ will be near its maximum and will penalize such a configuration. $E_{reg}$ is a regularization term defining the interactions between a pixel and its 8 neighbors; it smooths the initial classification map $C$ by favoring neighboring pixels belonging to the same class (i.e. by minimizing the energy). A Potts model (Schindler, 2012) is a typical choice:
$$E_{reg}\left(C(x), C(y)\right) = \begin{cases} 0 & \text{if } C(x) = C(y) \\ 1 & \text{otherwise} \end{cases}$$
In this study, a slightly enhanced Potts model is used. It integrates contrast information (Hervieu et al., 2016) from the MS image $I$. In this model, $\beta \in [0, \infty[$ is a tradeoff parameter between the smoothing criterion and the importance of the image contrast, $V$ is a contrast measure, $\gamma$ is a tradeoff between the basic model driven by the decision fusion classification $C$ and the integrated contrast term $V(x, y)$, and $\varepsilon$ is a parameter modifying the standard deviation in the exponential term. The contrast term $V$ (Rother et al., 2004) is defined through an exponential term, $V(x, y) = \exp(\dots)$.
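The energy of Equation 20 and its minimization can be sketched on a tiny grid. Several parts are assumptions rather than the authors' choices: the fit-to-data function is taken as f(t) = 1 − t (small when the probability is close to 1, maximal when it is close to 0), and the contrast-sensitive weight is taken as (1 − γ) + γ exp(−β ‖I(x) − I(y)‖²), with hypothetical parameters γ and β, which reduces to a plain Potts penalty for γ = 0. The minimizer is also swapped: the paper uses QPBO with α-expansion, whereas this sketch uses iterated conditional modes (ICM), a simple greedy method that only reaches a local minimum. All names are hypothetical.

```python
import math

# Forward offsets visit each unordered 8-neighbour pair exactly once.
FORWARD_8 = [(0, 1), (1, -1), (1, 0), (1, 1)]
ALL_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def pair_weight(v1, v2, gamma, beta):
    """Assumed contrast-sensitive weight (1 - gamma) + gamma * exp(-beta * ||v1 - v2||^2).

    gamma = 0 gives a plain Potts penalty; strong image edges make label changes cheaper.
    """
    d2 = sum((p - q) ** 2 for p, q in zip(v1, v2))
    return (1.0 - gamma) + gamma * math.exp(-beta * d2)


def energy(labels, probs, image, lam, gamma, beta):
    """Sum of f(P(C(x))) = 1 - P plus lam times the weight of each disagreeing pair."""
    h, w = len(labels), len(labels[0])
    total = 0.0
    for r in range(h):
        for c in range(w):
            total += 1.0 - probs[r][c][labels[r][c]]
            for dr, dc in FORWARD_8:
                rr, cc = r + dr, c + dc
                if 0 <= rr < h and 0 <= cc < w and labels[r][c] != labels[rr][cc]:
                    total += lam * pair_weight(image[r][c], image[rr][cc], gamma, beta)
    return total


def local_energy(labels, probs, image, r, c, k, lam, gamma, beta):
    """Energy terms involving pixel (r, c) when it takes label k."""
    h, w = len(labels), len(labels[0])
    e = 1.0 - probs[r][c][k]
    for dr, dc in ALL_8:
        rr, cc = r + dr, c + dc
        if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != k:
            e += lam * pair_weight(image[r][c], image[rr][cc], gamma, beta)
    return e


def icm(probs, image, lam, gamma, beta, n_iter=10):
    """Greedy ICM, a simple stand-in for QPBO with alpha-expansion."""
    h, w, n_classes = len(probs), len(probs[0]), len(probs[0][0])
    # Start from the pixel-wise decision (argmax of the fused probabilities).
    labels = [[max(range(n_classes), key=probs[r][c].__getitem__) for c in range(w)]
              for r in range(h)]
    for _ in range(n_iter):
        changed = False
        for r in range(h):
            for c in range(w):
                costs = [local_energy(labels, probs, image, r, c, k, lam, gamma, beta)
                         for k in range(n_classes)]
                best = min(range(n_classes), key=costs.__getitem__)
                if costs[best] < costs[labels[r][c]] - 1e-12:  # strict decrease only
                    labels[r][c] = best
                    changed = True
        if not changed:
            break
    return labels


probs = [[[0.9, 0.1] for _ in range(3)] for _ in range(3)]
probs[1][1] = [0.4, 0.6]  # isolated pixel weakly favouring class 1
image = [[[0.0] for _ in range(3)] for _ in range(3)]  # uniform single-band image
initial = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
smoothed = icm(probs, image, lam=0.5, gamma=0.0, beta=1.0)
print(smoothed)  # [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

With λ = 0 the isolated pixel keeps its pixel-wise label; increasing λ removes it, which mirrors the role of λ as the strength of the regularization.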

Parameters setup:
Given the energy $E$ (see Equation 20), four parameters control the degree of regularization: $\lambda$, $\gamma$, $\beta$ and $\varepsilon$, each attached to a particular sub-term of $E$. $\lambda \in [0, \infty[$ is a tradeoff parameter between $E_{data}$ and $E_{reg}$: the larger $\lambda$, the stronger the regularization effect, and its choice depends on the distribution of the decision fusion map to be optimized. $\gamma \in [0, 1]$ is a tradeoff parameter between the basic energy model and the rectified model integrating the contrast measure. Finally, $\beta \in [0, \infty[$ controls the influence of the contrast measure in the energy term, and $\varepsilon$ modifies the standard deviation in the exponential term. A Potts model is obtained as a particular parametrization of this model; since the Potts configuration over-smooths the decision fusion classification, the parameters were otherwise set to a non-Potts configuration. For $\lambda$, two configurations were tuned: $\lambda = 0.1$ for the Min and Dempster-Shafer rules, and $\lambda = 10$ for the Compromise rule. This configuration was a good tradeoff and gave the results presented below.

Data
Three datasets over the cities of Pavia (Italy) and Toulouse (France) were used (Figures 2.a-c). For all the datasets, an SVM classifier was trained using 50 samples per class extracted from the images. For Pavia, two datasets called "Pavia University" and "Pavia Center" were used; they have 103 and 102 spectral bands, respectively.

The Bayesian Product and Dempster-Shafer rules give more weight to the HS classification map in the fusion process, which explains their better accuracy (Table 1). The Min rule acts in a cautious way, keeping the best of the lowest memberships, while the Compromise rule adapts to the degree of conflict between the sources. The Bayesian Product rule is a good and simple tradeoff when the initial classification maps are not highly conflicting; otherwise, the result is degraded by wrong information. For Pavia Center, all the rules appear accurate (Figure 4.c, e.g. the Dempster-Shafer rule), with an overall accuracy above 98% (Table 1). On visual inspection of the classification maps, all rules gave similarly good results except Prior 1, whose result is guided by the HS classification map rather than the MS one.
The Toulouse dataset is the largest one, with 15 classes; thus, the accuracies are lower. The best results were given by the Max, Prior 2, Bayesian Sum and Dempster-Shafer rules. In practice, the Max, Prior and Sum rules seem to overestimate certain classes, especially tile roofing and vegetation; the best rendering in terms of classification fusion is given by the Min, Compromise and Dempster-Shafer rules. Despite a reasonable global accuracy, the AD rule presents many misclassifications regarding tile roofing (underestimation) and metal roofing 1 (overestimation), and a poor detection of gravel roofing. This is mainly due to the global accuracy measure, which is included in the rule and calculated from the ground truth data.
In conclusion, the quantitative accuracies do not necessarily reflect the real potential of the fusion rules; the best ones from both a quantitative and a practical qualitative point of view are the Compromise, Bayesian Product and Dempster-Shafer rules.

Global classification regularization:
Once the decision fusion is done, a global regularization procedure is applied to enhance the classification results and eliminate artefacts. Table 2 presents the optimization results for the best rules per dataset; the optimization procedure further improves the classification. Quantitatively, the global regularization only slightly improves the decision fusion classification (by 1-2%), but it offers a better visual rendering, with the elimination of artefacts, a better delineation of class borders, and a regularization of scattered pixels (Figures 3.d, 4.d, 5.d). These optimized maps seem to better model the real scene. The optimization effect is more visible over Pavia University and Toulouse Center; for Pavia Center, the decision fusion already gives good results, and thus the optimized maps are only slightly improved (Table 2). The quantitative accuracies obtained over the Pavia datasets are comparable to those of other studies (e.g. Fauvel et al., 2007). For Pavia University, the painted metal sheets are better recovered, with no noticeable confusion with the surrounding road. The proposed method extracts some bitumen buildings that were difficult to differentiate from roads (e.g. upper right and lower right, Figure 2.b), and the gravel buildings could also be better refined. For Pavia Center, the global rendering is enhanced, with fewer classification artefacts.
65.4 55.9 68.0 70.9
Table 2. Classification accuracy of the HS and MS images separately, after decision fusion, and after global regularization.

CONCLUSION
This paper proposes a two-step method for multisource data fusion and global regularization. Several decision fusion methods were tested and compared, including fuzzy rules, Bayesian combinations, a margin-based rule, and a Dempster-Shafer based rule. Among the fuzzy rules, the Min and Compromise rules are the most efficient. The Max rule is often affected by misclassifications because it gives more confidence to the highest membership. The prioritized rules favor one source over the other, so reliability is not ensured, as observed for Prior 1, which gives confidence to the less reliable source. The accuracy of the AD rule depends on the reliability of the ground truth: the rule gives encouraging results on the Pavia datasets, but its accuracy was not sufficient on the Toulouse dataset. The Bayesian Sum and Product rules can be of interest when the conflict between sources is low, since they give acceptable results over Pavia Center and Toulouse. The proposed margin-based rule performs well over Pavia Center and correctly over Toulouse, but not sufficiently well over Pavia University. Finally, the Dempster-Shafer rule gives homogeneous performance over the three datasets, leading to interesting results.
Even though the decision fusion increases the classification accuracy compared to the initial classification maps, the results are affected by classification artefacts and unclear borders: the final maps are guided either by one of the initial maps or by both, so the fused result is essentially an improved version of the initial maps. The final step is therefore a global regularization of the decision fusion results to enhance the classification. It regularizes each pixel according to its membership and its neighborhood, and according to an image contrast measure between neighboring pixels. The optimization procedure gives encouraging results, with clear borders between the different classes and the elimination of artefacts.
The method can also integrate other decision rules in a fully tunable way, and the optimization model is simple, flexible, and can be tuned depending on the dataset. Further work will investigate the explicit use of the conflict measures from the fusion step within the regularization framework. At the moment, the selection of the optimization parameters is rather manual; some automation could be included, and other contrast measures could be tested to improve the accuracy. The proposed fusion scheme is quite generic and could also be applied to other similar configurations, such as low spatial resolution time series combined with VHR-MS data. In this study, the HS and MS images were both simulated from the same original image, which leads to rather optimistic results. Real cases will be tested, including temporal differences between the two images.

Figure 1. Two-step strategy for multi-source data fusion.
Masses are associated with each class of the new set of classes $\mathcal{L}'$ as follows:
- $m(\emptyset) = 0$;
- simple classes: $\forall C \in \mathcal{L}$, $\forall$ pixel $x$, and $\forall s \in S$, $m_s(C) = \mu_s^C(x)$, where $m_s(C)$ is the mass assigned to class $C$ by source $s$, and $\mu_s^C(x)$ is the pointwise membership probability of the considered class;
- compound classes: the compound class masses were generated for every pair $C_i, C_j \in \mathcal{L}$, every pixel $x$, and every source $s \in S$.

The spectral bands of the Pavia datasets range from 430 to 860 nm. Pavia University is a 335 x 605 image and Pavia Center a 715 x 1096 image; both have a ground sample distance (GSD) of 1.3 m. Both Pavia images are composed of 9 land cover classes (Figures 2.e, 2.f): Asphalt, Meadows, Gravel, Trees, Painted Metal Sheets, Bare Soil, Bitumen, Self-Blocking Bricks and Shadows for Pavia University; and Water, Trees, Meadows, Self-Blocking Bricks, Bare Soil, Asphalt, Bitumen roofing, Tiles roofing and Shadows for Pavia Center. For both datasets, MS and HS images were generated. The MS images were generated using the spectral configuration of the Pleiades satellites (limited to the three RGB bands) with a GSD of 1.3 m, while the HS images were resampled at a lower spatial resolution of 7.8 m with the full original spectral range (i.e. 103 and 102 bands). To sum up, both datasets include an MS image (RGB with 1.3 m GSD) and an HS image (full spectral range with 7.8 m GSD). The last dataset, called "Toulouse Center", is a scene over the city of Toulouse, France. It has 405 spectral bands ranging from 400 to 2500 nm and an initial GSD of 1.6 m. Its land cover is composed of 15 classes (Figure 2.d): Slate roofing, Asphalt, Cement, Water, Pavements, Bare soil, Gravel roofing, Metal roofing 1, Metal roofing 2, Tiles roofing, Grass, Trees, Railway tracks, Rubber roofing and Shadows. MS and HS images were created for the fusion purpose: an MS image using the Pleiades spectral configuration (four RGB-NIR bands) with a GSD of 1.6 m, and an HS image, which is a resampled version of the original image at a lower spatial resolution of 8 m. To sum up, the Toulouse dataset has an MS image (RGB-NIR with 1.6 m GSD) and an HS image (full spectral range with 8 m GSD).

Figure 2. RGB datasets and corresponding ground truth: (a), (d) Toulouse Center; (b), (e) Pavia University; (c), (f) Pavia Center.

Results and discussion

Sources comparison:
The MS image is characterized by a high spatial resolution and few bands, while the HS image has a low spatial resolution and hundreds of bands. Applying the SVM classifier to these images leads to 1) a sharp delineation of objects in the MS image due to its high spatial resolution, but also many artefacts (Figures 3.b, 4.b, 5.b), and 2) a good discrimination of the different classes in the HS image, but a blurry delineation of objects due to its low spatial resolution (Figures 3.a, 4.a, 5.a). The corresponding classification accuracies are listed in Table 2: the accuracies are better using the HS image.
Decision fusion classification:
Ten different decision fusion rules were first tested and compared over the three datasets. The quantitative results of Table 1 suggest that the Compromise, Bayesian Product, Margin-Max and Dempster-Shafer rules are the most efficient. The comparison must also take visual inspection of the results into account, as ground truth data remain very limited on these datasets. For Pavia University, four of the best accuracies were reached by the Min, Compromise, Bayesian Product and Dempster-Shafer rules. In practice, the Min and Compromise rules give the best classification rendering, especially regarding the Self-Blocking Bricks class, which is a conflicting class (Figures 3.a, 3.b, magenta class), while the two other rules seem to overestimate this class.

Figure 3. Pavia University classification results using the best decision fusion rule: (a) SVM classification of the HS image, (b) SVM classification of the MS image, (c) classification fusion by the Min rule, (d) global classification regularization.

Figure 4. Pavia Center classification results using the best decision fusion rule: (a) SVM classification of the HS image, (b) SVM classification of the MS image, (c) classification fusion by the Dempster-Shafer rule, (d) global classification regularization.
Figure 5.

Table 1. Classification accuracy after the fusion procedure; 10 decision-level fusion rules are compared.