SEMANTIC SEGMENTATION OF REMOTE SENSING IMAGERY USING OBJECT-BASED MARKOV RANDOM FIELD BASED ON HIERARCHICAL SEGMENTATION TREE WITH AUXILIARY LABELS

In remote sensing imagery, spectral and texture features are often complex because of the variety of landscapes, which leads to misclassifications in semantic segmentation results. The object-based Markov random field provides an effective solution to this problem. However, state-of-the-art object-based Markov random field models still leave room for improvement. In this paper, an object-based Markov random field model based on a hierarchical segmentation tree with auxiliary labels is proposed. A remote sensing image is first segmented, and the object-based hierarchical segmentation tree is built from the initial segmentation objects and merging criteria. Then, the object-based Markov random field with auxiliary label fields is established on the hierarchical tree structure. Probabilistic inference is applied to solve this model by iteratively updating the label field and the auxiliary label fields. In the experiment, a Worldview-3 image is used to evaluate the performance, and the results show the validity and accuracy of the presented semantic segmentation approach.


INTRODUCTION
Semantic segmentation is defined as a multi-label classification problem (Hu et al., 2018), which aims to assign a category label to each pixel of the image. It comprises two tasks, image segmentation and target recognition. Semantic segmentation is of great significance to the understanding and analysis of imagery and is widely applied in various fields, such as autonomous driving and land-use and land-cover classification. Semantic segmentation of remote sensing imagery can be regarded as a typical multi-class image classification (Chen et al., 2013) that assigns predefined semantic classes to a remote sensing image. With the development of remote sensing satellite technology, the temporal, spatial and spectral resolutions of remote sensing images have gradually improved, which makes the information obtained from remote sensing images increasingly abundant. Natural scenes contain a large number of categories, and some of them show a high degree of similarity in spectral or texture features, so accurately recognizing and distinguishing different landscapes is still a challenge in remote sensing research. The semantic segmentation of remote sensing images provides an effective solution for image retrieval and analysis.
There are two groups of semantic segmentation methods for remote sensing imagery, traditional methods and deep learning methods (Hu et al., 2018). Deep learning methods mainly use the Convolutional Neural Network (CNN) (Pedro H. O., Collobert, 2013, Yu et al., 2018), the Fully Convolutional Network (FCN) (Long et al., 2014) and other improved neural network methods (Paszke et al., 2016, Kampffmeyer et al., 2016) to extract features and obtain semantic segmentation results. Traditional methods include the Support Vector Machine (SVM) (Gupta et al., 2013, Huang, Zhang, 2013), the Random Decision Forest (RDF) (Gupta et al., 2013, Hermans et al., 2014), the Markov Random Field (MRF) (Zheng, Wang, 2015, Zheng et al., 2017, Zheng et al., 2019) and the Conditional Random Field (CRF) (Volpi, Ferrari, 2015, Thøgersen et al., 2016), etc. Compared with deep learning methods, traditional methods applied to remote sensing imagery semantic segmentation are more conducive to analyzing the physical information of landscapes. Among traditional methods, the MRF is an effective method for semantic segmentation. One of the classic problems of the MRF is the Maximum a Posteriori (MAP) (Kollar, 2014) estimation of the state vector. Given the model parameters and the data, the most likely state of the sequence is estimated through the posterior distribution. In image semantic segmentation, the algorithm obtains the optimal result by solving the MAP estimation of the label field. The MRF model makes full use of spatial information constraints, which makes it suitable for processing the spectral and texture information of remote sensing imagery.
The classical method of semantic segmentation based on the MRF model is the Pixel-based Markov Random Field (PMRF) (Geman, S, 1984, Besag, 1993). This model is defined on pixels and measures the similarity between pixels. Its main advantage is its regular context, which is convenient for describing spatial relationships and solving the model. However, as the spatial resolution of remote sensing imagery improves, the classical model is not suitable for capturing complex macroscopic features, and the calculation is time-consuming (Zheng, Wang, 2015). The Multi-resolution Markov Random Field (MRMRF) (Noda et al., 2002, Zheng et al., 2010), defined on the image pyramid structure, extends the classic PMRF model. It improves the computational efficiency and extends the spatial features that can be described to a certain extent, but it is still a pixel-level Markov Random Field. With the application and development of Object-Based Image Analysis (OBIA) in remote sensing, the Object-based Markov Random Field (OMRF) (Yu, Clausi, 2008, Wang, Zhang, 2009, Blaschke, 2010, Zhang et al., 2017) has been widely used. An image is divided into several over-segmented regions, the relationship between regions is then expressed as a Region Adjacency Graph (RAG), and finally the semantic segmentation is completed by the MRF model. The OMRF model is defined on the RAG, which overcomes the limitation of pixel-level models in describing spatial features and can better capture the macroscopic features of the image.
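The step from over-segmented regions to a Region Adjacency Graph can be illustrated with a minimal Python sketch. It is not taken from the cited works: the function name `build_rag`, the use of 4-connectivity and the toy `label_map` are assumptions made for illustration only.

```python
def build_rag(label_map):
    """Return {region_id: set(neighbour_ids)} for a 2D grid of region ids.

    Assumption: two regions are adjacent when any of their pixels
    touch horizontally or vertically (4-connectivity).
    """
    rows, cols = len(label_map), len(label_map[0])
    rag = {}
    for r in range(rows):
        for c in range(cols):
            a = label_map[r][c]
            rag.setdefault(a, set())
            # Look only right and down so each pixel pair is visited once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    b = label_map[rr][cc]
                    if a != b:
                        rag[a].add(b)
                        rag.setdefault(b, set()).add(a)
    return rag


# Hypothetical 3 x 3 over-segmentation with three regions.
label_map = [
    [1, 1, 2],
    [1, 3, 2],
    [3, 3, 2],
]
rag = build_rag(label_map)
print(rag)
```

In this toy map every region touches the other two, so each node of the graph has two neighbours. An MRF defined on such a graph treats the neighbour sets as the spatial context of each object.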
The OMRF used in the semantic segmentation of remote sensing imagery still faces many challenges. For example, misclassifications occur if the OMRF does not sufficiently emphasize spatial relationships. Conversely, if spatial relationships are overemphasized, the OMRF model produces over-smoothed results. Hierarchical processing provides an effective strategy for spatial relationship analysis (Marfil, Bandera, 2015). With the application of multi-scale strategies and object-based methods in remote sensing imagery segmentation, the OMRF and MRF models based on hierarchical information are widely used in remote sensing imagery semantic segmentation. Zheng et al. (Zheng et al., 2019) established a probability graph with a multilayer structure, integrated pixel labels and object labels to build a hybrid label field, and used the joint distribution to capture the isotropy within a layer and the anisotropy between layers. In the iterative update of the hybrid label field, multiple granularities are integrated and interaction between layers of different granularity is achieved. In another article (Zheng et al., 2017), they defined two auxiliary labels in different categories and built the conditional probability distribution of the label field and the auxiliary label fields.
Inspired by the interaction between layers based on the conditional probability distribution model (Zheng et al., 2017), this paper establishes the Object-based Markov Random Field based on hierarchical segmentation tree with auxiliary labels (OMRF-HA) to realize semantic segmentation. In the hierarchical segmentation tree, objects in different layers correspond clearly to each other, so the classification reference that segmented objects provide between adjacent layers is more reliable. When the model updates the label field and the auxiliary label fields during iterative probabilistic inference, the auxiliary label fields therefore have a more positive impact on the label field.
The remainder of this paper is organized as follows: Section 2 introduces the details of the proposed approach. The experimental results, analysis and evaluation are presented in Section 3. Finally, the conclusion is given in Section 4.

METHODOLOGY
This paper proposes an object-based Markov random field model with auxiliary label fields based on a hierarchical segmentation tree structure. The algorithm first segments the remote sensing image into objects and establishes the object-based hierarchical segmentation tree. Based on the hierarchical tree structure, the OMRF model with auxiliary labels is built. The semantic segmentation result is obtained by iteratively updating the label field and the auxiliary label fields. Post-processing is applied to remove noise. An illustration is shown in Figure 1.

Object-based Hierarchical Segmentation Tree
Considering the homogeneity of regions and edge features, the watershed algorithm (Li et al., 2010) is used to divide the remote sensing image I into initial over-segmented regions. The initial segmentation set is denoted R and the number of objects is n. Each segmentation region is denoted Ri (i = 1, 2, . . . , n), and the regions do not intersect. The initial segmentation regions are gradually merged to build a hierarchical segmentation tree (Wu et al., 2019) using a merging criterion (Hu et al., 2013) that combines color and textural features with a spatial constraint. The level set of the hierarchical segmentation tree is H, the number of levels is m and each level is denoted Hj (j = 1, 2, . . . , m). Each node of the hierarchical segmentation tree corresponds to one segmentation object in the remote sensing image. The nodes of H1 correspond to the objects of the initial segmentation result. Each node in an upper layer Hj corresponds to a segmentation object obtained after j − 1 merges. As shown in Figure 2, (b-1)-(b-3) correspond to the segmentation images at nodes i-iii, respectively. The segmented object (I) corresponding to the upper node i fully covers the segmented objects (II and III) corresponding to the lower nodes ii and iii. After m − 1 merges, the initial segmentation objects are merged into one object covering the whole image, which corresponds to the node at the top level of the hierarchical segmentation tree.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020 edition)
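The level-by-level merging described above can be sketched in Python as follows. This is a simplified stand-in rather than the merging criterion of Hu et al. (2013): each region is assumed to carry a single scalar mean feature, the most similar adjacent pair is merged once per level, and the names `build_hierarchy`, `means`, `sizes` and `adjacency` are hypothetical. Region ids are assumed to be integers.

```python
def build_hierarchy(means, sizes, adjacency):
    """Greedy merging; levels[j] is the partition after j merges.

    Each partition maps a node id to the frozenset of initial regions
    it covers, so upper nodes fully cover their children.
    """
    means, sizes = dict(means), dict(sizes)
    adj = {i: set(nb) for i, nb in adjacency.items()}
    nodes = {i: frozenset([i]) for i in means}
    levels = [dict(nodes)]
    next_id = max(nodes) + 1
    while len(nodes) > 1:
        # Assumed criterion: smallest difference of mean features
        # among adjacent pairs (the spatial constraint).
        pairs = [(abs(means[a] - means[b]), a, b)
                 for a in adj for b in adj[a] if a < b]
        if not pairs:
            break  # disconnected graph, nothing left to merge
        _, a, b = min(pairs)
        new, next_id = next_id, next_id + 1
        total = sizes[a] + sizes[b]
        means[new] = (means[a] * sizes[a] + means[b] * sizes[b]) / total
        sizes[new] = total
        nodes[new] = nodes[a] | nodes[b]
        adj[new] = (adj[a] | adj[b]) - {a, b}
        for old in (a, b):
            for nb in adj.pop(old):
                adj[nb].discard(old)
            del nodes[old], means[old], sizes[old]
        for nb in adj[new]:
            adj[nb].add(new)
        levels.append(dict(nodes))
    return levels


# Hypothetical chain of four regions: two dark, two bright.
means = {0: 10.0, 1: 12.0, 2: 50.0, 3: 53.0}
sizes = {0: 4, 1: 4, 2: 4, 3: 4}
adjacency = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
levels = build_hierarchy(means, sizes, adjacency)
for j, level in enumerate(levels, start=1):
    print("H%d:" % j, sorted(sorted(s) for s in level.values()))
```

With four initial regions the sketch yields four levels, and the top level holds one node covering all initial regions, which matches the m − 1 merges described in the text.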

OMRF Model
The OMRF model first initializes the label field X by assigning a random label $X_i$ ($i = 1, 2, \ldots, n$) to each region $R_i$. $x = \{x_i \mid i = 1, 2, \ldots, n\}$ denotes a realization of X. The MRF model uses MAP estimation to transform the problem of semantic segmentation into finding the optimal realization given the observed image (Xia et al., 2006). The optimal result of semantic segmentation is obtained by estimating the realization $\hat{x}$ that maximizes the posterior $P(x \mid I)$. The posterior probability is expanded by the Bayesian formula, and the optimal realization (Zheng, Wang, 2015) is simplified as follows:

$$\hat{x} = \arg\max_{x} P(x \mid I) = \arg\max_{x} P(I \mid x)\, P(x) \qquad (1)$$

In Equation 1, the likelihood function $P(I \mid x)$ is the conditional probability of the image given a realization x. The joint distribution $P(x)$ models the spatial interactions between objects and possesses the Markov property in the MRF model:

$$P(x_i \mid x_j,\ j \neq i) = P(x_i \mid x_j,\ j \in N_i)$$

where $N_i$ is the set of objects adjacent to $R_i$. According to the Hammersley-Clifford theorem (Kollar, 2014), $P(x)$ obeys the Gibbs distribution

$$P(x) = \frac{1}{Z} \exp\bigl(-U(x)\bigr)$$

where $Z = \sum_{x} \exp(-U(x))$ is the normalizing constant and $U(x) = \sum_{c \in C} V_c(x)$ denotes the energy function, which is the sum of the clique potentials over all possible cliques C. The clique potential $V_c(x)$ is defined by the multilevel logistic (MLL) model. $I_i$ denotes the vector composed of the spectral values of the pixels in $R_i$. $P(I_i \mid x_i)$ is assumed to obey the Gaussian distribution, where $\mu_l$ and $\Sigma_l$ ($l \in \{1, 2, \ldots, k\}$) denote the mean and variance, respectively.
where $\Theta^{u}_{l}$ and $\Delta^{u}_{l}$ are the mean and variance of the Gaussian distribution for the auxiliary label field $ax^{u}_{l}$ ($l \in \Theta^{u}$, $u = 1, 2$).
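To make the roles of the Gaussian likelihood and the Gibbs prior concrete, the following Python sketch performs MAP-style labelling on a small region adjacency graph. It rests on several assumptions that are not taken from the paper: each object has a single scalar feature, the clique potential is a Potts-style penalty `beta` for every neighbour with a different label (one common form of the MLL model), and the optimizer is iterated conditional modes (ICM), a simple local search that may differ from the inference actually used. All names are hypothetical.

```python
import math


def neg_log_gauss(value, mean, var):
    """Negative log of a 1D Gaussian density."""
    return 0.5 * math.log(2 * math.pi * var) + (value - mean) ** 2 / (2 * var)


def icm(features, rag, labels, params, beta=1.0, n_iter=10):
    """Lower the posterior energy one object at a time.

    features: {object: scalar feature}
    rag:      {object: set of adjacent objects}
    params:   {label: (mean, variance)} of the Gaussian likelihood
    """
    labels = dict(labels)
    for _ in range(n_iter):
        changed = False
        for i in sorted(features):
            def local_energy(l):
                mean, var = params[l]
                data = neg_log_gauss(features[i], mean, var)
                # Assumed Potts-style prior: pay beta per disagreeing neighbour.
                prior = beta * sum(1 for j in rag[i] if labels[j] != l)
                return data + prior
            best = min(params, key=local_energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:
            break  # no label changed in a full sweep
    return labels


# Hypothetical chain of five objects with two classes.
features = {0: 10.0, 1: 11.0, 2: 28.0, 3: 50.0, 4: 52.0}
rag = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
params = {0: (10.0, 25.0), 1: (50.0, 25.0)}
initial = {0: 1, 1: 0, 2: 0, 3: 0, 4: 1}
result = icm(features, rag, initial, params)
print(result)
```

Minimizing the sum of the data term and the prior term for each object corresponds to maximizing $P(I \mid x) P(x)$ locally. A larger `beta` places more weight on spatial agreement and therefore produces smoother label fields.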

OMRF-HA Model
The OMRF-HA model selects the segmentation objects for the label field and the auxiliary label fields from the hierarchical segmentation tree structure. The proposed model chooses one layer of the hierarchical segmentation tree as the label field, and an upper and a lower level near it as the auxiliary label fields. If too many layers are built in the hierarchical segmentation tree, there will be little difference between the label field and the auxiliary label fields, which weakens the constraint that the auxiliary label fields impose on the label field. Conversely, if the number of layers is small, there will be large differences between adjacent levels.
Once regions of different categories are merged at a high level, large errors are produced in the corresponding auxiliary label field, which also has a negative impact on the label field. In this paper, we select an appropriate segmentation layer and control the number of segmentation layers to obtain a satisfactory label field and auxiliary label fields. As shown in Figure 3, the red lines i, ii and iii in (a) mark the selected layers corresponding to the segmentation images (b-1), (b-2) and (b-3), respectively. The black dotted lines show the correspondence of the orange objects between these layers. (b-1) and (b-3) are selected as the auxiliary label fields and (b-2) is selected as the label field. Our approach uses the same number of classes for the three fields. Pixels in the auxiliary label field layers and the label field layer take labels from the same label set Θ = {1, 2, . . . , k} (in Figure 3, k = 3).
For the auxiliary label field corresponding to the highest of the three selected layers, each object is first assigned a random label, so that all pixels within an object share the same initial value. The other auxiliary label field and the label field are initialized with the same values. The optimal realizations of the label field and the auxiliary label fields are calculated iteratively according to Equation 7 and Equation 8, respectively. Because the object-based method weakens the randomness of the MRF, pixel-based operations are carried out during the iterations. The semantic segmentation results are finally produced after post-processing.
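The initialization described above can be sketched in Python as follows. The sketch covers only the initialization step and does not attempt the updates of Equation 7 and Equation 8. It assumes each layer is stored as a mapping from an object id to the set of initial regions it covers; the function name `init_fields`, the layer contents and the fixed random seed are hypothetical.

```python
import random


def init_fields(top, middle, bottom, k, seed=0):
    """Random labels on the top layer, inherited by the two lower layers.

    top, middle, bottom: {object_id: frozenset(initial_region_ids)}
    Returns the label dictionaries for the three layers.
    """
    rng = random.Random(seed)
    # One random label from {1, ..., k} per top-layer object.
    top_labels = {obj: rng.randint(1, k) for obj in sorted(top)}

    def inherit(layer):
        out = {}
        for obj, regions in layer.items():
            # In a segmentation tree every lower object lies inside
            # exactly one upper object.
            parent = next(t for t, cover in top.items() if regions <= cover)
            out[obj] = top_labels[parent]
        return out

    return top_labels, inherit(middle), inherit(bottom)


# Hypothetical three selected layers over five initial regions.
top = {"A": frozenset({0, 1, 2}), "B": frozenset({3, 4})}
middle = {"a1": frozenset({0, 1}), "a2": frozenset({2}), "b1": frozenset({3, 4})}
bottom = {i: frozenset({i}) for i in range(5)}

aux_up, label_field, aux_low = init_fields(top, middle, bottom, k=3)
print(aux_up, label_field, aux_low)
```

Because every lower object inherits the label of the upper object that covers it, the three fields start from identical pixel values, as the text requires.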

EXPERIMENT
Sub-images captured from a Worldview-3 image were used to evaluate the effectiveness of the proposed approach. The Worldview-3 image was acquired on 5 January 2017, and it covers part of the urban areas of Shenzhen City and Hong Kong City, China, with different types of land cover, such as residential areas, roads, mangroves and water. It has multispectral bands (1.2 m/pixel) with eight spectral channels (coastal blue, blue, green, yellow, red, red edge, near-infrared 1 and near-infrared 2). A total of 3339 pixels were manually labeled as seven categories (artificial structure (including ships, cars and buildings), shadow, road, lawn, wood, water, mud).
In experiments on remote sensing images, the OMRF-HA model could not only highlight the details of the image but also remove noise to some extent. We compared the semantic segmentation results of the PMRF, OMRF and OMRF-HA models before post-processing, using the same bands, number of iterations and number of classes.
As shown in Figure 5, compared with the PMRF classification of building shadows, OMRF and the proposed method obtain shadows with more complete shapes, and the extraction of vegetation and shadow is also better. Compared with OMRF, PMRF and OMRF-HA classify roads more completely and with less noise. In the red box at the lower-left corner, PMRF and OMRF-HA can show the differences between shadows of wood and water, while PMRF produces more noise.
The experimental results show that the average accuracy of the method is 90.44% for the classification of the artificial structure. In Figure 6(a) and Figure 6(b), there is a large homogeneous building area, so the classification accuracy of the artificial structure is high (98.48%). In Figure 6(c), although the buildings are dense and incoherent, they are arranged regularly and the difference in spectral values is small, giving a classification accuracy of 92.45%. In Figure 6(d), the spectral values of buildings on the sunny side and in the shade are quite different, and ships may carry mud and sand in Figure 6(c), and thus the classification accuracy of the artificial structure is lower in these cases, at 89.93% and 72.13%, respectively. The spectral and texture values of mud and water are relatively uniform, so they can be distinguished accurately (97.67% for mud and 94.20% for water).
The results for lawn and wood are good. In Figure 6(b), the lawn in the middle of the road is also correctly classified. The average accuracies of these two classes are both above 90% on the test images, and the highest accuracy for wood is 99.33%. However, at the edge of woods, as the leaf density decreases, the spectral and texture characteristics weaken as well, so misclassification appears.
Compared with other categories, the shadow and road detection results are only moderate. In Figure 6(e), wood features remain clearly visible under the shadow, which causes many errors. Some misclassifications of road and shadow appear in Figure 6(b). However, it is notable that the method can distinguish long, thin roads in Figure 6(d), and also obtains road detection results with less noise in Figure 6(c).
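Per-class accuracies such as those reported above can be computed from manually labeled reference pixels as in the following Python sketch. The exact accuracy definition used in the experiment is not stated, so the sketch assumes that the accuracy of a class is the fraction of its reference pixels that received the same predicted label. The function name `per_class_accuracy` and the sample lists are hypothetical and are not the experimental data.

```python
from collections import Counter


def per_class_accuracy(reference, predicted):
    """Fraction of reference pixels of each class that were labeled correctly."""
    total = Counter(reference)
    correct = Counter(r for r, p in zip(reference, predicted) if r == p)
    return {c: correct[c] / total[c] for c in total}


# Hypothetical reference and predicted labels for six pixels.
reference = ["water", "water", "road", "road", "road", "mud"]
predicted = ["water", "water", "road", "shadow", "road", "mud"]
acc = per_class_accuracy(reference, predicted)
for name, value in sorted(acc.items()):
    print("%s: %.2f%%" % (name, 100 * value))
```

Averaging the per-class values over several test images gives an average accuracy per category under the same assumption.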

CONCLUSION
This paper establishes the Object-based Markov Random Field based on hierarchical segmentation tree with auxiliary labels (OMRF-HA) to realize semantic segmentation. The algorithm first segments the remote sensing image into objects and establishes the object-based hierarchical segmentation tree. Based on the hierarchical tree structure, the object-based Markov Random Field model with auxiliary labels is built. Then the iterative updating process of the Markov random fields is carried out. The final results are obtained after post-processing that combines geometric morphology, spectral features and texture features.
We build the label field layer and the auxiliary label field layers on the hierarchical segmentation tree, which enhances the positive effect of the auxiliary label fields on the label field through the constraint relationships between objects in the tree. Because object-based segmentation reduces the sample size available to the Markov Random Field, this paper combines PMRF with OMRF semantic segmentation. Through experiments on seven landscape categories in a remote sensing image, the performance of the method for high-resolution remote sensing image semantic segmentation was analyzed and compared.