RANDOM PROJECTION BASED BIAS-CORRECTED FUZZY C-MEANS ALGORITHM FOR HYPERSPECTRAL REMOTE SENSING IMAGE SEGMENTATION

To address the information redundancy of hyperspectral remote sensing images, this paper presents a novel ensemble algorithm that combines Random Projection (RP) with the Bias-corrected Fuzzy C-means (BCFCM) algorithm. Since an RP matrix preserves information well, it can be used to reduce the dimension of the image. To take full advantage of the neighborhood relationship, the BCFCM algorithm is adapted to segment the low-dimensional image, in which Euclidean distances are retained to define the similarity between the hyperspectral remote sensing image and the low-dimensional image. Finally, the BCFCM algorithm is used to segment the fuzzy membership matrix of the ensemble algorithm. The proposed algorithm is evaluated on real Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) hyperspectral remote sensing images. Segmentation performance is assessed by the kappa coefficient and overall accuracy. Experimental results demonstrate that the proposed algorithm achieves higher segmentation accuracy at a lower computational cost than conventional algorithms.


INTRODUCTION
Because of the large number of contiguous spectral bands, segmenting hyperspectral remote sensing images is a major challenge. For this reason, dimensionality reduction has been playing an increasingly significant role in hyperspectral image segmentation (Xia, Chanussot, Du, He, 2017, Li, Prasad, Fowler, 2013, Shabna, Ganesan, 2014). However, it is difficult to represent the data structure because data points are very sparse in high-dimensional space, which leads to the curse of dimensionality (Pedram, Chen, Zhu, 2017). Dimensionality reduction is therefore an effective step towards segmenting hyperspectral remote sensing images. There are many techniques for dimensionality reduction. The most prevalent is the Principal Component Analysis (PCA) algorithm (Tipping, Bishop, 2010, Vidal, Ma, Sastry, 2012), which needs to calculate the directions of largest variance of the data. Nevertheless, this algorithm has a heavy computational burden due to eigenvalue decomposition and lacks any guarantee on the data structure. Inspired by the well-known Johnson-Lindenstrauss (JL) lemma (Johnson, 1984, Frankl, Maehara, 1988, Matousek, 2008), Random Projection (RP) provides a feasible mapping: the Euclidean distances between data points in the original high-dimensional space are approximately preserved in the low-dimensional space produced by RP (Achlioptas, 2001, Li, Hastie, Church, 2006, Jeremy, Martin, 2002, Achlioptas, 2003, Bezdek, Ye, Popescu, 2016). Moreover, RP is computationally more tractable for hyperspectral remote sensing images than PCA and does not introduce significant image distortion (Bingham, Mannila, 2001). However, RP is highly unstable, that is, different projections produce different segmentation results (Fern, Brodley, 2003, Avogadri, Valentini, 2009). To this end, Popescu et al. (2015) proposed a simple algorithm based on RP and Fuzzy C-Means (FCM) for big data clustering (RPFCM). RP is exploited in that algorithm to generate multiple subsets in a low-dimensional space from the original data.
Hyperspectral image segmentation is generally a prerequisite for many kinds of hyperspectral remote sensing image classification. At present, one of the commonly used segmentation algorithms for hyperspectral remote sensing images is the Fuzzy C-Means (FCM) algorithm (Hichri, Ammour, Alajlan, Bazi, 2014), which is an extension of the classical clustering algorithm. Although the FCM algorithm can segment most hyperspectral remote sensing images within an acceptable accuracy range, it is very sensitive to noise and cannot accurately identify region boundaries. To improve segmentation accuracy, Xu et al. (1997) proposed a new adaptive FCM technique based on compensating for intensity heterogeneities. However, it is extremely sensitive to a certain amount of salt and pepper noise (Pham, Prince, 1999). Taking into account both noise and intensity heterogeneities, Miyamoto et al. (1997) proposed a novel FCM technique that adds a regularization term to the objective function, in which the regularization coefficient indicates the fuzziness of the objective function. Still, the algorithm produces many misclassified pixels. To solve the above problems, Ahmed et al. (2002) proposed a novel Bias-Corrected FCM (BCFCM) algorithm, which modifies the objective function of the standard FCM algorithm by combining the centre vector and its immediate neighbourhood vectors. The BCFCM algorithm is one of the few available methods for hyperspectral remote sensing image segmentation that naturally integrates spatial and spectral information. In this paper, a segmentation ensemble algorithm based on RP and the BCFCM algorithm is proposed. Firstly, RP is used to find an effective representation of spectral information and data structure in a low-dimensional space. Secondly, the BCFCM algorithm is designed to segment the low-dimensional image, assigning each vector degrees of membership in several clusters. Then, a fuzzy clustering algorithm is exploited to segment the fuzzy membership matrix of the ensemble algorithm. The algorithm is a new framework in which BCFCM uses the Euclidean distances guaranteed by the RP algorithm. Finally, the experimental results show that the proposed algorithm can be effectively applied to hyperspectral remote sensing image segmentation. This paper is organized as follows. In section 2, the proposed algorithm is described. The experimental results and discussion are given in section 3. Section 4 presents the conclusions.

RANDOM PROJECTION
A hyperspectral remote sensing image contains hundreds of bands, which allows more detailed and accurate surface information to be detected. Accordingly, the high-dimensional image requires longer computing time (Tosun, 2005, Li, Bioucasdias, Plaza, 2012, Borges, Bioucas-Dias, Marcal, 2011). To solve this problem, RP can be used as a dimensionality reduction method for hyperspectral remote sensing images. RP finds an efficient representation of the sparse structure of the image in a low-dimensional space and does not depend on the number of bands of the image. Another advantage of RP is that it preserves Euclidean distances under projection according to the JL lemma: if the original data are projected onto a randomly selected subspace using an RP matrix, the Euclidean distances between data points in the high-dimensional and low-dimensional spaces are approximately preserved. Therefore, RP can reduce the dimension of a hyperspectral remote sensing image at a greatly reduced time cost and without introducing significant distortion. Consider a hyperspectral remote sensing image X = {xi: i = 1, ..., n}, where i is the index of vectors, xi is a hyperspectral vector with d dimensions and n is the number of vectors. A k×d RP matrix R is then created, whose columns have unit lengths. The matrix projects each hyperspectral vector into a k-dimensional space (k << d). RP is computationally very simple, with complexity of order O(dkn).
For given constants ε, β > 0, the dimension k is selected such that

$$k \ge k_0 = \frac{4 + 2\beta}{\varepsilon^2/2 - \varepsilon^3/3}\,\ln n \tag{1}$$

where ε = projection accuracy
β = projection success rate
n = vector number

The value of k0 depends on ε, β and n. For a k-dimensional matrix R, the choice of its entries is one of the points of interest for hyperspectral remote sensing image segmentation. Generally, the entries of R can be considered as random variables following a Gaussian distribution (Fang, Zhang, Wei, 2008), although this can also be replaced by a much simpler distribution. Based on this consideration, Achlioptas's matrix is used to represent the sparsity of the hyperspectral remote sensing image more efficiently. Suppose that R is a k×d RP matrix [rgf]k×d. Its entry rgf is drawn from the following distribution

$$r_{gf} = \sqrt{3} \times \begin{cases} +1 & \text{with probability } 1/6 \\ 0 & \text{with probability } 2/3 \\ -1 & \text{with probability } 1/6 \end{cases} \tag{2}$$

Since rgf takes the value zero with the highest probability, Achlioptas's result brings further computational savings. Using Achlioptas's matrix R, the high-dimensional data X can be projected into a low-dimensional subspace, that is,

$$Y = \frac{1}{\sqrt{k}}\, X R^{T} \tag{3}$$

Consequently, the low-dimensional image is Y = {yi: i = 1, ..., n}, where yi is a vector with k dimensions. With probability at least $1 - n^{-\beta}$, for any distinct rows y1 and y2 in Y and the corresponding rows x1 and x2 in X,

$$(1 - \varepsilon)\,\|x_1 - x_2\|^2 \le \|y_1 - y_2\|^2 \le (1 + \varepsilon)\,\|x_1 - x_2\|^2 \tag{4}$$

where ||·|| is the Euclidean distance.
The projection distance is preserved within a relatively fixed range according to Eq. (4), and the RP matrix does not depend on the hyperspectral remote sensing image.
This paper measures similarity by the Euclidean distance, which is a widely used measure of the similarity of data vectors. In the sense of dimensionality reduction, it is optimal for the low-dimensional image to retain as much of the original hyperspectral remote sensing image information as possible.
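The projection step described in this section can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the entries of R follow the sparse distribution of Achlioptas (2003), that is, √3 × {+1, 0, −1} with probabilities 1/6, 2/3 and 1/6, the projection is scaled by 1/√k, the input is synthetic Gaussian data rather than an AVIRIS image, and the names `achlioptas_matrix`, `project` and `sq_dist` are hypothetical.

```python
import math
import random


def achlioptas_matrix(k, d, rng):
    """Return a k x d matrix with entries sqrt(3) * {+1, 0, -1},
    drawn with probabilities 1/6, 2/3 and 1/6 (assumed distribution)."""
    s = math.sqrt(3.0)
    return [rng.choices((s, 0.0, -s), weights=(1, 4, 1), k=d) for _ in range(k)]


def project(X, R):
    """Project each row of X (n x d) to k dimensions: y = R x / sqrt(k)."""
    norm = 1.0 / math.sqrt(len(R))
    return [[norm * sum(r * x for r, x in zip(row, vec)) for row in R] for vec in X]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


rng = random.Random(0)
d, k, n = 204, 40, 20          # toy sizes; d matches the 204 bands used later
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]  # synthetic data
R = achlioptas_matrix(k, d, rng)
Y = project(X, R)

# Ratio of squared distances after and before projection for every pair.
ratios = [sq_dist(Y[i], Y[j]) / sq_dist(X[i], X[j])
          for i in range(n) for j in range(i + 1, n)]
mean_ratio = sum(ratios) / len(ratios)
print(min(ratios), mean_ratio, max(ratios))
```

On average the ratios should be close to 1, and their spread narrows as k grows, which is the distance-preservation behaviour that the JL lemma describes.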

BIAS-CORRECTED FUZZY C-MEANS ALGORITHM
When the numbers of samples in the classes of a hyperspectral image differ greatly, the traditional FCM algorithm does not achieve ideal segmentation accuracy and produces an incorrect partition. To make full use of the similarity preserved in the low-dimensional space and to overcome salt and pepper noise, the BCFCM algorithm is used in this paper. The objective function of the BCFCM algorithm is defined by adding the bias field and a neighborhood relationship term to that of the FCM algorithm, and the optimal segmentation is obtained by minimizing the objective function iteratively. Suppose that the low-dimensional image Y produced by the RP algorithm can be divided into c classes. The objective function of the BCFCM algorithm can be expressed as,
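To make the role of the neighborhood term concrete, the following Python sketch computes fuzzy memberships for a neighborhood-regularized FCM objective. It is an assumption-laden illustration rather than the formulation used in this paper: it follows the general form of Ahmed et al. (2002), in which the cost of assigning a vector to a cluster is its own squared distance to the centre plus a weight `alpha` times the mean squared distance of its neighbours to that centre, and it omits the bias field entirely. The function name `bcfcm_memberships`, the parameter `alpha` and the toy data are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bcfcm_memberships(Y, neighbours, centres, m=2.0, alpha=1.0):
    """Membership update for a neighbourhood-regularised FCM objective.

    Y          : list of low-dimensional vectors
    neighbours : neighbours[i] lists the indices of the neighbours of vector i
    centres    : list of c cluster centres
    m          : weighting exponent (m > 1)
    alpha      : assumed weight of the neighbourhood term (bias field omitted)
    """
    power = 1.0 / (m - 1.0)
    U = []
    for i, y in enumerate(Y):
        nbrs = neighbours[i]
        costs = []
        for v in centres:
            own = sq_dist(y, v)
            # Mean squared distance of the neighbours to the same centre.
            nbr = sum(sq_dist(Y[r], v) for r in nbrs) / len(nbrs) if nbrs else 0.0
            costs.append(own + alpha * nbr + 1e-12)  # small term avoids division by zero
        inv = [cost ** (-power) for cost in costs]
        total = sum(inv)
        U.append([w / total for w in inv])
    return U


# Toy 1-D "image": one noisy pixel (value 5.0) surrounded by background pixels.
Y = [[0.0], [0.1], [5.0], [0.0], [0.1]]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]
centres = [[0.0], [5.0]]

plain = bcfcm_memberships(Y, neighbours, centres, alpha=0.0)  # ordinary FCM update
reg = bcfcm_memberships(Y, neighbours, centres, alpha=2.0)    # with neighbourhood term
print(plain[2], reg[2])
```

In this toy case the noisy pixel belongs almost entirely to the second cluster when `alpha` is 0, while with `alpha` set to 2 its neighbours pull it towards the first cluster, which is the kind of noise suppression the neighborhood term is meant to provide.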

ENSEMBLE ALGORITHM
As different RPs may result in different clustering solutions, it is attractive to design a cluster ensemble framework with RP for improved and robust clustering performance. Therefore, a more efficient cluster ensemble algorithm based on multiple RPs and BCFCM is proposed. The procedure of the proposed algorithm is as follows.
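Input: a hyperspectral remote sensing image X.
Step 1. Reduce the dimension of X with a randomly generated RP matrix R to obtain the low-dimensional image Y.
Step 2. Segment the low-dimensional image Y with the BCFCM algorithm.
Step 3. For w < W
    Run Step 1 and Step 2.
    Get fuzzy membership matrix Uw.
end for
Step 4. Apply the BCFCM algorithm to the fuzzy membership matrix of the ensemble to obtain the segmentation result z ← Eq. (7).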
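The overall flow of the ensemble (project, cluster, collect memberships, cluster again) can be sketched in Python as follows. This is a simplified illustration under explicit assumptions, not the proposed algorithm itself: plain FCM is substituted for BCFCM (no bias field and no neighborhood term), the membership matrices of the W runs are combined by concatenating them column-wise, the cluster centres are initialised with a deterministic farthest-point rule, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def fcm(data, c, m=2.0, tol=2e-4, t_max=100):
    """Plain FCM used as a stand-in for BCFCM; returns the n x c memberships."""
    # Deterministic farthest-point initialisation of the centres (an assumption).
    centres = [list(data[0])]
    while len(centres) < c:
        far = max(data, key=lambda p: min(sq_dist(p, v) for v in centres))
        centres.append(list(far))
    U = []
    for _ in range(t_max):
        U = []
        for p in data:
            inv = [(sq_dist(p, v) + 1e-12) ** (-1.0 / (m - 1.0)) for v in centres]
            total = sum(inv)
            U.append([w / total for w in inv])
        new_centres = []
        for j in range(c):
            weights = [u[j] ** m for u in U]
            wsum = sum(weights)
            new_centres.append([sum(w * p[f] for w, p in zip(weights, data)) / wsum
                                for f in range(len(data[0]))])
        shift = max(sq_dist(a, b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return U


def random_projection(X, k, rng):
    """Project the rows of X to k dimensions with a sparse random matrix."""
    d = len(X[0])
    s = math.sqrt(3.0)
    R = [rng.choices((s, 0.0, -s), weights=(1, 4, 1), k=d) for _ in range(k)]
    return [[sum(r * x for r, x in zip(row, vec)) / math.sqrt(k) for row in R]
            for vec in X]


def rp_ensemble(X, c, k, W, seed=0):
    rng = random.Random(seed)
    stacked = [[] for _ in X]
    for _ in range(W):
        Y = random_projection(X, k, rng)   # Step 1: dimensionality reduction
        U_w = fcm(Y, c)                    # Step 2: fuzzy clustering of Y
        for row, u in zip(stacked, U_w):   # Step 3: collect the memberships U_w
            row.extend(u)
    U = fcm(stacked, c)                    # Step 4: cluster the stacked memberships
    return [max(range(c), key=lambda j: u[j]) for u in U]


# Synthetic example: two well-separated groups of 10 vectors in 6 dimensions.
rng = random.Random(1)
offset = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
X = [[rng.gauss(0.0, 0.02) for _ in offset] for _ in range(10)]
X += [[o + rng.gauss(0.0, 0.02) for o in offset] for _ in range(10)]
labels = rp_ensemble(X, c=2, k=3, W=5)
print(labels)
```

Clustering the stacked memberships rather than voting on hard labels avoids having to match cluster labels between runs, since each run may number its clusters differently.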

EXPERIMENTAL RESULTS AND DISCUSSION
To demonstrate the feasibility and validity of the proposed algorithm, experiments are performed on real hyperspectral remote sensing images acquired by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) scanner. The data set covers Salinas Valley, California, USA, in 224 bands, with wavelengths from 0.2 to 2.4 μm, 10 nm spectral resolution and 3.7 m spatial resolution. After discarding the 20 water absorption bands ([104-108], [150-163], 220) and performing atmospheric correction and radiometric calibration, sub-blocks with 204 bands are extracted as experimental data. Figure 1 shows two false colour composite images, in which band 29 (0.64 μm), band 20 (0.55 μm) and band 12 (0.47 μm) approximately represent the R, G and B components, respectively. Each sub-block covers 128×128 vectors. Additionally, in application, the points in the image that do not contain any information are regarded as background and are excluded from the segmentation. Therefore, the numbers of valid data points in the two experimental images are 12464 and 1940, respectively.
(a) Experimental image 1 (b) Experimental image 2
Figure 1. False color composite images.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B3-2020, 2020 XXIV ISPRS Congress (2020
For the experimental image, the number of vectors n is equal to 16384, that is, 128 times 128, and the dimension d of each vector is 204, which makes the BCFCM algorithm require more calculation time and larger storage space. To this end, RP is applied as a dimensionality reduction algorithm before the segmentation algorithm in this paper. To improve the projection success probability $1 - n^{-\beta}$, the value of β is set to 0.5. In the range where ε is greater than 0 and less than 1.5, the denominator $\varepsilon^2/2 - \varepsilon^3/3$ in Eq. (1) reaches its maximum value of 1/6 at ε = 1. In this case, the value of k0 is 292, which is not desirable because it is larger than the original dimension. However, when ε is greater than 1.5, the value of k0 is negative, so the low dimension k can be any integer smaller than the original dimension d. Therefore, this paper takes ε as 2 and k equal to 3, which is sufficient for subsequent hyperspectral remote sensing image segmentation.
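The arithmetic behind these parameter choices can be checked with a few lines of Python. The sketch assumes that Eq. (1) has the form k0 = (4 + 2β) / (ε²/2 − ε³/3) · ln n, which is consistent with the denominator and the value 292 quoted above; the function name `k0` is hypothetical.

```python
import math


def k0(eps, beta, n):
    """Lower bound on the projected dimension (assumed form of Eq. (1))."""
    denom = eps ** 2 / 2.0 - eps ** 3 / 3.0
    return (4.0 + 2.0 * beta) / denom * math.log(n)


n = 128 * 128                          # 16384 vectors
print(1.0 ** 2 / 2.0 - 1.0 ** 3 / 3.0)  # denominator at eps = 1, approximately 1/6
print(math.ceil(k0(1.0, 0.5, n)))      # 292, larger than d = 204
print(k0(2.0, 0.5, n) < 0)             # True: the bound is negative for eps > 1.5
```

With ε = 1 and β = 0.5 the bound is about 291.1, which rounds up to 292; with ε = 2 the denominator is negative, so the bound places no constraint on k.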
After the RP operation, the BCFCM algorithm is used to segment the low-dimensional image. The weighting exponent m must be selected between 2 and 4.5 by experiment. When the value of m is high, the cluster memberships become softer and the segmentation result tends to be blurred. Therefore, the weighting exponent m is set to 2. The convergence threshold e is set to 0.0002, the number of ensembles to 30 and the maximum number of iterations Tmax to 100 in the experiments. Finally, the number of classes c is set to 6 and 4 for the images shown in Figure 1(a) and (b), respectively. To assess the proposed algorithm, the BCFCM algorithm without dimensionality reduction (d-BCFCM), the BCFCM algorithm after PCA (PCA-BCFCM) and the RPFCM algorithm (Popescu, 2015) are used as comparison algorithms to segment the hyperspectral remote sensing images. Figure 2(a1) and (b1) are the standard images of the two experimental images. Figure 2(a2) and (b2) show the segmentation results of the proposed algorithm. Figure 2(a3) and (b3), Figure 2(a4) and (b4), and Figure 2(a5) and (b5) show the segmentation results of the d-BCFCM, PCA-BCFCM and RPFCM algorithms, respectively. In these experiments, the segmentation results of the proposed algorithm are superior to those of the other three algorithms, which supports the efficiency and accuracy of the proposed algorithm for hyperspectral remote sensing image segmentation. For the first experiment, grapes-untrained and vineyard-untrained in the bottom left corner of Figure 2(a2) are roughly segmented by the proposed algorithm. Although the RP-BCFCM algorithm still cannot accurately separate these two classes, it performs much better than the method without RP, because RP can maintain the data topology. In addition, for the second experiment, although the variance of the bare land on the right side of Figure 1(b) is large and the number of data points in the upper left corner of Figure 1(b) is relatively small, the proposed algorithm still obtains better segmentation results. However, the segmentation results are not ideal; there are misclassified and omitted vectors due to the effect of noise on the BCFCM algorithm.
A confusion matrix shows the accuracy of a segmentation result by comparing it with ground truth information. According to Table 1, kappa coefficients and overall accuracies for the four algorithms are calculated. From Table 1, the accuracy of the RP-BCFCM algorithm is higher than those of the others. In particular, the overall accuracy for the image in Figure 1(b) is nearly 49.33% higher than that of the PCA-BCFCM algorithm. From the accuracy perspective, because both spectral information and data structure are considered, the proposed algorithm is suitable for hyperspectral remote sensing images. Thus, the proposed algorithm can achieve a better segmentation result.
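For reference, the two accuracy measures can be computed from a confusion matrix as in the Python sketch below. It uses the standard definitions of overall accuracy and Cohen's kappa coefficient; the 2×2 matrix is a made-up example and is not taken from Table 1, and the function names are hypothetical.

```python
def overall_accuracy(cm):
    """Fraction of vectors on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


def kappa(cm):
    """Cohen's kappa: agreement corrected for chance agreement."""
    total = sum(sum(row) for row in cm)
    po = overall_accuracy(cm)
    # Chance agreement from the row (reference) and column (result) totals.
    pe = sum(sum(cm[i]) * sum(row[i] for row in cm)
             for i in range(len(cm))) / total ** 2
    return (po - pe) / (1.0 - pe)


# Hypothetical confusion matrix: rows = ground truth, columns = segmentation.
cm = [[45, 5],
      [10, 40]]
oa = overall_accuracy(cm)
kc = kappa(cm)
print(oa, kc)   # overall accuracy 0.85 and kappa approximately 0.7
```

Kappa is lower than the overall accuracy here because half of the observed agreement would be expected by chance given the row and column totals.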

CONCLUSIONS
This paper presents a new segmentation algorithm based on RP and the BCFCM algorithm for hyperspectral remote sensing image segmentation. In terms of segmentation performance, the proposed algorithm outperforms the compared algorithms on the experimental images. In addition, with regard to computational cost, RP is a promising avenue for low-cost dimensionality reduction of hyperspectral remote sensing images. However, some problems remain; for example, the segmentation results are still influenced by noise.