A NEW CLOUD SHADOW DETECTION ALGORITHM BASED ON PRIOR LAND TYPE DATABASE SUPPORT

Cloud shadow detection is one of the basic steps in remote sensing image processing. The threshold method is commonly used because it is easy to implement and gives good accuracy. However, a common threshold setting struggles to cope with complex surface conditions, and its result is binary. To address these problems, this paper proposes a method for detecting cloud shadow pixels that is supported by land cover data and calculates shadow probabilities, using Landsat 8 data as an example. The results are then validated against visual interpretation. They show that the method achieves high cloud shadow detection accuracy.
* Corresponding author: sunlin6@126.com


INTRODUCTION
Cloud shadows in remote sensing images distort the original surface information, which makes qualitative and quantitative results inaccurate. Extracting cloud shadow areas is therefore a basis for remote sensing image processing (Li et al., 2016). The threshold method of shadow detection is commonly used because it is easy to implement and gives good accuracy. It identifies cloud shadow pixels from the spectral difference between cloud shadow and the typical underlying surface (Shahtahmassebi et al., 2013). The traditional threshold method (Song, Civco, 2002; Boright, Sluder, 2007; Zhu, Woodcock, 2012) derives the threshold from spectral features based on band values, band ratios or band differences. However, surface conditions are complicated. If a single threshold is applied to various surface types, it is difficult to achieve comprehensive and accurate cloud shadow recognition (Sun et al., 2018a). Moreover, the result of traditional methods is binary (cloud shadow or clear land). Prior data, by contrast, can improve detection accuracy (Tian et al., 2018). To solve these problems, this paper proposes a method for detecting cloud shadow pixels that is supported by land cover data and uses the spectral differences between cloud shadow and clear land. The paper takes Landsat 8 data as an example and uses the GlobeLand30 surface type database as support to obtain the cloud shadow thresholds under different surface types. The cloud shadow probability is then calculated from the land type database and a database of Cloud Shadow & Clear Land (CSCL) pixels. Cloud shadow detection experiments are carried out in areas with cultivated land, forest, grassland, shrubland, wetland, artificial surface and bareland, and the accuracy is verified by validation against visual interpretation. The results show that the method achieves high cloud shadow detection accuracy.

GlobeLand30:
The 30-meter Global Land Cover Dataset (GlobeLand30) is a global land cover product generated from 30-meter multispectral images, including Landsat TM5 and ETM+ multispectral images and multispectral images from the Chinese Environmental Disaster Alleviation Satellite (HJ-1). It also incorporates a large amount of auxiliary data and reference material. The overall accuracy of GlobeLand30-2010 reaches 83.51% (Chen et al., 2017), with a Kappa coefficient of 0.78. This land cover classification product is therefore highly accurate, which provides an accuracy guarantee for its application in cloud detection in this study (Sun et al., 2018b).

Pixel database building:
The pixel database consists of cloud shadow and clear land pixels taken from Landsat 8 images. To ensure the diversity and availability of samples, the sample points are selected from 40 images around the world. The database samples CSCL pixels over different surface types as well as cloud shadows of different degrees, which makes the shadow features more complete and supports reliable detection results.

Optimal bands selection
Using the prior CSCL pixel database, the Shadow Correct (True Negative) Rate (SCR) and the Clear-land Error (False Positive) Rate (CER) are calculated at each threshold, with the reflectance threshold varying from 0 to 1 at intervals of 0.01. Figure 1 shows the simulated trend of the shadow correct rate and the clear-land error rate. When the threshold is very low, SCR and CER are both 0. As the threshold rises, some shadows are identified correctly, so SCR increases, while CER stays low because clear land is still recognized accurately. A large difference between SCR and CER appears as the threshold increases further. Finally, both rates reach 1. This simulation is performed for each surface type and each band. When, for a given band, SCR is greater than 0.95 while CER is less than 0.1, the band is included among the optimal bands for that land type.
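The threshold sweep described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names are hypothetical, a pixel is assumed to be labelled shadow when its reflectance is below the threshold, and a band is assumed to qualify when at least one threshold satisfies both conditions.

```python
def rate_curves(shadow, clear, step=0.01):
    """Return (thresholds, scr, cer) for thresholds 0, step, ..., 1.

    shadow and clear are lists of reflectance values of one band for the
    cloud shadow and clear land samples of one land type.
    """
    n_steps = round(1 / step)
    thresholds = [k * step for k in range(n_steps + 1)]
    # Assumption: a pixel counts as shadow when reflectance < threshold.
    scr = [sum(r < t for r in shadow) / len(shadow) for t in thresholds]
    cer = [sum(r < t for r in clear) / len(clear) for t in thresholds]
    return thresholds, scr, cer


def is_optimal_band(shadow, clear, min_scr=0.95, max_cer=0.1):
    """True if some threshold gives SCR > min_scr and CER < max_cer."""
    _, scr, cer = rate_curves(shadow, clear)
    return any(s > min_scr and c < max_cer for s, c in zip(scr, cer))
```

Calling `is_optimal_band` once per band and per land type would yield the set of optimal bands for each land type under these assumptions.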

Cloud shadow probability calculation
The optimal bands for each land type are obtained by the above simulation. MIN' and MAX' are obtained from equations (1) and (2), where T is the threshold value. For each band, T traverses [MIN', MAX'] in steps of 0.001, and the cloud shadow probability is calculated in each resulting interval. The value at the right end of each interval is taken as the threshold representing that interval, so a cloud shadow probability is obtained for each threshold. The cloud shadow probability of the ith interval is calculated as follows:

$$P_i = \frac{N_{shadow}}{N_{total}} \qquad (3)$$

where $N_{shadow}$ is the number of cloud shadow pixels in [MIN' + (i-1) × 0.001, MIN' + i × 0.001] and $N_{total}$ is the total number of pixels in this interval.

In this calculation, MIN is the largest threshold at which the cloud shadow probability is 1, and MAX is the smallest threshold at which the probability is 0. Therefore, when the pixel reflectance is less than MIN, the cloud shadow probability is 1, and when it is greater than MAX, the probability is 0. Between MIN and MAX, the statistics show that the cloud shadow probability follows a clear trend across thresholds. The function that best fits this curve is the Sigmoid function (S-shaped curve). The function is defined as equation (4), and Figure 2 shows an example for the three optimal bands of grassland.
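The interval statistics and the resulting piecewise probability can be illustrated with a short sketch. The function names are hypothetical, and because the fitted function itself is not reproduced here, the logistic form with slope `k` and midpoint `x0` is only an assumed parameterisation of a decreasing S-shaped curve.

```python
import math


def interval_probabilities(shadow, clear, t_min, t_max, width=0.001):
    """Shadow probability per reflectance interval between t_min and t_max.

    Returns (right_edge, probability) pairs, where probability is the
    shadow pixel count divided by the total pixel count of the interval.
    Intervals that contain no pixels are skipped.
    """
    n_bins = round((t_max - t_min) / width)
    n_shadow = [0] * n_bins
    n_total = [0] * n_bins
    for values, is_shadow in ((shadow, True), (clear, False)):
        for r in values:
            i = math.floor((r - t_min) / width)
            if 0 <= i < n_bins:
                n_total[i] += 1
                n_shadow[i] += is_shadow
    return [(t_min + (i + 1) * width, n_shadow[i] / n_total[i])
            for i in range(n_bins) if n_total[i] > 0]


def shadow_probability(r, p_one_max, p_zero_min, k, x0):
    """Piecewise probability: 1 below MIN, 0 above MAX, S-curve between.

    p_one_max plays the role of MIN and p_zero_min the role of MAX.
    k and x0 are assumed logistic parameters that would come from a fit.
    """
    if r < p_one_max:
        return 1.0
    if r > p_zero_min:
        return 0.0
    return 1.0 / (1.0 + math.exp(k * (r - x0)))
```

The pairs returned by `interval_probabilities` are the points to which an S-shaped curve would be fitted; any least-squares routine could supply `k` and `x0`.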

Cloud shadow probability generation algorithm
Each land type usually has two or three optimal bands, so the probabilities of these bands need to be combined to obtain the final probability. Based on the CSCL pixel database, the average difference between clear land and cloud shadow reflectance is calculated for each band, and the corresponding weight is assigned by the distance weight method. The larger the difference, the larger the weight, because a band with a larger difference separates cloud shadow from clear land more distinctly. The final probability is the weighted combination of the band probabilities, where i is the land type, n is the number of optimal bands, W_{i,j} is the weight of band j on land type i, and ρ is the apparent reflectance.
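One plausible reading of this weighting scheme is sketched below. It assumes that the weights are proportional to the mean reflectance difference of each band and normalised to sum to 1, and that the final probability is the weighted sum of the per-band probabilities. Both assumptions, like the function names, are illustrative and may differ from the exact formula used in the paper.

```python
def band_weights(mean_clear, mean_shadow):
    """Weights proportional to the mean clear-minus-shadow reflectance gap.

    mean_clear and mean_shadow hold one mean reflectance per optimal band
    of a single land type. The weights are normalised to sum to 1.
    """
    gaps = [abs(c - s) for c, s in zip(mean_clear, mean_shadow)]
    total = sum(gaps)
    return [g / total for g in gaps]


def combined_probability(band_probs, weights):
    """Weighted sum of the per-band cloud shadow probabilities."""
    return sum(w * p for w, p in zip(weights, band_probs))


# Two optimal bands: the second separates shadow from clear land better,
# so it receives the larger weight.
weights = band_weights([0.3, 0.4], [0.1, 0.1])
final = combined_probability([1.0, 0.5], weights)
```

Because the weights sum to 1, the combined value stays between 0 and 1 and can be compared directly with a probability threshold.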

RESULT
Cloud shadow detection results are obtained by selecting a probability threshold appropriate to the user's needs. In general, a pixel whose probability is greater than 0.75 is very likely to be cloud shadow, so 0.75 is used as the probability threshold in the experiments of this paper. Some results are shown in Figure 3. They show that the algorithm detects cloud shadows well over different land types and for shadows of different degrees.

CONCLUSION
To address the problem that the traditional threshold method cannot adapt to complex surface conditions, this paper adds land type data and a CSCL pixel database as references, so that the threshold can be set according to the surface type. In addition, the detection result of the threshold method is usually binary (0 or 1). This paper proposes a shadow probability generation method based on the pixel database, which is more informative than a binary result. The results show that the algorithm achieves good accuracy. This paper provides a new idea for cloud shadow detection.