RESEARCH ON PM2.5 MASS CONCENTRATION RETRIEVAL METHOD BASED ON HIMAWARI-8 IN BEIJING

This paper uses 2016 Aerosol Optical Depth (AOD) data from Himawari-8, Japan's new-generation geostationary satellite, together with PM2.5 mass concentration data from near-ground monitoring stations and boundary layer height (BLH), relative humidity (RH), and normalized difference vegetation index (NDVI) data to establish a multiple linear regression (MLR) model and a geographically weighted regression (GWR) model for Beijing, providing data and a scientific basis for the treatment of air pollution. The results show that: (1) The coefficient of determination R² of the MLR model was 0.5244, indicating a significant correlation between PM2.5 and AOD. After BLH, RH and NDVI were introduced into the GWR model in turn, R² increased from 0.3945 to 0.5403, indicating that introducing the relevant influencing factors improves the accuracy of the model; that is, PM2.5 is affected by BLH, RH and NDVI. (2) The regression coefficients for BLH, RH and NDVI in the MLR and GWR models were statistically analyzed. The regression coefficients of the two models were close to each other, but the standard deviation of the GWR regression coefficients was larger than that of the MLR coefficients, indicating that the GWR model carries richer local information and reflects the spatial differences in the regression coefficient of each parameter.

related variables, which can play a role in dimensionality reduction. Wang et al. (2017) evaluated the Himawari-8 AOD product against AERONET AOD as the reference value, confirming that the Himawari-8 AOD data are highly accurate and can effectively characterize changes in aerosol optical thickness, so they can be applied to AOD-PM2.5 correlation analysis in the Beijing-Tianjin-Hebei region. A mixed-effects model was used to estimate hourly PM2.5 concentrations, which showed that the Himawari-8 AOD product has value for studying PM2.5 mass concentration. Previous studies have suggested that the correlation between AOD and PM2.5 varies with the spatial environment. Different geographical regions have different aerosol types, and GWR was proposed to better account for spatial heterogeneity in large-scale regression (Fotheringham et al., 1996; Li et al., 2016; Na et al., 2010; Zhang et al., 2015). Previous research on AOD and PM2.5 has mostly used MODIS AOD products with spatial resolutions of 10 km and 3 km, whereas Himawari-8 AOD has higher temporal and spatial resolution than MODIS AOD. Therefore, this paper estimates the PM2.5 mass concentration in Beijing from 2016 AOD data of the Japanese meteorological satellite Himawari-8 and introduces boundary layer height (BLH), relative humidity (RH), and the normalized difference vegetation index (NDVI). A multiple linear regression (MLR) model and a geographically weighted regression (GWR) model are used to analyze the correlation between AOD and PM2.5 and to compare the applicability of the two models to air pollution problems in Beijing.

Multiple Linear Regression Model
In the real world, changes in a dependent variable are often driven by several important factors. In that case, two or more influencing factors must be used as independent variables to explain the change in the dependent variable; this is called multiple regression. When the relationship between the independent variables and the dependent variable is linear, the analysis is called multiple linear regression. Let $y$ be the dependent variable and $x_1, x_2, \ldots, x_k$ the independent variables. The multiple linear regression model is then:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $\beta_0$ is the constant term, $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients, and $\varepsilon$ is the random error term.
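The multiple linear regression model above can be illustrated with a minimal sketch. It assumes ordinary least squares estimation via NumPy and uses synthetic data; the four predictor columns merely stand in for AOD, BLH, RH and NDVI, and the function names `fit_mlr`, `r_squared` and `rmse` are hypothetical helpers, not the authors' code.

```python
import numpy as np

def fit_mlr(X, y):
    """Ordinary least squares fit of y = b0 + b1*x1 + ... + bk*xk."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the constant term b0.
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean square error of the fitted values."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

# Synthetic data only (assumption): columns stand in for AOD, BLH, RH, NDVI.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 4))
true_beta = np.array([10.0, 30.0, -5.0, 8.0, -3.0])  # b0, b1..b4 (made up)
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=0.5, size=n)

beta = fit_mlr(X, y)
y_hat = beta[0] + X @ beta[1:]
print("coefficients:", np.round(beta, 2))
print("R^2:", round(r_squared(y, y_hat), 4), "RMSE:", round(rmse(y, y_hat), 3))
```

R² and RMSE computed this way are the same two statistics the paper uses to compare the MLR and GWR fits.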

Geographically Weighted Regression Model
Geographically weighted regression is a spatial analysis method proposed in recent years. It detects non-stationarity in spatial relationships by embedding spatial structure in a linear regression model: the regression coefficients are estimated separately at each sample point, with the observations weighted by their distance from that point. In the weight function used in this study, $d_{ij}$ is the distance between sample points $i$ and $j$, and $b$ is the bandwidth, a non-negative attenuation parameter that describes the functional relationship between weight and distance.
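The distance-decay weighting described above, and the way a bandwidth can be compared across candidates, can be sketched as follows. Several things here are assumptions rather than the authors' implementation: the Gaussian kernel $w_{ij} = \exp(-(d_{ij}/b)^2)$ is one common choice, since the exact weight function is not reproduced in the text; the criterion is the corrected AIC form commonly used for GWR, with $\hat{\sigma}$ taken as the square root of the residual sum of squares divided by $n$; the data are synthetic; and the names `gaussian_weights`, `gwr_fit` and `aicc` are hypothetical.

```python
import numpy as np

def gaussian_weights(d, b):
    """Assumed Gaussian distance-decay kernel: w = exp(-(d / b)^2)."""
    return np.exp(-(np.asarray(d, dtype=float) / b) ** 2)

def gwr_fit(coords, X, y, b):
    """Weighted least squares at every sample point.

    Returns the local coefficients (n, k+1) and the hat matrix S (n, n).
    """
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n = len(y)
    betas = np.empty((n, A.shape[1]))
    S = np.empty((n, n))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # d_ij for all j
        w = gaussian_weights(d, b)
        AtW = A.T * w                       # A^T W without building diag(W)
        P = np.linalg.solve(AtW @ A, AtW)   # (A^T W A)^-1 A^T W
        betas[i] = P @ y                    # local coefficients at point i
        S[i] = A[i] @ P                     # row i of the hat matrix
    return betas, S

def aicc(y, S):
    """Corrected AIC (assumed form); requires n - 2 - tr(S) > 0."""
    n = len(y)
    resid = y - S @ y
    sigma = np.sqrt(resid @ resid / n)      # ML estimate of the error std
    tr = np.trace(S)
    return 2 * n * np.log(sigma) + n * np.log(2 * np.pi) + n * (n + tr) / (n - 2 - tr)

# Synthetic data (assumption): the slope drifts from west to east.
rng = np.random.default_rng(42)
n = 150
coords = rng.uniform(0.0, 1.0, size=(n, 2))
X = rng.normal(size=(n, 1))
true_slope = 1.0 + 2.0 * coords[:, 0]
y = 1.0 + true_slope * X[:, 0] + rng.normal(scale=0.1, size=n)

# Pick the candidate bandwidth with the smallest AICc.
candidates = [0.2, 0.4, 0.8, 1.6, 100.0]
scores = {b: aicc(y, gwr_fit(coords, X, y, b)[1]) for b in candidates}
best_b = min(scores, key=scores.get)
betas, S = gwr_fit(coords, X, y, best_b)
print("selected bandwidth:", best_b, "tr(S):", round(float(np.trace(S)), 2))
```

With a very large bandwidth all weights approach 1 and the fit collapses to a single global regression, which is why tr(S) then approaches the number of coefficients; smaller bandwidths yield more local, more variable coefficient estimates.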
The larger the bandwidth, the more slowly the weight decays with distance; the smaller the bandwidth, the faster it decays. The Akaike information criterion (AIC) can be used to measure the goodness of fit of a statistical model. For GWR, the AIC depends on tr(S), the trace of the GWR hat matrix S, which is a function of the bandwidth b; on n, the sample size; and on σ̂, the maximum likelihood estimate of the variance of the random error term. For the same sample data, the bandwidth whose geographically weighted regression gives the smallest AIC value is the optimal bandwidth.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-3/W10, 2020 International Conference on Geomatics in the Big Data Era (ICGBD)

The slope of the MLR model was 0.52, R² was 0.52, and RMSE was 37.6. The slope of the GWR model was 0.54, R² was 0.54, and RMSE was 31.7. Overall, the GWR model fit slightly better than the MLR model, but the improvement was not pronounced. A likely reason is that the Beijing study area is small and its monitoring stations are few and close together, so the volume of valid monitoring data was insufficient and the data showed little variation after processing.

To further analyze the spatial differences in the regression coefficients, the spatial distribution of the winter regression results was calculated from the GWR results (see Figure 4). Figure 4 shows that the regression parameters all have an obvious regional distribution.

2. The reason the GWR model did not improve R² significantly may be that the study area was limited to Beijing: the research scope was small, the monitoring stations within it were few and closely spaced, and the valid monitoring data remaining after data processing were insufficient. This led to weak differences in the data. Subsequent research will expand the study area to obtain more valid data and increase the data differences.