MODELLING THE CHLOROPHYLL-A CONCENTRATION OF LAGUNA LAKE USING HIMAWARI-8 SATELLITE IMAGERY AND MACHINE LEARNING ALGORITHMS FOR NEAR REAL TIME MONITORING
Keywords: Laguna Lake, Chlorophyll-a Concentration, Himawari-8, Machine Learning, C2RCC
Abstract. Recent studies have investigated combining satellite imagery with machine learning to model the Chlorophyll-a (Chl-a) concentration of bodies of water. However, most of these studies use satellite data that lack the temporal resolution needed to monitor dynamic changes in Chl-a in productive lakes such as Laguna Lake. This paper therefore presents a methodology for modelling the Chl-a concentration of Laguna Lake in the Philippines using satellite imagery and machine learning algorithms. The methodology uses images from the Himawari-8 satellite, which have a spatial resolution of 0.5–2 km and are taken every 10 minutes. These are converted into GeoTIFF format, where differences in spatial resolution are resolved. Radiometric correction, resampling, and filtering of the Himawari-8 bands to exclude cloud-contaminated pixels are then performed. Subsequently, regression and gradient boosting algorithms are fitted to the training dataset and evaluated, namely Simple Linear Regression, Ridge Regression, Lasso Regression, and Light Gradient Boosting Machine (LightGBM). The results show that machine learning algorithms can model near real-time variations in the Chl-a content of a body of water, here Laguna Lake, to an acceptable margin of error. The regression models performed similarly, with a train RMSE of 1.44 and a test RMSE of 2.51 for Simple Linear Regression and 2.48 for Ridge and Lasso Regression. The linear regression models exhibited a larger degree of overfitting than the LightGBM model, which had a train RMSE of 2.18.
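The comparison of train and test RMSE used above to judge overfitting can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic stand-ins generated in the script, the single predictor `band` is a hypothetical reflectance-like value, and the helper names (`rmse`, `fit_line`, `predict`) and the ridge penalty `alpha=1.0` are assumptions chosen only for illustration. A one-predictor least-squares fit, with an optional ridge penalty on the slope, is written in plain Python so that the example has no dependencies.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def fit_line(x, y, alpha=0.0):
    """Fit y = a + b*x by least squares; alpha > 0 adds a ridge penalty on b only."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / (sxx + alpha)  # alpha shrinks the slope towards zero
    a = my - b * mx          # the intercept is left unpenalized
    return a, b


def predict(model, x):
    """Apply a fitted (intercept, slope) pair to a list of inputs."""
    a, b = model
    return [a + b * xi for xi in x]


# Synthetic stand-in data (NOT from the paper): one hypothetical band value
# per sample and a noisy linear "Chl-a" response.
rng = random.Random(0)
band = [rng.uniform(0.0, 1.0) for _ in range(200)]
chl = [5.0 + 20.0 * v + rng.gauss(0.0, 2.0) for v in band]

# Hold out the last 50 samples as a test set.
split = 150
x_train, x_test = band[:split], band[split:]
y_train, y_test = chl[:split], chl[split:]

results = {}
for name, alpha in [("ols", 0.0), ("ridge", 1.0)]:
    model = fit_line(x_train, y_train, alpha)
    train_rmse = rmse(y_train, predict(model, x_train))
    test_rmse = rmse(y_test, predict(model, x_test))
    results[name] = (train_rmse, test_rmse)
    # A test RMSE well above the train RMSE is the overfitting signal.
    print(f"{name}: train RMSE={train_rmse:.2f}, "
          f"test RMSE={test_rmse:.2f}, gap={test_rmse - train_rmse:.2f}")
```

The quantity of interest is the gap between test and train RMSE for each model. On real data the same comparison would be run with the study's own features and with a held-out set that reflects how the model is to be used.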