PRELIMINARY INVESTIGATION ON CHLOROPHYLL-A AND TOTAL SUSPENDED MATTER CONCENTRATION IN MANILA BAY USING HIMAWARI-8 AHI AND SENTINEL-3 OLCI C2RCC

Water quality monitoring is important in maintaining the cleanliness and health of water bodies. It enables us to identify sources of pollution and study trends. While modern methods include the use of satellite images to estimate water quality parameters, commonly used satellite systems, such as Landsat and Sentinel, only generate images with a temporal resolution of 2 to 16 days on average. The Himawari-8 satellite system, on the other hand, generates full-disk images every 10 minutes, making it possible to generate concentration maps of water quality parameters more frequently. This paper presents the preliminary analysis of the generation of yearly and seasonal Chlorophyll-a (Chl-a) and Total Suspended Matter (TSM) estimation models using Himawari-8 satellite images and linear regression. Correlation analysis shows that the single spectral bands and band ratios involving the Red band have the strongest relationship with Chl-a and TSM. The generated yearly and seasonal linear regression models resulted in R² values of 0.4 to 0.5, with RMSE values of around 3 μg/cm³ for Chl-a and 9.5 g/m³ for TSM. Results also indicate that the seasonal models are better than the yearly models in terms of fit and error. Results from the preliminary investigation will be used to generate a more robust global model in future studies.


INTRODUCTION
Water quality monitoring is the process of gathering information about the status of water bodies to identify sources of pollution or study trends; this is necessary in maintaining their cleanliness and health. The physical, chemical, and biological characteristics of water are monitored in order to gather information about the status, changes, or trends in the water bodies, which is necessary in making decisions and policies for their betterment. Water quality parameters such as Chl-a, TSM, colored dissolved organic matter, water temperature, total phosphorus, and dissolved oxygen are traditionally measured by collecting samples from the field and analyzing them in the laboratory. Specifically, Chl-a and TSM are commonly studied as these parameters are useful indicators of water quality and are related to water color. Chl-a is an indicator of biological activity in the water through its relationship to nutrient concentration and algal production, while TSM is an indicator of the presence of suspended sediments, which affect how much light is scattered rather than transmitted through the water. However, measuring these water quality parameters using traditional methods can be costly and tedious when done frequently and over a large area. With technology developing rapidly, remote sensing techniques offer solutions to the limitations of traditional water quality monitoring.
Remote sensing techniques for the estimation and monitoring of water quality parameters involve the use of various satellite images and the study of the relationship between the imager's bands and water quality parameters. By exploiting the different spatial and temporal resolutions of satellite systems, remote sensing provides a way to determine the spatial and temporal variations in water quality, which is necessary for more accurate and detailed monitoring. Specifically, when in-situ measurement is limited or restricted, remote sensing offers an accurate and efficient way to determine water quality parameters and monitor the health of water bodies. Commonly studied satellite systems in the field of water quality estimation are Landsat and Sentinel, due to their high spatial resolution and the availability of various tools for this purpose. However, these satellite systems only image a particular region once every 2 to 16 days on average, making it difficult to perform a more temporally sensitive analysis, which is necessary for waters that are highly dynamic and productive. The Himawari-8 weather satellite system, with its temporal resolution of 10 minutes, provides a way to generate more water quality parameter data. It can be utilized hand-in-hand with high spatial resolution satellite systems for a more rigorous and thorough analysis and assessment.
Himawari-8 is a geostationary weather satellite of the Japan Meteorological Agency (JMA) launched on October 7, 2014. The satellite performs a full-disk scan of the East Asia and Western Pacific region every 10 minutes. Aboard the Himawari-8 satellite is the Advanced Himawari Imager (AHI), a 16-channel multispectral imager capable of capturing visible light and infrared images, similar to the Advanced Baseline Imager (ABI) used in GOES (Table 1).
Himawari-8 has mostly been used for meteorology-related studies, such as the observation of the eruption of Mt. Raung in Indonesia using the shortwave infrared (SWIR) to infrared bands of AHI (Takayuki, 2018).
For water quality monitoring, there are a few studies that generated sea surface temperature (SST) and total suspended sediment (TSS) estimates. Himawari-8 data was used to produce SSTs using a quasi-physical algorithm, which solves a parameterized infrared radiative transfer equation. A study was conducted to compare and validate the SSTs from Himawari-8 from June to September 2015 against drifting and tropical moored buoy data using 630,000 pairs of Himawari-8 SST and buoy data. The results showed good agreement between the Himawari-8 SST data and the buoy data, with an RMSD and bias of 0.59 K and -0.16 K, respectively.
The negative bias was attributed to the differing measurement depths and/or cloud contamination (Kurihara, 2016). Dorji and Fearns (2018) tested the feasibility of using Himawari-8 images to compute the total suspended sediment in the coastal waters of Australia. The study also developed an atmospheric correction method for the estimation of total suspended sediment levels. Results showed high correlation coefficients of 0.91 and 0.71 between the AHI-derived TSS concentration and the Landsat and MODIS-Aqua data, respectively. Moreover, the study utilized the SWIR bands of the AHI, which are not present on other geostationary satellites, for the correction of turbid coastal waters.
This study uses the Case-2 Regional Coast Colour (C2RCC) processor by Doerffer and Schiller (2016) to generate the water quality parameters used to train and test the model. C2RCC is a set of neural networks used to generate Case-2 water products, specifically Chl-a and TSM, for various satellite systems such as Sentinel, MERIS, VIIRS, MODIS, and Landsat. Specifically, the process involves the inversion of the water-leaving reflectance spectrum, with atmospheric correction, to generate the water products.
C2RCC was utilized for the preliminary investigation in order to test the method and analyze if there is a relationship between water quality parameters and the Himawari-8 spectral bands as in-situ field measurements are limited.
This paper presents the preliminary analysis of the generation of Chl-a and TSM seasonal and yearly models using Himawari-8 AHI spectral bands and Sentinel-3 OLCI C2RCC Chl-a and TSM products as training-test data for the correlation and linear regression analysis.

Study Area
Manila Bay is the largest natural harbor in the Philippines, with an area of 1,994 sq. km and a coastline of 190 km, bordered by densely populated provinces such as Bulacan, Bataan, and Cavite, and cities like Manila.
The bay is a center for various economic, industrial, and commercial activities such as shipping, fishing, aquaculture, tourism, and transport. Manila Bay is historically a center of biodiversity where around 100 different species of birds, such as the Chinese Egret and Black-winged cuckoo-shrike, and various catfish and mackerels can be observed. Moreover, different species of mangroves can also be observed in the bay, with Avicennia marina as the predominant species. Developments that are proposed or already in motion have raised various concerns about the bay's condition, such as the deterioration of its water quality, making monitoring of the bay more important and necessary.

Figure 1.
Major rivers of Manila Bay depicted as blue lines.

Downloading of Himawari-8 HSD and generation of C2RCC Chl-a and TSM data from Sentinel-3 OLCI
The study is divided into three main components: gathering of Himawari-8 HSD files and Sentinel-3 OLCI Level 1 images, pre-processing of the Himawari-8 HSD files and Sentinel-3 OLCI images, and generation of water quality parameter models using Linear Regression Analysis (Figure 2).
The Himawari Standard Data (HSD) files were downloaded from the Japan Aerospace Exploration Agency (JAXA) and the National Institute of Information and Communications Technology (NICT) Japan Science Cloud file transfer protocol server. HSD files are timestamped in UTC and separated per band and segment. The study area is covered by the 4th segment of the full-disk image. Data were downloaded for one date per month with minimal to no cloud cover in 2019, from 9:10 AM to 10:10 AM GMT+8, except for March and August, for which the data were either not downloadable or always cloudy. The bands used were the visible bands (Bands 1-3) and the infrared bands (Bands 4-6).
Sentinel-3 OLCI was downloaded from the Copernicus Openhub and was processed using the C2RCC processor in SNAP. Other default parameters were changed based on the study area.

Himawari Standard Data Processing
The downloaded Himawari-8 satellite images in HSD file format were pre-processed and converted to GeoTIFF using the Geo2Grid processor. Geo2Grid is a bash script from the Cooperative Institute for Meteorological Satellite Studies (CIMSS), which was formed by the Space Science and Engineering Center of the University of Wisconsin-Madison, NOAA, and NASA. Specifically, Geo2Grid is a set of command line tools for reading, writing, compositing, and remapping gridded data, such as temperatures, reflectances, and radiances, to a new file format. Geo2Grid is also used for GOES ABI.
The GeoTIFF images were then clipped using the shapefile of the bay and converted into points in order to resample the other bands to 500 meters using Kriging interpolation. Kriging interpolation produces prediction surfaces together with measures of accuracy through geostatistical methods. Specifically, it interpolates by plotting the variance of all pairs of data points against their separation distance in a semivariogram, following the concept of Tobler's First Law of Geography, in which closer things are more related than distant things.
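The semivariogram idea described above can be illustrated with a minimal sketch. It is not the Kriging implementation used in the study: the function name `empirical_semivariogram`, the distance bins, and the synthetic reflectance values are all assumptions made for illustration. The sketch only computes the empirical semivariance per distance bin, which is the quantity a Kriging model is subsequently fitted to.

```python
import numpy as np

def empirical_semivariogram(coords, values, bin_edges):
    """Mean semivariance 0.5 * (z_i - z_j)^2 of all point pairs, per distance bin."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    # Indices of every unordered pair (i < j).
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    semivar = 0.5 * (values[i] - values[j]) ** 2
    gamma = np.full(len(bin_edges) - 1, np.nan)
    for b in range(len(bin_edges) - 1):
        in_bin = (dist >= bin_edges[b]) & (dist < bin_edges[b + 1])
        if in_bin.any():
            gamma[b] = semivar[in_bin].mean()
    return gamma

# Synthetic points (coordinates in metres) whose value drifts smoothly along x,
# so that nearby points are more alike than distant ones.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10_000, size=(200, 2))
values = 0.05 + 0.00001 * coords[:, 0] + rng.normal(0, 0.002, 200)
gamma = empirical_semivariogram(coords, values, bin_edges=[0, 1000, 3000, 6000])
```

In this synthetic case the semivariance grows with separation distance, which is the behaviour Tobler's First Law leads one to expect.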
Afterwards, points containing the bands' reflectance values and water quality parameter were then extracted using point subsampling.

Correlation and Regression Analysis
Correlation Analysis and Linear Regression Analysis were performed to determine the relationship between the AHI spectral bands and the water quality parameters. Single bands together with their respective band ratios were tested in this study, as spectral band ratios have been studied as a way to reduce irradiance, atmospheric, and air-water surface influences.
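As a hedged illustration of this step, the sketch below computes the Pearson correlation between a target parameter and every single band and pairwise band ratio. The helper name `band_ratio_correlations`, the band labels B1 to B3 (Blue, Green, Red), and the synthetic data are assumptions for illustration; the synthetic "Chl-a" is deliberately constructed from the Red over Green ratio and does not represent the study's data.

```python
import numpy as np

def band_ratio_correlations(bands, target):
    """Pearson r between the target and each single band and each pairwise band ratio."""
    results = {}
    names = list(bands)
    for name in names:
        results[name] = np.corrcoef(bands[name], target)[0, 1]
    for num in names:
        for den in names:
            if num != den:
                ratio = bands[num] / bands[den]
                results[f"{num}/{den}"] = np.corrcoef(ratio, target)[0, 1]
    return results

rng = np.random.default_rng(1)
n = 500
blue = rng.uniform(0.02, 0.06, n)
green = rng.uniform(0.03, 0.08, n)
red = rng.uniform(0.01, 0.05, n)
# Synthetic target built from the Red/Green ratio, for illustration only.
chl = 10 * red / green + rng.normal(0, 0.5, n)

r = band_ratio_correlations({"B1": blue, "B2": green, "B3": red}, chl)
best = max(r, key=lambda k: abs(r[k]))  # feature with the strongest |r|
```

Ranking by the absolute value of r keeps strongly negative ratios, such as Blue over Green in the text, in consideration alongside positive ones.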
After checking the correlation between the bands and the water quality parameters, linear regression analysis was performed in order to determine the best model. Variance inflation factor (VIF) was examined to remove bands that are highly correlated with each other to avoid multicollinearity. Linear regression models were then created with 70% training set and 30% random validation set for 10 iterations. Normality of the residuals as well as homoscedasticity of the models were also checked after the generation of the best model. The seasonal and yearly models were then applied to generate Chl-a and TSM maps.
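The repeated 70/30 split can be sketched as follows, under stated assumptions: ordinary least squares is fitted with NumPy, the function name `fit_and_score` and the synthetic predictors are hypothetical, and the final model is picked by validation R², which is one possible reading of the selection rule rather than the confirmed procedure.

```python
import numpy as np

def fit_and_score(X, y, train_frac=0.7, iterations=10, seed=0):
    """Fit OLS on random splits; return a list of (train_r2, val_r2, val_rmse, coef)."""
    rng = np.random.default_rng(seed)
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    n_train = int(train_frac * len(y))
    runs = []
    for _ in range(iterations):
        order = rng.permutation(len(y))
        tr, va = order[:n_train], order[n_train:]
        coef, *_ = np.linalg.lstsq(A[tr], y[tr], rcond=None)

        def r2(idx):
            resid = y[idx] - A[idx] @ coef
            return 1 - resid @ resid / np.sum((y[idx] - y[idx].mean()) ** 2)

        rmse = np.sqrt(np.mean((y[va] - A[va] @ coef) ** 2))
        runs.append((r2(tr), r2(va), rmse, coef))
    return runs

# Synthetic predictors and target with a known linear relationship plus noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(1000, 2))
y = 5 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(0, 2, 1000)

runs = fit_and_score(X, y)
best = max(runs, key=lambda run: run[1])  # assumed rule: highest validation R^2
```

Comparing the training and validation R² of each run is also a simple way to spot overfitting, since a large gap between the two would indicate it.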

Correlation Analysis
Correlation analysis was used to determine the strength of the relationship between variables, specifically between the water quality parameters and the Himawari-8 bands. Although correlation analysis does not indicate direct causation, determining the bands with the strongest relationship to the water quality parameters is important in creating a more accurate global Himawari-8 model in the future. Figure 3 shows the correlation matrix between Chl-a and the Himawari-8 bands for the generated yearly and seasonal models. Values highlighted in red show positive correlation while values highlighted in blue show negative correlation, with darker shades indicating a stronger correlation between the variables. For the single bands, the visible bands showed the highest correlation to Chl-a, indicating a strong relationship between Chl-a and the Blue, Green, and Red bands. The correlation can be due to Chl-a having high reflectance in the green wavelength while having strong absorption in the blue and red wavelengths, which makes chlorophyll appear green. Chl-a reflectance also reaches its peak near the 700 nm wavelength; however, the NIR band of AHI is already at 860 nm, hence the lower correlation. Stronger relationships can be observed when checking the correlation between the band ratios and Chl-a. Specifically, the Blue over Green ratio showed the highest negative correlation to Chl-a, indicating an inverse relationship between them. Moreover, ratios with the Red band as the numerator showed the highest correlation to Chl-a. The stronger relationship observed with the band ratios might be because irradiance, atmospheric, and air-water surface influences affect spectral band ratios less than they affect single bands.
A study from 1989 showed that single bands result in TSM-sensitive algorithms, especially when there is a direct relationship between TSM and reflectance.
Studies also showed that bands between 700 and 800 nm are the most useful in estimating TSM, but for AHI, the IR bands range from 0.86 to 2.3 µm, which might explain their low correlation to the water quality parameters. However, relatively higher correlation can be observed for ratios between Band 4 and the visible bands, showing that the IR bands, when combined with visible bands, might still explain some of the relationship between reflectances and water quality parameters.
Chl-a and TSM have a high correlation of around 0.8 when tested with the dataset. This might explain the similarity between the results for Chl-a and TSM, as both indicate how much light is scattered and absorbed rather than transmitted in straight lines through the water.

Linear Regression Analysis
After checking the strength of the relationship between the water quality parameters and spectral bands with Correlation Analysis, Linear Regression Analysis was performed to generate models using datasets from one year, one dry season (December to May), and one wet season (June to November). Multicollinearity between the band reflectances was checked through the Variance Inflation Factor (VIF), as multicollinearity weakens the significance of an independent variable, making the regression coefficients unreliable. Bands with high VIF values were removed until the values were less than the standard 7.5 for all the remaining bands. Linear regression models were then generated through 10 iterations with a 70 percent training set and a 30 percent validation set. The best models based on the R² scores of the training and validation sets were then selected as the final models to be applied to produce Chl-a and TSM maps (Eqs. 1-6).
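The VIF screening can be sketched as below. This is a minimal illustration, not the study's procedure in detail: it assumes that the predictor with the largest VIF is dropped one at a time, and the function names `vif` and `drop_high_vif` and the synthetic predictors are hypothetical. Only the 7.5 threshold comes from the text.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the other columns."""
    out = []
    for j in range(X.shape[1]):
        others = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def drop_high_vif(X, names, threshold=7.5):
    """Drop the predictor with the largest VIF until every VIF is below the threshold."""
    names = list(names)
    while X.shape[1] > 1:
        scores = vif(X)
        worst = int(np.argmax(scores))
        if scores[worst] < threshold:
            break
        X = np.delete(X, worst, axis=1)
        del names[worst]
    return X, names

rng = np.random.default_rng(3)
a = rng.normal(size=300)
b = rng.normal(size=300)
c = a + rng.normal(0, 0.05, 300)  # nearly a copy of "a", so collinear with it
X_kept, kept = drop_high_vif(np.column_stack([a, b, c]), ["a", "b", "c"])
```

In this synthetic case one of the two collinear predictors is removed and the independent predictor "b" is retained.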
Equations 1 to 3 show the best models generated for Chl-a, while Table 2 shows the regression coefficient values. The Red over Green band ratio was included in all the models, while the Blue over Green ratio was included in the yearly and wet season models. In general, an increase in Band 3 corresponds to an increase in the estimated Chl-a, except for Band 3/2 in the dry season model, while a decrease in the Blue over Green reflectance ratio indicates an increase in the Chl-a value for all the models. Moreover, ratios including the IR bands were included in the models, which means that IR band ratios are still significant in generating estimation models when prioritizing bands that are not multicollinear. For TSM, Equations 4 to 6 show the best models, while Table 3 shows the regression coefficients. The bands and band ratios included in the Chl-a models were also the ones included in the TSM models, except for Band 3/6 in the wet season TSM model. In general, band ratios were included in the models rather than single bands, which is consistent with the correlation matrices, where band ratios were more significant.
Features that are highly correlated to TSM are shown in dark red and dark blue colors, such as band ratios of Band 3 and Band 1/2.

In terms of which variables affected the Chl-a models the most, band ratios 3/1 and 3/2 showed the highest standardized coefficient values, as can be observed in Table 4. These are also the ratios that showed the highest correlation, owing to Chl-a having high absorption in the blue and red spectrum and high reflectance in the green spectrum. For the TSM models, Band 3/5 affected the yearly model the most, while Band 3/1 and Band 3/2 affected the seasonal models the most, as seen in Table 5. However, unlike in the Chl-a models, the differences between the standardized coefficient values in the TSM yearly and wet season models are relatively small, specifically between band ratios 1/2 and 3/6 in the TSM wet season model.
The best models for Chl-a resulted in training R² scores of 0.419, 0.506, and 0.416, with validation R² scores of 0.414, 0.501, and 0.407 for the yearly, dry, and wet season models, respectively. Training RMSE values of 3.197, 2.727, and 3.344 μg/cm³ were computed for the yearly, dry, and wet season models, respectively, while the validation RMSE values were 3.200, 2.752, and 3.356 μg/cm³, respectively. The R² scores of roughly 0.4 to 0.5 and the small RMSE values show that there is a linear relationship between Chl-a and the reflectance bands, and the small differences between the training and validation R² scores indicate no overfitting.
The same can be observed with the TSM models, with training R² scores of 0.462, 0.509, and 0.496, and validation R² scores of 0.440, 0.498, and 0.491 for the yearly, dry, and wet season models, respectively. Training RMSE values were 10.090, 9.849, and 9.502 g/m³, and validation RMSE values were 10.283, 9.967, and 9.447 g/m³ for the yearly, dry, and wet season models, respectively. In general, the R² scores of the seasonal models are higher and their RMSE values are lower than those of the yearly models; the exception is the wet season Chl-a model, whose R² is slightly lower and whose RMSE is slightly higher than those of the yearly model. Additional model parameters can be seen in Tables 6 and 7. Normality of the residuals was also tested by checking the histogram of the residuals per model, as predictions are calculated based on the assumption that the residuals are normal (Figure 5). The histograms show approximately normal distributions of the residuals, supporting the assumption of normality.
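For readers unfamiliar with standardized coefficients, the sketch below shows one common way to obtain them: each ordinary least squares slope is rescaled by the ratio of the predictor's standard deviation to that of the response. The function name `standardized_coefficients` and the synthetic data are assumptions for illustration, and the study may have computed the values in Tables 4 and 5 with a different tool.

```python
import numpy as np

def standardized_coefficients(X, y):
    """OLS slopes rescaled by std(x_j) / std(y) so predictors on different scales compare."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    slopes = coef[1:]  # skip the intercept
    return slopes * X.std(axis=0, ddof=1) / y.std(ddof=1)

rng = np.random.default_rng(4)
x1 = rng.normal(0, 1, 2000)    # predictor with a small numeric range
x2 = rng.normal(0, 100, 2000)  # predictor with a large numeric range
y = 3 * x1 + 0.01 * x2 + rng.normal(0, 1, 2000)

beta = standardized_coefficients(np.column_stack([x1, x2]), y)
```

Here the raw slopes (about 3 and 0.01) differ by a factor of 300 only because of the units, while the standardized values show that the first predictor contributes roughly three times as much as the second.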

Homoscedasticity was also checked by plotting the standardized residual values against the predicted values. However, the plot showed some trend rather than a randomly distributed scatter, making it difficult to determine whether the assumption of homoscedasticity is truly violated.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W6-2021, Philippine Geomatics Symposium 2021, 17-19 November 2021

Chl-a and TSM Maps
After generating the yearly and seasonal models for Chl-a and TSM, the models were applied to the Himawari-8 images through a raster calculator. Maps were produced every 10 minutes from 9:10 AM to 10:10 AM GMT+8. Sample generated maps for April 2019 and September 2019 are shown in Figures 6 to 13, displaying maps produced using the yearly and seasonal models. In general, high concentrations of Chl-a and TSM can be found near the coastal areas of the bay. Specifically, the highest concentrations can be found in the northwestern area of the bay near the Pampanga and Bataan provinces. Higher concentrations can also be observed near the mouths of the Guagua River, Pampanga River, and Angat River, where river discharge contributes to the increase in concentration. Fishponds and aquaculture can also be found in the area, and these also contribute to the increase in concentration due to the feeds and nutrients present in the water. Comparatively high concentrations can also be found in the eastern area of the bay near Metro Manila. Low concentrations are found in the central area and at the mouth of the bay, which is explained by open water being naturally less turbid. In the maps generated for September, the concentrations in the eastern area are higher than in the maps for April.
Qualitatively, the yearly and seasonal maps show similar areas of high and low concentration. One main difference between them is the magnitude of the concentrations, with the seasonal maps showing higher concentrations in general. Sufficient in-situ data is needed in order to identify which maps show more accurate concentrations. Variations can be seen between the 10-minute images, showing some movement of the concentration in the water, specifically in the northwestern area of the bay. This can be utilized in times of algal blooms or heavy rains to determine the source and movement of pollution. Moreover, researchers can still obtain data and produce maps even on dates with high cloud cover. For example, in the September maps, the high concentrations near the mouth of the bay are cloud pixels, as confirmed by checking the true color images. However, there are times when the cloud contamination is lower, specifically at 9:10 AM. Researchers can therefore carefully check the 10-minute data of Himawari-8 when in need of data for a particular date, even in times of high cloud contamination, which is difficult to do with satellite systems that only produce one image per day. The symbology of the maps can also be changed to a narrower range when monitoring specific areas in order to track minute changes in the concentration.
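Applying a fitted model through a raster calculator amounts to evaluating the regression equation pixel by pixel. The sketch below illustrates this with NumPy arrays standing in for band rasters. The function name `apply_ratio_model`, the simplified model form (only the Red over Green and Blue over Green ratios), and the coefficient values are placeholders and are not the coefficients of Equations 1 to 6 or Tables 2 and 3.

```python
import numpy as np

def apply_ratio_model(red, green, blue, intercept, coef_rg, coef_bg):
    """Evaluate intercept + coef_rg * (red/green) + coef_bg * (blue/green) per pixel."""
    red, green, blue = (np.asarray(b, dtype=float) for b in (red, green, blue))
    valid = green > 0  # avoid dividing by zero on masked or fill pixels
    out = np.full(green.shape, np.nan)
    out[valid] = (intercept
                  + coef_rg * red[valid] / green[valid]
                  + coef_bg * blue[valid] / green[valid])
    return out

# Tiny 2x2 "rasters" of reflectance; the bottom-right pixel is a fill value.
red = np.array([[0.02, 0.04], [0.03, 0.00]])
green = np.array([[0.04, 0.04], [0.06, 0.00]])
blue = np.array([[0.04, 0.02], [0.03, 0.00]])

# Placeholder coefficients, NOT the values reported in the paper's tables.
chl_map = apply_ratio_model(red, green, blue, intercept=1.0, coef_rg=8.0, coef_bg=-2.0)
```

Pixels without a valid denominator are left as NaN so that they can be rendered as no-data in the resulting map.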

CONCLUSIONS
The researchers were able to develop a methodology for estimating Chl-a and TSM using Himawari-8 satellite images by downloading images from the JAXA and NICT Cloud and pre-processing the images using Geo2Grid in Ubuntu before performing the analysis.
Results of the study showed the possibility of estimating Chl-a and TSM using Himawari-8 spectral bands. The resulting models showed R² scores of roughly 0.4 to 0.5 and relatively low RMSE values. Results also show no overfitting, as the R² and RMSE values of the training and validation sets are close to each other. The seasonal models for the dry and wet seasons also generally showed higher R² scores and smaller RMSE values than the yearly model. Similarities between the results of the Chl-a and TSM models might be due to the high correlation between the two water quality parameters; however, it is important to remember that Chl-a and TSM indicate different factors regarding the quality of waters. The generated models satisfy the assumptions of linearity, absence of multicollinearity, and normality of residuals. Further development of the study will include generation of models using a more rigorous technique, specifically machine learning algorithms, to avoid assumptions that might affect the model. Incorporating in-situ data gathered using field instruments, either for calibration or for model building and validation depending on the amount of data gathered, will help in generating a more accurate global model.

ACKNOWLEDGEMENT