IMPROVING THE ACCURACY OF EXTRACTING SURFACE WATER QUALITY LEVELS (SWQLs) USING REMOTE SENSING AND ARTIFICIAL NEURAL NETWORK: A CASE STUDY IN THE SAINT JOHN RIVER, CANADA

Delineating accurate surface water quality levels (SWQLs) is a persistent challenge for researchers. Existing methods of assessing surface water quality report only the individual concentrations measured at monitoring stations, without providing the overall SWQLs, so their results are usually difficult for decision-makers to understand. In contrast, the water quality index (WQI) simplifies the surface water quality assessment process and makes it accessible to decision-makers. However, in most cases the WQI reflects inaccurate SWQLs because representative water samples are lacking, and providing such samples is costly and time consuming. To solve this problem, we introduce a cost-effective method that combines Landsat-8 imagery and artificial intelligence to develop models that derive representative water samples by correlating concentrations of ground truth water samples with satellite spectral information. Our method was validated, and the correlation between concentrations of ground truth water samples and concentrations predicted by the developed models reached a high coefficient of determination (R² > 0.80), which we consider trustworthy. Afterwards, the predicted concentrations over each pixel of the study area were used as input to the WQI developed by the Canadian Council of Ministers of the Environment to extract accurate SWQLs, for drinking purposes, in the Saint John River. The results indicated an SWQL of 67 (Fair) for the lower basin and 59 (Marginal) for the middle basin of the river. These findings demonstrate the potential of using our approach in surface water quality management.


INTRODUCTION
Existing methods of assessing surface water quality depend mainly on comparing the experimentally measured surface water quality parameters (SWQPs) with the existing surface water quality guidelines (Debels, Figueroa, Urrutia, Barra, & Niell, 2005). This type of surface water quality assessment is valuable for researchers and experts; however, it is often poorly understood by non-experts, such as decision-makers. Decision-makers do not need to be aware of the technical and detailed results of monitoring stations. Thus, it is necessary to assess the surface water quality of water bodies using the water quality index (WQI), which is considered the most effective tool to extract surface water quality levels (SWQLs) (Bharti & Katyal, 2011).
A WQI is a mechanism based on a numerical expression that identifies the level of surface water quality by summarizing complex water quality data into simple numbers, which can be interpreted as text classes (e.g. Excellent, Good) (Bordalo, Teixeira, & Wiebe, 2006). In the literature, very few studies have attempted to delineate SWQLs using statistically-based WQIs. Most of the available research is based on two statistically-based WQIs: the Overall Index of Pollution (OIP) and the Canadian Council of Ministers of the Environment water quality index (CCMEWQI).
The OIP was used to extract water quality levels in the Yamuna River, India using the water quality data of turbidity, power of hydrogen (pH), dissolved oxygen (DO), biochemical oxygen demand (BOD), total dissolved solids (TDS), and fluoride (Sargaonkar & Deshpande, 2003). Water samples were collected from six stations and the extracted SWQLs were excellent at stations 1 and 3. While stations 2, 5, and 6 were categorized as slightly polluted, station 4 was classified as polluted.
The CCMEWQI was used to extract the water quality in the Mackenzie River, Canada (Lumb, Halliwell, & Sharma, 2006). The water quality was classified as marginal for drinking purposes, and the river was found to be negatively affected by high suspended sediment loads. In another study, the CCMEWQI was used for comparative analysis of regional water quality in Canada and was found to be a good tool for water quality assessment (Rosemond, Duro, & Dubé, 2009). The mean CCMEWQI values ranged from 42.40 to 56.70, which is marginal (i.e. the water quality is frequently threatened or impaired).
Based on the literature review, WQIs can support the accurate interpretation of surface water quality; however, they require a large set of water samples obtained by physical monitoring of water quality, which is costly, time consuming, and labour intensive. Therefore, the integration of Landsat-8 multispectral information, the back-propagation neural network (BPNN), and the CCMEWQI is developed here for the first time to extract accurate SWQLs. The BPNN algorithm is selected to develop models that quantify concentrations of SWQPs from Landsat-8 satellite imagery. The BPNN is proposed because it can lead to good generalization of the network, control the learning process, and achieve the global minimum (Tai-Sheng, Chih-Hung, Li, & Yu-Chu, 2008; Sharaf El Din, Zhang, & Suliman, 2017). The obtained concentrations of SWQPs over each pixel of the selected study area are used as input to the CCMEWQI to extract accurate SWQLs. The CCMEWQI is selected due to its flexibility in the selection of input parameters (i.e. SWQPs), its capability of minimizing the data volume to a great extent, and its simplified expression of surface water quality (CCME, 2001). The objectives of this study are to: (1) develop Landsat-8 models to estimate concentrations of SWQPs in the selected study area of the Saint John River (SJR), New Brunswick, Canada by using the BPNN and (2) identify the accurate SWQLs in the SJR by using the CCMEWQI.

Study Site and Water Sampling Trips
The selected study area covers two main parts of the SJR: the lower basin (i.e. below the Mactaquac Dam) and the middle basin (i.e. above the Mactaquac Dam). Water samples were collected at the time of the satellite overpass during five field trips on 27-06-2015, 10-04-2016, 12-05-2016, 22-07-2016, and 23-08-2016. As shown in Figure 1, samples were randomly distributed along the study area. Sixty-six ground truth water samples were collected along 130 km of the SJR, and the coordinates of each sample were recorded using a handheld GPS (GARMIN 76CSx).
Concentrations of both optical and non-optical SWQPs, such as turbidity, total suspended solids (TSS), total solids (TS), total dissolved solids (TDS), chemical oxygen demand (COD), biochemical oxygen demand (BOD), dissolved oxygen (DO), power of hydrogen (pH), electrical conductivity (EC), and temperature, were measured according to the American Public Health Association (APHA) water and wastewater standards (APHA, 2005).
Figure 1. The study area along with sampling points

Landsat-8 Acquisition and Processing
Five Landsat-8 satellite sub-scenes acquired on 27-06-2015, 10-04-2016, 12-05-2016, 22-07-2016, and 23-08-2016 are used in our study to best represent the maximum variation in the concentrations of SWQPs. The Landsat-8 satellite images are available free of charge at Level 1T (terrain corrected) (Earth Explorer, 2016). Atmospheric distortions should be eliminated in order to measure the water-leaving reflectance. The Dark Object Subtraction (DOS) method was used to remove atmospheric distortions and consequently calculate the surface reflectance values (Chavez, 1988). This method is well accepted by the geospatial community and can provide accurate mapping for wetland areas (Song, Woodcock, Seto, Lenney, & Macomber, 2001).
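The idea behind DOS can be illustrated with a minimal Python sketch. It assumes that each band has already been converted to top-of-atmosphere reflectance and that the haze contribution can be approximated by a low percentile of the band; the function name, the percentile choice, and the toy values are illustrative assumptions, not the exact procedure applied in this study.

```python
import numpy as np

def dark_object_subtraction(band, percentile=1.0):
    """Subtract an estimated haze value from one band of reflectance.

    The dark-object value is taken as a low percentile of the valid
    pixels (an assumption of this sketch); negative results are
    clipped to zero.
    """
    band = np.asarray(band, dtype=float)
    valid = band[np.isfinite(band)]
    dark_value = np.percentile(valid, percentile)
    corrected = np.clip(band - dark_value, 0.0, None)
    return corrected, dark_value

# Toy 3x3 band with made-up reflectance values.
toa = np.array([[0.05, 0.06, 0.07],
                [0.08, 0.09, 0.10],
                [0.11, 0.12, 0.30]])

# percentile=0.0 uses the darkest pixel itself as the haze estimate.
surface, haze = dark_object_subtraction(toa, percentile=0.0)
print(haze)
print(surface)
```

Using a percentile rather than the absolute minimum makes the haze estimate less sensitive to a single noisy pixel, at the cost of clipping a few of the darkest pixels to zero.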

Estimation of Concentrations of SWQPs Using Artificial Neural Network (ANN)
In this study, the BPNN algorithm was adopted to model the nonlinear relationship between the Landsat-8 surface reflectance data and concentrations of SWQPs. As shown in Figure 2, the Landsat-8 multi-spectral bands that show the highest correlation to the selected SWQPs were used to form the input layer. Concentrations of SWQPs were selected, one at a time, to compose the output layer, while the number of hidden layers and the number of neurons in each hidden layer were experimentally selected.
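The band selection step described above can be sketched in Python as a ranking of bands by the absolute Pearson correlation between surface reflectance and a measured SWQP. The helper name, the band labels, and the synthetic data are hypothetical and serve only to illustrate the idea; the study does not state which correlation measure or threshold was applied.

```python
import numpy as np

def rank_bands_by_correlation(reflectance, concentration, band_names):
    """Rank bands by |Pearson r| against one water quality parameter.

    reflectance: array of shape (n_samples, n_bands)
    concentration: array of shape (n_samples,)
    """
    reflectance = np.asarray(reflectance, dtype=float)
    concentration = np.asarray(concentration, dtype=float)
    scores = []
    for j, name in enumerate(band_names):
        r = np.corrcoef(reflectance[:, j], concentration)[0, 1]
        scores.append((name, r))
    # Strongest relationship first, regardless of sign.
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)

# Synthetic example: 30 samples, 3 bands; the second band drives turbidity.
rng = np.random.default_rng(0)
refl = rng.uniform(0.01, 0.20, size=(30, 3))
turbidity = 50.0 * refl[:, 1] + rng.normal(0.0, 0.1, size=30)

ranked = rank_bands_by_correlation(refl, turbidity, ["B3", "B4", "B5"])
for name, r in ranked:
    print(name, round(r, 3))
```

The bands at the top of the ranking would then form the input layer for the network of the corresponding SWQP.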

Calibration and Validation of the Developed BPNN Models
The architectural design of the proposed artificial neural network (ANN) consisted of three layers with a sigmoid activation function, which is differentiable and provides a powerful capability for modelling complex and nonlinear problems. In our study, 25 neurons were experimentally selected to form the hidden layer. Using too few neurons in the hidden layer may lead to underfitting, while using too many may lead to slow learning. The BPNN algorithm was used to map the relationship between the Landsat-8 spectral data and concentrations of SWQPs. This algorithm can result in good generalization when using either large or small datasets (MacKay, 1992). It is also computationally efficient: the ANN training phase took 4, 5, 8, 12, 22, 21, 10, 4, 18, and 11 seconds for turbidity, TSS, TS, TDS, COD, BOD, DO, pH, EC, and temperature, respectively. Additionally, an appropriate learning rate helps the algorithm reach the global minimum of the error surface; in this context, a learning rate of 0.01 was used. As shown in Figure 3, for all SWQPs, coefficients of determination were very high (R² > 0.82) at the neural network training phase, with p-value < 0.001. The final relationship between the desired output (i.e. observed concentrations of SWQPs) and the actual output (i.e. predicted by the developed network) was developed in the Matlab environment.
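A three-layer network of this kind can be sketched in Python with NumPy as follows. The sketch uses 25 sigmoid hidden neurons and a learning rate of 0.01 as stated above, but the linear output neuron, the plain batch gradient descent, the weight initialization, the number of epochs, and the synthetic data are all assumptions made for illustration; the models in this study were built in Matlab, and the exact training configuration is not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_bpnn(X, y, n_hidden=25, learning_rate=0.01, epochs=3000, seed=0):
    """Train input -> sigmoid hidden layer -> linear output by backpropagation."""
    rng = np.random.default_rng(seed)
    n_samples, n_inputs = X.shape
    W1 = rng.normal(0.0, 0.5, size=(n_inputs, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)
    target = y.reshape(-1, 1)
    losses = []
    for _ in range(epochs):
        # Forward pass.
        H = sigmoid(X @ W1 + b1)
        y_hat = H @ W2 + b2
        err = y_hat - target
        losses.append(float(np.mean(err ** 2)))
        # Backward pass: gradients of the mean squared error.
        grad_out = 2.0 * err / n_samples
        grad_W2 = H.T @ grad_out
        grad_b2 = grad_out.sum(axis=0)
        grad_Z = (grad_out @ W2.T) * H * (1.0 - H)  # sigmoid derivative
        grad_W1 = X.T @ grad_Z
        grad_b1 = grad_Z.sum(axis=0)
        # Gradient descent update.
        W1 -= learning_rate * grad_W1
        b1 -= learning_rate * grad_b1
        W2 -= learning_rate * grad_W2
        b2 -= learning_rate * grad_b2

    def predict(X_new):
        return (sigmoid(X_new @ W1 + b1) @ W2 + b2).ravel()

    return predict, losses

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for 66 samples with two standardized band inputs.
rng = np.random.default_rng(42)
X = rng.normal(size=(66, 2))
y = 2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 0] ** 2 + rng.normal(0.0, 0.1, size=66)
y = (y - y.mean()) / y.std()

predict, losses = train_bpnn(X, y)
r2 = r_squared(y, predict(X))
print("training R^2:", round(r2, 3))
```

The R² printed here is a training-phase fit on synthetic data; judging generalization would require samples held out from training.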

Extracting Accurate SWQLs
In order to properly delineate the levels of surface water quality in the SJR by using the CCMEWQI, the selected study area was subdivided into two main sites: (1) below the Mactaquac Dam and (2) above the Mactaquac Dam. As shown in Figure 6, twenty-eight water samples were collected below the dam during the first two trips. Rather than using twenty-eight water samples, 47544 water pixels, derived from the developed BPNN algorithm with R² > 0.80, were used as input to the CCMEWQI to extract the SWQL below the Mactaquac Dam. In the same way, thirty-eight samples were collected above the dam during trips 3, 4, and 5. Instead of using thirty-eight water samples, 100606 water pixels were used to delineate the accurate SWQL above the dam.
The CCMEWQI calculations were carried out, and the concentrations of TS, TDS, and pH were found to be within the permissible limits; however, turbidity, TSS, COD, BOD, DO, EC, and temperature values exceeded the limits given by the CCME and WHO standards for drinking water. The obtained CCMEWQI was 67 (Fair) in the lower basin of the SJR, which means the water quality is usually protected but occasionally threatened or impaired. The obtained SWQL for the lower basin of the SJR was found to be consistent with the results obtained by the Canadian River Institute (Kidd, Curry, & Munkittrick, 2011). Moreover, the water quality in the middle basin of the SJR was classified as 59 (Marginal), which means the water quality is frequently threatened or impaired. The main reason for obtaining different levels of water quality in the two main sites of the SJR is that the lower basin of the river has fewer agricultural and industrial activities, which may keep this part of the SJR in a better state than the middle basin of the river.
Figure 6. The two sites of the study area of the SJR
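The CCMEWQI calculation combines three factors, scope (F1), frequency (F2), and amplitude (F3), as defined in CCME (2001). The Python sketch below follows those published definitions and class boundaries. The dictionary layout, the helper names, and the objective values in the example are hypothetical placeholders rather than the CCME or WHO drinking water limits, and range-type objectives (such as an allowed pH interval) are left out for brevity.

```python
import math

def ccme_wqi(measurements, objectives):
    """Compute the CCME WQI.

    measurements: dict of parameter name -> list of positive values
    objectives: dict of parameter name -> (kind, limit), where kind is
        "max" (value must not exceed limit) or "min" (must not fall below).
    """
    n_variables = len(measurements)
    failed_variables = 0
    n_tests = 0
    failed_tests = 0
    excursion_sum = 0.0
    for name, values in measurements.items():
        kind, limit = objectives[name]
        variable_failed = False
        for value in values:
            n_tests += 1
            if kind == "max" and value > limit:
                excursion = value / limit - 1.0
            elif kind == "min" and value < limit:
                excursion = limit / value - 1.0
            else:
                continue  # test passed
            failed_tests += 1
            excursion_sum += excursion
            variable_failed = True
        failed_variables += int(variable_failed)
    f1 = 100.0 * failed_variables / n_variables   # scope
    f2 = 100.0 * failed_tests / n_tests           # frequency
    nse = excursion_sum / n_tests                 # normalized sum of excursions
    f3 = nse / (0.01 * nse + 0.01)                # amplitude
    return 100.0 - math.sqrt(f1 ** 2 + f2 ** 2 + f3 ** 2) / 1.732

def ccme_category(score):
    if score >= 95:
        return "Excellent"
    if score >= 80:
        return "Good"
    if score >= 65:
        return "Fair"
    if score >= 45:
        return "Marginal"
    return "Poor"

# Hypothetical objectives and values, for illustration only.
objectives = {"turbidity": ("max", 1.0), "DO": ("min", 5.0)}
measurements = {"turbidity": [0.8, 1.5, 0.9], "DO": [7.0, 6.5, 8.0]}

score = ccme_wqi(measurements, objectives)
print(round(score, 2), ccme_category(score))
```

In the approach described above, the value list of each parameter would hold the predicted concentrations of all water pixels in a site rather than a handful of station samples.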

CONCLUSION
Traditional analysis of physico-chemical SWQPs cannot provide the overall trends of surface water quality in water bodies. Therefore, a tool such as the WQI is needed to delineate accurate levels of surface water quality. The CCMEWQI was selected because it is very flexible in the selection of input parameters (i.e. physico-chemical SWQPs), is capable of minimizing the data volume to a great extent, and simplifies the process of surface water quality assessment. Because a representative database (i.e. water samples) is complex and difficult to provide, the WQI may be biased towards reflecting misleading SWQLs. Hence, the integration of Landsat-8 spectral data, the BPNN algorithm, and the CCMEWQI was developed to extract accurate SWQLs that are accessible to decision-makers.
The results of this study show the potential of generating generalized models to retrieve concentrations of SWQPs from satellite imagery in the SJR and other water bodies. Additionally, our study is valuable for managers and decision-makers because the CCMEWQI mechanism provides a comparative evaluation of the water quality of sampling sites and summarizes complex water quality data into simple numbers. Finally, in order to produce better research outcomes in the future, water samples should be collected in the upper basin of the SJR to determine the CCMEWQI for all parts of the river.

Figure 2. Architectural design of the proposed ANN