Assessment of Rainfall-Runoff Simulation Model Based on Satellite Algorithm

Simulation of the rainfall-runoff process is one of the most important research fields in hydrology and water resources. The models used for this purpose generally fall into two categories: conceptual and data-driven. In this study, one conceptual model and two data-driven models were used to simulate the rainfall-runoff process in the Tamer sub-catchment of the Gorganroud watershed in Iran. The conceptual model is HEC-HMS, and the data-driven models are the multilayer perceptron (MLP) neural network and support vector regression (SVR). In addition to simulating the rainfall-runoff process with recorded ground precipitation, the performance of four satellite precipitation algorithms, namely CMORPH, PERSIANN, TRMM 3B42 and TRMM 3B42RT, was studied. In the simulations, both calibration and accuracy assessment of the models were based on satellite data. The results, judged by the correlation coefficient (R), root mean square error (RMSE) and mean absolute error (MAE), showed that the SVR and MLP models simulated runoff reasonably well, but that model error increased when simulating peak flows.


1. Introduction
Water resources specialists have always looked for a sound relationship between precipitation and runoff. Many methods have been applied for this purpose, and many software models have been developed. Broadly, these models can be divided into conceptual and data-driven models. A conceptual model accounts for the physics of the problem when simulating runoff from precipitation, whereas a data-driven model only explores the hidden relationship between the model input (precipitation) and output (runoff). Consequently, implementing a conceptual model requires much more information about the watershed.
A review of published work on rainfall-runoff modeling shows that the earliest studies in this area date back to the mid-19th century. In 1851, an Irish engineer named Thomas James Mulvany created the first rainfall-runoff model, which was widely used. In 1921, Ross was the first to use a distributed hydrological model based on the concept of a hydrograph. Similar studies were conducted in the US by Zoch (1934), Turner and Burdoin (1941) and Clark (1945), and in England by Richards (1944). All of these models assumed linear routing of runoff. A major step toward solving this problem came in 1933, just a year after the concept of the unit hydrograph was introduced, when Robert Horton presented an article on runoff generation and stated that runoff is generated when precipitation intensity exceeds the infiltration capacity of the soil. In the 1960s, computers came into wide use for the first time to meet the computational demands of such models. One of the first and most successful models was the Stanford Watershed Model (SWM), created by Norman Crawford and Ray Linsley at Stanford University; it was later marketed commercially as HSP (Hydrological Simulation Program) and was widely used. In 1995, the US Environmental Protection Agency added a water quality component to this model, which survives as HSPF (Hydrological Simulation Program-Fortran) and is still used under that name. Domestic and international studies in the field of rainfall-runoff modeling are reviewed below.
Nourian et al. (2013) modeled the daily rainfall-runoff of the Gigle watershed in Ethiopia using satellite data and a neural network. The satellite data they used were the CMORPH and TMPA 3B42RT precipitation products. Before applying the artificial neural network, they used a wavelet transformation as a data pre-processing step. The results showed that the correlation between simulated and observed flow was 0.80 without the wavelet transformation and 0.93 with it. He et al. (2014), in a comparative study, investigated the performance of an artificial neural network, a fuzzy inference system and support vector regression in simulating runoff. The results showed that all three models can predict flow well, but that under closer scrutiny the support vector regression model performed better than the other two.
In this study, the efficacy of conceptual and data-driven models for rainfall-runoff modeling is compared in one Iranian watershed, the Gorganroud watershed. The conceptual model used is HEC-HMS, and the data-driven models are the artificial neural network (ANN) and support vector regression (SVR). Satellite precipitation data were also used, because precipitation data with adequate temporal and spatial coverage are not available for many parts of the country; if satellite data perform well, they can serve as a database for future studies in all fields of water resources.
First, the required information and statistics were prepared, and satellite data were extracted from the four products CMORPH, PERSIANN, TRMM 3B42 and TRMM 3B42RT. After the data were collected, the satellite precipitation data were compared with ground data at daily and monthly scales. The HEC-HMS, ANN and SVR models were then applied to simulate the rainfall-runoff process.

The rainfall-runoff models
Two views of rainfall-runoff modeling can be distinguished. The first view states that all models (even physical models) are essentially tools for extracting information from data available at different times and locations. In this view, modeling is empirical: the way the system functions is determined from the existing data, and by extracting the relationship between input and output data we can find out how the system works. This type of modeling, called data-driven modeling, is carried out regardless of the physical elements or process theory, and is also called black-box modeling. If a sound relationship can be established between the inputs and outputs of the model, there is no need to understand the physical relationships between the other elements of the watershed. In this study, the data-driven models MLP and SVR were used. The ANN is one of the most widely used artificial intelligence models and, owing to its efficiency in prediction, has received considerable attention for prediction and classification. There are several types of ANN, the best known of which is the multilayer perceptron (MLP) neural network, which is the network used in the present study. This network consists of an input layer, one or more intermediate (hidden) layers and an output layer. In this structure, every neuron of a layer is linked to every neuron of the next layer; in technical terminology, this arrangement is a fully connected network. In many complex mathematical problems that lead to complex nonlinear equations, an MLP network can readily be used once the weights and functions are defined properly. Different activation functions are used in the neurons depending on the type of problem. These networks are usually trained by the error back-propagation method.
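The layered, fully connected structure described above can be illustrated with a minimal forward-pass sketch. It assumes one hidden layer with tanh activation and a linear output neuron; the weights, the two inputs and the function name `mlp_forward` are hypothetical illustrations, not the network configuration used in this study.

```python
import math

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass of a fully connected MLP with one hidden layer.

    Hidden neurons use tanh activation; the single output neuron is linear.
    All weights here are illustrative, not trained values.
    """
    # Every input feeds every hidden neuron (complete connections).
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    # Every hidden neuron feeds the output neuron.
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Two hypothetical inputs (e.g. scaled precipitation and previous-day flow),
# three hidden neurons, one output (simulated flow).
w_hidden = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
b_hidden = [0.0, 0.1, -0.1]
w_out = [1.0, -0.5, 0.25]
b_out = 0.05
q_sim = mlp_forward([0.2, 0.6], w_hidden, b_hidden, w_out, b_out)
```

Training by back-propagation would adjust `w_hidden`, `b_hidden`, `w_out` and `b_out` to reduce the error between `q_sim` and the observed flow; that step is omitted here.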
The second data-driven method used is the support vector machine (SVM), a relatively recent data-driven model. After a training process, this model is able to classify or predict data. The method was originally introduced by Vapnik (1995) as a powerful way to classify data. In the present study, the regression form of SVM, known as support vector regression (SVR), was used to model rainfall-runoff. Because SVR is a data-driven model, it does not explicitly consider the physics of the problem in its predictions and must be trained before it can estimate a new dependent variable from the independent variables. For this purpose, a set of available data is given to SVR in the training stage, after which it can be used as a simulation model. Vapnik (1995) defined two functions for SVR. The first is an error function that measures the deviation of the values calculated by SVR from the observed values during training; the second is a linear function that calculates the output from the input values using weights and a bias.
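The two functions mentioned above can be sketched as follows. The source does not give their formulas, so this sketch assumes the standard ε-insensitive loss for the error function and a plain weighted sum plus bias for the linear function; the function names and sample values are hypothetical.

```python
def epsilon_insensitive_loss(observed, predicted, epsilon):
    """Error function: deviations smaller than epsilon cost nothing,
    larger deviations are penalised linearly (assumed standard form)."""
    return max(0.0, abs(observed - predicted) - epsilon)

def linear_svr_predict(x, weights, bias):
    """Linear function: output as a weighted sum of the inputs plus a bias."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias

# A prediction within the epsilon tube is not penalised...
inside_tube = epsilon_insensitive_loss(10.0, 10.3, epsilon=0.5)
# ...while one outside the tube is penalised by the excess deviation.
outside_tube = epsilon_insensitive_loss(10.0, 12.0, epsilon=0.5)
```

In nonlinear SVR the same linear function is applied after the inputs are mapped through a kernel, which is where a kernel parameter such as γ enters.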
Establishing this input-output relationship is sometimes very complex; in such cases physical interpretation is required, and theoretical analysis alone is not sufficient. The second view states that models should reflect, as far as possible, the user's physical understanding of the processes involved in the event being modeled. Only with this type of modeling can the user be confident that predictions outside the range of the observed data, that is, predictions for other times and places, are reliable. In this view, modeling proceeds by inference; this is conceptual modeling based on the physics of the problem. To model the rainfall-runoff process, conceptual models need to represent the interaction of surface and subsurface flows as well as evaporation, transpiration and snowmelt. In this study, the conceptual model HEC-HMS was used to model the rainfall-runoff process. HEC-HMS is hydrological simulation software with a parameter optimization capability, designed by the Hydrologic Engineering Center (HEC) of the US Army. This center has produced various hydrologic modeling tools, such as PRECIP (1989), HEC-IF (1989), HEC-IFH (1992) and HEC-1 (1998). HEC-1 is in effect the first version of HEC-HMS; versions 2 and 3 were released subsequently. The latest version of the software is version 3.5, released in 2010. Modeling in HEC-HMS is conceptual. For conceptual and distributed modeling of a watershed, the hydrologist faces a wide variety of geological features, soils, vegetation, land uses and topographic characteristics that affect the relationship between precipitation and runoff. Such models are difficult to use for two reasons: first, a massive amount of information is required, most of which is not directly measurable; and second, powerful computing resources are required.
However, many points in a watershed have similar hydrologic behavior, that is, they are similar in their water balance and in how they produce surface and subsurface runoff. By classifying the points of the watershed according to hydrological similarity, it is possible to build simple forms of distributed models based on the distribution of basic hydrological responses in the watershed, without examining each point individually. Therefore, to model the different processes that contribute to runoff in the watershed (ponds, potholes, infiltration, etc.), HEC-HMS uses several storage elements that alternately fill during precipitation and empty in the periods between precipitation events. When a reservoir is full, the additional precipitation is assumed to reach the channel as runoff (fast runoff). The storage element is also allowed to drain in the intervals between precipitation events (slow runoff); this flow is returned to the river as baseflow, and the reservoir thus returns to its state before precipitation. Evapotranspiration from each reservoir takes place over time between precipitation events.
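The fill-and-empty behavior of a storage element described above can be sketched with a single bucket. This is a deliberately simplified illustration, not the HEC-HMS algorithm: the function name `simulate_storage`, the linear drainage rule and the constant evapotranspiration loss are all assumptions made for the example.

```python
def simulate_storage(precip, capacity, drain_rate, evap, storage=0.0):
    """Single storage element: fills during rain, spills when full,
    drains and loses water to evapotranspiration on dry steps.

    Returns (fast_runoff, slow_runoff, final_storage), all in depth units.
    """
    fast, slow = [], []
    for p in precip:
        storage += p
        # Water above capacity reaches the channel as fast runoff.
        overflow = max(0.0, storage - capacity)
        storage -= overflow
        release = 0.0
        if p == 0.0:
            # Between events the element drains to the river (baseflow)...
            release = drain_rate * storage
            storage -= release
            # ...and loses water to evapotranspiration.
            storage = max(0.0, storage - evap)
        fast.append(overflow)
        slow.append(release)
    return fast, slow, storage

# One wet step followed by two dry steps (illustrative values in mm).
fast, slow, remaining = simulate_storage(
    [30.0, 0.0, 0.0], capacity=20.0, drain_rate=0.5, evap=1.0
)
```

With these illustrative values the wet step spills 10 mm as fast runoff, and the two dry steps release 10 mm and 4.5 mm as slow runoff.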

Case study
The Gorganroud watershed is located in the northeast of the Iranian plateau and is one of the second-grade watersheds on the eastern margin of the Caspian Sea. It is bounded by the Atrak watershed to the north, the Alborz mountains and the central desert watershed to the south, the Heraz Nekah watershed to the northwest, and the Kopeh Dagh Heights and Gharaghom watershed to the east and southeast. The Gorganroud watershed lies between 53º 57΄ and 59º 3΄ east longitude and 36º 57΄ and 38º 17΄ north latitude. The highest point of the watershed is at an elevation of 3,200 m. The area of the watershed is 11,300 km², and its main rivers are the Gorganroud, Gharasou, Zaw, Gharachai and Mohammadabad. The watershed consists of mountainous and plain areas, with plains making up 55% of it.
This study simulates the daily flows of the Tamer catchment, one of the catchments of the Gorganroud watershed. The catchment, in Golestan province, has an area of 1525.3 km² and lies between 37º 24΄ and 37º 49΄ north latitude and 55º 29΄ and 56º 4΄ east longitude. The highest point is in the Khoshyeillagh region in the south of the watershed, at an elevation of 2098 m, and the lowest point is the south of the watershed at Golestan Dam II, at 117 m above sea level. The centroid of the watershed is located at 37º 36΄ 19΄΄ north latitude and 55º 47΄ 46΄΄ east longitude, at an elevation of 900 m. Fig. 2 shows the location of the Tamer watershed. The watershed has a limited number of rain gauge stations. Most of these stations have only short records, except the Tamer station, which has daily precipitation records for the past 40 years. The records of the Tamer hydrometric station at the outlet of the watershed were used as the observed runoff for calibration and evaluation of the models.
In this study, daily precipitation data from four satellite algorithms were used along with ground data. The four databases are CMORPH, PERSIANN, TRMM 3B42 version 7 and TRMM 3B42RT version 7.
The CMORPH model was proposed by Joyce et al. (2004) at the National Oceanic and Atmospheric Administration (NOAA). The output of this model is the precipitation rate derived from satellite images and is available from the NOAA site at ftp://ftp.cpc.ncep.noaa.gov. The temporal and spatial resolutions of the model are 3 hours and 0.25 degrees. Another version of the model has a temporal and spatial resolution of 30 minutes and 8 km, but its data are available only for the most recent two months. The model covers latitudes from 60º south to 60º north. Passive microwave data for CMORPH are provided by the SSM/I sensor on the DMSP13, DMSP14 and DMSP15 satellites, by the AMSU-B sensor on NOAA15, NOAA16 and NOAA17, and by the TMI sensor on the TRMM satellite. Infrared images are obtained from the Meteosat-5, Meteosat-7, GOES-8, GOES-10 and GMS-5 satellites (Joyce et al., 2004).
PERSIANN is a remote-sensing precipitation estimation algorithm based on an artificial neural network. Hsu et al. (1999) developed this model at the University of Arizona. The input to the basic model is the cloud-top temperature derived from infrared images taken by geostationary satellites, including GOES-8, GOES-9 and GMS. The main characteristic of geostationary satellite images is their high temporal resolution; however, these images have low spatial resolution, because these satellites are farther from the earth than polar-orbiting satellites. Using these images, PERSIANN estimates precipitation intensity at the land surface (Hong et al., 2004). To enhance the spatial resolution, an algorithm was created that uses images from the polar-orbiting TRMM, NOAA-13 and NOAA-14 satellites together with the artificial neural network, giving a spatial resolution of 0.25×0.25 degrees over the tropics at time steps of 0.5 hour (Sorooshian et al., 2002). The data of this database are available at http://chrs.web.uci.edu.
The Tropical Rainfall Measuring Mission (TRMM) began in 1997. It is part of an international NASA project whose purpose is to obtain accurate estimates of precipitation in tropical and subtropical regions. The TRMM satellite was launched by the United States and Japan in 1997. It is equipped with sensors such as the precipitation radar (PR), the microwave imager (TMI) and the visible and infrared scanner (VIRS). The database has several products. For example, the TRMM 3A12 product provides mean precipitation values and 14 vertical profiles of precipitable water, cloud ice and latent heat. The resolution of this product is 0.5×0.5 degrees of longitude and latitude at a monthly time step, its spatial coverage extends from 40º south to 40º north, and its data are available from December 1997. Another product is TRMM 3B42. These data are daily and have a spatial resolution of 0.25×0.25 degrees. The TRMM 3B42 database also incorporates data from ground stations: gridded precipitation data from the Global Precipitation Climatology Centre (GPCC), established by Germany, are combined with the satellite estimates. Gridded precipitation data from TRMM 3B42 are available from 1998 to date with a two-month delay. Their spatial coverage extends from 50 degrees south to 50 degrees north and from 180 degrees west to 180 degrees east. In addition to TRMM 3B42, the data of TRMM 3B42RT, a near-real-time version, were also used. The data of this database are available at http://disc2.nascom.nasa.gov/tovas.

Discussion and results
In this research, satellite data were used to train the models, and the global data were used at the accuracy assessment stage. Under this approach, the MLP, SVR and HEC-HMS models were trained alongside one another.

Comparison of the land and satellite precipitations:
In addition to the conventional measures of correlation coefficient and root mean square error, the accuracy of the satellite data was assessed with three further criteria: probability of detection (POD), false alarm ratio (FAR) and critical success index (CSI), which are defined as:
$POD = \frac{RR}{RR + RN}, \quad FAR = \frac{NR}{RR + NR}, \quad CSI = \frac{RR}{RR + RN + NR}$
where R indicates the presence of rainfall and N indicates its absence. In each combination, the first letter refers to the station and the second letter to the satellite data. For example, NR is the number of days on which no precipitation occurred at the station but the satellite data show precipitation. In the best possible case, the values of POD, FAR and CSI are 1, 0 and 1, respectively. In the comparison of satellite data, TRMM 3B42 shows the highest correlation with ground precipitation over the years 2003-2004 to 2007-2008. Based on the RMSE criterion, the CMORPH data performed slightly better than the other algorithms. Based on the POD criterion, the PERSIANN data, with a POD of 0.602, performed better than the other models. PERSIANN also performed best among the four algorithms on both the FAR and the CSI criteria. Table 1 shows the values of these parameters for all algorithms.
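The three detection criteria can be computed from paired daily series as sketched below, using the usual contingency-table definitions POD = RR/(RR+RN), FAR = NR/(RR+NR) and CSI = RR/(RR+RN+NR). The function name, the rain/no-rain threshold of 0 mm and the sample values are assumptions for illustration; the source does not state the threshold used.

```python
def detection_scores(station, satellite, threshold=0.0):
    """Return (POD, FAR, CSI) for paired daily precipitation series.

    RR: rain at station and in satellite data (hit)
    RN: rain at station only (miss)
    NR: rain in satellite data only (false alarm)
    A day counts as rainy when precipitation exceeds `threshold` (assumed).
    """
    rr = rn = nr = 0
    for gauge, sat in zip(station, satellite):
        gauge_rain = gauge > threshold
        sat_rain = sat > threshold
        if gauge_rain and sat_rain:
            rr += 1
        elif gauge_rain:
            rn += 1
        elif sat_rain:
            nr += 1
    nan = float("nan")
    pod = rr / (rr + rn) if rr + rn else nan
    far = nr / (rr + nr) if rr + nr else nan
    csi = rr / (rr + rn + nr) if rr + rn + nr else nan
    return pod, far, csi

# Five illustrative days (mm): two hits, one miss, one false alarm.
pod, far, csi = detection_scores([5, 0, 3, 0, 2], [4, 1, 0, 0, 6])
```

Days on which neither source reports rain (NN) do not enter any of the three scores.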

Evaluation criteria
To evaluate the models and compare the results, three evaluation criteria were used: the correlation coefficient (R), root mean square error (RMSE) and mean absolute error (MAE). These criteria are defined as follows:
$R = \frac{\sum_{i=1}^{n}(O_i-\bar{O})(f_i-\bar{f})}{\sqrt{\sum_{i=1}^{n}(O_i-\bar{O})^2 \sum_{i=1}^{n}(f_i-\bar{f})^2}}, \quad RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(O_i-f_i)^2}, \quad MAE = \frac{1}{n}\sum_{i=1}^{n}\left|O_i-f_i\right|$ (1)
In these relations, $O_i$ is the observed flow at the ith time step, $f_i$ is the simulated flow at the ith time step, $\bar{O}$ is the mean of the observed flows, $\bar{f}$ is the mean of the simulated flows and n is the number of data points. A model with higher R and lower RMSE and MAE is better. Although the daily time scale was used in this study, the accuracy of the algorithms at the monthly scale was also examined carefully, because most water resources studies work with monthly data. In this scenario, the satellite data were used at all stages of modeling: the models were trained once for each algorithm and then tested in a second stage. The MLP and SVR models performed well, but the HEC-HMS model failed to model the rainfall-runoff process and gave unsatisfactory results; for this reason, its results are not presented here, and we consider it unsuitable for use with satellite algorithms. The reason may be the insufficient precision of the daily data provided by CMORPH, PERSIANN, TRMM 3B42 and TRMM 3B42RT. Because conceptual models are concerned with the physics of the problem, they cannot perform well with imprecise precipitation. Data-driven models, by contrast, can perform properly with precipitation data that on average have a positive or negative bias relative to the real data. For example, if the PERSIANN data generally underestimate precipitation by 10 percent, a data-driven model trained on these data can still perform well when tested on data of the same type. The table below shows the relevant efficacy criteria. According to the results, the performance of the data-driven models has in most cases improved relative to the second scenario, which may be explained by the reason just described. Comparison of the models shows that SVR performed better than MLP in most cases. However, this superiority is clear-cut only for the correlation coefficient; on the other criteria neither model has a decisive advantage over the other.
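The three evaluation criteria used in this section (R, RMSE and MAE) can be computed as in the following sketch, which uses their standard definitions with observed values O and simulated values f. The function name and the sample series are hypothetical.

```python
import math

def evaluation_criteria(observed, simulated):
    """Return (R, RMSE, MAE) for paired observed and simulated flows."""
    n = len(observed)
    o_mean = sum(observed) / n
    f_mean = sum(simulated) / n
    # Pearson correlation coefficient between observed and simulated flows.
    cov = sum((o - o_mean) * (f - f_mean) for o, f in zip(observed, simulated))
    ss_o = sum((o - o_mean) ** 2 for o in observed)
    ss_f = sum((f - f_mean) ** 2 for f in simulated)
    r = cov / math.sqrt(ss_o * ss_f)
    # Root mean square error and mean absolute error.
    rmse = math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, simulated)) / n)
    mae = sum(abs(o - f) for o, f in zip(observed, simulated)) / n
    return r, rmse, mae

# A simulation that is perfectly correlated but biased by +1 unit.
r, rmse, mae = evaluation_criteria([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```

The example illustrates why R alone can be misleading: a constant bias leaves R at 1 while RMSE and MAE both equal the size of the bias.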
Table 3 shows the relevant efficacy criteria. Among the precipitation data of the satellite databases, the TRMM data performed better than the other satellite data. This superiority is most obvious in the simulation of flow by the SVR model based on TRMM data. In general, the accuracy of flow simulation by the data-driven models using satellite data can be considered adequate. It should be noted that the data-driven models use the flow of previous days as an input, and this strength compensates considerably for the shortcomings of the satellite precipitation data. Moreover, the data-driven models require far less input data than the conceptual models.
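The use of previous-day flow as a model input, noted above, amounts to building lagged input vectors. The sketch below assumes each sample combines the current day's precipitation with the flow of the preceding `n_lags` days; the function name, the number of lags and the sample values are hypothetical, since the source does not state the exact input structure.

```python
def build_lagged_samples(precip, flow, n_lags=1):
    """Build (features, target) pairs for a data-driven rainfall-runoff model.

    features: [precipitation on day t, flow on day t-1, ..., flow on day t-n_lags]
    target:   flow on day t
    """
    samples = []
    for t in range(n_lags, len(flow)):
        features = [precip[t]] + [flow[t - k] for k in range(1, n_lags + 1)]
        samples.append((features, flow[t]))
    return samples

# Four illustrative days; with two lags only days 2 and 3 yield samples.
samples = build_lagged_samples(
    [0.0, 5.0, 2.0, 0.0], [1.0, 1.5, 3.0, 2.0], n_lags=2
)
```

Because consecutive daily flows are strongly related, these lagged terms carry much of the predictive information, which is consistent with the models coping with imperfect satellite precipitation.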
Figure 3 shows the time series simulated using CMORPH satellite data. According to the figure, there is good agreement between the observed and simulated time series. The SVR model slightly overestimated the flow in the training stage. This overestimation is also seen in the MLP model, but to a lesser degree.

4. Conclusion
Simulation of river flow is one of the most important fields of research in water resources. It is usually carried out by relating precipitation to runoff, so that the flow rate can be estimated from the amount of precipitation. In this research, we studied the modeling of the rainfall-runoff process in the Gorganroud watershed. Because the effects of human activities and water control structures often cannot be removed properly, the Tamer catchment, located upstream in the Gorganroud watershed, was taken as a sample of the watershed, and the rainfall-runoff modeling was based on this catchment.
Another important element of this study was the evaluation of the accuracy of satellite data in simulating the rainfall-runoff process. For this purpose, daily precipitation from four satellite products, CMORPH, PERSIANN, TRMM 3B42 and TRMM 3B42RT, was used. The models used were one conceptual model, HEC-HMS, and two data-driven models, MLP and SVR. The results were discussed in three parts. First, the accuracy of the daily precipitation was examined; then the performance of the HEC-HMS, MLP and SVR models was studied and the efficacy of the satellite data in simulating rainfall-runoff was investigated. In this approach, the satellite data were used in both the calibration stage and the accuracy verification stage.
The results, based on the five criteria of correlation coefficient, RMSE, POD, FAR and CSI, showed that the TRMM 3B42 satellite data had the highest correlation with ground precipitation during the water years 2003-2004 to 2007-2008. Based on the POD criterion, the PERSIANN data, with a POD of 0.602, performed better than the other models. PERSIANN also performed best among the four algorithms on both the FAR and the CSI criteria.
The results showed that the satellite products reported less precipitation than the actual amount. The total observed precipitation during the study period was 3036 mm, while CMORPH, PERSIANN, TRMM 3B42 and TRMM 3B42RT reported totals of 1520, 2536, 2924 and 2242 mm, respectively. These databases therefore reported 1515, 500, 112 and 794 mm less than the observed value, respectively. On this criterion, the TRMM 3B42 data performed better than the others. At the monthly scale, where the comparison was based on the correlation coefficient and RMSE, the TRMM 3B42 database, with a correlation of 0.64 and an RMSE of 27.7, agreed best with the precipitation data of the Tamer station.
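The shortfall of each product relative to the observed total can be reproduced from the totals quoted above, as in this short sketch. The totals are taken from the text; the variable and function names are illustrative, and the percentage shortfall is an added convenience rather than a figure reported in the study.

```python
observed_total_mm = 3036
satellite_totals_mm = {
    "CMORPH": 1520,
    "PERSIANN": 2536,
    "TRMM 3B42": 2924,
    "TRMM 3B42RT": 2242,
}

def precipitation_shortfall(observed, satellite_totals):
    """Return {product: (shortfall in mm, shortfall as % of observed)}."""
    return {
        name: (observed - total, 100.0 * (observed - total) / observed)
        for name, total in satellite_totals.items()
    }

shortfalls = precipitation_shortfall(observed_total_mm, satellite_totals_mm)
# Product whose total is closest to the observed total.
closest = min(shortfalls, key=lambda name: abs(shortfalls[name][0]))

for name, (mm, pct) in shortfalls.items():
    print(f"{name}: {mm} mm below observed ({pct:.1f}%)")
```

Computed from the rounded totals, the CMORPH difference comes out 1 mm larger than the 1515 mm quoted in the text, presumably because the published totals were rounded.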
According to the modeling results, the SVR and MLP models had good accuracy. However, this accuracy can be attributed to the flow of earlier days, which was used as a model input. At flow peaks, where the corresponding precipitation must supply the information the models need for a reasonable estimate, the quality of the precipitation data becomes decisive and the flow is underestimated. This is more visible in the third scenario.
In sum, the simulations indicate that the data-driven models performed better than the conceptual model. Simpler and faster use and the need for less data are further strengths of these models. The SVR and MLP models both performed acceptably; under closer scrutiny, however, the SVR model performed better than the MLP model. It should be noted that MLP is slightly faster to use than SVR, and that obtaining an optimal MLP model requires optimizing only the number of neurons, whereas in SVR the three parameters C, ε and γ must be determined properly for the model to perform well.

Figure 1: Daily precipitation of Tamer station along with precipitation reported by satellite data

Figure 2: Monthly precipitation of Tamer station along with precipitation reported by satellite data

Figure 3: The simulated time series of the flow by models of MLP and SVR according to CMORPH satellite data

Figure 4: The simulated time series of the flow by models of MLP and SVR according to PERSIANN satellite data

Figure 5: The simulated time series of the flow by models of MLP and SVR according to TRMM satellite data

Figure 6: The simulated time series of the flow by models of MLP and SVR according to RT_TRMM satellite data

Table 1: The values of the efficacy criteria for comparing satellite data with daily precipitation data at Tamer station

Table 2: The values of the efficacy criteria for comparing satellite data with monthly precipitation data at Tamer station

Table 3: The criteria of the efficacy in the simulation of the flow by different models based on satellite data