SEASONAL COMPARISON OF ERA-INTERIM PRECIPITATION DATASET FOR ENTIRE INDIAN REGION

Era-Interim (ECMWF Reanalysis) is a global atmospheric reanalysis product that has been continuously updated in real time since 1979. It is also termed a third-generation reanalysis product. Era-Interim provides meteorological variables such as precipitation and temperature. In the present work, the 3-hourly Era-Interim precipitation product for the whole of India is compared with gridded data provided by IMD for the period 1979-2013 and with APHRODITE data for the period 1979-2007. The comparison is done on a seasonal basis, with the seasons chosen according to the pattern of rainfall: DJF (December, January and February), MAM (March, April and May), JJAS (June, July, August and September) and ON (October and November). The Era-Interim 3-hourly products are converted into daily products, which are then used to form seasonal images for each year. The yearly images are averaged to give four images, one per season, representing the average rainfall (mm/day) over the entire study period. The same is done for the IMD and APHRODITE data. The four Era-Interim images are then compared with the reference images of the IMD 0.5° x 0.5° gridded rainfall data and of the APHRODITE 0.5° x 0.5° gridded rainfall data. The correlation coefficient and the RMSE are calculated for each season. The mean value is compared with the means of the IMD and APHRODITE rainfall products, and the bias in the mean is calculated, along with scatter plots of Era-Interim against the reference datasets. Era-Interim shows a high correlation coefficient and a low RMSE in certain regions and in specific seasons. The scatter plots also show good correlation in all seasons, and the bias maps show very little bias in specific seasons for certain regions. The suitability maps prepared for the study region show that most of the region lies in the most suitable range and very little in the unsuitable range.


INTRODUCTION
The Indian monsoon is an important component of Earth's climate system (Mitra et al., 2013). Accurate forecasting of its mean rainfall is therefore essential for regional food and water security (Kaur & Kaur, 2017). The summer monsoon season is termed the southwest monsoon, after the direction of the surface winds (Gadgil & Rajeevan, 2008). Rainfall during the Indian summer monsoon season shows considerable spatial and temporal variability (Dash et al., 2013). Accurate measurement of rainfall is very important for water-related applications, the evaluation of numerical models, and the detection and attribution of trends, and a variety of freely available gridded rainfall datasets exist for these purposes (Prakash et al., 2015). The focus here is on one such product, the 3-hourly Era-Interim meteorological dataset. The precipitation parameter of this dataset is considered along with the 0.5° x 0.5° IMD gridded rainfall dataset and the 0.5° x 0.5° APHRODITE dataset.

Seasons of India based on precipitation
India has large variations in its climate. Taking into account the variations in its elements, i.e. air temperature, amount of rainfall, air currents, etc., the climate of India can be classified into four seasons:

December-January-February (DJF)
The year's coldest months are December and January, when average temperatures remain around 10-15 °C in the

June-July-August-September (JJAS)
This is the monsoon season across the whole of India. The season is dominated by the humid southwest summer monsoon, which slowly extends across the country beginning in early June. Monsoon rains begin receding from North India at the beginning of October. The season is dominated by humid winds.

October-November (ON)
This is the post-monsoon season. It brings less rainfall across the country, although there are still plenty of showers in the southern coastal region and some in north-western India. The state of Tamil Nadu receives most of its rain in this season.

The main objective of the study is to identify the effectiveness of the Era-Interim dataset for precipitation information on a seasonal and regional basis over the entire Indian region.

Statistical Parameters
The effectiveness of the Era-Interim dataset over the Indian region is assessed on the basis of statistical parameters, viz. Root Mean Square Error (RMSE), correlation coefficient, bias, etc. These statistical parameters are computed for the Era-Interim dataset with reference to the IMD and APHRODITE datasets, respectively. A brief description of each statistical parameter used in the present study, and its significance, is given below:

Correlation coefficient
It is a statistical measure of the degree to which changes in the value of one variable predict changes in the value of another. In positively correlated variables, the values increase or decrease in tandem. Its value lies between -1 and +1. A correlation coefficient of zero indicates that there is no linear relationship between the fluctuations of the variables (Rouse, 2018).
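As an illustration of the definition above, the following minimal sketch computes the Pearson correlation coefficient for two lists of rainfall values. The paper does not state which correlation coefficient was used, so Pearson is an assumption here, and the function name `pearson_r` and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations and sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical seasonal rainfall (mm/day) for five grid cells.
reanalysis = [1.0, 2.0, 3.0, 4.0, 5.0]
reference = [2.0, 4.0, 6.0, 8.0, 10.0]
print(pearson_r(reanalysis, reference))
```

Because the second list is an exact positive multiple of the first, the two rise in tandem and the coefficient sits at the upper end of its range.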

Root mean square error (RMSE)
It is the standard deviation of the residuals (prediction errors), i.e. a measure of the spread of the residuals. It is widely used to measure performance in meteorology, air quality and climate change studies (Chai & Oceanic, 2015). The smaller the RMSE, the better the agreement between the estimate and the reference. Its value depends upon the range of the values considered in its calculation.
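A minimal sketch of the RMSE calculation follows, assuming the two datasets have already been reduced to paired lists of values on the same grid cells. The function name `rmse` and the sample values are hypothetical. Note that the RMSE equals the standard deviation of the residuals only when their mean is zero.

```python
import math


def rmse(estimate, reference):
    """Root mean square error between paired estimate and reference values."""
    if len(estimate) != len(reference) or not estimate:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimate, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical seasonal rainfall (mm/day) at three grid cells.
reanalysis = [2.0, 4.0, 6.0]
reference = [1.0, 4.0, 8.0]
print(rmse(reanalysis, reference))  # square root of (1 + 0 + 4) / 3
```

The result carries the units of the input (here mm/day), which is why its magnitude depends on the range of rainfall values in the season being evaluated.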

Scatter plots
It is a two-dimensional graph in which the values of two variables are plotted along the two axes. It gives a general illustration of the relationship between the two variables (Diana Mindrila, 2003) and shows how well one variable is correlated with the other.

Bias
It is a feature of a statistical technique that tells how much the expected value of the results differs from the true value being estimated. The technique is said to be unbiased if this difference is zero (Šimundić, 2013).
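The sketch below shows one common way to express bias for gridded rainfall: the mean of the cell-by-cell differences. The sign convention (estimate minus reference) is an assumption, since the paper does not state it, and the names `cell_bias` and `mean_bias` are hypothetical.

```python
def cell_bias(estimate, reference):
    """Per-cell bias, taken here as estimate minus reference (assumed convention)."""
    if len(estimate) != len(reference):
        raise ValueError("sequences must have equal length")
    return [e - r for e, r in zip(estimate, reference)]


def mean_bias(estimate, reference):
    """Mean of the per-cell differences; zero means unbiased on average."""
    differences = cell_bias(estimate, reference)
    if not differences:
        raise ValueError("sequences must not be empty")
    return sum(differences) / len(differences)


# Hypothetical seasonal rainfall (mm/day) at three grid cells.
reanalysis = [3.0, 2.0, 5.0]
reference = [2.0, 2.5, 4.0]
print(cell_bias(reanalysis, reference))  # [1.0, -0.5, 1.0]
print(mean_bias(reanalysis, reference))  # 0.5
```

With this convention a positive value means the reanalysis is wetter than the reference, and positive and negative cell values can cancel in the mean, which is why a per-cell map is useful alongside the single figure.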

Study Area
The region selected for the study is the whole of India, and the analysis is carried out for the years 1979 to 2013. India lies in the northern and eastern hemispheres. The mainland of India extends from 8° 4' 28" N to 37° 17' 53" N latitude and from 68° 7' 53" E to 97° 24' 47" E longitude (Figure 1). It is a unique region that contains very hot and very cold areas, as well as regions with very heavy and very scanty rainfall. The Indian monsoon is considered the most prominent monsoon in the world monsoon system, and it primarily affects India and its surroundings. It is considered a vast climate system because of its variability (Wang, 2005). The Indian monsoon is a unique weather phenomenon with a seasonal reversal of winds. It also shows a sudden onset, a gradual advance and a gradual retreat. The variation of the monsoon over India is both regional and temporal, which makes India well suited to studying the effectiveness of any climate dataset.

Data Used
The datasets used in the study are Era-Interim, APHRODITE and IMD (India Meteorological Department). Era-Interim is an atmospheric reanalysis produced by the European Centre for Medium-Range Weather Forecasts (ECMWF), an independent intergovernmental organisation supported by 34 states; it covers the period from 1 January 1979 onwards. Reanalysis provides a multivariate, spatially complete and coherent record of the global atmospheric circulation (Balsamo et al., 2015). APHRODITE's (Asian Precipitation - Highly-Resolved Observational Data Integration Towards Evaluation) daily precipitation is the only available long-term (1951 onward) continental-scale gridded product. It is based on a dense network of daily rain-gauge data for Asia, including the Himalayas, South and Southeast Asia and mountainous areas in the Middle East (Kamiguchi et al., 2010). The India Meteorological Department (IMD) is the premier agency in India responsible for meteorological observations, weather forecasting and seismology. The IMD dataset is based upon observational data from various types of surface and upper-air observations. The datasets used in the present study, along with their spatial and temporal resolution, are shown in Table 1.

Table 1. Datasets used in the present study

Figure 1. Geo-Political Boundary of India, the Study Area.

Software Used
The available data is processed using different software for analysis, visualisation, map making and generation of results. The software used in the present study is ArcGIS 10.3, Python 2.7 and IMD_data_converter_50. Python is an interpreted high-level programming language with a very large number of libraries for image processing and analysis. Because its code is executed by an interpreter, it is easy to incorporate Python into software such as ArcGIS and QGIS. The libraries used are GDAL, OGR, OS, PIL, NumPy, NetCDF, etc. With these, an image is read as an array, the array is processed, and the output array is written back to an image product. Python 2.7 is used in the present study for handling the spatial datasets of IMD, Era-Interim and APHRODITE. IMD_data_converter_50 is software developed in-house for converting the IMD gridded GRD format files into point shapefiles. ArcGIS 10.3 is used in the present study for database management and for compiling, creating and analysing data.

METHODOLOGY
The present study is carried out in two steps: data preparation and data analysis. Data extraction and preparation is the most voluminous task of the study. Once the data preparation is complete, the required statistical parameters are calculated for various temporal and spatial combinations. The broad methodology of the present study is shown in Figure 2.

Data Preparation
As discussed earlier, data preparation was the most voluminous task of the present study because of the large spatial extent and temporal coverage of the study domain. Routines and sub-routines were developed in Python 2.7 to automate the repeated steps of data preparation. The methodology for data preparation is described below.

Era-Interim data conversion
The 3-hourly rainfall/precipitation dataset of Era-Interim, which is reanalysed every 12 hours, is first converted to daily GeoTiff files using a Python script. The seasonal products for each year are then generated from these daily datasets. The seasonal products of all years are also averaged to form the mean file for each season for the entire study period.
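The 3-hourly to daily step can be sketched for a single grid cell as below. This is not the authors' script. It assumes that the 3-hourly values are accumulated from the start of each 12-hourly forecast (four steps per forecast), so they are first differenced into per-interval amounts and then summed over eight intervals per day. If the downloaded values are already per-interval amounts, only `daily_totals` would be needed. Both function names are hypothetical.

```python
def deaccumulate(accumulated, steps_per_forecast=4):
    """Turn within-forecast accumulations into per-interval amounts."""
    increments = []
    for i, value in enumerate(accumulated):
        if i % steps_per_forecast == 0:
            # First step of a forecast: the accumulation is the increment.
            increments.append(value)
        else:
            increments.append(value - accumulated[i - 1])
    return increments


def daily_totals(three_hourly, steps_per_day=8):
    """Sum consecutive 3-hourly amounts (mm) into daily totals (mm/day)."""
    if len(three_hourly) % steps_per_day != 0:
        raise ValueError("series does not cover a whole number of days")
    return [
        sum(three_hourly[i:i + steps_per_day])
        for i in range(0, len(three_hourly), steps_per_day)
    ]


# One hypothetical day at one cell: two forecasts of four 3-hourly steps each.
accumulated = [1.0, 3.0, 3.0, 4.0, 0.0, 0.0, 2.0, 5.0]
per_interval = deaccumulate(accumulated)
print(per_interval)                # [1.0, 2.0, 0.0, 1.0, 0.0, 0.0, 2.0, 3.0]
print(daily_totals(per_interval))  # [9.0]
```

In a gridded workflow the same arithmetic would be applied to whole arrays rather than to one cell at a time.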

IMD data conversion
The IMD gridded data, available in GRD format, is converted into point feature classes using the IMD_data_converter_50 tool. The point features are then clipped to the study region. The feature classes are further edited, and new attributes for mean monthly precipitation and mean seasonal precipitation are added using a Python script. The mean seasonal attributes are then interpolated to form the GeoTiff product for each season on a yearly basis. The seasonal products of all years are also averaged to form the mean file for each season for the entire study period.

APHRODITE data conversion
The 0.5° x 0.5° gridded APHRODITE data for Monsoon Asia is downloaded in NetCDF format. The NetCDF files are converted into daily GeoTiff files and averaged to form the seasonal file for each year using Python scripts developed for this purpose. The seasonal products of all years are then averaged to form a single file for each season for the entire study period.
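The two averaging steps described above (daily grids to a seasonal grid for one year, then yearly seasonal grids to a single grid for the study period) can be sketched with one cell-wise mean function. The grids are plain nested lists here, `None` marks cells without data, and the name `mean_grid` and all values are hypothetical; the authors' scripts worked on GeoTiff rasters.

```python
def mean_grid(grids, nodata=None):
    """Cell-wise mean of equally shaped 2-D grids, skipping nodata cells."""
    if not grids:
        raise ValueError("need at least one grid")
    rows, cols = len(grids[0]), len(grids[0][0])
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            values = [g[r][c] for g in grids if g[r][c] != nodata]
            # A cell with no valid value in any grid stays nodata.
            row.append(sum(values) / len(values) if values else nodata)
        result.append(row)
    return result


# Step 1: two hypothetical daily grids (mm/day) give one seasonal grid.
day_1 = [[1.0, 2.0], [None, 4.0]]
day_2 = [[3.0, 2.0], [None, 0.0]]
season_1979 = mean_grid([day_1, day_2])
print(season_1979)  # [[2.0, 2.0], [None, 2.0]]

# Step 2: seasonal grids of several years give the study-period mean.
season_1980 = [[4.0, 0.0], [None, 6.0]]
print(mean_grid([season_1979, season_1980]))  # [[3.0, 1.0], [None, 4.0]]
```

Averaging the yearly seasonal means gives every year equal weight, which matches a mean over all days only when each year contributes the same number of days to the season.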

Data Analysis
The seasonal products for each dataset, prepared as described in the previous section, are used for the statistical analysis. First, a fishnet with labels is created at the same grid size of 0.5° x 0.5°. The value at each grid label is then extracted for all the seasonal products of each dataset. The pixel values of all the seasonal products, taken from the label attribute, are used to compute the statistical parameters. Era-Interim is taken as the base data because it is a reanalysis product, and APHRODITE and IMD are taken as the reference data, respectively, because they are observational datasets. Scatter plots are also generated for IMD against Era-Interim and for APHRODITE against Era-Interim to examine the variability in pixel values for each seasonal image.
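Once values have been extracted at the fishnet labels, the statistics need the two datasets paired cell by cell. The sketch below assumes each dataset has been reduced to a dictionary from label to value, with `None` where a dataset has no data; the labels, values and the function name `paired_values` are hypothetical and do not reflect the ArcGIS attribute tables used in the study.

```python
def paired_values(base, reference, nodata=None):
    """Return (base, reference) value lists for labels valid in both datasets."""
    base_values, reference_values = [], []
    # Only labels present in both dictionaries can be compared.
    for label in sorted(set(base) & set(reference)):
        b, r = base[label], reference[label]
        if b != nodata and r != nodata:
            base_values.append(b)
            reference_values.append(r)
    return base_values, reference_values


# Hypothetical seasonal values (mm/day) keyed by fishnet label.
era_interim = {"A1": 2.1, "A2": 0.4, "A3": 5.0}
imd = {"A1": 2.5, "A2": None, "A4": 1.0}
print(paired_values(era_interim, imd))  # ([2.1], [2.5])
```

Restricting the comparison to cells valid in both datasets keeps the correlation, RMSE and bias from being distorted by cells that lie outside one product's coverage.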

Final Seasonal Products
The study is carried out for four seasons, and the statistical parameters are therefore computed for each season of the year. The data prepared for the comparison is described by season below:

DJF (December-January-February)
December of the previous year and January and February of the current year are taken together to form a seasonal file, which is then used to calculate the seasonal average file for DJF. The major rainfall occurs in the north-western Himalaya, with values ranging from 2.71 to 5.72 mm/day in all three datasets. The seasonal files generated for the three products are shown in Figure 3.
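The grouping of months into seasons, including the rule that December counts towards the following year's DJF, can be sketched as a small lookup. The names `SEASON_OF_MONTH` and `season_key` are hypothetical.

```python
# Month number to season, following the four seasons used in the study.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJAS", 7: "JJAS", 8: "JJAS", 9: "JJAS",
    10: "ON", 11: "ON",
}


def season_key(year, month):
    """Return (season_year, season) for a calendar year and month."""
    season = SEASON_OF_MONTH[month]
    # December belongs to the DJF season of the following year.
    season_year = year + 1 if month == 12 else year
    return season_year, season


print(season_key(1979, 12))  # (1980, 'DJF')
print(season_key(1980, 1))   # (1980, 'DJF')
print(season_key(1980, 10))  # (1980, 'ON')
```

Grouping daily files by this key before averaging keeps each DJF season contiguous across the year boundary.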

JJAS (June-July-August-September)
The months of June, July, August and September of the current year are taken together. This is the monsoon season for the entire Indian region, and the rainfall lies between 25 and 29 mm/day. The rainfall products generated for the three datasets are shown in Figure 5.

ON (October-November)
The post-monsoon months, i.e. October and November of the same year, taken together form this season. This season also has showers and is very important from an agricultural perspective. The rainfall occurs mainly in Tamil Nadu and Kerala, with values between 9 and 10 mm/day. The rainfall products for this season are shown in Figure 6.

The values in Table 2 show that the correlation coefficient for the four seasons lies between 0.79 and 0.89 with IMD data and between 0.72 and 0.87 with APHRODITE data. In the ON (October-November) season, Era-Interim is highly correlated with both reference datasets.

Root Mean Square Error (RMSE)
The RMSE of Era-Interim against IMD and APHRODITE, respectively, is also calculated considering all the pixels. The results are shown in Table 3.


Mean
The mean value of the seasonal products for all the datasets is also calculated and is shown in Table 4. The mean of Era-Interim is found to be relatively close to the means of the reference data, i.e. IMD and APHRODITE, respectively. The mean annual rainfall for the whole of India is found to be 3.14 mm/day from IMD, 2.85 mm/day from Era-Interim and 2.56 mm/day from APHRODITE if all days of the year are taken as wet days.

Bias
The bias is calculated for each season from the seasonal images generated for the products, and is also presented in tabulated form. The mean bias values of the seasons are relatively close to one another; the bias is somewhat higher for the JJAS season with IMD data and for MAM with the APHRODITE dataset. The bias is also computed on a pixel basis for all the seasons.

Bias during December-January-February (DJF)
The bias is very low throughout the season, although a somewhat higher value of 3.2 mm/day is found in the north-western Himalaya and in Arunachal Pradesh. The results are shown in Figure 7.

Region Specific Suitability of Era-Interim
The suitability of the Era-Interim rainfall dataset is also identified on a regional basis. For this, one product is generated for the entire study period for each dataset. The criteria used to assess suitability are shown in Table 6.

Bias Range (mm/day)           Property
-1 to 1                       Most Suitable
-2.5 to -1 and 1 to 2.5       Suitable
-5 to -2.5 and 2.5 to 5       Less Suitable
-14 to -5 and 5 to 14         Not Suitable

Table 6. Range considered for suitability

The suitable regions are shown in Figures 23 and 24 for Era-Interim with IMD and APHRODITE, respectively. Figures 23 and 24 show that almost the entire Indian region is found to be most suitable, some portion suitable, and very little less suitable, for Era-Interim against both reference datasets, i.e. IMD and APHRODITE.
The suitability as a percentage of the entire Indian region is also calculated and is shown in Table 7. Table 7 shows that 83.57% and 86.56% of the total area of India fall in the most suitable class, and very little, i.e. close to 1% of the total area, lies in the unsuitable class for the Era-Interim rainfall dataset when compared with IMD and APHRODITE data, respectively.
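The classification behind Table 6 and the percentages in Table 7 can be sketched as follows. Several points are assumptions rather than the authors' method: the classes are taken as symmetric about zero, a value on a boundary goes to the more suitable class, any magnitude above 5 mm/day is treated as not suitable, and the percentage is a simple share of grid cells rather than of true ground area. The function names and bias values are hypothetical.

```python
from collections import Counter

CLASSES = ["Most Suitable", "Suitable", "Less Suitable", "Not Suitable"]


def suitability_class(bias):
    """Map a bias value (mm/day) to a suitability class by its magnitude."""
    magnitude = abs(bias)
    if magnitude <= 1.0:
        return "Most Suitable"
    if magnitude <= 2.5:
        return "Suitable"
    if magnitude <= 5.0:
        return "Less Suitable"
    return "Not Suitable"


def suitability_percent(biases):
    """Percentage of cells in each class (cell count, not ground area)."""
    if not biases:
        raise ValueError("need at least one bias value")
    counts = Counter(suitability_class(b) for b in biases)
    return {name: 100.0 * counts[name] / len(biases) for name in CLASSES}


# Hypothetical per-cell bias values (mm/day).
biases = [0.2, -0.8, 1.5, -3.0, 6.0, 0.0, 0.9, -0.5]
print(suitability_percent(biases))
# {'Most Suitable': 62.5, 'Suitable': 12.5, 'Less Suitable': 12.5, 'Not Suitable': 12.5}
```

Since 0.5° cells shrink in ground area towards higher latitudes, an area-based percentage would weight each cell by its area instead of counting cells equally.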

CONCLUSION
The current study was carried out to check the effectiveness of the Era-Interim reanalysis precipitation product over the whole of India, on both a seasonal and a regional basis. The correlation of Era-Interim with IMD as the reference dataset for the four seasons (DJF, MAM, JJAS and ON) lies between 0.81 and 0.89, with a mean correlation of 0.83. The correlation of Era-Interim with APHRODITE for the four seasons lies between 0.71 and 0.87, with a mean correlation of 0.789. This shows that Era-Interim is well correlated with the reference rainfall datasets. The best correlation is obtained for October-November (ON), which has lower rainfall; for the monsoon season, which has very high rainfall over the entire Indian region, its effectiveness lies within acceptable limits.

The Root Mean Square Error (RMSE) of Era-Interim over the four seasons lies between 0.5 and 2.66 mm/day with IMD and between 0.52 and 2.92 mm/day with APHRODITE. The mean RMSE of Era-Interim with IMD and APHRODITE is 1.495 and 1.677 mm/day respectively, which is also an acceptable value. The seasonal mean values of the three datasets are also very close, which further supports the use of Era-Interim on a seasonal basis.

The bias maps show the regions where the discrepancy of the Era-Interim dataset is largest in the various seasons. Regionally, considering all the seasons, Era-Interim is best for central India and also for the south-north region of India. The less suitable regions are North-East India in MAM and JJAS, the North-Western Himalayas in DJF, and the Western Ghats in JJAS. The bias maps also show that there is very little bias over India as a whole, with only a few sites, in specific seasons, showing a somewhat higher discrepancy. The scatter plots also show a good relationship between this reanalysis product and the reference datasets.

Considering suitability on a regional basis, the dataset is found to be most suitable for almost the entire Indian region except some stretches. It is less suitable for the regions of North-East India. The states lying in the North-Western Himalaya fall in the suitable range. The Western Ghats also show lower effectiveness for this reanalysis product, but it can still be used there, as very few pixels are found in the not suitable range. Table 7 gives the suitability as a percentage of area: 83-86% of India lies in the most suitable class. The Era-Interim product is highly correlated with the standard datasets over the whole of India, which makes it a useful rainfall dataset for the region.

Figure 3. Seasonal rainfall during DJF in [a] IMD Product, [b] APHRODITE Product and [c] Era-Interim Product.

MAM (March-April-May)
The months of March, April and May are taken together to form a single file for this season on a yearly basis. The areas of north-eastern India receive high rainfall, with values between 12 and 22 mm/day in all three rainfall datasets. The rainfall products generated for this season are shown in Figure 4.

Figure 4. Seasonal rainfall during MAM in [a] IMD Product, [b] APHRODITE Product and [c] Era-Interim Product.

Figure 5. Seasonal rainfall during JJAS in [a] IMD Product, [b] APHRODITE Product and [c] Era-Interim Product.

Figure 6. Seasonal rainfall during ON in [a] IMD Product, [b] APHRODITE Product and [c] Era-Interim Product.

Statistical Parameters

Correlation coefficient
The correlation coefficients of Era-Interim with IMD and APHRODITE are calculated considering all the pixels of the generated products. The correlation coefficients are shown in Table 2.

Figure 7. Bias in daily rainfall between Era-Interim and [a] IMD [b] APHRODITE Product for the season DJF

Figure 14. Scatter plot between daily rainfall in ON season of Era-Interim and IMD rainfall products

Scatter plots for all the seasons are shown in Figures 11, 12, 13 and 14. All the scatter plots show a very good relationship between Era-Interim and the IMD dataset.

Figure 15. Scatter plot between daily rainfall in DJF season of Era-Interim and APHRODITE rainfall products

Figure 18. Scatter plot between daily rainfall in ON season of Era-Interim and APHRODITE rainfall products

Table 3. RMSE between Era-Interim and the IMD and APHRODITE rainfall products for all the seasons

The RMSE value lies between 0.51 and 2.66 for Era-Interim and IMD, and between 0.52 and 2.92 for Era-Interim and APHRODITE data.

Table 4. Mean seasonal rainfall (mm/day) estimated from Era-Interim, IMD and APHRODITE products

Table 7. Suitability in percent with reference data