APPLICATION OF VARIOUS OPEN SOURCE VISUALIZATION TOOLS FOR EFFECTIVE MINING OF INFORMATION FROM GEOSPATIAL PETROLEUM DATA

This study emphasizes the use of various tools for visualizing geospatial data to facilitate information mining of the global petroleum reserves. Open-source data on global oil trade from 1996 to 2016, published by British Petroleum, was used. It was analysed with a shapefile of the countries of the world in open-source software such as StatPlanet, R and QGIS. Visualizations were created using different maps with combinations of graphics and plots, such as choropleth, dot density, graduated symbols, 3D maps, Sankey diagrams, hybrid maps and animations, to depict the global petroleum trade. Certain inferences could be made quickly, for example that Venezuela and Iran are rapidly rising as producers of crude oil. The stronghold is shifting from the Gulf countries, since China, Sudan and Kazakhstan have shown a high rate of positive growth in crude reserves. Global oil consumption was seen to be driven not only by population but also by lifestyle, since Saudi Arabia has a very high rate of per-capita consumption of petroleum despite a very low population. India and China have very limited oil reserves, yet have to cater to large populations. These visualizations help to understand the likely sources of crude and refined petroleum products and to judge the flux in the global oil reserves. The results show that geodata visualization increases understanding, breaks down the complexity of data and enables the viewer to quickly digest high volumes of data through visual association.


INTRODUCTION
Visualisation is a graphical way of presenting data to enable qualitative as well as quantitative comprehension. It is a tool through which the viewer can identify patterns and relationships in the data and generate inferences (Barik et al., 2017). The simplest visualisations of tabular data take the form of graphs, figures, flow-charts, etc. When the data has geospatial content, however, the visualisation should include maps and charts. Location-based information is of prime importance in all geospatial studies, and non-spatial visualisation techniques do not highlight the location-related aspects of geodata. Therefore, Geodata Visualisation emerged as a specialisation that draws on tools from mapping, computer science and programming. Geo-visualisation enables the exploration of information from several perspectives and through several complementary representations (Cartwright et al., 2001). It exploits the graphics performance capabilities of computers (Xiao, Yan, & Zhang, 2010). Visualisations can be static or dynamic/animated, depending on the content. They can also be interactive to enable additional exploration by the viewer (Jin & Liu, 2009). Geo-visualisation has a significant role in the mining and visualisation of Big Data for finding effective solutions for location-based services.
Petroleum and related products are a major source of energy across the globe and constitute a multi-billion dollar industry. Huge logistics are involved in the extraction, shipping, refinement and consumption of petroleum-based energy, since these resources are distributed across different continents. Global petroleum data is geospatial and cannot be comprehended through tabular presentation alone. To address this, Li et al. (2017) developed a Web-Based Visual and Analytical Geographical Information System (GIS) for the display and visualisation of oil and gas data.
Petroleum products cater to every nation's energy needs, more so for growing economies, where rapid growth and improving prosperity fuel the growth in energy demand. Several studies have been carried out on these aspects of the global oil trade.
China is one of the leading consumers as well as exporters of petroleum products. A conceptual framework for finding the trade patterns in the crude oil imports of China from 1992 to 2015 was laid down by Shao et al. (2017). They found that China's crude oil imports from the Middle East nations were largely associated with demand, supply, the prices of the exporting countries and bilateral trade relationships. However, only bilateral trade relationships affected its imports of crude oil from the Asia-Pacific nations.
On the other hand, Saudi Arabia has traditionally been the leader in the export of crude oil. Krane (2015) observed that Saudi Arabia's role in global energy markets was changing from being a simple exporter of crude oil to a supplier of refined petroleum products. This change is commensurate with the typical development trajectory of a state progressing to a more advanced stage of global economic integration. Iran is also a major exporter of crude oil. A study based on an error-correcting macroeconometric model for the Iranian economy set over the period  showed that the national economy was affected by oil exports and foreign output in the long run (Esfahani, Mohaddes, & Pesaran, 2013).
It is evident that petroleum data is geospatial and needs to be treated beyond mere statistical analysis and number crunching. The creation of maps enhances understanding and brings important facts to notice. With this presumption, the present study was undertaken with the objective of evaluating various open-source tools for Geodata Visualisation of global petroleum statistics, to bring out subtle inferences that are not easy to comprehend from tabular data.

Datasets
The BP Statistical Review of World Energy, June 2017, is an annual report published by British Petroleum (BP, 2017). It includes global statistical data from 1996 to 2016, presented in tables and graphs. The data is available in PDF as well as MS Excel formats. Only the data on crude and refined oil was taken from this report for the study. It covers more than 60 countries across all continents. A vector map depicting the boundaries of all the countries of the world was downloaded from Natural Earth (Natural-Earth, 2018) in the form of a shapefile (Ver. 4.0.0) and was further used for visualisation.

Tools
Open-source tools were used for generating maps and other visualisations. QGIS 2.1.8 (QGIS, 2018a) was used to generate pie-chart maps, 3D-extrusion maps, multi-variable maps and graduated symbol maps. The StatPlanet tool (StatSilk, 2016) was used to generate choropleth maps. Multiple time-series choropleth maps were animated and converted into GIF format. R software (R, 2018) was used to generate Sankey maps. The Google Visualisation API (Google, 2018a) was used for the same purpose, in conjunction with a base-map from OpenStreetMaps and the Leaflet libraries for JavaScript (Leaflet, 2018). Google Earth Pro (Google, 2018b) was used to create 3D extrusions on the virtual globe. Graphs were generated using Libre Office tools.

Visual Variables
Maps use various visual variables to create and differentiate symbols on a map. The statistical data of oil being quantitative, the most suitable visual variables to represent the data are size and value (Halik, 2012). Hence, in this study, the visual variable of 'size' was used in graduated symbol maps and in 3D extrusion maps. Choropleth maps generally use value or colour for differentiating classes. In this study, the 'colour' variable was used for making choropleth maps.

Choropleth Maps:
Choropleth maps display divided geographical areas or regions that are coloured, shaded or patterned in relation to a data variable. They show the relative variation in the values of the variable. The data is divided into a fixed number of classes using data classification techniques such as equal interval, quantile and natural breaks. These classes are differentiated by varying shades of a single colour, or each class can have a distinct colour. In this study, the data was divided into 5 classes using natural breaks and each class was allotted a different colour. The colours were chosen such that warm colours (red, orange) were assigned to the higher values and cool colours (green, blue) to the lower values for better discrimination. Once generated, the choropleth maps are used for data analysis and mining (Korycka-Skorupa & Pasławski, 2017).
StatPlanet software was used for generating choropleth maps and bar graphs. Maps were made for each year from 1996 to 2016. They were then animated into a sequence for visualisation of changes over a span of 20 years and converted into GIF format.
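The natural-breaks classification and colour assignment described above can be illustrated with a minimal sketch. It assumes the usual natural-breaks criterion (minimising the within-class sum of squared deviations over contiguous groups of sorted values); the internal algorithm of StatPlanet is not stated in the source and may differ. The function names, the sample values and the middle 'yellow' class are hypothetical and are not taken from the BP data.

```python
from itertools import accumulate

# Assumed cool-to-warm ramp: low values cool, high values warm.
COOL_TO_WARM = ["blue", "green", "yellow", "orange", "red"]


def natural_breaks(values, n_classes):
    """Split values into n_classes contiguous groups of the sorted data that
    minimise the total within-class sum of squared deviations."""
    xs = sorted(values)
    n = len(xs)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")
    # Prefix sums give the squared deviation of any slice in constant time.
    s1 = [0.0] + list(accumulate(xs))
    s2 = [0.0] + list(accumulate(x * x for x in xs))

    def ssd(i, j):  # squared deviation of xs[i:j], with j > i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting xs[:j] into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j] = c
                    cut[k][j] = i
    # Walk back through the cut table to recover the classes.
    classes, j = [], n
    for k in range(n_classes, 0, -1):
        i = cut[k][j]
        classes.append(xs[i:j])
        j = i
    return classes[::-1]


def colour_for(value, classes, ramp=COOL_TO_WARM):
    """Return the colour of the first class whose upper bound covers value."""
    for idx, members in enumerate(classes):
        if value <= members[-1]:
            return ramp[idx]
    return ramp[len(classes) - 1]


# Hypothetical values, for illustration only.
sample = [1, 2, 10, 11, 50, 51, 100, 101, 500, 501]
sample_classes = natural_breaks(sample, 5)
print(sample_classes, colour_for(501, sample_classes))
```

The dynamic programme is exhaustive and therefore only practical for small inputs such as a per-country table; GIS packages typically use faster implementations of the same idea.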

Graduated Symbol Maps:
In a graduated-symbol map, the values of the variable to be displayed are divided into distinct classes. A common symbol (e.g. circle, triangle) is selected to display the data. The size of this symbol is varied in increasing order, relative to the mean value of each class. Therefore, the class with the smallest values is assigned the smallest symbol, and the symbol size increases progressively for each subsequent class. In this study, QGIS software (QGIS, 2018b) was used to divide the data into 5 classes using natural breaks, and black circles were used to display the variables.
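A minimal sketch of how symbol sizes can be derived from class means is given below. It assumes that the symbol area (rather than the diameter) is made proportional to the class mean, which is a common cartographic convention but is not stated in the source; the function name, the maximum diameter and the sample classes are hypothetical.

```python
from math import sqrt
from statistics import mean


def symbol_diameters(classes, max_diameter=20.0):
    """Return one circle diameter per class, ordered like the classes.

    Assumption: circle area is proportional to the class mean, so the
    diameter scales with the square root of the mean. Class means must
    be positive.
    """
    class_means = [mean(members) for members in classes]
    largest = max(class_means)
    return [max_diameter * sqrt(m / largest) for m in class_means]


# Hypothetical classes, for illustration only.
print(symbol_diameters([[1, 1], [4, 4], [16]]))
```

Scaling by the square root keeps the perceived size of the circles in step with the values; scaling the diameter linearly would exaggerate the larger classes.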

Multi-variable Hybrid Maps:
When more than one variable is to be displayed, different techniques are combined to generate a hybrid map. In this study, using QGIS (QGIS, 2018b), one variable was displayed as a choropleth map and, simultaneously in the same map, another variable was displayed as a graduated symbol map. One can also use 2.5D extrusion in hybrid maps.

3D-Extrusion maps:
Qgis2threejs is a plugin of QGIS, which delivers output in HTML form and can depict one variable in 3D extrusion. Using this technique, two variables were simultaneously depicted in this study. One variable was exhibited as a choropleth map and another variable was extruded in 3D (Armitage, 2017).

Pie-Chart Maps:
Pie charts allow visualisation of the share between two opposing attributes, such as the import vs export of a country (Li et al., 2017). These are suitable for simultaneous comparison of mutually exclusive multiple variables. In this study, pie-chart maps were generated using QGIS for comparing the import of crude oil and the export of refined products.

Sankey Diagrams:
Sankey diagrams are a specific type of flow diagram, in which the width of the arrows is shown proportionally to the flow quantity. They are helpful in locating dominant contributions to an overall flow. Sankey diagrams are best used for mapping many-to-many relations or multiple paths through a set of stages. Both these instances have been used in this study, and the Sankey diagrams were generated using R software. The basemap was taken from OpenStreetMaps and the visualisation was facilitated using the Google Visualisation API and the Leaflet libraries for JavaScript.
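The proportionality between arrow width and flow quantity can be sketched as follows. This is a language-neutral illustration in Python and not the R or Google Visualisation API workflow used in the study; the function names, the maximum width and the flows between the placeholder nodes A, B, X and Y are hypothetical.

```python
from collections import defaultdict


def arrow_widths(flows, max_width=30.0):
    """Map each (source, target) pair to a width proportional to its flow.

    flows: list of (source, target, quantity) tuples with positive quantities.
    The largest flow receives max_width; all others scale linearly.
    """
    largest = max(quantity for _, _, quantity in flows)
    return {(src, dst): max_width * quantity / largest
            for src, dst, quantity in flows}


def outflow_share(flows):
    """Return each source's share of the overall flow, which helps to
    locate the dominant contributors."""
    totals = defaultdict(float)
    for src, _, quantity in flows:
        totals[src] += quantity
    grand_total = sum(totals.values())
    return {src: value / grand_total for src, value in totals.items()}


# Hypothetical many-to-many flows, for illustration only.
demo_flows = [("A", "X", 300), ("A", "Y", 100), ("B", "X", 200)]
print(arrow_widths(demo_flows))
print(outflow_share(demo_flows))
```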

Virtual Globe:
The Google Earth Pro virtual globe was used to create 3D extrusion maps. The shapefile was loaded in this software and random colours were selected to depict different countries. The value of the desired attribute was used for extruding the polygon of each country to create these maps.
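In the study the extrusion was configured within Google Earth Pro itself. As an illustration of the underlying idea only, the sketch below writes an equivalent KML polygon whose height is derived from an attribute value, using the standard KML elements `extrude` and `altitudeMode`. The function names, the scaling factor `metres_per_unit` and the sample ring are hypothetical.

```python
import xml.etree.ElementTree as ET


def extruded_placemark(name, ring, value, metres_per_unit=1.0):
    """Build a KML Placemark whose polygon is extruded to a height
    proportional to value.

    ring: list of (lon, lat) vertices, with the first vertex repeated last.
    metres_per_unit: assumed scaling from attribute units to metres.
    """
    height = value * metres_per_unit
    placemark = ET.Element("Placemark")
    ET.SubElement(placemark, "name").text = name
    polygon = ET.SubElement(placemark, "Polygon")
    # extrude=1 connects the raised polygon down to the ground.
    ET.SubElement(polygon, "extrude").text = "1"
    ET.SubElement(polygon, "altitudeMode").text = "relativeToGround"
    outer = ET.SubElement(polygon, "outerBoundaryIs")
    linear_ring = ET.SubElement(outer, "LinearRing")
    ET.SubElement(linear_ring, "coordinates").text = " ".join(
        f"{lon},{lat},{height}" for lon, lat in ring
    )
    return placemark


def build_kml(placemarks):
    """Wrap placemarks in a KML document and return it as a string."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    document = ET.SubElement(kml, "Document")
    document.extend(placemarks)
    return ET.tostring(kml, encoding="unicode")


# Hypothetical triangle and value, for illustration only.
demo = extruded_placemark("Demo", [(0, 0), (1, 0), (1, 1), (0, 0)], 3.0, 1000.0)
print(build_kml([demo]))
```

The scaling factor matters in practice: attribute values in thousand barrels per day need to be multiplied up considerably before the extrusion is visible at a global zoom level.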

Line Graphs:
Line graphs were generated with oil data from 1980 to 2016 to analyse the oil trade movement and oil production. The x-axis represented the time period from 1980 to 2016 and the y-axis represented the corresponding attribute to be analysed.

Methods
The procedure followed in the creation of visualisations is depicted in Figure 1. The boundaries of India in the Natural Earth world map were rectified using the map of India downloaded from Indian Remote Sensing (Singh, 2017). Oil statistics were taken from the BP report in MS Excel format. The spellings of the country names in the BP data were amended to match those in the attribute table of the Natural Earth world map shapefile. After this, the data from the MS Excel sheet was merged into the attribute table, thereby yielding a shapefile containing both the world countries and the oil statistics. This shapefile was then used for the generation of the various visualisations. The activities involved in the petroleum trade are depicted as a flowchart in Figure 2. Some of these steps are statistically analysed in this study using the techniques of geodata visualisation.
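The name harmonisation and merge steps can be sketched as a plain table join. The sketch assumes that the attribute table is available as a list of records with a `NAME` field and that the statistics are keyed by country name; the field names, the spelling corrections and the numeric values are hypothetical placeholders, not the actual BP or Natural Earth entries.

```python
# Hypothetical spelling corrections: statistics name -> attribute-table name.
NAME_FIXES = {
    "US": "United States of America",
    "Russian Federation": "Russia",
}


def harmonise(name, fixes):
    """Return the attribute-table spelling of a country name."""
    cleaned = name.strip()
    return fixes.get(cleaned, cleaned)


def join_statistics(attribute_rows, stats_by_country, fixes, field="oil_value"):
    """Merge statistics into attribute rows by harmonised country name.

    Returns the joined rows and a sorted list of statistics entries that
    matched no row, so that remaining spelling mismatches can be reviewed.
    """
    lookup = {harmonise(n, fixes): v for n, v in stats_by_country.items()}
    joined = []
    for row in attribute_rows:
        merged = dict(row)
        merged[field] = lookup.get(row["NAME"])  # None when no data exists
        joined.append(merged)
    known = {row["NAME"] for row in attribute_rows}
    unmatched = sorted(name for name in lookup if name not in known)
    return joined, unmatched


# Placeholder records and values, for illustration only.
rows = [{"NAME": "United States of America"}, {"NAME": "Russia"}, {"NAME": "India"}]
stats = {"US": 1.0, "Russian Federation": 2.0, "India": 3.0, "Other Africa": 4.0}
print(join_statistics(rows, stats, NAME_FIXES))
```

Reporting the unmatched names is useful because aggregate entries in a statistical table (regional totals, for example) have no polygon to join to and would otherwise be dropped silently.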

Petroleum Chain
Refining of crude oil is undertaken in refineries. Setting up refineries involves advanced technology and huge resources. Hence, not all nations that harvest crude oil can refine it entirely. Also, there are developed nations which have fewer reserves but more refining capacity. Hence, these nations import crude oil and refine it for further consumption. The final products are exported to all the other nations.

Reserves of Crude Oil
Choropleth maps of proved reserves of crude oil from 1996 to 2016 were compared. Figure 3 depicts the status of oil reserves in 1996, when Saudi Arabia was the leader in oil reserves, followed by Russia, Iran, Iraq, UAE, Kuwait and Venezuela. This reveals that the largest oil reserves were predominantly in the Middle East and the Russian Federation. Each year, newer stocks of crude oil are discovered as more explorations are undertaken, and therefore the status of reserves changes. The status of proved oil reserves in 2016 is depicted in Figure 4, which reveals that the largest reserves are with Venezuela, followed by Saudi Arabia, Canada, Iran and Iraq, all having more than 150 thousand million barrels of crude oil reserves. This depicts a big disruption in the monopoly of the Middle East nations, since the American nations of Venezuela and Canada have gained prominence. Post-2000, Iran and Canada have made several significant discoveries, leading to an increase in their proven reserves. The quantity of Russian reserves has remained nearly the same over the past 20 years, but other nations have surpassed this value and hence Russia has slipped to the sixth rank.
The decadal change in crude oil reserves from 2005 to 2015 was expressed as percentage growth and is depicted as a choropleth map in Figure 5. The highest positive growth (above 10 %) was exhibited by Venezuela, Kazakhstan and Sudan. Next is China with 5.1 % growth, since China is conducting extensive exploration in the South China Sea. Several countries exhibited negative growth, since they have extracted their limited reserves. Some such countries are Denmark, Mexico, Equatorial Guinea and the U.K.
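The percentage-growth measure can be sketched as below. The sketch assumes a simple percentage change relative to the start-year value; the source does not state the exact formula or whether the growth was annualised, so this is only one plausible reading. The function name and the sample values are hypothetical.

```python
def percentage_growth(start, end):
    """Percentage change from start to end, relative to start.

    Returns None when the start value is zero, since growth relative to
    zero reserves is undefined.
    """
    if start == 0:
        return None
    return 100.0 * (end - start) / start


# Hypothetical reserve values, for illustration only.
print(percentage_growth(100, 110))   # growth
print(percentage_growth(200, 150))   # decline
```

Countries with a negative result fall in the 'negative growth' classes of the choropleth map, which is why a diverging colour scheme centred on zero suits this variable.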

Extraction of Crude Oil
Crude oil has to be extracted from reserves under the surface of land and ocean. This depends on the degree of difficulty posed by the location and the sophistication of the infrastructure available in each country. Hence, the status of reserves is different from the ability to extract crude oil. Figure 6 depicts the relative capacity of different nations to extract crude oil. This depiction is done by 3D extrusion of data on the virtual globe using Google Earth Pro software. One finds that the leading nations are the USA, Canada, Russia, Saudi Arabia, Iran, Iraq, Kuwait and China, which extract more than 3,000,000 barrels per day.

Refining of Crude Oil
The capacity to refine crude oil is depicted as throughput in a graduated symbol map shown in Figure 7. It reveals that the highest refining capacity is with the USA, followed by China, Russia, India, Japan and South Korea.
Of these, none except Russia has very high crude reserves. Hence, they are major importers of crude oil and have a huge infrastructure in terms of ports, oil storage, refineries and pipeline networks to transport the petroleum products.

Refined Products
A time-series analysis of the output of refined products in thousand barrels per day from 1965 to 2016 is depicted in Figure 10. Geospatial analysis of time series requires animation and so cannot be depicted in print media. For this purpose, line graphs were found to be effective (Shumway & Stoffer, 2011).
Figure 10 shows that the USA has consistently been the leader. However, China has rapidly enhanced its prowess in producing refined products post-2000. The erstwhile USSR had a significant refining capability. However, post-split, the capacity of Russia decreased as a significant chunk of its assets went to Kazakhstan. Next in order are India, Japan and South Korea. These three nations import a significant amount of crude oil and refine it themselves.
Figure 10. Time-series depiction of the generation of refined products from 1965 to 2016.

Consumption of Refined Products
Some of the important refined products from oil are petrol, diesel, kerosene, aviation turbine fuel, lubricants, plastic, tar, coke, etc. These are consumed as well as exported by the nations that generate them. Ideally, one would surmise that the extent of consumption of petroleum products would be governed by population. Figure 11 is a hybrid map that compares the population and the consumption rate of petroleum products by nation. The colours depict population, with red showing very high population (China and India), brown showing high population (USA, Indonesia, Brazil, Pakistan), yellow showing moderate population (Russia, Mexico) and green showing low population (Saudi Arabia and Canada). The consumption of petroleum products is depicted by 3D extrusion: the higher the extrusion, the greater the consumption. One can infer that consumption is not governed by population alone. The USA has the highest consumption, despite having a smaller population than India and China. The consumption of Saudi Arabia, Russia and Canada is much higher relative to their populations. The higher consumption of India and China can be attributed to their very high populations. However, the consumption of Brazil and Indonesia is much lower relative to their populations. Therefore, it is inferred that it is the lifestyle of the people and the degree of development of a nation that decide its consumption of petroleum products, and not its population.

Export of Refined Products
Refined products are exported mostly to the European nations and to other nations in Africa and the Asia-Pacific region. The flow of refined products from the major suppliers is depicted through a Sankey diagram in Figure 12.
Figure 14 shows the share of exports of crude oil and refined products from major countries/zones in the form of a pie-chart map. The circular pie for each country has a size relative to the total value of trade in thousand barrels per day: the larger the pie, the greater the quantum of trade. The green component depicts the export of crude oil, while the orange component depicts the export of refined products. Certain pies are displaced from their geographical position to avoid cluttering. In such cases, the pies have a line indicating the location of that country, and the names of these countries are annotated next to the pies. Analysis of Figure 14 reveals that Singapore, India, Japan and China have fewer reserves and so do not export crude oil. Instead, they import crude oil, refine it and then export the value-added refined products. Canada, South America, Mexico, Russia, Kuwait, UAE and Saudi Arabia have significant amounts of crude reserves. So they not only export crude oil but also refine it and export refined products to the world. Iraq is a unique nation that does not export its crude oil. It refines all its crude oil and exports only the refined products. The USA and the European nations export less crude and more refined products.
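The pie symbology described for Figure 14 can be sketched as a small calculation of the pie size and the two slice angles. The sketch assumes that the pie area is proportional to the total export volume; the exact scaling used in QGIS for the study is not stated in the source. The function name, the maximum radius and the sample volumes are hypothetical.

```python
from math import sqrt


def pie_symbol(crude, refined, max_total, max_radius=25.0):
    """Return the radius and slice angles (in degrees) of one pie.

    crude, refined: export volumes in the same unit.
    max_total: largest total across all pies, used to scale the radius.
    Assumption: pie area, not radius, is proportional to the total.
    """
    total = crude + refined
    if total <= 0:
        return None  # nothing to draw for a country without exports
    crude_deg = 360.0 * crude / total
    return {
        "radius": max_radius * sqrt(total / max_total),
        "crude_deg": crude_deg,              # green slice
        "refined_deg": 360.0 - crude_deg,    # orange slice
    }


# Hypothetical volumes, for illustration only.
print(pie_symbol(300, 100, max_total=1600))
```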

Inferences
From the visualisations displayed above, a summary of the major importer and exporter nations/regions is depicted in Figure 15. This study can help different nations to find a suitable supplier for their needs. If a nation needs to purchase crude oil, it should look for nations that not only have large reserves but are also able to extract sufficient crude oil and are willing to export it. Similarly, if a nation wants to purchase refined products, it has to look for countries which have more refining capacity and are willing to sell the products. Such countries are listed in Table 1. The criteria for selecting the best supplier depend on the cost of the crude/products, the quality of the crude oil, the distance of the source nation from the consumer nation and the transportation costs.
Though Saudi Arabia is the leading exporter of crude oil as of date, there are other nations which are carrying out off-shore explorations in different areas and increasing their stocks of reserves. These are countries like Venezuela, Kazakhstan, Sudan, China, Canada and Iran. These countries show a promising future for the supply of crude oil.
There are several under-developed nations in Africa and South & Central America which have significant amounts of crude reserves. As and when the infrastructure to extract crude oil improves in these nations, they would gain significance as exporters of crude oil.
The refining capability is presently available in abundance with developed nations. The developing nations are increasing their refining capabilities. The under-developed nations lack this infrastructure and have to rely on suppliers of refined products. India and China have shown a significant surge in refining capacities post the year 2000. The USA, China, India, Japan and Europe are major importers of crude oil. They refine it and consume a huge share of it owing to their huge populations and lifestyle. Thereafter, they export the refined products. Singapore is a major consumer of refined products. Another significant finding is that the consumption of petroleum products is governed not by population, but by lifestyle.

CONCLUSIONS
A picture is worth a thousand words. Hence, information depicted in a map is easier to comprehend than volumes of text or tabulated data. This study reiterates that fact. The use of different types of maps eased the understanding of complex and voluminous data on global oil statistics.

Figure 2. Flowchart of activities in Petroleum trade.

Figure 3. Statistics of proved oil reserves in 1996.

Figure 4. Statistics of proved oil reserves in 2016.

Figure 5. Percentage growth in crude-oil reserves from 2005 to 2015.

Figure 6. Capability to extract crude oil expressed in 3D extrusion on Google Earth Pro.

Figure 7. Refinery throughput (thousand barrels per day).
The producers of crude oil export it to the refining nations. The flow of crude oil from major producers is depicted in a Sankey diagram in Figure 8. On the left are the producers/exporters of crude oil, while on the right are the importers of crude oil. The thickness of the arrows depicts the relative share of the crude oil trade.

Figure 8. Major importers of crude oil.
Saudi Arabia is the single largest exporter of crude oil. The geospatial distribution of these exports is depicted in Figure 9 in the form of a Sankey diagram. The thickness of the arrows depicts the relative quantum of crude oil exported. Figure 9 reveals that the major share of exports goes to the USA, China, Japan, the European nations and India. Smaller quantities are exported to Canada, South Africa and South and Central America. This is to be expected, because these nations have their own resources of crude oil.

Figure 9. Export of crude oil from Saudi Arabia.

Figure 11. Hybrid map depicting population in colours and consumption of refined products as 3D extrusion.

Figure 12. Export of refined products from major sources.
The major exporters of refined oil are Russia, the USA and Europe. Russia exports a limited amount to the USA, but a very significant share to European countries. The USA exports to Europe, the South American nations and Mexico, which have limited refining capacity. Europe exports refined products to South Africa, Singapore and Asian countries.

Figure 13 depicts the Sankey diagram showing the geographical movement of refined petroleum products from USA to South and

Figure 13. Export of refined products from USA.

Figure 14. Comparison of the export trade of crude and refined products from major sources.

Figure 15. Major exporters and importers of oil and refined products.

Table 1. Major suppliers of crude oil & refined products.