Realtime Planetary-Scale Datacube Fusion

Abstract. The datacube model has rapidly gained acceptance as a cornerstone of analysis-ready data, as has the corresponding service model, which is more powerful yet easier to use than existing API-based interfaces. Mature and widely adopted datacube standards show how datacube functionality can be presented to clients, be they human users or machine-to-machine (m2m) connections. Specifically, the OGC Coverage Implementation Schema (CIS) data model and the Web Coverage Service (WCS) service model suite define a framework adopted by the main open-source as well as proprietary tools, including MapServer, GeoServer, GDAL, QGIS, ArcGIS, and Python/OWSLib. In this contribution we present federation methods implemented in the rasdaman datacube engine, which is an accepted technology leader, standards driver, and reference implementation for datacubes. In summary, rasdaman allows federated processing of declarative queries with complete location transparency: any federation member can receive a query, and the federation dynamically orchestrates each query individually for optimized processing.



INTRODUCTION
The datacube model has rapidly gained acceptance as a cornerstone of analysis-ready data, as has the corresponding service model, which is more powerful yet easier to use than existing API-based interfaces. Mature and widely adopted datacube standards show how datacube functionality can be presented to clients, be they human users or machine-to-machine (m2m) connections. Specifically, the OGC Coverage Implementation Schema (CIS) data model and the Web Coverage Service (WCS) service model suite define a framework adopted by the main open-source as well as proprietary tools, including MapServer, GeoServer, GDAL, QGIS, ArcGIS, and Python/OWSLib. The basic, most widely used functionality comprises access, possibly extraction, and reformatting of a data subset for download; processing, and in particular data fusion, represents a far more complex and resource-intensive challenge. The WCS suite addresses this through modularity: WCS Core (which every WCS must implement) offers the GetCoverage request to accomplish "give me part of this spatiotemporal coverage, in my favourite format", and optional extensions add further functionality facets, up to the spatio-temporal datacube analytics language, Web Coverage Processing Service (WCPS).
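To make the GetCoverage idea concrete, the following minimal sketch assembles such a request in the key-value-pair encoding of WCS 2.0. The endpoint URL, the coverage name `Sentinel`, and the axis names `Lat`, `Long`, and `ansi` are assumptions for illustration only; the actual names depend on the service and the coverage addressed.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with a real WCS 2.0 service address.
ENDPOINT = "https://example.org/rasdaman/ows"


def get_coverage_url(coverage_id, subsets, fmt="application/netcdf"):
    """Build a WCS 2.0 GetCoverage request in KVP encoding.

    `subsets` maps an axis name to either a (low, high) tuple (trim)
    or a single value (slice). Axis names are coverage-specific.
    """
    params = [
        ("service", "WCS"),
        ("version", "2.0.1"),
        ("request", "GetCoverage"),
        ("coverageId", coverage_id),
    ]
    for axis, bounds in subsets.items():
        if isinstance(bounds, tuple):
            low, high = bounds
            params.append(("subset", f"{axis}({low},{high})"))
        else:
            params.append(("subset", f"{axis}({bounds})"))
    params.append(("format", fmt))
    return f"{ENDPOINT}?{urlencode(params)}"


# "Part of this coverage, in my favourite format":
# a spatial trim over (roughly) Europe, a time slice, delivered as NetCDF.
url = get_coverage_url(
    "Sentinel",  # assumed coverage name
    {"Lat": (35, 72), "Long": (-25, 45), "ansi": '"2018-07-01"'},
)
print(url)
```

The request only selects and reformats data; any computation on the pixel values requires the processing extension, WCPS.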
For example, computing the NDVI over Europe on the 1st of July 2018 using Sentinel data stored in a datacube with the same name, and returning the result as a NetCDF file, can be expressed in a single WCPS query.
As WCPS allows combination of datasets ("joins" in database terminology), the question arises: what if the datacubes to be combined reside on different computers (cloud scenario), or even in different data centers (federation scenario)? A suboptimal implementation could obviously cause massively degraded performance.
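The NDVI example mentioned above might look as follows. This is a sketch under stated assumptions, not the query from the original text: the band names `nir` and `red`, the axis names, the bounding box chosen for Europe, and the endpoint URL are all hypothetical. The small Python wrapper only assembles the WCPS string and a `ProcessCoverages` request carrying it.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with a real WCPS-capable service address.
ENDPOINT = "https://example.org/rasdaman/ows"


def ndvi_query(coverage, date, lat, lon):
    """Return a WCPS query computing NDVI = (nir - red) / (nir + red).

    Band names (`nir`, `red`) and axis names (`ansi`, `Lat`, `Long`)
    are assumptions; they depend on how the datacube was ingested.
    """
    subset = (
        f'ansi("{date}"), '
        f"Lat({lat[0]}:{lat[1]}), "
        f"Long({lon[0]}:{lon[1]})"
    )
    return (
        f"for $c in ({coverage}) "
        "return encode("
        f"(($c.nir - $c.red) / ($c.nir + $c.red))[{subset}], "
        '"application/netcdf")'
    )


query = ndvi_query("Sentinel", "2018-07-01", lat=(35, 72), lon=(-25, 45))

# Ship the query to the server as a WCS ProcessCoverages request.
request_url = ENDPOINT + "?" + urlencode(
    {
        "service": "WCS",
        "version": "2.0.1",
        "request": "ProcessCoverages",
        "query": query,
    }
)
print(query)
```

The query names only the datacube, never a server or file, which is what allows the federation to decide where each part is evaluated.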
In this contribution we present federation methods implemented in the rasdaman datacube engine, which is an accepted technology leader, standards driver, and reference implementation for datacubes.

RASDAMAN ARCHITECTURE
The rasdaman engine is a fully-fledged Array Database System built around a declarative query language that extends standard SQL with high-level array operators; in fact, ISO adopted this language in 2018 as an extension to the SQL language standard (ISO, 2019). This datacube language is domain agnostic and can equally serve medical imagery, cosmological simulation results, and the like. A semantic layer on top of it adds geo semantics, i.e., it knows about spatial and temporal coordinates and, hence, also about regular and irregular grids. These semantics are offered via OGC WCS, WCPS, and WMS interfaces so that a plethora of clients can utilize massive datacubes managed by rasdaman. All such requests are translated to array SQL queries internally.
In the server, such incoming queries are optimized, parallelized, and ultimately mapped to datacubes that are tiled on disk (or tape). As opposed to standard regular tiling (i.e., equi-sized tiles), rasdaman allows for arbitrary tiling guided by several strategies. Tiling remains invisible to applications (and, hence, does not appear in queries); it is a tuning parameter for the administrator. Tiles of the same datacube can even reside on different machines.
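How a query region maps onto tiles can be illustrated for the simplest case, regular tiling. The sketch below is a conceptual illustration only and does not reproduce rasdaman's storage manager or its arbitrary tiling strategies; the function name and the convention of zero-based, inclusive cell coordinates are assumptions.

```python
from itertools import product


def intersecting_tiles(tile_shape, query_box):
    """Return indices of the regular tiles that overlap a query box.

    tile_shape: tile edge length per axis, e.g. (100, 100).
    query_box:  inclusive (low, high) cell coordinates per axis,
                assumed non-negative with the datacube origin at 0.
    """
    per_axis = [
        range(low // size, high // size + 1)
        for size, (low, high) in zip(tile_shape, query_box)
    ]
    return list(product(*per_axis))


# A 2-D cube cut into 100 x 100 tiles; the query touches rows 150..420
# and columns 0..99, so only four tiles have to be read.
tiles = intersecting_tiles((100, 100), ((150, 420), (0, 99)))
print(tiles)  # [(1, 0), (2, 0), (3, 0), (4, 0)]
```

Because only the tile lookup depends on the layout, the administrator can change the tiling without any query having to be rewritten.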
The rasdaman server is inherently parallel: it accepts any number of simultaneous requests (inter-query parallelization) and additionally parallelizes the execution of each incoming query (intra-query parallelization); see (Dumitru, 2014).
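The intra-query case can be pictured as a per-tile decomposition: each tile yields a partial result, and the partial results are merged at the end. The following sketch shows that pattern for a mean over a tiled array. It is an illustration of the idea, not rasdaman's executor; the function names are hypothetical, and Python threads are used only to keep the example short (they do not give true CPU parallelism for pure-Python work).

```python
from concurrent.futures import ThreadPoolExecutor


def tile_partial(tile):
    """Partial aggregate of one tile: (sum of cells, number of cells)."""
    return sum(tile), len(tile)


def parallel_mean(tiles, workers=4):
    """Average all cells by aggregating each tile independently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(tile_partial, tiles))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Three tiles of unequal size, as arbitrary tiling would produce.
tiles = [[1, 2, 3], [4, 5], [6]]
print(parallel_mean(tiles))  # 3.5
```

Shipping (sum, count) pairs rather than raw cells keeps the merge step cheap, which matters once tiles reside on different machines.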
Datacube operations are usually triggered via standardized web services, such as WCS, WMS and WCPS. However, these services are only the interface to the client performing the operation, while the processing itself is handled at a lower level which enables optimized, efficient execution. In rasdaman, the geo layer intercepting the web requests representing datacube operations translates them into array database queries, which are then handled by rasdaman's query processing engine.
The engine analyses each incoming query and compiles it into machine code. During that process, a series of optimization steps is applied, one of which is distribution.
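One way to picture the distribution step is as a placement decision: for an operation combining two datacubes, choose the node on which it runs so that as little data as possible crosses the network. The sketch below uses a deliberately simple cost model (bytes of operand data shipped, with result transfer ignored). It is an assumption made for illustration, not rasdaman's actual optimizer, and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operand:
    cube: str          # datacube name
    node: str          # federation member holding the data
    bytes_needed: int  # size of the subset the query actually touches


def plan_join(left, right, receiving_node):
    """Pick the execution node that minimizes operand data shipped.

    Any member may receive the query; the operation may still run
    elsewhere. Ties are broken by node name to stay deterministic.
    """
    candidates = sorted({left.node, right.node, receiving_node})

    def transfer_cost(node):
        return sum(
            op.bytes_needed for op in (left, right) if op.node != node
        )

    best = min(candidates, key=transfer_cost)
    return best, transfer_cost(best)


# The query arrives at node-c, but both inputs live elsewhere.
big = Operand("Sentinel", "node-a", 8_000_000)
small = Operand("LandMask", "node-b", 500_000)
node, cost = plan_join(big, small, receiving_node="node-c")
print(node, cost)  # node-a 500000
```

In this example the small operand travels to the node holding the large one, regardless of which member the client contacted.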
Such federations are by now operated routinely (Baumann, 2017

CONCLUSION
In summary, rasdaman allows federated processing of declarative queries whereby complete location transparency is given: any federation member can receive the query, and the federation will dynamically orchestrate each query individually for optimized processing.