THE FULLY AUTOMATIC OPTICAL PROCESSING SYSTEM CATENA AT DLR

ABSTRACT: Here we present the operational, fully automatic processing system CATENA developed at DLR. Uniform pre-processing of a growing amount of satellite data is requested more and more, e.g. to generate complete coverages of Europe for one point in time or time series for one location covering many years. Such requirements involve processing huge amounts of data which can hardly be handled manually. Therefore a fully automatic pre-processing environment has been developed at the Remote Sensing Technology Institute of DLR in Oberpfaffenhofen since 2006. This processing environment, named CATENA, was designed for uniform, automatic, general purpose processing of huge amounts of optical satellite data of similar type. In this paper we present the concept of the processing system, the framework and the decomposition of processing requirements into processing modules and processing chains. We give some examples of already implemented general purpose or project specific processing chains and an analysis of the performance and quality of the results.


INTRODUCTION
During the last years the volume of satellite data has increased rapidly. Beyond the growing number and size of current acquisitions, the repeated reprocessing of whole time series including historic data is requested more and more.
To cope with these huge amounts of data to be (pre-)processed, classical methods, which mostly rely on manual interaction like the measurement of ground control points or supervised classification, are no longer applicable.
Examples for such types of processing are the ESA GMES Fast Track Land Services Image2006, -2009 and -2012 series of two yearly coverages for 38 countries in Europe. Here the pre-processing, consisting of a highly accurate orthorectification of about 3500 IRS-P6 LISS3 and SPOT 4/5 scenes with accuracies below 1 GSD (ground sampling distance) together with manual quality assurance of the images and delivery in one European and all national projections, was requested within half a year. Such huge amounts of data could no longer be handled manually, so a fully automatic pre-processing environment was developed at the Remote Sensing Technology Institute of DLR in Oberpfaffenhofen. This processing environment, named CATENA, was designed for uniform, automatic, general purpose processing of huge amounts of optical satellite data of similar type. In this paper we present this operational, fully automatic processing system. The system was certified following the ISO 9001:2008 standard in 2012.
CATENA is based on the modular image analysis system XDibias, which has been developed, extended and improved at DLR since the late 1970s for scientific processing of satellite imagery. In this context, software for e.g. high precision image correlation and orthorectification had already been developed and was available for integration. CATENA is built as a multi-purpose, multi-chain system allowing the individual definition of processing chains for many purposes or even individual projects. All these processing chains are composed of modules which in turn wrap the scientific software carrying out one specific task. These modules use one standard data exchange format and a standardized process flow defined for ESA processors (ESA, 2007). For converting proprietary Level 1 satellite data together with all needed metadata to the standard exchange format used by the modules, a broad range of import modules already exists for e.g. Landsat, SPOT, IRS-P6 LISS3/AWiFS, Cartosat, ALOS AVNIR/Prism, RapidEye, Formosat, Kompsat, SPOT Vegetation, ATSR2, AATSR, Meris, Modis, Ikonos, GeoEye, QuickBird, WorldView, ZY-3, Pleiades, ... For exporting the results the processor system uses GDAL (Geospatial Data Abstraction Library) to create output in many established geocoded image formats (e.g. GeoTIFF, ENVI, Erdas Imagine, PCI, JPEG2000, HDF, netCDF, ...).
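As an illustration of the GDAL based export step, the following minimal sketch builds a `gdal_translate` command line for some of the output formats named above. It is not CATENA's export module: the helper `export_command`, the format table and the file names are assumptions made for this example. Only the GDAL driver short names and the `-of` option of `gdal_translate` are taken from GDAL itself; JPEG2000 and HDF are left out because GDAL offers several drivers for them.

```python
# Hypothetical helper, not part of CATENA: maps output formats named in the
# text to GDAL driver short names and builds a gdal_translate command line.
GDAL_DRIVERS = {
    "GeoTIFF": "GTiff",
    "ENVI": "ENVI",
    "Erdas Imagine": "HFA",
    "PCI": "PCIDSK",
    "netCDF": "netCDF",
}


def export_command(src, dst, fmt):
    """Return the argument list converting src to dst in the given format."""
    driver = GDAL_DRIVERS[fmt]  # raises KeyError for unsupported formats
    return ["gdal_translate", "-of", driver, src, dst]


if __name__ == "__main__":
    # The file names are placeholders; the list could be passed to subprocess.run.
    print(" ".join(export_command("ortho_input.tif", "ortho_output.img", "Erdas Imagine")))
```

Returning an argument list instead of a shell string keeps the sketch free of quoting problems when paths contain spaces.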
The processing system is divided into three parts: the modules, each executing one task on a set of images; the chains, linking the modules together by defining the process flow; and the framework, controlling and distributing the chain execution among the available processing nodes. The framework exists at DLR in two flavors: a stand-alone version and a version integrated in the operational ground segment system DIMS (Data and Information Management System) of DLR. The task of the framework is to know the orders to be performed (which data should be processed with which chain and which special parameters), to distribute the work among the available systems (grid computing), to provide the requested data, to execute the requested processing chain, to collect the results and to handle errors.
Besides the fully automatic distributed processing of mass satellite data, the system also supports projects by allowing optional manual quality checks, by extracting detailed statistics and by offering the possibility to configure the system for automatic polling of data and uploading of results.
In this paper we present the current state of the processing system CATENA together with its architecture as well as results and quality figures for different application areas. Processing chains already exist for automatic sensor model correction and orthorectification, for atmospheric correction, for impervious surface extraction, for time series pre-processing, for the derivation of digital surface models and much more. We describe the system in detail using the examples of the orthorectification chain and the chain for extracting digital surface models. For these chains results and quality figures are also shown.

Preliminary work
In the frame of the orthorectification of two coverages of about 3500 SPOT-4/5 and IRS-P6 LISS-III scenes for ESA's "GMES fast track land service 2006-2008" a system for automatic orthorectification of such imagery was developed at DLR (Müller et al., 2008). This system already implemented a first kind of distributed automatic processing chain. The method for the automatic orthorectification used in CATENA is explained in detail in Müller et al. (2012).
The processing system CATENA is based on the DLR developed image processing system XDibias (Triendl et al., 1982).

SYSTEM ARCHITECTURE
As shown in fig. 1, CATENA consists of three main parts which will be explained subsequently.

Figure 1: CATENA system elements

The Modules
The heart of the CATENA processing system are the Modules. These Modules encapsulate the organisational or scientific algorithms named Tools (cf. figs. 2 and 3). A module is usually a simple UNIX shell script wrapping the Tool: it converts the input images, the corresponding metadata or other input data to the format needed by the Tool, calls the Tool and converts the Tool's results back into the CATENA default format. Besides this, logging information, processing results, quality parameters etc. are returned.
Since all Modules use the same standardized image, metadata and data formats on their input and output interfaces, all modules may be combined easily into complex processing chains. The standard format used by the modules is the XDibias image and metadata format, extended by metadata additionally needed in processing chains.
Besides scientific modules like cloud detection, image matching, orthorectification or atmospheric correction there also exist so-called organisational modules. Such modules include the import of the proprietary satellite data to the standard format, the extraction of reference data or digital elevation models (DEMs) or the export of the results to the requested output formats.
Each module works on any number of input images. The inputs are defined in the Chain-Definition (cf. e.g. tab. 1) as image names like A, R or D. The module executes the wrapped Tool for each of the inputs and generates one or more outputs following the same numbering scheme.

Chain Definitions
A Chain Definition is realized by a simple text configuration file (cf. e.g. tab. 1). It contains the chain description in the form of a list defining how and in which order modules are executed with which data. It may additionally contain chain specific default values for the processing parameters.

Framework
The Task-Execution executes the task by running the modules defined in the requested chain for the given input data locally on one assigned processing node.

CATENA at work
Fig. 5 shows how CATENA works. An operator ingests a new task into the Task-Database, i.e. tells the system which data should be processed with which processing chain. All the central parts of the system reside on one system, here called the Central Control Node. The processing itself is done on the Processing Nodes. Adding a new Processing Node is as simple as possible due to the distributed grid computing approach: simply set up the machine (running CATENA requires Linux CentOS 5.x, GDAL and XDibias) and create a crontab entry calling the Task-Scheduling. The Task-Scheduling detects if it is run for the first time on a new Processing Node, adds the node automatically to the list of known nodes in the Task-Database and fetches the first tasks.
In fact the Task-Scheduling is much more complex in selecting the next task to be executed on the Processing Node. If a task stops with an error, its data still resides in the local workspace on the Processing Node. Once the error is solved (normally by the operator), the task is scheduled to continue processing on this node. Such tasks are run prior to new tasks in order to clean up used workspace as early as possible. Each task may also be given a priority, so tasks with higher priority are fetched prior to tasks with lower priority. Last but not least, each Processing Node can define a "daytime" during which the workload should be reduced, and active Processing Nodes can be selected and deselected manually via the Local Control.

Afterwards an optional manual quality control is possible right inside the processing chain to check parameters like the quality of the reference and DEM, the cloud cover of the image and, most importantly, the quality of the distribution of the GCPs. If this quality control fails, the operator can measure GCPs manually and use these manually measured points when restarting the process.
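The task selection rules of the Task-Scheduling described above (tasks to be continued on this node first, then new tasks by priority) can be sketched as follows. This is a minimal illustration and not the CATENA scheduler: the `Task` class, the state names "new" and "resume", and the convention that a larger number means a higher priority are assumptions made for this example. The "daytime" workload reduction is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    task_id: int
    priority: int                # assumption: larger number = more urgent
    state: str                   # hypothetical states: "new", "resume", "running"
    node: Optional[str] = None   # node holding the local workspace, if any


def next_task(tasks, node):
    """Pick the next task for `node`: resumable tasks first, then by priority."""
    # Tasks whose error was solved and whose workspace still lives on this node
    resumable = [t for t in tasks if t.state == "resume" and t.node == node]
    # Tasks that have never been started
    fresh = [t for t in tasks if t.state == "new"]
    for pool in (resumable, fresh):
        if pool:
            # Highest priority first; ties are broken by the lowest task id
            return min(pool, key=lambda t: (-t.priority, t.task_id))
    return None
```

Running resumable tasks before new ones, regardless of priority, mirrors the stated goal of freeing the used workspace as early as possible.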
International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XL-1/W1, ISPRS Hannover Workshop 2013, 21-24 May 2013, Hannover, Germany

The detailed Chain-Description is shown in tab. 1. A Chain-Description is a list of sequentially executed modules together with their inputs and outputs. A module is executed if all of the required inputs exist, and it creates the listed outputs. In this way the Chain-Definition can also implement branches in a simple manner. Here is a short description of the tasks of the modules implementing the ortho processing chain shown in fig. 6:

• cmimport: import of images from the path given in the task; writes the imported data as images A.

Depending on the sensor model there exist three branches in the chain:

• Orbit and attitude data (a file A/ nav for the exterior orientation exists in the input image A) using modules cmcorrectnav and cmortho (e.g. used for SPOT imagery)
• RPCs (Rational Polynomial Coefficients, a file A/ rpc exists in input image A) using modules cmcorrectrpc and cmorthorpc (for most satellite imagery like IRS-P6, RapidEye, Ikonos, GeoEye, WorldView, Pleiades, ...)
• Geolayer (a file A/ geolayer of the same size as the input exists, containing latitude and longitude for each pixel in the original image) using modules cmcorrectgeolayer and cmorthofromgeo (mostly for low resolution imagery like ATSR2, AATSR, Meris or also for Landsat imagery)
• Finally, if no sensor model exists, module cmaffin2ref is used, based only on the found GCPs

The chain additionally contains as last (optional) step the atmospheric correction using ATCOR (Richter, 1990). This is performed by first extracting a watermask and a cloudmask from the ortho image, second reprojecting the reference DEM onto the ortho image (cmdemortho) and finally calling the module cmatcor to create the atmospherically corrected image.
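The choice between the sensor model branches listed above can be illustrated with a small lookup. This is only a sketch: in CATENA the branching results from the input conditions of the modules in the Chain-Definition, not from an explicit function. The function `select_branch`, the plain file names "nav", "rpc" and "geolayer", and the order in which the models are checked are assumptions made for this example; the module names are those given in the text.

```python
# Branch table following the list above; the checking order is an assumption.
BRANCHES = [
    ("nav", ["cmcorrectnav", "cmortho"]),
    ("rpc", ["cmcorrectrpc", "cmorthorpc"]),
    ("geolayer", ["cmcorrectgeolayer", "cmorthofromgeo"]),
]
# Used when no sensor model is available: affine fit based on the found GCPs only
FALLBACK = ["cmaffin2ref"]


def select_branch(image_files):
    """Return the modules for the first sensor model file found in image_files."""
    for model_file, modules in BRANCHES:
        if model_file in image_files:
            return modules
    return FALLBACK


if __name__ == "__main__":
    # Hypothetical content of an imported image A that provides RPCs
    print(select_branch({"image", "rpc"}))
```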
Modules without any input conditions (like cmimport, cmchecklist, cmexport or cmdeliver) are always executed. So for example the module cmchecklist is always executed, but depending on a processing parameter it decides whether the user or project requires the chain to stop until a manual quality check has been performed and the checklist has been filled in correctly.
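The execution rule for a chain (a module runs if all required inputs exist and at least one listed output is still missing, cf. the caption of tab. 1) can be sketched as a simple loop. This is a hedged illustration, not the CATENA Task-Execution: the function `run_chain`, the tuple layout of a chain step and the data names used in the usage note are assumptions. Steps without listed outputs are always run here, which is one possible reading of "executed always".

```python
def run_chain(chain, available, run_module):
    """Run every applicable step of a chain; return the executed module names.

    chain:      list of (module, inputs, outputs) tuples in execution order
    available:  set of data items that already exist (updated in place)
    run_module: callable doing the real work for one module name
    """
    executed = []
    for module, inputs, outputs in chain:
        # An empty input list means "no input condition": always satisfied
        if not all(item in available for item in inputs):
            continue  # this branch does not apply to the current data
        # Skip steps whose outputs all exist already, e.g. after a restart
        if outputs and all(item in available for item in outputs):
            continue
        run_module(module)
        available.update(outputs)
        executed.append(module)
    return executed
```

With a hypothetical chain such as `[("cmimport", [], ["A"]), ("cmcorrectrpc", ["A", "A/rpc"], ["A/rpc_corrected"])]`, a second call after a restart skips the steps whose outputs already exist, which is how a stopped task can continue where it left off.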
If a physical sensor model exists (as for most of the high resolution satellites) the geographic accuracy of the Ortho Chain is about 0.5 GSD of the minimum GSD (ground sampling distance) of the satellite image or the reference database used as shown in Müller et al. (2012).Most of the lower resolution sensors provide only a geolayer and not a sensor model.In this case the geocorrection is in the range of about 1 GSD (e.g. for AATSR 1 km, Günther et al. (2012)).
This chain uses the CATENA grid computing facility again inside the chain: after the bundle adjustment between all input scenes, the bundle block is tiled, and for each tile the covered parts of the images become input data of new CATENA tasks using the chain "stereo" to calculate the DSM of this tile. These stereo subtasks are ingested into the task database, and the main multistereo chain has to wait for completion of all these subtasks. The subtasks are processed in parallel on all available processing nodes to exploit the full capacity of the CATENA grid computing and speed up computation enormously. Afterwards all results are collected and merged into the resulting DSM, and a full ortho image is generated using the corrected inputs and this DSM (see fig. 8).

The DSMs created by the DSM Generation Chain span many sensors like Cartosat, ALOS-Prism, SPOT-HRS, Ikonos, QuickBird, WorldView 1/2, the Pleiades or the new Chinese ZY-3. Best results are gained if the input images are acquired in the same orbit, so that no big illumination or vegetation changes occur. Multi-temporal data may also be processed, with good results in arid regions. For a detailed quality analysis of the resulting DSMs from the CATENA multistereo chain see d'Angelo and Reinartz (2011). Fig. 9 shows a short overview comparing DSMs generated from a four-image in-orbit image set of WorldView-2 over Munich. Comparing a DSM generated from the two images with a convergence angle of 24° to one generated from the two images with a convergence angle of 12° shows how the occluded areas (see footnote 1) in the DSMs get reduced. Taking three or four images of the set reduces these areas even more.
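The fan-out of the multistereo chain into per-tile "stereo" subtasks, followed by waiting for all of them, can be sketched as below. This is an illustration under stated assumptions, not the CATENA implementation: a local thread pool stands in for the task database and the processing nodes, the tiling is a plain regular grid in pixel coordinates, and `split_into_tiles`, `run_multistereo` and the `stereo` callable are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(width, height, tile_size):
    """Return (x, y, w, h) tiles covering a block of width x height pixels."""
    tiles = []
    for y in range(0, height, tile_size):
        for x in range(0, width, tile_size):
            # Tiles at the right and lower border may be smaller
            tiles.append((x, y, min(tile_size, width - x), min(tile_size, height - y)))
    return tiles


def run_multistereo(width, height, tile_size, stereo, workers=4):
    """Run one `stereo` call per tile, wait for all, return {tile: result}."""
    tiles = split_into_tiles(width, height, tile_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() keeps the tile order; list() blocks until every subtask is done
        results = list(pool.map(stereo, tiles))
    return dict(zip(tiles, results))
```

The returned mapping corresponds to the point where the main chain has collected all tile results and can merge them into the final DSM.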
Figure 9: Comparison of DSMs generated in the area of the Munich main station from WorldView-2 images using two images with 24° and 12° stereo convergence angle, or three or all four images in parallel for the DSM generation. Using narrower convergence angles or more scenes reduces the occlusions (areas not visible in two or more images, shown in blue) drastically.
The experience from the many DSMs processed automatically via the CATENA multistereo chain shows that the best results with regard to the cost-value ratio can be generated using a stereo triplet, i.e. three images acquired in one orbit with acquisition angles of about 10° ... 20° (forward), 0° (nadir) and −10° ... −20° (backward).

PROCESSING CHAINS
By now a whole range of processing chains has been defined and is used in CATENA. Processing chains can be general purpose processing chains for any kind of input and standard output or specialized processing chains for specific issues or projects. The latter may already define specific processing parameters in the Chain-Definition, like special reference or DEM databases or output files and formats. The main generic processing chains are listed in tab. 2 and some of the project specific processing chains are listed in tab. 3.

The best application for fully automatic processing chains is the processing of mass data and time series using the same kind of input data and the same processing scheme. Other caveats concern the reference data used for processing. It should be obvious that a reference DEM has to be available for ortho processing, but e.g. in areas above 60° north no SRTM data is available, and the automatic processing chain will raise an error if the standard SRTM DEM database is used.
However, most of the errors encountered in the last years concern the image matching of the ortho chain. The quality requirements for the matching in the automatic processing chain are very tight. So scenes containing many clouds, low contrast or only small islands in large water areas may regularly fail in this step. One solution may be to loosen these requirements automatically: the chain always starts with tight values and loosens them step by step. But we recommend always having an operator in charge of the processing who looks at the scenes and adapts the parameters individually. Experience shows that standard satellite scenes with low cloud coverage are processed fully automatically in over 75 % of all cases; the remaining ones raise an error.
The errors observed most often are listed in tab. 4. As can be seen, most of the errors are due to wrong usage of the chain (wrong reference databases, wrong paths provided by the user at task ingestion), system limits (missing licenses, no space left on devices) or bad imagery (mostly too cloudy or too much water in the scene). One word concerning the "missing licenses" problem: we currently reject any scientific software ("Tool") requiring a software license like IDL, Matlab or similar software frequently used by scientists. In an operational processing environment such software requires in the worst case as many licenses as there are processing nodes in the environment. In our case this may be 50 IDL licenses in parallel for executing ATCOR. If the license is not available, the module raises an error; a cron job checks automatically for such "missing license" errors and restarts these tasks regularly until they get a free license. When processing mass data such a bottleneck is not acceptable.
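The periodic restart of tasks that failed with a "missing license" error can be sketched as a function that such a cron job might call. This is an assumption based illustration only: the dictionary layout of a task, the state names "error" and "resume" and the error string are hypothetical and not taken from the CATENA task database.

```python
def requeue_license_errors(tasks):
    """Reset every task that failed with a missing-license error.

    Returns the number of tasks scheduled for continuation.
    """
    count = 0
    for task in tasks:
        if task["state"] == "error" and task["error"] == "missing license":
            # Continue on the same node, since the workspace is still there
            task["state"] = "resume"
            task["error"] = None
            count += 1
    return count
```

Other error types are left untouched, since they normally need an operator to solve them first.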

CONCLUSION AND OUTLOOK
In this paper we presented the architecture, typical processing chains and some representative results and experiences of the operational, fully automatic processing system CATENA developed at DLR. The multi-purpose processing system CATENA allows the definition of any kind of processing chain composed of so-called modules. These wrap scientific software or organizational tasks within standardized interfaces to allow easy combination into processing chains. Typical processing chains implemented are e.g. the orthorectification of satellite imagery or the generation of digital surface models (DSMs). These chains rely on the import of standardized data. Up to now over 25 sensors are supported, ranging from the low resolution sensors Meris and AATSR on EnviSat via medium resolution sensors like ALOS, IRS-P6 and SPOT to very high resolution sensors like Ikonos, WorldView or the Pleiades.
A main advantage of the system is the integrated grid computing, which allows any workstation configured for CATENA to be used for processing instead of relying on a dedicated server farm, thus reducing running costs effectively.
The experience from about 270,000 processed scenes shows that such a fully automatic processing system is best used with large amounts of input data of the same type, like the processing of time series from the same sensor or continental coverages from only a few sensors. The system still needs experienced operators to handle upcoming errors. Up to now over 20 processing chains have been defined, including 7 project specific and 7 internal chains. The CATENA system is used at the DLR EOC (Earth Observation Center) for many projects and applications.

Figure 2: CATENA Definitions

Inputs are referenced in the Chain-Definition by names like A, R or D. Each module iterates for a given input image A over all existing images A.1, A.2, A.3, ..., A.n. If more than one input is requested in the Chain-Definition, these inputs are iterated in parallel (A.1 and D.1, A.2 and D.2, ...).
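The parallel iteration over numbered inputs can be sketched as follows. This is a minimal illustration, not the code of the CATENA modules: the functions `numbered_items` and `iterate_inputs` are hypothetical, and it is an assumption of this example that only indices present for all requested inputs are processed.

```python
import re


def numbered_items(name, existing):
    """Return {n: item} for all existing items called name.1, name.2, ..."""
    pattern = re.compile(re.escape(name) + r"\.(\d+)$")
    found = {}
    for item in existing:
        match = pattern.match(item)
        if match:
            found[int(match.group(1))] = item
    return found


def iterate_inputs(names, existing):
    """Yield tuples like (A.n, D.n) for every index n present for all names."""
    per_name = [numbered_items(name, existing) for name in names]
    if not per_name:
        return
    # Assumption: an index is only processed if every input provides it
    common = set(per_name[0]).intersection(*per_name[1:])
    for n in sorted(common):
        yield tuple(items[n] for items in per_name)


if __name__ == "__main__":
    for pair in iterate_inputs(["A", "D"], ["A.1", "A.2", "D.1", "D.2", "R.1"]):
        print(pair)
```

A module wrapping a Tool would call the Tool once per yielded tuple and number its outputs with the same index.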
The so-called Framework is a collection of software consisting of the Task-Scheduling, the Task-Execution and the Local Control. The CATENA framework exists in two flavors: the stand-alone version and the DIMS version (for inclusion in the Data and Information Management System of DLR). Both versions use the same modules and chains but a different framework. In the stand-alone version the framework consists of a MySQL database containing the task list and several operating scripts and cron jobs doing the Task-Scheduling and the Task-Execution (see fig. 3). In the DIMS version the database and the operating scripts are part of DIMS and are not explained in this paper.

Figure 3: CATENA Architecture

As first component of the Framework, the Local Control ("Central Control Part" in fig. 3) is the interface for controlling the Task-Scheduling and the Task-Execution by an operator. It consists mainly of a database containing a list of "tasks" and a web interface for managing these tasks (see figs. 4 and 10). Each task is identified by the location of the input data to be processed, the requested processing chain (how to process the input data) and optional specific processing parameters. Each task is created by a process called ingestion.

The Task-Scheduling of CATENA implements a sophisticated decentralized grid computing (see "CATENA at work"). It is responsible for the assignment of tasks to different processing nodes. For details see fig. 5.

Figure 4: Local Control as CATENA web interface showing all finished, processing and unfinished tasks in the database. On the left side the involved hosts are shown color-coded (green: waiting for new tasks, red: no node available, orange: all available nodes on host are processing, yellowish green: nodes are processing on this host, but there are still free nodes)

Figure 5: CATENA at work

For task assignment, the Task-Scheduling on each of the Processing Nodes periodically queries the Task-Database for new tasks to be done. If a new task exists, it is marked as running in the database and the Task-Execution is started by the Task-Scheduling on this Processing Node. The Task-Execution in turn creates the workspace locally on the Processing Node, fetches the requested processing chain from the Central Control Node and executes step by step each Module listed in the processing chain. After successful execution of the whole chain and delivery of the results the local workspace is freed again. Normally each Processing Node is represented by one core of a workstation or server, so many Processing Nodes may run on one machine.
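The polling step, in which a Processing Node looks for a new task and marks it as running, can be sketched with a small database example. This is a hedged illustration and not the CATENA Task-Scheduling: Python's built-in `sqlite3` is used here as a stand-in for the MySQL task database, and the table layout, the column names and the state values are assumptions made for this example.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory task table with two waiting tasks (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, chain TEXT, "
        "priority INTEGER, state TEXT, node TEXT)"
    )
    conn.executemany(
        "INSERT INTO tasks (chain, priority, state) VALUES (?, ?, 'new')",
        [("ortho", 1), ("multistereo", 5)],
    )
    conn.commit()
    return conn


def claim_next_task(conn, node):
    """Mark the most urgent waiting task as running on `node`; return its id or None."""
    with conn:  # commits on success, rolls back on an exception
        row = conn.execute(
            "SELECT id FROM tasks WHERE state = 'new' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The state check in WHERE guards against another node having
        # claimed the same task between the SELECT and the UPDATE.
        updated = conn.execute(
            "UPDATE tasks SET state = 'running', node = ? "
            "WHERE id = ? AND state = 'new'",
            (node, row[0]),
        ).rowcount
        return row[0] if updated == 1 else None
```

A cron triggered scheduler on each node would call such a function periodically and start the Task-Execution whenever a task id is returned.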

Figure 6: CATENA ortho chain

THE CATENA DSM GENERATION CHAIN "MULTISTEREO"

Architecture of the DSM Generation Chain
The processing chain "multistereo" generates digital surface models (DSMs) from two or more overlapping input images. The chain is shown in fig. 7. Using this chain about 150 very high resolution DSMs (WorldView and comparable sensors) have already been processed. These DSMs resulted from about 3500 calls of the "stereo" subchain for DSM generation of sub-tiles (approx. 23 tiles per scene).

Figure 8: Multistereo processing chain: input are two or more images of the same area, output is the DSM and an ortho image

Results of the DSM Generation Chain

RESULTS AND EXPERIENCES
Processing of about 270,000 scenes from 2007 to 2012 with the fully automatic processing system CATENA led us to the following experiences: a fully automatic processing system depends much on standardized input and good reference data. So the development of a new import module for a new sensor may be finished in one hour if the sensor provides well documented metadata and a standard sensor model like RPCs. But this task may also need three or more months if there are missing or contradicting file descriptions or strange metadata formats. The chain itself depends absolutely on this standardized input. In emergency events like activations of the International Charter on Space and Major Disasters (International Charter on Space and Major Disasters, 2013) the delivered data often does not comply with the standard data of the providers (channels or metadata missing or wrong). In such cases the automatic processing fails continuously despite many already known and caught exceptions.

1 Occlusions in DSMs are areas which are only seen in one of the images, so that no height can be calculated by image intersection.

Figure 10: CATENA web interface showing a map of current IRS-P6 AWiFS and LISS3 scenes; green: finished, blue: unfinished, green X: working, red X: error

Table 1: Ortho processing Chain-Description; inputs/outputs "A. . ./ aux" represent XDibias images, all other inputs/outputs represent other files; modules are only executed if one of the defined outputs is missing and all of the required inputs exist

Table 2: General purpose processing chains in CATENA

Table 3: Project specific processing chains in CATENA

Table 4: Most frequent errors in automatic processing and their most likely reasons (more than one error may occur in processing a scene)