A REVIEW OF BENCHMARKING IN PHOTOGRAMMETRY AND REMOTE SENSING

This paper is a preface to the ISPRS/EuroSDR Workshop on “Evaluation and Benchmarking of Sensors, Systems and Geospatial Data in Photogrammetry and Remote Sensing”, held at Warsaw University of Technology on 16-17 September 2019. The paper reviews the literature on benchmarking in photogrammetry and remote sensing as it relates to geodata. The first part of the paper is a bibliographic analysis based on queries in the Scopus and Web of Science databases, which shows an increase in research activities based on benchmarking data. In the second part, past and ongoing benchmarking initiatives are reviewed, with examples from ISPRS and EuroSDR, among others. Evaluating data, sensors and algorithms through benchmarking is valuable because it makes it possible to compare research results from independent scientists using a common approach. As reported below, benchmarking activity has increased in recent years, with many benchmarks presented in the photogrammetric and remote sensing communities.


INTRODUCTION
A benchmark can be broadly defined as a standard or point of reference against which entities may be compared. It is also understood as a test designed to evaluate or compare the performance of computer hardware or software. Nowadays, benchmarks are a useful tool for evaluating and measuring sensors and platforms, and for assessing the quality of geospatial data (geodata) and processing algorithms.
The origins of benchmarking can be found in process management. Benchmarking is the practice of comparing processes, i.e. business and performance metrics, against industry standards and best practice from other companies. Dimensions that are typically measured include quality, time and cost. Benchmarking measures performance using a specific indicator (for example, cost per unit of measure, productivity per unit of measure, cycle time per unit of measure, or defects per unit of measure), resulting in a performance metric that is then compared with those of others (Fifer, 1989).
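To make the per-unit indicator concrete, the following is a minimal Python sketch. It assumes that each entity is summarised by a total (cost, time or defects) and an amount of output. The `Observation` structure, the function names and all figures are hypothetical. The sketch only shows how a metric such as cost per unit of measure is computed and then compared with others.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One entity's totals for a benchmarking period (hypothetical structure)."""
    name: str
    total: float  # e.g. total cost, working hours or number of defects
    units: float  # units of output, e.g. km² mapped or images processed


def per_unit(obs: Observation) -> float:
    """Performance indicator: total per unit of measure (e.g. cost per km²)."""
    if obs.units <= 0:
        raise ValueError(f"{obs.name}: units must be positive")
    return obs.total / obs.units


def compare(observations, lower_is_better=True):
    """Rank entities by their indicator and report each one's gap to the best performer."""
    scores = {o.name: per_unit(o) for o in observations}
    best = min(scores.values()) if lower_is_better else max(scores.values())
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=not lower_is_better)
    return [(name, score, abs(score - best)) for name, score in ranked]


# Hypothetical figures: mapping cost (EUR) and area mapped (km²) for three producers.
results = compare([
    Observation("A", 12000, 40),
    Observation("B", 9000, 36),
    Observation("C", 16000, 50),
])
for name, score, gap in results:
    print(f"{name}: {score:.0f} EUR/km², gap to best {gap:.0f}")
```

The ranking puts B first (250 EUR/km²), followed by A and C with gaps of 50 and 70. For indicators where a higher value is better, such as productivity per unit, pass `lower_is_better=False`.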
Benchmarking may be a one-off action, but it may also be treated as a continuous process. No single benchmarking process has been universally adopted. However, the wide appeal and acceptance of benchmarking has led to the emergence of benchmarking methodologies (Boxwell Jr, 1994), with a benchmarking methodology typically comprising 12 stages (Camp, 1989). Such approaches from process management, which relate primarily to management in commercial companies, may also be adopted for benchmarking in the geospatial industry. First, the scientific problem must be identified and existing solutions explored with potential partners. After these initial steps, test data should be prepared and the final partners found (ideally via an open call for participation). Participating organisations should ideally be leaders in the selected areas, to ensure that the planned benchmark addresses up-to-date issues in the subject.
Benchmarks in Photogrammetry and Remote Sensing are typically undertaken as an evaluation of:
− a given sensor or system of sensors;
− geodata collected using a particular platform / sensor;
− algorithms or methods for geodata processing.
A benchmark can be assessed:
− by many users (benchmark participants) with a proposed methodology, i.e. if geodata from the sensor are provided as a reference sample for processing, evaluation or comparison by benchmark participants;
− by many users (benchmark participants) with a defined methodology, i.e. if the evaluation or comparison of algorithms is the aim of the benchmark;
− by a benchmark proposer or a defined algorithm and analysed by benchmark participants, i.e. if the sensor or data are evaluated by analysing different collected samples, which benchmark participants submit for evaluation, perhaps within an organised contest.
Test data in benchmarking research are usually prepared as a reference dataset and provided to participants as open data. It may also be necessary to define how samples should be collected, to guarantee the objectivity of results obtained under similar conditions and experimental setups at (potentially many) participants' institutions.
As a tool in the hands of scientists, benchmarking can be interpreted broadly: it allows for the exchange of ideas and theories, joint experiments and independent research, often leading to widely applicable conclusions. This short review of the development of benchmarking in photogrammetry and remote sensing shows, through a bibliographic analysis of published articles and conference papers, the growing popularity of benchmarks. It also gathers information on completed, ongoing and potential future scientific initiatives related to benchmarking.

ANALYSIS OF BIBLIOGRAPHY DATABASES
To gain an overview of benchmarking techniques, it is worth tracing how often the term "benchmark" appears in bibliographic databases. There are many bibliographic databases, but only a few allow such information to be retrieved through an advanced query. To illustrate the popularity of benchmarking, query results are presented for the Scopus and Web of Science databases. These databases contain the most significant journal and conference papers in photogrammetry and remote sensing (including the ISPRS Archives and Annals).
Elsevier's Scopus database (https://www.scopus.com/) permits an advanced search for articles by title, abstract and/or keywords, and these fields can be searched in combination. A combined query for the word "benchmark" and related forms such as "benchmarking" or "benchmarks" in titles, abstracts or keywords returned 1353 papers (accessed on 6 September 2019) related to, or mentioning, benchmarks (Figure 1). These results are connected to many subject areas (Figure 2) and concern photogrammetry and/or remote sensing. The results of this query show a growing trend, coinciding with the significant increase in communication between scientists in the era of the Internet and digital bibliographic databases. However, this query identifies much published work that mentions benchmarks without being directly related to them. Therefore, a further search was performed for papers with the word "benchmark" or a related form in the title and a reference to photogrammetry or remote sensing in any other field (keywords or abstract). This search returned 90 items, which are listed in the Appendix. The growth in the number of items directly related to benchmarks is clearly noticeable over the last 10 years (Figure 3). A search of the Web of Science for documents with the word "benchmark" in the title and related to photogrammetry and remote sensing retrieved 117 documents (accessed on 6 September 2019) (Figure 6). The Scopus and Web of Science results are comparable and indicate that benchmarking as a research method has become increasingly popular in recent years.
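A small local filter can illustrate the difference between the broad query and the title-restricted query. This is a sketch under stated assumptions and does not reproduce the Scopus or Web of Science search engines. The regular expressions only approximate the `benchmark*` truncation operator and ignore the databases' own stemming and phrase rules. The records are hypothetical rather than exported data.

```python
import re
from collections import Counter

# Approximates the truncation operator in "benchmark*": benchmark, benchmarks,
# benchmarking, benchmarked, ...
BENCHMARK = re.compile(r"\bbenchmark\w*", re.IGNORECASE)
# Domain terms, matched as exact words only (no stemming, unlike the real databases).
DOMAIN = re.compile(r"\b(?:photogrammetry|remote sensing)\b", re.IGNORECASE)


def matches(record, title_only=False):
    """True if a record matches the query.

    title_only=False: benchmark* AND a domain term anywhere in title/abstract/keywords.
    title_only=True:  benchmark* in the title AND a domain term in any field.
    """
    all_fields = " ".join([record["title"], record["abstract"], *record["keywords"]])
    benchmark_field = record["title"] if title_only else all_fields
    return bool(BENCHMARK.search(benchmark_field)) and bool(DOMAIN.search(all_fields))


def count_per_year(records, title_only=False):
    """Number of matching records per publication year (the basis of a trend plot)."""
    return Counter(r["year"] for r in records if matches(r, title_only))


# Hypothetical records, for illustration only.
sample_records = [
    {"title": "A benchmark for dense image matching",
     "abstract": "We test photogrammetry software.", "keywords": [], "year": 2015},
    {"title": "Urban classification",
     "abstract": "Results are benchmarked in remote sensing.", "keywords": [], "year": 2018},
    {"title": "Benchmarking GPUs",
     "abstract": "A compiler study.", "keywords": ["HPC"], "year": 2018},
    {"title": "Benchmarks in remote sensing",
     "abstract": "", "keywords": ["LiDAR"], "year": 2018},
]

print(count_per_year(sample_records))                   # broad query
print(count_per_year(sample_records, title_only=True))  # title-restricted query
```

In the sample, the broad query counts three records (the GPU paper has no domain term). The title-restricted query keeps only the two records with a benchmark term in the title. This is the same kind of narrowing as the reduction from 1353 papers to 90 items described above.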

OVERVIEW OF BENCHMARKING INITIATIVES
Organisations such as the International Society for Photogrammetry and Remote Sensing (ISPRS) and European Spatial Data Research (EuroSDR) have repeatedly supported the organisation of benchmarks.In the following, selected examples of completed (Section 3.1), ongoing and forthcoming (Section 3.2) benchmarks are presented.

Completed benchmark initiatives
Benchmarking activities are of interest to many ISPRS working groups. The following is a summary of some of the most recent and high-profile benchmarks related to sensors, algorithms and methods.

ISPRS Test Project on Urban Classification, 3D Building Reconstruction and Semantic Labeling
This benchmark commenced in 2011 and is still ongoing. The main application of the tested photogrammetric data was object detection and 3D building reconstruction in urban areas. Participants in this project could choose from the following tasks: 1) Urban Object Detection: participants may detect single object classes, or try to extract several object classes simultaneously; 2) 3D Building Reconstruction: participants reconstruct detailed 3D roof structures in the test areas.
In both cases, reference data were created by the organisers of the benchmarks. Participants submitted their results for the urban classification and 3D reconstruction test and for the 2D and 3D Semantic Labeling tests to the organisers. These results were compared with the reference data, and participants were informed of the outcome of the evaluation. Some outcomes of these benchmarks were published by Rottensteiner (2013).
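The paper does not specify how the submitted results were compared with the reference data. As an illustration only, the sketch below assumes that labels are given per pixel (or per point) as flat lists. It computes a confusion matrix, the overall accuracy and the per-class F1 score, which are measures commonly used for labelling tasks. The function names, classes and labels are hypothetical.

```python
from collections import Counter


def confusion_matrix(reference, predicted, classes):
    """Rows: reference class, columns: predicted class (counts of pixels or points)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and prediction must have the same length")
    pairs = Counter(zip(reference, predicted))
    return [[pairs[(r, p)] for p in classes] for r in classes]


def overall_accuracy(cm):
    """Share of samples whose predicted label equals the reference label."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def f1_per_class(cm, classes):
    """Harmonic mean of precision and recall for each class."""
    scores = {}
    for i, c in enumerate(classes):
        tp = cm[i][i]
        fp = sum(row[i] for row in cm) - tp  # predicted as c, reference is another class
        fn = sum(cm[i]) - tp                 # reference is c, predicted as another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores[c] = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return scores


# Hypothetical reference and submitted labels.
classes = ["building", "tree", "road"]
reference = ["building", "building", "tree", "road", "road", "tree"]
predicted = ["building", "tree", "tree", "road", "building", "tree"]
cm = confusion_matrix(reference, predicted, classes)
print(cm, overall_accuracy(cm), f1_per_class(cm, classes))
```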

ISPRS/EuroSDR Benchmark for Multi-Platform Photogrammetry
The aim of the project (Nex et al., 2015) was to assess the accuracy and reliability of calibration methods and image orientation for different platforms, as well as their integration for image matching and dense point cloud generation. An important issue in current research is that large changes in perspective and scale need to be tackled in image orientation and (dense) image matching, and this is generally not approached systematically. By providing a new benchmark dataset consisting of state-of-the-art sensor data and covering different relevant tasks and scenarios, the project assessed the current status of research. Two test datasets, of Dortmund (Germany) and Zurich (Switzerland), were released for this benchmark in 2014/2015.
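The accuracy of image orientation is often reported as residuals at independent check points. The following is a minimal sketch, assuming that estimated and reference check point coordinates are available in the same coordinate system as dictionaries keyed by a point ID. The data and names are hypothetical, and the benchmark's actual evaluation protocol may differ.

```python
import math


def checkpoint_rmse(estimated, reference):
    """Per-axis and 3D RMSE of residuals at check points present in both dictionaries.

    Both arguments map a point ID to an (X, Y, Z) tuple in the same coordinate system.
    Points missing from either dictionary are ignored.
    """
    common = sorted(set(estimated) & set(reference))
    if not common:
        raise ValueError("no common check points")
    squared = [0.0, 0.0, 0.0]
    for pid in common:
        for axis in range(3):
            residual = estimated[pid][axis] - reference[pid][axis]
            squared[axis] += residual * residual
    n = len(common)
    rmse_xyz = tuple(math.sqrt(s / n) for s in squared)
    rmse_3d = math.sqrt(sum(squared) / n)  # equals sqrt(RMSE_X² + RMSE_Y² + RMSE_Z²)
    return rmse_xyz, rmse_3d


# Hypothetical check points (metres); CP3 has no reference and is skipped.
reference_cps = {"CP1": (0.0, 0.0, 0.0), "CP2": (10.0, 0.0, 0.0)}
estimated_cps = {"CP1": (0.03, 0.04, 0.0), "CP2": (10.0, 0.0, 0.12), "CP3": (5.0, 5.0, 5.0)}
print(checkpoint_rmse(estimated_cps, reference_cps))
```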

ISPRS Benchmark on UAV Semantic Video Segmentation
This 2017 ISPRS Scientific Initiative aims to promote and advance the video segmentation of very high resolution (VHR) UAV sequences. With this project in 2017-2018, the authors aimed to pave the way for a unified framework for the meaningful quantification of semantic segmentation from UAV imagery and video. In this project, image segmentation and labelling algorithms were developed. Eight classes (building, road, tree, low vegetation, moving car, static car, human and background) were detected in a dataset of 10 videos with 75,000 frames captured by DJI Phantom systems in Germany and China. In future, there are plans to develop a comprehensive benchmark comprising the largest dataset of UAV scenes with high-quality annotations, a sound evaluation methodology for pixel-level semantic labelling, and an associated challenge (Ying Yang & Yilmaz, 2018).
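One common way to quantify pixel-level semantic labelling over video is the per-class intersection over union (IoU), accumulated over all frames, together with its mean over classes (mIoU). The source does not state which measure the benchmark uses, so treat this as an assumed metric. The class list is taken from the text, and the frame data are hypothetical.

```python
from collections import defaultdict

CLASSES = ["building", "road", "tree", "low vegetation",
           "moving car", "static car", "human", "background"]


def accumulate_iou(frames, classes=CLASSES):
    """frames: iterable of (reference, predicted) pairs, each a flat list of pixel labels
    for one video frame. Intersections and unions are summed over all frames before
    dividing, so large and small frames are weighted by their pixel counts."""
    intersection = defaultdict(int)
    union = defaultdict(int)
    for reference, predicted in frames:
        if len(reference) != len(predicted):
            raise ValueError("frame size mismatch")
        for r, p in zip(reference, predicted):
            if r == p:
                intersection[r] += 1
                union[r] += 1
            else:  # a mislabelled pixel enlarges the union of both classes
                union[r] += 1
                union[p] += 1
    # Classes absent from both reference and prediction are left out of the mean.
    ious = {c: intersection[c] / union[c] for c in classes if union[c] > 0}
    miou = sum(ious.values()) / len(ious) if ious else float("nan")
    return ious, miou


# Two tiny hypothetical frames.
frames = [
    (["building", "road", "road", "tree"], ["building", "road", "tree", "tree"]),
    (["human", "background"], ["background", "background"]),
]
ious, miou = accumulate_iou(frames)
print(ious, miou)
```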

ISPRS Benchmark Challenge on Large Scale Classification of VHR Geospatial Data
Another 2017 ISPRS Scientific Initiative, this benchmark aimed to generate a publicly available, large-scale, VHR, multi-spectral dataset for training and evaluating sophisticated machine learning models. The authors aimed to provide a complex and realistic benchmark for the segmentation of VHR imagery and for change detection. WorldView-2 satellite images of Toulouse were annotated with six land cover classes: 'impervious surface', 'building', 'pervious surface', 'high vegetation', 'cars' and 'water'. In the project implementation, a fully automatic evaluation tool for a comprehensive accuracy assessment of semantic instance segmentation and an automatic evaluation tool for change detection were prepared (Rosche, 2018).

Benchmark on Terrestrial Laser Scanning for Forestry Applications
This EuroSDR benchmark (Liang et al., 2018) aims to evaluate the quality, accuracy and feasibility of automatic, semi-automatic and manual tree extraction methods based on high-density TLS data. The test datasets provided (24 sample plots of 32 m × 32 m) were collected in Finnish forests with terrestrial laser scanners.
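Evaluating tree extraction typically involves matching detected stems to reference stems and then counting how many were found. The sketch assumes 2D stem positions in metres, a hypothetical 0.5 m matching threshold and greedy one-to-one matching in order of increasing distance. Completeness and correctness are used here as plausible measures, not as the benchmark's documented protocol.

```python
import math


def match_trees(detected, reference, max_dist=0.5):
    """Greedy one-to-one matching of detected to reference stem positions (x, y in metres).

    Candidate pairs within max_dist are processed in order of increasing distance, so
    each reference tree is matched to at most one detection and vice versa.
    """
    candidates = []
    for i, d in enumerate(detected):
        for j, r in enumerate(reference):
            dist = math.dist(d, r)
            if dist <= max_dist:
                candidates.append((dist, i, j))
    candidates.sort()
    used_detected, used_reference, matches = set(), set(), []
    for dist, i, j in candidates:
        if i not in used_detected and j not in used_reference:
            used_detected.add(i)
            used_reference.add(j)
            matches.append((i, j, dist))
    return matches


def completeness_correctness(detected, reference, max_dist=0.5):
    n_matched = len(match_trees(detected, reference, max_dist))
    completeness = n_matched / len(reference) if reference else 0.0  # reference trees found
    correctness = n_matched / len(detected) if detected else 0.0     # detections that are real
    return completeness, correctness


# Hypothetical plot: four reference trees, three detections (one false positive).
reference_stems = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (15.0, 0.0)]
detected_stems = [(0.2, 0.1), (5.3, 0.0), (20.0, 0.0)]
completeness, correctness = completeness_correctness(detected_stems, reference_stems)
print(completeness, correctness)
```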

Ongoing and Future Initiatives
Many benchmarking initiatives are still ongoing. The following examples were found on, or linked from, the ISPRS webpage as open initiatives. Many initiatives related to photogrammetric and remote sensing data processing provide benchmark data for tests performed by other scientists. For example, the KITTI benchmark (Geiger et al., 2012) and the "MLS 1 - TUM City Campus" benchmark (Gehrung et al., 2017) deliver many types of data from mobile mapping platforms. Some datasets collected on board a Micro Aerial Vehicle can be found in the EuRoC benchmark (Burri et al., 2016). The Large-Scale Point Cloud Classification Benchmark (Hackel et al., 2017) provided a large labelled 3D point cloud dataset of natural scenes, with over 4 billion points covering diverse urban scenes. The goal of this initiative was to help data-hungry methods such as deep neural networks reach their full potential and learn richer 3D representations. The "3D sensors and systems for metrology and industrial vision" benchmark aims to provide free samples of 3D sensor data suitable for industrial vision and close-range 3D measurements. A call for participation in "data for corridor mapping" has also been announced. All of these benchmark datasets are still available, although they were initiated several years ago. In this section, a number of further initiatives are introduced in more detail.

Benchmark on Indoor Modelling
This ISPRS Benchmark on Indoor Modelling addresses the following gap: although results of various methods for 3D indoor modelling from point cloud data have been reported in the literature, it has not been possible to compare their performance. Spatial models of indoor environments are needed in a growing number of applications, including navigation, emergency response and a range of location-based services. The test data in this benchmark comprise five point clouds captured by different sensors in indoor environments of varying complexity. These point clouds are to be processed into models suitable for indoor navigation (Khoshelham et al., 2017).

The FINE Benchmark
The Fisheye Indoor Narrow Spaces Evaluation (FINE) benchmark (http://www.3d-arch.org/FINE-benchmark.pdf) provides fisheye images and terrestrial laser scanning point clouds of complex environments.The aim of the project is to evaluate the performance of different image-based processing methods when surveying complex spaces.The main questions that the benchmark poses concern the potential of image-based fisheye techniques for 3D reconstruction of indoor narrow spaces and whether these can be considered as valid low-cost alternatives to static or mobile/hand-held laser scanning instruments.

The HyRANK Hyperspectral Dataset and Benchmark Framework
The HyRANK Dataset and Benchmark is another ISPRS scientific initiative. Its main objective is to fill the current gap in the availability of hyperspectral datasets and benchmarking frameworks for validating new classification methods against the state of the art. The testing dataset contains two hyperspectral images from the Hyperion sensor (EO-1, USGS), and the validation set comprises three images. Benchmark participants are asked to use their classification algorithms to produce a land cover map according to a nomenclature that follows the CORINE Land Cover principles (14 classes) (Karantzalos et al., 2019).
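Land cover maps are often compared with reference data through a confusion matrix. Cohen's kappa is one summary measure that corrects overall agreement for chance. The following is a minimal sketch with a hypothetical 2-class matrix; a 14-class CORINE-based nomenclature would simply give a 14 × 14 matrix. The source does not state whether HyRANK ranks participants by kappa, so this is an assumption.

```python
def cohen_kappa(cm):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: map)."""
    k = len(cm)
    n = sum(map(sum, cm))
    p_observed = sum(cm[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(k)]
    # Agreement expected by chance, from the marginal class proportions.
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical 2-class confusion matrix.
cm = [[20, 5],
      [10, 15]]
print(round(cohen_kappa(cm), 3))  # 0.4
```

Here the observed agreement is 0.7 and the chance agreement is 0.5, so kappa is (0.7 - 0.5) / (1 - 0.5) = 0.4. This is noticeably lower than the overall accuracy of 0.7.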

GeoBIM benchmark 2019
This ISPRS and EuroSDR benchmark started in 2019. Its aim is to investigate the technical solutions available to support research and activities related to GeoBIM (the integration and interoperability of 3D geoinformation and 3D building information models) (Noardo et al., 2019). This is the first project that will provide insight into the current state of the art of open standards implementation in the 3D geo and BIM domains, and it will also identify remaining issues. It will compare the ability of existing software tools to use and process CityGML and IFC models and assess their performance. In this benchmark, three aspects are investigated in four tasks: the types of data mentioned above, their georeferencing, and conversion between different geodata formats.
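Georeferencing a BIM model usually means mapping its local engineering coordinates into a projected map coordinate reference system. The sketch below assumes a 2D similarity transformation in plan (rotation, scale, translation) plus a height offset. Its parameter names are modelled on the IFC4 entity IfcMapConversion (Eastings, Northings, OrthogonalHeight, XAxisAbscissa, XAxisOrdinate, Scale). Applying the scale to planimetry only is an assumption that should be checked against the IFC specification. The coordinates are hypothetical.

```python
import math
from typing import NamedTuple


class MapConversion(NamedTuple):
    """Parameters named after IFC4 IfcMapConversion attributes (assumed mapping)."""
    eastings: float
    northings: float
    orthogonal_height: float
    x_axis_abscissa: float = 1.0  # direction of the local x-axis in the map CRS
    x_axis_ordinate: float = 0.0
    scale: float = 1.0


def local_to_map(point, conv):
    """Transform a local point (x, y, z) into map coordinates (E, N, H).

    Rotates by the angle of the local x-axis in the map CRS, scales and translates.
    The scale is applied to planimetry only (assumption).
    """
    x, y, z = point
    theta = math.atan2(conv.x_axis_ordinate, conv.x_axis_abscissa)
    c, s = math.cos(theta), math.sin(theta)
    e = conv.eastings + conv.scale * (c * x - s * y)
    n = conv.northings + conv.scale * (s * x + c * y)
    h = conv.orthogonal_height + z
    return e, n, h


# Hypothetical site whose local x-axis points to grid north.
conv = MapConversion(500000.0, 5800000.0, 100.0, x_axis_abscissa=0.0, x_axis_ordinate=1.0)
e, n, h = local_to_map((10.0, 0.0, 3.0), conv)
print(e, n, h)
```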

3DOMcity benchmark
A novel multi-purpose benchmark for assessing the performance of the entire image-based pipeline for 3D urban reconstruction and 3D data classification is presented at the Workshop on "Evaluation and Benchmarking of Sensors, Systems and Geospatial Data in Photogrammetry and Remote Sensing" (Özdemir et al., 2019). This benchmark, called "3DOMcity", is a photogrammetric contest and is publicly available at https://3dom.fbk.eu/3domcity-benchmark. It covers the entire 3D reconstruction pipeline, including image orientation, dense image matching, point cloud classification and 3D building reconstruction. The performance assessment is carried out in a metrological context. The datasets provided offer 2D and 3D data collected at very high spatial resolution.
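In a metrological assessment, a reconstructed point cloud is commonly compared with a more accurate reference cloud. The paper does not describe the measures used in 3DOMcity, so the following is an assumed example. The sketch computes nearest-neighbour cloud-to-cloud distances by brute force and summarises them by mean, RMS and the share of points within a tolerance. The clouds and the tolerance are hypothetical.

```python
import math
import statistics


def c2c_distances(evaluated, reference):
    """Distance from each evaluated point to its nearest neighbour in the reference cloud.

    Brute force O(n*m): adequate for a sketch; large clouds need a spatial index.
    Nearest-point distances overestimate point-to-surface distances for sparse references.
    """
    if not reference:
        raise ValueError("reference cloud is empty")
    return [min(math.dist(p, q) for q in reference) for p in evaluated]


def summarise(distances, tolerance):
    """Mean, RMS and share of points within a tolerance (same units as the coordinates)."""
    return {
        "mean": statistics.fmean(distances),
        "rms": math.sqrt(statistics.fmean([d * d for d in distances])),
        "within_tolerance": sum(d <= tolerance for d in distances) / len(distances),
    }


# Hypothetical clouds in metres.
reference_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
evaluated_cloud = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.02), (2.1, 0.0, 0.0)]
d = c2c_distances(evaluated_cloud, reference_cloud)
print(summarise(d, tolerance=0.05))
```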

ISPRS/EuroSDR Single-Photon LiDAR benchmark
Single Photon and Geiger-Mode LiDAR (SPL/GML) are relatively new types of airborne laser scanning sensors. These technologies can potentially provide very accurate, high-resolution 3D mapping more efficiently than conventional airborne laser scanning. However, such LiDAR sensors can suffer from noise and lower accuracy than linear full-waveform systems. The many tests already conducted by national mapping and cadastral agencies cannot comprehensively answer whether production pipelines should be changed and existing airborne sensors upgraded, because these tests differ in acquisition time, land cover and scanning parameters. Bernard et al. (2019) describe the activities conducted so far as part of a EuroSDR initiative, including a global online questionnaire to gauge interest in SPL/GML technologies.

SUMMARY AND CONCLUSION
The benchmarking initiatives described in this paper do not form an exhaustive list. The wide variety of sensors and systems available on the market for collecting geospatial data makes the evaluation of derived information, the calibration of sensors and the benchmarking of systems a critical task. This is an important scientific issue, and in daily work the assessment of tools for collecting geospatial data is crucial for all professionals handling such data. Benchmarks can be understood in different ways, ranging from the preparation of datasets that other scientists use to evaluate their methods, to fully developed research programmes. In all these forms, benchmarking can support complex investigations and build scientific networks, bringing researchers together to conduct joint research projects. This paper was prepared to introduce the background to the joint ISPRS/EuroSDR Workshop on Evaluation and Benchmarking of Sensors, Systems and Geospatial Data in Photogrammetry and Remote Sensing, held in Warsaw, Poland in September 2019. The benchmarking workshop provides a good opportunity for attendees who want to extend their knowledge of photogrammetry and remote sensing. During the workshop, research on the evaluation of sensors and systems and on the assessment of data quality will be presented through selected application examples. Some benchmarking proposals will also be announced at the event and presented for release to interested researchers from the photogrammetric and remote sensing community.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-1/W2, 2019 Evaluation and Benchmarking Sensors, Systems and Geospatial Data in Photogrammetry and Remote Sensing, 16-17 Sept. 2019, Warsaw, Poland

The 12 stages of the benchmarking methodology (Camp, 1989):
− Select subject;
− Define the process;
− Identify potential partners;
− Identify data sources;
− Collect data and select all partners;
− Determine the gap;
− Establish process differences;
− Target future performance;
− Communicate;
− Adjust goal;
− Implement;
− Review and recalibrate.

Figure 2. Subject areas of results for query: TITLE-ABS-KEY ("benchmark*" AND ("photogrammetry" OR "remote sensing")) in Scopus database.

The Web of Science database (https://clarivate.com/webofsciencegroup/solutions/web-ofscience/) was originally produced by the Institute for Scientific Information (ISI) and is currently maintained by Clarivate Analytics. Its search tools can produce results comparable to the first query performed in Scopus. A search in all fields for papers with the term "benchmark" or related forms in photogrammetry and remote sensing returned 1699 results (accessed on 6 September 2019) (Figure 4), and the dates of publication confirm an increase in number over the last decade. The division of the results into Web of Science categories (Figure 5) shows that this research technique is popular in many groups from various disciplines related to remote sensing and photogrammetry.

Figure 4. Number of results for query: All = (benchmark* and (photogrammetry or remote sensing)) in the Web of Science database.

Figure 5. Disciplines of results for query: All = (benchmark* and (photogrammetry or remote sensing)) in the Web of Science database.

Figure 6. The results of query: TI=benchmark* AND all = (photogrammetry or remote sensing) in Web of Science database.
The ISPRS Test Project on Urban Classification, 3D Building Reconstruction and Semantic Labeling is still ongoing, and reference datasets were published in 2018. More information about these benchmarks is available at: http://www2.isprs.org/commissions/comm3/wg4/tests.html

Joint ISPRS/EuroSDR Benchmark on High Density Aerial Image Matching
The project (Haala, 2014) aimed to evaluate the potential of photogrammetric 3D data capture in view of ongoing developments in software for automatic image matching. The basic scope of this benchmark was the evaluation of 3D point clouds and digital surface models (DSMs) generated from aerial images by different software systems. The benchmark considered the state-wide generation of high-quality DSMs and the applicability of open and commercial tools. Three aerial image blocks were provided as test data.
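DSMs generated by different matching software can be compared with a reference DSM cell by cell. The sketch below assumes two co-registered grids of equal shape, stored as lists of rows. It reports the mean, standard deviation, median and normalised median absolute deviation (NMAD) of the height differences. The choice of statistics and the grid values are illustrative assumptions, not the benchmark's documented procedure.

```python
import statistics


def dsm_difference_stats(dsm, reference, nodata=None):
    """Cell-by-cell height differences (dsm - reference) of two co-registered grids.

    Grids are lists of rows with identical shape; cells equal to `nodata` are skipped
    (a NaN no-data value would need math.isnan instead of a comparison).
    """
    if len(dsm) != len(reference) or any(len(a) != len(b) for a, b in zip(dsm, reference)):
        raise ValueError("grids must have the same shape")
    diffs = [
        a - b
        for row_a, row_b in zip(dsm, reference)
        for a, b in zip(row_a, row_b)
        if a != nodata and b != nodata
    ]
    med = statistics.median(diffs)
    # NMAD: robust spread estimate, 1.4826 * median absolute deviation from the median.
    nmad = 1.4826 * statistics.median([abs(d - med) for d in diffs])
    return {
        "n": len(diffs),
        "mean": statistics.fmean(diffs),
        "std": statistics.stdev(diffs),
        "median": med,
        "nmad": nmad,
    }


# Hypothetical 2 x 3 grids in metres; None marks a no-data cell, 30.0 simulates a blunder.
dsm = [[10.0, 10.2, None], [9.9, 10.1, 30.0]]
reference_dsm = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
stats = dsm_difference_stats(dsm, reference_dsm)
print(stats)
```

The 20 m blunder in the last cell inflates the mean and the standard deviation. The median and NMAD stay close to the bulk of the differences, which is why robust measures are often reported alongside them.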
Can semantic labeling methods generalize to any city? The Inria aerial image labeling benchmark. 2017, International Geoscience and Remote Sensing Symposium (IGARSS).
Barelli L., Paolini P., Forti G. The XII century towers, a benchmark of the Rome countryside almost cancelled: The safeguard plan by low cost UAV and terrestrial DSM photogrammetry surveying and 3d web GIS applications. 2017, International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences.
Karmas A., Karantzalos K. Benchmarking server-side software modules for handling and processing remote sensing data through Rasdaman. 2017, Workshop on Hyperspectral Image and Signal Processing, Evolution in Remote Sensing.