TIME-RELATED QUALITY DIMENSIONS OF URBAN REMOTELY SENSED BIG DATA

Our rapidly changing world requires new sources of image-based information. The maintenance and management of quickly changing urban areas and smart cities cannot rely only on traditional techniques based on remotely sensed data; new and progressive techniques must also be involved. Among these technologies, volunteer-based solutions are gaining importance, such as crowdsourced image evaluation, mapping by satellite-based positioning techniques, or even observations made by unskilled people. Location-based intelligence has become an everyday practice. It is enough to mention weather forecast and traffic monitoring applications, where everybody can act as an observer and the acquired data, despite their heterogeneity in quality, provide great value. Such value intuitively increases when the data are of better quality. In the age of visualization, real-time imaging, big data and crowdsourced spatial data have transformed our general applications in a revolutionary way. The most important factors of location-based decisions are the time-related quality parameters of the data used. In this paper several time-related data quality dimensions and terms are defined. The paper analyses the time-sensitive characteristics of image-based crowdsourced big data and presents quality challenges and user perspectives. The data quality analyses focus not only on the dimensions but also extend to quality-related elements and metrics. The paper discusses the connection between data acquisition and processing techniques, also considering the big data aspects. Besides the theoretical sections, strongly practice-oriented examples of detecting quality problems are also covered. An illustrative example is OpenStreetMap (OSM), where the development of urbanization and the increasing involvement of volunteers can be studied.
This framework continues the previous activities of the Remote Sensing Data Quality Working Group (ICWGIII/IVb) of the ISPRS, focusing on the temporal variety of our urban environment.


INTRODUCTION
Traditional data acquisition is usually carried out by the remote sensing (RS) industry, government agencies such as national mapping agencies, and the surveying industry. Their data acquisition methods are usually well documented, and data quality information is provided together with the data. The new technology of crowd-sourcing has opened a wide new area in spatial data acquisition. In contrast, these methods usually have less documented means of acquisition. They carry more uncertainty in quality measures. Trust in their sources is much lower than for the traditional techniques mentioned above.
Still they carry a vast potential that traditional data sources do not.
The rapid development of the urban environment requires data acquisition that can track fast changes. Crowd-sourced remotely sensed data may meet this need by enabling the mapping of the rapidly changing environment, while traditional surveying techniques in many cases take too much time to work effectively. Big data is nowadays a rapidly growing area of data processing. It is characterized by the 4V-laws: big data has extreme Volume (a very large amount of data), Velocity (it is captured very quickly), Variety (the data differ greatly in type and nature) and Veracity (data quality varies greatly). In remote sensing and geographic information systems there are many areas where big data and related analysis techniques can be involved; moreover, this combination has advantages over the traditional methods. Land cover and land use mapping is an example of such an area, especially focusing on traffic data acquisition. Google traffic information (GoogleTraffic, 2018) is perhaps the best-known example, but transportation networks and the corresponding base maps have also been created by crowd-sourced big data collection and analysis techniques in the OpenStreetMap project (OpenStreetMap, 2018).

URBAN REMOTELY SENSED DATA
There is a strong interrelationship between quality measures and the types of data sources. The selection and collection of data sources have a strong influence on the remote sensing data quality (RSDQ) dimensions to be used in the process. To contribute to this issue, in this paper we focus on big data sources in the domain of remote sensing. Crowd-sourcing internet technology has opened new perspectives for remotely sensed information collection and processing. Traditional methods of data acquisition have been extended with innovative means based on non-expert spatial data gathering. In the OpenStreetMap initiative, crowd-sourcing of spatial databases relies on aerial and satellite-based optical data to obtain geographic data. The source of the spatial information is similar to that of the institutional data production chain.
However, the method adopted to derive geographic information from remotely sensed data takes a different direction compared to traditional means. Spatial data are derived on a voluntary basis, often by non-expert analysts, with less control over data quality during the production phase. Scarce information is available on quality measures such as trust in sources or consistency of data. Still, its great value lies in the quickly renewable, openly accessible nature of the data.
Fig. 1 illustrates that remote sensing data processing and crowdsourcing can be integrated smoothly. The area shown lies in Budapest, near the campus of the Budapest University of Technology and Economics (BME). The road axes were acquired with GPS and similar satellite-based measurements, but the surrounding area (buildings, places, etc.) was obtained using remotely sensed imagery.
Closed courtyards of 4-5 floor buildings along the main roads can be mapped only by aerial or satellite image interpretation.
Another promising crowd-sourced big data capturing process is exploited by assisted and autonomous vehicle technologies (Toth et al., 2018). There is already a pioneering approach, called self-healing mapping technology, to collect environmental data with these special vehicles. The captured data set is transferred into the cloud and, after sophisticated processing, fed back into the map database (Here, 2018).

Terminology
Prior to the discussion of the quality dimensions, some relevant definitions must be given.
Time: "The indefinite continued progress of existence and events in the past, present, and future regarded as a whole." (https://en.oxforddictionaries.com/definition/time) Time is a fundamental scalar quantity, what a clock reads (Considine, 1985), and a "one-dimensional subspace of spacetime, which is locally orthogonal to space" (IEC, 2011).
Time scale: "system of ordered marks which can be attributed to instants on the time axis, one instant being chosen as the origin" (IEC, 2011).
Time axis: "mathematical representation of the succession in time of instantaneous events along a unique axis

Time-related quality dimensions and metrics
Remote sensing data quality has been described for traditional data sources in our previous papers (Albrecht, 2018; Barsi, 2018). As to information technology, see (Batini, 2016).
Table 1. Time-related data quality dimensions and their metrics
Remote sensing uses two common terms to describe environmental phenomena. The first is the instant at which an event occurs, to which a date and time are assigned (e.g. a landslide in the Alps occurred at 8:00 on 10 June 2013). An instant has accuracy, resolution and precision as quality dimensions. The second is duration, i.e. how long an observed event lasts (e.g. floods on the Danube in Budapest lasted from 10 June to 18 June 2013). Duration can be characterized by accuracy and size (which means the length of the phenomenon). Beyond these two terms, repetitive observations are described by frequency, which has accuracy and stability as quality dimensions. The latter expresses how stable (constant) the measurement rate is during the data acquisition procedure. Such repeated observations must also be characterized by a measure of how representative they are, i.e. whether the captured data are suitable to describe the monitored event. Such a frequency-type measure leads to a comparison with the Nyquist rate (the minimum sampling rate that yields alias-free signals) (Nyquist-rate, 2018).
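The frequency-type dimensions described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `sampling_quality`, the use of the coefficient of variation of the sampling intervals as the stability metric, and the example time series are all hypothetical and do not come from the paper. The comparison with the Nyquist rate uses its standard definition, twice the highest frequency present in the observed phenomenon.

```python
from statistics import mean, pstdev


def sampling_quality(timestamps_s, max_signal_freq_hz):
    """Summarise frequency-type quality of a repeated observation series.

    timestamps_s: observation times in seconds, strictly increasing.
    max_signal_freq_hz: assumed highest frequency of the monitored phenomenon.
    """
    intervals = [b - a for a, b in zip(timestamps_s, timestamps_s[1:])]
    if not intervals or min(intervals) <= 0:
        raise ValueError("need at least two strictly increasing timestamps")

    mean_interval = mean(intervals)
    sampling_freq_hz = 1.0 / mean_interval

    # Stability (assumed metric): relative spread of the intervals;
    # 0 means a perfectly constant measurement rate.
    stability_cv = pstdev(intervals) / mean_interval

    # Nyquist rate: twice the highest frequency in the observed signal.
    nyquist_rate_hz = 2.0 * max_signal_freq_hz

    return {
        "sampling_freq_hz": sampling_freq_hz,
        "stability_cv": stability_cv,
        "nyquist_rate_hz": nyquist_rate_hz,
        # Representative here means sampled faster than the Nyquist rate.
        "representative": sampling_freq_hz > nyquist_rate_hz,
    }


# Hypothetical series: one observation every 600 s of a phenomenon
# whose fastest cycle is assumed to last one hour.
report = sampling_quality([0, 600, 1200, 1800, 2400], max_signal_freq_hz=1 / 3600)
```

In practice the highest frequency of an urban phenomenon is rarely known exactly, so `max_signal_freq_hz` would itself be an estimate and the resulting flag should be read as indicative only.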

Example 1: Traffic information
OpenStreetMap (OSM) is one of the most commonly known crowd-sourced databases. The project started in August 2004. This collaborative mapping effort aimed to involve skilled professionals and non-professionals in creating a map database covering the built-up and rural environment with the main features such as waters, roads, land use, buildings and many others. At the start of the project, low-budget portable satellite navigation systems were used; since 2006, thanks to Yahoo's collaboration in the project, aerial photographs can also be interpreted for data capturing. In 2010, Bing also allowed its satellite imagery to be used in map making. In 2012, Google Maps led several prominent websites, such as Foursquare and Craigslist, to switch from its service to OpenStreetMap (OpenStreetMap, 2018). The basic statistics of OSM as of 12 June 2018 are the following (Table 2).
The growth of this giant database shows an interesting path. Raw data are available in tagged XML format or, after some conversions and layer creation, in shape format. Take the example of the Hungarian OSM development in the last decade. Thanks to the German provider Geofabrik, the database has been downloaded yearly since 2010. The vector data in shape format representation has 18 layers and 91 files containing all the relevant themes such as waters, railways, roads, POIs etc. Fig. 2 shows the sizes of the shape files together with the yearly increase rates; the amounts are in MB. The dotted lines show the linear trends in growth. While the total information storage was 17.7 MB in 2010, the size of the database has grown about 44-fold, to 785.6 MB in 2018.
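As a rough arithmetic check of the growth figure quoted above, the following Python sketch recomputes the overall growth factor from the two reported sizes. It also derives an average yearly growth factor under the assumption of constant compound growth; that assumption is ours, made only for illustration, and differs from the linear trends drawn in Fig. 2. The function name `growth_summary` is hypothetical.

```python
def growth_summary(size_start_mb, size_end_mb, year_start, year_end):
    """Return the overall growth factor and the equivalent yearly factor."""
    years = year_end - year_start
    if years <= 0 or size_start_mb <= 0:
        raise ValueError("need a positive time span and a positive start size")

    overall_factor = size_end_mb / size_start_mb
    # Assumption: constant compound growth, so the yearly factor is the
    # n-th root of the overall factor over n years.
    yearly_factor = overall_factor ** (1 / years)
    return overall_factor, yearly_factor


# Sizes of the Hungarian OSM shape files as reported in the text.
overall, yearly = growth_summary(17.7, 785.6, 2010, 2018)
print(f"overall growth: {overall:.1f}x, average yearly factor: {yearly:.2f}")
```

The overall factor confirms the "about 44 times" statement; the yearly factor is only a summary figure and says nothing about how unevenly the growth was distributed over the individual years.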
Figure 2. The total size of the shape files and their yearly increase rates from 2010 to 2018. Blue represents the database size, orange the yearly increase.
The most rapid growth in the map database content can be observed in the roads layer. Fig. 3 shows the growth in the number of points and polylines.
5). The minimum mapping unit is 400 m². The nationwide land cover/land use database contains a huge amount of data, produced by a standard mapping procedure using high-resolution remote sensing images. The classification accuracy of the GCM project is higher than 95%.
In order to ensure the accuracy of the land cover/land use maps, strict quality control is required during the project implementation process. The process includes (1) data preparation control, i.e. data source quality, device configuration, and personnel qualification; (2) production quality control, i.e. quality control of image preprocessing and information extraction; (3) quality inspection at two levels: the first level is inspection by the image operation department, and the second by the quality supervision and inspection department.
The list of quality controls includes resolution, date, and mathematical basis of remote sensing images; map projection; data format; attribute table; topology; edge accuracy; classification accuracy, and so on.

CONCLUSION
This paper has opened new perspectives on remote sensing data quality management. The traditional means of RS data acquisition are nowadays extended by new methods of crowdsourced big data. Their strength lies in the rapid development of such databases in comparison to traditional spatial data collection. For this reason the emphasis of their quality measures differs essentially from the usual RSDQ. Instead of resolution or accuracy, the time-related dimensions are the most important measures for evaluating quality. Crowd-sourced remotely sensed big data such as OSM are a good example of the usefulness and fitness for use of this kind of data.
Knowing the weaknesses of crowd-sourced data, we should not misinterpret the strengths of traditional RS data acquisition methods. The most likely future perspective is that both RS data collection methods will complement each other and together provide a strong basis for different applications of spatial databases.

Figure 1. OpenStreetMap detail near the Budapest University of Technology and Economics campus, Hungary. The colored trajectories are from GPS; all the other parts (parcels, buildings, vegetation areas) are from imagery evaluation.

Figure 3. Growth of the road layer
The best visualization of the database evolution is the development of the map visualization of the data. Fig. 4 shows the road network density near Budapest in the years 2010, 2015 and 2018, respectively. Notice that the main roads were mapped first, and lower-category roads were digitized afterwards.

Batini (2016) gives a summary of dimensions and related metrics for different types of data, such as relational tables, maps, images, linked open data, loosely structured texts and laws. Due to the Variety-law, the Big Data domain has a different nature compared to traditional remote sensing data sources and related quality dimensions. Due to the high frequency of data collection and the faster acquisition methods compared to traditional means, time-related quality dimensions receive increased emphasis. For example, while the resolution dimensions are highly relevant, accuracy has much lower significance in the case of RS big data sources. The most important dimensions and their metrics are listed in the following table, grouped by main data quality clusters after (Batini, 2016):

Table 2. Statistics of the OpenStreetMap