C-AQM : A CROWD-SOURCED AIR QUALITY MONITORING SYSTEM

European cities are currently facing one of the main evolutions of the last fifty years. 'Cities for the citizens' is the new leitmotiv of modern societies, and citizens are demanding, among other things, a greener environment including non-polluted air. Improved sensors and communication systems open the door to the design of new systems based on citizen science to better monitor air quality. In this paper, we present a system that relies on the already available Copernicus Environment Service, on Air Quality Monitoring reference stations and on a cluster of new low-cost, low-energy sensor nodes that will improve the resolution of air quality maps. The data collected by this system will be stored in a time series database, and it will be available both to city council managers for decision making and to citizens for informative purposes. We describe the main challenges imposed by Air Quality Monitoring systems, our proposal to overcome those challenges, and the results of our preliminary tests.


INTRODUCTION
Air pollution is becoming one of the main threats to urban societies (Baklanov et al., 2016) due to its high impact on public health (WHO, 2016). Besides the laws and efforts aimed at reducing pollutant emissions, technicians and administrations are working intensively to develop alert systems that protect the most vulnerable citizens during high pollution episodes (Jiménez et al., 2008).
A key tool for that purpose is the air pollution map. National and regional authorities mainly use maps generated from the interpolation of measurements acquired at reference stations located at relevant points of the territory (Jiménez et al., 2008). However, since the price of those reference stations is very high (each analyzer costs between €5,000 and €30,000), the number of existing reference stations is very low. Furthermore, the geographical distribution of those stations is not homogeneous; thus, the resolution and performance of the air pollution maps are directly related to the proximity to one of those stations. Figure 1 shows an example of an NO2 concentration map provided by the Caliope system of the Barcelona Supercomputing Center (Jiménez et al., 2008). In addition to these maps, the Copernicus European system will soon provide maps with high coverage and medium resolution (7x7 km) around Europe. Figure 2 shows the first map of NO2 concentration provided by the Copernicus system at the end of 2017 (ESA, 2018).
Meanwhile, the latest developments in air monitoring sensor technology are facilitating the deployment of wireless networks (Castell et al., 2017) composed of high-accuracy sensors installed in buildings (Schneider et al., 2017) or on zero-emission vehicles in highly populated areas, e.g., bicycles (Aicardi et al., 2017), trams (Aberer et al., 2010) or buses (Boscolo and Mangiavacchi, 1998). This approach has been successfully proven in several experimental projects like Citi-Sense-MOB (Castell et al., 2015), the OpenSense project (OpenSense, 2018), the European [...]
Despite the fact that the accuracy of the results obtained in those experimental projects is really promising, the prices of commercial off-the-shelf top-class sensors range between €500 and €5,000. Consequently, it is very expensive to deploy a dense network of hundreds or thousands of sensor nodes for obtaining accurate air pollution maps with high spatial resolution. In addition, for those air quality monitoring systems based on a mobile cluster of sensor nodes, the proper geo-referencing of each node is still an issue to be improved, especially in urban environments. Currently, existing systems either rely on expensive positioning systems or cannot provide an accuracy better than 40 meters. These are the main motivations for the work presented in this paper, which aims to fill this gap with the following contributions:
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4, 2018 ISPRS TC IV Mid-term Symposium "3D Spatial Information Science - The Engine of Change", 1-5 October 2018, Delft, The Netherlands
2. We propose a low-cost sensor node, named AirCrowd, which includes low-cost gas sensors (NO2, SO2, O3), particle sensors (PM10), a GNSS receiver, an inertial sensor, a magnetometer, and a wireless communications transceiver for Narrow Band-IoT (NB-IoT).
3. We propose an algorithm for distributed calibration and geo-referencing of measurements provided by AirCrowd sensor nodes, reference stations, and the Copernicus system.
The remainder of this paper is organized as follows. In Section 2, we present an in-depth review of the challenges of AQM with low-cost sensor nodes. In Section 3, we present the requirements of our AQM system and the proposed architecture with some details on implementation. In Section 4, we present the results of our preliminary tests. The paper finishes in Section 5 with a short discussion and an overview of future work.

AIR QUALITY MONITORING CHALLENGES
The main technical challenges that need to be faced in the development of a fully operational AQM system are: 1) the proper calibration of the low-cost sensors, ensuring a performance good enough to use them as a complement to reference stations and the Copernicus system; 2) the correct geo-referencing of all measurements, including those acquired inside urban canyons or tunnels; 3) the wireless transmission of measurements from thousands of sensor nodes to a Cloud-based server; and 4) the storage, analysis and representation of the information. Each of these challenges is described in detail in the following paragraphs.

Calibration
Although there are well-established procedures to calibrate gas sensors (EU et al., 2008), the use of low-cost sensors requires innovative calibration approaches. Several previous works have assessed whether the measurements provided by low-cost sensors could reach the Data Quality Objective (DQO) of the European Air Directive (EU et al., 2008) for indicative methods, or whether they could only be used as quality indicators. In (Spinelle et al., 2015, Spinelle et al., 2017), several calibration methods were tested for single sensor units. The main conclusion of those reports is that Artificial Neural Network algorithms are the most suitable ones for sensor calibration. This is mainly due to the cross sensitivity of a specific sensor to different gases. In (Hasenfratz et al., 2012), new calibration methods for a cluster of ozone sensors were presented. The main idea of these methods is to compare the collected data with reference stations and also to compare the measurements from different sensors with each other. In this paper, we propose a new approach based on the two aforementioned ones, that is, to take advantage of a cluster of dynamic sensor nodes and reference stations and to model the calibration parameters using environmental parameters.

Geo-Referencing
Regarding the proper geo-referencing of data, it is well known that in open spaces with good sky visibility, GNSS receivers are good enough to provide accurate and reliable positioning. However, in narrow corridors with limited or fully denied constellation visibility, GNSS receivers are neither accurate nor reliable (Adjrad and Groves, 2018). GNSS/INS sensor fusion (Titterton et al., 2004) and map matching (Quddus et al., 2007) have been the most widespread techniques to overcome this problem.
In a low-cost application like the one targeted in our research, good enough GNSS/INS sensors are too expensive (more than €3,000), thus a combined GNSS/INS/map matching technology should be envisaged. The computational burden of map matching techniques can be high when the amount of data is large and the data is not processed properly. Furthermore, this technique relies on updated and reliable cartography, which has only recently become available thanks to crowd-sourced tools like OpenStreetMap.
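As an illustration of the geometric core of map matching, the following minimal sketch snaps a noisy position to the nearest street segment. It is not the algorithm used in C-AQM: it assumes positions already projected to a local planar frame in meters, and the `streets` list is a hypothetical stand-in for street geometry extracted from OpenStreetMap. A complete solution would also use the GNSS/INS trajectory and the street topology.

```python
import math

def project_on_segment(p, a, b):
    """Return the point of segment a-b closest to p (all 2-D tuples, meters)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:          # degenerate segment: a == b
        return a
    # Parameter t of the orthogonal projection, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)

def snap_to_network(p, segments):
    """Snap p to the closest segment; return (snapped_point, distance_m)."""
    best_point, best_dist = None, math.inf
    for a, b in segments:
        q = project_on_segment(p, a, b)
        d = math.hypot(p[0] - q[0], p[1] - q[1])
        if d < best_dist:
            best_point, best_dist = q, d
    return best_point, best_dist

# Hypothetical street network: two perpendicular streets.
streets = [((0.0, 0.0), (100.0, 0.0)), ((100.0, 0.0), (100.0, 80.0))]
snapped, dist = snap_to_network((40.0, 6.0), streets)
```

The exhaustive loop over all segments is what makes naive map matching costly for large data sets; a spatial index over the street segments would be the usual remedy.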

Wireless Data Transmission
Existing air quality monitoring systems are typically based on the deployment of a Wireless Sensor Network (WSN) (Arco et al., 2016) composed of several dynamic sensor nodes, one or more gateways, and a Cloud-based server. Each sensor node is equipped with a microcontroller, a set of gas and particle sensors, and a wireless communications transceiver. Each sensor node acquires measurements from its sensors and transmits all collected data via radio to a gateway, which forwards the data received from the sensor nodes to the remote Cloud server.
The selection of the wireless communications technology is crucial in order to provide continuous and reliable connectivity between the sensor nodes and the Cloud. Some previous works (Arco et al., 2016, Aicardi et al., 2017) propose the use of Bluetooth, ZigBee and WiFi. The main disadvantage of these wireless technologies is that their coverage area is very small, so data can only be transmitted when the sensor nodes are within the communication range of the gateways (e.g., less than 100 m from a WiFi access point). In order to overcome this problem, the work in (Arco et al., 2016) proposes the use of WiFi hotspots freely available in urban areas, as well as dedicated WiFi access points mounted at specific locations frequented by many users. This approach requires the installation of a large number of WiFi access points, which dramatically increases the cost of the system.
An alternative to the short- and medium-range wireless communications technologies (i.e., Bluetooth, ZigBee and WiFi) are the cellular networks based on 2G (GSM, GPRS), 3G (UMTS) and 4G (LTE) technologies. As an example, the system proposed in (Respira team, 2017) is based on GPRS. It is well known that cellular networks provide large bandwidth, high data transmission rates, and ubiquitous connectivity, especially in urban areas, which allows the sensor nodes to communicate with the Cloud server from the majority of locations. In addition, sensor nodes can directly communicate with cellular base stations; thus, gateways are not required and the complexity and costs of the air quality monitoring system can be reduced considerably. However, the main disadvantage of this approach is that the energy consumption of cellular wireless communications transceivers is very high, which may reduce the battery lifetime of the sensor nodes, increase the weight and dimensions of the batteries, and increase the cost of battery replacements.
Nowadays, the rapid introduction of Low Power Wide Area (LPWA) networks is facilitating the implementation of a large number of use cases of the Internet of Things (IoT). LPWA networks (e.g., Sigfox, LoRa, Narrow Band-IoT) provide wide coverage ranges of tens of kilometers from the base stations, and they have been designed to minimize energy consumption and to transmit short data packets at low and medium data rates. In this work, we have considered an LPWA network in order to fulfill the basic requirements of air quality monitoring systems in terms of coverage, data rate and energy consumption of the sensor nodes. In particular, we have selected Narrow Band-IoT (NB-IoT), which is an LPWA network technology that originated as an evolution of LTE.
NB-IoT provides a bandwidth of 200 kHz, a data rate of 150 kbps in up-link and down-link, and improves the overall link budget by 20 dB with respect to LTE.

Data Storage, Analysis and Visualization
The selection of a database technology for Air Quality Monitoring may seem a trivial issue, but quite the opposite is true. The selection of the database should take into account not only the expected input data, but also who is going to see or use the data and what the data is going to be used for.
An AQM system like the one envisaged in this paper could be useful both for curious citizens who want to know about the air quality of their neighborhood, and for technical staff of city halls who need to manage road traffic or health alert systems. While citizens may only be interested in the concentration of air pollutants near their location, the latter will also be interested in analyzing the temporal and spatial behavior of the pollutants, using not only the amount of pollutants, but also the environmental information. That is, a city model and an atmospheric model could also be needed (Aicardi et al., 2017).
Object-oriented databases seem a good solution to satisfy all these needs. For example, the CityGML model provides a powerful structure to store and share not only air quality data but also environmental data. The problem with this model is that there are not yet good enough tools to manage it (Aicardi et al., 2017).
The alternative proposed in (Aicardi et al., 2017) is to translate the model into an object-relational model (SQL database) and accept that it will be neither as flexible nor as fast.
Given the volume and acquisition rate of sensor data, in this paper we propose the use of a structured time-series NoSQL database, which is much simpler than an object-oriented database, but much more efficient than an object-relational database in terms of flexibility and access times.
Last, but not least, the AQM community needs to decide how to show the data to their different users. No single solution seems optimal for all of them. While citizens expect a user-friendly, intuitive web-based tool (e.g., (BAM program, 2018, McArdle and Kitchin, 2016)) or a smart-phone application like Caliope for iOS and Android, city council technicians need complex Geographic Information System (GIS) tools like QGIS or ArcGIS. In this paper, we will show several solutions that can be of interest to different users, all of them able to manage data directly from the time-series database.

CROWD-SOURCED AIR QUALITY MONITORING
This section describes the system requirements and the architecture of the Crowd-sourced Air Quality Monitoring system (C-AQM) proposed in this paper.

System Requirements
The main requirements of the C-AQM system are listed below:
• The aim of the C-AQM system is to create air quality maps which can be used in applications that require the representation of air pollution levels on a coarse scale, e.g., for awareness raising purposes (Castell et al., 2017).
• The C-AQM system is not intended for high-accuracy applications that must meet the DQO defined in air quality legislation, nor for epidemiological studies.
• Once the C-AQM system has been properly calibrated, it must provide air quality measurements with an accuracy between 25% and 50%, and a precision of 10%.
• The C-AQM system must provide hourly updated information on the pollutants most relevant to human health: NO2, PM2.5, O3, PM10 and SO2.
• The information provided by the C-AQM system must be properly geo-referenced, with one-meter accuracy, even in tunnels or in narrow streets, with a network density of around 1 point every 100 meters.

System Architecture
The architecture of the C-AQM system is based on three subsystems as shown in Figure 4: 1) acquisition subsystem; 2) processing subsystem; and 3) storage, analysis and visualization subsystem. The functionalities of each subsystem are described in the following sections.

Acquisition Subsystem
The acquisition subsystem is the one in charge of collecting data, both air quality measurements and positioning data. The acquisition subsystem is based on three different data sources: (i) the Copernicus European system, (ii) the air quality monitoring reference stations, and (iii) a network of hundreds or thousands of low-cost sensor nodes.
In this work, we have developed a low-cost sensor node named AirCrowd (Air Quality Crowd-sourced sensing device). An AirCrowd sensor node will be installed in the vehicle of each user or data provider of the C-AQM system. The AirCrowd sensor node has been designed as a low-cost, long-lifetime, battery-powered, light-weight and small form-factor portable device.
The AirCrowd sensor node is composed of one CC2640R2 System-On-Chip from Texas Instruments, which integrates an ultra-low power microcontroller (Cortex-M3) and a Bluetooth Low Energy transceiver; one L80R GPS receiver from Quectel; NO2, SO2 and O3 gas sensors from SPEC Sensors; one SM-PWM-01C particle sensor from Amphenol; one Inertial Measurement Unit (IMU) based on an MPU-9250 from TDK InvenSense, which combines a 3-axis gyroscope, a 3-axis accelerometer, a 3-axis magnetometer and a Digital Motion Processor; and one BG96 wireless communications transceiver from Quectel. The BG96 is an ultra-low power consumption LTE Cat.M1 / Cat.NB1 (NB-IoT) / EGPRS module that offers a maximum data rate of 300 kbps down-link and 375 kbps up-link.
The AirCrowd sensor node implements four basic functionalities: (1) acquisition of measurements from gas and particle sensors; (2) acquisition of data from the IMU and the GPS receiver; (3) processing of data from the IMU and the GPS receiver for coarse positioning based on Kalman filtering; and (4) wireless transmission of air quality measurements and positioning data to the Cloud.
The AirCrowd sensor node provides two different operation modes to wirelessly transmit data to the Cloud: direct mode and gateway mode. The operation mode of the AirCrowd sensor node can be configured from an Android application running on the user's smart-phone. The smart-phone and the AirCrowd sensor node communicate via the Bluetooth radio interface. In direct mode, the AirCrowd sensor node transmits data to the Cloud through NB-IoT using the BG96 module. Once the AirCrowd sensor node is connected to the NB-IoT network, it periodically transmits data to the Cloud using the Message Queuing Telemetry Transport (MQTT) protocol. The time period between consecutive MQTT messages must be equal to or greater than 5 seconds.
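The Kalman-based coarse positioning of functionality (3) can be illustrated with a scalar predict/update cycle. This is only a sketch under strong assumptions, not the firmware of the AirCrowd node: the state is a single along-track position, the IMU is reduced to a dead-reckoned displacement `u` per step, and all variances and sample values are hypothetical.

```python
def kalman_step(x, p, u, q, z, r):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : prior position estimate (m) and its variance (m^2)
    u, q : displacement from inertial dead reckoning (m) and its variance (m^2)
    z, r : GPS position fix (m) and its variance (m^2)
    """
    # Predict: move by the inertial displacement; the uncertainty grows.
    x_pred = x + u
    p_pred = p + q
    # Update: blend the prediction and the GPS fix using the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new

# Hypothetical track: roughly 1 m of displacement per step, noisy GPS fixes.
steps_imu = [1.0, 1.0, 1.0, 1.0]
track_gps = [1.2, 1.9, 3.4, 3.8]
x, p = 0.0, 25.0                      # poorly known initial position
for u, z in zip(steps_imu, track_gps):
    x, p = kalman_step(x, p, u, 0.04, z, 4.0)
```

After each update the variance `p` is smaller than the GPS variance `r`, which is the benefit expected from fusing the two sources.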
The direct mode requires a SIM card (Subscriber Identity Module) installed in the AirCrowd sensor node and a contract with a specific NB-IoT operator.
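The 5-second minimum spacing between MQTT messages can be enforced on the node side before handing payloads to the MQTT client. The sketch below is library-agnostic and only an assumption about how this could be done: `publish` is a hypothetical callback standing in for the publish call of whatever MQTT client is used, and the topic and JSON field names are invented for illustration.

```python
import json

MIN_PERIOD_S = 5.0  # minimum spacing between MQTT messages stated in the text

class PacedPublisher:
    """Forwards payloads to `publish` no more often than every min_period_s."""

    def __init__(self, publish, min_period_s=MIN_PERIOD_S):
        self._publish = publish          # callable(topic, payload); hypothetical hook
        self._min_period_s = min_period_s
        self._last_sent = None

    def offer(self, topic, sample, now_s):
        """Publish `sample` as JSON if enough time has passed; return True if sent."""
        if self._last_sent is not None and now_s - self._last_sent < self._min_period_s:
            return False                 # too soon: the caller may buffer or drop
        self._publish(topic, json.dumps(sample, sort_keys=True))
        self._last_sent = now_s
        return True

sent = []
pub = PacedPublisher(lambda topic, payload: sent.append((topic, payload)))
sample = {"node": "aircrowd-01", "no2_ugm3": 41.5, "lat": 41.38, "lon": 2.17}
results = [pub.offer("caqm/aircrowd-01", sample, t) for t in (0.0, 2.0, 5.0, 9.9, 10.0)]
```

Passing the current time as an argument keeps the class deterministic and easy to test; on a device it would come from a monotonic clock.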
In gateway mode, the AirCrowd sensor node transmits data to the Cloud by using the user's smart-phone as a gateway. The AirCrowd sensor node transfers all collected data via Bluetooth to the smart-phone, which forwards the data to the Cloud through the LTE cellular network using the MQTT protocol. Although the gateway mode requires a smart-phone located very near to the AirCrowd sensor node, it provides three interesting advantages. Firstly, the energy consumption of the AirCrowd sensor node is greatly reduced, thus extending its battery lifetime. Secondly, no SIM card is required in the AirCrowd sensor node. Thirdly, since the microcontroller of the AirCrowd sensor node is rather constrained in terms of memory and processing power, the computational resources of the smart-phone could be used in the future to implement heavy processing algorithms on the raw data before transmission.

Processing Subsystem
The processing subsystem is the one in charge of processing the raw data in order to provide reliable calibrated and geo-located pollutant measurements.
In order to ensure proper geo-referencing of all collected data, the C-AQM system uses information from the positioning sensors included in the AirCrowd sensor nodes (i.e., single-frequency GNSS receiver, inertial and magnetometer data) and also from public data like the street map information available at OpenStreetMap. All data provided by these sources is fed into a DGNSS / INS / map-matching post-processing algorithm able to properly locate the measurements with one-meter accuracy, even in urban canyons or in short tunnels.
Every hour, the newest data set is processed in order to correct the measurement errors. This calibration process is possible thanks to the crowd-sourced nature of the C-AQM system. We can assume that some AirCrowd sensor nodes will acquire data near a reference station. Those that do not will at least acquire data at, or near, places where others did before. Thus, we can regard the trajectories followed by all the C-AQM users as a dense network whose nodes must share the same measurement values. Furthermore, all the collected data has to be coherent with the maps provided by Copernicus. Consequently, a single least-squares adjustment of this network allows all the sensors to be calibrated at once, with an accuracy equivalent to the noise of our sensors.
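The idea of a single network adjustment can be sketched with a simplified model in which each sensor has one constant bias per processing window. This is an illustrative assumption, not the calibration algorithm of C-AQM: the Copernicus coherence constraint is omitted, all observations have equal weight, and the function names and sample values are hypothetical. Two kinds of equations are used: a sensor measuring near a reference station, and two sensors measuring at the same place.

```python
def solve_linear(a, y):
    """Solve a @ x = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    m = [row[:] + [y[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):                        # back substitution
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def estimate_biases(n_sensors, ref_obs, pair_obs):
    """Least-squares estimate of one constant bias per sensor.

    ref_obs  : (sensor, measured, reference) taken near a reference station
    pair_obs : (sensor_i, sensor_j, measured_i, measured_j) taken at the same place
    """
    normal = [[0.0] * n_sensors for _ in range(n_sensors)]
    rhs = [0.0] * n_sensors
    for i, meas, ref in ref_obs:           # equation: b_i = meas - ref
        normal[i][i] += 1.0
        rhs[i] += meas - ref
    for i, j, mi, mj in pair_obs:          # equation: b_i - b_j = mi - mj
        d = mi - mj
        normal[i][i] += 1.0
        normal[j][j] += 1.0
        normal[i][j] -= 1.0
        normal[j][i] -= 1.0
        rhs[i] += d
        rhs[j] -= d
    return solve_linear(normal, rhs)

# Hypothetical data: only sensor 0 passes a reference station; sensors 1 and 2
# are tied to it through shared locations.
biases = estimate_biases(
    3,
    ref_obs=[(0, 40.0, 30.0)],
    pair_obs=[(0, 1, 60.0, 45.0), (1, 2, 15.0, 40.0)],
)
```

In this model the normal matrix is singular unless every group of sensors connected by shared locations contains at least one reference observation, which is why the density of the crowd-sourced network matters.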

Storage, Analysis and Visualization Subsystem
The storage, analysis and visualization subsystem is the one in charge of storing and providing the information to the users in a friendly and understandable way.
Regarding the storage functionality, several databases have been considered. Although an SQL solution would be suitable for testing and validation purposes, it would not be fast enough when dealing with large amounts of data. Thus, a time series database has been selected as the storage solution. We have selected InfluxDB because it is reported as the currently most popular time series database. With this selection we ensure not only that we are using a good enough solution, but also that technical support from its developers will be available in the mid-term.
Since air quality is a topic of interest not only for technicians but also for citizens, the analysis and visualization component should be flexible enough to provide interfaces for both types of users. In this sense, we have envisaged two different user interfaces: expert and amateur.
The expert user interface relies on a GIS system. The data stored in InfluxDB can be downloaded as a shapefile (.shp). The expert user will be able to analyze, manage and represent the data in any GIS system like QGIS or ArcGIS.
The amateur user interface offers the data in a web-based application. It provides quantitative information about pollutants and also indicates whether the values are above or below the health alert limits. Two different platforms have been tested for this approach. The first one is a smart-city-oriented platform named Sentilo (Sentilo Team, 2018); among other things, the information regarding pollutants can be published there in cities where this platform is already available. The second platform we used to present the data to amateur users is Grafana, an open source platform for time series analytics (Grafana Team, 2018). Grafana is a widely known platform that integrates easily with the InfluxDB time series database and offers a friendly user interface.
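To give an idea of how a geo-referenced measurement maps onto a time series database, the sketch below formats one point in the InfluxDB line protocol (measurement, tags, fields, timestamp). The measurement name, tag and field names are hypothetical and do not describe the actual C-AQM schema; the measurement name is assumed to contain no characters that need escaping.

```python
def escape_key(text):
    """Escape commas, equals signs and spaces, which line protocol uses as separators."""
    return str(text).replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")

def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Format one point as: measurement,tag=value field=value timestamp."""
    head = measurement
    for key, value in sorted(tags.items()):
        head += f",{escape_key(key)}={escape_key(value)}"
    # All fields are written as floats in this sketch.
    body = ",".join(f"{escape_key(k)}={float(v)!r}" for k, v in sorted(fields.items()))
    return f"{head} {body} {int(timestamp_ns)}"

line = to_line_protocol(
    "air_quality",                              # hypothetical measurement name
    {"node": "aircrowd-01"},                    # hypothetical tag
    {"no2": 41.5, "lat": 41.38, "lon": 2.17},   # hypothetical fields
    1525168800000000000,                        # timestamp in nanoseconds
)
```

Storing the node identifier as a tag and the readings as fields is one possible design choice: tags are indexed, which favors queries per node, while fields hold the values that change with every sample.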

PRELIMINARY RESULTS
In order to validate the operation of the C-AQM system, we have planned to first test all its components separately (i.e., unit tests), and then perform integration tests at the system level. In this section, we present the preliminary results of the unit tests.

AirCrowd Sensor Node
The validation of the AirCrowd sensor node has been carried out with real measurements. Two types of tests have been defined: static and dynamic tests. The aim of static tests is to verify the functionalities of the AirCrowd sensor node in a fixed position as well as to evaluate the specifications of the sensors. The aim of dynamic tests is to validate the behavior of the sensors under dynamic conditions, i.e., when the AirCrowd sensor node moves, and also to test the system in relevant environments, i.e., in outdoor urban scenarios. The following sections describe the results of the static tests. The dynamic tests will be carried out in future test campaigns.

NO2 Concentration
This section shows the results of the NO2 concentration measurements acquired by the NO2 sensor of the AirCrowd sensor node. The tests have been carried out in static conditions (i.e., NO2 sensor in a fixed position) at 25 °C for 1.5 hours (short static) and 20 hours (long static). The errors of the NO2 concentration measurements in short static and long static tests are shown in Table 1. Figure 5 shows the measurements of NO2 concentration acquired by a reference station and the NO2 sensor of the AirCrowd sensor node within a time period of 20 hours.
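As can be observed in Table 1, the analysis of the NO2 concentration measurements shows, on the one hand, a relevant bias in both short static and long static tests. However, the drift of the bias is small enough to consider the bias as constant within a temporal window between two and three hours. On the other hand, as can be observed in Figure 5, the NO2 measurements are rather noisy, with a standard deviation of around 7 µg/m³, mainly due to a large quantization error of the NO2 sensor.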
As can be observed in Figure 5, by applying a moving average filter of 1 hour and a constant bias correction to the measurements provided by the NO2 sensor of the AirCrowd sensor node, the results achieved fulfill the precision and accuracy requirements (better than 25%) defined in Section 3.1.
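The two processing steps mentioned above, a moving average followed by a constant bias correction, can be sketched as follows. This is a minimal illustration, not the processing chain used for Figure 5: the window is expressed in samples (its value for a 1-hour filter depends on the sampling rate, which is not assumed here), the bias is estimated as the mean difference to a co-located reference series, and the sample values are invented.

```python
def moving_average(values, window):
    """Trailing moving average; early samples use the samples available so far."""
    out, running = [], 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]   # drop the sample leaving the window
        out.append(running / min(i + 1, window))
    return out

def correct(values, reference, window):
    """Smooth the sensor series and remove a constant bias estimated against a reference."""
    smooth = moving_average(values, window)
    bias = sum(s - r for s, r in zip(smooth, reference)) / len(smooth)
    return [s - bias for s in smooth], bias

# Hypothetical noisy sensor readings and co-located reference values (µg/m³).
sensor = [12.0, 8.0, 12.0, 8.0]
reference = [2.0, 2.0, 2.0, 2.0]
corrected, bias = correct(sensor, reference, window=2)
```

Treating the bias as a single constant is only reasonable within the two-to-three-hour window over which the text reports the bias drift to be small.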

Positioning Sensors
As expected, in all the tests, the accuracy provided by the GPS receiver (code-based, single frequency) is around 2 meters, with higher errors in long and narrow streets (up to 50 meters in our tests).

Communications
The communications between the AirCrowd sensor node in direct mode and the Cloud server have been successfully tested. To this end, the microcontroller of the AirCrowd sensor node configures the BG96 module to connect to the NB-IoT network provided by Vodafone, and opens a TCP socket client and an MQTT client that connect to an MQTT broker in a Cloud server. Once the MQTT client is connected to the MQTT broker, the client starts publishing MQTT messages with measurement data in JavaScript Object Notation (JSON) format. We have observed that the minimum time between consecutive MQTT messages is 5 seconds; otherwise the MQTT publisher client disconnects.
In order to verify the correct data transmission from the AirCrowd sensor node to the Cloud server, we have used an MQTT client implemented with the Node-RED platform, which is subscribed to the topics where the AirCrowd sensor node periodically publishes. As expected, the MQTT subscriber client successfully receives and parses all the MQTT messages transmitted by the AirCrowd sensor node. Finally, the measurements contained in JSON format in the payload of each MQTT message are stored in a time series database as detailed in Section 4.3.

Processing Subsystem
Due to the lack of a cluster of AirCrowd sensor nodes, the performance of the processing subsystem has been evaluated by using simulated data. Firstly, a set of noisy synthetic trajectories has been generated using the CTTC's GEMMA system (Navarro et al., 2016). The simulated trajectories have been located in the city of Barcelona, and at 30% of the epochs their errors are above 10 meters. After that, for each of those trajectories, synthetic gas and particle measurements have been simulated using the error specification of our sensors and modeled data of the city of Barcelona in May 2018. The real reference data was obtained from the Caliope system (Jiménez et al., 2008) for a day with good air quality, with the pollutant concentrations across the city artificially degraded. In order to generate our data, we have added to the data provided by Caliope a simulated random error compliant with our sensor noise, quantization and drift specification, that is, the noise and quantization are about 10 µg/m³, and the biases are constant within an hour with a sigma value of around 40 µg/m³.
Figure 6 shows the estimated position of a data set before and after positioning post-processing. The analysis of the results shows that in the worst case the positioning error is below 2 meters, fulfilling the positioning requirements presented in Section 3.1.
Regarding sensor calibration, we have observed that the proposed calibration algorithm is able to correct the bias with a residual error of 10 µg/m³, that is, the sensor noise level. Regarding the sensor noise, the proposed approach is not yet able to deal with it. Figures 7 and 8 show a properly geo-referenced data set before and after the calibration process, respectively. Figure 7 shows the simulated data for seven sensors moving around the Barcelona city center. After processing the data, it can be observed that the biases of the sensors have been corrected: all sensor measurements in the same areas are now in the same range of values. The colors of the second plot have been chosen to help people understand the plotted values: green is good air quality, while red is bad.
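The error model described above (random noise, quantization and a bias that stays constant within each hour) can be sketched as follows. This is an assumption-laden illustration, not the simulator used in the paper: the text gives a combined figure of about 10 µg/m³ for noise and quantization, so using 10 µg/m³ for both the noise sigma and the quantization step is a guess, and the function name, sampling rate and input series are hypothetical.

```python
import random

def simulate_sensor(truth, samples_per_hour, noise_sigma=10.0, quant_step=10.0,
                    bias_sigma=40.0, seed=0):
    """Degrade a true concentration series (µg/m³) with bias, noise and quantization."""
    rng = random.Random(seed)               # seeded for reproducible simulations
    out = []
    bias = 0.0
    for k, value in enumerate(truth):
        if k % samples_per_hour == 0:       # new hour: draw a new constant bias
            bias = rng.gauss(0.0, bias_sigma)
        noisy = value + bias + rng.gauss(0.0, noise_sigma)
        # Quantize to the nearest multiple of the sensor step.
        out.append(round(noisy / quant_step) * quant_step)
    return out

# Hypothetical input: two hours of a flat 30 µg/m³ concentration, one sample per minute.
truth = [30.0] * 120
simulated = simulate_sensor(truth, samples_per_hour=60)
```

Redrawing the bias every hour mirrors the statement that biases are constant within an hour; a smoother drift model would be a natural refinement.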

Storage, Analysis and Visualization Subsystem
The InfluxDB system was properly installed in a Cloud server, and the measurement data received via MQTT were successfully stored in a time-series database. The write and read accesses to the database are performed from the Node-RED platform, which also works as an MQTT subscriber client that receives messages from the AirCrowd sensor node, as explained in Section 4.1.3. Using this platform as a server, we have been able to generate a shapefile to be represented in the QGIS software tool (Figure 8) and also to represent the time charts of measurements in Grafana and Sentilo.

CONCLUSIONS
In this paper, we have presented the idea behind the Crowd-sourced Air Quality Monitoring system (C-AQM). Based on the current state of the art and the available technologies, we have designed a system that combines air quality measurements obtained from the Copernicus system, the available reference stations, and a cluster of low-cost, low-energy sensor nodes. By jointly processing these measurements, the system is able to generate high-resolution (10x10 m) air quality maps. The first simulations and data acquisitions allow us to be optimistic about the suitability of the system. The short-term stability of the sensors (bias stable within 60 minutes) is good enough to be used in a system that should allow sensor re-calibration every few minutes (20 minutes maximum between calibrations), and the GNSS/INS/map matching technology provides enough accuracy (below 2 meters) for the application. Time-series databases, together with new web-based applications like Sentilo and Grafana, seem an optimal solution to present the results to the inhabitants of any European city. Despite the promising preliminary results, the project is still in its initial phases, and a full set of dynamic tests under several environmental conditions is still needed.

Figure 4. Architecture of the Crowd-sourced Air Quality Monitoring System.

Figure 6. Estimated locations of a C-AQM data set before (red) and after (green) positioning post-processing.

Figure 7. Simulated air quality maps of a neighbourhood of Barcelona obtained with the C-AQM system. Map of properly georeferenced but uncalibrated data.

Figure 8. Simulated air quality maps of a neighbourhood of Barcelona obtained with the C-AQM system. Categorized map of properly georeferenced and calibrated data generated using QGIS.

Table 1. NO2 concentration errors of the AirCrowd sensor node.