TOWARD FLEXIBLE DATA COLLECTION OF DRIVING BEHAVIOUR

Recently, driving behavior has been the focus of many researchers and scientists, who attempt to identify and analyze it using different sources of data. The purpose of this research is to investigate the data acquisition methods and tools related to driving behavior, as well as the types of data acquired. Using a systematic literature review strategy, this study identified the tools and techniques used to collect driving behavior data in 120 studies published from 2010 to 2020 and selected from several literature resources. It then measured the percentages of the most commonly used methods, as well as the types of data collected. In-vehicle and IoT sensors were found to play the greatest role in data collection, appearing in approximately 67% of the selected studies; concerning the type of data acquired, vehicle-related data are the most widely collected. Thus, this study answers the question of which data sources and data types are used across the literature. However, further studies are needed to give more attention to driver data and to investigate data from the three dimensions of driving (driver, vehicle, and environment) together as an integrated and interconnected system.


INTRODUCTION
Recent research indicates that driver error contributes to up to 75% of all roadway crashes (Stanton and Salmon, 2009). Moreover, human factors contribute to 95% of all accidents, according to a study of 2041 traffic accidents conducted by (Sabey and Taylor, 1980). Reducing these numbers and saving lives has become a necessity. For that reason, and in order to improve the safety, security and comfort of drivers and other road users, many studies have addressed the topic of driving behavior (DB) using different approaches and techniques.
The common element in these studies is the source of data (Elamrani Abou Elassad and Mousannif, 2019). In fact, the majority of DB research relies on one of three types of studies to collect data: Naturalistic Driving Studies (NDS), Field Driving Studies (FDS) or Simulator Driving Studies (SDS) (Yang et al., 2018a). This clearly shows the importance of the data acquisition process for analyzing driving behavior.
Moreover, (Andria et al., 2015) noted that data acquisition in automotive environments is widely used in everyday applications. Indeed, the recent computerization of cars, together with the development of sensor technologies and car communication devices, has transformed cars into rich sources of information on the driver, the vehicle and the environment (Bouhoute et al., 2019). In addition, the remarkable advancement of Internet of Vehicles (IoV) and big data technologies in recent years has offered new solutions to improve traffic safety and efficiency (Cen et al., 2017).
While these earlier works offer helpful insights into DB evaluation from a data-based perspective, it is crucial to note that thorough examination of data collection for DB analysis is quite limited; to the authors' knowledge, minimal work has been directed at investigating the characteristics of the data harnessed in this domain. Therefore, this paper presents a short survey of the data collection methods and tools linked to driving behavior, covering the techniques and measures most often used to gather useful information for analyzing driving behavior. This work also covers a second aspect: the data related to the three dimensions of driving.
The rest of the paper is organized as follows: Section II presents the methodology adopted to select existing works related to data collection for driving behavior, and the process of extracting and synthesizing the data. Section III presents the results obtained on the techniques and technologies used to collect data, followed by the data related to the three dimensions of driving behavior. Finally, Section IV concludes the paper.

METHODOLOGY
In order to identify, analyze and interpret all available evidence related to "data collection in the area of driving behavior", we planned, conducted and reported the review following the systematic literature review (SLR) process (often referred to as a systematic review) suggested by (Kitchenham and Charters, 2007).
This process aims to present a fair evaluation of the topic mentioned above using a trustworthy, rigorous, and auditable methodology.
Formulating research questions and producing a review protocol are the most important pre-review activities.
The first two phases are described by (Wen et al., 2012) through their development of a review protocol that mainly includes six stages (Figure 1): research question definition, search strategy design, study selection criteria and procedures, quality assessment, data extraction and data synthesis. The figure illustrates the whole process followed in this study. The first stage in this process involves raising a set of research questions (RQs) based on the main objective of the study.

Research questions
The aim of this paper is to summarize and describe the majority of the data collection techniques used in the area of driving behavior, including all its dimensions. Toward this aim, some RQs were addressed.

Search strategy
Once the RQs have been identified, a search strategy must be followed. It consists of selecting the search key terms (keywords), the resources (libraries or others with relevant experience) and the search process.

Search terms
The search terms used in this paper were constructed using the following strategy (Wen et al., 2012): a) derive major terms from the questions; b) identify alternative spellings and synonyms for major terms; c) check the keywords in any relevant papers we already have; d) use the Boolean OR to incorporate alternative spellings and synonyms; e) use the Boolean AND to link the major terms from population, intervention, and outcome.
Analyzing the RQs of the topic "Toward flexible data collection of driving behavior" mentioned above led us to extract the following keywords: data collection and driving behavior. After that, we looked for new words, synonyms and alternative spellings of these keywords, with the following results:
• Data collection: acquisition, assembling
• Driving behavior: driving style, driving pattern, driving profile
Once we had identified the main keywords and their synonyms, we adopted a basic rule to establish the search string: for each keyword, we found its synonyms and concatenated them with the OR connector. After defining the groups of words with their synonyms, we concatenated the groups with AND to complete the string. The resulting search string is:
("Data collection" OR "data acquisition") AND ("driving behavior" OR "driving style" OR "driving pattern" OR "driving profile")
This strategy was applied to the title and abstract of each article.
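The OR/AND construction rule described above can be sketched in a few lines of Python. This is only an illustration: the function name build_search_string and the TERM_GROUPS variable are hypothetical, while the synonym groups themselves are the ones that appear in the reported search string.

```python
def build_search_string(term_groups):
    """Join synonyms with OR inside each group, then join the groups with AND."""
    clauses = []
    for synonyms in term_groups:
        # Quote every term so multi-word phrases are searched as phrases.
        quoted = " OR ".join(f'"{term}"' for term in synonyms)
        clauses.append(f"({quoted})")
    return " AND ".join(clauses)


# Synonym groups taken from the search string reported in the text.
TERM_GROUPS = [
    ["Data collection", "data acquisition"],
    ["driving behavior", "driving style", "driving pattern", "driving profile"],
]

SEARCH_STRING = build_search_string(TERM_GROUPS)
print(SEARCH_STRING)
```

Keeping the synonym groups as data makes it straightforward to adapt the string to the syntax of each database, since only the joining step would need to change.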

Resources
In this study, we used four electronic databases as the literature resources to search for primary studies (IEEE Xplore, ScienceDirect, Web of Science and Google Scholar). Since the search engines of different databases use different search string syntax, the search string constructed previously was adjusted to accommodate each database and then used to search those databases for journal papers published between 2010 and 2020.

Search process
The search was conducted on the four electronic databases separately; the returned papers were then exported as CSV files and the results gathered to form a set of candidate papers (Figure 3). A Python script was applied to this set of papers to generate a word cloud of the title and abstract of each article. This script is freely available on GitHub.
Then, the set of selected articles was scanned to remove duplicate documents. Several reading strategies, described in the next subsection 'Study selection criteria', were used to identify 120 relevant articles, which were then used for data extraction and data synthesis.
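As an illustration of the gathering and duplicate-removal steps, the following minimal sketch merges several CSV exports and keeps one record per title. It is not the authors' script: the column names title and abstract, the helper names, and the two tiny in-memory exports are all assumptions made for the example.

```python
import csv
import io
import re


def normalize_title(title):
    """Lowercase and collapse non-alphanumerics so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_exports(csv_texts):
    """Merge CSV exports and keep the first record seen for each normalized title."""
    seen = set()
    merged = []
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            key = normalize_title(row["title"])
            if key and key not in seen:
                seen.add(key)
                merged.append(row)
    return merged


# Made-up exports standing in for the result files of two databases.
export_a = "title,abstract\nDriving Style Recognition,Data acquisition via OBD\n"
export_b = (
    "title,abstract\n"
    "driving style recognition.,Same paper found in another database\n"
    "Eco-driving Patterns,Data collection with smartphones\n"
)

candidates = merge_exports([export_a, export_b])
print(len(candidates))
```

Matching on a normalized title is a deliberately simple choice; real exports may need a DOI comparison or fuzzy matching to catch all duplicates.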

Study selection criteria
The first-stage search resulted in 1224 candidate papers (see Figure 3). Because many of the candidate documents do not provide any useful information to answer the research questions raised by this paper, further filtering was needed to identify the relevant papers. The title and the abstract are generally written accurately and carefully, so they indicate whether a document is strongly pertinent to the main topic of the study. Moreover, the 'word cloud' technique reveals the essentials of a text extract in a fast and engaging way. It was applied to the title and abstract of every document to represent the words that compose them in different sizes according to their frequency of use, as illustrated in (Figure 4).
We analyzed the results obtained and kept the documents in which the following words stood out: 'data, behavior, driving, collecting and acquisition'. If one of these terms appeared prominently, we selected the article. As a result, we selected 244 articles.
We used the skimming and scanning reading techniques to remove duplicate articles and to get a general overview of each relevant article. During this stage, we tried to retain the scientific documents that contain valuable insights about data collection, which allowed us to identify 120 relevant articles.
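The frequency-based screening can be approximated without drawing an actual word cloud, since a word cloud only sizes words by how often they occur. The sketch below counts words in the title and abstract and keeps a paper when one of the screening terms is frequent enough. The threshold min_count is an assumption, as the text does not state how "widely" was quantified, and the two example papers are invented.

```python
import re
from collections import Counter

# Screening terms listed in the text.
SCREENING_TERMS = {"data", "behavior", "driving", "collecting", "acquisition"}


def word_frequencies(text):
    """Count lowercase alphabetic tokens, which is what a word cloud sizes by."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def is_candidate(title, abstract, min_count=2):
    """Keep a paper if any screening term occurs at least min_count times.

    min_count is an assumed threshold, not a value reported in the paper.
    """
    freqs = word_frequencies(f"{title} {abstract}")
    return any(freqs[term] >= min_count for term in SCREENING_TERMS)


kept = is_candidate(
    "Driving style recognition",
    "We study driving data acquired from on-board sensors.",
)
dropped = is_candidate(
    "Pavement materials",
    "A study of asphalt ageing under heat.",
)
print(kept, dropped)
```

Exact token matching is used here, so spelling variants such as "behaviour" would need to be added to the term set to be counted.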

Study quality assessment
According to (Julian PT Higgins, 2009), the quality assessment (QA) of the selected studies is primarily used as the basis for weighting the quantitative data extracted in a meta-analysis. Since this first work is concerned only with the percentage of data sources used and the percentage of data per driving dimension, we do not define a dedicated QA for this paper. Instead, we simply verified whether the articles involved provide relevant information regarding these aspects.

Data extraction forms:
This subsection clarifies the data extraction process followed in this paper. We examined the selected studies to collect the data that contribute to addressing the research questions of this work. The data extraction process is designed to answer the following questions:
▪ What data acquisition tool is used to acquire the data?
▪ Which of the three dimensions of driving is covered: driver, vehicle, or environment?
While trying to find answers to these questions, some data could not be extracted directly from the selected studies. Nevertheless, we were able to obtain them indirectly by processing the available data into an appropriate form. For example, some studies use databases offered by previous works; in this case we looked for the sources of data in the original work if available, and otherwise drew our conclusion from the rest of the article. (Figure 5) illustrates some of the extracted data. As shown, the figure is composed of three headings, which are in a way a reformulation of the previous questions. The data extraction process consists of assigning "1" or "0" (green icon or red icon) according to the presence or absence, respectively, of each item in the article. The comprehensive list of the relevant articles selected for this paper and the extraction results are given in the appendix.
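One possible way to encode such a presence/absence form is sketched below. The record layout, the field names and the example study are hypothetical; only the six data-source categories and the three driving dimensions come from the paper.

```python
from dataclasses import dataclass, field

# Data-source categories and driving dimensions named in the paper.
SOURCES = (
    "in_vehicle_sensors",
    "other_sensors",
    "smartphone",
    "self_report",
    "dataset",
    "traffic_surveillance",
)
DIMENSIONS = ("driver", "vehicle", "environment")


@dataclass
class ExtractionRecord:
    """One row of the extraction form: 1 if an item is present, 0 otherwise."""
    article: str
    sources: dict = field(default_factory=dict)
    dimensions: dict = field(default_factory=dict)


def make_record(article, sources_present, dimensions_present):
    """Build a record by marking each known item as present (1) or absent (0)."""
    return ExtractionRecord(
        article=article,
        sources={s: int(s in sources_present) for s in SOURCES},
        dimensions={d: int(d in dimensions_present) for d in DIMENSIONS},
    )


# Invented example: a study using smartphone data on the driver and the vehicle.
record = make_record("Hypothetical study A", {"smartphone"}, {"driver", "vehicle"})
print(record.sources, record.dimensions)
```

Storing every item explicitly, including the zeros, keeps all records the same shape, which simplifies summing the scores later.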

Data synthesis methods
Data synthesis aims to gather all previous results, interpret them, shed light on the interests of most researchers and reveal some future areas of research. Its purpose is to aggregate evidence from the selected studies to answer the research questions. We therefore summed the scores obtained from the data extraction process and used visualization tools, including pie charts, to present the percentages pertaining to the sources of data used and to the driver-vehicle-environment (DVE) data among all the selected articles. The next section is dedicated to the results and discussion.
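The summation step can be illustrated as follows. This sketch assumes that each percentage is the share of one item among all the marks assigned, which is consistent with pie-chart slices that add up to 100% but is not stated explicitly in the paper. The function name and the four example records are invented and are not the paper's data.

```python
def percentage_shares(records, keys):
    """Sum the 0/1 scores per key and express each total as a share of all marks."""
    totals = {key: sum(rec.get(key, 0) for rec in records) for key in keys}
    grand_total = sum(totals.values())
    if grand_total == 0:
        # No marks at all: avoid dividing by zero.
        return {key: 0.0 for key in keys}
    return {key: round(100 * totals[key] / grand_total, 1) for key in keys}


# Made-up extraction scores for four articles.
records = [
    {"driver": 1, "vehicle": 1, "environment": 0},
    {"driver": 0, "vehicle": 1, "environment": 0},
    {"driver": 0, "vehicle": 1, "environment": 1},
    {"driver": 1, "vehicle": 1, "environment": 1},
]

shares = percentage_shares(records, ("driver", "vehicle", "environment"))
print(shares)
```

Dividing by the number of articles instead of the number of marks would give coverage rates per article, which generally do not sum to 100% when an article covers several items.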

RESULTS AND DISCUSSION
This section presents and discusses the findings of this short review. First, we introduce the data collection topic and the statistics of the methods most commonly used to gather information related to driving behavior, and then present, in separate subsections, the instruments and measurement techniques used to collect data according to the selected studies. Second, we report statistics on researchers' attention to the dimensions of driving.

Data collection
This section sheds light on the data collection process and the techniques most used in the research literature related to the field of driving behavior. Before listing the methods of data collection, it is necessary to define what data acquisition is.
According to the Cambridge Dictionary (Cambridge), data collection is the activity of collecting information that can be used to find out about a particular subject. This activity enables a person or organization to understand the relevant topic, answer related questions, evaluate outcomes, and make predictions about future probabilities and trends. So, in order to understand driving behavior, a major factor in road traffic safety, gathering its associated data is a mandatory stage. Nevertheless, the variety of data sources leads to differences in how researchers understand driving behavior. In fact, studies on driving behavior assessment have not settled on a common framework due to this diversity (Zhu et al., 2017).
The rest of this section describes the techniques used in data collection. Some driving style-related studies used self-reports and driving behavior questionnaires to collect information, while several other studies have taken advantage of new technologies and benefited from the remarkable development of automotive sensors, such as In-Vehicle Data Recorders, smartphones, IoT sensors and traffic surveillance technologies, to sense and collect the attributes that contribute to DB.
In short, according to (Carvalho et al., 2017), data on the action of driving can be collected by several kinds of sensors, from general-purpose ones in smartphones to dedicated devices such as monitoring cameras, telematics boxes 4 and OBD 5 (On-Board Diagnostic) adapters. (Figure 6) illustrates the percentages of the techniques and methods used to collect data for driving behavior studies according to the scientific research selected for this work.
A brief description of these techniques is given in (Table 2), shown below.
As can be seen in (Figure 6), the dominant data source for driving behavior among the 120 articles used in this paper is integrated "in-vehicle sensors", at a rate of 40%, followed by IoT devices and other sensors at 26%. The smartphone has also demonstrated a strong capacity for data collection at 12%, followed by the self-report technique at 10% and other databases at 8%. Finally, traffic surveillance tools account for 4%. Therefore, sensors embedded in the vehicle remain the most widely used source of data according to the literature used for this study.

4 A telematics box or black box is a measurement probe installed inside the vehicle. It may be equipped with its own sensors or be connected to the vehicle's internal sensors via the CAN-bus.
5 OBD is a system that enables current vehicles to carry out a self-diagnosis and provide real-time data (e.g., speed) via a standard communications port.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-4/W3-2020, 2020 5th International Conference on Smart City Applications, 7-8 October 2020, Virtual Safranbolu, Turkey (online)

In-vehicle and other sensors:
Driving behavior related data can usually be acquired by on-board devices. In-Vehicle Data Recorders (IVDRs) are among the tools widely used for on-board data collection. They are devices installed on vehicles that continuously monitor and record the vehicle parameters (Bouhoute et al., 2019). According to IBM (Kimberly Madia, 2014), car sensors can produce about 1.3 gigabytes of data every hour and an estimated 312 million gigabytes every year for 4 hours of daily driving, which provides a valuable opportunity for researchers in the area of driving behavior. Based on those sensors, (Boquete et al., 2010) and (Xie et al., 2019), among other researchers, present various platforms as acquisition systems. However, the major limitation of this approach is the availability of on-board units (OBUs). It is likely that such advanced technology is only available to a biased subset of vehicles.
Furthermore, the need for data on the driver's interaction with the vehicle requires more sensors to be added to the vehicle. For this reason, several demonstration tools have been developed to access the available telemetry data. (Ding et al., 2019) used electroencephalography (EEG) and steering behavior in a simulated driving experiment to test the correlation between some patterns of driving behavior, cognitive states and personalities. Since driving is a social act in which the human factor plays the most important role, some researchers are interested in studying the human contribution. (Yang et al., 2018b) used an electrode cap connected to Curry 7 software to collect EEG signals.

Self-report and questionnaires:
Studies in transportation psychology have traditionally employed self-report measures to examine personality, motivations, cognitions, and perceptions on the one hand, and driving behavior, driving styles and skills, and involvement in traffic violations and crashes on the other. Nevertheless, the usefulness and validity of such instruments is often questioned, particularly when the aim is to capture risky driving behavior (Boufous et al., 2010). Self-report measures of driving behavior are nonetheless in widespread use in the traffic psychology literature. Moreover, most prevailing studies have used subjective questionnaire data and objective driving data to classify driving behavior, whereas few studies have used physiological signals such as electroencephalography (EEG) to gather data (Yang et al., 2018b). One study that adopted the self-report technique was conducted by (Useche et al., 2019), whose questionnaire was composed of three core sections; the first part asked about individual and demographic variables, job-related features and job type, and road safety indicators.
Although surveys and self-reports are a powerful and inexpensive tool for studying various topics in traffic behavior, and much of the knowledge in transportation psychology has been gained through this technique, there is still a dispute regarding the usefulness and validity of such instruments. Reports on one's own driving behavior can be less than ideal and trustworthy, and these methods have some serious limitations that must be taken into account when using them.

Smartphone:
The emergence of affordable sensing and computing platforms has had a real impact on the appearance of new fields related to driving behavior. One of them is the analysis of driving performance through the use of mobile technology, a field also known as Smartphone Driving Analytics (Carlos et al., 2019). Smartphones now have a rich set of on-board sensors such as accelerometers, gyroscopes, GPS, and cameras. These sensors provide valuable information when investigating users' needs and behavioral patterns. Several researchers are currently using mobile phones to gather driving-related data. As reported by (Warren et al., 2019), data collection using unobtrusive technology such as smartphones provides a valuable alternative to study-based data collection. Of the studies that used the mobile phone to acquire data, 52% (11 out of 21 studies) relied only on the phone. However, even though smartphones have shown great potential for data collection, they are largely regarded as dangerous because of their potential to cause distracted driving and crashes.

In-vehicle sensors: Devices that transform physical quantities such as pressure or acceleration into output signals (usually electrical) and are always embedded in the vehicle.
Self-report questionnaires: A research instrument consisting of a series of questions to gather information and data about the driver.
Other sensors: Sensors that are not integrated into the car, including IoT sensors such as Arduino and Raspberry Pi, …
Smartphone: High-resolution and high-speed (CMOS) image sensor, global positioning system (GPS) sensor, accelerometer, gyroscope, ambient light sensor, and microphone, …
Traffic surveillances: Observation from a distance, using techniques such as closed-circuit television (CCTV), or interception of electronically transmitted information such as internet traffic.
Dataset: International driving-dataset projects.

Table 2. Description of measurement techniques used to collect data

Traffic surveillances:
Road infrastructure development has received widespread attention in many countries in recent years, along with efforts to equip roads with the latest technology. As a result, traffic surveillance instruments have brought new opportunities for researchers to gather driving data and investigate different facets of DB. Among the researchers who have already taken advantage of this source of data, (Zhou et al., 2011) built a framework to define driver behavior patterns by extracting vehicle information from traffic video sequences. Moreover, urban traffic surveillance data at both intersections and road segments were used by (Hongxin et al., 2016) to investigate the driver's involvement in accidents. Although a few other researchers have addressed the topic of DB using traffic surveillance data, the use of this source of data remains limited.

Datasets:
At present, data is becoming the key to a majority of challenges, and driving behavior is one of them. For many years, researchers have been gathering driving-related data and generating datasets to better understand the behavior of the driver. (Figure 8) shows examples of several projects around the world that have collected on-road driving data, according to (Miyajima and Takeda, 2016).
As illustrated previously, 8% of the studies selected for this work used pre-collected datasets. The database of the 100-Car Study conducted by the Virginia Tech Transportation Institute was used to model driver car-following behavior (Sangster et al., 2013), while (Hamzeie, 2016) investigated how speed limits affect driver speed selection using data collected in real time together with a Roadway Information Database. (Hallmark et al., 2015) and (Lv et al., 2019), among several other researchers, took advantage of the rich Second Strategic Highway Research Program Naturalistic Driving Study (SHRP 2 NDS) datasets to investigate different aspects of driving. Furthermore, (Li et al., 2019) used an electric vehicle dataset to identify driving patterns.

Driver Vehicle Environment's Data
Driving is a driver-vehicle-road environment system in which all three elements affect each other and the whole system. A single driver behavior error, vehicle fault or road environment anomaly may lead to another and trigger a chain of reactions within the whole driving system (Mao et al., 2019).
Thus, researchers have long been interested in these three dimensions of driving behavior. In order to clearly distinguish the relationship between the different dimensions of DB and the DVE model, and to better extract the driving dimension addressed in each article, we relied principally on the theoretical framework proposed by (Elamrani Abou Elassad et al., 2020).
According to the statistics of the studies selected for this paper (Figure 7), researchers are more interested in collecting vehicle-related data (52%) than driver-related data (29%) or data on the surrounding environment (19%).
Vehicle-related data include several kinematic measures; speed and acceleration/deceleration are the most common measurements in the scientific literature because of their direct impact on driving behavior and the ease of obtaining them.
The driver's profile and state, namely the physiological and psychological conditions, in turn provide relevant and essential information for understanding driving behavior. In fact, due to the direct contribution of the human being to the driving process, the driver's own data have a very strong influence on predicting and detecting driving events. However, collecting such data usually needs more equipment and sensors.
Finally, surrounding environment data, including road geometry, road condition, road type, traffic and weather conditions, often require remote access to the data.

CONCLUSIONS
The present survey, in particular the last two sections on data collection and the types of data collected, provides some useful insights indicating the need for high-resolution driver data. On this basis, we obtained the following results:

▪ Most researchers are interested in the use of in-vehicle sensors to acquire information related to driving behavior. Also, vehicle data have been the primary focus of data collection.

▪ The percentage of studies that cover the driver dimension remains very limited, even though the driver plays the role of both the controller and the major evaluator of the vehicle quality and the path following.

This study cannot claim to be complete, although we believe that it will be a valuable resource for anyone interested in research on driving behavior in general and data acquisition in particular.
