KINECT, A NOVEL CUTTING EDGE TOOL IN PAVEMENT DATA COLLECTION

Pavement roughness and surface distress detection are of interest to decision makers because of their bearing on vehicle safety, user satisfaction, and cost savings. Data collection, as the core of pavement management systems, is required for this detection. There are two major types of data collection: traditional/manual and automated/semi-automated. This paper studies different non-destructive tools for detecting cracks and potholes. For this purpose, recently used automated data collection tools are discussed and their applications are critiqued. The main issue is the significant capital investment needed to buy the data collection vehicle. The main scope of this paper is to study an approach and related tools that are not only cost-effective but also precise and accurate. A new sensor called Kinect meets these requirements. It can capture both RGB images and depth data, which are of significant use in measuring cracks and potholes. The sensor can image surfaces with adequate resolution to detect cracks, and it measures the distance between the sensor and the obstacles in front of it, which yields the depth of defects. This technology has only recently been studied by a few researchers in fields such as project management and biomedical engineering. Pavement management has not paid enough attention to the use of Kinect in monitoring and detecting distresses. This paper aims to provide a thorough literature review on the use of Kinect in pavement management and to propose the best approach that is both cost-effective and precise.


INTRODUCTION
Pavement roughness and surface distress detection are of significant importance to decision makers because of vehicle safety, user satisfaction, and cost savings. To measure these defects, sustainable tools are required that are cost-effective, environmentally friendly, and user friendly. Defects have recently been detected by smart devices; however, most of these devices are expensive. There is still a large gap in the related literature on implementing a novel cost-effective device that captures pavement condition data efficiently and effectively. For this purpose, the first step is to define the distresses to be detected and the methods of capturing them.

Distress Definition
According to FHWA (2003), potholes are bowl-shaped holes of various sizes in the pavement surface with a minimum plan dimension of 150 mm. The FHWA categorizes potholes into three severity levels: low (<25 mm deep), moderate (25 mm to 50 mm deep), and high (>50 mm deep) (Pavement & Program 2003). Table 1 shows the different categories of cracks. In the literature, longitudinal and transverse cracks receive the most attention from researchers. According to FHWA, a longitudinal crack is a crack parallel to the pavement centerline, which can occur in the wheel path or outside it, and a transverse crack is a crack that is "predominantly perpendicular to pavement centerline" (Pavement & Program 2003). According to AASHTO, a crack is a discontinuity in the pavement surface with a minimum width of 1 mm (0.04 in) and a minimum length of 25 mm (1 in). Based on AASHTO, a severity level 1 crack has a width of less than 3 mm (0.125 in), a severity level 2 crack has a width between 3 and 6 mm, and a severity level 3 crack has a width of more than 6 mm (Definition 2009).
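The severity thresholds above can be expressed as two small lookup functions. The following is a minimal sketch: the function names are hypothetical, and the treatment of the exact boundary values (25 and 50 mm counted as moderate, 3 and 6 mm counted as level 2) is an assumption based on the ranges as stated in this section.

```python
def pothole_severity(depth_mm: float) -> str:
    """Severity of a pothole from its depth in mm, using the FHWA ranges quoted above."""
    if depth_mm < 0:
        raise ValueError("depth must be non-negative")
    if depth_mm < 25:
        return "low"
    if depth_mm <= 50:  # assumption: both boundary values count as moderate
        return "moderate"
    return "high"


def crack_severity(width_mm: float) -> int:
    """Severity level (1 to 3) of a crack from its width in mm, using the AASHTO ranges.

    Returns 0 when the width is below the 1 mm minimum, i.e. not counted as a crack.
    """
    if width_mm < 1:
        return 0
    if width_mm < 3:
        return 1
    if width_mm <= 6:  # assumption: both boundary values count as level 2
        return 2
    return 3


if __name__ == "__main__":
    for depth in (10, 30, 70):
        print(depth, "mm deep pothole:", pothole_severity(depth))
    for width in (0.5, 2, 4, 8):
        print(width, "mm wide crack: level", crack_severity(width))
```

Note that the minimum plan dimension (150 mm) and minimum crack length (25 mm) are not checked here; a fuller classifier would need those measurements as well.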

Data Collection Method
Data collection, as the core of pavement management systems, has two major methods: traditional/manual data collection and automated or semi-automated data collection. Automated methods are widely used by agencies, but manual methods are still used by some agencies for low-traffic zones. The two main methods for collecting pavement data in the traditional/manual system are the walking survey and the windshield survey (McQueen & Timm 2005), which are discussed below. The automated data collection methods utilize different technologies, as described in the sections that follow.

Traditional/Manual data collection
A walking survey depends on a trained operator (trained rater) who collects the defect data. The rater walks along the road and completes a form recording the type, severity, and density of defects. In a windshield survey, the rater sits in a vehicle as it moves along the lane or shoulder (Timm & McQueen 2004).

Automatic data collection
The authors found a very comprehensive literature on automated data collection tools. Automated/semi-automated collection machines utilize different technologies for detecting surface defects. In this study, the technologies are categorized according to whether they target potholes or cracks. The first part of this section discusses the data acquisition technologies, the second part describes several commercial vehicles used in different countries, and the last part critiques their applications.

Automated data collection technologies
The technologies for detecting potholes are acceleration-based, 3-D reconstruction, and vision-based methods (Kim & Ryu 2014). Detection accuracy is a determining factor in choosing among these technologies. The acceleration-based method relies on a tilt motor for capturing the acceleration, and in some tools the quarter-car model is simulated. The accuracy of the acceleration-based (vibration-based) method rarely reaches that of the vision-based method, because it depends on the accelerometer type and on vehicle specifications such as tire pressure (Kim & Ryu 2014).
In previous studies, Eriksson et al. (2008) developed a system called Pothole Patrol (P2) for detecting potholes with smartphones. An accelerometer and GPS are utilized for this purpose. A machine learning approach was used to identify potholes from the accelerometer data (Eriksson et al. 2008). Mednis et al. (2011) used four Android smartphones for detecting potholes in the road, and their results revealed that the true-positive ratio of their algorithm is close to 90 percent (Mednis et al. 2011).
This technology is effective for real-time processing, and because cellphones and accelerometers are widely available, its cost is affordable for many users. However, an accelerometer cannot detect a pothole unless the pothole lies in the wheel path, so wheel-path defects are the only defects detectable with this method. Figure 1 shows the wheel path as defined by AASHTO, which is equal to 2.5 meters.
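As a rough illustration of how accelerometer readings can flag candidate potholes, the sketch below applies a fixed threshold to the vertical acceleration. It is not the machine learning approach of Eriksson et al. (2008) nor the algorithm of Mednis et al. (2011); the function name, the 0.4 g threshold, and the minimum gap between events are assumptions chosen only for illustration.

```python
def detect_pothole_events(z_accel_g, threshold_g=0.4, min_gap=5):
    """Return sample indices where the vertical acceleration deviates strongly from rest.

    z_accel_g   : vertical acceleration samples in g (about 1.0 on a smooth road)
    threshold_g : hypothetical deviation from 1 g that counts as a jolt
    min_gap     : hypothetical minimum number of samples between two reported
                  events, so one jolt is not reported several times
    """
    events = []
    last_event = -min_gap
    for i, a in enumerate(z_accel_g):
        if abs(a - 1.0) > threshold_g and i - last_event >= min_gap:
            events.append(i)
            last_event = i
    return events


if __name__ == "__main__":
    # Synthetic trace: two jolts, at samples 3-4 and at sample 10.
    trace = [1.0, 1.02, 0.98, 1.6, 0.3, 1.1, 1.0, 1.0, 1.0, 1.0, 1.55, 1.0]
    print(detect_pothole_events(trace))
```

In practice each event index would be paired with the GPS fix recorded at the same time, and the threshold would have to be tuned per vehicle, which reflects the dependence on accelerometer type and vehicle specifications noted above.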
Figure 1. Wheel path definition by AASHTO (Definition 2009)
3-D laser scanners are among the best-known tools for detecting pavement surface defects. They work by measuring the time a laser pulse takes to be reflected back to the projector. The literature shows that this technology can accurately detect the severity of potholes, but the laser scanning equipment is too costly (Li et al. 2009). Li et al. (2009) detected failures such as rutting, shoving, and potholes by high-speed 3-D transverse scanning with an infrared laser and a digital camera (Li et al. 2009).
The vision-based method includes two common imaging methods: analog imaging and digital imaging. Analog images are captured by a camera on 35 mm film. This technique was used by companies for many years, but the data processing has to be carried out at a workstation. The moving van is equipped with several cameras that capture images of the front and of the left or right sides of the road. Because of shadows in daylight, the operation is carried out at night with accessories that provide enough light. Videotaping technology is the same as the aforementioned technology, except that the cameras are replaced with video cameras. These techniques may have acceptable resolution, but the captured images cannot easily be converted into digital format, so digital cameras have become widespread in data collection tools.
Line scanning and area scanning are the two approaches used in digital imaging for scanning the pavement surface (McGhee 2004). The line scanning technique creates a two-dimensional image by moving a one-dimensional sensor along the extent of an object. The same technique is used in fax machines. Mandly Inc. reports that this technique is applicable for detecting cracks with a width of at least 3 mm across the full lane of the road. A particular problem of this scanning method occurs when the vehicle's shadow covers the road surface. The shadow appears in the scanned images as a prolonged shadow, and if it falls on the defect locations and the wheel path, the captured images are useless. To avoid this, appropriate lighting should be provided (McGhee 2004).
The images taken by the area scanning technique consist of thousands of pixels that, in a PMS, define the pavement area. This area depends on the camera specifications, such as the lens and the placement of the camera on the vehicle, and also on the vehicle speed. The imaged area is one-half to full lane width wide and 3 to 5 meters long. The camera angle is an important factor and should be set accurately so that the camera is perpendicular to the pavement surface. The camera angle matters because it prevents image distortion caused by distorted pixels (this happens in some video capturing tools in which the video recorder does not have a high enough frequency for capturing pictures) (McGhee 2004).
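The dependence of area scanning on vehicle speed can be made concrete with a short calculation. The sketch below estimates the lowest frame rate at which consecutive frames still cover the pavement without gaps. The function name and the example speed are assumptions; only the 3 to 5 meter frame length comes from the text above.

```python
def min_frame_rate_hz(speed_kmh: float, frame_length_m: float) -> float:
    """Lowest frame rate (frames per second) for gap-free coverage along the road.

    speed_kmh      : vehicle speed in km/h
    frame_length_m : length of pavement covered by one frame, in meters
    """
    if frame_length_m <= 0:
        raise ValueError("frame length must be positive")
    speed_m_per_s = speed_kmh / 3.6
    return speed_m_per_s / frame_length_m


if __name__ == "__main__":
    # Hypothetical survey speed of 90 km/h with the frame lengths quoted above.
    for length in (3.0, 5.0):
        rate = min_frame_rate_hz(90.0, length)
        print(f"{length} m frames need at least {rate:.2f} frames per second")
```

Any overlap between consecutive frames, or a shorter exposure to limit motion blur, would raise the required rate above this lower bound.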

Automated data collection vehicles
This section describes some automated vehicles used around the world for crack detection. The first organization to develop an automated crack data collection system was CSIRO. Its vehicle can detect cracks wider than 1 mm with a digital camera at highway speed, using the RoadCrack software (Monti 1995; Gavilán et al. 2011). The Texas Department of Transportation developed a data collection vehicle with the VCrack software. The vehicle is equipped with a line scanning camera and a crack detection system that can work at 112.5 km per hour (Huang & Xu 2006; Xu 2006). TRL in 2011 found that the ratio of false-positive crack detection is somewhat high in the ARAN, TRL, and Waylink systems. This means that these systems detect some non-crack surfaces, such as patches or joints, as cracks (McRobbie & Wright 2005).
To close this section, the authors compare the traditional/manual data collection methods with the automatic ones. Manual data collection is more time consuming, and operator safety cannot be fully ensured during the survey. The main issue with automatic methods is the significant capital investment needed to buy the vehicle and provide services to the road networks, which means that the total cost of manual data collection is much lower than that of automatic data collection. On the other hand, in manual methods, transferring the data to a computer carries the risk of misplacing data. Another benefit of automatic methods is that they depend less on trained individuals than manual methods do (Hall et al. 2013).

A NOVEL APPROACH, KINECT
The main scope of this paper is to study an approach and related tools that are not only cost-effective but also precise and accurate. The new sensor called Kinect meets these requirements. In this section, the specifications of Microsoft Kinect versions 1 and 2 are discussed first, and then the use of Kinect in monitoring and detecting distresses in pavement management is presented.
The Kinect was first released for the Microsoft Xbox 360 in November 2010 for gaming purposes, at a cost of US$100 that was made possible by mass production. The Kinect V1 contains an infrared laser emitter, an infrared camera (infrared CMOS sensor), a visual RGB camera, a motorized tilt (with a three-axis accelerometer), and a multi-array microphone. The best operational range of the Kinect V1 is 0.5 to 5 meters. The resolution of the depth image varies from less than 1 mm (at 0.5 m) up to 75 mm (at 5 m) (Mankoff & Russo 2013; P. ). Khoshelham and Elberink (2012) showed that as the distance between the Kinect and the surface increases, the random error of the depth measurement increases quadratically, from a few millimeters to 4 centimeters (at a distance of 5 meters) (Khoshelham & Elberink 2012). The tool is much cheaper for collecting RGB-D data than similar tools such as laser sensors, which cost up to several thousand dollars (Yang & Diez-Roux 2012; Butkiewicz 2014). It can record up to 30 frames per second (30 Hz), which equals saving more than 540 million pixels per minute. The Kinect V1 measures depth by projecting a known pattern of infrared dots at 830 nm from the infrared laser emitter onto the surface within its field of view (57 degrees horizontal, 43 degrees vertical, ±27 degrees physical tilt range) and recording them with the infrared CMOS sensor (Mankoff & Russo 2013; Butkiewicz 2014). The temperature of the laser is kept constant during projection by a Peltier element, but it is worth mentioning that thermal changes cause inaccuracy in the data captured by the Kinect (Mankoff & Russo 2013; Butkiewicz 2014). The infrared camera cannot detect the returning IR pattern in sunlight because "the sun saturates the sensor" (Mankoff & Russo 2013), so the applicability of the Kinect is limited to environments without direct sunlight.
The other camera built into the Kinect is the red, green, blue (RGB) camera. The infrared camera records data on the internal sensor at 1280*1024 pixels (Smisek et al. 2013), but the data is stored on the computer as a 640*480, 32-bit color image at 30 frames per second. The depth data is stored on the computer as a 320*240, 16-bit image at 30 frames per second. Audio data is saved as 16-bit audio at 16 kHz. The accelerometer of the motorized tilt, which controls the Kinect orientation, has a range of 2g at 200 Hz. The Kinect power supply is 12 V at 1 A.
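Two of the figures quoted for the Kinect V1 can be checked with simple arithmetic. The sketch below (a) reproduces the pixel throughput from the stored 640*480 frame size at 30 frames per second, and (b) gives a purely quadratic error model calibrated so that the random depth error equals 4 cm at 5 m, following the trend reported by Khoshelham and Elberink (2012). The function names are hypothetical, and the single-point calibration through the origin is an assumption, not the model fitted in that study.

```python
def pixels_per_minute(width: int = 640, height: int = 480, fps: int = 30) -> int:
    """Number of pixels stored per minute for a given frame size and frame rate."""
    return width * height * fps * 60


def depth_random_error_m(distance_m: float,
                         error_at_ref_m: float = 0.04,
                         ref_distance_m: float = 5.0) -> float:
    """Random depth error under an assumed model: error = k * distance**2.

    The constant k is chosen so that the error equals error_at_ref_m at
    ref_distance_m (4 cm at 5 m, the figure quoted in the text).
    """
    k = error_at_ref_m / ref_distance_m ** 2
    return k * distance_m ** 2


if __name__ == "__main__":
    print("pixels per minute:", pixels_per_minute())
    for d in (0.8, 2.5, 5.0):
        print(f"assumed random error at {d} m: {depth_random_error_m(d) * 1000:.1f} mm")
```

Under this assumed model, halving the sensor-to-surface distance reduces the random error by a factor of four, which is one reason the pavement studies discussed later mount the sensor less than a meter above the ground.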

Figure 4 Kinect V1
There are two ways of creating depth images in RGB-D cameras. The first is structured infrared light, which was briefly explained for the Kinect V1, and the second is time-of-flight sensing, which is utilized in the Kinect V2 released for Xbox One (Peter . The Kinect therefore calculates depth differently from the human eye system, which relies on stereo triangulation. In general, the structured infrared method projects a set of patterns onto the object and records the reflected pattern with an image sensor. The projector can be a laser, an LED, or a lamp, and it emits infrared radiation. The time-of-flight (TOF) system measures the phase shift of a reflected signal (Link & Baraba 2014). A fast clock signal flashes the area intermittently with an array of three laser diodes, which emit short pulses of infrared light through diffusers. The distance is determined from the pulse reflected back to the camera (Demerjian 2013a; Demerjian 2013b; Lau 2013; Butkiewicz 2014). Several mathematically based techniques for calibrating and matching the RGB-D data can be found in the literature (Xu et al. 2011; Tong et al. 2012; Herrera et al. 2012; Smisek et al. 2013). El-laithy et al. showed that integrating the Kinect sensor with other sensors, such as an IMU, is feasible and improves an indoor mapping system (El-laithy et al. 2012).
Some other similar tools have been used for scientific purposes. Hut et al. (2010) showed that the Nintendo Wii is an applicable tool for hydraulic studies (Hut et al. 2010). The Asus Xtion and the PrimeSense Capri are two tools similar to the Kinect, and both are products of the PrimeSense company. After the release of the Kinect V2, other companies such as pmdtec, Infineon, Melexis, and SoftKinetic presented their own time-of-flight sensors.
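The phase-shift principle behind time-of-flight sensing can be illustrated with the standard continuous-wave relation, in which the light travels to the surface and back, so a phase shift of phi at modulation frequency f corresponds to a distance of c * phi / (4 * pi * f). The sketch below is generic: the function names are hypothetical, and the 50 MHz modulation frequency is an arbitrary example value, not a Kinect V2 specification.

```python
import math

SPEED_OF_LIGHT_M_S = 299_792_458.0


def tof_distance_m(phase_shift_rad: float, modulation_hz: float) -> float:
    """Distance from the measured phase shift of a modulated, reflected signal."""
    return SPEED_OF_LIGHT_M_S * phase_shift_rad / (4.0 * math.pi * modulation_hz)


def unambiguous_range_m(modulation_hz: float) -> float:
    """Largest distance measurable before the phase wraps around (phase = 2*pi)."""
    return SPEED_OF_LIGHT_M_S / (2.0 * modulation_hz)


if __name__ == "__main__":
    f_mod = 50e6  # assumed example modulation frequency in Hz
    print(f"distance at a phase shift of pi/2: {tof_distance_m(math.pi / 2, f_mod):.3f} m")
    print(f"unambiguous range: {unambiguous_range_m(f_mod):.3f} m")
```

The same round-trip reasoning applies to pulsed laser scanners, where the distance is half the measured travel time multiplied by the speed of light.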

Kinect V2
The Kinect V2 has an RGB camera and an infrared camera, just as the Kinect V1 does. The resolution of its depth images, 512*424 at 30 frames per second with a field of view of 70.6 degrees horizontal and 60 degrees vertical, is much higher than that of the previous version. The resolution of the color camera, 1920*1080, is also higher than in the previous version, but the depth accuracy of the Kinect V2 is found to be the same as that of the first version (the standard deviation of depth is less than 3.5 mm at a distance of 4.5 meters) (Khoshelham & Elberink 2012). The field of view of the color camera is somewhat larger than that of the depth image (84.1 degrees horizontal and 53.8 degrees vertical) (Butkiewicz 2014). The Kinect V2 has a feature that maps pixels of the color image to pixels of the depth image. Butkiewicz carried out experiments on the accuracy of the Kinect V2, and one of his conclusions is that, because the speed of light differs between water and air, the depth measured for an object submerged in water is not the same as that measured for a dry object (Butkiewicz 2014).
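The field of view and image resolution quoted above determine how much pavement a single depth frame covers and how large one pixel is on the ground. The sketch below assumes an ideal pinhole camera pointed perpendicular to a flat surface and ignores lens distortion; the function names and the 1 m mounting height in the example are assumptions.

```python
import math


def footprint_m(distance_m: float, fov_h_deg: float, fov_v_deg: float):
    """Width and height (in meters) of the flat area seen from a given distance."""
    width = 2.0 * distance_m * math.tan(math.radians(fov_h_deg) / 2.0)
    height = 2.0 * distance_m * math.tan(math.radians(fov_v_deg) / 2.0)
    return width, height


def pixel_size_mm(distance_m: float, fov_deg: float, pixels: int) -> float:
    """Approximate ground size of one pixel (in mm) along one image axis."""
    extent_m = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return extent_m / pixels * 1000.0


if __name__ == "__main__":
    # Kinect V2 depth camera figures from the text: 70.6 x 60 degrees, 512 x 424 pixels.
    w, h = footprint_m(1.0, 70.6, 60.0)
    print(f"footprint at 1 m: {w:.2f} m x {h:.2f} m")
    print(f"horizontal pixel size at 1 m: {pixel_size_mm(1.0, 70.6, 512):.2f} mm")
    print(f"vertical pixel size at 1 m: {pixel_size_mm(1.0, 60.0, 424):.2f} mm")
```

Because the pixel size grows linearly with distance, the mounting height is a trade-off between the area covered per frame and the smallest crack width that can be resolved.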

Limitation of Kinect
The output data of Kinect sensors is similar to that of LIDAR sensors (Allouis et al. 2010). Mankoff and Russo (2013) mentioned "a special footprint of 0.7 mm2 at 0.5m, 1.4 mm2 at 1m and 7 mm-2 at 5m, LIDAR accuracy is 0.7 mm spot spacing. The resolution for the depth is divided by 3" (Mankoff & Russo 2013). There are further limitations in Kinect applications. For example, when the surface is larger than the Kinect's field of view, two approaches are recommended for capturing the best images. Lindner showed that the Kinect fusion algorithm is applicable in this situation (Lindner et al. 2008), and Tong stated that the pictures should overlap to make further processing easier (Tong et al. 2012). The field of view of the Kinect is also very different from that of common laser scanners (close to 180 degrees) (P. . El-laithy et al. showed that the Kinect cannot capture data from glass or transparent plastic, so an ultrasonic sensor should be used to find the depth of such surfaces (El-laithy et al. 2012).
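When the surface is larger than the field of view, the number of overlapping frames needed to cover it follows from simple geometry. The sketch below is only an illustration: the function name, the 20 percent overlap, and the example dimensions are assumptions, since the cited work only states that some overlap is useful.

```python
import math


def frames_needed(surface_length_m: float,
                  footprint_length_m: float,
                  overlap_fraction: float = 0.2) -> int:
    """Number of frames needed to cover a length when consecutive frames overlap.

    overlap_fraction is the assumed share of each frame that repeats the
    previous frame (0 means frames are placed edge to edge).
    """
    if not 0.0 <= overlap_fraction < 1.0:
        raise ValueError("overlap_fraction must be in [0, 1)")
    if surface_length_m <= footprint_length_m:
        return 1
    step = footprint_length_m * (1.0 - overlap_fraction)
    return math.ceil((surface_length_m - footprint_length_m) / step) + 1


if __name__ == "__main__":
    # Hypothetical example: a 3.6 m wide lane covered by 1.0 m wide frames.
    print(frames_needed(3.6, 1.0, overlap_fraction=0.2))
```

A larger overlap makes registration between neighboring frames easier but increases the number of frames, and therefore the data volume, for the same surface.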

Utilize Kinect in Pavement Management
The specifications of the Kinect have been discussed above. This technology has recently been studied by a few researchers in different fields such as project management, coastal mapping, biomedical engineering, and 3-D mapping (Khoshelham & Elberink 2012; Xu et al. 2011; Butkiewicz 2014). The field of pavement management, however, has not paid enough attention to the use of Kinect in monitoring and detecting distresses. The Kinect can capture both RGB images and depth data, which are of significant use in measuring cracks and potholes. The sensor can image surfaces with adequate resolution to detect cracks, and it measures the distance between the sensor and the obstacles in front of it, which yields the depth of defects. Moazzam et al. (2013) developed an algorithm for calculating the volume of a pothole with the Kinect V1. They held the Kinect sensor 0.8 meter above the ground (which can be either asphalt or concrete) and post-processed its data in Matlab. They also proposed a methodology for categorizing potholes. The area of the pothole is calculated from the depth camera data by multiplying the binary area at a given depth level by the area of a pixel at that depth. Finally, the approximate volume of the pothole is obtained using the trapezoidal rule. The algorithm has a 15% error with respect to the actual data (Moazzam et al. 2013). Joubert et al. in 2011 presented a cost-effective system that can be mounted on a vehicle for detecting and analyzing potholes at a maximum speed of 60 km per hour. This study is only in its early stages, so it is not yet persuasive enough for researchers to use the Microsoft Kinect for high-speed data capture. The achievable speed is closely related to the number of point clouds saved. At this speed only 2 point clouds are available per second, which compels the researcher to use a high-speed camera. Joubert's system will include a GPS tool for locating the pothole on the map (Joubert et al. n.d.).
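The area-and-volume calculation attributed to Moazzam et al. (2013) above can be sketched as follows: the pothole is sliced at successive depth levels, the area at each level is the count of pixels at or below that level times the pixel area, and the areas are integrated over depth with the trapezoidal rule. This is an illustrative reconstruction, not the authors' Matlab code. The function name, the 1 mm slice spacing, and the use of one constant pixel area for all depths are assumptions.

```python
def pothole_volume_mm3(depth_map_mm, plane_depth_mm, pixel_area_mm2, step_mm=1.0):
    """Approximate pothole volume from a depth image.

    depth_map_mm   : rows of sensor-to-surface distances in mm
    plane_depth_mm : sensor-to-surface distance of the undamaged pavement
    pixel_area_mm2 : ground area of one pixel (assumed constant with depth)
    step_mm        : assumed spacing between depth levels
    """
    # Depth of every pixel below the undamaged surface (0 where not below it).
    below = [max(d - plane_depth_mm, 0.0) for row in depth_map_mm for d in row]
    max_depth = max(below)

    # Depth levels from the surface down to the deepest point.
    levels = []
    z = 0.0
    while z < max_depth:
        levels.append(z)
        z += step_mm
    levels.append(max_depth)

    # Area at each level: binary pixel count multiplied by the pixel area.
    areas = [sum(1 for b in below if b > 0.0 and b >= z) * pixel_area_mm2
             for z in levels]

    # Trapezoidal rule over depth.
    volume = 0.0
    for i in range(1, len(levels)):
        volume += 0.5 * (areas[i - 1] + areas[i]) * (levels[i] - levels[i - 1])
    return volume


if __name__ == "__main__":
    # Synthetic example: a 2 x 2 pixel, 10 mm deep box-shaped hole, sensor at 800 mm.
    depth = [[800, 800, 800, 800],
             [800, 810, 810, 800],
             [800, 810, 810, 800],
             [800, 800, 800, 800]]
    print(pothole_volume_mm3(depth, plane_depth_mm=800, pixel_area_mm2=2.0))
```

For this box-shaped test hole the result equals pixel count times pixel area times depth; for real potholes with sloping walls the slice spacing and the depth noise of the sensor both affect the estimate.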
If the results of Joubert et al. prove achievable in the near future, there will be a revolution in automated data collection vehicles.
Jahanshahi et al. in 2013 proposed an algorithm for detecting potholes, cracks, and patching. They placed the Kinect 77.8 cm above the ground, which makes the size of each depth pixel 1.38 mm. The algorithm works as follows. The depth image from the Kinect camera always contains some noise, which is eliminated with an 8*8 median filter. A plane is then fitted to the points of the depth image that are not farther than a threshold from the IR camera. The threshold is defined as the distance between the IR camera and the ground ± 2 standard deviations of the depth data captured under laboratory conditions. The RANSAC algorithm is used to find the fitted plane. The plane can be fitted successfully to the defect-free region if at least 1% of the depth image is defect-free. If the camera has been calibrated, the algorithm detects defects more accurately. In the next step, the depth data is subtracted from the fitted plane. Otsu's method is then applied to the histogram of the normalized depth values to discriminate between the defect area and the fitted plane. In their evaluation, cracks, potholes, and patching were detected with 78%, 92%, and 80% accuracy, respectively. The algorithm can detect cracks with a minimum width of 15 mm and a minimum depth of 12 mm, because the IR pattern can penetrate into such cracks (Jahanshahi et al. 2012).
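The core of the pipeline described above (RANSAC plane fitting, subtraction of the fitted plane, and Otsu thresholding) can be sketched in plain Python. This is a simplified illustration rather than the implementation of Jahanshahi et al.: the median filter and the distance pre-filter are omitted, and the function names, the 2 mm inlier tolerance, the 200 iterations, and the 64-bin histogram are all assumptions.

```python
import random


def fit_plane_ransac(points, n_iter=200, tol=2.0, seed=0):
    """Fit z = a*x + b*y + c to (x, y, z) points with RANSAC; returns (a, b, c)."""
    rng = random.Random(seed)
    best, best_inliers = None, -1
    for _ in range(n_iter):
        (x1, y1, z1), (x2, y2, z2), (x3, y3, z3) = rng.sample(points, 3)
        det = (x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1)
        if abs(det) < 1e-9:  # the three sampled points are collinear
            continue
        a = ((z2 - z1) * (y3 - y1) - (z3 - z1) * (y2 - y1)) / det
        b = ((x2 - x1) * (z3 - z1) - (x3 - x1) * (z2 - z1)) / det
        c = z1 - a * x1 - b * y1
        inliers = sum(1 for x, y, z in points if abs(a * x + b * y + c - z) <= tol)
        if inliers > best_inliers:
            best, best_inliers = (a, b, c), inliers
    return best


def otsu_threshold(values, bins=64):
    """Threshold that maximizes the between-class variance of a histogram."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for v in values:
        hist[min(int((v - lo) / width), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(bins - 1):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return lo + (best_t + 1) * width


def detect_defects(depth_map, tol=2.0):
    """Boolean mask of pixels lying clearly below the fitted pavement plane."""
    points = [(x, y, z) for y, row in enumerate(depth_map) for x, z in enumerate(row)]
    a, b, c = fit_plane_ransac(points, tol=tol)
    residual = [[z - (a * x + b * y + c) for x, z in enumerate(row)]
                for y, row in enumerate(depth_map)]
    threshold = otsu_threshold([r for row in residual for r in row])
    return [[r > threshold for r in row] for row in residual]


if __name__ == "__main__":
    # Synthetic 12 x 12 depth image (mm): flat pavement with a 3 x 3, 30 mm deep hole.
    depth = [[778.0 + (30.0 if 4 <= x <= 6 and 4 <= y <= 6 else 0.0)
              for x in range(12)] for y in range(12)]
    mask = detect_defects(depth)
    print(sum(cell for row in mask for cell in row), "defect pixels found")
```

RANSAC is used here because a least-squares fit over all pixels would be pulled toward the defect, whereas a consensus fit follows the defect-free majority of the surface. Real depth images would first need the noise filtering described above.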

CONCLUSION
Detecting surface defects at an early stage with effective and efficient tools is highly important in pavement management. Nowadays, different technologies are available for detecting defects automatically. The main issue is the significant capital investment needed to purchase these tools, which are commonly mounted on vehicles, and to provide services to the road networks. The new technology called Kinect is a cost-effective tool for detecting and post-processing defects in pavement management, and it is precise and accurate. Only a few researchers have used the Kinect for monitoring the pavement surface. The Kinect V1 contains an infrared laser, an infrared camera, and an RGB camera, and it operates best in the range of 0.5 to 5 meters. The resolution of its depth image varies from less than 1 mm (at 0.5 m) up to 75 mm (at 5 m). The depth image of the Kinect V2, in contrast, has a resolution of 512*424 at 30 frames per second with a field of view of 70.6 degrees horizontal and 60 degrees vertical, which is much higher than that of the previous version. The resolution of the RGB camera in the V2 version is 1920*1080. Finally, the Kinect V2 is a new sensor that has a higher resolution than the previous version and can detect surface defects with higher accuracy.