INTEGRATION OF PDR AND IMAGE-BASED POSITIONING AIDED BY ARTIFICIAL NEURAL NETWORKS IN INDOOR ENVIRONMENT

Location-based services (LBS) have attracted wide attention in recent years and have many applications. The most common one is providing users with local information and guidance to Points of Interest (POI), which makes positioning a necessary technique for putting LBS into practice. In an outdoor scenario, the user's position can be obtained from the Global Navigation Satellite System (GNSS); however, GNSS signals may be blocked inside a building. Many indoor positioning techniques have therefore been developed in recent decades, each with its own pros and cons. This paper proposes an indoor positioning technique that integrates Pedestrian Dead Reckoning (PDR) with an image-based positioning method, which reduces the cost significantly because it only needs the camera built into a smartphone. In the first experiment, we verify the positioning accuracy of the proposed method: the mean error in the horizontal direction is about 0.25 meters. In the second experiment, we compare PDR alone with PDR integrated with the proposed method: the misclosure decreases from 8.53% to 1.44%, an improvement of about 83%. The method is therefore suitable for indoor navigation.


INTRODUCTION
With the development of the Internet of Things (IoT), the way people interact with each other has changed. The associated Location-Based Services (LBS) are also of high concern, because most services built on IoT are location- and context-aware and supply the corresponding information to users, which means that a positioning technique is necessary (Meng et al., 2018). Positioning with the Global Navigation Satellite System (GNSS) is an adequate solution in an open area; in an indoor environment, however, the satellite signal may be blocked. Furthermore, the demand for indoor positioning remains high, since people spend over 90% of their time indoors on average (Velux, 2018). In recent decades, many indoor positioning techniques based on wireless communication systems have evolved, such as RFID, Wi-Fi, and Bluetooth. RFID and Wi-Fi were popular in the past. Even though RFID has wider coverage, it requires additional equipment, which makes it problematic for users who rely on their own devices (Wang et al., 2016). Wi-Fi is one of the standards for Wireless Local Area Networks (WLAN) and commonly operates in the 2.4 and 5 GHz Industrial, Scientific and Medical (ISM) radio bands (Chruszczyk et al., 2016). Wi-Fi infrastructure has been deployed in most indoor environments, which keeps the cost low, but its accuracy is relatively low because of the impact of the environment (Yang & Shao, 2015). As for Bluetooth, low cost and low energy consumption are its greatest strengths compared with other wireless communication techniques. However, given its coverage area, higher accuracy requires more devices and therefore more money (Chawathe, 2009). Consequently, image-based positioning is emerging as another technique, one that only requires an off-the-shelf smartphone with a built-in camera. Generally speaking, this method can be separated into two categories, feature point tracking and marker recognition (Davison, 2003).
The former needs more computational resources to perform image matching. To overcome this shortcoming, marker recognition speeds up processing by detecting only certain points. Kim & Petriu (2010) present a method that uses Artificial Neural Networks (ANN) to aid image recognition. Another common indoor positioning technology is Pedestrian Dead Reckoning (PDR), which can position continuously. The basic concept of PDR is to use the Inertial Measurement Unit (IMU) to detect the step length and azimuth, from which the user's position is calculated continuously. However, the IMU error accumulates with time. This research therefore proposes integrating PDR with an image-based positioning method for indoor positioning. The accuracy of the proposed image-based method along the x-axis and y-axis is about 0.13 meters and 0.19 meters, respectively. Comparing PDR alone with PDR aided by the image-based method, the misclosure decreases from 8.53% to 1.44%. The improvement is about 83%, which shows that the proposed method clearly improves the result of continuous positioning.

METHODOLOGY
This section elaborates on the methods used in this research, which can be separated into four parts: image recognition, distance estimation, trilateration, and PDR. Figure 1 illustrates the whole positioning process. Each part is explained in detail below.

Image recognition
The methods of image-based positioning can be categorized into two classes, feature points and marker recognition, depending on the target. For feature points, well-known algorithms include Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF), which are widely applied. However, this approach needs considerable computational resources for image matching. Moreover, a plain environment may offer no feature points, in which case the approach fails. Considering the environmental complexity and the implementation platform, this research adopts a self-designed marker for positioning. Figure 2 shows the self-designed marker, whose size is 18 × 18 cm.

Figure 2. The self-designed marker
To reduce the chance that the marker is only partially detected, its border is thickened. The patterns within the marker encode its attributes, which are recorded in a database constructed in advance. The right half of Figure 1 shows the image processing workflow proposed by Hung et al. (2019). First, the RGB image is converted to grayscale by equation (1), where R, G, and B represent the pixel values of the three channels. The grayscale image is then binarized with a fixed threshold, whose value is chosen by rule of thumb. The next step is edge detection with a common algorithm, the Sobel operator, whose formulas are given in equations (2) and (3).
Where $G_x$ and $G_y$ are the two-dimensional convolution edge extractions in the horizontal and vertical directions, respectively, and $G$ is the gradient magnitude. Morphological operations are then applied to fill the regions that are candidates for the marker. Here, we assume that the largest white area in the image is the marker, so we retain it and set all other white areas to black before moving on to the next step, vertex detection. The concept of 45-degree detection is shown in Figure 3. Because the marker appears deformed in the image, an affine transformation is required to convert it to its known original size. After decoding, if the marker is matched correctly, its corresponding attributes are obtained.
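The first three steps of this workflow (grayscale conversion, fixed-threshold binarization, and Sobel edge detection) can be sketched as follows. This is a minimal illustration, not the implementation of Hung et al. (2019). The grayscale weights are assumed to be the common ITU-R BT.601 luma coefficients, since the coefficients of equation (1) are not reproduced here, and the function names and the threshold value are hypothetical.

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]], dtype=float)


def to_gray(rgb):
    """Weighted sum of the R, G, B channels (assumed BT.601 weights)."""
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def binarize(gray, threshold):
    """Fixed-threshold binarization: white (255) above the threshold, else black (0)."""
    return np.where(gray > threshold, 255.0, 0.0)


def filter3x3(image, kernel):
    """Slide a 3x3 kernel over the image without padding.

    This is cross-correlation; compared with true convolution only the sign
    of the Sobel response changes, so the gradient magnitude is the same.
    """
    h, w = image.shape
    out = np.zeros((h - 2, w - 2))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * image[dy:dy + h - 2, dx:dx + w - 2]
    return out


def sobel_magnitude(image):
    """Gradient magnitude G = sqrt(Gx^2 + Gy^2)."""
    gx = filter3x3(image, SOBEL_X)
    gy = filter3x3(image, SOBEL_Y)
    return np.sqrt(gx ** 2 + gy ** 2)


# Synthetic test image: left half black, right half white.
image = np.zeros((8, 8, 3))
image[:, 4:, :] = 255.0
edges = sobel_magnitude(binarize(to_gray(image), threshold=128.0))
```

On this synthetic image the response is nonzero only in the columns that straddle the black-to-white boundary. The later steps (morphology, largest-region selection, vertex detection) are not covered by the sketch.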

Distance estimation
Traditionally, the rough distance between the projection center and the marker can be estimated by the theory of similar triangles. Werner et al. (2010) employ this idea to estimate the distance using a known length in the environment. The concept of the distance estimation is shown in Figure 4.
Figure 4. The concept of distance estimation
According to the theory, when the focal length of the camera (f) and the detected and real sizes of the marker (W and w) are known, the distance between the marker and the camera can be calculated by equation (4). Nevertheless, the difficulty of this method is that the accuracy may be poor if the image plane is not parallel to the object plane. That is, if there is an angle θ between the two planes, as shown in Figure 5, the true distance is the hypotenuse. This research therefore applies Cascade Correlation Neural Networks (CCN) to estimate the distance.

Figure 5. The error of the distance estimation method
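The similar-triangle relation can be illustrated with a short sketch. It assumes an ideal pinhole camera whose image plane is parallel to the marker, which is exactly the condition that breaks down when the angle θ is present. The function and parameter names are hypothetical, and descriptive names are used instead of W and w to avoid ambiguity.

```python
def distance_similar_triangles(focal_length_mm, pixel_size_mm,
                               marker_size_m, marker_size_px):
    """Estimate the camera-to-marker distance in meters.

    Pinhole model: distance / real size = focal length / size on the sensor.
    """
    if marker_size_px <= 0:
        raise ValueError("marker_size_px must be positive")
    # Size of the marker on the sensor, converted from pixels to millimeters.
    size_on_sensor_mm = marker_size_px * pixel_size_mm
    return focal_length_mm * marker_size_m / size_on_sensor_mm


# Illustrative values only: a 0.2 m target spanning 100 px on a sensor
# with 0.001 mm pixels, seen through a 4 mm lens.
example_distance = distance_similar_triangles(4.0, 0.001, 0.2, 100.0)
```

Because the estimate is inversely proportional to the detected size in pixels, any distortion of the marker in the image propagates directly into the distance.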
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2020, 2020 XXIV ISPRS Congress (2020 edition)
Artificial Neural Networks (ANN) are computing systems that simulate the biological neural networks constituting animal brains. Different architectures have long been applied to different fields, such as speech recognition, video image recognition, and navigation assistance. This research uses a CCN to estimate the distance between the marker and the camera from the deformation of the marker in the image. The main advantage of the CCN is that the number of neurons and hidden layers is decided automatically by its algorithm, so the user does not have to spend time searching for the most suitable network topology. Figure 6 shows the concept of the CCN. The crosses represent neurons whose training is complete and whose weights are no longer adjusted, whereas the circles represent neurons that still need to be trained. If the network cannot satisfy the accuracy requirement, the CCN generates new neurons with random initial weights until the requirement is met.
Figure 6. The scheme of the CCN
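The growth procedure described above can be sketched in simplified form. This is not the network used in this research: in the standard cascade-correlation algorithm each candidate neuron is trained to maximize its correlation with the residual error, whereas this sketch simply draws a pool of random candidates and keeps the best one, and it fits the output weights by least squares. All names, the pool size, and the stopping tolerance are assumptions for illustration.

```python
import numpy as np


def grow_cascade(inputs, targets, tol=0.05, max_units=8, pool=50, seed=0):
    """Add hidden units one at a time until the training RMSE reaches tol."""
    rng = np.random.default_rng(seed)
    # Features start as the raw inputs plus a bias column.
    features = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    frozen_units = []   # weights of installed units, never adjusted again
    history = []        # training RMSE after each output refit
    while True:
        w_out, *_ = np.linalg.lstsq(features, targets, rcond=None)
        residual = targets - features @ w_out
        rmse = float(np.sqrt(np.mean(residual ** 2)))
        history.append(rmse)
        if rmse <= tol or len(frozen_units) >= max_units:
            return frozen_units, w_out, history
        # Pick the random candidate whose output covaries most with the residual.
        best_w, best_score = None, -1.0
        for _ in range(pool):
            w = rng.normal(size=features.shape[1])
            out = np.tanh(features @ w)
            score = abs(np.sum((out - out.mean()) * (residual - residual.mean())))
            if score > best_score:
                best_w, best_score = w, score
        frozen_units.append(best_w)
        # Cascade: the new unit sees the inputs and all earlier hidden units.
        features = np.hstack([features, np.tanh(features @ best_w)[:, None]])


def predict(inputs, frozen_units, w_out):
    """Rebuild the cascade features and apply the output weights."""
    features = np.hstack([inputs, np.ones((inputs.shape[0], 1))])
    for w in frozen_units:
        features = np.hstack([features, np.tanh(features @ w)[:, None]])
    return features @ w_out
```

In the setting of this paper, `inputs` would hold features describing the marker deformation in the image and `targets` the reference distances; both are placeholders here. Because installed units stay frozen and only the output weights are refit, the training error cannot increase when a unit is added.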

Trilateration
As is well known, GNSS is widely used for positioning in outdoor scenarios, and its underlying algorithm is trilateration. Trilateration is a classical positioning technique that estimates the target's position from measured distances (Boukerche et al., 2007). In GNSS, the positions of the satellites are known, whereas the position of the receiver is unknown. Once the distances between the unknown point and the known points are determined, the position of the unknown point can be calculated. The idea is shown in Figure 7.

Figure 7. The concept of the trilateration
This paper uses a trilateration algorithm to calculate the user's position. The unknown point is the position of the camera, whereas the coordinates of the markers are known. Once the distances between the markers and the camera have been estimated by the CCN, the position of the user is calculated by the trilateration algorithm.
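One common way to solve the trilateration problem in two dimensions is to subtract the range equation of the first marker from the others, which removes the quadratic terms and leaves a linear least-squares problem. The sketch below shows that formulation under the assumption of planar coordinates and at least three non-collinear markers. It is not necessarily the solver used in this research, and the function name and sample coordinates are hypothetical.

```python
import numpy as np


def trilaterate(markers, distances):
    """Estimate a 2D position from known marker coordinates and ranges."""
    markers = np.asarray(markers, dtype=float)
    distances = np.asarray(distances, dtype=float)
    if markers.shape[0] < 3:
        raise ValueError("at least three markers are required")
    ref, d_ref = markers[0], distances[0]
    # For each marker i > 0:
    # 2 (p_i - p_0) . x = |p_i|^2 - |p_0|^2 - d_i^2 + d_0^2
    a = 2.0 * (markers[1:] - ref)
    b = (np.sum(markers[1:] ** 2, axis=1) - np.sum(ref ** 2)
         - distances[1:] ** 2 + d_ref ** 2)
    position, *_ = np.linalg.lstsq(a, b, rcond=None)
    return position


# Hypothetical layout: four markers on the walls of a 10 m x 8 m area.
marker_xy = [(0.0, 0.0), (10.0, 0.0), (10.0, 8.0), (0.0, 8.0)]
camera_xy = np.array([3.0, 4.0])
ranges = [float(np.hypot(*(camera_xy - np.array(m)))) for m in marker_xy]
estimate = trilaterate(marker_xy, ranges)
```

With error-free ranges the estimate coincides with the true position. With ranges that carry errors at the decimeter level, as estimated by the CCN, the least-squares solution spreads the inconsistency over all markers.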

PDR
PDR is another common technique that continuously acquires the user's position based on an IMU. The concept is similar to traversing, as illustrated in Figure 8.

Figure 8. Calculation of PDR
The accelerometer is used to detect steps. A step is confirmed when the accelerometer value exceeds a threshold and the interval between two successive wave crests is longer than a second threshold. Equation (5) is the model proposed by Chen et al. (2011) to estimate the step length from the height of the user and the step frequency, where a, b, and c are personal parameters, H is the height, and SF is the step frequency. In addition to the accelerometer, the gyroscope and magnetometer are used to calculate the heading, and their outputs are fused with a Kalman Filter (KF). Once the step length and the heading are obtained, the position can be calculated by equation (6).
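$$E_{k+1} = E_k + SL \sin\psi_k, \qquad N_{k+1} = N_k + SL \cos\psi_k \qquad (6)$$
Where $SL$ is the step length based on the model, $\psi_k$ represents the heading at epoch $k$, and $E_{k+1}$, $N_{k+1}$ are the position at epoch $k+1$.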
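The step detection rule and the position update can be sketched as follows. The sketch assumes that the heading is an azimuth measured clockwise from north in radians, that positions are east and north coordinates in meters, and that the step length is supplied by a model such as equation (5). The function names, thresholds, and sample signal are hypothetical.

```python
import math


def detect_steps(acc_norm, timestamps, acc_threshold, min_interval):
    """Return timestamps of accepted steps.

    A sample counts as a step when it is a local peak above acc_threshold and
    lies more than min_interval seconds after the previously accepted step.
    """
    steps = []
    last_step_time = None
    for i in range(1, len(acc_norm) - 1):
        is_peak = acc_norm[i] > acc_norm[i - 1] and acc_norm[i] >= acc_norm[i + 1]
        if not is_peak or acc_norm[i] <= acc_threshold:
            continue
        if last_step_time is None or timestamps[i] - last_step_time > min_interval:
            steps.append(timestamps[i])
            last_step_time = timestamps[i]
    return steps


def pdr_update(east, north, step_length, heading_rad):
    """Advance the position by one step along the current heading."""
    return (east + step_length * math.sin(heading_rad),
            north + step_length * math.cos(heading_rad))


# Synthetic accelerometer magnitudes (m/s^2) sampled every 0.1 s.
acc = [9.8, 12.0, 9.8, 9.8, 12.0, 9.8, 11.9, 9.8]
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]
step_times = detect_steps(acc, times, acc_threshold=11.0, min_interval=0.25)

# Walk one step north, east, south, and west; the path should close.
position = (0.0, 0.0)
for heading in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2):
    position = pdr_update(position[0], position[1], 1.0, heading)
```

In the synthetic signal the third peak is rejected because it follows the second one too closely. The closed square walk returns to the origin only because the headings are error-free; with real IMU data the heading error accumulates, which is what the image-based updates are meant to correct.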

Experimental setup
The experiment is carried out with a Xiaomi Mi8 smartphone, whose specifications are listed in Table 1. The test site is the underground parking garage beneath the NCKU library building. The environment is shown in Figure 9.
Parameter                       Xiaomi Mi8
Calibrated focal length (mm)    7.21
Origin offset (m)               (0.042, 0.075)
Focal length (mm)               4
Pixel size (mm)                 0.0014
Image size (pixel)              4032 × 3024
Table 1. The specification of smartphone
Figure 9. The environment of the parking garage scenario
The experiment is separated into two parts. First, we verify whether the accuracy of the proposed image-based positioning method is adequate. Second, we carry out the experiment that integrates PDR with the proposed method. Training data must be collected to construct the CCN model that estimates the distance. The marker is pasted on the wall, and pictures containing the marker are taken along half-rings at one-meter intervals, up to a maximum distance of 10 meters. At the same time, EDM is used to measure the distance as a reference. The schematic diagram is shown in Figure 10, where the red triangles represent the positions at which pictures are taken for constructing the CCN model.
Figure 10. The schematic diagram of collecting data
80% of all data are used as training data, and the rest are used for testing. The accuracy on the testing data is about 0.23 m, which is adequate for use in the positioning experiments.

Positioning result
For the first experiment, we select 10 points whose coordinates were measured in advance by a total station as the reference. At every point, four pictures containing different markers are taken, and the distances are measured by EDM as the distance reference. Using the image recognition algorithm proposed by Hung et al. (2019) together with the self-designed marker, the recognition rate reaches 95% even though the environment is relatively complicated. In addition, the mean processing time per image is about 12 seconds. Once a marker is recognized correctly, its coordinates and related attributes are retrieved from the database. The previously constructed model is then used to estimate the distance.
Figure 11 shows the accuracy compared with the reference distances measured by EDM.
Figure 11. The error of estimated distance
The results are sorted by distance: the horizontal axis is the distance and the vertical axis is the error. Figure 11 shows that the error of the distance estimated by the CCN model tends to grow as the distance increases, because the marker may appear more blurred at longer distances. The mean error of all 40 images is about 0.16 meters. Afterward, the trilateration algorithm is applied to calculate the position. Figure 12 shows the result, and the errors of every point are listed in Table 2. The red and blue triangles are the positions measured by the total station and calculated by the trilateration algorithm, respectively.
The stars mark the positions of the markers. From Table 2, the mean error of all points in the horizontal direction is about 0.25 meters, which is adequate for the following application.

The result of PDR integrated with proposed method
Finally, we integrate PDR with the proposed method. Figure 13 shows the result.
Figure 13. The result of PDR integrated with image-based positioning method
The green line represents the trajectory using PDR only, which covers two laps. The triangle at the bottom left is the start point, and the other three triangles are the positions updated by the image-based method. Furthermore, when the user passes through the start point, the heading is reset to the initial value of the magnetometer. The cyan line is the first lap of PDR integrated with the proposed method, and the blue line is the second lap. The total length of the trajectory is about 120 meters. The misclosure is used as the accuracy index and is calculated by equation (7); a smaller value is better. The misclosure of the trajectory using PDR only is 8.51%; after integrating the two methods, the misclosure decreases to 1.44%. The improvement is about 83%. Finally, the position errors of the endpoint are 10.03 and 1.69 meters, respectively.
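The accuracy index and the reported improvement can be illustrated with a short calculation. The sketch assumes that the misclosure is the endpoint position error divided by the total trajectory length, expressed in percent, which is the usual definition for a closed traverse; equation (7) itself is not reproduced here. The function names are hypothetical.

```python
def misclosure_percent(endpoint_error_m, trajectory_length_m):
    """Endpoint error relative to the trajectory length, in percent (assumed form)."""
    if trajectory_length_m <= 0:
        raise ValueError("trajectory_length_m must be positive")
    return 100.0 * endpoint_error_m / trajectory_length_m


def relative_improvement_percent(before, after):
    """Reduction of an error measure relative to its original value, in percent."""
    return 100.0 * (before - after) / before


# Illustrative values only: a 2 m endpoint error over a 100 m path.
example = misclosure_percent(2.0, 100.0)

# Endpoint errors reported above: 10.03 m (PDR only) and 1.69 m (integrated).
improvement = relative_improvement_percent(10.03, 1.69)
```

Because both trajectories have the same length, the relative improvement of the endpoint error equals that of the misclosure, and it rounds to the 83% reported above.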

CONCLUSION
This research explores the effectiveness of the image-based positioning method and integrates it with PDR. The results show that even in a complicated environment the marker can still be detected successfully, and the positioning error of the trilateration algorithm is about 0.25 meters. Last but not least, integrating PDR with the image-based positioning method improves the misclosure by about 83%. The positioning error of the endpoint decreases to less than 2 meters, which is sufficient for indoor positioning.