INDOOR POSITIONING FOR SMART DEVICES BASED ON SENSOR FUSION WITH PARTICLE FILTER: LOCALIZATION AND MAP UPDATING

With every new generation of smart devices, new sensors are introduced, such as depth cameras or UWB sensors. Combined with the rapidly growing number of smart mobile devices, this has increased interest in indoor positioning systems (IPS), driven by numerous indoor location-based services (ILBS) and mobile applications at large. Wi-Fi Received Signal Strength (RSS) based fingerprinting positioning (WF) techniques are widely used in IPS because IEEE 802.11 WLAN (Wi-Fi) networks are widely deployed, the technique requires no line-of-sight to the access points (APs), and Wi-Fi signals are easy to extract from 802.11 networks with smart devices. However, WF techniques suffer from fingerprint variance, i.e., fluctuation of the sensed signal, and from the difficulty of updating the map efficiently in a frequently changing environment. To address these problems, we propose a novel IPS framework which uses a particle filter to fuse WF with a state-of-the-art CNN-based visual localization method to better adapt to changing indoor environments. The suggested system was tested with real-world crowdsourced data collected by multiple devices in an office hallway. The experimental results demonstrate that the system can achieve robust localization with a mean error (ME) of 0.3~1.5 m, and map updating with a 79% correction rate.


INTRODUCTION
Location estimation is an essential procedure for many Indoor Location Based Services (ILBS), such as rescue management, patient monitoring in hospitals, and security applications, that require meter-level accuracy. Furthermore, estimation of the facing direction is also needed for navigation applications which guide users from point A to point B, whether indoors or outdoors.
With the proliferation of smartphones, ILBS solutions designed for the sensors embedded in smartphones have been gaining attention due to the rapidly emerging market for indoor commercial applications. From a commercial perspective, typical requirements for these applications are user-friendliness (ease of use), low cost, robustness, high accuracy, ease of deployment, ease of calibration, and universal availability. Since GPS does not work indoors, many alternative localization techniques based on various smartphone-equipped sensors/signals have been proposed to estimate user location. Among them, Wi-Fi received signal strength (RSS) based methods have attracted continuous attention because IEEE 802.11 WLAN (Wi-Fi) networks are widely deployed, the technique does not need line-of-sight to access points (APs), and Wi-Fi signals are easy to extract from 802.11 networks with smart devices.
The two main categories of signal power (RSS) based positioning techniques are fingerprinting and ranging (path loss) techniques (Atia et al., 2012). Ranging methods (Bernardos et al., 2010) relate RSS to the distance between the signal receiving device and the transmitter with a regression-based algorithm. Fingerprinting methods determine the location of the signal receiving device by comparing the obtained RSS data with a database which contains RSS data measured at certain calibration points (CPs). Both fingerprinting and ranging technologies suffer from the RSS variance problem, which is caused by measurement noise, including discrepancies between device types, user direction, and environmental changes such as alterations to the layout of the indoor environment and the removal, replacement, or addition of APs. To continuously adapt to RSS changes in RF-only positioning systems, the issue is traditionally treated as either a calibration problem (Anagnostopoulos, 2017; Bernardos, et al., 2010; Lim, et al., 2013) or a map updating problem (Atia, et al., 2012; Sun, et al., 2008; Wu, et al., 2017; Xu, et al., 2019; Yin, et al., 2005; Yin, et al., 2008) and solved with hyperparameter optimization methods. For example, (Anagnostopoulos, 2017) presents an approach to on-line recalibration of the propagation model parameters in a Bluetooth low energy (BLE) beacon-based RSS ranging positioning method. (Atia, et al., 2012) and (Xu, et al., 2019) construct an RSS radio map for each AP with a Gaussian Process Regression (GPR) model and update/calibrate the hyperparameters on-line by optimizing a loss function and by using a particle filter, respectively.
The common issues of these hyperparameter optimization methods are: (1) the training process of these algorithms is usually computationally expensive, (2) there is no universal model for all places in an indoor environment or for all signal sources (APs) due to the complex structures and dynamic nature of indoor environments, (3) the quantity and quality of crowdsourced data from a certain user may not be sufficient for updating the entire model, (4) all these algorithms need continuous data collection or measurement covering the whole site, which is not applicable to instant static point positioning, and (5) these algorithms cannot adapt to large changes caused by removing or adding APs (Atia, et al., 2012; Bergamo et al., 2002; Chen et al., 2002; He et al., 2015; Lim et al., 2010). A more adaptive scheme is to directly update the signal map through crowdsourced data with optimized positioning results obtained from a Wi-Fi fingerprinting-IMU sensor fusion algorithm (Chang et al., 2014; Kim et al., 2015; Taniuchi et al., 2015; Wu et al., 2015). The main drawbacks of these motion sensor-based methods are that (1) the accuracy of motion sensors suffers from measurement noise, and (2) executing the complex initialization process is not easy for non-professional users.
In this work, we propose a novel system with particle filter (PF)-based sensor fusion technology which integrates positioning estimates calculated from Wi-Fi RSS and visual data to simultaneously achieve efficient indoor positioning and radio map updating. The proposed system does not require special infrastructure or extra sensors. Additionally, the system is user-friendly: a one-shot measurement, consisting of 10~15 seconds of Wi-Fi RSS data acquisition and one image, is the only input needed to complete the whole localization process. Moreover, the system has the potential to be used in any indoor environment with meter-level average positioning accuracy, while efficiently updating the radio map with crowdsourced data. The rest of this work is organized as follows. In section 2, we introduce the proposed system, including the particle filter and map updating methods. In section 3, we present the implementation of the system, including details of building the indoor map and test datasets. The test results are presented and discussed in section 4. Finally, the conclusions are summarized in section 5.

PROPOSED SYSTEM
In the previous work (Yang et al., 2020), we presented a system that allows the user to obtain static 6DoF indoor positioning information in the map frame with meter-level average accuracy, by first running a Bayes-based WF algorithm to generate a coarse location estimate (Figure 1). The fingerprint matching problem in the WF system is solved by obtaining the posterior distribution with Bayes' rule:

$$P(l_i \mid o) = \frac{P(o \mid l_i)\,P(l_i)}{P(o)} \qquad (1)$$

where
$P(l_i \mid o)$ = posterior of a possible CP location $l_i$ given the observation $o$
$P(o \mid l_i)$ = likelihood
$P(l_i)$ = prior probability
$P(o)$ = marginal probability of the observation

The location estimate is then refined by InLoc (Taira et al., 2018), which integrates deep-learning and photogrammetric technologies aided by the local 3D map. The advantages of this system are: (1) the search space of the 3D map used in InLoc is reduced from a large-scale map to a local segment by using the coarse location obtained from WF, and (2) the refined location estimate from InLoc outperforms traditional WF in average accuracy. However, this system cannot optimize the localization results when drift occurs in InLoc, and it cannot perform map updating. On the other hand, Bayes-based sensor fusion has been demonstrated to be a promising solution for the map updating task (Yang et al., 2019; Xu, et al., 2019).
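The Bayes-based fingerprint matching described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the Gaussian RSS likelihood, the uniform prior over CPs, the parameters `sigma_db` and `missing_dbm`, and the function names `wf_posterior` and `top_h` are all assumptions made for illustration.

```python
import math


def wf_posterior(observed, radio_map, sigma_db=4.0, missing_dbm=-100.0):
    """Posterior P(l_i | o) over calibration points (CPs) via Bayes' rule.

    observed:  dict mapping AP id -> measured RSS in dBm
    radio_map: dict mapping CP id -> dict of AP id -> stored RSS in dBm
    Assumptions (not from the paper): independent Gaussian RSS noise with
    standard deviation sigma_db, a uniform prior over CPs, and a floor value
    missing_dbm for APs that are absent on one side of the comparison.
    """
    log_likelihood = {}
    for cp, fingerprint in radio_map.items():
        total = 0.0
        for ap in set(observed) | set(fingerprint):
            diff = observed.get(ap, missing_dbm) - fingerprint.get(ap, missing_dbm)
            total += -0.5 * (diff / sigma_db) ** 2
        log_likelihood[cp] = total

    # With a uniform prior the prior cancels; subtracting the peak keeps
    # the exponentials numerically stable.
    peak = max(log_likelihood.values())
    unnormalized = {cp: math.exp(v - peak) for cp, v in log_likelihood.items()}
    evidence = sum(unnormalized.values())  # plays the role of P(o)
    return {cp: v / evidence for cp, v in unnormalized.items()}


def top_h(posterior, h):
    """Return the h CPs with the highest posterior as (cp, probability) pairs."""
    return sorted(posterior.items(), key=lambda item: item[1], reverse=True)[:h]


# Hypothetical three-CP radio map and one observation.
radio_map = {
    "cp1": {"ap_a": -70.0, "ap_b": -80.0},
    "cp2": {"ap_a": -80.0, "ap_b": -72.0},
    "cp3": {"ap_a": -85.0, "ap_b": -75.0},
}
observed = {"ap_a": -71.0, "ap_b": -79.0}
posterior = wf_posterior(observed, radio_map)
best = top_h(posterior, 1)
```

The ranked output of `top_h` corresponds to the coarse WF candidates that are passed on to the visual localization step.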
Therefore, the goal of this work is to go one step further by applying an additional particle filter, which is suitable for non-Gaussian/nonparametric positioning tasks (Arulampalam, et al., 2002; Gustafsson, 2010), to combine WF and InLoc (https://github.com/HajimeTaira/InLoc_demo) for localization and radio map updating (Figure 2). Since in most cases the positioning result from WF is only a coarse estimate used to narrow the search space of the 3D map in the InLoc process, complex and computationally expensive signal propagation models, such as path loss, Gaussian Process Regression (GPR) or deep learning, are not necessary in our work. Thus, the map updating strategy in our system is not the online calibration of hyperparameters but correctly locating crowdsourced data and storing it in the map dataset. Similar map updating strategies were proposed by (Gallagher, et al., 2010; He, et al., 2016; Lim, et al., 2013). However, since all these studies used only the Wi-Fi sensor for both localization and map updating, such systems can easily fall into a vicious cycle in which noisy positioning results are used for map updating and the incorrectly updated map is then used for future localization. In our system, the optimized localization results from the particle filter can significantly mitigate this issue. The proposed filter is defined as follows:

1. The top $h$ WF results (equation 1), i.e., the possible 2D locations $x^i$ and their corresponding posteriors $p^i$, are used as particles and weights. Therefore, the particle set has the form

$$S_{k-1} = \{(x^i, w_{k-1}^i)\},\quad i = 1, \dots, h,\qquad w_{k-1}^i = p^i$$

Note that $k-1$ and $k$ here mean before and after updating, since our system is Markovian and the measurements do not contain multiple time frames. In the experiment, we tested two numbers of particles, $h = 3$ and $h = 5$, i.e., the top 3 and top 5 highest posterior probabilities in the WF results.

2. Next, with the InLoc horizontal localization result $z$, the weight of each particle is updated with the likelihood function, in the same way as in the bootstrap particle filter (Marron, et al., 2007):

$$w_k^i \propto w_{k-1}^i\, p(z \mid x^i; \sigma)$$

where $\sigma$ = scaling parameter

3. Finally, the location is estimated by plugging the particles and the updated weights, normalized to sum to one, into the following estimation function:

$$\hat{x} = \sum_{i=1}^{h} w_k^i\, x^i$$

Since each user measurement includes one fingerprint and one query picture, this particle filter runs one iteration, as there is only one update from the photogrammetric result. With the location $\hat{x}$ estimated via the particle filter, the map updating mechanism is realized by the function

$$\text{update} = \begin{cases} 1, & \lVert \hat{x} - x_{WF,\max} \rVert > \tau \\ 0, & \text{otherwise} \end{cases}$$

where $x_{WF,\max}$ is the location estimate which has the maximum posterior probability among the WF results and $\tau$ is the threshold. In this work, the threshold was set to $\tau = 3$ m, the same as the length of a cell in the radio map. Note that this parameter may vary with changing cell size. If the map needs to be updated, the crowdsourced Wi-Fi RSS fingerprint from the user device is added to the cell in which $\hat{x}$ is located.
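The single-iteration fusion and the map-updating check described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the isotropic Gaussian form of the likelihood, the weighted-mean estimator, the value `sigma_m=1.0`, and the names `fuse_wf_inloc` and `needs_map_update` are hypothetical; only the 3 m threshold comes from the text.

```python
import math


def fuse_wf_inloc(wf_candidates, inloc_xy, sigma_m=1.0):
    """One particle-filter update fusing WF candidates with an InLoc fix.

    wf_candidates: list of ((x, y), posterior) for the top-h WF results
    inloc_xy:      (x, y) horizontal InLoc result in the map frame
    sigma_m:       scaling parameter of the assumed Gaussian likelihood
    Returns the fused (x, y) estimate and the normalized weights.
    """
    if not wf_candidates:
        raise ValueError("need at least one WF candidate")
    sq_dist = [(x - inloc_xy[0]) ** 2 + (y - inloc_xy[1]) ** 2
               for (x, y), _ in wf_candidates]
    nearest = min(sq_dist)  # shifting by the minimum avoids total underflow
    raw = [prior * math.exp(-0.5 * (d - nearest) / sigma_m ** 2)
           for d, (_, prior) in zip(sq_dist, wf_candidates)]
    total = sum(raw)
    if total <= 0.0:
        raise ValueError("all particle weights vanished")
    weights = [r / total for r in raw]
    est_x = sum(w * xy[0] for w, (xy, _) in zip(weights, wf_candidates))
    est_y = sum(w * xy[1] for w, (xy, _) in zip(weights, wf_candidates))
    return (est_x, est_y), weights


def needs_map_update(pf_xy, wf_candidates, threshold_m=3.0):
    """True when the PF estimate and the best WF candidate are farther apart
    than the threshold (3 m in the text, equal to the cell length)."""
    best_xy, _ = max(wf_candidates, key=lambda candidate: candidate[1])
    return math.dist(pf_xy, best_xy) > threshold_m


# Hypothetical top-3 WF candidates at cell centers along a hallway.
candidates = [((0.0, 1.5), 0.5), ((3.0, 1.5), 0.3), ((6.0, 1.5), 0.2)]
estimate_far, weights_far = fuse_wf_inloc(candidates, (6.2, 1.5))
estimate_near, _ = fuse_wf_inloc(candidates, (0.3, 1.5))
```

In the first call InLoc disagrees with the most probable WF cell, so the fused estimate moves toward the third candidate and the update check fires; in the second call both sources agree and no update is triggered.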
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2021 XXIV ISPRS Congress (2021 edition)

EXPERIMENT
The data gathering and map creation methods are similar to those of the previous work (Yang, et al., 2020), in which the 3D map is built by RGBD SLAM (RTAB-map) with a Kinect V1 RGBD camera and a LooMo robot, and the Wi-Fi fingerprints in the radio map are collected by a laptop with a 3 m interval between every two calibration points, as a balance between labor cost and accuracy. In this work, we conduct the experiments in another office hallway (55 m × 3 m) at the Ohio State University, see Figure 3 and Table 1. Additionally, to comprehensively test the performance of the proposed system under the RSS variance challenge, multiple devices are used to generate the Wi-Fi fingerprint datasets.
In the 3D mapping process, changes in illumination in the environment caused significant drift in the RGBD SLAM process, which remained significant after post-processing with only RGBD data. To further mitigate the drift, we generated simulated Lidar point cloud data from the depth data of the RGBD camera and then used it in the post-processing optimization, see Figure 4. All these functions are included in the RTAB-map library. Finally, the algorithm produced a 3D model of the hallway with 50 million colored points, 4568 key frames including 6DoF camera poses of RGB images (640 × 480), and 3D scan data.
The radio map is built with 18 calibration points, where the Wi-Fi RSS was measured over a period of 50 to 60 seconds, and the MAC addresses and IP information of the APs were also recorded. The fingerprints are then generated by PWF methods, in the same way as described in (Yang et al., 2018). The RSS quality of all 18 fingerprints in the radio map is stable, with values mainly distributed between -70 and -85 dBm, see Figure 5. A VAIO Z Canvas laptop was used for radio map data collection, and the coordinates of the CPs were manually picked from the 3D model rendered by RTAB-map (red markers).
The test datasets are built on 18 points, where 3 laptops (VAIO Z Canvas, ThinkPad X1, HP ProBook) and 2 cameras (Kinect V1, SONY XPERIA X smartphone) are used for collecting Wi-Fi RSS and query images, respectively. The configuration of the test points is set as in our previous work (Figure 6) (Yang et al., 2020): black points are randomly set in the hallway, and green points are set between calibration points, aligned with the middle line of the hallway along which the mapping robot LooMo passes. Therefore, the query images recorded by the Kinect have 6DoF camera poses obtained from RGBD SLAM as ground truth, while the query images recorded by the smartphone have only 2D location ground truth. In total, 7 points with black markers and 11 points with green markers are set in the testing environment. For a thorough comparison of localization performance, in this study we also take query images on the green points with the SONY smartphone (without heading ground truth). As a result, a total of 9 groups of test datasets are created, see Table 2. In the fingerprint datasets, the RSS variance problem can be seen from two aspects: (1) the RSS values of the fingerprints collected by different devices, shown in Figure 7, and (2) the total number of APs in the different datasets, see Table 3.
The measurement period for collecting the Wi-Fi fingerprints of the test datasets varied from 15 to 25 seconds.
The query images (2,160 × 2,880) from the SONY XPERIA X smartphone are acquired at both green and black test points. During the experiment, a renovation project was carried out at the testing site. Therefore, some images contain significant noise from dynamic elements, such as moving people and objects, see Figure 8 and Figure 9, which is a common challenge to the robustness of photogrammetry-based ILBS in real-world implementations. The 6DoF camera poses associated with the query images from the Kinect V1 camera for black test points are selected from the RTAB-map results in a similar manner as in the previous work (Figure 10). From Figure 10, it is easy to see that the Kinect RGB camera is more sensitive to changes of illumination in the environment. The intrinsic parameters of the two cameras used in InLoc are the same as in the previous work. Due to the limitation of the 3D mapping technique used in this work, we only map the hallway over a short distance from one direction with RGBD SLAM. Correspondingly, the query images are also recorded with a similar heading direction, as no observations from other directions are stored in the map dataset for localization. Therefore, the performance of the proposed system may vary if tested with a different map dataset.

EXPERIMENTAL RESULTS
In this section, the localization and map updating performance are discussed based on the experimental results. As mentioned in section 2, two particle number configurations, 3 and 5, are tested in our experiments. Since the particles are set at the cell centers, changing the particle number also affects the map search space in InLoc. The localization performance of the proposed system on each test point, including the WF, InLoc and final results, is presented in Table 4 (3 particles) and Table 5 (5 particles). An F value in the InLoc category means that the algorithm failed to estimate the location due to a poor initial location from WF, which results in an incorrect search space and map data. Correspondingly, the subsequent PF also yields an F result. Therefore, the mean absolute error of each algorithm is calculated with the available data. Note that all F results occurred in the test with particle number = 3 on the datasets which contain the Wi-Fi fingerprints collected by the ThinkPad. The reason is that a high localization error in the WF results, caused by the large RSS variance at certain locations, such as the 11th green point shown in Figure 7, leads to a wrong map search space for InLoc. To solve this problem, we extend the search space from 3 cells to 5 cells. The test results show that the system can successfully estimate the location with this search space, see Table 5. Therefore, the localization results with the 5-particle setting were used for the map updating experiment in the next step.
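Computing the mean absolute error "with the available data" amounts to averaging over the test points that did not produce an F result. A minimal sketch, in which failed estimates are represented by `None` and the function name and sample values are hypothetical:

```python
def mean_absolute_error(errors_m):
    """Mean of per-point horizontal errors in meters, ignoring failed (F)
    estimates, which are represented here by None.

    Returns None when every estimate failed, so that an empty result is
    not mistaken for a zero error.
    """
    valid = [abs(e) for e in errors_m if e is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


# Hypothetical per-point errors for one dataset; the second point failed.
sample_errors = [0.4, None, 1.0]
sample_mae = mean_absolute_error(sample_errors)
```

Because failed points are dropped rather than penalized, a mean computed this way is best read together with the number of F entries in the same table row.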
Despite the failed estimations, the PF achieved a similar accuracy level in both tests, with mean absolute errors (MAE) of 0.3~1.3 m and 0.3~1.5 m for the 3- and 5-particle settings, respectively. Notably, taking the localization of the green test points as an example, the system performance with different devices is stable at the submeter level, e.g., 0.6 m with h_g_k, 0.7 m with h_g_s, 0.4 m with t_g_k, 0.4 m with t_g_s, 0.3 m with v_g_k and 0.5 m with v_g_s. Such performance can fulfill the accuracy requirement of most ILBS, which usually require meter-level accuracy (Atia et al., 2012). It is also clear that the PF outperformed the classic WF, which achieved an MAE of 1.6~3.0 m, and could correct the drift in InLoc, such as in the localization estimate for the 5th black point with the query image taken by the SONY smartphone. On the other hand, the overall performance of the PF is slightly worse than that of InLoc alone due to the low-accuracy WF positioning results used, but the final accuracy level is still acceptable for most ILBS applications.
With the localization estimates from the 5-particle solution, the radio map updating method is tested, and the performance statistics are summarized in Table 6. The results show that the particle filter-based map updating method achieved a 79% correct rate, clearly outperforming the method used in our previous work, which had an accuracy of 64%. As expected, this result demonstrates that increasing the positioning accuracy can significantly improve the performance of the map updating function. On the other hand, the wrong updates occurred where the PF error was > 1 m, yet the error budget of a false update is only 1 cell. However, when both WF and PF have a large localization error in the same direction, the updating mechanism malfunctions. For example, on the 5th test point in t_b_s, the radio map needs to be updated, but the updating function is not activated because the gap between the WF and PF localization results is not larger than the threshold.
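The failure mode described above, in which WF and PF err in the same direction, can be made concrete with a small worked sketch. Everything in it is an assumption for illustration: positions are reduced to one coordinate along the hallway, an update is considered "needed" when the best WF candidate falls in a different 3 m cell than the true position, and the four cases are invented numbers, not results from Table 6.

```python
def update_triggered(pf_x, wf_x, threshold_m=3.0):
    """Decision rule from the text: update when the gap between the PF
    estimate and the best WF estimate exceeds the threshold."""
    return abs(pf_x - wf_x) > threshold_m


def update_needed(truth_x, wf_x, cell_m=3.0):
    """Assumed ground-truth criterion: the best WF estimate lies in a
    different radio-map cell than the true position."""
    return int(truth_x // cell_m) != int(wf_x // cell_m)


def correct_rate(cases):
    """Fraction of cases where the decision matches the assumed criterion.

    cases: list of (truth_x, wf_x, pf_x) positions in meters.
    """
    hits = sum(update_triggered(pf, wf) == update_needed(truth, wf)
               for truth, wf, pf in cases)
    return hits / len(cases)


# Hypothetical cases: (true position, best WF estimate, PF estimate).
cases = [
    (12.0, 4.0, 11.6),  # WF wrong, PF right: update fires, as needed
    (5.0, 4.5, 5.2),    # both right: no update, none needed
    (12.0, 6.0, 7.5),   # WF and PF both pulled the same way: update missed
    (8.0, 7.0, 8.3),    # both right: no update, none needed
]
rate = correct_rate(cases)
```

The third case reproduces the malfunction: the WF estimate is two cells off, but the PF estimate is only 1.5 m from it, so the 3 m threshold is never crossed.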

CONCLUSION
A novel particle filter-based sensor fusion indoor positioning system that works with Wi-Fi and camera data from smart devices is presented in this work. The approach offers users an instant 2D localization result in the map frame with an MAE of 0.3~1.5 m, and simultaneously updates the radio map dataset with crowdsourced Wi-Fi fingerprint data when necessary. The main attributes of the proposed system are accurate static indoor positioning and map updating with crowdsourced data, together with robustness to challenges such as the RSS variance problem and the presence of dynamic objects in the indoor environment, as demonstrated with data collected in a real-world office hallway.
Though most PF results (84%) are at submeter-level accuracy, a few localization estimates on certain test points shifted with meter-level error due to the drift in the preceding InLoc step. One main reason for this is the quality of the map dataset. Since so far there is no suitable benchmark dataset for the proposed system, the solution is to create map datasets with a more accurate and robust SLAM system. Many commercial SLAM systems, such as the Leica BLK 360 (https://shop.leica-geosystems.com/) and the GeoSLAM ZEB-REVO RT, are available on the market. On the other hand, the performance of the SLAM used in this work, RTAB-Map, can be improved by integrating the RGBD camera with 2D/3D Lidar in the SLAM system (Labbé et al., 2019). However, building this kind of multi-sensor SLAM system faces challenges such as accurate sensor-to-sensor calibration and good synchronization, which are important to avoid poor registration of the data generated by different sensors. Therefore, building a better indoor map dataset remains future work.

Table 4. Horizontal localization performance of proposed system with particle number = 3. (Columns: Dataset; Method; Horizontal localization error on each test point.)

Table 5. Horizontal localization performance of proposed system with particle number = 5. (Columns: Dataset; Method; Horizontal localization error on each test point.)