PERFORMANCE ANALYSIS OF SEMANTIC REFRESH INDOOR NAVIGATION FOR SMARTPHONE’S SENSORS USING INS/VINS INTEGRATION SCHEME

Positioning and Orientation Systems (POS), which integrate an Inertial Navigation System (INS) with a Global Navigation Satellite System (GNSS), are widely used for outdoor navigation of land vehicles. However, positioning accuracy degrades in GNSS-hostile environments (Chiang et al., 2013), which makes indoor navigation particularly challenging. Smartphones contain many embedded sensors, including GNSS, IMU, and camera, and therefore have the potential to be ideal personal navigation devices. In this research, we propose an integrated INS/VINS/object detection refresh (ODR) scheme for challenging indoor environments. The goal is to achieve indoor navigation for vehicular applications using only a smartphone. The algorithm starts from the conventional inertial navigation system of the smartphone and integrates two designed processes to further improve performance. The first is assistance from the visual-inertial navigation system (VINS), which is fused through an extended Kalman filter (EKF) and effectively reduces the long-term drift of the INS. The second applies a neural network, YOLO-v3 (Redmon et al., 2018), to detect objects and provide object descriptor information that refreshes the position. The proposed method thus uses visual estimation and recognition to help the smartphone platform obtain a more accurate solution. Finally, a navigation-grade IMU is used as the reference system for accuracy verification, and the accuracy of the three integration solutions is compared. Compared with the original smartphone INS integration method, the proposed integration scheme improves the horizontal position accuracy by 78.5%.


INTRODUCTION
Navigation technology for land vehicles has advanced steadily in recent years. It integrates different platforms and a variety of sensors, including global navigation satellite systems, inertial navigation systems, micro-electro-mechanical systems (MEMS), and software components. In GNSS-challenged environments such as dense urban areas or indoors, however, GNSS availability degrades severely (Chiang et al., 2013). Consequently, simultaneous localization and mapping (SLAM) using cameras or lidars in INS integration schemes is becoming more popular (Chu et al., 2012) (Li et al., 2019). Additional sensing such as visual odometry (VO) and lidar odometry (LO) adds redundancy to classical state estimation methods and improves the robustness of current algorithms (Liang et al., 2020). Accurate indoor navigation is a coveted objective that can be achieved through multi-sensor integration. A variety of positioning methods have been proposed for indoor environments where GNSS signals are occluded, but most are expensive and difficult to implement, for example Wi-Fi and Bluetooth based positioning with assistance from other sensors (Huang et al., 2017) (Zhuang et al., 2016). These solutions are mostly designed for pedestrians and require additional equipment such as Bluetooth beacons or Wi-Fi routers. Smartphones are the most promising personal navigation devices, whether in pedestrian or in-vehicle mode, but because of their low-cost design the measurement quality of their sensors is unsatisfactory. Several navigation-related studies using smartphones have been proposed. For example, a smartphone fusion positioning method optimized by indoor/outdoor detection integrates lightweight sensors, the magnetic sensor, and satellite signals to recognize the indoor/outdoor status (Zeng et al., 2017). Another study evaluated the sensors in modern smartphones (Forster et al., 2012), but its main purpose was vehicle traffic monitoring rather than vehicle navigation.
Visual-inertial odometry (VIO), in turn, has been proposed to limit the accumulated drift of the IMU through visual estimation (Forster et al., 2012) (Mueggler et al., 2018) (Huai et al., 2018). Among the VIO algorithms, VINS integrates camera and IMU data to achieve visual odometry (Qin et al., 2018). So far, no single technology can provide reliable indoor positioning, and several major problems remain. The first is that indoor navigation solutions rely on additional equipment. The second is that most smartphone-based indoor navigation solutions are developed specifically for pedestrians, so applications for vehicle navigation, including indoor vehicle navigation, still need to be developed. Therefore, two integration schemes are developed to overcome these problems: INS/VINS, with visual-inertial odometry assistance, and INS/VINS/ODR, with a semantic recognition refresh algorithm. This study focuses on indoor vehicle navigation applications using smartphones, with no need to mount any additional equipment on the vehicle.

The Proposed VINS Aided INS Integration Scheme
The loosely coupled INS/VINS integration scheme used in this research is shown in Figure 1. An EKF fuses the multi-sensor measurements of the smartphone. The raw IMU measurements consist of specific forces and angular velocities along three axes, giving six degrees of freedom. The IMU error model is based on the INS mechanization. The position (r), velocity (v), and attitude (ψ) are derived from the INS mechanization using the compensated observations of the three-axis gyroscopes (g) and accelerometers (a). People today rely on smartphones to provide location services with GNSS support in outdoor scenarios. Because of the limitations of smartphone sensors, however, integrated navigation solutions that rely only on the low-cost inertial sensors in a smartphone cannot maintain sufficient accuracy in GNSS-challenged areas or indoor scenarios. In this research, vehicle motion constraints such as the non-holonomic constraint (NHC), zero velocity update (ZUPT), and zero integrated heading rate (ZIHR) are therefore also adopted; their details are explained in the following part. In indoor vehicle navigation, GNSS signals are completely blocked. In this case, the conventional pure INS integration is expected to accumulate long-term errors even after those motion constraints are applied. Therefore, VINS is applied to improve accuracy over the driving distance. Features in most indoor parking lot scenarios can be easily extracted and matched by the VINS algorithm, which not only provides a robust position but also limits the drift caused by long-running INS. This is a significant advantage for acquiring continuous relative pose, including position and attitude. Finally, the VINS solutions are transformed into position and velocity and used as measurement inputs to the designed integration scheme.
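The loosely coupled update described above can be illustrated with a minimal sketch. It assumes a simplified six-element error state (position and velocity errors only), whereas the filter in this research carries a larger state vector. The function name, the identity design matrix, and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np


def ekf_measurement_update(dx, P, z, H, R):
    """Generic EKF measurement update (Joseph form for the covariance)."""
    innovation = z - H @ dx
    S = H @ P @ H.T + R                      # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
    dx_new = dx + K @ innovation
    I_KH = np.eye(P.shape[0]) - K @ H
    P_new = I_KH @ P @ I_KH.T + K @ R @ K.T
    return dx_new, P_new


# Hypothetical INS and VINS solutions in a local navigation frame (m, m/s).
r_ins = np.array([10.0, 5.0, 0.0])
v_ins = np.array([2.0, 0.0, 0.0])
r_vins = np.array([9.0, 5.5, 0.0])
v_vins = np.array([1.8, 0.1, 0.0])

# Loosely coupled measurement: INS solution minus VINS solution.
z = np.concatenate([r_ins - r_vins, v_ins - v_vins])

dx0 = np.zeros(6)          # assumed error state [dr, dv]
P0 = 4.0 * np.eye(6)       # assumed prior covariance
H = np.eye(6)              # errors are observed directly in this sketch
R = np.eye(6)              # assumed VINS measurement noise

dx, P = ekf_measurement_update(dx0, P0, z, H, R)

# Feed the estimated errors back into the INS solution.
r_corrected = r_ins - dx[:3]
v_corrected = v_ins - dx[3:]
```

With these assumed covariances the gain is 0.8 on every state, so the corrected solution moves 80% of the way from the INS solution toward the VINS solution.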

Joint Calibration
In this part, we adopt an extension to the joint calibration tool Kalibr that allows the extrinsics and intrinsics of multiple IMUs to be determined in a single estimator. This extension has also been shown to infer the location of individual accelerometer axes to millimeter precision (Rehder et al., 2016). The method calibrates a sensor suite that includes one or multiple IMUs and one or multiple exteroceptive cameras. The goal is to improve the state estimation results of all fused sensors in the experimental smartphone. An EKF-based framework has also been proposed to estimate the exterior orientation between the IMU and the camera from a sequence of calibration images recorded by moving the device in front of a calibration target (Kelly et al., 2009). Consequently, we obtain the configuration of the smartphone with accurate interior and exterior orientation parameters, which provides a good initialization for the subsequent visual-inertial navigation. Note that the smartphone joint calibration is performed for a combination of a low-cost IMU and a rolling shutter camera, as shown in Figure 2.
On the other hand, with the popularity of visual and inertial sensors, visual-inertial odometry (VIO) has begun to be applied in fields such as autonomous driving. Many algorithms have been proposed, such as MSCKF (Heo et al., 2018) (Sun et al., 2018), VIO-ORB (Mur-Artal et al., 2017), and OKVIS (Leutenegger et al., 2015). In this study, after a review of the literature and our own tests, we selected VINS as the main algorithm. VINS is a state-of-the-art tightly coupled formulation that adds image features to the state vector, which raises the dimension of the state vector of the entire system considerably and requires a large amount of computation.
To limit the number of optimization variables, the algorithm estimates over a sliding window and uses a marginalization strategy. First, the IMU estimate and the camera estimate are solved separately, and the two are then aligned during initialization to recover the true scale of the camera trajectory. In addition, the IMU can accurately predict the pose of the next image frame and the positions of the feature points in it, which improves the matching speed of the feature tracking algorithm and its robustness to fast rotations. The solution must be transformed from the local coordinate system defined by the camera to the navigation coordinate system. Finally, the velocities in the three directions are converted and used for navigation in the navigation coordinate system. Because practical car navigation applications rarely involve the loop closure scenarios common in vision research, loop closure optimization is not enabled in this study. The VINS sliding window is illustrated in Figure 3.

Motion Constraints
When the INS operates alone and sensor errors accumulate, additional aids based on the physical behavior of land-vehicle motion are essential to enhance stability, using values that are known in specific modes. For example, the velocity is zero while the vehicle is stopped, and cars usually have near-zero velocities in the lateral and vertical directions while driving straight. These known values are derived from commonly used laws of motion and provide important updates to control error growth; they are called motion constraints (Shin et al., 2005). These constraints reduce the drift of conventional navigation systems, especially with the low-cost MEMS-grade IMU of a smartphone. When a motion constraint update is applied, the components of the state vector in the Kalman filter are readjusted, ensuring a better estimate of the initial state for the subsequent pose estimation. In this section, the models related to ZUPT and the non-holonomic constraint (NHC) are explained. Zero Velocity Update (ZUPT) is a technique that uses the static periods of a land vehicle to constrain error accumulation (Shin et al., 2005). ZUPT bounds the error accumulation by setting the velocity to zero in every direction when the vehicle is stationary, as shown in Figure 4. In urban scenarios, vehicles must frequently stop and start to obey traffic rules, so this constraint can be applied often. In equation form, the ZUPT observation is a zero velocity vector in all three directions while the vehicle is stationary.
The Non-Holonomic Constraint (NHC) expresses that, unless the vehicle leaves the ground or slides on it, the vehicle's velocity in the plane perpendicular to the forward direction is almost zero. NHC is used in most vehicle navigation systems and effectively reduces velocity and position errors in the lateral and vertical directions. The NHC-based measurement update does not observe the forward direction, however, so without other aiding sources errors accumulate along the forward direction and affect the position solution over time.
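A common way to express both constraints as EKF measurements is sketched below. The notation is an assumption and not taken from this paper: $\hat{\mathbf{v}}^{n}$ is the INS-derived velocity in the navigation frame, $\mathbf{C}_{n}^{b}$ is the rotation matrix from the navigation frame to the vehicle body frame, and the body frame is assumed to have its x axis pointing forward, y lateral, and z vertical.

```latex
% ZUPT: all three velocity components are zero while the vehicle is stationary
\mathbf{z}_{\mathrm{ZUPT}} = \hat{\mathbf{v}}^{n} - \mathbf{0}_{3\times 1}

% NHC: lateral and vertical body-frame velocities are zero while driving
\hat{\mathbf{v}}^{b} = \mathbf{C}_{n}^{b}\,\hat{\mathbf{v}}^{n},
\qquad
\mathbf{z}_{\mathrm{NHC}} =
\begin{bmatrix} \hat{v}^{b}_{y} \\ \hat{v}^{b}_{z} \end{bmatrix}
-
\begin{bmatrix} 0 \\ 0 \end{bmatrix}
```

Because the forward component $\hat{v}^{b}_{x}$ does not appear in $\mathbf{z}_{\mathrm{NHC}}$, this form also shows why the NHC update leaves the forward direction unobserved.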

Semantic Refreshed INS/VINS/ODR Integration Scheme
The INS/VINS integration scheme is still subject to significant error accumulation indoors when the smartphone's inertial sensors are used, even though the IMU and vision complement each other's shortcomings. Therefore, this research proposes using semantic location information extracted from images as a position aid to control the error accumulation, resulting in the INS/VINS/ODR integration scheme shown in Figure 5. The semantic location features mainly provide refreshes of the VINS position and velocity in indoor environments. The georeferenced semantic feature is recognized by the learning algorithm, and the descriptor from the pre-built model (green shadow) is used for the refresh at the moment when the area of the recognized bounding box reaches its maximum. In this way, indoor navigation is accomplished using only the smartphone. Details of the proposed scheme are described below.

Object Detection Refresh
Geo-referenced features acquired by the proposed Object Detection Refresh (ODR) algorithm are applied in the position refresh process for VINS. This algorithm was developed specifically for indoor vehicular parking applications; it provides location information independently and supplies updates in indoor environments, as shown in Figure 6. When the vehicle moves close to certain objects in an indoor parking lot, such as pillars, the proposed ODR algorithm detects them and records the relevant descriptions, including the timestamp, serial number, image coordinates, and bounding box size based on YOLO-v3 (Redmon et al., 2018), together with prior information such as the locating target (green point) built with an indoor mapping procedure. The background of this geo-referenced feature includes inertial navigation and photogrammetry components. The location of each image is known, and the recorded images are used for classification and labeling. After appraising a variety of image detection algorithms, YOLO-v3 was found to be the most suitable. The trained detection model and the positions obtained by the mobile mapping system are combined into a pre-built dataset, which yields each positioning target and its description. With this pre-built content, and under the condition that the vehicle drives in the lane, semantic detection can refresh the position using the locating targets. Consequently, as long as the vehicle drives along the lane in that parking lot, location information is obtained whenever an object is detected. This approach therefore works without any pre-installed positioning device.
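The position refresh from the pre-built dataset can be pictured with a small sketch. The dataset contents, the function name, and in particular the nearest-target association rule with a distance gate are assumptions made for illustration; the text above does not specify how a detection is matched to a locating target.

```python
import math

# Hypothetical pre-built dataset: pillar serial number -> surveyed
# locating target (east, north) in meters, from the mapping procedure.
PREBUILT_TARGETS = {
    1: (12.0, 3.5),
    2: (24.0, 3.5),
    3: (36.0, 3.5),
}


def refresh_position(estimated_en, targets, max_distance):
    """Return (serial, target position) of the nearest locating target.

    estimated_en is the current (east, north) estimate at the moment a
    pillar is detected. None is returned if no target lies within
    max_distance, so that a wrong association does not corrupt the filter.
    """
    best_serial, best_dist = None, float("inf")
    for serial, (east, north) in targets.items():
        dist = math.hypot(east - estimated_en[0], north - estimated_en[1])
        if dist < best_dist:
            best_serial, best_dist = serial, dist
    if best_serial is None or best_dist > max_distance:
        return None
    return best_serial, targets[best_serial]


# A drifted estimate near the second pillar is refreshed to its target.
matched = refresh_position((22.5, 4.4), PREBUILT_TARGETS, max_distance=5.0)
# An estimate far from every target is left unchanged.
unmatched = refresh_position((100.0, 100.0), PREBUILT_TARGETS, max_distance=5.0)
```

The distance gate is a design choice of this sketch: it trades a few missed refreshes for protection against matching a detection to the wrong pillar.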

YOLO-v3
The process of establishing a model by correcting the weights of each layer through learning samples is called training. At present, most object detection algorithms are based on artificial neural networks (ANN). In this research, pillars, the most common objects in indoor scenarios and easy to identify, are selected as the primary geo-referenced objects. Current object detection algorithms include R-CNN, Fast R-CNN, Faster R-CNN (Ren et al., 2015), and YOLO (Chen et al., 2021). YOLO has been continuously updated to achieve better performance and improve detection capabilities at different levels. Therefore, the supervised learning network YOLO-v3 (Redmon et al., 2018) is adopted for object detection in this study. The smartphone is installed above the vehicle dashboard, and the recorded images are labeled and annotated with object details. The results of YOLO-v3 object detection are shown in Figure 7, where the pillars (yellow bounding boxes) are well detected.

Figure 7. The results of YOLO-v3 object detection

Maximum Area Algorithm
The proposed maximum area algorithm, shown in Figure 8, determines the moment when the vehicle is about to drive past a pillar. The images recorded by the smartphone camera are pre-processed first, and the model trained with YOLO-v3 is then used for prediction. During this process, the size and description of the bounding box in each frame are recorded continuously. The maximum area algorithm is designed around the change in bounding box size. Its decision rule combines a threshold on the bounding box area with a threshold on the duration for which the object appears in the image, and both thresholds must be exceeded at the same time. After the designed time threshold has passed, the algorithm checks whether the bounding box area has fallen from its maximum value to near zero. Once the maximum area algorithm is established, the semantic refreshed INS/VINS/ODR integration scheme can be carried out: the algorithm updates the position and velocity estimated by real-time VINS to correct the accumulated error. These updated measurements are also used to calculate the primary navigation states, which are the 10th to 15th values in the S state from equation 1. The maximum area value (red circle) over the frames is recorded and provided in combination with the semantic application, as shown in Figure 9.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2022 XXIV ISPRS Congress (2022 edition), 6-11 June 2022, Nice, France

The antenna used in one of the systems was an Antcom 72GNSSA-XT-1 choke ring for PwrPak7D-E2, and the other system used a NovAtel GPS-703-GGG. In particular, the choke ring is resistant to multipath effects. In this research, the smartphone can be regarded as a combined multi-sensor system that contains a low-cost MEMS-grade IMU and a rolling shutter camera. The smartphone is mounted on a holder on the dashboard, as shown in Figure 10. The configuration of the experimental reference system is shown in Figure 11.
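Returning to the maximum area algorithm, its decision rule can be sketched as follows. This is one possible reading of the description, not the authors' implementation: the record type, the function name, the `near_zero` tolerance, and the threshold values are all hypothetical, and the sketch works on a recorded sequence of bounding box areas for a single object class.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BoxRecord:
    timestamp: float  # seconds
    area: float       # bounding box area in px^2, 0.0 when nothing is detected


def find_refresh_time(records: Sequence[BoxRecord],
                      area_threshold: float,
                      duration_threshold: float,
                      near_zero: float = 1.0) -> Optional[float]:
    """Return the timestamp of the maximum-area frame of the first object
    that satisfies both the area and the duration threshold, else None."""
    first_seen = None
    last_seen = None
    peak = None
    for rec in records:
        if rec.area > near_zero:
            # The object is visible: track its duration and its largest box.
            if first_seen is None:
                first_seen = rec.timestamp
            last_seen = rec.timestamp
            if peak is None or rec.area > peak.area:
                peak = rec
        elif peak is not None:
            # The area fell back to near zero: the object has been passed.
            long_enough = (last_seen - first_seen) >= duration_threshold
            large_enough = peak.area >= area_threshold
            if long_enough and large_enough:
                return peak.timestamp
            first_seen = last_seen = peak = None  # reject and start over
    return None


# A pillar that grows in the image and then leaves the field of view.
times = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
areas = [0.0, 500.0, 2000.0, 6000.0, 9000.0, 3000.0, 0.0]
track = [BoxRecord(t, a) for t, a in zip(times, areas)]
refresh_time = find_refresh_time(track, area_threshold=5000.0,
                                 duration_threshold=0.3)

# A single-frame false detection is rejected by the duration threshold.
flicker = [BoxRecord(0.0, 0.0), BoxRecord(0.1, 6000.0), BoxRecord(0.2, 0.0)]
flicker_time = find_refresh_time(flicker, area_threshold=5000.0,
                                 duration_threshold=0.3)
```

In the first track the box peaks at 9000 px^2 at 0.4 s, so that timestamp is returned as the refresh moment; the second track returns None.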

Performance Analysis of INS Integration Scheme
This section presents the result of the smartphone INS integration solution compared with the reference system. In Figure 12, the blue line is the smartphone pure INS integration result and the red line is the reference. The indoor parking lot measures about 80 × 75 meters, and the indoor travel time of the experiment is about 300 seconds. The result shows that using only the IMU of a smartphone is not ideal for navigation estimation, and relying on it may cause the user to become disoriented. The position errors in the E and N directions are 6.96 meters and 2.87 meters, and the error in the height direction is about 7.53 meters, as shown in Table 1.

Performance Analysis of INS/VINS Integration Scheme
The statistics of the filtered smartphone INS/VINS integrated solution are shown in Table 1, and the horizontal position of the proposed INS/VINS scheme is shown in Figure 13. The orange line is the reference system trajectory, and the pink line is the INS/VINS integration scheme. The results illustrate that with VINS assistance, the errors in the horizontal and vertical directions are greatly reduced and smoothed. In the RMSE position error analysis, the horizontal error is reduced by about half compared with the pure INS solution, and the height error is reduced to 0.178 meters.

Performance Analysis of Semantic Refreshed INS/VINS/ODR Integration Scheme
In Figure 14, the orange line is the reference system trajectory and the green line is the proposed semantic refreshed INS/VINS/ODR integration scheme. When the vehicle passes an object in the pre-built model and detects it, the described point (red square) is used to refresh the position in the proposed algorithm. Overall, the position accuracy analysis shows that the trajectory results are greatly improved: the E and N direction errors and the three-dimensional errors are all drastically reduced. The position error of the final scheme compared with the reference system is plotted in Figure 15. Whenever the vehicle records an image and detects a geo-referenced feature that has been built, the semantics of the object are refreshed. In other words, wherever the location and velocity information are refreshed, the position is moved to the established locating target (red circle), and the semantic refreshed INS/VINS/ODR integrated navigation solution continues from there. To sum up, the RMSE of the horizontal position is less than 3 meters, which is very accurate when a smartphone and a navigation-grade IMU are used as the test and reference systems, respectively. The scheme can therefore bring substantial benefits to car navigation with smartphones in indoor scenarios. In conclusion, the final proposed fusion model has overcome more of the issues mentioned above. The improvement in each direction is organized in

CONCLUSIONS
This study proposed two schemes for indoor navigation using smartphone sensors. One is the VINS aided INS integration scheme, and the other is the semantic refreshed INS/VINS/ODR integration scheme. Both were tested with data collected by the smartphone. A system comprising a navigation-grade IMU, the iMAR iNAV-RQH-10018, and differential GNSS receivers from PwrPak7D-E2 was used to generate reference solutions, and the accuracy of the integration solutions was compared against it. With the INS/VINS integrated solution, the horizontal position error is about 3.893 meters, an improvement of 48%. With the INS/VINS/ODR integration scheme, the horizontal position error is about 1.620 meters, an improvement of 78%. In future work, different georeferenced objects will be used as recognition models in the integrated system to add new datasets and tests. The semantic refreshed INS/VINS/ODR integration scheme will also be optimized, and smartphone GNSS is expected to be added to form an integrated scheme for seamless indoor/outdoor positioning. Ultimately, this framework should be integrated with the smartphone environment and run directly on this low-cost platform.
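The improvement percentages can be reproduced with a short calculation. The sketch assumes that the horizontal error of the pure INS solution is the root sum square of the reported E and N errors (6.96 meters and 2.87 meters) and that improvement is measured relative to that baseline; the function names are illustrative.

```python
import math


def horizontal_error(east_m: float, north_m: float) -> float:
    """Root sum square of the east and north position errors."""
    return math.hypot(east_m, north_m)


def improvement_percent(baseline_m: float, improved_m: float) -> float:
    """Relative error reduction with respect to the baseline, in percent."""
    return 100.0 * (baseline_m - improved_m) / baseline_m


# Pure smartphone INS: E and N errors as reported in the text.
ins_horizontal = horizontal_error(6.96, 2.87)

# Horizontal errors of the two proposed schemes as reported in the text.
ins_vins_gain = improvement_percent(ins_horizontal, 3.893)
ins_vins_odr_gain = improvement_percent(ins_horizontal, 1.620)

print(f"INS horizontal error: {ins_horizontal:.2f} m")
print(f"INS/VINS improvement: {ins_vins_gain:.1f} %")
print(f"INS/VINS/ODR improvement: {ins_vins_odr_gain:.1f} %")
```

Under this assumption the baseline is about 7.53 meters, and the two improvements come out at about 48% and 78.5%, matching the reported values.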