USING A LOW-COST STEREO CAMERA FOR AUTONOMOUS NAVIGATION OF A MOBILE ROBOT

The paper considers the possibility of using low-cost stereo cameras for autonomous robot navigation. A low-cost stereo camera with a focal length of 5 mm and a photo base of 6 cm was chosen for the research. Experimental studies have shown that the accuracy of determining the coordinates of object points from a pair of images obtained by such a stereo camera is sufficient for organizing autonomous navigation of the robot. To improve the reliability and accuracy of determining the robot's trajectory, this paper proposes using two stereo cameras: one directed forward along the robot's direction of travel, and the other directed at the nadir. The trajectory is thus determined twice, independently. Moreover, each case has its own algorithm for finding the homologue points. In the first case, a sparse point cloud is constructed for each stereo pair by selecting interest points and identifying them through comparison of descriptors; blunder detection of point identifications is then performed by analyzing the residuals of the relative orientation equations using the fundamental matrix. In the second case, when the stereo camera is pointed at the nadir, conventional correlation matching is used at the nodes of a grid specified on one of the images. Experimental studies have shown sufficient efficiency of autonomous navigation of a mobile robot based on the use of two stereo cameras.


INTRODUCTION
The photogrammetry department of MIIGAiK is developing an "Automated information generation system" designed to perform aerial photography of an object using an unmanned aerial vehicle and to obtain documents about the area (orthophoto, DTM, etc.) in automatic mode. A mobile robot is an integral part of this system (Fig. 1), designed to automatically change and charge the battery of the copter. The battery changing and charging system operates as follows. There is a battery charging point (point A) and a battery change point (point B); the latter is also the takeoff and landing point of the copter, and both points are marked on the ground (Fig. 2). The distance between these points can range from 10 meters to 1 kilometer. The mobile robot must travel this distance, change the battery, launch the copter for the next aerial survey, and return to point B to charge the battery. All of this is carried out automatically. The approximate coordinates of points A and B are known from navigation-grade GPS. However, the mobile robot must approach points A and B with an accuracy of a few centimeters and stop in front of the point at a distance of approximately 50 cm. While moving, the robot may also encounter obstacles that must be avoided.
To solve this problem, it is first necessary to develop algorithms for autonomous tracking of the robot's trajectory (autonomous navigation) based on low-cost cameras. Today, there are some stereo-camera-based solutions on the market, for example the Artisense system (Artisense Corporation, 2019), the ZED stereo camera (StereoLabs, 2020), or the Parrot system (Parrot, 2020). However, these systems are designed mainly for drones, cars, etc., where the distance from the camera to the object is measured in meters. In our case, the distances can vary from centimeters to meters. In addition, these systems are quite expensive. We therefore decided to use a low-cost stereo camera that costs about $70. Figure 3 shows the stereo camera that was used to implement autonomous robot navigation (Chibunichev et al., 2019); Table 1 lists its main parameters.

Figure 3. Stereo camera with a photo base of 6 cm and a focal length of 5 mm.

Table 1. Main parameters of the camera.

Table 2. Accuracy of determining the coordinates of object points.

Table 2 shows the results of evaluating the accuracy of determining the coordinates of object points from a pair of images taken by this camera after its calibration. The table shows that, in terms of accuracy, this camera can be used for autonomous robot navigation. Navigation itself is carried out using two stereo cameras: one pointing forward in the direction of travel, and the second pointing down. The first camera is used to track obstacles and to orient the robot relative to points A and B. Both cameras are used for navigation.

AUTONOMOUS NAVIGATION
Currently, quite a lot of algorithms for autonomous navigation of robotic systems based on single or stereo cameras have been developed. The problem of determining the orientation and position of images in a spatial coordinate system is a standard photogrammetric problem (Wang et al., 2019), which is often solved using "structure from motion" (SfM), "simultaneous localization and mapping" (SLAM), or "visual odometry" (VO). SfM simultaneously builds a three-dimensional model of an object and determines the camera positions from multiple images of the object. Usually, the final camera positions and object model are obtained as a result of bundle adjustment. This process takes quite a long time and grows with the number of images (Frahm et al., 2010). Conversely, visual odometry is mainly aimed at obtaining the three-dimensional trajectory of the camera's movement in real time by analyzing images sequentially. Phototriangulation can be used for local trajectory correction when mapping the surrounding area and locating the camera on this map, as implemented in SLAM (Scaramuzza et al., 2011); at the same time, this method aims to run in real time. SLAM can be implemented with a single camera (Davison et al., 2007). In that case the accuracy of determining the trajectory is low and depends on many factors, primarily the focal length of the camera, the field of view, and the ratio of the photographic base to the distance to the object (Paz et al., 2008; Strasdat et al., 2011; Mur-Artal et al., 2015). The accuracy of determining the trajectory with a stereo camera is naturally higher than with a single camera. To improve positioning accuracy with a stereo camera, it has been proposed to increase the angle between the optical axes of the cameras (Shtain et al., 2019).
In fact, all these methods for determining the robot's trajectory (VO, SLAM, SfM) are based on the same principles. First, the corresponding points on neighboring images are found in one of the known ways, and then direct intersection, resection and/or aerial triangulation are solved to find the spatial position of the images.
The main stage that determines the accuracy of the solution is finding the homologue points. Currently, many algorithms have been developed for this problem that are invariant to changes in image scale, orientation and illumination (Lowe, 2004; Rublee et al., 2011; Leutenegger et al., 2011). However, research shows that these methods do not always identify the homologue points reliably, and as a result the robot's trajectory is determined with errors.
In order to improve the reliability and accuracy of determining the robot's trajectory, this paper proposes to use two stereo cameras: one directed forward along the robot's direction of travel, and the other directed at the nadir. The trajectory is thus determined twice, independently, and each case has its own algorithm for finding the homologue points. Figure 4 shows a schematic diagram of solving the problem of determining the robot's trajectory using two stereo cameras (forward and nadir).

Figure 4. Schematic diagram of solving the problem of determining the robot trajectory using two stereo cameras.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B1-2020, 2020, XXIV ISPRS Congress (2020 edition).
Finding the homologue points on a pair of images is performed by different methods, depending on the imaging direction. If the stereo camera is directed forward along the robot's movement, the homologue points are found by selecting interest points on each image with the Harris operator, computing a descriptor for each point with the SIFT algorithm, and finally applying the RANSAC algorithm to retain the homologue points. For the stereo camera aimed at the nadir, a regular grid is first set on one of the images, and the corresponding points on the other image are then found at the nodes of this grid using the correlation method. Correlation is used in this case because there are usually few clear contours (the robot moves on asphalt, ground, etc.).
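The grid-node correlation matching for the nadir camera can be sketched in pure NumPy as follows. This is a minimal illustration, not the paper's implementation: the window size, the horizontal-only search range (which assumes approximately epipolar-aligned images) and the function names are assumptions.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equal-size image windows."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return (a * b).sum() / denom if denom > 0 else 0.0

def match_grid_node(left, right, x, y, win=7, search=15):
    """Find the point in `right` corresponding to grid node (x, y) in `left`
    by maximizing NCC over a horizontal search range."""
    h = win // 2
    tpl = left[y - h:y + h + 1, x - h:x + h + 1]
    best, best_dx = -1.0, 0
    for dx in range(-search, search + 1):
        xs = x + dx
        if xs - h < 0 or xs + h + 1 > right.shape[1]:
            continue  # candidate window would fall outside the image
        score = ncc(tpl, right[y - h:y + h + 1, xs - h:xs + h + 1])
        if score > best:
            best, best_dx = score, dx
    return x + best_dx, y, best
```

In practice this would be run for every node of the regular grid; low-texture surfaces such as asphalt favor this dense correlation over descriptor matching, as the text notes.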
Regardless of the method used to find the homologue points, there is always the possibility of erroneous identifications. The next stage therefore involves blunder detection of identifications. To do this, it is enough to perform a relative orientation of the pair of images and reject points based on the value of the residual transverse (y-) parallax. Traditional relative orientation takes quite a long time, since it is an iterative process. It is much faster to perform relative orientation through the fundamental matrix. The original equation is known to be written as follows:

r2^T F r1 = 0, (1)

where r1, r2 are vectors that determine the positions of the corresponding points on the first and second images in their image coordinate systems, and F is the fundamental matrix. Written out for a pair of homologue points with image coordinates (x1, y1) and (x2, y2), equation (1) gives

x2 x1 f11 + x2 y1 f12 + x2 f13 + y2 x1 f21 + y2 y1 f22 + y2 f23 + x1 f31 + y1 f32 + f33 = 0. (2)

Equations (2) are linear with respect to the unknown elements of the fundamental matrix. To find the elements of the fundamental matrix, it is sufficient to have at least 8 homologue points. However, it is advisable to estimate the fundamental matrix using the RANSAC method to avoid obtaining an erroneous matrix. Having found the fundamental matrix, one can detect identification blunders by checking each pair of points for compliance with condition (2). As is known, for erroneous identifications the right-hand side of expressions (1) and (2) becomes the volume of the figure built on the vectors r1, r2 and the photo base.
It is difficult to say in advance what volume a good identification should not exceed. Therefore, in the paper (Khrush R.M., et al., 2020) it is proposed to calculate the y-parallax in model space and use it as a criterion for blunder detection. However, this requires additional calculations, which is not always acceptable when computing the robot's trajectory in real time. It is therefore proposed to use homogeneous coordinates of the image points in equations (2) and to normalize the fundamental matrix by f33; then (2) is written as

r2^T (F / f33) r1 = q. (3)

In this case, the right-hand side of expression (3) contains a value q comparable to the residual y-parallax, which can be used to detect identification blunders. To do this, we calculate the root mean square error of the residual y-parallaxes:

sigma_q = sqrt( SUM(q_i^2) / n ), (4)

where n is the number of homologue points in the pair of images.
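The check defined by equations (3) and (4) can be sketched as follows. The 3-sigma rejection threshold `k` is an assumption for illustration; the paper does not state the exact cutoff.

```python
import numpy as np

def blunder_mask(F, pts1, pts2, k=3.0):
    """Flag homologue pairs whose residual q = r2^T (F / f33) r1 exceeds
    k times the RMS residual y-parallax sigma_q.
    pts1, pts2: (n, 2) arrays of matched image points.
    Returns (keep_mask, q, sigma_q)."""
    Fn = F / F[2, 2]                            # normalize so that f33 = 1
    n = len(pts1)
    h1 = np.hstack([pts1, np.ones((n, 1))])     # homogeneous coordinates
    h2 = np.hstack([pts2, np.ones((n, 1))])
    q = np.einsum('ij,jk,ik->i', h2, Fn, h1)    # q_i = r2_i^T Fn r1_i
    sigma_q = np.sqrt((q * q).sum() / n)        # equation (4)
    return np.abs(q) <= k * sigma_q, q, sigma_q
```

For genuine homologue points q is near zero, so a blunder inflates both its own q and sigma_q; with few blunders the k * sigma_q threshold separates them cleanly.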
The position of the cameras (the robot's trajectory) is calculated by solving space intersection on the previous stereo pair for the corresponding points, and space resection for the images of the subsequent stereo pair. To reduce error accumulation, bundle adjustment is performed for every 5 stereo pairs, and the entire block is adjusted at the end.
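The space intersection step can be sketched with standard linear (DLT) triangulation from a calibrated stereo pair. The projection matrices, pixel coordinates and the 6 cm baseline used in the usage example below are illustrative assumptions, not values from the paper's experiments.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Space intersection: recover the 3-D point observed at pixel x1 in the
    first image and x2 in the second, given 3x4 projection matrices P1, P2.
    Solves the homogeneous system A X = 0 by SVD (linear DLT)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                 # null vector of A, homogeneous 3-D point
    return X[:3] / X[3]
```

The resection step (pose of the next stereo pair from already-triangulated points) is the inverse problem and is typically solved with a PnP algorithm; both feed the periodic bundle adjustment described above.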
Blunder detection of identifications is also performed when executing resection and bundle adjustment. For this purpose, a known robust method is used, in which each correction equation derived from the collinearity equations is multiplied by a weight calculated using the following formula:

w_i = exp( -(|v_i| / sigma)^N ), (5)

where v_i is the discrepancy in the i-th correction equation, sigma is the standard error calculated from the v_i, and N is the iteration number.
In this way the robot's trajectory is obtained; moreover, it is obtained twice, once for each stereo camera. During imaging, the time at which the camera shutter is triggered is recorded. This allows the two paths to be compared, in other words, to obtain the offsets dX, dY, dZ between the forward-looking and nadir-looking cameras. These values must be constant and equal to the values determined during system calibration. If dX, dY, dZ differ from the calibrated values by more than a specified tolerance, then sigma_q for the stereo pairs from the two cameras is analyzed. The final trajectory is the one with the smaller sigma_q.
A sparse point cloud obtained from the forward-facing stereo camera describes the surrounding space at inflection points (obstacles). This information is also used to detect obstacles: if the distance to the nearest points in the point cloud is less than the allowed distance, the robot's trajectory is changed in order to avoid the obstacle.
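This clearance check reduces to a distance test over the sparse cloud. The coordinate convention (Z pointing forward from the robot) and the 0.5 m default clearance are illustrative assumptions:

```python
import numpy as np

def obstacle_ahead(points, min_clearance=0.5):
    """points: (n, 3) sparse cloud in the robot frame, Z axis pointing forward.
    Returns True if any point in front of the robot is closer than
    min_clearance (in the same units as the cloud, e.g. meters)."""
    ahead = points[points[:, 2] > 0]                 # ignore points behind
    if ahead.size == 0:
        return False
    dist = np.linalg.norm(ahead, axis=1)             # Euclidean distances
    return bool((dist < min_clearance).any())
```

When this returns True, the trajectory planner would deviate from the current path until the clearance condition is restored.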
When point A or B (Figure 2) enters the field of view of the first camera, the algorithm for tracking the distance to the point is enabled. It works as follows: point A or B is located on each stereo pair, its relative position on the two images is refined by correlation, and the space intersection is solved. Since points A and B (Figure 2) are well marked (different colors are used), it is not difficult to find them automatically in the images. When the measured distance reaches the specified value, the robot stops.

RESULT
The algorithm described above is implemented as software in Python, using standard image processing functions from the OpenCV library. The software is installed on a StereoPic microcomputer based on the Raspberry Pi CM3 Lite module, which controls a robot built on a crawler (tracked) chassis. To evaluate the accuracy of determining the robot's trajectory, several closed paths and paths between two points with a known distance between them were set, with arbitrary rotation angles along the trajectory. The minimum trajectory length was 5 m, the maximum 50 m. The accuracy was estimated from the discrepancies between the coordinates of the start and end points of the trajectory. In addition, when the robot moved in a straight line, the distances between two points measured with a tape measure and with the stereo camera were compared. The relative position error in all experiments was 0.1% to 1% of the trajectory length.
In addition, the accuracy of stopping the robot in front of points A and B (Figure 2) was evaluated. To do this, the robot was started toward points A and B from distances of 1 to 10 meters (15 runs in total). After the robot stopped at a distance of 50 cm from point A or B, this distance was measured with a tape measure. In all cases, the discrepancy did not exceed 2 cm, with an average value of 1.2 cm.

CONCLUSION
The paper considers the possibility of using low-cost stereo cameras for autonomous robot navigation. Experimental studies have shown that the accuracy of determining the coordinates of object points from a pair of images obtained by such a stereo camera is sufficient for organizing autonomous navigation of the robot. The use of two stereo cameras increases the accuracy and reliability of autonomous robot navigation. However, the results of these experimental studies should be considered preliminary. The studies should be continued for various operating conditions of the robot (indoor and outdoor; driving on asphalt, soil, sand, grass, etc.). In addition, the algorithms should be investigated for various implementations of the robot's mechanics (tracks, wheels) and its ability to accurately perform specified turns and rectilinear movements.