DEEP LEARNING FOR CODED TARGET DETECTION

ABSTRACT: Coded targets are physical optical markers that can be easily identified in an image. Their detection is a critical step in the process of camera calibration. A wide range of coded targets has been developed to date. The targets differ in their decoding algorithms. The main limitation of the existing methods is low robustness to new backgrounds and illumination conditions. Modern deep-learning recognition algorithms demonstrate exciting progress in object detection performance in low-light conditions or new environments. This paper focuses on the development of a new deep convolutional network for automatic detection and recognition of coded targets and sub-pixel estimation of their centers.


INTRODUCTION
Coded targets are physical optical markers that can be easily identified in an image. Their detection is a critical step in the process of camera calibration. Usually, each target has a unique ID that can be decoded from its shape or color. A wide range of coded targets has been developed to date. The targets differ in their decoding algorithms.
Most of the algorithms include three steps for target recognition. Firstly, target locations are detected in the image. After that, target IDs are decoded to match their locations in the image and object coordinate systems. Finally, the target area is analyzed to estimate the sub-pixel coordinates of the target's center. The main limitation of the existing methods is low robustness to new backgrounds and illumination conditions. Modern deep-learning recognition algorithms demonstrate exciting progress in object detection performance in low-light conditions or new environments. Still, deep-learning-aided coded target recognition has received little scholarly attention to date. This paper focuses on the development of a new deep convolutional network for automatic detection and recognition of coded targets and sub-pixel estimation of their centers.

RELATED WORK
Coded targets are an important tool for automating point identification and stereo matching, especially in photogrammetric applications addressing weakly textured objects. They allow the correspondence problem in calibration and orientation tasks to be solved automatically. Therefore, the design of coded targets that provide high recognition and measurement performance attracts the attention of photogrammetry and computer vision researchers.
The first studies on the development of targets that can be automatically detected in an image were carried out at the end of the 20th century (Russo, Knockeart, 1972), (Fraser, 1997), (Knyaz, Sibiryakov, 1998). Since then, many different configurations of coded targets and techniques for target detection and identification have been proposed. The efforts of researchers were aimed at designing more robust targets and accurately detecting their image coordinates.
The first hand-crafted algorithms used dots, segments, or colors to encode a target's identification (ID). A coded target system (Shortis et al., 2003) consists of a central circular target and a surrounding square. The detection algorithm is based on the Hough transform, and segment matching is used for automatic recognition and identification of the targets in digital images. The algorithm exploits a pre-detection process developed to improve performance under unfavourable conditions. The improved coded target system (Shortis, Seager, 2014) was widely used in a variety of calibration and measurement applications.
Dot-distributed coded targets for digital industrial photogrammetry (Feng et al., 2010) were composed of 5 template points and 3 code points. Each coded target contained a different character string as its code, with a maximum code capacity of 496. The decoding algorithm was based on the affine transform and cross-ratio analysis and allowed the targets to be identified automatically and rapidly. The applicability of this type of dot-distributed coded target to automatic digital industrial photogrammetry was proved experimentally.
The color coded target for vision measurements (Yang et al., 2014) uses a pair of concentric circles for precise localization of the target. To avoid repeated coding and to simplify the structure of the code, a reference position is defined. The edge-based identification and location algorithm exploits a coarse-to-fine strategy to improve the efficiency and correctness of target identification. The proposed algorithm demonstrated high accuracy and robustness in the experiments performed.
Grey-gradient analysis for coded target detection (Ran et al., 2016) is utilized to estimate the central angle of each coded section. The accompanying algorithm for accurate ellipse detection extracts ellipse centres at the sub-pixel level and eliminates false ellipses with large best-fit error.
Combined circular dot-distributed coded targets were proposed to improve the coding capacity and the accuracy of coded target identification in large-scale industrial photogrammetry (Jingui, Liyuan, 2020). These coded targets were used for point matching and image orientation in large-scale photogrammetry applications and were also applied to the automatic management of a large number of measuring points.
The influence of target parameters on performance has been studied in terms of target shape as well as target design and size. A practical method for selecting a proper size for circular targets (Liu et al., 2019) is based on a theoretical model of perspective imaging. The authors experimentally demonstrated that the measured semi-major axis of the image ellipse corresponding to the circular target agreed well with its theoretical design value. The maximum difference is estimated at the level of 0.5 pixel, which, in fact, is not acceptable in the practice of accurate photogrammetric measurements.
To overcome the low performance of coded targets in degraded images, such as motion-blurred images, Chinese character coded targets have been proposed (Shi, Zhang, 2020). These targets are square, with a relatively small circular feature overlaid in the middle of a square Chinese character. To facilitate extraction of the center point of a target, the central black circle contains a white circular ring serving as a circular feature for target localization. The Chinese character in the outer square region serves for target identification. The Faster Region-based Convolutional Neural Network (Ren et al., 2017) is trained to recognize the targets in motion-blurred images.
The concentric circular coded targets designed for use in complicated conditions of poor illumination and flat viewing angles (Yan et al., 2021) utilize a practical error-compensation model to correct eccentricity errors. To improve recognition under poor illumination, such as overexposure and underexposure, an adaptive brightness adjustment has been proposed. Also, four vertices of the background area of the target are added to improve the robustness of recognition through perspective correction.

METHOD
The coded targets used in the current study were developed for automatic target detection and identification and sub-pixel image coordinate measurement (Knyaz, 2006). They have been successfully applied in various applications, such as motion capture (Knyaz, 2015), pose estimation for unmanned aerial vehicles (Kniaz, 2016), and photogrammetric system calibration (Knyaz, Zheltov, 2017), (Knyaz, 2021), etc., providing high robustness of target recognition and high accuracy of target center measurement.

Hand-crafted algorithm for target recognition
The current algorithm for target recognition first detects all image regions meeting the given requirements, then approximates the border of each detected region by an ellipse and extracts the target code. The full procedure includes the following steps (Algorithm 1). Binarization is performed as:

B_i(x, y) = 0, if f(x, y) < t_i(x, y); B_i(x, y) = 255, if f(x, y) ≥ t_i(x, y),

where f(x, y) is the original image, B_i(x, y) is the binary image, and t_i(x, y) is the threshold level.
Applying a constant threshold t(x, y) = t0 has a significant drawback caused by non-uniform contrast in different parts of the image and, as a consequence, the possibility of losing targets during binarization. Therefore, a set of thresholds is applied, with subsequent tracking of the detected targets through the set of binary images B_i.
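The multi-threshold binarization described above can be sketched as follows. This is a minimal NumPy illustration under the simplifying assumption of constant (non-adaptive) threshold levels; the image and threshold values are hypothetical, not the authors' implementation:

```python
import numpy as np

def multi_threshold_binarize(image, thresholds):
    """Binarize a grayscale image at several threshold levels.

    Returns one binary image B_i per threshold t_i:
    B_i(x, y) = 0 if f(x, y) < t_i, else 255.
    """
    return [np.where(image < t, 0, 255).astype(np.uint8) for t in thresholds]

# Example: a synthetic 4x4 grayscale image and three threshold levels
f = np.array([[ 10,  60, 120, 200],
              [ 20,  80, 140, 220],
              [ 30, 100, 160, 240],
              [ 40, 110, 180, 250]], dtype=np.uint8)
binaries = multi_threshold_binarize(f, thresholds=[64, 128, 192])
```

A target that is too faint to survive one threshold level may still appear in another binary image, which is why the detections are tracked across the whole set.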
All eight-connected regions R8 are detected in each binary image B_i, excluding those that lie on the border of the image. Then each detected region R8 is checked to have the structure "white region surrounded by a black region, which is surrounded by a white region, which is surrounded by a black region" R_{w∈b∈w∈b} (Figure 2).

Figure 2. Coded target structure: the outer border serves for elliptic approximation of the target; the coding strip stores the ID of the target; the "anchor" defines the initial position for code reading; the centre can be used for correction of target coordinates.
All detected regions R_{w∈b∈w∈b} are verified to have correct geometric parameters, such as the ratio of the areas of the bounding boxes of the target's elements (Figure 2).

Elliptic approximation
The approximation of the target by an ellipse solves two tasks: 1. Identifying the target region as a projection of a circle; 2. Measuring the position of the target with sub-pixel accuracy.
The border of the target region should be described by the equation of an ellipse:

Ax² + Bxy + Cy² + Dx + Ey + F = 0

To solve the problem of ellipse parameter estimation by least-squares adjustment, it is more convenient to write the equation of an ellipse in the following form:

ax² + bxy + cy² + dx + ey = −1 (4)

In matrix notation, Equation 4 written for the n points (x_i, y_i) of the region border takes the form:

M·p = −1, (5)

where the i-th row of M is (x_i², x_i·y_i, y_i², x_i, y_i) and p = (a, b, c, d, e)ᵀ. The least-squares solution of Equation 5 is p = −(MᵀM)⁻¹Mᵀ·1.

The parameters of the ellipse needed for target location measurement are given by the following expressions:
• centre of the ellipse: x₀ = (be − 2cd) / (4ac − b²), y₀ = (bd − 2ae) / (4ac − b²);
• rotation angle β: tan(2β) = b / (a − c);
• large semi-axis d_l: d_l = 1 / √(a·cos²(β) + b·sin(β)·cos(β) + c·sin²(β)) (10)
• small semi-axis d_s: d_s = 1 / √(c·cos²(β) − b·sin(β)·cos(β) + a·sin²(β))

The ellipse should satisfy the following invariants of a second-order curve:

I₁ = A + C, I₂ = AC − B²/4, I₃ = det((A, B/2, D/2), (B/2, C, E/2), (D/2, E/2, F)),

with I₂ > 0 and I₁·I₃ < 0 for a real ellipse.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-2/W1-2021, 4th Int. Worksh. on "Photogrammetric & computer vision techniques for video surveillance, biometrics and biomedicine", 26-28 April 2021, Moscow, Russia

Figure 3. YOLO-Target model. It divides the image into an S × S grid and for each grid cell predicts B bounding boxes with centers, confidence scores for those boxes, and C class probabilities. These predictions are encoded as an S × S × (B·7 + C) tensor.
The invariants I₁, I₂, I₃ are used to verify whether the detected region is an ellipse.
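The least-squares ellipse fit of Equations 4-5 and the recovery of the ellipse centre can be sketched as follows (a minimal NumPy illustration, not the authors' code; the synthetic test ellipse is hypothetical):

```python
import numpy as np

def fit_ellipse(x, y):
    """Least-squares fit of a*x^2 + b*x*y + c*y^2 + d*x + e*y = -1
    to border points (x, y); returns p = (a, b, c, d, e)."""
    M = np.column_stack([x**2, x * y, y**2, x, y])
    rhs = -np.ones(len(x))
    p, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return p

def ellipse_center(p):
    """Centre of the fitted ellipse (stationary point of the conic)."""
    a, b, c, d, e = p
    det = 4 * a * c - b**2
    return (b * e - 2 * c * d) / det, (b * d - 2 * a * e) / det

# Example: sample a known ellipse (centre (3, 2), semi-axes 4 and 2, rotated 30 deg)
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
beta = np.pi / 6
xi, eta = 4 * np.cos(t), 2 * np.sin(t)
x = 3 + xi * np.cos(beta) - eta * np.sin(beta)
y = 2 + xi * np.sin(beta) + eta * np.cos(beta)
x0, y0 = ellipse_center(fit_ellipse(x, y))
```

With noise-free samples the fit is exact, so the recovered centre matches the design value to machine precision; on real border pixels the least-squares adjustment averages out pixel-level noise, which is what yields the sub-pixel centre estimate.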

Target code reading.
For each detected region that passed the "elliptic" check, the coding strip is extracted and the target ID is retrieved. To extract the coding strip, the following coordinate transform is applied:

x = x₀ + ξ·cos(β) − η·sin(β); y = y₀ + ξ·sin(β) + η·cos(β).

Then the parametric equations of an ellipse can be written as the following:

ξ = d_l·cos(t); η = d_s·sin(t), (19)

where t is a parameter that varies from 0 to 2π.
The extracted area of the coding strip is then transformed to rectangular one (Figure 4).
If x₀¹, y₀¹, β¹, d_l¹, d_s¹ are the parameters of the internal ellipse, and x₀², y₀², β², d_l², d_s² are the parameters of the external ellipse, then the rectangular transform of the coding strip K(d, t) from the original image f(x, y) using the parametric equations of the ellipse (19) is given by:

K(d, t) = f(x(ξ, η), y(ξ, η)), where ξ = d·cos(t); η = d·sin(t), (21)

with d varying between the internal and external ellipses. An example of code reading for a target is shown in Figure 4. The coding strip is extracted using the target parameters. Inside the strip (Figure 4), the grey-level profile is binarized by the Otsu criterion (Otsu, 1979).
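The strip unwrapping of Equation 21 can be sketched as follows. This is a simplified NumPy illustration assuming a circular target (d_l = d_s, so d is a plain radius) and nearest-neighbour sampling; it is not the authors' implementation:

```python
import numpy as np

def unwrap_coding_strip(image, x0, y0, beta, d_inner, d_outer, n_t=360, n_d=8):
    """Resample the annular coding strip of image f(x, y) into a
    rectangular array K(d, t) via the rotation/translation transform
    and the polar parametrisation xi = d*cos(t), eta = d*sin(t)."""
    h, w = image.shape
    strip = np.zeros((n_d, n_t), dtype=image.dtype)
    for i, d in enumerate(np.linspace(d_inner, d_outer, n_d)):
        for j, t in enumerate(np.linspace(0, 2 * np.pi, n_t, endpoint=False)):
            xi, eta = d * np.cos(t), d * np.sin(t)
            x = x0 + xi * np.cos(beta) - eta * np.sin(beta)
            y = y0 + xi * np.sin(beta) + eta * np.cos(beta)
            xr, yr = int(round(x)), int(round(y))
            if 0 <= yr < h and 0 <= xr < w:
                strip[i, j] = image[yr, xr]  # nearest-neighbour sample
    return strip

# Example: a synthetic white ring (radii 10..14) around (32, 32)
yy, xx = np.mgrid[0:64, 0:64]
r = np.sqrt((xx - 32.0) ** 2 + (yy - 32.0) ** 2)
img = np.where((r >= 10) & (r <= 14), 255, 0).astype(np.uint8)
strip = unwrap_coding_strip(img, 32, 32, 0.0, 11, 13, n_t=90, n_d=3)
```

Each column of the resulting rectangle corresponds to one angle t, so the code bits can then be read off a one-dimensional grey-level profile binarized with the Otsu threshold.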

YOLO-Target Convolutional Network
Our YOLO-Target network is inspired by the YOLOv3 model. It solves two tasks. Firstly, it detects the bounding box of the coded target. Secondly, it estimates the position of the target center. The details of our YOLO-Target architecture are presented in Table 1 and Figure 3.
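Following the output encoding of Figure 3 (an S × S × (B·7 + C) tensor), the prediction layout can be sketched as below. The concrete values of S, B, C and the per-box ordering (box x, y, w, h, objectness confidence, then the sub-pixel centre cx, cy) are our assumptions for illustration only, not the authors' exact encoding:

```python
import numpy as np

S, B, C = 13, 2, 49  # hypothetical grid size, boxes per cell, number of target IDs

def split_predictions(output):
    """Split an (S, S, B*7 + C) network output into per-box predictions
    and per-cell class probabilities. Assumed per-box layout (7 values):
    bounding box (x, y, w, h), objectness confidence, target centre (cx, cy)."""
    boxes = output[..., :B * 7].reshape(S, S, B, 7)
    class_probs = output[..., B * 7:]  # (S, S, C) target-ID probabilities
    return boxes, class_probs

# Example on a random tensor standing in for the network output
output = np.random.rand(S, S, B * 7 + C).astype(np.float32)
boxes, class_probs = split_predictions(output)
```

The extra two values per box, compared with the five of a standard YOLO box, are what let the network regress the target centre directly alongside the bounding box.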

Dataset Generation
To train our YOLO-Target framework, we collected a new dataset, CodeSet. It includes synthetic images of a calibration page with 49 coded marks (see Figure 1). To increase the variability of the training dataset, we used the Blender 3D creation suite, a free and open-source 3D computer graphics tool set used for creating animated films, visual effects, art, 3D-printed models, motion graphics, and interactive 3D applications. We exported the calibration page texture image to the 3D creator and developed special software in Python 3 that randomly positions and rotates the calibration page in a frame. The calibration page was shot from different angles. We used different rotating HDR textures as backgrounds. A part of the dataset was created with low-light conditions and with an added "salt and pepper" noise effect. Label images were geometrically aligned with the color images. Each coded mark had a unique RGB code that allowed us to annotate the generated data in YOLO format and transfer it to the neural network for training. We synthesized about 1000 images for network training and about 100 images for evaluating the results.
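The low-light and "salt and pepper" degradations mentioned above can be reproduced with a few lines of NumPy. This is an illustrative sketch with hypothetical parameter values, not the dataset-generation code used for CodeSet:

```python
import numpy as np

rng = np.random.default_rng(0)

def degrade(image, brightness=0.4, noise_fraction=0.02):
    """Simulate low-light conditions and "salt and pepper" noise on a
    grayscale image with values 0..255. Parameter values are illustrative."""
    out = image.astype(np.float32) * brightness     # darken the whole frame
    mask = rng.random(image.shape)
    out[mask < noise_fraction / 2] = 0              # "pepper" pixels
    out[mask > 1 - noise_fraction / 2] = 255        # "salt" pixels
    return out.astype(np.uint8)

# Example: degrade a uniform grey frame
clean = np.full((64, 64), 200, dtype=np.uint8)
noisy = degrade(clean)
```

Training on such degraded renders, alongside the clean ones, is what pushes the detector toward robustness in poor illumination.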

Figure 5. Examples from the dataset (input and output).

EXPERIMENTS
The algorithm was evaluated using the generated dataset. The following sections present details on the training of YOLO-Target and the qualitative and quantitative evaluation of our algorithm.

Network Training
Training was carried out on an Nvidia GeForce GTX 1080 graphics card with 8 GB of video memory and took 4 hours. The result of training the neural network is shown in Figure 1. As can be seen from the figure, the coded targets are well recognized and identified.

Qualitative Evaluation
We evaluate our model and the baseline qualitatively by running inference on challenging images (Figure 6). The qualitative evaluation shows that our YOLO-Target model successfully detects targets in challenging configurations and low-light conditions.

Quantitative Evaluation
The results of the evaluation are encouraging and demonstrate that our YOLO-Target model outperforms the baselines by a large margin. Specifically, we evaluate our model and the baselines in terms of target localization intersection over union (IoU), mAP of target ID recognition, and endpoint error of the sub-pixel target center estimation. Our model increases the target recognition mAP by 40% compared to the leading baselines. The endpoint error is comparable to or better than that of the baselines.
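The two geometric metrics named above can be computed as in this minimal sketch (illustrative only, not the evaluation code used in the paper):

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def endpoint_error(pred_centers, gt_centers):
    """Mean Euclidean distance (pixels) between predicted and
    ground-truth target centers."""
    diff = np.asarray(pred_centers, float) - np.asarray(gt_centers, float)
    return float(np.mean(np.linalg.norm(diff, axis=1)))

# Example: two overlapping unit-offset boxes and a 1-pixel centre error
val = iou((0, 0, 2, 2), (1, 1, 3, 3))
err = endpoint_error([[10.2, 5.0]], [[10.2, 4.0]])
```

A sub-pixel-accurate detector should drive the endpoint error well below 1 pixel on average.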

CONCLUSION
We developed an end-to-end pipeline for localization and recognition of circular coded targets using a deep convolutional network. Moreover, our model is capable of estimating the target's center with sub-pixel accuracy. Our model provides real-time performance for 720p input video on modern GPUs. We made our code and dataset publicly available.