AUTOMATIC ANTHROPOLOGICAL LANDMARKS RECOGNITION AND MEASUREMENTS

Many anthropological studies require identification and measurement of craniometric and cephalometric landmarks, which provide valuable information about the shape of a head. This information is necessary for morphometric analysis, face approximation, craniofacial identification, etc. Traditional techniques use special anthropological tools to perform the required measurements, with the landmarks usually being identified by an expert anthropologist. Modern optical 3D measurement techniques such as photogrammetry, computed tomography and laser 3D scanning provide new possibilities for acquiring accurate, high-resolution 2D and 3D data, thus creating new conditions for anthropological data analysis. Traditional manual point measurements can be replaced by analysis of accurate textured 3D models, which makes it possible to retrieve more information about the studied object and to share data easily for independent analysis. The paper presents a deep learning technique for anthropological landmark identification and accurate 3D measurements. Photogrammetric methods and their practical implementation in an automatic system for accurate digital 3D reconstruction of anthropological objects are described.


INTRODUCTION
Objects' visual and geometrical characteristics serve as essential data sources in anthropological and paleoanthropological studies. Sophisticated mechanical tools have been specially designed for the analysis of object morphology. Instruments such as the sliding caliper (Martin type), coordinate caliper (Aichel type), spreading caliper, craniophor (Mollison type) and mandibulometer are usually used to obtain specific craniometric parameters.
Progress in 3D acquisition techniques creates the basis for introducing accurate 3D models of anthropological objects into the practice of anthropological research. Such models make it possible not only to perform accurate measurements at separate points, but also to carry out complex morphological analysis of an object. With the growing possibilities for collecting and processing huge amounts of data, new techniques for morphological analysis are appearing that provide automatic extraction of the necessary characteristics.
An important part of the information needed for paleoanthropological study comes from geometric measurements of skulls and skeletal bones. Measurement and analysis of linear, angular and shape parameters make it possible to decide on the paleoanthropological characteristics of an object. A set of standard landmark points (Tables 1 and 2) is used for the analysis; these points reflect anatomical features of an object.
The paper presents a deep learning technique for recognition of the craniometric landmarks that are used in morphometric anthropological research. The study addresses the problem of craniometric landmark recognition in 2D skull images, which is necessary for craniofacial identification. Photogrammetric methods for acquiring the required 2D and 3D data and the practical implementation of the developed techniques in an automatic system for craniofacial superimposition are described.

RELATED WORK
Craniometric landmarks (or craniometric points) have been used in anthropology since the XIX century. They are used in various anthropological applications such as craniometry, craniofacial identification (superimposition), face approximation and others. Anthropological landmarks serve as reference points for comparative morphological analysis.
Initially, an expert anthropologist found the landmarks on a skull and applied anthropological measuring tools to obtain the necessary morphometric characteristics. With digital technologies coming into the practice of anthropological study, landmark detection came to be performed manually with the aid of dedicated software. Alongside this, semi-automated and automated methods of landmark detection have been developed and put into practice.
A wide variety of methods for facial landmark detection has been developed recently, the most effective being based on deep learning techniques (Keustermans et al., 2011, He et al., 2017, Chim et al., 2019). The comprehensive survey of facial landmark extraction (Bodini, 2019) gives an analysis of many state-of-the-art approaches, along with a performance comparison and a review of datasets.
Significantly fewer studies address craniometric landmark detection, and most of them use computed tomography images. The study (Cheng et al., 2011) proposes a learning-based approach for automatic extraction of the Dent-landmark, which is one of the key landmarks for constructing the mid-sagittal reference plane. The proposed detector is learned using a random forest with sampled context features for landmark detection in 3D cone-beam computed tomography (CBCT) dental data. A spatial prior is used to build a constrained search space rather than using the full three-dimensional space. The method was evaluated on a dataset containing 73 CBCT dental volumes and yields promising results.
The craniometric landmark detection algorithm of (Zhang et al., 2013) uses a reference skull model with known landmarks to detect the craniometric landmarks of an arbitrary skull. The reference skull 3D model is registered to the target model using the fractional iterative closest point (FICP) algorithm (Phillips et al., 2007). Then the algorithm iteratively refines the landmarks of the Frankfurt plane and the mid-sagittal plane. Such iterative registration maps the landmarks on the reference to the target. An automatic method for defining the craniometric landmarks and measuring soft tissue thickness at these landmarks (Gorbenko et al., 2014) uses MRI data for craniometric landmark extraction. The proposed technique is also based on non-rigid registration of the target image to the template.
An approach with cascaded three-stage convolutional neural networks (Zeng et al., 2021) predicts cephalometric landmarks automatically in 2D radiograms. Initially, the lateral face area is located using high-level features of the craniofacial structures. This step serves to overcome the problem of appearance variations. Next, the aligned face area is processed to estimate the locations of all landmarks simultaneously. Finally, the network refines the landmarks. Using high-resolution image data for the region around the initial position makes it possible to achieve more accurate localization.
A semi-supervised deep learning method for 3D landmarking (Yun et al., 2020) takes advantage of an anonymized landmark dataset from which the paired computed tomography data have been removed. The proposed method has three stages. First, it detects a set of reference landmarks that have highly distinctive features. Then, based on the detected reference landmarks, it roughly predicts the other landmarks using the low-dimensional representation learned by a variational autoencoder. The variational autoencoder is trained on the anonymized landmark dataset.
At the last stage, coarse-to-fine detection is performed for each bounding box provided by the rough estimation, with different estimation strategies for the mandible and the cranium. For mandibular landmarks, a patch-based 3D CNN is applied to the segmented image of the mandible (separated from the maxilla) in order to capture the 3D morphological features of the mandible associated with the landmarks. The method achieved an average 3D point-to-point error of 2.91 mm for 90 landmarks with only 15 paired training data.

MATERIALS AND METHOD
The study addresses the problem of automatic recognition of a set of craniometric and cephalometric landmarks for craniofacial superimposition. Craniofacial superimposition is a widely used technique for identifying a person based on a skull and a photograph of the person. The method is common in forensic practice, and nowadays "craniofacial superimposition has attained a reputation of being a reliable anthropological method, especially for exclusion" (Blau, 2016).
To decide whether a skull corresponds to a given photograph, a forensic expert tries to find the best-fit mutual position of the photograph and the image of the skull. The anthropological landmarks serve for matching, and the correspondence between the landmarks on the skull and on the face is usually used as the criterion of best fit.
Accurate and robust identification of anthropological landmarks is therefore the basis for a correct decision on person identification. Superimposition techniques nowadays usually use a digital 3D skull model and a digital photograph, together with special software for finding the landmarks and matching them in the photograph and the skull 3D model. Digital technologies for data acquisition and processing provide a basis for collecting valuable data to which modern machine learning techniques can be applied.
To solve the problem of automatic craniometric landmark recognition, we used data obtained by an original photogrammetric system for craniofacial identification.

Photogrammetric system for craniofacial identification
The photogrammetric system for craniofacial identification is designed for automatic skull 3D model generation and computer-aided superimposition (Knyaz et al., 2019). It consists of (Figure 3):
• four high-resolution cameras
• a precise PC-controlled rotation stage
• two laser line sources
• shadow-free light sources for texture generation
• a personal computer as a processing unit
Figure 3. Automated digital photogrammetric system
The photogrammetric system provides all functions needed for craniofacial identification, from system calibration and 3D model generation to forensic report generation.
The original software makes it possible to obtain an accurate skull 3D model in automatic mode and to perform accurate texture mapping. The textured digital model provides the expert with more information, as some features can only be found in the color image of the object. Since all four cameras of the photogrammetric system are calibrated using a single calibration field, the texture is accurately superimposed on the geometric coordinates of the digital model.

Current procedure of landmarking
Currently, a forensic expert starts the craniofacial identification procedure by acquiring a skull 3D model in automatic mode.
The photogrammetric system generates the 3D model and performs photorealistic texture mapping. A set of skull images obtained during the 3D scanning process is used for texturing. As a result of the scanning procedure, an accurate skull 3D model and a set of skull images are available to the expert. To perform craniofacial identification, the expert manually finds and marks a set of anthropological landmarks. There are two ways to mark the landmarks: in three oriented images of the skull (Figure 5) or in the textured skull 3D model. In both cases, the landmark marking procedure results in the following data:
• three images of the skull;
• a set of image coordinates of the anthropological landmarks for each image of the skull;
• a set of 3D coordinates of the anthropological landmarks for each skull.
The expert can check the quality of manual marking by re-projecting 3D landmarks back to oriented images (Figure 4). If the quality is not satisfactory, the expert can correct the location of identified landmarks in the images or in the 3D model.
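The re-projection check described above can be illustrated with a simple pinhole camera model. The following is only a minimal sketch, not the system's actual software: the function names `project_point` and `reprojection_error`, the camera parameters and the sample coordinates are hypothetical, and lens distortion is ignored.

```python
import math

def project_point(X, R, t, f, cx, cy):
    """Project a 3D point X (object frame) to pixel coordinates.

    R is a 3x3 rotation (list of rows), t a translation vector,
    f the focal length in pixels, (cx, cy) the principal point.
    Assumes an ideal pinhole camera without distortion.
    """
    # Transform to the camera frame: Xc = R * X + t
    Xc = [sum(R[i][k] * X[k] for k in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division and shift to the principal point
    return (f * Xc[0] / Xc[2] + cx, f * Xc[1] / Xc[2] + cy)

def reprojection_error(X, marked_xy, R, t, f, cx, cy):
    """Pixel distance between a manually marked image point and the
    re-projection of the corresponding 3D landmark."""
    u, v = project_point(X, R, t, f, cx, cy)
    return math.hypot(u - marked_xy[0], v - marked_xy[1])

# Hypothetical example values (not taken from the described system)
R_identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t_cam = [0.0, 0.0, 500.0]          # camera 500 mm from the object origin
landmark_3d = [10.0, -20.0, 0.0]   # landmark coordinates, mm
uv = project_point(landmark_3d, R_identity, t_cam, 2000.0, 1000.0, 750.0)
err = reprojection_error(landmark_3d, (1041.0, 670.0),
                         R_identity, t_cam, 2000.0, 1000.0, 750.0)
```

A large value of `err` for some landmark would suggest that its position in the image or in the 3D model should be corrected; the acceptable threshold is a matter of expert judgment and is not specified here.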

C2F dataset
A crania-to-facial (C2F) dataset was initially created for training the skull2face model for face approximation (Knyaz et al., 2020a). The C2F dataset includes data of two modalities: skull 3D models and face 3D models. For the present study, the dataset has been extended with the following data for each skull 3D model:
• 12 oriented images of the skull acquired during the 3D scanning process;
• image coordinates of craniometric landmarks for each image.
Figure 5. Three oriented images of a skull with manually marked craniometric landmarks
Figure 6. Sample images from C2F dataset.
These data were used to generate a craniometric landmark dataset containing images of anthropological landmarks, each centred on the coordinates of a landmark.
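Extracting an image region centred on a landmark can be sketched as follows. This is an illustrative assumption about how such patches might be cut, not the procedure actually used to build the dataset; the function name `crop_centered`, the zero padding at the image border and the patch size are hypothetical.

```python
def crop_centered(image, x, y, half):
    """Return a (2*half+1) x (2*half+1) patch centred on pixel (x, y).

    `image` is a list of rows; pixels outside the image are filled
    with 0 (an assumed padding strategy).
    """
    h, w = len(image), len(image[0])
    patch = []
    for row in range(y - half, y + half + 1):
        line = []
        for col in range(x - half, x + half + 1):
            inside = 0 <= row < h and 0 <= col < w
            line.append(image[row][col] if inside else 0)
        patch.append(line)
    return patch

# Hypothetical 5x5 grayscale image whose pixel value encodes its position
image = [[10 * r + c for c in range(5)] for r in range(5)]
# Landmark at column 1, row 0: the top row of the patch falls outside the image
patch = crop_centered(image, x=1, y=0, half=1)
```

The landmark always ends up at the central element `patch[half][half]`, so every patch in such a dataset shares the same landmark position.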

CL-net model
The problem of craniometric landmark detection can be formulated as follows.
Let $A \in \mathbb{R}^{w \times h \times 3}$ denote an image of a skull, and let $L \in \mathbb{R}^{N_L \times 3}$ be a tensor of $N_L$ given craniometric landmarks to be found. Each element of $L$ includes the image coordinates $x_l, y_l$ of a landmark $l$ and the probability of correct landmark identification $p_l$: $L = \{x_l, y_l, p_l\}$, $l = 1, \dots, N_L$. It is then required to find a mapping $f$ from the given image $A$ to the landmark tensor $L$, $f : A \rightarrow L$. To obtain an accurate and reliable mapping $f : A \rightarrow L$ in the supervised learning framework, the loss function $\mathcal{L}$ should penalize an incorrectly predicted landmark location $(\hat{x}_l, \hat{y}_l)$ and a low probability of landmark identification $p_l$.
Inspired by the high performance of the YOLO network (Redmon et al., 2016), we modified the baseline model to incorporate all available information into the training process.
The YOLO CNN model treats detection as a regression problem. It divides the image into an $S \times S$ grid and, for each grid cell, predicts $B$ bounding boxes, confidence for those boxes, and $N_L$ (number of landmarks) class probabilities. The bounding box is predicted as its center $x, y$ and dimensions $w, h$, and the confidence prediction is the Intersection-over-Union (IoU) of the predicted and ground-truth bounding boxes.
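The two geometric ingredients of this scheme, assigning a point to a grid cell and computing the IoU of two boxes, can be sketched in a few lines. The helper names `grid_cell` and `iou`, the image size and the grid size S = 7 are illustrative assumptions and are not taken from the CL-net implementation.

```python
def grid_cell(x, y, width, height, S):
    """Return (row, col) of the S x S grid cell containing pixel (x, y)."""
    col = min(int(x / width * S), S - 1)
    row = min(int(y / height * S), S - 1)
    return row, col

def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (cx, cy, w, h)."""
    # Convert centre/size representation to corner coordinates
    ax1, ay1 = box_a[0] - box_a[2] / 2, box_a[1] - box_a[3] / 2
    ax2, ay2 = box_a[0] + box_a[2] / 2, box_a[1] + box_a[3] / 2
    bx1, by1 = box_b[0] - box_b[2] / 2, box_b[1] - box_b[3] / 2
    bx2, by2 = box_b[0] + box_b[2] / 2, box_b[1] + box_b[3] / 2
    # Overlap extent along each axis (zero if the boxes do not intersect)
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = box_a[2] * box_a[3] + box_b[2] * box_b[3] - inter
    return inter / union if union > 0 else 0.0

# Hypothetical landmark at pixel (320, 100) in a 640 x 480 image, S = 7
cell = grid_cell(320, 100, 640, 480, 7)
# Two 20 x 20 boxes shifted by 10 pixels horizontally
overlap = iou((50, 50, 20, 20), (60, 50, 20, 20))
```

The cell returned by `grid_cell` is the one whose predictors are expected to report the landmark, while `iou` gives the target value for the confidence prediction.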
These predictions are encoded as an $S \times S \times (B \times g + N_L)$ tensor, with $g$ being the number (cardinality) of predictions for each bounding box. The loss function for the YOLO detector $\mathcal{L}_D$ consists of three parts: $\mathcal{L}_{xy}$, the penalty for incorrect prediction of landmark localization; $\mathcal{L}_C$, the penalty for incorrect landmark classification; and $\mathcal{L}_p$, the penalty for low probability of landmark classification. In these terms, $\mathbb{1}_{i}^{clm}$ denotes whether a craniometric landmark belongs to cell $i$, and $\mathbb{1}_{ij}^{clm}$ denotes that the $j$-th bounding box predictor in cell $i$ is responsible for that prediction.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIV-2/W1-2021 4th Int. Worksh. on "Photogrammetric & computer vision techniques for video surveillance, biometrics and biomedicine", 26-28 April 2021, Moscow, Russia
To incorporate information about the 3D coordinates of the landmarks, a spatial consistency loss function $\mathcal{L}_{SC}(A, L_0)$ is introduced. Specifically, similarly to (Kniaz et al., 2020, Knyaz et al., 2020b), we add information about the predicted location of the points being detected as a masked image $M$ containing epipolar constraints for the $N_L$ landmarks detected in the reference image $A_0$. $\mathcal{L}_{SC}(A, L_0)$ expresses the requirement for the landmark $l_k^m$ detected in image $k$ ($k = 1, \dots, K$) to be close to the epipolar line $E_k^m$ of the landmark $l_0^m$ detected in the reference image $A_0$. The full loss function combines the detector loss $\mathcal{L}_D$ with the spatial consistency loss $\mathcal{L}_{SC}$. An overview of the proposed framework is given in Figure 7.
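The geometric quantity behind such a constraint is the distance from a detected point to an epipolar line. The sketch below shows only this quantity and does not reproduce the exact form of the spatial consistency loss, which is not restated here. The function names and the fundamental matrix `F_rectified` (a textbook case of two cameras related by a pure horizontal shift) are assumptions made for illustration.

```python
import math

def epipolar_line(F, p0):
    """Epipolar line (a, b, c) in image k for point p0 = (x, y) of the
    reference image, computed as l = F * [x, y, 1]^T."""
    x0 = (p0[0], p0[1], 1.0)
    return [sum(F[i][j] * x0[j] for j in range(3)) for i in range(3)]

def point_line_distance(line, p):
    """Perpendicular distance from point p = (x, y) to line a*x + b*y + c = 0."""
    a, b, c = line
    norm = math.hypot(a, b)
    if norm == 0:
        raise ValueError("degenerate epipolar line")
    return abs(a * p[0] + b * p[1] + c) / norm

# Assumed fundamental matrix for a purely horizontal camera shift:
# epipolar lines are then the image rows y = y0.
F_rectified = [[0.0, 0.0, 0.0],
               [0.0, 0.0, -1.0],
               [0.0, 1.0, 0.0]]

landmark_ref = (100.0, 200.0)   # hypothetical landmark in the reference image
landmark_k = (80.0, 203.0)      # hypothetical detection in image k
line = epipolar_line(F_rectified, landmark_ref)
distance = point_line_distance(line, landmark_k)
```

A penalty that grows with `distance`, summed over landmarks and images, would be one plausible way to express the requirement that detections stay close to their epipolar lines.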

RESULTS AND DISCUSSION
The proposed CL-net model was trained on the C2F dataset using the PyTorch library (Paszke et al., 2017). Training was performed on an NVIDIA 1080 Ti GPU and took about 16 hours for the model pre-trained on the MS-COCO dataset (Lin et al., 2014). For network optimization, minibatch stochastic gradient descent with an Adam solver was used.
The F1-score (the harmonic mean of precision P and recall R) was used as the recognition performance metric for evaluating the CL-net model.
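For reference, the F1-score can be computed from detection counts as shown below. The function name `f1_score` and the sample counts are hypothetical and do not correspond to the reported experiments.

```python
def f1_score(tp, fp, fn):
    """F1-score from true positives, false positives and false negatives.

    Precision P = tp / (tp + fp), recall R = tp / (tp + fn),
    F1 = 2 * P * R / (P + R). Returns 0.0 when F1 is undefined.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 8 correct detections, 2 false alarms, 2 misses
score = f1_score(tp=8, fp=2, fn=2)
```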
To assess the quality of landmark localization in the image, the average error $\delta x_l$ was estimated using the coordinates $\hat{x}_l^j, \hat{y}_l^j$ of the detected landmark in the $j$-th image and the ground truth $x_l^j, y_l^j$ for this landmark.
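One common way to compute such an average localization error is the mean Euclidean distance between detected and ground-truth positions over all images. The sketch below assumes this definition, which may differ from the exact formula used in the evaluation; the function name and the sample coordinates are hypothetical.

```python
import math

def mean_localization_error(detected, ground_truth):
    """Mean Euclidean distance (in pixels) between detected and
    ground-truth positions of one landmark over several images.

    Both arguments are equally long lists of (x, y) pairs,
    one pair per image. Assumed definition of the average error.
    """
    if len(detected) != len(ground_truth) or not detected:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(math.hypot(xd - xg, yd - yg)
                for (xd, yd), (xg, yg) in zip(detected, ground_truth))
    return total / len(detected)

# Hypothetical detections of one landmark in two images
detected = [(103.0, 204.0), (50.0, 50.0)]
ground_truth = [(100.0, 200.0), (50.0, 52.0)]
error = mean_localization_error(detected, ground_truth)
```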
The evaluation was performed on the testing part of the C2F dataset, and the results are shown in Table 1. Table 1 shows that the developed CL-net model recognizes the anthropological landmarks with high performance, and that including the spatial consistency condition in the training process improves both the recognition performance and the accuracy of landmark localization.
The evaluation results demonstrate that the developed technique for automatic anthropological landmark detection can be applied in the practice of anthropological study.

CONCLUSION
A technique for automatic anthropological landmark detection and recognition has been developed. It is based on deep learning and works with an arbitrary skull photograph as input.
A modified C2F dataset was used to train the developed CL-net deep learning model. The C2F dataset was extended by including skull images and landmark annotations.
The developed deep learning technique provides reliable anthropological landmark identification and measurements. The 3D coordinates of craniometric landmarks, obtained from an accurate skull 3D model, serve as reference points for the corresponding cephalometric landmarks measured in the given photograph. Such a photogrammetric technique makes it possible to achieve the high quality of superimposition needed for reliable craniofacial identification.