MACHINE LEARNING FOR APPROXIMATING UNKNOWN FACE

ABSTRACT: Reconstructing facial appearance from a skull (facial approximation) is important for anthropology and archaeology as well as for forensics. Recent progress in optical 3D measurement has made it possible to replace manual facial reconstruction techniques with computer-aided ones based on digital skull 3D models. The growing amount of data and the development of data processing methods provide a basis for a fully automated facial approximation technique. This study addresses facial approximation from a digital skull 3D model using deep learning techniques. The skull 3D models used for appearance reconstruction are generated automatically by an original photogrammetric system and then serve as input to the face appearance reconstruction algorithm. The approach uses generative adversarial learning to translate data from one modality (skull) to another (face), working with digital skull 3D models and face 3D models. A special dataset of skull 3D models and face 3D models was collected and adapted for training and testing a convolutional neural network. Evaluation on the test part of the dataset demonstrates the high potential of the developed approach for facial approximation.


INTRODUCTION
Craniofacial reconstruction (or facial approximation) aims to predict a person's appearance from the skull alone. This problem is important in application areas such as forensics, archaeology and anthropology. The first facial approximation techniques worked with a real skull (or its plaster copy) and clay, which an expert anthropologist used to sculpt a prediction of the unknown face. The procedure was something of an art, so the resulting reconstruction was subjective and depended on the author's judgment.
The problem of face reconstruction (or face approximation) has long attracted the attention of anthropologists. The first attempts at facial reconstruction were made in 1895, when the German anatomist Wilhelm His, in collaboration with the German sculptor Karl Seffner, produced the first sculptural reconstruction of the face of the German composer Johann Sebastian Bach (His, 1895). The resulting face resembled the best of Bach's portraits.
The accumulation and processing of data on skulls and facial tissue led to manual face reconstruction methods, which fall into three main techniques: the anthropometrical American method (tissue depth method), the anatomical Russian method, and the combination Manchester (British) method.
Recent progress in optical 3D measurement has made it possible to replace manual facial reconstruction techniques with computer-aided ones based on digital skull 3D models. 3D modeling software (such as Free Form Modelling Plus™; Sensable Technologies, Wilmington MA) enables the operator to model the face onto the skull manually.
With the growing amount of digital data on skulls and faces and the development of data processing methods, a basis for a fully automated face approximation technique has emerged. Some studies propose automated methods for facial approximation based on statistical deformable shape models of skull and face morphology.
Another major current trend is the incorporation of machine learning techniques into a wide variety of applications. The success of machine learning in many challenging tasks relies on the availability of large-scale annotated datasets and data processing algorithms.
This study addresses the problem of facial approximation from a digital skull 3D model using deep learning techniques. The skull 3D models used for appearance reconstruction are generated automatically by an original photogrammetric system. These 3D models are then used as input to the face appearance reconstruction algorithm, which produces a face 3D approximation corresponding to the skull. The main contributions of the paper are (1) a pipeline for facial approximation based on machine learning, (2) an annotated dataset for training a generative adversarial network (GAN) to translate a skull 3D model into the corresponding face 3D model, and (3) an evaluation of the performance of the proposed GAN model for facial approximation.

RELATED WORK
An extensive overview of existing facial approximation (craniofacial reconstruction) techniques, considered within a common framework using a general taxonomy, can be found in (Claes et al., 2010) and (Wilkinson, 2010). These studies show that adopting new technologies, continually evaluating the results achieved and improving the applied techniques yield increasingly accurate reconstructions. A short overview of the main studies that form the current state of the art is given below.

Manual techniques
Statistical anatomical data on facial muscles and soft tissues serve as reference information for modern manual facial reconstruction methods (Gerasimov, 1955, Gerasimov, 1971, Caldwell, 1986). Manual facial approximation is now represented by three main techniques: the anthropometrical (American) method, the anatomical (Russian) method, and the combination (British) method. The first is based on soft tissue data and requires highly experienced staff. The Russian method (Gerasimov, 1971) models muscles, glands and cartilage and places them onto the skull sequentially. This technique requires sufficient anatomical knowledge for accurate facial approximation. The British method uses data on both soft tissue thickness and facial muscles. Facial tissue pegs representing the mean tissue depth are placed at anatomical points on the skull. They then serve as reference heights for the reconstruction in clay.

Computer aided techniques
With the progress in digital technologies, computer-aided techniques for skull digitizing and digital data processing have opened new possibilities for realistic facial reconstruction. Facial approximation can be carried out programmatically, by approximating the face surface from a skull 3D model and tissue thickness (Knyaz et al., 2002), or manually with 3D modeling software (such as Free Form Modelling Plus™, Sensable Technologies; ZBrush, Pixologic, Inc.; etc.). In the latter case an operator performs the facial reconstruction using a digital 3D model of the skull and software tools that imitate manual clay techniques.
The 3D reconstruction of the face of Ferrante Gonzaga (1507-1557), an Italian nobleman of the Renaissance period, was performed using a physical model of the skull obtained by computed tomography of his embalmed body and rapid prototyping (Benazzi et al., 2010). The reconstructed face was compared with portraits of Ferrante Gonzaga and showed good similarity under the superimposition method.
The facial approximation of a 3,000-year-old ancient Egyptian woman (Lindsay et al., 2015), made using medical imaging data and the digital sculpting program ZBrush, demonstrated a synthesis of different computer-aided techniques based on the most solid anatomical and/or statistical evidence.
The accuracy of facial reconstruction made with forensic computer-aided software was evaluated using cone-beam computed tomography of live subjects (Lee et al., 2012). Three computerized 3D facial reconstructions were carried out using skull 3D models of live adults. The 3D skeletal and facial data were acquired by a cone-beam computed tomography scanner from subjects in an upright position. The accuracy of the face approximation was evaluated as the surface-to-surface deviation between the reconstructed and real faces; the deviation was below 2.5 mm for about 60% of the reconstructed face surface.
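The coverage figure quoted above can be read as the share of surface samples whose deviation stays under a tolerance. The following is a minimal sketch of that reading, assuming the surface-to-surface deviations have already been sampled as a list of values in millimetres; the function name and the sample values are hypothetical and are not data from Lee et al.

```python
def fraction_within(deviations_mm, tolerance_mm=2.5):
    """Share of surface samples whose absolute deviation is below the tolerance."""
    if not deviations_mm:
        raise ValueError("no deviation samples given")
    inside = sum(1 for d in deviations_mm if abs(d) < tolerance_mm)
    return inside / len(deviations_mm)


# Hypothetical signed deviations (mm) at five surface samples.
sample = [0.4, -1.9, 2.6, 3.1, -0.7]
share = fraction_within(sample)  # three of the five samples lie within 2.5 mm
```

Reporting a share under a threshold alongside a mean error helps show whether the error is spread evenly or concentrated in a few regions of the face.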

Automated techniques
Recent possibilities for collecting and processing large amounts of digital anthropological data make it possible to apply statistical and machine learning techniques to the face approximation problem.
The application of statistical shape models representing skull and face morphology to the face approximation problem has been studied (Paysan et al., 2009b, Paysan et al., 2009a) by fitting them to a set of magnetic resonance images of the head. The authors used ridge regression on the resulting model parameters. Evaluation experiments showed that the face reconstructions are generally close to the original face.
A method for automated estimation of a human face from skull remains (Gietzen et al., 2019) uses three statistical models built from computed tomography head scans and optical face scans: a volumetric skull model encoding the variation between skulls, a surface head model encoding head variation, and dense statistics of facial soft tissue thickness. To recover a face from skull remains, the skull model is first fitted to the given skull. Next, a sphere with a radius equal to the respective soft tissue thickness value is added at each vertex of the registered skull. Finally, a head model is fitted to the union of all spheres. The estimates generated from the given skull visually match the skin surface extracted from the CT scan well.
A large-scale facial model, a 3D Morphable Model (Booth et al., 2018), has been automatically constructed from 9663 distinct facial identities. The 3D Morphable Model contains statistical information about a huge variety of the human population. It is constructed by a fully automated pipeline, informed by an evaluation of state-of-the-art dense correspondence techniques. Experiments with the 3D Morphable Models demonstrated their quality and descriptive ability.
Deep learning models have also appeared that are capable of multimodal data translation (Isola et al., 2017, Kniaz et al., 2019a) or of reconstructing an object's 3D shape from a single image (Kniaz et al., 2019b, Knyaz, 2020). These approaches can also be applied to facial approximation.

Datasets
A representative dataset is an essential part of machine learning techniques, allowing a convolutional neural network model to aggregate and generalize the main features and hidden relations in the data. Owing to the specific nature of anatomical data, few public datasets containing corresponding 3D skull and face data are available.
A large dataset containing more than 300,000 head computed tomography scans (Chilamkurthy et al., 2018b, Chilamkurthy et al., 2018a) was retrospectively collected from around 20 centers in India between Jan 1, 2011, and June 1, 2017 for the development of an automatic diagnostic tool. It contains the training and validation (Qure25k and CQ500 datasets) subsets, along with the original clinical radiology reports and the consensus of three independent radiologists, which are considered the gold standard for the Qure25k and CQ500 datasets, respectively. This dataset is intended for the development and evaluation of automated diagnostic techniques.
Unfortunately, this and similar publicly available computed tomography datasets are not suited to the task of skull-based facial approximation considered here, so a special dataset was created for this study.

FACE APPROXIMATION APPROACH
Recent progress in deep learning shows that modern deep convolutional neural network models achieve state-of-the-art results in various computer vision tasks such as object detection and recognition, image segmentation, multimodal image translation, 3D reconstruction and others. The generative adversarial approach to deep learning achieves high performance in image-to-image translation, image segmentation and single-photo 3D reconstruction. This study treats facial approximation as a multimodal data translation problem and develops a generative adversarial network model for skull-to-face model translation.

skull2face neural network model
The problem of facial reconstruction is considered as data translation from one modality (skull) to another (face). The goal of the proposed skull2face neural network model is to translate a skull depth map into a face depth map. The skull2face model is based on the generative adversarial approach and includes a generator G and a discriminator D. An overview of the proposed framework is given in Figure 7. In the general adversarial setting, the input X is usually an image that is transformed by the generator network G into an output Ŷ. The discriminator network D is trained to distinguish "real" images Y from the target domain from the "fakes" Ŷ produced by the generator. Both networks are trained simultaneously. The discriminator provides the adversarial loss that forces the generator to produce "fakes" Ŷ that cannot be distinguished from real images Y.
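As a numerical illustration of the adversarial losses described above, the following minimal sketch computes the standard binary cross-entropy GAN losses from discriminator outputs. It assumes that the discriminator returns the probability that a sample is real, and it uses the common non-saturating generator loss. The paper does not state which variant skull2face uses, and the function names and batch values are hypothetical.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy: push D(real) toward 1 and D(fake) toward 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs (probability of "real") for a small batch.
d_on_real = [0.9, 0.8]
d_on_fake = [0.3, 0.1]
loss_d = discriminator_loss(d_on_real, d_on_fake)
loss_g = generator_loss(d_on_fake)
```

The generator loss falls as the discriminator assigns higher "real" probabilities to generated samples, which is the pressure that drives the fakes toward the target domain.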
The generator G is trained using a semi-supervised approach. The training dataset includes data of two types: paired samples and unpaired samples. Each paired sample consists of a skull depth map and the corresponding face depth map, both obtained from computed tomography of modern anonymous volunteers.
The unpaired samples include 3D models of skulls and 3D models of faces of different people. There is no correspondence between the face models and the skull models in the unpaired subset. To avoid mode collapse while training on unpaired samples, an anthropological loss function $L_A$ is used in this case. This loss function penalizes misalignment between the anthropological landmarks on the input skull depth map and those on the face predicted by the skull2face model.
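The landmark penalty can be illustrated with a short sketch. It assumes that each skull landmark is shifted by a soft tissue offset to give the expected position of the matching face landmark, and that the penalty is the mean Euclidean distance between expected and predicted positions. The choice of norm, the averaging, the function name and the coordinates are all assumptions made for illustration, not the authors' implementation.

```python
import math


def anthropological_loss(face_pts, skull_pts, tissue_offsets):
    """Mean distance between face landmarks and tissue-corrected skull landmarks."""
    if not face_pts or not (len(face_pts) == len(skull_pts) == len(tissue_offsets)):
        raise ValueError("landmark lists must be non-empty and of equal length")
    total = 0.0
    for f, s, t in zip(face_pts, skull_pts, tissue_offsets):
        # Expected face landmark: skull landmark plus soft tissue offset.
        expected = tuple(sc + tc for sc, tc in zip(s, t))
        total += math.dist(f, expected)
    return total / len(face_pts)


# Hypothetical landmarks in millimetres: (x, y, z) per point.
skull = [(0.0, 0.0, 0.0), (10.0, 5.0, 2.0)]
tissue = [(0.0, 0.0, 2.0), (0.0, 0.0, 4.0)]
face = [(0.0, 0.0, 5.0), (10.0, 5.0, 6.0)]
loss = anthropological_loss(face, skull, tissue)  # first point is 3 mm off, second matches
```

Because the penalty depends only on landmark positions, it can be evaluated for a skull and a predicted face without a ground truth face, which is what makes it usable on unpaired samples.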

C2F dataset
To train the developed skull2face model, a special crania-to-facial (C2F) dataset was created. The C2F dataset includes data of two modalities: skull 3D models and face 3D models. For CNN training these 3D models were converted to depth maps. The C2F dataset has two parts. The first part is the paired samples subset, containing corresponding 3D models of a face and a skull generated by processing computed tomography data. The paired samples subset contains 24 pairs of skull and face 3D models.
The second part of the C2F dataset includes unpaired skull 3D models and face 3D models. Skull 3D models of unknown persons are generated by the original photogrammetric system for automated person identification (Knyaz et al., 2019). The system provides the set of functions needed for crania-facial identification, from system calibration and 3D model generation to forensic report generation.
The photogrammetric system is designed as a fully automated device for textured skull 3D model generation (Figure 2). It includes 4 high-resolution cameras, a precise PC-controlled rotation stage, 2 laser line sources and shadow-free light sources for texture generation. The original software supports the complete process of system calibration, textured 3D model reconstruction and crania-facial superimposition based on craniometric landmarks.
The output skull 3D models (Figure 3) have an accuracy of 0.05 mm, based on the original calibration procedure, and are accompanied by the 3D coordinates of craniometric landmarks (Table 1).
Face 3D models (Figure 4) were generated by a photogrammetric technique (Knyaz, Zheltov, 2008) that provides high resolution and high accuracy of the 3D models. Cephalometric landmarks (Table 2) were marked on the face 3D models by an expert for later use in network training.
The unpaired samples subset contains 316 skull 3D models and 316 face 3D models. For training, all 3D models were aligned and transformed into a depth map representation.
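The paper does not describe how the depth maps are rendered. One simple possibility, shown here purely as an assumption, is an orthographic projection of the aligned model's vertices along the z axis, where each pixel keeps the largest z value (the point closest to a frontal viewer). The function name, the grid size and the coordinate ranges are hypothetical.

```python
def points_to_depth_map(points, size, x_range, y_range, background=0.0):
    """Orthographic projection along z onto a size x size grid (nearest surface wins)."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    depth = [[background] * size for _ in range(size)]
    filled = [[False] * size for _ in range(size)]
    for x, y, z in points:
        if not (x_min <= x <= x_max and y_min <= y <= y_max):
            continue  # skip points outside the rendered window
        col = min(int((x - x_min) / (x_max - x_min) * size), size - 1)
        row = min(int((y_max - y) / (y_max - y_min) * size), size - 1)  # row 0 is the top
        if not filled[row][col] or z > depth[row][col]:
            depth[row][col] = z
            filled[row][col] = True
    return depth


# Hypothetical vertices of an aligned model: two fall into the same pixel.
vertices = [(0.1, 0.9, 5.0), (0.1, 0.9, 7.0), (0.9, 0.1, 3.0)]
depth_map = points_to_depth_map(vertices, size=2, x_range=(0.0, 1.0), y_range=(0.0, 1.0))
```

Rendering skulls and faces with the same window and axis keeps the two modalities registered pixel by pixel, which image-to-image translation relies on.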

Loss function
For paired 3D models, a conditional generator and a conditional discriminator coupled with an L1 loss are used, as in general image-to-image translation (Isola et al., 2017). For unpaired 3D models, the anthropological loss function $L_A(F, S)$ expresses the requirement that a face 3D model $F$ be close at $N$ cephalometric points $f_i = (f^i_x, f^i_y, f^i_z)$ $(i = 1, \ldots, N)$ to a skull 3D model $S$ at $N$ craniometric points $s_i = (s^i_x, s^i_y, s^i_z)$, corrected by the soft tissue thickness $t_i = (t^i_x, t^i_y, t^i_z)$ at these points. The full loss function combines the paired-sample loss with the anthropological loss.
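The displayed equations for these losses are not present in this text. The block below is a plausible form written under stated assumptions, not the authors' exact formulation. The paired terms follow the conditional GAN objective with an L1 term from Isola et al. (2017), using the symbols X, Y and G(X) introduced earlier. The anthropological term is assumed to be a mean Euclidean distance over the N landmarks. The weights $\lambda$ and $\mu$ are hypothetical and are not given in the paper.

```latex
\begin{align}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{X,Y}\left[\log D(X, Y)\right]
  + \mathbb{E}_{X}\left[\log\left(1 - D(X, G(X))\right)\right] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{X,Y}\left[\lVert Y - G(X) \rVert_1\right] \\
L_A(F, S) &= \frac{1}{N} \sum_{i=1}^{N} \lVert f_i - (s_i + t_i) \rVert_2 \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D)
  + \lambda\, \mathcal{L}_{L1}(G) + \mu\, L_A(F, S)
\end{align}
```

Under this reading, the first two terms apply to paired samples, where a ground truth face Y exists, and the last term applies to unpaired samples, where only landmarks constrain the prediction.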

TRAINING RESULTS
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020. XXIV ISPRS Congress (2020 edition)
The proposed skull2face model was trained on the C2F dataset using the PyTorch library (Paszke et al., 2017). Training was performed on an NVIDIA 1080 Ti GPU and took about 14 hours for G and D. For network optimization, minibatch stochastic gradient descent (SGD) with an Adam solver was used. For better convergence, training was carried out in stages, beginning at a low resolution of the predicted face 3D models (8 × 8 pixels) and increasing the resolution from stage to stage. The staged training process is illustrated in Figure 5.
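The staged schedule can be sketched as follows. The paper gives only the starting resolution of 8 × 8 pixels, so the doubling between stages, the final resolution of 256 pixels and the use of average pooling to build low-resolution targets are assumptions made for illustration, and the function names are hypothetical.

```python
def resolution_schedule(start=8, final=256):
    """Resolutions for staged training, doubling from start up to final (assumed)."""
    sizes = []
    size = start
    while size <= final:
        sizes.append(size)
        size *= 2
    return sizes


def downsample(depth, factor):
    """Average-pool a square depth map by an integer factor."""
    if len(depth) % factor != 0:
        raise ValueError("map size must be divisible by the factor")
    n = len(depth) // factor
    return [
        [
            sum(depth[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(n)
        ]
        for r in range(n)
    ]


stages = resolution_schedule(8, 64)
coarse = downsample([[1.0, 3.0], [5.0, 7.0]], 2)  # a 2 x 2 map pooled to a single pixel
```

Starting from coarse targets lets the generator settle the overall head shape before the finer stages ask it to resolve local detail.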
At first, the network was trained directly on the depth maps obtained from the 3D models. However, this approach resulted in weak reconstruction of facial details (e.g., nose, mouth). We hypothesized that the reason was that the differences in distance between facial features are small compared with the whole depth of a face.
To overcome this problem, the depth maps were preprocessed by raising the depth values to the power of four (Figure 6). Linear depth maps were then recovered by raising the resulting depth values to the power of 0.25.
The trained skull2face model was tested on the test part of the C2F dataset to predict unseen faces. The surface-to-surface root-mean-square error (RMSE) metric was used for evaluation. The mean RMSE of the surface-to-surface deviation was 1 to 2 mm on the test part of the C2F dataset.
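The effect of the power transform and the error metric can be shown with a small sketch. It assumes depth values are non-negative and normalised to [0, 1], with the face region near the top of that range; the paper does not state the normalisation. The RMSE here is computed per pixel on depth maps as a simplified stand-in for the surface-to-surface metric, and all names and values are hypothetical.

```python
import math


def encode_depth(depth, power=4):
    """Raise each depth value to the given power before training."""
    return [[d ** power for d in row] for row in depth]


def decode_depth(encoded, power=4):
    """Invert the encoding to recover linear depth (values must be non-negative)."""
    return [[e ** (1.0 / power) for e in row] for row in encoded]


def rmse(predicted, reference):
    """Root-mean-square error between two depth maps of equal shape."""
    diffs = [(p - r) ** 2
             for prow, rrow in zip(predicted, reference)
             for p, r in zip(prow, rrow)]
    return math.sqrt(sum(diffs) / len(diffs))


# Two nearby facial depths, normalised to [0, 1] (hypothetical values).
linear = [[0.90, 0.95]]
encoded = encode_depth(linear)    # the gap between the two values widens
restored = decode_depth(encoded)  # the round trip returns the linear depths
```

For values close to 1 the fourth power stretches small differences, so features such as the nose and mouth occupy a larger share of the value range that the network has to reproduce.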
Facial approximation with the skull2face model was also carried out for paleoanthropological finds that have no ground truth models. The face approximations produced by skull2face were qualitatively evaluated by experts in anthropology and compared with artistic reconstructions made using a manual face approximation method. The experts judged the machine learning face approximations to be adequate and to have high potential for automated skull-based face 3D reconstruction.

CONCLUSION
A machine learning approach has been developed for an important task in anthropology: reconstructing the appearance of unknown humans from a digital skull 3D model. The proposed skull2face generative adversarial network model predicts facial appearance from a skull 3D model represented as a depth map. To train the skull2face model, a special annotated C2F dataset was created, including paired and unpaired skull and face 3D models.
Evaluation on the test part of the dataset gave a mean surface-to-surface RMSE of 1 to 2 mm, demonstrating high performance in facial approximation. Comparison of the developed face approximation algorithms with the available ground truth, and with artistic manual reconstructions where ground truth is absent, showed the high quality of face approximation by the skull2face model.