INVESTIGATING 3D RECONSTRUCTION OF NON-COLLABORATIVE SURFACES THROUGH PHOTOGRAMMETRY AND PHOTOMETRIC STEREO

3D digital reconstruction techniques are extensively used for quality control purposes. Among them, photogrammetry and photometric stereo methods have long been used successfully in several application fields. However, generating highly detailed and reliable micro-measurements of non-collaborative surfaces is still an open issue. In these cases, photogrammetry can provide accurate low-frequency 3D information, whereas it struggles to extract reliable high-frequency details. Conversely, photometric stereo can recover a very detailed surface topography, although a global surface deformation is often present. In this paper, we present the preliminary results of an ongoing project aiming to combine photogrammetry and photometric stereo in a synergetic fusion of the two techniques. In particular, we introduce the main design concept behind an image acquisition system we developed to capture images from different positions and under different lighting conditions, as required by photogrammetry and photometric stereo techniques. We show the benefit of such a combination through some experimental tests. The experiments show that the proposed method recovers the surface topography at the same high resolution achievable with photometric stereo while preserving the photogrammetric accuracy. Furthermore, we exploit light directionality and multiple light sources to improve the quality of dense image matching on poorly textured surfaces.


INTRODUCTION
In the field of industrial metrology, there is a rising need for 3D information at a very high resolution for micro-measurements and quality inspection of object surfaces. Photogrammetry has long been regarded as a successful method for 3D modeling of well-textured objects due to its low cost, portability and flexibility in a variety of fields, including cultural heritage (Nicolae et al., 2014; Guidi et al., 2015; Menna et al., 2016; Aicardi et al., 2018), reverse engineering (Harvent et al., 2013; Tang et al., 2016), medicine (Kim et al., 2018), industrial inspection (Rodríguez-Gonzálvez et al., 2017; Zhang et al., 2017; Lu et al., 2020) and quality control (Gao et al., 2019; Tang et al., 2019). However, it is still challenging to achieve high-accuracy 3D measurement of non-collaborative objects (Figure 1) due to the sensitivity of photogrammetry to the textural properties of the surface (e.g., opacity, translucency, roughness). Consequently, noisy results are typically generated on poorly textured objects (Ahmadabadian et al., 2017; Santoši et al., 2019; Hafeez et al., 2020). Unlike photogrammetric approaches, photometric stereo can recover a very detailed topography of objects even with textureless or shiny surfaces (Li et al., 2020; Wang et al., 2020; Wei et al., 2020). However, the derived 3D shape is globally deformed due to the dependency of the method on the lighting system and the simplifications often made to the mathematical model (Ackermann and Goesele, 2015; Shi et al., 2018). This paper presents the main concept and the high-level architecture behind a prototype system developed to synergistically combine photogrammetry and photometric stereo techniques. The developed system can provide detailed and accurate 3D reconstructions of non-collaborative surfaces, such as shiny and textureless ones.
Our work aims to compensate for the respective weaknesses of photometric stereo and photogrammetry by (i) deriving scale and accurate low-frequency measurements using photogrammetry, (ii) acquiring a finely detailed topography of the surveyed object using photometric stereo and (iii) co-registering and merging all acquired 3D information.
The key contribution of this paper is the use of a photometric stereo lighting system to highlight microstructures of a non-collaborative object surface by exploiting light directionality, shadowing and shading, in particular at grazing angles. We then use the signal generated by the surface roughness as a pattern to facilitate dense image matching. Separate and complementary 3D point clouds are generated using only stereo pairs but under different grazing angles. Finally, we fuse all the point clouds generated under different grazing angles, obtaining an improved 3D object reconstruction. Furthermore, we use a dense 3D reconstruction to compute the light direction at each surface point, which better represents the shape, especially for complex objects. This dense cloud can also be used to mitigate the global shape deviation or to predict the regions of the object affected by shadows and specular reflections. The rest of the paper is organized as follows. Section 2 reviews previous works regarding the 3D reconstruction of non-collaborative surfaces. In Section 3, we describe the proposed methodology as well as the proposed image acquisition system. Preliminary results of the combination of photogrammetry and photometric stereo methods, with critical analyses, are reported and discussed in Section 4. Finally, conclusions are drawn and presented together with future research plans.

STATE OF THE ART
This section presents an overview of the literature related to photogrammetry, photometric stereo, and the combination of both methods for high-resolution 3D reconstruction of non-collaborative surfaces.

Photogrammetry
Over the years, different photogrammetric methods have been developed to deal with the 3D reconstruction of non-collaborative objects. For the 3D reconstruction of Lambertian textureless objects, previous studies have concentrated on improving the surface texture by projecting, for example, a known pattern (Menna et al., 2017; Mousavi et al., 2018), a random pattern (Hosseininaveh et al., 2015; Ahmadabadian et al., 2019) or a synthetic texture (Santoši et al., 2019; Hafeez et al., 2020) onto the objects. These methods, however, assume that the object surface is Lambertian, which is not the case for objects with specular reflection or interreflection effects. Another popular method for reducing the reflectivity of complex objects is to coat the surface with a thin layer of white or colored powder (Ackermann et al., 2008; Hosseininaveh et al., 2015; Palousek et al., 2015; Lin et al., 2017; Mousavi et al., 2018; Pereira et al., 2019). However, in some cases, such as delicate cultural heritage measurements or in-line inspection of industrial components, such surface treatment might not be practical. Furthermore, the added layer can increase the object's size or hide geometric details such as defects on the surface (Palousek et al., 2015; Pereira et al., 2019).

Photometric stereo
The basic idea of photometric stereo was initially presented by Woodham (1980). This technique reconstructs the shape of an object using illumination changes. The classical photometric stereo approaches work with perfectly diffuse (Lambertian) surfaces, which is often an improper assumption for many objects, such as metallic, glossy, and shiny ones. In this regard, some researchers developed new methods that physically model how light interacts with non-Lambertian surfaces (Lu et al., 2013; Santo et al., 2020; Boss et al., 2020). Other works (MacDonald, 2014; MacDonald et al., 2016; Sun et al., 2017; Wei et al., 2018; Wen et al., 2021) classify and remove the specular highlights to deal with reflective surfaces. Unlike photogrammetric approaches, photometric stereo can recover a very detailed surface topography even in the case of textureless or shiny surfaces (Zheng et al., 2019; Wang et al., 2020; Wen et al., 2021). However, a global deformation of the shape is generally present due to some assumptions made in the mathematical model, such as parallel light directions and orthographic projection of the imaging sensor (Fan et al., 2017; Shi et al., 2018; Ren et al., 2021; Li et al., 2020). For example, Fan et al. (2017) report a maximum shape deviation of about 13 mm on a flat object with dimensions of 340 × 270 mm when ignoring the assumptions mentioned above.

Combined methods
Various researchers have attempted to combine photometric stereo with other methods and techniques such as structured light or photogrammetry. In this way, high-frequency spatial information is recovered from photometric stereo, whereas other techniques are applied to retrieve low-frequency information. Nehab et al. (2005) combined a 3D reconstruction generated from a range scanner with photometric normals to improve the accuracy and level of detail. Hernandez et al. (2008) used the silhouettes of the object to get the geometry information and added details with photometric stereo. More recently, Ren et al. (2020) combined photometric stereo with sparse 3D points extracted using contact measurements (CMM) to mitigate the global deformation obtained by photometric stereo. Although these systems can achieve high accuracy, the use of costly instrumentation limits the technique to specialized laboratories and projects with unique metrological specifications. Multi-view stereo approaches were also used to create sparse 3D shape reconstructions, free of global deformation, of objects with Lambertian surfaces (Vlasic et al., 2009; Park et al., 2013; Grochulla et al., 2015; Park et al., 2016; Logothetis et al., 2019; Li et al., 2020; Xie et al., 2020). These were then used as a base for high-resolution measurements produced using photometric stereo. However, the performance of these methods seems to decrease significantly when dealing with complex and non-collaborative surfaces, such as textureless and highly reflective ones. In particular, Park et al. (2013) used a single stereo camera to acquire stereo images for 3D reconstruction, which can be limiting and may negatively impact the final 3D results when the surface is non-collaborative. A common solution used to mitigate this issue is to increase the number of synchronized cameras (Vlasic et al., 2009; Xie et al., 2020), although this might be unsuitable for low-budget industrial inspection projects.
Another alternative is to increase the number of image stations (Grochulla et al., 2015; Park et al., 2016; Logothetis et al., 2019), which is a time-consuming procedure, unsuitable for real-time 3D inspection. Li et al. (2020), rather than rotating the camera around the object, used a turntable to rotate the object in order to capture multi-view images. This means that the light sources are not fixed with respect to the object, so the object texture varies from image to image, even for the same light. In particular, for complex objects causing self-shadows and occlusions, this might lead to false matches and artifacts in the generated dense 3D point clouds.

METHODOLOGY
Driven by industrial demands for effective optical inspection methods, we propose a simple methodology that can cope with a large variety of objects characterized by different surface properties, even textureless and shiny ones. The method combines photogrammetry and photometric stereo (Figure 2) with a dedicated acquisition system (Figures 3 and 4). The first step is to provide an automatic image acquisition system to capture images under different illuminations and from different camera stations (camera positions). Depth maps and dense 3D point clouds are then generated using the images taken from various points of view. Light directions at each surface point are estimated by exploiting the 3D surface approximation obtained by photogrammetry and the calibrated light positions. On the photometric stereo side, surface normals and, consequently, depth maps are recovered once the light direction at each surface point has been computed in the previous step. Finally, the depth maps obtained with photogrammetry are used to compute the scale factor for the photometric stereo depth maps. These are then transformed into 3D point clouds using the exterior orientation parameters of the cameras known from the photogrammetric process.
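The last two steps of the pipeline (scaling the photometric stereo depth map and converting it into a 3D point cloud) can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes a pure scale factor without offset, a distortion-free pinhole camera with calibration matrix `K`, and an exterior orientation expressed as `X_cam = R @ X_obj + t`. The function names `depth_scale` and `depth_to_points` are hypothetical.

```python
import numpy as np

def depth_scale(z_ps, z_pg):
    """Least-squares scale s minimising ||s * z_ps - z_pg|| over valid pixels.

    z_ps: photometric stereo depth map, z_pg: photogrammetric depth map,
    both (H, W) and pixel-aligned; NaN marks pixels without a measurement.
    Assumption: the two maps differ by a pure scale factor (no offset).
    """
    valid = np.isfinite(z_ps) & np.isfinite(z_pg)
    a, b = z_ps[valid], z_pg[valid]
    return float(a @ b / (a @ a))

def depth_to_points(z, K, R, t):
    """Convert a depth map (Z along the optical axis) into object-space points.

    Assumption: pinhole model without lens distortion, X_cam = R @ X_obj + t.
    Returns an (H * W, 3) array in row-major pixel order.
    """
    h, w = z.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(float)
    rays = pix @ np.linalg.inv(K).T        # viewing rays with unit Z component
    x_cam = rays * z.reshape(-1, 1)        # points in the camera frame
    return (x_cam - t) @ R                 # row-wise R^T (X_cam - t)
```

In practice the scale would be estimated only where the photogrammetric depth is considered reliable, and an additional offset term may be needed depending on how the normals are integrated.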

Light direction per pixel
Conventional photometric stereo assumes that the light rays coming from the illumination source are parallel when they hit the object's surface. In reality, these rays are not parallel, particularly when the illumination system is close to the object. This effect has to be considered for an accurate measurement of the surface normals, as any variation in lighting location affects the recovery of the depth map from the normals and introduces global shape distortions. Moreover, parallel illumination conditions are complicated to implement in practice. For these reasons, we consider a different geometric model with punctiform light sources and light divergence (Figure 3). This model requires that the light positions and the 3D shape of the object be known in the same reference system (camera coordinate system) to compute the light direction at each surface point. The lighting system's geometry was accurately measured during the system calibration using photogrammetry (MacDonald et al., 2015). Regarding the 3D shape of the object, we generate a dense 3D point cloud with photogrammetric measurements using images taken from various points of view. The reconstructed 3D points $X(x, y, z)$ are then back-projected on each 2D image to their corresponding pixels $I(u, v)$ using the interior and exterior orientation parameters. Then, if the coordinates of the $k$-th light source are $L^k = (L^k_x, L^k_y, L^k_z)$, the normalized light direction $L^k_{u,v}$ at each surface point $X(x, y, z)$ can be calculated as (Fan et al., 2017):

$$L^k_{u,v} = \frac{L^k - X}{\lVert L^k - X \rVert}$$

Given the matrix $L_{u,v}$, whose rows are the light directions $L^k_{u,v}$, and the corresponding image intensities $I_{u,v}$, the unit vector of the surface normal at each point can be written as:

$$\hat{N} = \frac{(L_{u,v}^{T} L_{u,v})^{-1} L_{u,v}^{T} I_{u,v}}{\lVert (L_{u,v}^{T} L_{u,v})^{-1} L_{u,v}^{T} I_{u,v} \rVert}$$

Although photometric stereo produces an approximate orientation in the form of one normal per pixel, the shape of the surface (depth map) is often desired. The depth map is given as $Z = f(x, y)$, with $p = -\hat{N}_x / \hat{N}_z$ and $q = -\hat{N}_y / \hat{N}_z$, and the normal of the surface points towards the gradient direction (Scherr, 2017).
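As an illustration of the near-light model described above, the following sketch computes the per-point light directions, solves the Lambertian least-squares problem for the normals, and derives the depth gradients. It is a simplified example under stated assumptions, not the code used in this work: shadows, specular highlights and light fall-off are ignored, the points and LED positions are assumed to be in the same coordinate system, and all function names are hypothetical.

```python
import numpy as np

def light_directions(points, lights):
    """Unit vectors from each surface point towards each light source.

    points: (P, 3) surface points, lights: (K, 3) LED positions.
    Returns an array of shape (P, K, 3).
    """
    d = lights[None, :, :] - points[:, None, :]
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def normals_from_intensities(L, I):
    """Per-point least-squares solution of the Lambertian model I = L g.

    L: (P, K, 3) light directions, I: (P, K) intensities, with K >= 3.
    g = albedo * normal; returns unit normals (P, 3) and albedo (P,).
    Assumption: no shadowed or saturated observations are present.
    """
    g = np.stack([np.linalg.lstsq(Lp, Ip, rcond=None)[0] for Lp, Ip in zip(L, I)])
    albedo = np.linalg.norm(g, axis=1)
    return g / albedo[:, None], albedo

def gradients(n):
    """Depth gradients p = -n_x / n_z and q = -n_y / n_z from unit normals (P, 3)."""
    return -n[:, 0] / n[:, 2], -n[:, 1] / n[:, 2]
```

The gradients `p` and `q` would then be integrated into a depth map with any standard normal-integration scheme, which is outside the scope of this sketch.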

Experimental setup, calibration, and data acquisition
The system (Figure 4) is composed of four main components: 1) a digital camera fixed at an appropriate distance from the object (approximately 400 mm); 2) multiple dimmable LED lights on vertical poles (currently up to 10 LEDs on four vertical poles) mounted on an optical breadboard; 3) a support on which the object to be surveyed is placed; 4) a microcontroller (Arduino) to synchronize and control the LEDs and the camera.
Figure 4. The prototype image acquisition system for combining photometric stereo and photogrammetry techniques.
The prototype system so far features a single DSLR camera placed on a tripod in different positions. A Nikon D3X DSLR camera with a resolution of 24 Mpx mounting an AF-S Micro NIKKOR 60mm f/2.8G ED lens is placed at about 400 mm from the object. The camera height can be set at different levels through an adjustable-height tripod. The camera parameters, i.e., distance to the object, focal length, F-Stop, and ISO, are manually set by an operator and kept constant. Regarding the image acquisition, once the object is placed on the support, the first LED is switched on, an image is acquired, the LED is turned off, the second LED is switched on and the second image is captured. This process is repeated until an image for each LED is acquired (approx. 5 sec). The camera is then relocated to the next station, and the image acquisition process is repeated.
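The acquisition sequence at one camera station can be summarised with a short sketch. The actual system is driven by an Arduino microcontroller; the code below only illustrates the control logic, and `set_led` and `capture` are hypothetical callbacks standing in for the real LED and camera interfaces.

```python
from typing import Callable, List, Tuple

def acquire_station(station: int, n_leds: int,
                    set_led: Callable[[int, bool], None],
                    capture: Callable[[], str]) -> List[Tuple[int, int, str]]:
    """Capture one image per LED at a single camera station.

    For each LED: switch it on, take an image, switch it off again.
    Returns a list of (station, led_index, image_id) records.
    """
    shots = []
    for led in range(n_leds):
        set_led(led, True)
        try:
            shots.append((station, led, capture()))
        finally:
            set_led(led, False)  # never leave an LED on if the capture fails
    return shots
```

Keeping the station and LED indices with each image makes it straightforward to pair every image with the calibrated position of the LED that illuminated it.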
A simple yet effective method for calibrating the geometry of the lighting system and computing the camera interior and exterior parameters is to use some coded targets embedded in the scene within the photogrammetry pipeline. The coded targets allow us to define a fixed local coordinate system on the breadboard that remains stable and unchanged over time as well as to scale the photogrammetric processing. It is essential to have the light positions and dense clouds in the same coordinate system to compute the light direction per pixel in a multi-view photometric stereo procedure. In this way, whenever a set of images is taken, the camera pose is referred to the fixed reference system of the breadboard, where light positions are known from the calibration phase.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2021 XXIV ISPRS Congress (2021 edition)
In order to measure the light positions with respect to the coded targets, a close-range photogrammetric survey was carried out. The LEDs were turned on during the image acquisition and treated as circular targets. In this way, during the calibration step, the positions of the LEDs were measured, along with the coded targets, in a local coordinate system.

EXPERIMENTS AND DISCUSSION
The currently implemented system was tested with various non-collaborative surfaces, and two different examples are reported hereafter. For the first object (a 100 x 100 mm gold-foiled surface shaped like a Euro coin, Figure 1a), we aimed to show how the proposed integrated approach performs in fusing photometric stereo and photogrammetry measurements. Using the proposed image acquisition system, a set of images (ground sample distance, GSD ≈ 37 µm) was captured from 20 different stations, each one under ten different illuminations. A dense 3D point cloud was generated using the images acquired from the various points of view. The light coordinates (determined during the system calibration) and the approximate 3D object shape were then used to estimate the light directions at each surface point. Following that, given the light directions and intensities, the surface normal and depth map were computed at each surface point. Finally, the estimated depth map was transformed to the same coordinate system as the photogrammetric 3D point cloud, using the interior and exterior orientation parameters of the camera. The comparison between the refined photometric stereo depth map and the one generated using basic photometric stereo shows the clear benefits of this combination. Indeed, thanks to the inclusion of the photogrammetric 3D measurements in the light direction computation, the global shape deviation caused by the parallel light direction assumption was greatly reduced (Figures 5 and 6). The 3D results presented in Figure 5 show that the integrated method combines the advantages of photogrammetry and photometric stereo to generate a highly detailed 3D reconstruction of the surface topography. The 3D reconstruction from photogrammetry (Figure 5a) provides accurate low-frequency information compared with the photometric stereo 3D reconstruction (Figure 5b), which, on the contrary, is globally deformed.
The proposed integrated algorithm recovered 3D micro-measurements (Figure 5c), reducing the global shape deformation with the aid of photogrammetry. To evaluate the accuracy potential of the proposed method in the low-frequency domain, the 3D results achieved with the basic photometric stereo (Figure 5b) and with the integrated solution (Figure 5c) were geometrically compared against the photogrammetric data (Figure 5a). To this end, the obtained 3D point clouds were aligned to the reference data using the exterior orientation parameters, which are common to the two methods, and the RMS of the Euclidean point-to-point distances was calculated. This geometric comparison makes it possible to estimate the global deformations of the recovered 3D shapes, since the low-frequency information of the reference is reliable. The results of the point-to-point comparison for the basic photometric stereo and the proposed integrated method are presented in Figure 6. Negative values in Figure 6 (towards blue) mean that the generated 3D points are below the reference surface, while positive values (towards red) mean that they are above it. The quantitative analysis shows that the proposed approach outperforms traditional photometric stereo by a significant margin. The estimated RMS of the Euclidean distances for the basic photometric stereo is 5.76 mm, with a maximum difference of 10.1 mm in the central part. On the other hand, the RMS of the Euclidean distances for the integrated method decreases remarkably to 0.81 mm, with a maximum of 2.2 mm at the bottom of the object.
The second experiment (a metallic industrial component with an overall size of 60 x 60 x 60 mm, Figure 1b) aims to demonstrate how a photometric stereo lighting system can highlight surface roughness so that these irregularities can be exploited for dense image matching. Light directionality and shadows obtained at various grazing angles are thus used as a pattern to improve the photogrammetric 3D reconstruction of textureless and reflective surfaces.
A set of images (GSD ≈ 38 µm) was captured from only two stations (a stereo pair), under ten different illuminations, in order to show the reconstruction improvements when using the minimum configuration, which generally suffers the most from poor texture signal, specular reflections, and shadows. Using more images (or camera stations) would have mitigated the random and systematic errors and provided a better signal-to-noise ratio for each object point, but it would have been a more time-consuming and expensive solution. Therefore, all point clouds generated under different grazing angles were finally merged to generate a reliable, precise, and complete 3D reconstruction of the non-collaborative surface. Figure 7 presents the results of the experiment, where the surface roughness, highlighted with directional lights (grazing angles), is exploited to locally improve the dense image matching. Three different light directions, obtained following the photometric stereo image acquisition protocol, were used to highlight roughness and microstructures on the surface. Depending on the direction of the light, different spatially varying chiaroscuro patterns can be produced by shadowing and shading phenomena. Although this example emphasizes the capability of light directionality to improve dense image matching, this procedure highlights the surface topography only locally, depending on the light direction. To address this issue, in each dataset, the individual point clouds generated under different grazing angles (Figure 7, second row) can be merged into a single 3D model (Figure 9a). Since the selected point clouds were all oriented and registered within the same reference coordinate system, merging them to create the final point cloud was a simple process. In this way, a complete 3D object reconstruction can be achieved.
However, since the accuracy of the individual models varies and depends on the light direction, the merged model can often include noisy 3D points, particularly in areas where shadows and interreflections occurred. As a result, a statistical noise/outlier reduction algorithm (Carrilho et al., 2018) was employed, which can also help to save memory and computing resources. To provide a quantitative evaluation of the achieved results, a point-to-point comparison between reference data and the generated 3D reconstruction was carried out. An Aicon/Hexagon Primescan sensor (63 µm spatial resolution, Figure 8) was used to obtain the reference 3D data. The last row of Figure 7 shows the outcome of this evaluation on three selected areas. The estimated RMS of the Euclidean distances varies depending on where the light was located. For example, the estimated RMS for the 3D reconstructions generated under the top-right (a) and top-left (c) lights was about 0.07 mm, while the value for the 3D reconstruction under the bottom-center light was about 0.05 mm. As mentioned, these variations may result from the arrangement of the microstructures on the object's surface, as well as from the lighting direction, which determines how much texture is emphasized on the surface. After merging all point clouds (Figure 9a), the obtained RMSE for the 3D reconstruction was 0.063 mm (Figure 9b), which is close to the average of the local values for each directional light.
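The merging, outlier filtering and point-to-point evaluation steps can be sketched as follows. This is an illustrative example only: the filter shown is a generic statistical outlier removal based on mean k-nearest-neighbour distances and is not necessarily the algorithm of Carrilho et al. (2018); the brute-force neighbour search is suitable only for small clouds (a spatial index such as a k-d tree would be needed for real data); all function names and default parameters are hypothetical.

```python
import numpy as np

def knn_distances(query, ref, k=1, exclude_self=False):
    """Brute-force distances from each query point to its k nearest ref points."""
    d = np.linalg.norm(query[:, None, :] - ref[None, :, :], axis=-1)
    d.sort(axis=1)
    start = 1 if exclude_self else 0   # skip the zero distance to the point itself
    return d[:, start:start + k]

def merge_clouds(clouds):
    """Concatenate point clouds that are already in the same reference system."""
    return np.vstack(clouds)

def remove_outliers(cloud, k=8, n_sigma=2.0):
    """Drop points whose mean k-NN distance exceeds mean + n_sigma * std."""
    m = knn_distances(cloud, cloud, k, exclude_self=True).mean(axis=1)
    return cloud[m <= m.mean() + n_sigma * m.std()]

def rms_to_reference(cloud, reference):
    """RMS of the nearest-neighbour (point-to-point) distances to the reference."""
    d = knn_distances(cloud, reference, k=1)[:, 0]
    return float(np.sqrt(np.mean(d ** 2)))
```

A typical use would be `rms_to_reference(remove_outliers(merge_clouds([a, b, c])), reference)`, where `a`, `b` and `c` are the point clouds obtained under the individual light directions.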

CONCLUSIONS
In this paper, we presented the preliminary results of a simple and effective method that aims to combine photogrammetry and photometric stereo techniques to achieve an accurate, detailed and deformation-free 3D reconstruction of optically non-collaborative objects. In this regard, geometric information such as scale and low-frequency shape is derived using photogrammetry in areas where photogrammetric measurements are reliable. The high spatial resolution capability of photometric stereo was exploited to acquire a finely detailed topography of the surface. In order to evaluate the proposed method in terms of low-frequency information, a point-to-point comparison between reference data and the generated 3D reconstruction was carried out. The proposed integrated algorithm recovered high-resolution details similar to photometric stereo, while inheriting geometric information from photogrammetry (RMS of Euclidean distances of 0.81 mm). Furthermore, by exploiting light directionality and the resulting cast shadows and shading, roughness and microstructures on the surface were highlighted as patterns of spatially varying intensities and used to improve the dense image matching. However, our method has some limitations that need to be addressed. First, we did not consider angular intensity attenuation in our preliminary implementation. Second, the algorithm needs some improvements for highly reflective surfaces, since we did not model all radiance parameters. Third, each LED lights the object with a different strength, so this parameter needs to be taken into account during the system calibration. Fourth, the shadows and specular reflections on the object surface could be predicted from the accurate geometry of the lighting system and the 3D shape of the object, which is not yet done.
Therefore, as future work, instead of using a digital camera mounted on a tripod and moving it manually, we will use two or more synchronized industrial cameras, calibrated and fixed in proper positions, to acquire a complete data set within a few seconds. We will also increase the number of LEDs to improve the system's flexibility for better surface inspection, taking advantage of light directionality.