3D Surface Reconstruction From Multi-Date Satellite Images

The reconstruction of accurate three-dimensional environment models is one of the most fundamental goals in the field of photogrammetry. Since satellite images provide suitable properties for obtaining large-scale environment reconstructions, a variety of Stereo Matching based methods exist to reconstruct point clouds for satellite image pairs. Recently, the first Structure from Motion (SfM) based approach has been proposed, which allows the reconstruction of point clouds from multiple satellite images. In this work, we propose an extension of this SfM based pipeline that allows us to reconstruct not only point clouds but also watertight meshes including texture information. We provide a detailed description of several steps that are mandatory to exploit state-of-the-art mesh reconstruction algorithms in the context of satellite imagery. This includes a decomposition of finite projective camera calibration matrices, a skew correction of corresponding depth maps and input images, as well as the recovery of real-world depth maps from reparameterized depth values. The paper presents an extensive quantitative evaluation on multi-date satellite images demonstrating that the proposed pipeline combined with current meshing algorithms outperforms state-of-the-art point cloud reconstruction algorithms in terms of completeness and median error. We make the source code of our pipeline publicly available.


3D Surface Reconstruction with Satellite Imagery
The creation of large-scale environment reconstructions is relevant for several application areas such as urban planning or environmental monitoring. While there is a variety of modalities and sensors to compute such models, the usage of observation satellites has recently gained significant attention because of the progress made in the field of image-based reconstruction. In contrast to pictures of ground-level and airborne devices, satellite images cover huge areas, which allows for reconstructing large-scale environment models with less effort. In contrast to active sensors like Lidar or Radar, image data inherently represents appearance information, which allows image-based reconstruction pipelines to derive not only geometry but also the corresponding texture information. Stereo Matching and Structure from Motion (SfM) are currently the predominant methods to derive geometric structures from satellite images (de Franchis et al., 2014; Zhang et al., 2019). While SfM reconstructs information of multiple images, Stereo Matching is restricted to single image pairs. Thus, SfM based approaches are inherently better suited to process large (unstructured) image sets such as multi-date satellite imagery. Since SfM obtains intrinsic and extrinsic camera parameters during reconstruction, the corresponding results make it possible to utilize dense and surface reconstruction methods to compute dense point clouds, to triangulate corresponding meshes and to perform mesh texturing. In comparison to point clouds, textured meshes oftentimes create a similar appearance with lower information redundancy and inherently possess properties that simplify the computation of consistent visualizations or intersection tests at different scales.

Paper Overview and Contribution
We propose a pipeline that allows the computation of textured meshes from multi-date satellite imagery. To apply available surface reconstruction libraries to satellite images, several preprocessing steps are required. This leads to the following main contributions: (1) We leverage PAN-sharpening (Pohl and Genderen, 1998) to combine information of image data captured by different sensors, i.e. we compute high-resolution satellite images with color information by exploiting the high resolution of panchromatic images and the color information of low-resolution multispectral images (see Section 3). Since satellites cover huge areas, the corresponding reconstructions can easily exceed the available memory capacities of modern workstations. Thus, we extract and align only a specific area of interest of the panchromatic and multispectral imagery using the metadata of the respective satellite image.
(2) The projection properties of satellite imaging sensors are typically described by Rational Polynomial Camera (RPC) models (Tao and Hu, 2001). Since the reconstruction operates only on a specific (continuous) part of each satellite image, the corresponding (complex) RPC model can be substituted with a Finite Projective Camera (FPC) model that approximates the RPC model locally. Since current state-of-the-art dense and surface reconstruction libraries do not support the skew parameter present in FPC models, we perform a specific adaptation of the reconstructed FPC models that allows us to maintain the consistency of the reconstruction results by performing a skew correction of the corresponding images and depth maps. A mathematical derivation that justifies the selected transformation is provided in Section 4.
(3) To improve the numerical stability of the depth map computation, we follow the MVS step proposed by Zhang et al. (2019). Thus, the converted (and undistorted) depth maps are defined w.r.t. a reparameterized space. Since Zhang et al. (2019) do not provide information on how to obtain the actual depth values, we provide a concise derivation in Section 5 that demonstrates how the inverse projection matrix allows us to recover the actual depth values.
(4) We combine state-of-the-art dense and surface reconstruction algorithms such as Colmap, MVE, OpenMVG & OpenMVS (Moulon et al., 2013; Cernea, 2021) and Meshroom (AliceVision, 2018) to reconstruct textured meshes from multi-date satellite imagery (see Table 1 for the main components of the proposed pipeline). We demonstrate the validity of our approach in Section 6 by performing a comprehensive quantitative evaluation, which shows that the computed meshes exceed (in terms of completeness and median error) dense point clouds reconstructed with previously presented methods.
(5) To foster future research we make the source code of our pipeline publicly available.

Related Work
In the current literature on satellite image reconstruction there are two types of approaches: Stereo Matching and SfM. While the majority of methods such as (Kuschk, 2013), S2P (de Franchis et al., 2014) and ASP (Beyer et al., 2018) rely on Stereo Matching to compute 3D point clouds from satellite images, VisSat (Zhang et al., 2019) recently presented an SfM based approach to tackle this problem. While SfM allows the computation of a three-dimensional scene representation reflecting the information of multiple images, Stereo Matching is limited to reconstructing point clouds for single image pairs. Applying Stereo Matching in the context of multi-date satellite imagery requires additional post-processing steps to fuse the individual point clouds into a single scene representation (see Facciolo et al. (2017)). Because the number of image pairs increases quadratically with the number of input images, Stereo Matching based methods do not scale well for large image sets. SfM tackles the quadratic increase by exploiting specific algorithms and data structures such as scene graphs to accelerate the computation. Thus, SfM requires substantially less computation time than Stereo Matching to reconstruct multi-date satellite images; a corresponding evaluation is provided by Zhang et al. (2019). In addition, the output of SfM allows us to directly leverage dense and surface reconstruction algorithms for dense point cloud computation, meshing and texturing. Current state-of-the-art dense reconstruction pipelines have been presented in Colmap, Meshroom (AliceVision, 2018), MVE and OpenMVS (Cernea, 2021), which build on popular algorithms for dense reconstruction (Goesele et al., 2007; Langguth et al., 2016; Barnes et al., 2009), meshing (Labatut et al., 2009; Kazhdan and Hoppe, 2013; Ummenhofer and Brox, 2017; Jancosek and Pajdla, 2011) and texturing (Burt and Adelson, 1983; Waechter et al., 2014).
Table 1 shows a comparison of the different image-based (general purpose) pipelines and our proposed approach for multi-date satellite image reconstruction.

SATELLITE RECONSTRUCTION PIPELINE
In the following, we describe the proposed pipeline for reconstructing textured surface models from multi-date satellite imagery. The dependencies of the different pipeline components and their relationships are depicted in Fig. 1. Processing multi-date images of observation satellites with current state-of-the-art image-based reconstruction libraries easily exceeds the memory capacities of modern workstations. To control memory requirements and processing times, our pipeline determines an area of interest (AOI) in a given set of panchromatic and multispectral image pairs using a single bounding box defined in UTM coordinates. The panchromatic band is agnostic to color information, but shows a lower ground sample distance than the color bands captured by the multispectral sensor.
Since many state-of-the-art feature descriptors such as Root-SIFT (Arandjelovic, 2012) exploit image gradients instead of plain color information, we utilize the panchromatic images to reconstruct the scene geometry, i.e. to perform the SfM, MVS and meshing computations. The SfM component exploits the approach by Zhang et al. (2019) to locally approximate the RPC model (Tao and Hu, 2001) with a finite projective camera model, whose calibration matrix is shown in (1). In order to represent the obtained SfM reconstruction with camera models containing no skew parameters (i.e. 10 degrees of freedom), we apply an intrinsic parameter decomposition to the registered cameras as well as a skew correction to the corresponding input images and depth maps. Further details are provided in Section 4.1 and Section 4.2. This representation allows us to utilize current state-of-the-art dense and surface reconstruction libraries to compute dense point clouds and non-textured meshes. In order to texture the reconstructed surfaces, we leverage PAN-Sharpening (Pohl and Genderen, 1998) to compute high-resolution satellite images with color information by exploiting the high resolution of panchromatic images and the color information of low-resolution multispectral images. Analogously to the panchromatic images, we apply the skew correction step to the PAN-sharpened images in order to utilize current texturing libraries.
We considered several state-of-the-art SfM pipelines to reconstruct satellite images, as shown in Table 1. The evaluations conducted by Knapitsch et al. (2017) and Stathopoulou et al. (2019) suggest that Colmap is currently one of the top performing SfM pipelines. Conveniently, there already exists VisSat (Zhang et al., 2019), a satellite-specific adaptation of Colmap, that allows us to perform a sparse reconstruction of multi-date satellite images. We considered several methods (Kazhdan and Hoppe, 2013; Ummenhofer and Brox, 2017; Jancosek and Pajdla, 2011) to reconstruct a watertight surface of the environment. Our quantitative evaluation in Section 6 suggests that the algorithm by Ummenhofer and Brox (2017) is a reasonable candidate to perform the surface reconstruction in the context of satellite images. As shown in Table 1, our pipeline uses the method by Waechter et al. (2014) in the texturing step.

Area of Interest Extraction
As previously mentioned, Earth observation satellites typically use panchromatic and multispectral sensors for image acquisition to capture different resolutions and wavelength spectra. We use the provided metadata of the corresponding multispectral image to extract the blue, green and red color bands. Further, we filter out all panchromatic and multispectral images with high cloud coverage (i.e. > 50%) using the corresponding metadata to reduce the processing time.
To compute the area of interest for each panchromatic or multispectral image, we use the corresponding RPC model to project a geo-registered rectangle (provided by the user) to pixel locations.
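The projection of the geo-registered rectangle can be sketched as follows. This is a minimal, self-contained illustration: the coefficient layout assumes the common RPC00B term ordering, and all names and values are hypothetical rather than taken from the paper's implementation.

```python
import numpy as np

def rpc_terms(L, P, H):
    # Cubic polynomial terms of a rational polynomial camera model.
    # Term ordering follows the common RPC00B convention (an assumption).
    return np.array([
        1, L, P, H, L*P, L*H, P*H, L*L, P*P, H*H,
        L*P*H, L**3, L*P*P, L*H*H, L*L*P, P**3, P*H*H,
        L*L*H, P*P*H, H**3,
    ])

def rpc_project(lon, lat, h, coeffs):
    """Project geographic coordinates to (row, col) pixel locations
    using a rational polynomial camera (RPC) model."""
    # Normalize inputs with the offsets/scales shipped in the metadata.
    L = (lon - coeffs["lon_off"]) / coeffs["lon_scale"]
    P = (lat - coeffs["lat_off"]) / coeffs["lat_scale"]
    H = (h - coeffs["h_off"]) / coeffs["h_scale"]
    t = rpc_terms(L, P, H)
    row = np.dot(coeffs["row_num"], t) / np.dot(coeffs["row_den"], t)
    col = np.dot(coeffs["col_num"], t) / np.dot(coeffs["col_den"], t)
    # De-normalize to pixel coordinates.
    return (row * coeffs["row_scale"] + coeffs["row_off"],
            col * coeffs["col_scale"] + coeffs["col_off"])

def aoi_pixel_bbox(corners_lonlat, h, coeffs):
    """Pixel bounding box of a geo-registered rectangle, obtained by
    projecting its corners with the RPC model."""
    pts = [rpc_project(lon, lat, h, coeffs) for lon, lat in corners_lonlat]
    rows = [p[0] for p in pts]
    cols = [p[1] for p in pts]
    return min(rows), min(cols), max(rows), max(cols)
```

In practice the rectangle corners would come from the user-defined UTM bounding box (converted to geographic coordinates) and the coefficients from the satellite image metadata.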

Tone Mapping
Since the satellite images are defined as high dynamic range images with long-tailed intensity distributions, a naive scaling results in images with low contrast. To tackle this issue we perform the tone mapping procedure proposed by Zhang et al. (2019) for panchromatic images, which includes a gamma correction and a filtering of intensity values below the 0.5 percentile and above the 99.5 percentile. The gamma correction is defined by I_g = I^{1/2.2}, where I_g denotes the gamma corrected image and I the original satellite image. To extend this approach to multispectral images, we apply the percentile filtering to each channel individually. This is necessary to avoid that a single color band dominates the filtering process, which would potentially result in unintended color shifts.
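The tone mapping step described above can be sketched in a few lines of NumPy. This is a simplified illustration; the function name and the 8 bit output choice are our assumptions.

```python
import numpy as np

def tone_map(img, p_low=0.5, p_high=99.5, gamma=2.2):
    """Percentile clipping and gamma correction (I_g = I**(1/gamma)),
    applied per channel so that no single color band dominates."""
    img = np.asarray(img, dtype=np.float64)
    # Treat a 2D panchromatic image as a single channel.
    channels = img.reshape(img.shape[0], img.shape[1], -1)
    res = np.empty_like(channels)
    for c in range(channels.shape[2]):
        band = channels[..., c]
        lo, hi = np.percentile(band, [p_low, p_high])
        band = np.clip(band, lo, hi)            # filter long-tailed values
        band = (band - lo) / max(hi - lo, 1e-12)
        res[..., c] = band ** (1.0 / gamma)     # gamma correction
    return (res.reshape(img.shape) * 255.0).astype(np.uint8)
```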

PAN-Sharpening
Since the RPC models are geo-registered with a set of image specific ground control points, we are able to project the area of interest to each panchromatic and multispectral image pair. By extracting the resulting sub-parts, we obtain an aligned image pair that represents the same area. Because of the alignment we are able to exploit PAN-Sharpening (Pohl and Genderen, 1998) to leverage the high resolution of the panchromatic image and the color information of the multispectral image to compute a high-resolution image including color information.
Recently published PAN-sharpening approaches (Fu et al., 2019; Deng et al., 2019) evaluate their algorithms only on images with low resolution. By conducting several experiments we observed that processing high-resolution imagery with these methods is infeasible because of their high computational requirements. For instance, the implementation by Fu et al. (2019) required more than 30 s on our workstation to process a 100 × 100 multispectral and a 400 × 400 panchromatic image. Instead, we utilize a weighted variant (GDAL/OGR contributors, 2020) of the widely used Brovey transform (Pohl and Genderen, 1998). While the algorithm produces overall good results, it occasionally shows artifacts for individual pixels or small pixel regions, caused by strongly reflecting surfaces such as vehicles. Since our pipeline computes the final mesh textures with the algorithm proposed by Waechter et al. (2014), a photo-consistency check ensures that such artifacts usually do not hamper the final appearance.
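A minimal sketch of a weighted Brovey transform follows, assuming the multispectral image has already been upsampled to the panchromatic resolution. The weights are illustrative and not the GDAL defaults.

```python
import numpy as np

def weighted_brovey(pan, ms, weights=(0.25, 0.25, 0.25)):
    """Weighted Brovey PAN-sharpening sketch. `pan` is the
    high-resolution panchromatic image (H, W); `ms` is the
    multispectral image upsampled to the same size (H, W, 3).
    Each color band is rescaled by the ratio of the panchromatic
    intensity to a weighted sum of the color bands."""
    pseudo_pan = np.tensordot(ms, np.asarray(weights), axes=([2], [0]))
    ratio = pan / np.maximum(pseudo_pan, 1e-12)  # avoid division by zero
    return ms * ratio[..., None]
```

When the weighted sum of the color bands matches the panchromatic intensity exactly, the transform leaves the colors unchanged; elsewhere it injects the panchromatic detail into the color image.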

SUBSTITUTION OF RECONSTRUCTION RESULTS
We leverage the approach by Zhang et al. (2019) in order to locally approximate each pushbroom camera model with a finite projective camera (FPC) model (Hartley and Zisserman, 2004). Current state-of-the-art libraries for surface reconstruction and texturing such as MVE or OpenMVS (Cernea, 2021) do not support the skew parameter contained in finite projective camera models. We address this issue by substituting the reconstructed FPC models with simpler calibration matrices containing no skew factors. To maintain the consistency of the reconstruction we perform a skew correction of the corresponding images and depth maps. In the following, we derive a decomposition of the FPC calibration matrix that justifies the selected camera model substitution and the conducted image and depth map transformation.

Decomposition of Finite Projective Camera Models
The calibration matrix of the FPC model is shown in (1), where f_x and f_y represent the focal length of the camera in terms of pixel dimensions in the x and y direction. Similarly, p_x and p_y define the principal point with respect to the two pixel dimensions. The value s reflects the skew of the camera: if s ≠ 0, the pixels are skewed such that the x- and y-axes are not perpendicular. We decompose the calibration matrix K_p of the FPC into a simplified calibration matrix K_p' and an affine transformation T_p'→p with K_p = T_p'→p K_p' ⇔ T_p'→p = K_p K_p'^{-1}. In general, the transformation T_p'→p = K_p K_p'^{-1} of two FPC models K_p and K_p' is given by (2).
The projection of a point p_c in camera coordinates onto a homogeneous image point p_I may now be reformulated according to (3). Equation (3) shows that the original calibration matrix K_p can be substituted with an arbitrary finite projective camera model K_p' by transforming the image projections p_I with T_p'→p^{-1}.
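The decomposition and the substitution property can be verified numerically. The calibration values below are toy numbers; the skew-free principal point uses the translation-free choice p_x − s·p_y/f_y derived in Section 4.2.

```python
import numpy as np

def calib(fx, fy, px, py, s=0.0):
    """Finite projective camera calibration matrix, cf. eq. (1)."""
    return np.array([[fx, s, px],
                     [0.0, fy, py],
                     [0.0, 0.0, 1.0]])

# Original FPC with skew and a simplified (skew-free) substitute.
K_p = calib(fx=1200.0, fy=1180.0, px=510.0, py=495.0, s=3.5)
K_s = calib(fx=1200.0, fy=1180.0,
            px=510.0 - 3.5 * 495.0 / 1180.0, py=495.0)

# Affine transformation relating both models: K_p = T @ K_s.
T = K_p @ np.linalg.inv(K_s)

# With the chosen principal point, T is a pure shear without any
# translational component (cf. eq. (5)).
assert np.allclose(T, [[1.0, 3.5 / 1180.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]])

# Projecting a camera-space point with K_p is equivalent to projecting
# with K_s and then applying T (cf. eq. (3)).
p_c = np.array([0.4, -0.2, 5.0])
assert np.allclose(K_p @ p_c, T @ (K_s @ p_c))
```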

Camera Substitution and Skew Correction
Many state-of-the-art MVS, surface reconstruction and texturing libraries, including the (official) implementations used in our evaluation (see Section 6), support only skew-free camera models K_s, i.e. s' must be 0. The choice of f_x', f_y', p_x' and p_y' is not restricted. By defining (f_x', f_y', p_x', p_y', s') according to (f_x', f_y', p_x', p_y', s') := (f_x, f_y, p_x, p_y, 0) we obtain the transformation T_s→p shown in (4). However, this transformation contains a translational component, i.e. a part of the pixels would be translated outside of the image boundaries and important information would be lost. Instead, we define K_s according to (f_x', f_y', p_x', p_y', s') := (f_x, f_y, p_x − s·p_y/f_y, p_y, 0) to obtain a translation-free skew-correcting matrix T_s→p as shown in (5). For each registered image I_p (corresponding to K_p) we compute a skew-free image I_s and a skew-free depth map D_s (corresponding to K_s) using I_s = T_s→p^{-1} · I_p and D_s = T_s→p^{-1} · D_p. While removing the skew factors, we apply a cubic interpolation to compensate for artifacts caused by the discrete nature of pixel based image representations. See Fig. 1 for a comparison of the original and the corresponding skew-free image.
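The image warp with T_s→p^{-1} can be sketched with SciPy. `remove_skew` is a hypothetical helper, not the paper's implementation; it applies the shear x_p = x_s + (s/f_y)·y_s, y_p = y_s with cubic interpolation by default.

```python
import numpy as np
from scipy.ndimage import affine_transform

def remove_skew(image, s, fy, order=3):
    """Warp an image to remove the skew of its camera model.
    The skew-free pixel (x_s, y_s) samples the original image at
    x_p = x_s + (s / fy) * y_s, y_p = y_s."""
    shear = s / fy
    # scipy's affine_transform maps OUTPUT (row, col) coordinates to
    # INPUT coordinates: (row_in, col_in) = M @ (row_out, col_out).
    M = np.array([[1.0, 0.0],
                  [shear, 1.0]])
    return affine_transform(image, M, order=order, mode="nearest")
```

The same warp, applied with the per-pixel depth values as intensities, yields the skew-free depth maps.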

REAL DEPTH MAP RECOVERY
Because the distances between the registered cameras and the reconstructed point cloud are extremely large in the context of satellite images, the values in the corresponding depth maps are compressed to an interval that is far away from the origin. To improve the numerical stability of the depth map reconstruction, it is reasonable to perform the computations in a reparameterized space. We follow the reparameterization proposed by Zhang et al. (2019), which uses plane-plus-parallax (Irani et al., 1998). Since Zhang et al. (2019) do not describe how to obtain the real depth map values, we provide a concise explanation in the following. The reparameterization is defined according to (6), where [x, y, z, 1]^T denotes a point in homogeneous world coordinates and [u, v, 1]^T a point in homogeneous image coordinates. P represents the (reparameterized) projection matrix from world to image coordinates, Z defines the conventional depth with Z = P_31 x + P_32 y + P_33 z + P_34, Z̄ indicates the average of the conventional depth values of the points computed in the SfM step, (0, 0, 1, d) defines a plane below the scene, and m denotes the reparameterized depth with m = (Z̄z − Z̄d)/Z. Following this reparameterization, the MVS step yields reparameterized depth maps representing vectors of the form [u, v, m]^T. By rewriting (6) according to (7), it becomes apparent that the actual Z value can be obtained using only the reparameterized depth value m and the fourth (the last) row of P^{-1}, in the following denoted as (P^{-1})_4.
To increase numerical stability, Zhang et al. (2019) normalize P and P^{-1}. Let n_{P^{-1}} denote the normalization factor of P^{-1}. We obtain the real depth values using n_{P^{-1}} · Z. By computing K_s, I_s (see Section 4.1) and D_s for each registered image, we are able to utilize established libraries for MVS and texturing in the context of satellite images, such as OpenMVS (Cernea, 2021).
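The depth recovery via the last row of P^{-1} can be checked with a toy 4x4 projection matrix. All values are illustrative, and the fourth row Z̄·(0, 0, 1, −d) encodes our reading of the plane parameterization; the normalization step is omitted.

```python
import numpy as np

# Toy setup (illustrative values, not from a real satellite camera).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
P3 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])  # world frame == camera frame
Z_bar, d = 5.0, -1.0  # average depth and plane parameter (assumed)

# 4x4 reparameterized projection matrix: the first three rows project
# to Z * (u, v, 1); the fourth row yields Z * m = Z_bar * (z - d).
P = np.vstack([P3, Z_bar * np.array([0.0, 0.0, 1.0, -d])])
P_inv = np.linalg.inv(P)

X = np.array([0.3, -0.2, 4.0, 1.0])  # homogeneous world point
uvm = P @ X                          # = Z * [u, v, 1, m]
Z_true = uvm[2]                      # conventional depth
q = uvm / Z_true                     # reparameterized pixel [u, v, 1, m]

# The fourth row of P^{-1} recovers the true depth, since the fourth
# component of P^{-1} (Z * q) must equal 1:
Z_rec = 1.0 / (P_inv[3] @ q)
assert np.isclose(Z_rec, Z_true)
```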

EXPERIMENTS AND EVALUATION
For our experiments we use multi-date satellite imagery of the IARPA MVS3DM dataset (Bosch et al., 2016). The ground sample distance of the panchromatic and the multispectral images is approximately 0.3 m and 1.3 m, respectively.

Quantitative Evaluation
To compare the proposed pipeline with previously reported results we utilize the benchmark presented by Zhang et al. (2019), which is based upon the IARPA MVS3DM dataset (Bosch et al., 2016). The corresponding ground truth consists of a set of geo-registered airborne Lidar scans covering different urban areas near San Fernando, Argentina. Zhang et al. (2019) follow the evaluation protocol by Bosch et al. (2016) to perform a metric comparison of the MVS and the ground truth point cloud. To this end, both point clouds are mapped to a predefined geo-registered grid representing a height map with cell sizes of 0.5 × 0.5 m². The value of each grid cell is defined by the maximum height value of all corresponding points. Since the height map comparison is subject to the initial geo-registration, the evaluation protocol refines the alignment before performing the actual evaluation. Bosch et al. (2016) define two metric scores that represent the median error (ME) and the completeness (CP) of the reconstructed height map. Concretely, CP denotes the percentage of cells with an error of less than 1 m.
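The two metrics can be sketched as follows. This is a simplified version that omits the alignment refinement and the height map rasterization; the function name and the NaN handling are our own assumptions.

```python
import numpy as np

def height_map_metrics(dsm, gt, threshold=1.0):
    """Completeness (CP, percent) and median error (ME, meters) of a
    reconstructed height map against a ground-truth height map.
    Cells missing in the reconstruction (NaN) count as incomplete;
    ME is computed over cells present in both maps."""
    gt_valid = ~np.isnan(gt)
    both = gt_valid & ~np.isnan(dsm)
    err = np.abs(dsm[both] - gt[both])
    complete = np.zeros(gt.shape, dtype=bool)
    complete[both] = err < threshold
    cp = 100.0 * complete[gt_valid].mean()
    me = np.median(err)
    return cp, me
```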
In order to exploit this benchmark for mesh evaluation, we extract a set of points from the mesh surface. Since the reconstructed vertex positions usually show a non-uniform (and texture dependent) distribution, they are only partly suited to represent the full surface geometry. Thus, we utilize Poisson disk sampling (Cook, 1986) to extract a set of points following a random distribution while ensuring that the distance between any pair of samples is larger than a certain predefined threshold. We follow previous practice (Zhang et al., 2019) and fill small holes with adjacent values. In order to determine a state-of-the-art surface reconstruction approach for multi-date satellite imagery, we quantitatively evaluate the proposed pipeline with several modern meshing algorithms. Table 2 reports the corresponding ME and CP metrics based on surface points extracted with Poisson disk sampling. Additionally, Table 3 provides an analysis using only the reconstructed mesh vertices as surface representation. This allows us to compare the reconstructions with the previous results presented by Zhang et al. (2019). We utilize the default mesh reconstruction parameters whenever possible; an overview of deviating parameters is presented in Table 4. Since points extracted from mesh surfaces with Poisson disk sampling are more evenly distributed than the reconstructed mesh vertices, they provide a better representation of the corresponding surface. Thus, Table 2 contains the main results of our quantitative evaluation. By analyzing the corresponding outcome, we observe that our pipeline achieves the best results with GDMR (Ummenhofer and Brox, 2017). This approach yields not only the highest completeness score, but also state-of-the-art median height errors. This result is remarkable considering the fact that GDMR relies on the depth map fusion presented in MVE, which produces noisy point clouds compared to the fused points obtained with VisSat.

Table 2. Quantitative evaluation of the proposed pipeline with different state-of-the-art surface reconstruction algorithms. The surface points used to compute the metrics are extracted with Poisson disk sampling. The evaluated surface reconstruction methods include Poisson (Kazhdan and Hoppe, 2013), Colmap (Labatut et al., 2009), OpenMVS (Cernea, 2021; Jancosek and Pajdla, 2014), FSSR and GDMR (Ummenhofer and Brox, 2017). FSSR and GDMR require scale information for each point in the fused point cloud to triangulate the corresponding mesh; in this case we substitute the depth map fusion approach of VisSat (Zhang et al., 2019) with the equivalent processing step of MVE (Goesele et al., 2007). The metrics completeness (CP) ↑ and median error (ME) ↓ follow the definition by Bosch et al. (2016).

Table 3. Quantitative evaluation of the proposed pipeline with different state-of-the-art surface reconstruction algorithms. The reconstructed mesh vertices are used as surface representation to compute the metrics. See Table 2 for a description of the different algorithms and metrics. As baseline for our comparison we use the vertices of the fused point clouds computed with VisSat (Zhang et al., 2019) and MVE (Goesele et al., 2007). The top three results are highlighted with green (first), blue (second) and red (third).
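The Poisson disk sampling used for the surface point extraction can be illustrated with a naive dart-throwing subsampler. This is an O(n·k) sketch with hypothetical names; production implementations sample directly on the mesh surface and use spatial acceleration structures.

```python
import numpy as np

def poisson_disk_subsample(points, radius, seed=0):
    """Naive dart-throwing Poisson disk subsampling: visit candidate
    points in random order and keep a point only if it is at least
    `radius` away from every point kept so far."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(points))
    kept = []
    for i in order:
        p = points[i]
        if all(np.linalg.norm(p - q) >= radius for q in kept):
            kept.append(p)
    return np.array(kept)
```

By construction, the distance between any pair of retained samples is at least `radius`, which yields the evenly distributed surface points used for the metric computation.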

Meshing
In Table 3 we use the mesh vertices as the point set representing the reconstructed surfaces, which allows a fair comparison of the triangulated meshes with the fused point clouds computed by the approach of Zhang et al. (2019). We observe that the reconstructed mesh vertices outperform the fused points for both metrics, which demonstrates the benefits of the proposed pipeline. Note that our approach (again) achieves the highest completeness scores with GDMR. However, the configurations with Poisson (Kazhdan and Hoppe, 2013) and Colmap obtain lower median errors. The different rankings in Table 2 and Table 3 demonstrate the importance of selecting an appropriate point sampling method. Note that the evaluation in Table 3 is biased by the number of mesh vertices; for instance, higher vertex numbers will most likely affect the resulting completeness scores. Table 5 gives an overview of the corresponding numbers of mesh vertices. A comparison of Table 2 and Table 3 shows that GDMR sacrifices accuracy of individual vertex positions to obtain faces closer to the actual geometry, which again emphasizes the relevance of an appropriate point sampling method. Because of the non-deterministic behavior of SfM, we obtained slightly different results in the depth map computation step compared to the original values reported by Zhang et al. (2019).

Qualitative Evaluation
Fig. 2 shows several meshes reconstructed with the proposed approach using GDMR for meshing, which represents the best configuration of our pipeline. The different reconstructions correspond to the sites used for the quantitative evaluation. Fig. 3 visualizes the plain geometry of different meshes obtained with Poisson, Colmap and GDMR. We observe that the reconstructed surfaces exhibit different characteristics (e.g. varying levels of completeness, smoothness or mesh granularity), which are consistent with the results reported in Section 6.1.

Figure 2. Qualitative evaluation of the proposed pipeline using GDMR (Ummenhofer and Brox, 2017) for three sites contained in the multi-date satellite image dataset provided by Bosch et al. (2016): (a) textured mesh of site 1, (b) textured mesh of site 2, (c) textured mesh of site 3.

CONCLUSION
This paper presents an approach for reconstructing textured watertight meshes from multi-date satellite imagery, including a detailed description of the different pipeline steps and their dependencies. In contrast to previous work, our method leverages current dense and surface reconstruction algorithms to obtain a surface representation of the scene. We present a decomposition of FPC calibration matrices that allows us to substitute the camera models in the SfM reconstruction with simpler camera models containing no skew factors. To ensure consistent reconstruction results we apply corresponding skew corrections to the input images and depth maps. These adjustments enable us to utilize state-of-the-art mesh reconstruction algorithms in the context of satellite images. The paper presents an extensive quantitative evaluation of the pipeline on multi-date satellite images. By sampling points from the reconstructed mesh surfaces we compare our surface reconstructions with point clouds obtained with previously proposed SfM and MVS methods. Our evaluation shows that current state-of-the-art meshing algorithms outperform previous results in terms of completeness and median error. To foster future research we make the source code of the proposed pipeline publicly available.

Figure 3. Qualitative evaluation of our pipeline using multi-date satellite images by Bosch et al. (2016); (c) shows the mesh of site 3 reconstructed with GDMR. We observe that Poisson (Kazhdan and Hoppe, 2013), Colmap and GDMR (Ummenhofer and Brox, 2017) reconstruct meshes with different shape characteristics.