ESTIMATING HEIGHTS OF BUILDINGS FROM GEOTAGGED PHOTOS FOR DATA ENRICHMENT ON OPENSTREETMAP

Abstract. Building footprints and heights are essential for reconstructing 3D building models. Footprints are readily available from OpenStreetMap (OSM), but building heights are usually missing. To supply height information for buildings in OSM, this paper proposes a geometric method that estimates building height from geotagged photographs. The method exploits the geometric relationship between the perspective centre of a geotagged photo and the photographed building: once a photo has been matched with OSM, the building height can be estimated from the building's ratio of height to width. The method consists of three parts. First, photos are geometrically corrected automatically by tracking vanishing points. Second, a semi-automatic scene search matches each geotagged photo with OSM: the geographic coordinates of the photo locate the photographic scene, and the building edges visible in the photo identify the corresponding footprints in OSM. Finally, the building height is calculated from the length of the associated edge of the OSM footprint. We tested the method with Flickr photos and OSM data for London and verified the robustness of the geometric model. The experiments show that the method is suitable, as the estimated heights have a plausible ratio to the building widths, consistent with the corrected photos. In particular, the automatic geometric correction achieves results as good as manual correction.



INTRODUCTION
With citizens acting as sensors, volunteered geographic information (VGI) (Goodchild, 2007) provides massive data for updating and reconstructing 3D building models worldwide (Over et al., 2010; Bagheri et al., 2019). For instance, Flickr photos are directly georeferenced and can thereby provide information about real 3D buildings. OSM, in turn, provides building footprints, i.e. vector elements formed by the two-dimensional outline of each building. Taking these vector elements as a basis, 3D buildings at CityGML LOD1 (Gröger et al., 2012) can be reconstructed by assigning heights. The main difficulty is obtaining accurate building heights. Existing methods that extract height information from multi-source sensor data are reliable, but they require professional data collection, such as oblique photogrammetry, which entails a large workload for urban 3D reconstruction. Another approach is to extract building heights from geotagged photos, which, as a kind of VGI, provide a large number of building images (Basiouka, 2014). Fan and Zipf (2016) take the floor height to be 3.5 m, so that the building height can be estimated from the number of floors. This estimate is obviously coarse. To extract building heights accurately, it is critical to remove the perspective distortion in the photos.
Past research on automatic image correction has mainly focused on two approaches. The first corrects a target with a fixed aspect ratio, such as a license plate or a magazine; the second corrects a photo by adjusting the distortion parameters of the camera. Sihombing et al. (2016) correct license plate images with a plane homography matrix, exploiting the fixed size and aspect ratio of license plates: the four corners of the plate are located by morphological operations, and mapping these corners to fixed positions corrects the image. Wang et al. (2005) and Brown and Tsoi (2006) each propose a method for correcting rectangular planar graphics (e.g. posters and maps). Both methods eliminate lens distortion, using least-squares error estimation and the Hough transform to restore straight lines that the distortion has bent into curves, and then generate the corrected images by texture fusion. Liang et al. (2008) propose a boundary interpolation method for the tilt of printed manuscripts. It exploits the cross-correlation of the text content in printed images: by establishing a cross-correlation function, image regions can be randomly selected for boundary detection, and the deformation of the printed page is then corrected according to the slope of the detected edges. Habib (2003) proposes a calibration method for low-cost cameras, which first extracts lines from an image and then uses a test field to calculate the distortion parameters; the image is corrected according to a parametric model for different distortion conditions. Geetha and Murali (2013) detect the corners of an image with an edge detection algorithm and reassign them to a fixed-size plane through a plane homography and perspective transformation.
Chaudhury and Diverdi (2015) also use edge extraction to correct images. In addition, many researchers have proposed image correction methods for remote sensing and stereo matching (e.g. Tuytelaars et al., 1998; Bensoukehal, 2015; Basta, 2017).
The methods above work well for images with special characteristics, e.g. the known aspect ratio of a rectangle or a license plate. Building images, however, rarely come with such prior knowledge, and when the camera parameters are unavailable these methods do not work well. To obtain a general correction method for building images in various scenes, this paper proposes a geometric model based on tracking vanishing points (Elloumi et al., 2013). By calculating the vertical and horizontal deflection angles, the photographs can be corrected automatically, and the building height can then be calculated from the building's ratio of height to width. Before this calculation, the photo must be matched with the corresponding building footprint in OSM to obtain the actual length of the building. Because GPS drift means that geotagged photos may not record accurate coordinates, a semi-automatic method is proposed for matching the photos with OSM.
The two main contributions of this work are:
• a robust geometric correction model for geotagged photos;
• a semi-automatic method for matching geotagged photos with OSM.
The remainder of this paper is structured as follows. Section 2 introduces the geometric correction model. Section 3 presents a semi-automatic method for matching photos with OSM. Section 4 discusses the experimental results in detail. Finally, Section 5 summarizes the work and outlines possible future work.

THE GEOMETRIC CORRECTION MODEL
In the real world, the viewing direction from the perspective centre is rarely perpendicular to the building plane. Consider the scene in Figure 1(a), where a reference coordinate system O-XYZ is set with the X-axis parallel to the bottom edge of the building and the Z-axis parallel to its vertical edges. The deflection angles of the perspective centre relative to the building plane are then defined as follows: 1) the left view of the scene in Figure 1(e) shows the vertical deflection angle φ; 2) the viewing direction of the perspective centre makes an angle ω with the X-axis, which gives the horizontal deflection angle shown in Figures 1(b) and 1(d); 3) κ denotes the skew angle, caused by a rotation about the central optical axis. To obtain an orthogonal view of a photo, the deflection angles φ, ω and κ must first be corrected. In photogrammetric terms, these deflection angles are three of the exterior orientation parameters, and the deformation of the photo can be understood as the deformation caused by these three parameters. On this basis, we propose a geometric correction model, namely the projective geometric model in Figure 1. The collinearity equation, Equation (1), connects the photos with the real world.
In Equation (1), the rotation matrix is calculated from the deflection angles φ, ω and κ, and (XS, YS, ZS) is the position of the camera in the ground coordinate system. For convenience, we set XS, YS and ZS to 0, because the real coordinates of the building are not needed when correcting the photo. On this basis, an image-space auxiliary coordinate system can be built, in which (XA, YA, ZA) are the 3D coordinates corresponding to a point in the photo. The following section explains how these parameters are obtained.
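As an illustration of how a rotation matrix can be built from the three deflection angles, the sketch below uses the common φ-ω-κ convention of photogrammetry textbooks (φ about the Y-axis, ω about the X-axis, κ about the Z-axis) and maps an image point (x, y) with focal length f to image-space auxiliary coordinates as (XA, YA, ZA) = R·(x, y, −f). The axis convention and the function names are assumptions for illustration, not the authors' code; the paper's own assignment of φ and ω to axes may differ.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def rotation_matrix(phi, omega, kappa):
    """R = R_phi(Y) @ R_omega(X) @ R_kappa(Z), angles in radians.

    Assumed phi-omega-kappa convention with the Y-axis as primary axis.
    """
    cp, sp = math.cos(phi), math.sin(phi)
    co, so = math.cos(omega), math.sin(omega)
    ck, sk = math.cos(kappa), math.sin(kappa)
    r_phi = [[cp, 0.0, -sp], [0.0, 1.0, 0.0], [sp, 0.0, cp]]
    r_omega = [[1.0, 0.0, 0.0], [0.0, co, -so], [0.0, so, co]]
    r_kappa = [[ck, -sk, 0.0], [sk, ck, 0.0], [0.0, 0.0, 1.0]]
    return matmul(matmul(r_phi, r_omega), r_kappa)

def image_to_auxiliary(x, y, f, phi, omega, kappa):
    """(XA, YA, ZA) = R @ (x, y, -f) for an image point (x, y) and focal length f."""
    r = rotation_matrix(phi, omega, kappa)
    v = (x, y, -f)
    return tuple(sum(r[i][k] * v[k] for k in range(3)) for i in range(3))
```

With all three angles at zero the mapping reduces to (x, y, −f), i.e. an untilted camera.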

Geometric relationship between perspective centre and building plane
The geotagged photos shared by volunteers cover a wide variety of scenes and viewing angles, which makes it difficult to determine the photo parameters, especially the field of view (FoV). Some photos provide Exif data from which the FoV can be derived; for the others, the parameters must be estimated. We investigated the FoV of smartphone cameras and determined that the standard FoV is 45 degrees, which is also the standard value provided by most camera manufacturers.
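When Exif data are present, the FoV is usually derived from the recorded focal length rather than read directly. A minimal sketch, assuming the Exif tag FocalLengthIn35mmFilm and a 36 mm-wide full-frame reference for the horizontal FoV, falling back to the 45-degree default when no focal length is available; the focal length in pixels then follows from the pinhole model. Function names are illustrative.

```python
import math

DEFAULT_FOV_DEG = 45.0  # fallback value stated in the text

def horizontal_fov_deg(focal_35mm=None):
    """Horizontal FoV in degrees from a 35 mm-equivalent focal length (mm).

    Assumes a 36 mm-wide full-frame reference; returns the 45-degree
    default when no focal length is available.
    """
    if not focal_35mm:
        return DEFAULT_FOV_DEG
    return math.degrees(2.0 * math.atan(36.0 / (2.0 * focal_35mm)))

def focal_length_px(image_width_px, fov_deg):
    """Pinhole focal length in pixels for a given horizontal FoV."""
    return (image_width_px / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
```

For example, a 35 mm-equivalent focal length of about 43.5 mm corresponds to the 45-degree default.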
As shown in Figures 1(c) and 1(d), for a given image plane, the relative position of the camera and the image is determined once the FoV and the deflection angles of the camera are known. In practice, photos usually do not record the deflection angles. According to the principle of perspective transformation, the deformation of an image changes with the camera position, so we derive the deflection angles by analysing the geometric relationship between the deformation in the image and the camera position.
The left part of Figure 2 shows the relationship between the vertical deflection angle and the vertical deformation of the image. In Figure 2, the total height of the building is H, the heights corresponding to the upper and lower halves of the image are H1 and H2, and the focal length is f. For an assumed deflection angle φ, the relative height of the building in the photo coordinate system can be calculated with Equation (2).
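The idea of computing H1 and H2 for an assumed φ and then searching for the φ that fits the observed deformation can be sketched with a simple pinhole model. This is a stand-in model, not the authors' Equation (2): it assumes a camera pitched by φ in front of a vertical building plane, with vertical half-FoV θ. Under these assumptions H1/H2 = (tan(φ+θ) − tan φ)/(tan φ − tan(φ−θ)), which does not depend on the camera distance and increases with φ, so bisection recovers φ from a target ratio.

```python
import math

def half_height_ratio(phi, half_fov):
    """H1/H2 for a vertical plane seen by a pinhole camera pitched by phi.

    H1 and H2 are the plane heights covered by the upper and lower image
    halves; the camera-to-plane distance cancels out.
    """
    upper = math.tan(phi + half_fov) - math.tan(phi)
    lower = math.tan(phi) - math.tan(phi - half_fov)
    return upper / lower

def solve_pitch(ratio, half_fov, iters=200):
    """Invert half_height_ratio by bisection (the ratio increases with phi)."""
    lo = -(math.pi / 2 - half_fov) + 1e-9
    hi = (math.pi / 2 - half_fov) - 1e-9
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if half_height_ratio(mid, half_fov) < ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

A level camera (φ = 0) gives a ratio of 1; tilting upwards makes the upper half of the image cover more of the facade than the lower half.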
Figure 2. The deflection angle in the vertical direction.
Figure 2 shows that the actual height of the building cannot be read directly from the photo because of the deformation. The closer a pixel is to the perspective centre, the less of the scene it represents; that is, the amount of information represented by a pixel varies across the image with the degree of distortion. We use vanishing-point theory to describe the deformation of the image in the vertical direction. In reality, w1 and w2 in Figure 2 have the same length, but because of the distortion they differ in the image. We therefore approximate the ratio of the actual heights of the upper and lower halves of the building by the ratio m of the areas of the upper and lower halves of the photo, which can be treated as two trapezoids, as shown in the right part of Figure 2.
The two trapezoids are separated by the midline of the trapezoid. Connecting the vanishing point with the lower left and lower right corners of the image gives the two bottom angles β1 and β2 of the trapezoid. Let the number of rows and columns of the image be R and C. The bottom edge w2 equals R, and w1 can be calculated from w2. Taking into account the effect of the bottom angles on the top edge, w1 is obtained by Equation (5):

$$w_1 = w_2 - \frac{C}{\tan\beta_1} - \frac{C}{\tan\beta_2} \qquad (5)$$

Finally, the vertical deflection angle is obtained from Equations (2) and (5).
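The trapezoid quantities can be computed directly. The sketch assumes that the two halves are split at the midline (w1 + w2)/2 and have equal height, so that the ratio of the upper to the lower area reduces to m = (3w1 + w2)/(w1 + 3w2); the top edge follows from elementary trapezoid geometry as w1 = w2 − h(cot β1 + cot β2), where h is the trapezoid height in pixels. The function names are illustrative.

```python
import math

def top_width(w2, height, beta1_deg, beta2_deg):
    """Top edge of a trapezoid from its bottom edge, height and bottom angles."""
    cot1 = 1.0 / math.tan(math.radians(beta1_deg))
    cot2 = 1.0 / math.tan(math.radians(beta2_deg))
    return w2 - height * (cot1 + cot2)

def half_area_ratio(w1, w2):
    """Area of the upper half over the lower half, split at the midline (w1 + w2) / 2."""
    mid = 0.5 * (w1 + w2)
    upper = 0.5 * (w1 + mid)  # both halves share the same height, which cancels
    lower = 0.5 * (mid + w2)
    return upper / lower
```

Bottom angles of 90 degrees (no convergence) give w1 = w2 and m = 1.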
Similarly, the left part of Figure 3 shows the relationship between the horizontal deflection angle ω and the horizontal deformation of the image, from which ω is calculated. Here m2 is the ratio of the areas of the two trapezoids in the horizontal direction, calculated analogously to Equation (3). The deflection angle κ is the rotation about the central optical axis. It does not cause a perspective transformation; instead, it tilts the image plane relative to the ground plane. We therefore reduce the calculation of κ to locating the vanishing point. If κ = 0, the vanishing point produced by the horizontal deformation must lie on the horizontal median line of the photo. Otherwise, the angle between the median line and the direction from the photo centre to the vanishing point equals κ.
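Measuring κ from the vanishing point amounts to measuring an angle in the image. A minimal sketch, assuming pixel coordinates with x to the right and y downwards and the photo centre at (width/2, height/2); the direction to the vanishing point is folded so that a vanishing point on either side of the centre gives the same angle. The function name is illustrative.

```python
import math

def kappa_from_vanishing_point(vp, width, height):
    """Angle (radians) between the horizontal median line and the direction
    from the image centre to the horizontal vanishing point.

    Image coordinates: x to the right, y downwards, so a positive value means
    the vanishing point lies below the median line on the right-hand side.
    """
    cx, cy = width / 2.0, height / 2.0
    angle = math.atan2(vp[1] - cy, vp[0] - cx)
    # A line has no direction: fold the result into (-pi/2, pi/2].
    if angle > math.pi / 2:
        angle -= math.pi
    elif angle <= -math.pi / 2:
        angle += math.pi
    return angle
```

A vanishing point on the median line, on either side of the photo, yields κ = 0.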

The Correction of photos
Once the deflection angles are determined, the distortion can be corrected by eliminating them. Specifically, the photo is projected along the projection rays onto the assumed building plane, which yields a new orthogonal photo plane (plane A-B in Figure 4). This first requires the position of the camera. Taking the correction of the horizontal deflection angle as an example, to determine the relative position s(xs, ys) of the camera in the photo coordinate system, we make point a coincide with point A in Figure 4 and set its coordinates to (R, 0), where R is the length of the image plane.
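The camera position can be located with isosceles-triangle geometry. The sketch assumes that the triangle formed by the image plane a-b and the perspective centre s has an apex angle equal to the FoV, so s lies on the perpendicular bisector of a-b at distance (|ab|/2)/tan(FoV/2). Which side of a-b the camera lies on is also an assumption here (the left of the direction a → b); the function name is illustrative.

```python
import math

def camera_position(a, b, fov_deg):
    """Apex s of the isosceles triangle s-a-b with apex angle fov_deg.

    The apex lies on the perpendicular bisector of a-b at distance
    (|ab| / 2) / tan(fov / 2), on the left of the direction a -> b
    (an assumed orientation).
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    mx, my = (a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0
    nx, ny = -dy / length, dx / length  # unit normal, 90 degrees counter-clockwise
    d = (length / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
    return (mx + d * nx, my + d * ny)
```

For an image plane from (0, 0) to (2, 0) and a 90-degree FoV, the camera lies at (1, 1).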
The triangle Δsab formed by the plane a-b and the perspective centre s is isosceles. Using the formula for the apex angle of an isosceles triangle and a coordinate transformation, the position of the camera in the coordinate plane o-xy can be obtained. The corrected image has size (l2, m2), where the number of rows is l2 = yD − yA (yD is calculated according to Figure 1(e)) and the number of columns is m2 = xA − xB. The pixels of the original image are then allocated to the new image plane by bilinear interpolation.
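The resampling step can be sketched as an inverse mapping followed by bilinear interpolation: for every pixel of the new image plane, find the corresponding real-valued position in the original photo and interpolate its grey value from the four neighbouring pixels. The inverse_map callback is hypothetical and stands in for the paper's projection onto the building plane; images are plain lists of rows to keep the sketch dependency-free.

```python
import math

def bilinear_sample(img, x, y):
    """Sample a 2D grey-value image (list of rows) at the real-valued point (x, y).

    x is the column and y the row; points outside the image return None.
    """
    rows, cols = len(img), len(img[0])
    if not (0 <= x <= cols - 1 and 0 <= y <= rows - 1):
        return None
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom

def resample(img, out_rows, out_cols, inverse_map):
    """Fill an out_rows x out_cols image by inverse mapping and bilinear sampling.

    inverse_map(row, col) -> (x, y) in the source image is a hypothetical
    callback; pixels that map outside the source are set to 0.
    """
    out = [[0.0] * out_cols for _ in range(out_rows)]
    for r in range(out_rows):
        for c in range(out_cols):
            x, y = inverse_map(r, c)
            v = bilinear_sample(img, x, y)
            out[r][c] = 0.0 if v is None else v
    return out
```

Mapping each output pixel back to the source, rather than pushing source pixels forward, guarantees that every pixel of the new image plane receives a value.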

Searching for an angle from camera to building
Figure 5 shows the angle between the perpendicular from the perspective centre to the building (length d1) and the actual line of sight from the perspective centre to the building (length d2). This angle corresponds to the horizontal deflection angle ω in the photograph, as described in Section 2.1. Given d1, d2 and ω, the angle of the photo relative to the building can be searched for, as shown in Figure 5. According to the principles of photogrammetry, an angle in the photo coordinate system equals the corresponding real angle in the ground coordinate system. With this photographing angle known, we do not need the exact location of the perspective centre, which matters because the geotagged photos provided by volunteers sometimes lack accurate positioning information. Instead, the scene can be estimated manually in OSM. Based on the estimated scene and the FoV, the real location of the buildings recorded in the photo can be found, and the length of each building can then be read from its footprint in OSM.
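The scene search can be supported by a few small geometric helpers, sketched here under stated assumptions: a local equirectangular projection of WGS84 coordinates to metres around the photo location (adequate over a few hundred metres), a test of whether a footprint vertex falls inside the camera's horizontal FoV wedge for a manually chosen viewing bearing, and the right-triangle relation d1 = d2·cos ω for the configuration in Figure 5. All names are illustrative, and the bearing is a hypothetical manual input.

```python
import math

def to_local_metres(lat, lon, lat0, lon0):
    """Equirectangular approximation: (east, north) in metres from (lat0, lon0)."""
    r = 6371000.0
    east = math.radians(lon - lon0) * r * math.cos(math.radians(lat0))
    north = math.radians(lat - lat0) * r
    return east, north

def in_field_of_view(camera, bearing_deg, fov_deg, point):
    """True if point lies inside the horizontal FoV wedge of the camera.

    camera and point are (east, north) in metres; bearing_deg is the viewing
    direction clockwise from north.
    """
    de, dn = point[0] - camera[0], point[1] - camera[1]
    direction = math.degrees(math.atan2(de, dn))  # clockwise from north
    diff = (direction - bearing_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0

def perpendicular_distance(d2, omega):
    """d1 = d2 * cos(omega), assuming d1 and d2 form a right triangle."""
    return d2 * math.cos(omega)
```

Footprint vertices that pass the FoV test give a short list of candidate buildings from which the matching footprint is confirmed manually.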

Locating the building in OSM
As shown in Figure 6, once the boundary of the building in the image is determined, four edge lengths of the building facade, H1, H2, L1 and L2, can be obtained from the coordinates of the boundary. The bottom edge length L1 corresponds to the edge length l in OSM, which carries real geographic information. The ratio of length to height of the building in the image is L3/H1. Since the actual length of L3 corresponds to the building width W in OSM, the height of the actual building can be calculated as H = W × H1 / L3.
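A sketch of the final calculation, assuming the footprint edge that corresponds to L3 is given by its two WGS84 endpoints: the real width W is taken as the great-circle (haversine) distance on a spherical Earth, and the height follows from the ratio H1/L3 measured in the corrected photo. The function names are hypothetical.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius=6371000.0):
    """Great-circle distance in metres between two WGS84 points (spherical Earth)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))

def building_height(width_m, height_px, width_px):
    """H = W * H1 / L3: scale the image height by the real-world width."""
    return width_m * height_px / width_px
```

For example, a 20 m wide footprint edge whose facade spans 200 px in width and 300 px in height in the corrected photo gives an estimated height of 30 m.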

The experimental data
In our experiment, photos of London were downloaded from Flickr. We retrieved all publicly available geotagged Flickr photos in London since 2016, more than 489 thousand photos in total. Photos containing buildings were selected with open-source deep-learning scene recognition models. Given a photo's ID, its geographic coordinates can be obtained through the Flickr API (see Ding, 2020). Using these coordinates, we downloaded building vector data from OSM.
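The coordinate lookup can be done with the Flickr REST method flickr.photos.geo.getLocation. The sketch below only builds the request URL and parses a JSON response; the API key is a placeholder, the HTTP request itself is omitted, and the response layout (photo.location with latitude, longitude and accuracy) reflects our understanding of the public API documentation and should be checked against it. This is not the retrieval pipeline of Ding (2020).

```python
import json
from urllib.parse import urlencode

FLICKR_REST = "https://api.flickr.com/services/rest/"

def geo_request_url(api_key, photo_id):
    """URL for flickr.photos.geo.getLocation, asking for plain JSON."""
    params = {
        "method": "flickr.photos.geo.getLocation",
        "api_key": api_key,
        "photo_id": photo_id,
        "format": "json",
        "nojsoncallback": 1,
    }
    return FLICKR_REST + "?" + urlencode(params)

def parse_location(body):
    """Return (lat, lon, accuracy) from a getLocation JSON response, or None."""
    data = json.loads(body)
    if data.get("stat") != "ok":
        return None
    loc = data["photo"]["location"]
    return float(loc["latitude"]), float(loc["longitude"]), int(loc.get("accuracy", 0))
```

The accuracy field indicates how precisely the geotag was set, which can help to decide how wide the subsequent OSM search should be.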

Determination of vanishing points
On a facade, the two most common types of straight lines are horizontal and vertical. A line with a slope between −0.5 and 0.5 is considered horizontal, and a line whose inclination lies between π/3 and 2π/3 (i.e. whose slope is steeper than tan(π/3) ≈ 1.73 in absolute value) is considered vertical. Using the line detection method of Xue et al. (2019), we thus obtain two labelled groups of straight lines. Intersecting the lines within each group yields a set of candidate vanishing points, shown as green points in Figure 7. Because the lines selected under these constraints are not exactly horizontal or vertical, many candidates are false vanishing points, such as the green points far from the centre. We therefore use least-squares estimation to fit the centre of all candidate vanishing points, which gives the position of the real vanishing point, shown as the red point in Figure 7.
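The classification and intersection steps can be sketched as follows, assuming line segments given by their endpoints in pixel coordinates (the line detector itself is not reproduced). Lines are intersected in homogeneous coordinates, parallel pairs are skipped, and the least-squares centre of the candidates is their mean, since the mean minimises the sum of squared distances to the candidate points. Because the mean is sensitive to outliers, a robust estimator could be substituted, but that would be a design change rather than the method described above. Function names are illustrative.

```python
import math
from itertools import combinations

def classify(segment):
    """'h', 'v' or None from a segment's inclination, using the thresholds above."""
    (x1, y1), (x2, y2) = segment
    angle = math.atan2(y2 - y1, x2 - x1) % math.pi  # inclination in [0, pi)
    if angle <= math.atan(0.5) or angle >= math.pi - math.atan(0.5):
        return "h"  # |slope| <= 0.5
    if math.pi / 3 <= angle <= 2 * math.pi / 3:
        return "v"  # |slope| >= tan(pi/3)
    return None

def line_through(p, q):
    """Homogeneous line (a, b, c) with a*x + b*y + c = 0 through p and q."""
    return (p[1] - q[1], q[0] - p[0], p[0] * q[1] - p[1] * q[0])

def vanishing_point(segments, eps=1e-9):
    """Mean of all pairwise intersections: the least-squares centre of the candidates."""
    lines = [line_through(*s) for s in segments]
    xs, ys = [], []
    for (a1, b1, c1), (a2, b2, c2) in combinations(lines, 2):
        w = a1 * b2 - a2 * b1
        if abs(w) < eps:  # parallel lines never meet
            continue
        xs.append((b1 * c2 - b2 * c1) / w)
        ys.append((a2 * c1 - a1 * c2) / w)
    if not xs:
        return None
    return sum(xs) / len(xs), sum(ys) / len(ys)
```

In use, segments are first grouped with classify, and vanishing_point is called once per group.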

The rectification of VGI photos
In Figure 9 a(1), b(1), c(1) and d(1), rays emitted from the calculated vanishing points to the image edges form grid planes. The magenta lines show the deformation in the vertical direction and the blue lines the deformation in the horizontal direction of the image plane. After correction, the two groups of lines are perpendicular, showing that the proposed algorithm removes the distortion of the photos.

Matching photos and OSM
Taking Figure 9(a) as an example, the photo can be downloaded from Flickr by its ID, and its coordinates can be acquired through the Flickr API. Because the photos are matched manually with the building footprints in OSM, the search must be restricted to a suitable range to reduce the search time. Based on the geotagged coordinates of each photo, we zoomed OSM to level 17 and downloaded the building footprints at this scale. With the matching method described in Section 3.1, the building footprints corresponding to the photo can be found, as shown in Figure 10.
Figure 10. The matching between photo and footprints in OSM
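Assuming that "level 17" refers to a standard slippy-map (XYZ, Web Mercator) zoom level, the search area around a photo can be taken as the tile that contains its geotag; at London's latitude such a tile is roughly 190 m wide. The sketch returns the bounding box in (south, west, north, east) order, which is the order an Overpass API bbox filter expects, e.g. way["building"](south,west,north,east); out geom;. Function names are illustrative.

```python
import math

def tile_xy(lat, lon, zoom):
    """Slippy-map (XYZ / Web Mercator) tile indices containing a WGS84 point."""
    n = 2 ** zoom
    x = int(math.floor((lon + 180.0) / 360.0 * n))
    lat_r = math.radians(lat)
    y = int(math.floor((1.0 - math.asinh(math.tan(lat_r)) / math.pi) / 2.0 * n))
    return x, y

def tile_bbox(x, y, zoom):
    """(south, west, north, east) in degrees of tile (x, y) at the given zoom."""
    n = 2 ** zoom

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return lat_of(y + 1), west, lat_of(y), east
```

If a photo's geotag lies near a tile edge, the neighbouring tiles can be added to the search area.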

Enrichment of building height
As shown in Figure 11, the buildings in a photo can have different heights. We therefore separate these buildings manually and assign each one its corresponding length from the OSM footprint. The length of each building in the photo, such as L1, L2, L3 and L4 in Figure 11, can also be measured, and Equation (13) then gives the building heights H1, H2, H3 and H4. Note that there is currently no good way to separate multiple buildings in a photo automatically; we will address this problem in future work. We also attempted to evaluate the accuracy of the proposed method quantitatively. However, because the photos were downloaded randomly from Flickr, their true building heights are difficult to obtain from existing data. Visual interpretation shows that windows, doors and other objects in the corrected photos have a reasonable aspect ratio, which suggests that the height estimation is feasible.

CONCLUSION
In this work, we propose a method for the geometric correction of geotagged images without any auxiliary information. The relative position of the perspective centre is estimated by combining vanishing points with image line features, and the geometric correction is realised by constructing a grid and resampling the grey values of the image. A semi-automatic method is proposed to match VGI images with OSM: by calculating the actual and perpendicular distances from the perspective centre to the building, the building boundary corresponding to the image is obtained. The building height is then estimated from the building's vector boundary in OSM.
Some limitations also need to be pointed out. 1) The proposed geometric correction model depends on the location of the buildings in the photo. Because vanishing points are extracted from a single building plane, only one vertical and one horizontal deflection angle can be calculated. When the photo shows building facades in two directions, the lines extracted from it may fall into more than two directions, and the correction fails. 2) The matching process assumes that the error between the coordinates of a geotagged photo and its true position is small, e.g. no more than 2 m, so that the associated building footprints can be searched for semi-automatically. Although some geotagged photos have small GPS errors (within 1 m), the positions provided by photo-sharing websites can be unreliable.
The proposed method is particularly suitable for photos that users take and geolocate themselves, because such photos can be matched reliably with building footprints. We therefore encourage volunteers to use the proposed method to enrich the height information in OSM when uploading photos. We will publish updates of our method on GitHub (https://github.com/wangyuefeng2017/photo-correction).