3D RECONSTRUCTION OF ON-/OFFSHORE WIND TURBINES FOR MANUAL AND COMPUTATIONAL VISUAL INSPECTION

The expansion of on-/offshore wind farms plays a key role in the transformation of energy production from the burning of fossil fuels and from nuclear energy to sustainable and safe power generation. However, the wind energy sector is under permanent, strong cost pressure, and the maintenance of the turbines is currently still carried out, quite expensively, by human industrial climbers. In this article, we present the results of an interdisciplinary research project on the automation of various image-based inspection steps. Since the use of unmanned aerial vehicles (UAV) is problematic especially offshore, we present here a simple, cost-effective method to obtain a three-dimensional model of a wind energy plant using solely a digital camera equipped with a sensor array, to be used for the detection and management of damages and abnormalities. A first approach to detect abnormalities on the surface with deep learning methods achieved an F1-score of about 95%.


INTRODUCTION
As part of the energy transition, renewable energy sources, such as wind turbines, are becoming increasingly important. One of the consequences is that more and more onshore and offshore wind farms have been built in recent years and more will follow in the coming years (Bilgili et al., 2011, Sun et al., 2012). Wind turbines must be inspected regularly to avoid consequential damage arising from structural damage and to enhance the life cycle performance (Rangel-Ramirez, Sørensen, 2008, Karyotakis, 2011). However, the increasing number and size of onshore and offshore wind farms requires an efficient inspection workflow. This includes (a) the acquisition of images, (b) the detection and classification of damage and (c) the evaluation of the damage by experts. With regard to (a), the workload of manual image acquisition by industrial climbers needs to be reduced by ground-based and/or unmanned aerial vehicle (UAV) photography. With regard to (b), the detection and classification of damage in the large image data sets must be facilitated by the use of machine learning and computer vision techniques. With regard to (c), experts need to use the results of (b) to decide on actions for the maintenance of the installations. For this purpose, the original images of the damages as well as the relative positions of the damages to each other and their positions on the tower of the wind turbine must be made easily accessible to the experts. This research focuses on the creation of a three-dimensional model of the wind turbine for interactive visual representation of previously detected damage. To keep the data acquisition simple and affordable, the 3D model uses only the images from a normal (i.e. not stereo) camera, the GPS position and the tilt angle of the camera. In addition, we performed a first study on machine learning based classification of coating damage on wind turbine towers.
Three-dimensional reconstruction is an active topic in remote sensing (Li et al., 2019, Stathopoulou et al., 2019, Schonberger, Frahm, 2016) for which a variety of methods exist. However, some existing methods use data generated with special equipment like stereo cameras (Shen, 2013, Sengupta et al., 2013) or laser scanners (Adan, Huber, 2011). Other methods use structure from motion (SfM) (Ma, Liu, 2017, Stathopoulou et al., 2019, Schonberger, Frahm, 2016). SfM detects keypoints in images generated by a camera moving around the object and estimates the depth from the movement of the keypoints. SfM is often used for the 3D reconstruction of archaeological sites (Pollefeys et al., 2003) or places of interest like temples and churches, and works best with images that show a sufficient number of prominent keypoints. However, images of wind turbines usually hardly offer any keypoints (see Section 2 below), and data from laser scanners or stereo cameras are often not available. In the context of 3D terrain modeling, orthorectification methods like (Baiocchi et al., 2004) are applied to aerial or satellite images to remove distortion caused by e. g. uneven terrain. But these models usually do not deal with the presence of background in the images and assume that the camera orientation is completely known, which is not the case in the present work.
In this work, we propose a new approach for the 3D reconstruction of on-/offshore wind turbines in wind parks that uses neither keypoints nor data from laser scanners or stereo cameras. The proposed method uses images taken at four different positions with an off-the-shelf camera. The camera positions relative to the tower of the wind turbine can be determined using the GPS coordinates. An approximate orientation of the camera can be obtained from the GPS positions and the tilt angle of the camera. However, traditional image processing methods need to be used to correct the inaccurate specification of the yaw and roll angles. With that information, projective mapping can be used to map the images to a cone frustum model of the tower. Due to the large number of images, a projection of the images during visualization is not feasible. So a texture based on the images is precomputed and wrapped around the cone frustum. In this way, the visualization can run in environments that do not have many resources, like e. g. a web browser on an off-the-shelf notebook or tablet. To the best of our knowledge, no method exists that combines image processing and computer graphics in this way to create and visualize a 3D reconstruction.
The acquisition of the images is described in the following Section 2. The computation of the texture is described in Section 3. Section 4 evaluates the quality of the result and the required effort. Finally, Section 5 gives an overview of limitations and potential of the proposed method.

MATERIALS
To demonstrate the method, we created a dataset using as little equipment and preparatory work as possible. For image capturing, an off-the-shelf Nikon D7100 was used. A Solmeta Geotagger GMAX N3 supplies GPS coordinates and a tilt angle for every image.
Images of the wind turbine have been taken from four different camera positions c_1, ..., c_4. The four positions were arranged at approximately 90 degree angles around the wind turbine, each at a distance of approximately 50 m to the turbine. At each position c_i, a number N_i of images I_i^j has been taken with different tilt angles t_i^j to capture the whole wind turbine. Each image has a height of H = 6000 pixels and a width of W = 4000 pixels.
In total, the dataset consists of four camera positions c_1, ..., c_4, each coming with a set of views. A view o_i^j is an image with an aspect ratio of a = 1.5 and an angular aperture of α = 11.8°, taken with a tilt angle t_i^j at a camera height of h_camera = 1.8 m. The numbers N_1, ..., N_4 of views per position depend on the dataset and the height of the tower. However, the number of views for a tower of e. g. 80 m height is between 20 and 30 images for each position. See Figure 1 for example views.
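As a plausibility check for these numbers, the vertical coverage per view can be estimated from the aperture and the distance. The sketch below is a back-of-the-envelope calculation of ours, not part of the method; it assumes that α = 11.8° is the aperture along the tower axis and that the optical axis is roughly horizontal:

```python
import math

# Hypothetical estimate (not from the method itself): with an aperture of
# alpha = 11.8 degrees along the tower axis and a viewing distance of
# d = 50 m, each view covers roughly 2 * d * tan(alpha / 2) metres of the
# tower; the coverage shrinks for steep tilt angles near the top.
alpha = math.radians(11.8)
d = 50.0
coverage = 2 * d * math.tan(alpha / 2)   # ~10.3 m of tower per view
min_views = math.ceil(80.0 / coverage)   # lower bound for an 80 m tower
```

With roughly 10 m of tower per view, at least 8 non-overlapping views would be needed for an 80 m tower; the reported 20 to 30 views per position are consistent with generous overlap and oblique views near the top.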

METHODS
As described above, we want to construct a texture for the tower of the wind turbine. The whole tower is modeled as a cone frustum of height h with radii r_bottom and r_top at the bottom and the top, respectively. The texture will be created based on
• the images of the wind turbine
• the camera positions from the Geotagger
• the tilt angle given by the Geotagger for each image.
As the model of the tower is invariant to rotation around the y-axis, the position of the texture's left and right border can lie on the intersection of the model with any plane through the y-axis. The computation in Section 3.2 assumes that the texture borders lie on the intersection of the model with the yz-plane (see Figure 2). To do so, we first bring the camera positions and the tower of the wind turbine into a coordinate system with the center of the tower's bottom in the origin of the coordinate system (Section 3.2). As no yaw and roll angles are given, computer vision methods are used to rotate and shift the images in a way that allows us to assume that the camera always points at the center of the tower of the wind turbine (Section 3.1). Finally, computer graphics methods are used to assign to each pixel of the texture a pixel of one of the original images (Section 3.3).

Compensate for yaw and roll inaccuracies
In this section, each image is rotated and shifted such that the tower of the wind turbine is exactly vertical and in the exact center of the image. The process is based on the tower's left and right edges. The detection of those edges might be erroneous in some cases, as the edges might be blurry or be confused with edges of other wind turbines in the background of the image. Therefore, this process is not fully automated but needs an interactive step to let an expert select the correct edges of the tower if necessary.
The first steps for finding the tower's edges are:
• convert the original image to a grayscale image
• apply the Sobel filter (Gonzalez, Woods, 2006) to the grayscale image
• apply Otsu thresholding to the filtered image
• apply the Hough transform (Duda, Hart, 1972) to transform the result into the Hough space H
• find local maxima in H.
The local maxima in the Hough space form a set D representing lines in the original image. The lines l_1 = argmax(H) and l_2 = argmax(H|_{D\{l_1}}) reflect in most cases (see Section 4 below) the edges of the tower. However, in a few cases this does not apply, so the correct lines have to be chosen interactively by a human expert.
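The edge-finding steps above can be sketched as follows. This is a minimal numpy-only illustration; in practice, library routines such as OpenCV's cv2.Sobel, cv2.threshold with Otsu's method and cv2.HoughLines would be used, and the suppression window around l_1 is an illustrative choice:

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude via the horizontal/vertical Sobel kernels."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    h, w = gray.shape
    pad = np.pad(gray.astype(float), 1, mode="edge")
    gx = sum(kx[i, j] * pad[i:i + h, j:j + w] for i in range(3) for j in range(3))
    gy = sum(ky[i, j] * pad[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return np.hypot(gx, gy)

def otsu_threshold(img, bins=256):
    """Otsu's threshold: maximize the between-class variance."""
    hist, edges = np.histogram(img.ravel(), bins=bins)
    hist = hist.astype(float)
    w = np.cumsum(hist)                                    # pixels below
    m = np.cumsum(hist * 0.5 * (edges[:-1] + edges[1:]))   # mass below
    with np.errstate(divide="ignore", invalid="ignore"):
        var = (m[-1] / w[-1] * w - m) ** 2 / (w * (w[-1] - w))
    return edges[np.nanargmax(var[:-1]) + 1]

def hough_peaks(binary, n_theta=180):
    """Vote in (rho, theta) space and return the two strongest lines."""
    rows, cols = np.nonzero(binary)
    thetas = np.deg2rad(np.arange(n_theta))
    diag = int(np.hypot(*binary.shape)) + 1
    acc = np.zeros((2 * diag, n_theta), int)
    rho = np.rint(cols[:, None] * np.cos(thetas)
                  + rows[:, None] * np.sin(thetas)).astype(int) + diag
    for t in range(n_theta):
        np.add.at(acc[:, t], rho[:, t], 1)
    r1, t1 = np.unravel_index(np.argmax(acc), acc.shape)   # l1 = argmax(H)
    acc[max(0, r1 - 5):r1 + 6, :] = 0                      # suppress around l1
    r2, t2 = np.unravel_index(np.argmax(acc), acc.shape)   # l2
    return ((r1 - diag, np.rad2deg(thetas[t1])),
            (r2 - diag, np.rad2deg(thetas[t2])))
```

On a synthetic image of a bright vertical "tower" band, the two returned lines are the near-vertical left and right edges (theta close to 0°, rho values close to the edge columns).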
Let φ_1 denote the signed angle between l_1 and the y-axis and let φ_2 denote the signed angle between l_2 and the y-axis. After rotating the image by an angle of −(φ_1 + φ_2)/2, the tower is perfectly vertical.
We rotate l_1 and l_2 by the same angle as the image and denote the results by l̃_1 and l̃_2, respectively. Let p_top^1 denote the intersection of l̃_1 with the top of the image and let p_bottom^1 denote the intersection of l̃_1 with the bottom of the image. Equivalently, let p_top^2 and p_bottom^2 denote the intersections of l̃_2 with the top and the bottom of the image. The center of the tower is then given by (c_x, c_y) = (p_top^1 + p_bottom^1 + p_top^2 + p_bottom^2)/4. After horizontally shifting the image such that c_x coincides with the image center, the image will be perfectly aligned, with the tower being vertical and exactly in the middle of the image.
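The rotation-and-centering step can be sketched compactly if each detected edge is represented by its column intersections with the top row and the bottom row of an image of height H (a simplification of the Hough line representation used above; the function name and interface are illustrative):

```python
import numpy as np

# Hypothetical sketch of the alignment step: each tower edge is given by
# the columns where it crosses the top (row 0) and bottom (row H-1) of
# the image.
def alignment(l1_top, l1_bot, l2_top, l2_bot, H):
    """Return (rotation angle in rad, horizontal tower centre in px)."""
    # signed angle of each edge with respect to the vertical axis
    phi1 = np.arctan2(l1_bot - l1_top, H - 1.0)
    phi2 = np.arctan2(l2_bot - l2_top, H - 1.0)
    rot = -(phi1 + phi2) / 2.0   # cancel the mean edge angle -> vertical tower
    cx = (l1_top + l1_bot + l2_top + l2_bot) / 4.0   # tower centre column
    return rot, cx
```

For a tower whose frustum edges taper symmetrically around column 50 in a 100-row image, the required rotation is zero and the centre is 50.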

Compute camera positions in world space
GPS coordinates are given in a coordinate system that has its origin at the earth's center. Let p denote the position of the tower of the wind turbine and c_i the i-th camera position in this coordinate system. For the computation described in the next Section 3.3, the camera positions and rotations as well as the tower of the wind turbine have to be represented in a coordinate system that has its origin at the bottom base of the tower and the y-axis on the tower's main axis (see Figure 2). We refer to this coordinate system as the world space.
To transform the coordinates into the world space, we first rotate the coordinates into a coordinate system with the center of the tower's bottom base on the y-axis: Let n = p/‖p‖ and let k = n × [0, 1, 0]^T denote the cross product of n and [0, 1, 0]^T. With I the identity and K the cross product matrix of k, the Rodrigues rotation rule (see (Rodrigues, 1840)) states that for R defined by
R = I + K + K² · (1 − n · [0, 1, 0]^T) / ‖k‖²
the equation R · n = [0, 1, 0]^T holds. In other words, R rotates n (and thus also p) onto the y-axis. Applying R to p and to the camera positions c_i rotates the tower and the camera positions approximately (i. e. neglecting the earth curvature) into a plane parallel to the xz-plane.
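The rotation R follows directly from the Rodrigues formula; a minimal numpy version (the function name is ours, and the degenerate (anti)parallel case is handled explicitly):

```python
import numpy as np

def rotation_onto_y(n):
    """Rotation matrix R with R @ n = [0, 1, 0] for a unit vector n."""
    y = np.array([0.0, 1.0, 0.0])
    k = np.cross(n, y)                       # rotation axis (unnormalized)
    c = float(n @ y)                         # cos of the rotation angle
    s2 = float(k @ k)                        # sin^2 of the rotation angle
    K = np.array([[0.0, -k[2], k[1]],        # cross-product matrix of k
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    if s2 < 1e-12:                           # n already (anti)parallel to y
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    return np.eye(3) + K + K @ K * ((1.0 - c) / s2)
```

Applying the resulting matrix to an earth-centered tower position p (after normalization) rotates it onto the y-axis, as required.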
Finally, the position of the tower in world space is the origin p̃ = [0, 0, 0]^T and the final camera positions are c̃_i = R · c_i − R · p. For the rotation of the cameras, the tilt angle t_i^j is given by the sensor (see the Materials section above); through the preparatory work in the previous Section 3.1, the camera at position c̃_i = (x_i, y_i, z_i) always points at the tower axis, so the yaw angle r_iy^j is determined by the horizontal direction −(x_i, z_i) from the camera to the tower axis, and the roll angle is r_iz^j = 0.

Project images onto the texture
To create a U × V pixel sized texture of the tower of the wind turbine, first each texture pixel needs to be assigned to a coordinate in the world space. While doing so, we assume that the texture is wrapped around the frustum in a way that the leftmost column of the texture is in the yz-plane (see Figure 2). Second, for each pixel on the texture, the view o_i^j has to be determined that captures the corresponding world coordinate best. Last, a color has to be assigned to the texture pixel based on the view o_i^j. To do so, projective texture mapping (see (Everitt, 2001)) can be used, pretending a projector that
• is located at camera position p_i
• faces into the direction given by the tilt angle t_i^j from the sensor and the yaw and roll angles r_iy^j and r_iz^j from the previous section
projects the image I_i^j onto the tower.
For the first step, let r and c be the row and column coordinates of a texture pixel p_t, respectively. The y-coordinate of p_t in world space is then given by y = h · (1 − r/V). The tower's radius at height y is given by R = r_bottom + (y/h) · (r_top − r_bottom). With ω = 2π − 2π · c/U, the x and z coordinates of p_t in world space are given by x = cos(ω) · R and z = sin(ω) · R.
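This first step can be sketched as a small function; the row direction (row 0 at the top of the tower) and the linear interpolation of the frustum radius are our reading of the model, and the helper name is hypothetical:

```python
import math

def texture_to_world(r, c, U, V, h, r_bottom, r_top):
    """Map texture pixel (row r, column c) of a U x V texture to (x, y, z).

    Assumes row 0 corresponds to the top of the tower and that the
    frustum radius is interpolated linearly between r_bottom and r_top.
    """
    y = h * (1.0 - r / V)                          # height on the tower
    R = r_bottom + (y / h) * (r_top - r_bottom)    # frustum radius at height y
    omega = 2 * math.pi - 2 * math.pi * c / U      # unwrapping angle
    return math.cos(omega) * R, y, math.sin(omega) * R
```

For example, the leftmost column of the topmost row maps to a point at height h on a circle of radius r_top.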
For the second step, let (x, y, z) denote the coordinates in world space of a texture pixel. We first note that the best camera position is given by p_î with î = argmin_i ‖(x, y, z) − p_i‖, i.e. the camera position closest to the point. The best view is then given by o_î^ĵ with ĵ = argmin_j |t_î^j − t̂|, where t̂ denotes the tilt angle under which (x, y, z) appears from p_î. For the last step, let (r_t, c_t) denote the row and column coordinates of a texture pixel and let (x, y, z) denote the corresponding coordinates in the world space as computed in the first step. In what follows, we compute the pixel coordinates (r_I, c_I) of the pixel of I_î^ĵ that is projected to (x, y, z). For ease of notation, let α denote the tilt angle t_î^ĵ and β denote the yaw angle r_îy^ĵ associated with the view o_î^ĵ.
First, a perspective projection is performed that projects (x, y, z) onto the plane that is orthogonal to the camera's direction of view and contains the camera. This is achieved by translating (x, y, z) by −p_î and multiplying with the rotation matrices of the yaw angle β and the tilt angle α, yielding camera coordinates (u, v, w). The row and column coordinates (r_I, c_I) of the pixel p_I of I_î^ĵ that is projected onto the texture pixel p_t are then given by the perspective division r_I = u/w and c_I = v/w (up to scaling with the focal length and the image size).
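The last step can be illustrated with a small pinhole-projection sketch. The camera convention used here (looking along −z, yaw about the y-axis, tilt about the camera x-axis, no roll) and the function names are assumptions of this illustration, not the paper's exact formulation:

```python
import numpy as np

def rot_x(a):
    """Rotation about the x-axis by angle a (tilt)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(a):
    """Rotation about the y-axis by angle a (yaw)."""
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def project(point, cam_pos, yaw, tilt, f=1.0):
    """Project a world point into normalized image coordinates (u, v).

    Assumed convention: the camera looks along -z, yaw rotates about the
    world y-axis, tilt about the camera x-axis; no roll (cf. Section 3.1).
    """
    R = rot_y(yaw) @ rot_x(tilt)                        # camera-to-world
    v = R.T @ (np.asarray(point, float) - cam_pos)      # into camera space
    return f * v[0] / -v[2], f * v[1] / -v[2]           # perspective division
```

A point lying straight ahead of the camera projects to the image center (0, 0), independent of yaw and tilt.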

Computational damage detection with convolutional neural networks
In order to test the automation of the detection of damage and abnormalities, a set of 20 images was made available by a professional inspector. To obtain annotations with a high level of detail, the online annotation tool BIIGLE 2.0 (Langenkämper et al., 2017) (see Figure 3) was used. The annotated images were cut into 1200 image patches. As there was only a single patch containing rust, that patch was removed from the dataset. Of the remaining 1199 examples, 323 patches showed surface damage and 876 showed no surface abnormalities, i.e. have been labeled inconspicuous. The dataset was split into a training and a validation set of size 959 and 240, respectively. A convolutional neural network trained on this data achieved an F1-score of about 95% for the detection of surface damage.
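For reference, the F1-score used to evaluate the classifier is the harmonic mean of precision and recall; a minimal sketch (the counts in the usage example are illustrative, not the study's actual confusion matrix):

```python
def f1_score(tp, fp, fn):
    """F1-score from true positives, false positives and false negatives."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, f1_score(95, 5, 5) yields 0.95, i.e. precision and recall of 95% each.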

EVALUATION
The goal of this paper was the cost- and time-efficient construction of a 3D model of a wind energy plant for the localization and management of damages and abnormalities. The resulting texture based on the images described in Section 2 is shown in Figure 4. Some artifacts show up in the texture, e. g. a discontinuity at the top of the door caused by inaccuracy of the tilt sensor (see the red ellipse in Figure 4). However, the texture gives a good overview of the appearance of the tower. The correction of the yaw and roll angles improves the quality of the texture significantly, as can be seen in Figure 5. The image on the top is a part of a texture generated without correction of the yaw and roll angles. This part shows background (here: sky) on the tower texture and a welding seam that is not horizontal. The texture on the bottom is generated with correction of the yaw and roll angles, and both artifacts are no longer present.
The effort for the overall process of creating the 3D model is broken down into the effort for recording the dataset and the computation of the texture. For the dataset, an off-the-shelf digital camera and an off-the-shelf sensor array can be used. Taking the images took about 20 to 30 minutes for an entire wind turbine.
For the computation of the texture, an off-the-shelf MacBook Pro (2.7 GHz Quad-Core Intel Core i7, 16 GB RAM, Intel Iris Plus Graphics 655, 1536 MB) was used. The time for the computation of the texture depends on its resolution. A texture with a size of 377 × 1201 pixels takes 1 minute and 16 seconds. A texture with doubled resolution (755 × 2401 pixels) takes 4 minutes and 12 seconds. The effort for the interactive correction of the wind turbine edges is fairly low: in 8 images the edges had to be corrected manually, i. e. 92% of the images were processed fully automatically.

CONCLUSIONS
We proposed a method for the 3D reconstruction of wind turbine towers in on-/offshore wind parks. The overall process, including image taking and reconstruction, is time and cost efficient and can be done with off-the-shelf hardware. The resulting texture reflects the appearance of the tower, although a few artifacts occur. These artifacts are caused by inaccuracies of the tilt angle and by uneven terrain elevation.
No preconditions (e. g. number of camera positions, number of images) are used by the method except that
• a tilt angle of the camera is given for every image
• the position of the camera (including the height) is given for every image
• the angle of view is known for every image
• every point on the tower is shown in at least one of the images
• the two edges (left and right) of the tower are visible in every image.

Figure 4. Texture for the 3D model of a wind turbine based on the dataset described in Section 2. The red ellipse highlights an example artifact.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)

Figure 5. A comparison of a texture detail without (top) and with (bottom) correction of the yaw and roll angles. The upper texture clearly shows an artifact (background) that is compensated in the texture at the bottom.
We are therefore confident that this method has the potential to be adapted for the processing of UAV images if the UAV is equipped with a height sensor and the images are taken from a sufficiently large distance.
Moreover, we showed the usefulness of this method for managing and visualizing wind turbine towers and their damages in an efficient way. Figure 6 shows an application that displays damages of a tower in an interactively rotatable and movable 3D model and allows any damage or point on the tower to be shown in the original image.