MODELLING A CELL TOWER USING SFM: AUTOMATED DETECTION OF STRUCTURAL ELEMENTS FROM SKELETON EXTRACTION ON A POINT CLOUD

The surveying and management of telecommunication towers pose a series of engineering challenges. Not only must the towers be regularly inspected for issues that require maintenance, but they are also often sub-let by their owners to communication companies, which requires a survey of the many (several thousand per company) installed appliances to check that they comply with the established contracts. This calls for a surveying methodology that is fast and possibly automated. Photogrammetric techniques using UAV-mounted cameras seem to offer a solution that is both suitable and economical. Our research team was asked to evaluate whether, from the information acquired by small drones, it was possible to obtain geometric information on the structure, with what degree of accuracy and at what level of detail. The workflow of this process is naturally articulated in three steps: the acquisition, the construction of the point cloud, and the extraction of geometries. The case study is a tower carrying antennas owned by several operators and located in the industrial district of Cagliari. The article examines the problems found in modelling such structures using point clouds derived from the Structure-from-Motion technique, in order to obtain a model of nodes and beams suitable for the reconstruction of the structure's geometric elements, and possibly for a finite element analysis or for populating GIS and BIM, either automatically or with minimal user intervention. To achieve this, we used voxelization and skeleton extraction algorithms to obtain a 3D graph of the structure. The results were analysed by varying the voxel size, which defines the resolution, and the threshold on the density of the points contained inside each voxel.


Background
Radio base stations (BTS, Base Transceiver Stations) are systems for the transmission of radio frequency signals used in cellular networks and for the transmission of television or radio signals. These structures are typically between 20 and 50 m tall, but can also reach heights of over 100 m. They are usually pylons or poles standing on the ground (rawland) or, especially in urban environments, placed on the terraces of the tallest buildings (rooftop). The infrastructure is made up of different components, which include the trellis that constitutes the load-bearing structure, the radiant systems located at different heights, bearings and tilts, and a considerable amount of cabling for power supply and signal transport. The entire plant, which can be very complex, includes, in addition to one or more pylons, a series of passive devices placed outside or inside special shelters (Figure 1).
The BTS are the assets of national or multinational companies that, as owners of the infrastructure, rent space on the structure to commercial telephony, radio or television operators, who use it to extend the coverage of their transmissions. Commercial operators have no infrastructure maintenance costs, which are therefore borne by these Tower Companies. In Italy, Inwit, the Telecom Italia tower company, currently the owner of more than 10,000 towers, announced its partnership with Vodafone in 2019, declaring it was to become the largest tower operator in Italy and the second in Europe (Inwit 2019). Inwit has inherited ownership of towers built over nearly 50 years (the mobile radio system was introduced in Italy in [...]). The inspection activities of the towers include, among other things, the assessment of the structural integrity of both the truss and the brackets that carry the radiant systems (which can be dipoles or sometimes even large-diameter parabolas), both of which are subject mainly to corrosion. Until a few years ago, this type of inspection was carried out by skilled technicians climbing the tower or using mobile platforms. More recently, for the tallest towers, the inspection has been done with UAVs carrying video or thermal cameras. The advent of very small UAVs, and the possibility of their use by operators with basic licenses, has widened the population of towers that can be inspected with UAVs to include the lower ones, which are obviously the majority.
At the same time, companies have for several years been building an inventory of the entire asset, first through GIS and now also through BIM. Populating both GIS and BIM is easy for towers built in recent years, because the project and construction documents are available, but it is a difficult job for the older towers. For these pylons, the geographical position is often known only approximately, and any information on both the as-built structure and the layout of the whole site is totally missing. Moreover, even for the most recent systems, changes in the number and position of the radiant systems are often not recorded properly, so that it is possible to lose track of them. The companies that manage these plants report that an adequate review interval would be once a year, which, if the review were carried out by specialized teams, would obviously have unsustainable costs. A viable strategy is to send non-specialized personnel to the field, capable of flying lightweight UAVs equipped with a camera, to perform a complete recording of the site, the pylon and the upper part where the radiant systems are housed.
The images/videos can be inspected by a few units of specialized personnel who assess the state of the plants and establish the maintenance activities to be carried out together with the work priority. In addition, one could think of expanding the exploitation of video footage to extract metric information from it in order to populate the above GIS and BIM.
In line with this scenario, a research project has been launched which, among its various objectives, includes an assessment of the potential of a non-professional measurement system for the purpose of populating a BIM.
Our research team was asked to evaluate whether, from the information acquired by small drones, it was possible to obtain geometric information on the structure, with what degree of accuracy and at what level of detail.

Related work
To our knowledge, the automatic reconstruction of 3D cell towers from UAV imagery and SfM point clouds has rarely been reported, although at least two professional solutions are presented by Bentley (Bentley 2020) and Pix4D (Pix4D 2020). In (Eckert et al., 2020) the authors explore three photogrammetric software packages, VisualSFM, Pix4D and ContextCapture, in order to evaluate the most suitable feature matching algorithm and the most accurate point cloud. More recently, case studies of power pylons have been reported; these are bigger and more complex objects than cell towers. In (Chen et al., 2019) Airborne Laser Scanners (ALS) are used to acquire 3D data, and an automatic reconstruction algorithm is developed to retrieve the frame of the quadrangular frustum pyramid, while the reconstruction of the internal structure is based on prior knowledge of it. In (Jiang et al., 2019) and (Huang et al., 2020) a UAV platform is used, but pylon reconstruction is based on 2D image analysis and a library of pylon models.

METHODOLOGY
The target of this research is to extract, automatically or semi-automatically, the structural elements of the trellis as a graph of nodes and rods representing the legs and bracings of the lattice. To achieve this target, we developed a procedure, inspired by (Ma, Liu, 2018), which is summarized in the workflow of Figure 2.
The workflow is essentially divided into three phases: the first concerns the acquisition of images using the UAV and the topographic survey; the second, the construction of the point clouds using the Structure from Motion technique; and the third, the processing of the cloud.

Survey
Among the recommendations for correct acquisition given in the operating manuals of photogrammetry software based on Structure from Motion (SfM) algorithms (Metashape and Pix4D, to name just two), there is the advice to avoid homogeneous backgrounds or textures in the frames. These can mislead the matching algorithms looking for homologous points and thus contribute to noise. In this sense, the presence of the sky in the frames can be a weak point in the extraction of a good point cloud, both in terms of density and noise. This is confirmed by several practical applications to the survey of structures similar to the one being tested (RYKA UAS 2017), which suggest tilting the sensor 45° to the horizon to limit the problem as much as possible. Regarding the flight pattern, for the survey of structures that rise vertically, as is the case for the steel communication tower under study, the acquisition scheme can be planned either as linear paths that run parallel to the axis of the tower or as helical trajectories wrapping around the structure. Images can be acquired in continuous or discrete mode, depending on whether one opts for video sequences or individual shots taken at specified space/time intervals. In our use case we use individual shots rather than video sequences, because frames exported from a video sequence would lack the metadata containing the position of the perspective center and the focal length for each shot. In addition, the flight can be performed in manual or automatic mode; in the latter case the checking of both the project parameters and the route tracking is delegated to the ground control station, which maintains constant contact with the UAV during the flight.

Figure 1. Components of the antenna tower
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)
Unlike classical photogrammetry, the camera positions obtained with SfM algorithms generate a model with arbitrary scale and orientation; consequently, the 3D point clouds (sparse clouds) are generated in a relative object-image coordinate system (Carrivick et al., 2016). Therefore, at least the recovery of the absolute scale is needed. Two ways are possible: an auxiliary topographic survey, or a set of calibrated bars placed on the structure during image acquisition.
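As an illustration of the calibrated-bar option, the following minimal sketch recovers the absolute scale from one bar of known length. The function names and the sample coordinates are hypothetical and are not taken from the survey described here; only scale is recovered, not position or orientation.

```python
from math import dist

def scale_factor(model_end_a, model_end_b, true_length):
    """Ratio between the known bar length and its length in the unscaled model."""
    return true_length / dist(model_end_a, model_end_b)

def apply_scale(points, s):
    """Scale all model points about the origin of the relative system."""
    return [(s * x, s * y, s * z) for x, y, z in points]

# Hypothetical bar: 1.0 m long in reality, 0.5 units long in the SfM model.
s = scale_factor((0.0, 0.0, 0.0), (0.0, 0.0, 0.5), true_length=1.0)
scaled = apply_scale([(1.0, 2.0, 3.0)], s)
```

With several bars, one would normally average the individual factors and inspect their spread as a consistency check.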

Structure from Motion:
Reversing the classic photogrammetric approach, the SfM technique, starting without a priori knowledge about the scene, reconstructs the 3D scene (structure) determining camera positions through the intersection of the multiple homologous rays relating to the same point taken from several positions (motion). Essentially it operates through three phases: first, extraction of the key points in the images, in other words points of interest, lines, etc, and consequent matching of these features between images, second, camera motion estimation, and third, reconstruction of the 3D structure using the estimated motion. The minimization of the discrepancy between image measurements and their predictive model is done using a bundle adjustment technique.
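The quantity minimized by bundle adjustment is usually written as a sum of squared reprojection errors. The notation below is a generic textbook form introduced here for illustration (the symbols are assumptions, not taken from the software used in this work): x_ij is the observed image position of point j in image i, X_j its 3D position, C_i the parameters of camera i, π the projection function, and v_ij equals 1 if point j is visible in image i and 0 otherwise.

```latex
\min_{\{C_i\},\,\{X_j\}} \; \sum_{i} \sum_{j} v_{ij} \,
  \bigl\lVert x_{ij} - \pi(C_i, X_j) \bigr\rVert^{2}
```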

Voxelization:
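After the point cloud is produced, it is converted into voxels based on the spatial density of points. The voxelization process is a partition of space into cubic regions of equal side length ("step" or "resolution", r). In practice, we build a 3-dimensional matrix V where the value of each element of indexes i, j, k is the number of points P_n which are inside the corresponding portion of space:

$$V_{ijk} = \#\left\{ P_n : \; i\,r \le x'_n < (i+1)\,r, \;\; j\,r \le y'_n < (j+1)\,r, \;\; k\,r \le z'_n < (k+1)\,r \right\}$$

The coordinates x'_n, y'_n, z'_n are relative to the lower bounds of the coordinates of the points:

$$x'_n = x_n - \min_m x_m, \qquad y'_n = y_n - \min_m y_m, \qquad z'_n = z_n - \min_m z_m$$

The ranges of the indexes are thus given by

$$0 \le i \le \left\lfloor \frac{\max_n x'_n}{r} \right\rfloor, \qquad 0 \le j \le \left\lfloor \frac{\max_n y'_n}{r} \right\rfloor, \qquad 0 \le k \le \left\lfloor \frac{\max_n z'_n}{r} \right\rfloor$$

The next step is filtering out (that is, setting to zero) the voxels whose value is below a certain threshold, in order to reduce the noise. The optimal threshold depends on the resolution and the noise characteristics of the point cloud.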
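The voxel counting and threshold filtering described above can be sketched in a few lines. This is a minimal illustration that assumes zero-based indexes and a sparse dictionary instead of a dense 3D matrix; the function names and sample points are hypothetical.

```python
from collections import Counter
from math import floor

def voxelize(points, r):
    """Count the points falling in each cubic voxel of side r.

    Indexes are zero-based and relative to the lower bounds of the cloud.
    """
    x_min = min(p[0] for p in points)
    y_min = min(p[1] for p in points)
    z_min = min(p[2] for p in points)
    counts = Counter()
    for x, y, z in points:
        idx = (floor((x - x_min) / r),
               floor((y - y_min) / r),
               floor((z - z_min) / r))
        counts[idx] += 1
    return counts

def filter_voxels(counts, threshold):
    """Drop (i.e. set to zero) the voxels holding fewer than `threshold` points."""
    return {idx: n for idx, n in counts.items() if n >= threshold}

# Hypothetical cloud: three points share one voxel, one point is isolated noise.
points = [(0.0, 0.0, 0.0), (0.01, 0.02, 0.03), (0.04, 0.01, 0.02), (0.26, 0.0, 0.0)]
counts = voxelize(points, r=0.1)
active = filter_voxels(counts, threshold=2)
```

A sparse dictionary keeps memory proportional to the number of occupied voxels; a dense matrix, as used for skeleton extraction, grows with the full bounding box.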

Skeleton extraction:
Skeleton extraction is a well-known process in the field of image processing, which reduces a 2D binary image to its median lines (Lee et al., 1994). This process can also be applied to a 3D voxel image. We generate the 3D skeleton of the voxel matrix using the Skeleton3D function by Philip Kollmannsberger, available on the MATLAB File Exchange.
The skeleton is still in voxel form, but the same author released another function, Skel2Graph3D, that converts the skeleton into a 3D graph composed of nodes and connecting arcs (Kollmannsberger et al., 2017).
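To give an idea of how a voxel skeleton can be turned into a graph, the sketch below labels each skeleton voxel by its number of 26-connected neighbours: one neighbour marks an end point, two mark a voxel along an arc, and more than two mark a junction. This is a common convention shown for illustration only; it is an assumption and not a description of the internals of Skel2Graph3D.

```python
from itertools import product

# The 26 neighbour offsets of a voxel (every combination except the voxel itself).
OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

def classify_skeleton(skeleton):
    """Label each skeleton voxel as 'isolated', 'end', 'arc' or 'node'."""
    labels = {}
    for (i, j, k) in skeleton:
        n = sum((i + di, j + dj, k + dk) in skeleton for di, dj, dk in OFFSETS)
        if n == 0:
            labels[(i, j, k)] = "isolated"
        elif n == 1:
            labels[(i, j, k)] = "end"
        elif n == 2:
            labels[(i, j, k)] = "arc"
        else:
            labels[(i, j, k)] = "node"
    return labels

# Hypothetical skeleton: three two-voxel branches meeting at the origin.
skeleton = {
    (0, 0, 0),
    (1, 0, 0), (2, 0, 0),
    (-1, 1, 0), (-2, 2, 0),
    (-1, -1, 0), (-2, -2, 0),
}
labels = classify_skeleton(skeleton)
```

In real skeletons, junctions often appear as small clusters of adjacent "node" voxels, which have to be merged into a single graph node before the arcs are traced.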
The nodes and arcs of this graph are the elements we need in order to model the structure of the lattice tower. They are thus exported in two formats: DXF for import and visualization in 3D modeling software, and a custom text format for further processing.
To follow the point cloud more closely, the coordinates exported for each node are not those of the center of the corresponding voxel, but those of the center of mass of the points contained in the voxel (Huang et al., 2019).
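The difference between the voxel center and the center of mass of its points can be seen in the following sketch, which assumes the same zero-based indexing relative to the lower bounds of the cloud; the function name and sample points are hypothetical.

```python
from collections import defaultdict
from math import floor

def voxel_centroids(points, r):
    """Return the center of mass of the points contained in each voxel of side r."""
    x_min = min(p[0] for p in points)
    y_min = min(p[1] for p in points)
    z_min = min(p[2] for p in points)
    sums = defaultdict(lambda: [0.0, 0.0, 0.0, 0])  # sum_x, sum_y, sum_z, count
    for x, y, z in points:
        idx = (floor((x - x_min) / r),
               floor((y - y_min) / r),
               floor((z - z_min) / r))
        s = sums[idx]
        s[0] += x
        s[1] += y
        s[2] += z
        s[3] += 1
    return {idx: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for idx, s in sums.items()}

# Hypothetical points: two fall in voxel (0, 0, 0), one in voxel (2, 0, 0).
points = [(0.0, 0.0, 0.0), (0.02, 0.04, 0.06), (0.25, 0.0, 0.0)]
centroids = voxel_centroids(points, r=0.1)
```

For voxel (0, 0, 0) the geometric center would be (0.05, 0.05, 0.05), whereas the center of mass lies near (0.01, 0.02, 0.03), closer to where the points actually are.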

Quality assessment
As stated in many works (U.S. General Services Administration, 2009), (Khoshelham et al., 2012), (Rebolj et al., 2017), accuracy, point density and level of scatter are in general three important measures for evaluating the quality of a point cloud.
We decided to assess accuracy by measuring, through a topographic survey with a total station, the coordinates of a suitable number of points spread all over the structure, and comparing them with those estimated through photogrammetry. The figure used is then the RMSE, evaluated according to:

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[\left(x_i^{P}-x_i^{T}\right)^2+\left(y_i^{P}-y_i^{T}\right)^2+\left(z_i^{P}-z_i^{T}\right)^2\right]}$$

where (x_i^P, y_i^P, z_i^P) and (x_i^T, y_i^T, z_i^T) are respectively the photogrammetric and topographic estimated coordinates of the N check points. Moreover, we compare the point clouds of adjacent strips. Given that adjacent strips share a common area, we use the software CloudCompare to perform a "cloud to cloud" difference, evaluating the mean and the standard deviation for all the adjacent strips.
Point density and level of scatter are not evaluated at this stage of the work, but we plan to generate the 3D CAD model and perform a "CAD to point cloud" comparison in order to evaluate scattering and point density.
Regarding the assessment of the level of detail, we adopt the data quality definitions introduced by (U.S. General Services Administration, 2009) and summarized in (Rebolj et al., 2017). Table 1 reports the GSA specifications, in which LOD is the level of detail and LOA is the level of accuracy.
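As a minimal sketch of the accuracy figure, the function below computes a total (3D) RMSE between photogrammetric and topographic coordinates of the same check points. It assumes that the RMSE combines the squared differences in x, y and z; the function name and the sample coordinates are hypothetical and are not survey data.

```python
from math import sqrt

def rmse_3d(photo, topo):
    """Total RMSE between paired photogrammetric and topographic coordinates."""
    if not photo or len(photo) != len(topo):
        raise ValueError("the two lists must be non-empty and of equal length")
    squared = sum(
        (xp - xt) ** 2 + (yp - yt) ** 2 + (zp - zt) ** 2
        for (xp, yp, zp), (xt, yt, zt) in zip(photo, topo)
    )
    return sqrt(squared / len(photo))

# Hypothetical check points (metres): offsets of 1 cm and 2 cm on one axis each.
photo = [(1.00, 2.00, 3.00), (4.00, 5.00, 6.00)]
topo = [(1.01, 2.00, 3.00), (4.00, 5.02, 6.00)]
error = rmse_3d(photo, topo)
```

Per-axis or planimetric figures can be obtained in the same way by keeping only the relevant squared terms in the sum.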

Description
The case study is the Base Transceiver Station (BTS) named CASIC tower, located along the 5th road in the industrial area of Macchiareddu (Ca); it belongs to the "rawland" category. The station includes 6 dipole antennas mounted on three metal Y-arms on the pole placed at the top of the trellis, and 6 parabolic antennas for radio link connections. The devices required for the proper functioning of the BTS are housed in the shelter located at the foot of the trellis. The tower is 35 meters tall with a ground section of 4 x 3.57 meters, tapering over five levels to a side length of 1.80 m, then rising for another four levels of equal size. The antenna holder pole rises 3 meters above the top of the tower. The structure of the trellis consists of legs placed along the edges, joined by diagonal bracing on the four faces and by plane braces in the horizontal sections. Inside the trellis are a vertical ladder (with cage) and five service walkways (at altitudes of +7.5 m, +17.5 m, +30 m, and +32.5 m).

CASIC Tower Survey:
For the survey of the CASIC trellis, the UAV chosen was the DJI Mavic 2 Zoom. According to the recent regulation issued by the Italian National Agency for Civil Aviation ENAC (ENAC 2019), the UAV is defined as an aircraft with a take-off mass of less than 25 kg. The characteristics of the Mavic 2 Zoom and its equipment, as set up for the case study, are summarized in Table 2.

UAV
Considering its usage, as well as its technical aspects and extra equipment, the Mavic 2 Zoom is classified as a consumer (non-commercial or non-professional) UAV and is commonly used for inspections. Aiming for at least LOD Level 2, the scale has to be greater than 1:50, which leads to a flight plan whose main parameters are summarized in Tables 3a and 3b. To fix the origin and the scale of the reference system, a topographic survey was performed using a high-precision total station. On each side of the trellis, the coordinates of 25 points, distributed as shown in Figure 5, were determined, for a total of 92 points. Not all the points were used as Ground Control Points (GCP); some of them were used as Check Points (CP). For each strip, GCP and CP were subdivided according to Table 4 and Table 5.

Generating point cloud:
For the calculation of the point cloud model, we used the commercial software Metashape (Agisoft). The project was structured by organizing the chunks by strip. In each chunk, we proceeded by first evaluating the quality of the images and discarding those that possibly had a quality value in terms of sharpness, calculated on the areas of the image most in focus, less than 0.5 (Agisoft Metashape 2019). The quality range of the whole dataset is between 0.82 and 0.98. Subsequently, the alignment (sparse cloud) with maximum precision parameter was calculated.
Bundle Adjustment (BA) results, divided by strip and point type, are summarized in the same two tables. The error components in x, y and z and the planimetric and total error are reported.
The dense cloud was calculated with a "high" quality parameter and a "light" depth filter (to limit as much as possible the risk that the filter could erode the useful section of the rods of the trellis), and was then subjected to a coarse cleaning that consisted only of eliminating the double points and removing the noise outside and near the tower structure. The intent is in fact to feed a "noisy" cloud to the voxelization and skeleton extraction algorithms, in order to evaluate how well they work in the presence of noise. Figure 6 shows the result obtained so far.
Also, according to paragraph 2.4, the cloud-to-cloud distance was calculated between adjacent strips for the fourteen strips. The results obtained are summarized in Table 6.
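The idea behind a cloud-to-cloud comparison can be sketched as a nearest-neighbour distance from each point of one cloud to the other cloud, followed by the mean and standard deviation of those distances. The brute-force version below is an illustrative assumption and not CloudCompare's implementation, which relies on spatial indexing to handle millions of points; the function name and sample clouds are hypothetical.

```python
from math import dist, sqrt

def cloud_to_cloud(compared, reference):
    """Mean and (population) standard deviation of nearest-neighbour distances."""
    distances = [min(dist(p, q) for q in reference) for p in compared]
    mean = sum(distances) / len(distances)
    std = sqrt(sum((d - mean) ** 2 for d in distances) / len(distances))
    return mean, std

# Hypothetical overlapping strips (metres).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
compared = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.03)]
mean_d, std_d = cloud_to_cloud(compared, reference)
```

The cost of this sketch grows with the product of the two cloud sizes, so it is only suitable for small samples.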

Parameters:
We processed the point cloud with different parameters of voxel resolution and filtering threshold. First, we used resolutions of 5, 10, 15 and 20 cm, and calculated the average and maximum number of points per active voxel (that is, excluding the empty voxels) in order to estimate a suitable threshold value. Table 7 shows the statistics obtained from these tests.
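The statistics used to guide the choice of the threshold can be computed directly from the voxel counts. The sketch below assumes the counts are available as a dictionary keyed by voxel index; the function name and the sample values are hypothetical and are not those of Table 7.

```python
def active_voxel_stats(counts):
    """Average and maximum number of points per active (non-empty) voxel."""
    active = [n for n in counts.values() if n > 0]
    if not active:
        raise ValueError("no active voxels")
    return sum(active) / len(active), max(active)

# Hypothetical voxel counts: one empty voxel is excluded from the statistics.
counts = {(0, 0, 0): 12, (0, 0, 1): 3, (1, 0, 0): 0, (2, 1, 0): 45}
mean_pts, max_pts = active_voxel_stats(counts)
```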

Figure 7 shows the visualization of the voxel matrices at the different resolutions. We then examined the effects of different threshold values for a given resolution. We focused on the matrices with resolutions of 5 and 10 cm and applied threshold values of 0, 10, 20, 50 and 100. This affects the number of points that are rejected (filtered out) and influences the number of nodes and beams in the structure graph (see Tables 8a and 8b).

Threshold   Points retained   Points rejected   Nodes   Beams
0           6192088           0                 1467    1875
10          6040748           151340            712     1106
20          5890834           301254            596     1005
50          5472449           719639            568     983
100         4725840           1466248           564     984
Table 8b. Effects of different threshold values on the 10 cm resolution matrix

Figure 8 shows the effects of the variation of the threshold on the voxel matrix with 10 cm resolution. We can see that threshold values up to 20 fail to remove all the noise, whereas with a value of 100 gaps start appearing in the structure.

Figure 8. The voxel matrix with 10 cm resolution, filtered at 0, 20, 50 and 100 points per voxel

Skeleton and graph extraction:
After the voxel image has been filtered, the skeleton is extracted and converted to a 3D graph. Tables 8a and 8b show the numbers of nodes and beams in the graphs obtained with different parameters. Figure 9 shows two examples of graph representations. The 3D graphs obtained from the skeleton were also exported in DXF format. This allowed us to load and compare the graphs produced with different parameters. Figure 10 shows one of these comparisons.

Figure 10. Comparison between the structure extracted from the matrices filtered at 0 (red), 20 (yellow), 50 (green) and 100 (blue) points per voxel

DISCUSSION
The RMSE of the 14 strips, evaluated on the Check Points, ranges from 1.29 cm to 2.01 cm. Taking the RMSE as the figure for LOA, the result is not far from LOD Level 2, while LOD Level 1 is certainly achieved. Based on LOA, the measurement system provided by the small UAV tested is able to retrieve the frame and possibly the internal structure.
The analysis of the voxel matrix shows that filtering the voxels based on the number of contained points is effective in removing the noise produced by the SfM reconstruction. The threshold value must be chosen so as not to create gaps in the structure of the trellis; in our tests we settled on values of the same order of magnitude as the average number of points per voxel.
Using a smaller resolution value allows for a more detailed modeling of the structure, but of course requires a higher amount of memory.
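The growth in memory can be estimated by counting the elements of the dense voxel matrix, which scales with the inverse cube of the resolution. The sketch below uses the nominal tower dimensions from the description (4 x 3.57 m base, 35 m tower plus 3 m pole) as a stand-in for the bounding box; this is an assumption, since the actual extent of the point cloud also includes antennas and residual noise. Lengths are expressed in whole centimetres to avoid floating-point rounding.

```python
def voxel_grid_size(extent_cm, r_cm):
    """Number of elements of the dense voxel matrix covering the given extent."""
    nx, ny, nz = (-(-e // r_cm) for e in extent_cm)  # ceiling division per axis
    return nx * ny * nz

# Assumed bounding box in centimetres (nominal tower dimensions, not the real cloud).
EXTENT_CM = (400, 357, 3800)

sizes = {r: voxel_grid_size(EXTENT_CM, r) for r in (5, 10, 15, 20)}
for r, n in sizes.items():
    print(f"resolution {r} cm: {n} voxels")
```

Halving the resolution from 10 cm to 5 cm multiplies the number of elements by eight; the memory required is this count times the size of one matrix element.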
We are now working on the reconstruction of the lattice tower structure from the nodes and arcs of the graph generated in our process, subjecting them to constraints that depend on the geometry of the structure.