IMAGE RECORDING CHALLENGES FOR PHOTOGRAMMETRIC CONSTRUCTION SITE MONITORING

Construction site monitoring and progress monitoring are becoming increasingly popular in the architecture, engineering and construction (AEC) industry. To this end, remote sensing techniques are used to gather consecutive datasets of the construction site. This work focuses on the recording of imagery for photogrammetric processing and the challenging conditions often encountered on construction sites. The constantly evolving character of such sites requires datasets to be captured as quickly as possible. Furthermore, other recording complexities arise, such as the presence of auxiliary equipment and clutter or reflections caused by wet surfaces, which hinder quick and complete recordings. Apart from these external factors, the construction elements themselves often complicate the capturing workflow. This work enumerates several real-world examples of the difficulties construction sites pose for the recording of imagery for photogrammetric purposes. Each section provides insight into a specific challenge typical for construction sites and discusses applicable field-tested solutions, including an overview of relevant solutions found in literature.


INTRODUCTION
The advent of Building Information Modelling (BIM) and the ongoing digitalisation of the AEC industry require the recording of a construction site through its different stages. These recordings can be used for a variety of purposes, an important one of which is progress monitoring. This process has gained major attention from both researchers (Golparvar-Fard and Peña-Mora, 2007, Turkan et al., 2012, Dimitrov and Golparvar-Fard, 2014, Kopsida et al., 2015) and the AEC industry (Gexcel, 2017, Autodesk, 2018) in recent years. To monitor the progress of a construction site in a more automated way, multiple datasets must be captured. The progress can subsequently be determined by comparing either two consecutive recordings or one of the recordings with the 4D as-designed BIM. This enables project planners to keep the as-designed BIM up to date and to extract updated construction working schedules (Tuttas et al., 2015, Tuttas et al., 2016).
The monitoring of construction sites likewise requires the scene to be recorded regularly, for example to keep track of problems during construction, to analyse the correct localisation of building elements and to detect possible deviations from the as-designed BIM. Furthermore, the recorded imagery can be included in the progress reports that construction and project managers deliver on a regular basis (Everett et al., 1998, Abeid and Arditi, 2002) or can serve as input data for the calculation of the current as-built BIM (Patraucean et al., 2015). In addition, the reports or, on a higher level, the as-built BIM can be handed over to the stakeholders as a final as-built file that represents the construction at building completion.
Finally, the recordings could also contribute to the safety on the site. By recreating the scene virtually using the recordings, construction workers can be made more aware of the possible dangers on construction sites. Furthermore, by detecting possibly dangerous situations at an early stage, injuries or even fatalities can be avoided (Cheng and Teizer, 2013).
For overall monitoring purposes on construction sites, there are two main methods to record data: the site is captured with either photogrammetry or laser scanning. The focus of this work is the photogrammetric approach. The underlying idea is that construction site managers and foremen already frequently take numerous pictures (Ibrahim et al., 2009, Lin et al., 2015). This can be taken one step further by recording additional pictures following a specific capturing workflow that results in an accurate and complete 3D model of the entire construction site or of the part of the site they are interested in. Furthermore, the total cost of these techniques, i.e. both the purchase and the operating costs, is taken into account. The work of Omar and Nehdi shows that photogrammetric equipment can be both purchased and operated at a lower cost (Omar and Nehdi, 2016), while yielding only slightly worse results compared to laser scanning (Koutsoudis et al., 2014).
However, despite the promising possibilities of such an approach, there are several obstacles to overcome. Construction sites form an environment which is difficult to capture. Field experiments were conducted to investigate and solve the encountered challenges. The data were recorded at a construction site in Ghent, Belgium, where three apartment buildings will be erected on a large underground parking space. Both the recordings and the possible solutions to the capturing challenges were tested during the ongoing construction of the parking lot. It must be stressed that the recordings mainly focussed on capturing the structural elements at the construction site, rather than equipment, building materials or construction workers.
The remainder of this work focusses on the variety of challenges and is structured as follows. The difficulties associated with auxiliary equipment and clutter on the site are discussed in section 2. Reflective and texture-poor surfaces form further challenges, which are presented in sections 3 and 4 respectively. Furthermore, sections 5 and 6 discuss two building-related problems, namely narrow spaces and spaces with challenging lighting conditions. Each of these sections provides an overview of relevant literature and the results of the conducted field experiments. Finally, the conclusions are presented in section 7.

AUXILIARY EQUIPMENT AND CLUTTER
One of the major obstacles at construction sites is the presence of auxiliary equipment and clutter. First of all, these objects cause occlusions of two different types: dynamic and static ones. The former are mainly caused by moving objects such as personnel or working equipment (figure 1, left), while the latter have a large variety of causes. Auxiliary equipment such as moldings and scaffolding is the main source of static occlusions, but construction materials on the site, such as reinforcement steel or bricks, also frequently obstruct the scene (figure 1, right) (Omar et al., 2018, Golparvar Fard et al., 2015).
A second possible consequence of the presence of auxiliary equipment and clutter at construction sites is the impact on the registration of multiple consecutive datasets. If, for some reason, the construction site could only be captured partially, the registration of the following day's dataset with the first day's dataset may encounter severe difficulties. This is due to the dynamic nature of a construction site: equipment captured at a certain location is quite likely to be placed elsewhere the following day. Moreover, equipment and clutter are frequently more diversely textured than concrete surfaces, which results in mesh reconstructions with an easily recognisable and distinctive texture. This makes these elements seemingly ideal objects for registering both datasets. However, problems arise when these elements move. Severe registration errors between the datasets of both days can then be expected when registering in a non-supervised, automatic way.
The same problems occur when registering two consecutive datasets of the entire construction site, for example for progress monitoring purposes. When the transformation parameters between the two point clouds or models are calculated based on feature points present in both datasets, care should be taken that these points cannot have moved between the different recording sessions.
Solutions
Literature suggests several solutions for the presence of occlusions. A solution for the detection of occluding scaffolding structures is presented by Xu et al. (Xu et al., 2018). Because scaffolds are typically located close to structural components and are similarly coloured, they can disturb the mesh creation, since the software considers them part of the structure. However, by correctly recognizing these scaffolds, they can be eliminated, yielding more accurate results when further processing the remaining data. Golparvar-Fard et al. present their work on monitoring the progress on construction sites in (Golparvar Fard et al., 2015). The completion of elements is determined taking into account both types of occlusions. An element can be considered as possibly built when it is severely occluded. Furthermore, (Tuttas et al., 2014) state that occlusions will inevitably be present in the final reconstruction of construction sites. However, this does not necessarily pose large problems for the determination of the building progress. By taking into account the area recognized and confirmed as built, combined with building logic, further assumptions can be made about the percentage of completion of the element.
The registration of two datasets, either two partial ones or two consecutive full-site ones, can be performed based on at least three indicated common points to resolve the 7 degrees of freedom (DOF), which can be decomposed into rotation, translation and scale (Golparvar-Fard et al., 2009, Golparvar Fard et al., 2015). If the registration is executed manually, the picked points should be easy to recognize in both datasets. Auxiliary equipment and clutter, with their large textural variations, appear well suited for this. However, since they frequently move, picking points on them can cause severe registration errors. Due to recent developments in photogrammetric software, however, the time needed to align hundreds to thousands of images has decreased heavily and the number of distinct, useful feature points in images has increased substantially. This results in a more accurate matching of the images of two datasets, which is less prone to errors from common points that moved between datasets than the aforementioned point-picking method. This subject is elaborated further in the following field experiments. Furthermore, Kim et al. present their work on the registration of the as-built and the as-designed data (Kim et al., 2013). Through the use of machine learning, the structural elements are separated from the remaining data. Subsequently, the transformation can be calculated in a fully automatic way based on corresponding feature points on these structural elements in both datasets. This way, feature points on moving auxiliary equipment or clutter, and thus erroneous registrations, are avoided.
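The seven parameters mentioned above can be estimated in closed form from three or more non-collinear point pairs. The following is a minimal sketch of such an estimation, using the well-known SVD-based least-squares solution (often attributed to Umeyama). It is an illustration only and not the procedure of the cited works or of the field experiments; the function name `estimate_similarity` and the use of NumPy are assumptions of this example.

```python
import numpy as np


def estimate_similarity(src, dst):
    """Estimate scale s, rotation R and translation t so that dst ≈ s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points, N >= 3 and not collinear.
    Hypothetical helper for illustration (SVD-based least-squares solution).
    """
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    if src.shape != dst.shape or src.ndim != 2 or src.shape[1] != 3 or len(src) < 3:
        raise ValueError("need at least three corresponding 3D points")

    # Centre both point sets on their centroids.
    mu_src, mu_dst = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_src, dst - mu_dst

    # Cross-covariance between the two centred sets and its SVD.
    cov = dst_c.T @ src_c / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Guard against a reflection so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0
    R = U @ S @ Vt

    # Scale from the singular values and the variance of the source points.
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src

    # Translation that maps the source centroid onto the target centroid.
    t = mu_dst - s * R @ mu_src
    return s, R, t
```

With exactly three points the solution is fully determined but has no redundancy, so a single misplaced or moved point directly distorts the result; additional well-distributed points on static structural elements make the estimate more robust.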
Although occlusions will inevitably be present in the final reconstruction, the objective is to keep them to a minimum by following a dedicated capturing approach. The approach suggested below was therefore tested and fine-tuned in numerous field experiments. The focus of this work is the reconstruction of the structural elements on construction sites, rather than the tracking of equipment, work staff and so on.
A first factor that aids in the reduction of occlusions is their duration. Short-term dynamic occlusions caused by moving working staff can be avoided by capturing the scene at specific times, such as breaks or after working hours (Omar et al., 2018). Another possibility is to wait a short amount of time until the workers have moved elsewhere. This works because photogrammetry is only able to reconstruct objects that are visible in two or more images in exactly the same place and position. The conducted field experiments have shown that even though workers are present in images, they are hardly ever reconstructed because of their frequent movements. Furthermore, dynamic occlusions can also be caused by movable equipment such as scissor lifts, man lifts or excavators. Depending on whether the machine is in operation or not, these occlusions can be treated either similarly to occlusions caused by personnel, or as static occlusions as discussed further on. If the machinery is in operation, it is advisable to start capturing in the area the machinery will move to and to finish the recordings in the area where it was originally operating. Following this approach in the field experiments ensured a complete recording of the construction site.
In contrast to dynamic occlusions, static occlusions are mostly harder to solve and can only be circumvented by altering the recording strategy. A successful strategy is to capture the scene by first focussing on the general overview and subsequently focussing on the details. By first making a series of pictures that record the construction site in general and browsing through these pictures, a good estimation of the occluded regions can be made. Because of the carefully selected trajectory of the first recording series, the second capturing series can focus more on the occluded areas, following a slightly deviating trajectory. This approach makes it possible to locate the images of the occluded areas in the overall project, while only recording the site in detail would result in an impossible or faulty alignment of the imagery. This strategy can be repeated multiple times across a construction site in order to reduce the static occlusions. The approach proved successful in the conducted experiments.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2/W9, 2019. 8th Intl. Workshop 3D-ARCH "3D Virtual Reconstruction and Visualization of Complex Architectures", 6-8 February 2019, Bergamo, Italy
Figure 1: Auxiliary equipment and clutter on a construction site causing occlusions and registration errors
Potentially faulty registrations of two consecutive datasets can be eliminated by capturing the second dataset in an intelligent way. By ensuring a large overlap zone between the two datasets, registration errors can be avoided. As stated before, the transformation between the datasets is calculated by matching numerous feature points. Because the number of feature points in the overlap zone is far higher than the number of feature points on the moved equipment or clutter, the feature matches on the equipment will be considered erroneous outliers.
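The outlier reasoning above can be illustrated with a consensus-based estimation. The sketch below uses a generic RANSAC loop around a least-squares similarity fit: the static majority of the matches determines the transformation, while matches on moved equipment end up outside the consensus set. How commercial photogrammetric software rejects outliers internally is not documented here, so this is only an assumed stand-in; the function names, the 5 cm threshold and the iteration count are hypothetical.

```python
import numpy as np


def fit_similarity(src, dst):
    """Least-squares scale s, rotation R, translation t with dst ≈ s * R @ src + t."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    U, D, Vt = np.linalg.svd(dst_c.T @ src_c / len(src))
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0  # enforce a proper rotation (no mirroring)
    R = U @ S @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    s = np.trace(np.diag(D) @ S) / var_src
    t = mu_d - s * R @ mu_s
    return s, R, t


def residuals(src, dst, s, R, t):
    """Euclidean distance between the transformed source and the target points."""
    return np.linalg.norm(dst - (s * (R @ src.T).T + t), axis=1)


def ransac_similarity(src, dst, threshold=0.05, iterations=200, seed=0):
    """Return (s, R, t, inlier_mask); threshold is in the units of the point data."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iterations):
        # Minimal sample: three correspondences fix all seven parameters.
        sample = rng.choice(len(src), size=3, replace=False)
        s, R, t = fit_similarity(src[sample], dst[sample])
        inliers = residuals(src, dst, s, R, t) < threshold
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise RuntimeError("no consensus set found")
    # Final estimate from all matches that agree with the best model.
    s, R, t = fit_similarity(src[best], dst[best])
    return s, R, t, best
```

The sketch only works as intended when the static matches clearly outnumber the moved ones, which is exactly what a large overlap zone between the two datasets provides.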
A second applicable solution is the use of a reference system. By using the same set of reference points to reference both datasets, the calculation of a transformation between them becomes unnecessary. This yields the additional advantages that the photogrammetric results are correctly scaled and that both datasets can be processed separately, which can be beneficial for larger datasets, since the computational cost of the photogrammetric processing of the imagery increases combinatorially. The two solutions, namely sufficient overlap between two datasets and the use of a reference system, can also be combined to ensure a fully correct alignment.

REFLECTIVE SURFACES
A second major challenge for the correct processing of recordings is the presence of reflections, the main source of which are wet floor surfaces on construction sites (figure 2, top left). The thin layer of water acts as a mirror, which results in reflections. Two types of reflections can be distinguished: reflections of sunlight and reflections of construction site elements. The presence of reflections can lead to severe mismatches and hence to erroneously reconstructed points and mesh models (figure 2), because automatic photogrammetric reconstruction algorithms treat the reflections as actually existing entities instead of reflections in a puddle.
Solutions
To the authors' knowledge, no literature exists concerning reflections caused by water on construction sites. However, this topic is discussed in other research domains such as bathymetry. Casella et al. (Casella et al., 2017) present a bathymetric survey using an Unmanned Aerial Vehicle (UAV). The flight was timed to coincide with ideal weather conditions, meaning no wind and hence a perfectly flat water surface, to keep sunlight reflections to a minimum. Furthermore, only nadir images were taken. They conclude that, despite the layer of water, it is possible to reconstruct the surface underneath. Partama et al. take this one step further (Partama et al., 2018) and are able to reconstruct the submerged surface despite minor sunlight reflections caused by water ripples in conditions that are not wind-free. By combining and comparing multiple nadir video frames, recorded by a UAV at one single location, the reflections can be filtered out. This way, it is possible to reconstruct the underwater scene.
However, before applying these solutions for water reflections from literature to the recording of construction sites, a major remark must be made. The solutions were developed for a very specific purpose, namely bathymetric surveying. The recording conditions at construction sites are substantially different, the major difference being that the site is captured terrestrially. This results in exclusively oblique images of the floor surfaces covered with water, which leads to far more pronounced reflections in the imagery: apart from the reflections originating from sunlight, numerous additional object reflections are present, as can be seen in figure 2 (top left).
Furthermore, the influence of the sunlight intensity must be taken into account. When recording on a clear, sunny day, reflections originating from either the sunlight itself or the construction site elements will be much harder to avoid. Because the reflections are much brighter than the floor surface under the water, the reflections will be reconstructed instead of the floor. Furthermore, the object reflections are indistinguishable from actual objects in the images. Therefore, it is advisable to record on days with an overcast sky. Although reflections cannot be avoided entirely, they will be much weaker and will no longer overpower the light coming from the actual submerged surface.
Additionally, the recording perspective can be altered to mitigate reflections. By capturing the submerged scene through nadir images, similar to the bathymetric surveying approach discussed above, reflections of construction elements can be avoided. However, when capturing the scene manually, this is almost impossible. The person recording the site will be present in the images and this person's presence will cause ripples in the water, again increasing the amount of element reflections. A more viable solution is the use of UAV imagery, as in bathymetric surveying. The absence of water ripples caused by a moving person will greatly reduce the number of construction site object reflections, as long as the UAV flies high enough that its rotating propellers do not cause ripples themselves.
Figure 3: Texture-poor surfaces such as concrete walls result in a lower amount of detected feature points complicating both the alignment as well as the further reconstruction
The non-existing objects caused by reflections in the water are typically reconstructed under the floor level. This facilitates the possibility to design an algorithm that deletes erroneous reconstructions under the floor level. This will be researched in the future.
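As a rough illustration of the idea of deleting reconstructions under the floor level, the following sketch filters a point list by elevation. It is not the algorithm announced above, which is left for future research, but a simple stand-in under strong assumptions: the model is levelled with the Z axis pointing up, the floor is horizontal, and its elevation `floor_z` is known, for instance from a reference system or the as-designed BIM. The function name and the 2 cm tolerance are hypothetical.

```python
def remove_points_below_floor(points, floor_z, tolerance=0.02):
    """Keep only points at or above the floor elevation.

    points:    iterable of (x, y, z) tuples in metres, Z axis pointing up (assumed)
    floor_z:   known elevation of the horizontal floor surface (assumed input)
    tolerance: margin below the floor that is still accepted, to absorb noise
    """
    return [p for p in points if p[2] >= floor_z - tolerance]


# Example: the point at z = -0.80 m mimics a mirrored object below the floor.
cloud = [(0.0, 0.0, 0.00), (1.0, 1.0, 1.50), (2.0, 0.0, -0.80), (3.0, 1.0, -0.01)]
cleaned = remove_points_below_floor(cloud, floor_z=0.0)
```

A sloped or stepped floor would require a fitted plane or a per-region elevation instead of a single value.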

TEXTURE-POOR SURFACES
Structural elements at construction sites mainly consist of concrete. One of the characteristics of this material is its quasi-uniform color (figure 3), which is challenging for photogrammetric processing. The correct alignment of the images depends on finding correspondences between feature points in the pictures. Because of the uniform color, detecting unique, distinctive feature points is a hard task. This is the reason why feature points are typically detected at the edges and corners of the elements, since these points are far more distinctive than the uniform concrete surface.
In addition to the challenging alignment process, the dense reconstruction of the scene will yield fewer points in the areas where texture-poor elements are present. The subsequent mesh calculation will therefore very likely be less accurate, since it is calculated from fewer points. Both the decreased density of the dense point cloud and the less accurate meshing can pose problems when the reconstructions serve as input for progress monitoring, since the typically monitored objects are structural elements made of texture-poor concrete.
Solutions
Furukawa et al. (Furukawa et al., 2009) present research on the reconstruction of Manhattan-world scenes. Construction sites mainly consist of planar surfaces, which are furthermore mainly oriented along one of three dominant axes, complying with a Manhattan-world scene (Coughlan and Yuille, 1999). The 3D reconstruction and visualisation system proposed by Furukawa et al. is able to reconstruct a site's geometry by calculating oriented points and depth maps, ultimately producing a simplified 3D model. By making the assumption of a Manhattan-world scene, the mesh model can be created more easily despite the presence of texture-poor surfaces.
Field experiments have shown that concrete surfaces pose fewer difficulties than originally expected. Thanks to the continuously improving photogrammetric software, little to no difficulty was experienced in the processing of the images. The average point density of properly illuminated concrete walls was 29k pts/m² when reconstructing with the parameter set to high in Agisoft PhotoScan. If a laser scanner recorded the scene at a resolution of 6.2 mm/10 m, the point density would be comparable, at 26k pts/m² for walls at a distance of 10 m.
Furthermore, although properly illuminated concrete surfaces result in more feature points, shadowed surfaces can also be reconstructed completely, resulting in only a slightly decreased point density of 21k pts/m² on average. The reconstructed points in the shaded areas are also at the correct location, forming a planar surface as in reality.
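The laser scanning figure quoted above follows from simple arithmetic: a point spacing of 6.2 mm on a surface at 10 m corresponds to roughly 1 / 0.0062² ≈ 26,000 points per square metre. The short sketch below reproduces this calculation. It assumes a regular square grid of points on a wall that is hit perpendicularly, and a spacing that grows linearly with distance; the function names are hypothetical.

```python
def grid_point_density(spacing_m):
    """Points per square metre for a regular square grid with the given spacing."""
    return 1.0 / (spacing_m ** 2)


def spacing_at_distance(spacing_at_10m, distance_m):
    """Point spacing at another distance, assuming it scales linearly with range."""
    return spacing_at_10m * distance_m / 10.0


# Resolution of 6.2 mm at 10 m, evaluated for a wall at 10 m and at 5 m.
density_10m = grid_point_density(spacing_at_distance(0.0062, 10.0))
density_5m = grid_point_density(spacing_at_distance(0.0062, 5.0))
print(f"{density_10m:.0f} pts/m² at 10 m, {density_5m:.0f} pts/m² at 5 m")
```

Halving the distance quadruples the density, so the comparison with the photogrammetric densities only holds for the stated 10 m distance.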
It should be remarked that the capturing took place before the next floor's slab was cast. The recording conditions change dramatically once the slab is in place, often even resulting in a failure to align the recorded pictures. This subject is further discussed in section 6.

NARROW SPACES
A specific challenge on many construction sites is the sometimes very limited size of rooms. Apartment or office buildings, for example, often consist of multiple adjacent small rooms. These constitute an environment which is not only difficult to record but also hard to register (figure 4).
When recording large or open-field sites, the common approach is to photograph the scene with a camera with a standard kit lens. However, smaller rooms cannot be captured following such an approach. The maximum field of view (FOV) of a standard kit lens mostly varies between 70 and 90 degrees. Because the distance to the opposite wall is limited in smaller rooms, this results in a very limited depicted area (Covas et al., 2015). This causes a large increase in the number of images needed to capture the scene.
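The relation between FOV, distance and depicted area can be made concrete with a simple pinhole-camera calculation. The sketch below computes the wall width covered by one image taken perpendicular to the wall, and a rough number of images needed along that wall. The interpretation of the FOV as the horizontal angle, the 80% overlap default and the function names are assumptions of this example, not values from the field experiments.

```python
import math


def footprint_width(distance_m, fov_deg):
    """Wall width covered by one image taken perpendicular to the wall (pinhole model)."""
    return 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)


def images_along_wall(wall_length_m, distance_m, fov_deg, overlap=0.8):
    """Rough count of images needed to cover a wall with the given forward overlap."""
    width = footprint_width(distance_m, fov_deg)
    if wall_length_m <= width:
        return 1
    step = width * (1.0 - overlap)  # new wall length added by each further image
    return math.ceil((wall_length_m - width) / step) + 1


# Same 30 m of wall and the same 70 degree lens, at 10 m versus 2 m distance.
open_site = images_along_wall(30.0, 10.0, 70.0)
narrow_room = images_along_wall(30.0, 2.0, 70.0)
print(f"{footprint_width(2.0, 70.0):.2f} m covered per image at 2 m")
print(f"{open_site} images at 10 m versus {narrow_room} images at 2 m")
```

Because the covered width scales linearly with the distance to the wall, the image count along a wall grows roughly inversely with that distance; the same holds in the vertical direction, which this one-dimensional sketch ignores.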
Furthermore, the multitude of images and the presence of multiple identical narrow rooms form a challenging environment for the alignment of the images and cause a dramatic increase in processing time. Additionally, in many cases smaller rooms only have one or two doors. This results in a very small overlap between the interior and exterior of the room, severely limiting the relative position estimation of the images that form the link between interior and exterior. Because of the limited room size, the overlap area is reduced even more, further complicating a successful registration of the pictures.
Solutions
Literature on reconstructing narrow spaces can mainly be found in the heritage documentation sector. Perfetti et al. (Perfetti et al., 2017) present their work on reconstructing a historical staircase. With normal lenses, the capturing would be (almost) impossible. By using fisheye lenses, which depict larger portions of the scene due to their wider FOV, the scene could be reconstructed successfully. Barazzetti et al. (Barazzetti et al., 2017) and Strecha et al. (Strecha et al., 2015) both present research on the quality assessment of the results obtained through fisheye photogrammetry. They report, respectively, that processing the imagery using different software packages results in comparable accuracies and that fisheye photogrammetry results are comparable to traditional photogrammetry results. On the downside, a fisheye recording approach yields a considerably lower level of detail.
Field experiments testing a workflow that uses wide-angle or, to a further extent, the fisheye images discussed above have not been conducted yet. In future tests, two different approaches will be followed. The first one will only use wide-angle/fisheye images. Compared to the traditional capturing workflow, this will result in large time savings, as the number of pictures needed to cover the complete scene is heavily reduced, but with a lower level of detail. Secondly, an approach using wide-angle or fisheye pictures in addition to the normal imagery will be tried. The advantage of this approach is that it will deliver both a correct alignment through the overview in the wide-angle/fisheye images and a high level of detail via the high-resolution pictures taken in the normal procedure. Future experiments will be conducted to test both approaches regarding completeness, accuracy and texture quality, but also recording time.
To solve the alignment difficulties caused by narrow spaces, we conducted field experiments employing a UAV to capture nadir images of the construction site. This approach is feasible as long as the narrow rooms are not yet covered by the next floor slab or the roof. Because of the aerial perspective, a multitude of smaller rooms can be captured and registered more easily, forming a strong basis for the further alignment of the additional ground-based imagery (figure 4).

DARK AREAS
A final challenge at construction sites is posed by areas with limited lighting, which results in multiple problems. To capture sufficiently bright images of a dark scene, either the camera's gain must be raised, resulting in more image noise, or the exposure time must be increased, resulting in motion blur when not standing perfectly still. Furthermore, the contrast in dark areas is very challenging as well. In the vicinity of doorways, light enters the dark room, causing the pictures to be either under- or overexposed. This frequently results in one set of matched images inside the room and another set outside the room. However, the link between the two sets is missing, since it should have been calculated from the under- or overexposed images depicting the overlap between dark and bright areas.
Solutions
A possible solution consists of recording with additional lighting, either a camera flash or one or more portable site lights, such as in (Perfetti et al., 2017). In the conducted field experiments it became clear that recording the scene this way had various disadvantages. First of all, the hand-held image recordings still resulted in severe motion blur, despite the better lighting conditions. Secondly, when using portable site lights, these must be moved every few pictures, creating an additional time burden when capturing. Finally, the alignment of the imagery did not always succeed: in many cases only 70 to 80% of the pictures were aligned because of the limited number of feature points in dark areas, leaving large parts of the scene undocumented.
For areas with large differences in brightness, the scene can be recorded using the High Dynamic Range (HDR) technique. However, because of the long exposure needed for the dark areas, field experiments have shown that this technique is infeasible with a hand-held camera, since heavy motion blur occurred. A tripod can be used to overcome this. However, our goal is to make use of available images recorded by the construction site manager, foremen or workers, possibly supplemented by some additional recordings. If such non-experts in photogrammetry have to record the scene completely with both a tripod and additional lighting, the time needed is simply too high to justify the possible outcome. Therefore, the most feasible solution is to record the construction site elements in their final form before the floor slab above is cast.

CONCLUSION
Construction sites pose several challenges for the recording of imagery for photogrammetric processing. This work focussed on the expected and encountered difficulties during the field experiments. The challenges are discussed and several possibilities are presented to (either partially or fully) solve these difficulties or avoid them, both from literature sources and from our own experiments.
Auxiliary equipment and clutter cause dynamic and static occlusions as well as possible registration errors. These problems can be solved by altering the recording approach and by using large overlap zones between different datasets or a reference system. Wet or polished surfaces cause reflections, which can be avoided by capturing the site from a top-view perspective, as is the case when using UAV images. Furthermore, texture-poor surfaces were assumed to complicate both the alignment and the final reconstruction of the construction site. However, during the field experiments no large difficulties were experienced as long as the imagery could be recorded before the next floor slab was cast. The challenges associated with narrow spaces, such as the large number of pictures and the difficult registration, can again be solved by using UAV images, which form a strong basis for the further alignment of the additional ground-based imagery. Successful results can possibly also be achieved by using other camera lens types such as wide-angle or fisheye lenses. The suggested recording and processing approaches will be tested in a future recording campaign. Finally, dark areas are also frequently present at construction sites. The associated challenges can mainly be avoided by capturing the scene just before the next floor slab is cast, which makes a successful reconstruction of the site elements in their final form much easier to achieve.

Figure 2: The consequences for the 3D model caused by reflections on the water: recorded image of the construction site with water (top left), textured 3D model (top right), untextured 3D model (bottom left) and side view of the untextured 3D model (bottom right)
Figure 4: Narrow spaces at construction sites are hard to record with normal lenses and complicate the alignment process

Figure 5: Dark or poorly illuminated spaces and spaces with a large variety in exposure are challenging to record