DEVELOPING A LOW-COST SYSTEM FOR 3D DATA ACQUISITION

This paper describes a low-cost system developed to make 3D documentation fast and reliable by acquiring the necessary data outdoors for the 3D documentation of façades, especially in very narrow streets. In particular, it provides a viable solution for buildings up to 8-10m high and streets as narrow as 2m or even less. In such cases, acquiring images in a conventional way is practically impossible or highly time-consuming, as it would lead to a huge number of images and long processing times. The developed system was tested in the narrow streets of a medieval village on the Greek island of Chios. There, in order to bypass the problem of short taking distances, high definition action cameras were used together with a 360° camera. Such cameras are usually fitted with very wide-angle lenses, can acquire high definition images, are rather cheap and, most importantly, are extremely light. Results suggest that the system can perform fast 3D data acquisition adequate for deliverables of high quality.


INTRODUCTION
In recent years, 3D representations have been widely established as precise documentation methods, both for visualization and for metric purposes. However, the necessary instrumentation is expensive and often bulky and heavy. In addition, the learning curve for its use and, mainly, for the relevant and necessary processing is rather steep. Moreover, the various objects, especially cultural heritage ones, have their own peculiarities and require a different approach and methodology each time. In this paper, a low-cost system is described, which aims to perform 3D documentation fast and reliably by acquiring the necessary data for the 3D documentation of façades in very narrow streets.
In section 1, the special characteristics of Kalamoti, the medieval village where the system was applied, are described. In the second section, related projects using similar equipment or implemented in regions with similar characteristics are reported. In section 3, the tests initially performed in order to choose the most suitable system configuration for the current project, as well as its implementation in the streets of Kalamoti (Figure 1), are analysed. In the fourth section, the processing for the production of the 3D model is described. In the fifth section, the results and the evaluation of the methodology are presented. Finally, in the last section, some concluding remarks on the whole project are presented.

The medieval historical village of Kalamoti
Kalamoti is a medieval village (13th c. AD) and one of the 22 mastic villages (mastichohoria), which are located in the southern part of Chios Island in the eastern Aegean Sea. These villages are called mastichohoria after the Skinos tree, which grows only in this area and produces the mastic. Mastic is a natural, aromatic resin in a teardrop shape, which falls to the ground in drops from superficial scratches made by cultivators with sharp tools on the tree's trunk and main branches. Mastic has positive effects on human health and is used both as a medicine and as an aromatic substance. The special characteristics of this unique product are due to the geographical environment, and its production and processing take place only in this defined geographical area of the world.
The traditional stone architecture of Kalamoti has its roots in the centuries of the Genoese occupation of the island (1346-1566 AD). These villages were built to house the workers around a central part, the "tower", where the landowners and merchants used to live. The villages were built with very narrow streets to prevent the workers from gathering to protest and demonstrate (Figure 6). Recently, the alteration of the old traditional village, the demolition of several representative buildings of medieval architecture and the expansion of the village to the south, with the construction of modern concrete houses, have endangered the traditional morphological features of the present state of Kalamoti. The main reason for implementing this project in Kalamoti is to raise awareness of the need to protect the unique cultural heritage and the masonry buildings of these villages. If this cultural heritage is not documented and protected properly, it will soon be lost, either to natural disasters or to human negligence and ignorance. This documentation consists of producing orthoimages at a scale of 1:250, according to the end user's requirements. This provided the opportunity to apply the system in the narrow streets of the village, in order to carry out the Geometric documentation (Venice Charter, 1964) of Kalamoti.
The Geometric documentation is a subset of the overall integrated documentation (bibliographic, historical, archaeological, architectural, cartographic, legal, etc.). Through the geometric documentation, the present settlement will be recorded as it has survived through the course of time. This record is a necessary background not only for those studying the past of the settlement but also for the studies to be carried out in the future.

Project complexity and specifications
The project aims to generate orthoimages of the buildings' façades in three main streets of the village with a total length of 350m. Additionally, the 3D point clouds and meshes may be used by the architects for producing sections etc. The main characteristics of these streets are that they are very narrow and the buildings are relatively tall. Hence, it was essential to develop a method for documenting these streets' façades, not only quickly but also at minimum cost. The confined width of most of the streets of the village, which does not exceed 2.30m, in relation to the average height of the houses, which is about 7.00m-8.00m, creates very difficult conditions for applying traditional methods and other systems such as laser scanners, UAVs, or full-frame cameras without a fisheye lens. According to the physical dimensions, orthoimages of about 2800m² should be generated. Another important aspect of the project was the final accuracy of the orthoimages, which was set to 6.25cm (scale of 1:250), an accuracy that is quite easy to achieve compared with other precise photogrammetric projects. This gave the team the opportunity to test various methods and approaches and apply the developed system in multiple ways. However, not all test results are included in this paper.

RELATED WORK
In the literature, and especially in recent years, several projects have used action and 360° cameras to produce 3D models of cultural heritage objects. However, to the authors' knowledge, no combination of these two types of cameras into one system has been reported so far.
During the last decade, significant developments have taken place in the field of photogrammetry, especially for cases where the distance between the camera and the imaged object is only a few meters or even shorter. The continuous development of algorithms related to or derived from the computer vision field has greatly contributed to this progress, as the automatic calibration of the cameras and the calculation of the exterior orientation parameters are now possible. Also, many free and commercial software packages for creating 3D models have been released in recent years, and the increase in the quality of low-cost digital cameras, such as action cameras, contributes positively to this evolution.

Action and 360° cameras used for 3D modelling of Cultural Heritage
In recent years, action cameras have been increasingly used in photogrammetric projects for the creation of 3D representations. They are a cheap, efficient, flexible and user-friendly alternative for acquiring high definition data quickly in environments of both complex and simple morphology. They are also an acceptable solution for projects in Cultural Heritage sites where problems such as narrow streets or confined spaces exist. However, they need a very precise and reliable calibration in order to compensate for the serious radial distortion effects (Papadaki and Georgopoulos, 2015).
An important study using action cameras was carried out by Limongiello et al. (2016), who aimed at producing orthoimages of the façades of the walls of an archaeological site in Italy. The special feature of the study is that the images were taken in a very narrow street, 0.80m-1.10m wide, with walls 4m high. The length of the wall was almost 20m. To carry out their work, they used the GoPro Hero 3 Black, with 12 MP resolution, with satisfactory results. Kwiatek & Tokarczyk (2014) developed a system consisting of seven GoPro cameras, mounted on a base to provide 360° spherical coverage. This system was initially applied to create a low-cost mobile mapping system and to produce the 3D model of the indoor space of a church. To map the area with video capture, it is necessary to synchronize the cameras so that, once the video is captured, the video frames can be exported and joined to form panoramas, which offer 360° coverage of the area. Another reference to the importance of fisheye lenses and their use on narrow streets in photogrammetry is made by Kedzierski (2009). He points out that taking images with a normal lens on 2-4m wide streets is a time-consuming process and that up to thousands of images are required for the geometric documentation of the façades of the buildings of such narrow streets. However, with the use of fisheye lenses, the number of required images and the processing time for the extraction of orthoimages are greatly reduced.
Action cameras are also applicable to aerial photogrammetry. Bolognesi et al. (2015) used the GoPro Hero3+ Black Edition to produce the 3D model and the geometric documentation of a castle in Ferrara, Italy. Images were taken in two ways, both parallel to the object and inclined to it. For the purposes of the study, Bolognesi et al. (2015) conducted a series of experiments with the following results. Initially, they formed the sparse point cloud only with images parallel to the castle's roof and without giving information about the type of camera. The complete elimination of the lens distortion effects was achieved by including 9 control points. Then the two dense point clouds formed, one for the entire castle and one for the façade, were compared with the dense cloud created by a laser scanner. It was found that the mean deviation between the façades is 0.02m, while the average deviation of the point cloud which includes the roof reaches 0.04m.
Another project concerns the production of a 3D model using the Samsung Gear 360° camera (Barazzetti et al., 2017). There, the main aim was to test the metric accuracy and the level of achievable detail with the Samsung Gear 360° coupled with digital modelling techniques based on photogrammetric and computer vision algorithms. A test was performed in order to calibrate the two fisheye cameras, using the pair of fisheye images that are then stitched inside the Gear 360°. Results suggest that the projections generated inside the mobile phone or with Gear 360 Action Director have a relatively low metric accuracy. Additionally, automated 3D modelling was performed and the results were compared with point clouds from a laser scanner. The presented results highlighted that the Samsung Gear 360° has limited usability when good metric accuracy is required. The authors suggest that the camera can be used for applications at scales of 1:100-1:200, especially when there is limited time for data acquisition.
Remondino et al. (2016) evaluated several methods in order to perform the geometric documentation of 40km of the historical porticoes in Bologna, Italy. Moreover, they wanted to develop a methodology that gives the 3D model colours and shapes which correspond to reality. The purpose of this project was to determine the most suitable system for the production of this extended 3D model. The methods evaluated included terrestrial photogrammetry, terrestrial laser scanning, mobile mapping with the assistance of a car and portable mapping. They concluded that the two best methods of data collection are terrestrial photogrammetry and terrestrial laser scanning. However, due to the lower price of a camera compared with a laser scanner, the flexibility of cameras, and the correct geometry and texture of the model, terrestrial photogrammetry was recognized as the most suitable, flexible and reliable technique for creating the 3D model. Even though action and 360° cameras were not used in that work, the problem of documenting long façades on narrow streets is also addressed there using various state of the art methodologies and systems.

Progress beyond the state of the art
Most of the above projects use action or 360° cameras, or they attach multiple action cameras in order to simulate a 360° camera. However, they do not integrate the two types of cameras into one system in order to take advantage of their combined characteristics, such as the use of fewer fisheye images covering only the façades and the alignment of both façades together using the 360° camera. This combination was considered potentially very useful in projects under similar conditions, since using only 360° cameras may reduce the accuracy of the final results. As such, the 360° camera is used here only for the alignment step in order to minimize the number of ground control points used in the project, reducing the time in the field without compromising the final accuracy of the results.

SYSTEM DEVELOPMENT AND EVALUATION
To capture the façades on both sides of the streets in a common reference system in a very short time, a system meeting these requirements was designed and developed. The system consists of a telescopic pole on which two GoPro Hero 4 Black Adventure action cameras and one Ricoh Theta S 360° camera are mounted. The cameras were set to take one image every second due to the complexity of the façades and the extremely short camera-to-object distance. The user holding the system walked slowly along the street while the cameras were taking images. As estimated, the system needs about 10 minutes to capture 100 meters of the façades on the two sides of the street, provided the street is walked twice. As already mentioned, a prerequisite for the system was to map both sides of a street and obtain the façades in a common reference system and a single model without using too many control points. To achieve that, the 360° camera was selected in order to connect the smaller GSD imagery of the façades captured by the action cameras. The extremely low cost of the system was also a parameter to take into account, since the whole project was carried out in the context of a diploma thesis dealing with low-cost 3D modelling and mapping. Low weight was also a prerequisite. The price and weight of every element of the equipment and the total cost are presented in Table 1.

Cameras Configuration and Testing
Having selected the equipment to be used for collecting images (GoPro and Ricoh Theta S), it was necessary to perform appropriate tests to define the best possible arrangement of the cameras for fast mapping without failures. The main idea was to carry out these tests not in the village under study, due to its distance from the laboratory, but in an area with characteristics similar to those of the streets of Kalamoti. For this reason, the narrow streets of Plaka (the old part of Athens) were chosen, with a maximum street width of 2.30m and building heights of approximately 8m, in order to test all the possible camera configurations and then evaluate the results of each one.
Issues addressed in these tests include the position of the action cameras and the relative position of the 360° camera, taking into account the direction of movement. The operator's walking speed, the reliability and accuracy of the results, and the processing methodology were also examined. These preliminary tests and their results are presented below.

Field Test 1
In the first implementation, the two action cameras were placed opposite each other at a height of 4.25m, in vertical format, to cover both sides of the street (Figure 2a). In this way, it was initially thought that one pass along the street to be documented would be sufficient, as both façades would be captured at the same time. The 360° camera was also mounted at a height of 4.25m in order to connect the two façades of the street into the same reference system.

Field Test 2
In the second implementation, the two action cameras were positioned one vertically above the other, on the same side of the telescopic pole at different heights, with a small inclination around the Z axis in order to cover a larger surface of the façade to be captured (Figure 2b). The 360° camera was placed at the height of the second camera, at 4.25m, and was rotated parallel to the street in order to capture both sides of the street, again with the purpose of connecting the two façades.

Selected Cameras configuration
In order to decide which of the above system configurations is most suitable for retrieving complete data, the captured image data were evaluated at the office and 3D point clouds were created. Results were evaluated in terms of image coverage, image quality, and point cloud quality. To facilitate these evaluations, images from a higher resolution camera were used to create a control dataset and a reference point cloud. The camera used was the Canon EOS 6D, a full-frame DSLR camera, with a 24mm prime lens. As expected, in the first test the images did not cover the whole surface of the two façades of the street. This is because, in order to capture both sides at the same time, the system was placed in the center of the street, where the distance from each of the two opposite façades was about 1.00m. Such a short distance between the camera and the façades decreases the imaged area (Figure 2a). Consequently, this system structure with the specific cameras cannot be used effectively in such narrow streets. In the second test configuration, the two action cameras covered the façade completely. More specifically, the system offers a vertical overlap of 65% and a horizontal overlap of 90% between two consecutive captures (1sec interval) along the track. This ensured that every single detail of the façades would appear in at least 3 images, even though there were balconies and architectural details at a distance of less than 2.00m from the camera. The final system setup is presented schematically in Figure 2b.
For the alignment step, all of the imagery was used, from both the two action cameras and the 360° camera. The 360° camera images were used in order to connect the two sides of the street into the same reference system. It is important to note that, both in these tests and in the actual implementation of the system in Kalamoti, special targets were placed and measured to be used as control and check points. The main aim was to use the minimum number of control points in the photogrammetric processing and to create a single model of the street's façades by exploiting the 360° camera.
To that end, various tests were performed to determine the minimum number of control points needed, in order to minimize the data capturing time and keep the system's implementation "fast". The very promising results (Figure 3) suggest that the 360° camera indeed aligns the two façades into the same reference system very accurately; more detailed results will be presented in the future.
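The relation between walking speed, capture interval and along-track overlap can be illustrated with a short sketch. It assumes the walking speed implied by the acquisition rate quoted earlier (100m of street walked twice in about 10 minutes) and a simple planar footprint model; the function names and the derived footprint are illustrative assumptions, not values measured in the project.

```python
import math


def baseline(speed_m_s: float, interval_s: float) -> float:
    """Distance walked between two consecutive exposures (metres)."""
    return speed_m_s * interval_s


def footprint_for_overlap(baseline_m: float, overlap: float) -> float:
    """Along-track image footprint implied by a baseline and an overlap ratio."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    return baseline_m / (1.0 - overlap)


def images_per_point(overlap: float) -> int:
    """Minimum number of consecutive images in which a façade point appears."""
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    # Small epsilon guards against floating-point results such as 9.999999.
    return math.floor(1.0 / (1.0 - overlap) + 1e-9)


# Assumption: 100 m of street walked twice (200 m in total) in about 10 minutes.
walking_speed = 200.0 / 600.0          # m/s
step = baseline(walking_speed, 1.0)    # one image per second
print(round(step, 2), round(footprint_for_overlap(step, 0.90), 2), images_per_point(0.90))
```

Under these assumptions the baseline is about 0.33m and the implied along-track footprint about 3.3m, so each façade point would appear in roughly ten consecutive frames of one camera.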
In Figure 3a, a sample of these tests is presented: the result of the comparison between the sparse point cloud of the area generated using all 39 measured GCPs, which are scattered all over the area, and the sparse point cloud created using GCPs only at the crossroads, 14 in total, with a minimum distance of 30m between them (areas in red circles in Figure 3a). The lines of two cross sections (in magenta in Figure 3a) are also illustrated. The respective histogram (Figure 3d) of the deviations between the sparse point clouds suggests a mean error of 0.015m, which is also verified by many sections, two of which are presented (Figure 3b, c) with their respective Gaussian histograms (Figure 3e, f). For the dense point cloud generation, all the imagery was used initially. However, it soon became clear that the imagery of the 360° camera was introducing noise into the 3D dense point cloud (Figure 4a). This is also reported by Barazzetti et al. (2017). The main reason is that the 360° camera produces images of lower quality than the action cameras, and this affects the quality of the matching procedure. The lower image quality may also have affected the alignment step, where check points indicate that the two façades were aligned to each other with an error of less than 0.02m. This is sufficient for the accuracy required here; however, if more accurate results are required, the performance of the 360° camera in the alignment step is not guaranteed.
Hence, 360° camera imagery was not used in the dense point cloud generation step. Figures 4a and 4b show the dense clouds produced with and without the 360° camera imagery, and the difference is clear.
To quantify the differences between the dense point clouds, the point cloud of the control dataset collected with the Canon EOS 6D (Figure 4c) was compared with the point clouds of the developed system produced using images captured only by the action cameras (Figure 4b) and images from both the action cameras and the 360° camera (Figure 4a). The results show a mean deviation of 0.013m in the first comparison and a mean error of 0.009m in the second one. Although both differences are minor with respect to the final accuracy requested for the presented system and project, the point cloud quality comparisons suggest that the 360° imagery should not be used for generating the dense point cloud.
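A mean deviation of this kind is commonly obtained as the average distance from each point of the tested cloud to its nearest neighbour in the reference cloud. The following is a minimal brute-force sketch of that idea, offered as a simplified stand-in for dedicated point cloud software; the function names and the toy coordinates are hypothetical and do not come from the project data.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def nearest_distance(point: Point, reference: List[Point]) -> float:
    """Distance from one point to its nearest neighbour in the reference cloud."""
    return min(math.dist(point, ref) for ref in reference)


def mean_cloud_deviation(test: List[Point], reference: List[Point]) -> float:
    """Mean nearest-neighbour distance from the test cloud to the reference cloud."""
    if not test or not reference:
        raise ValueError("both clouds must contain at least one point")
    distances = [nearest_distance(p, reference) for p in test]
    return sum(distances) / len(distances)


# Hypothetical toy clouds (metres): the test cloud is offset slightly in Z.
reference_cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
test_cloud = [(0.0, 0.0, 0.01), (1.0, 0.0, 0.02), (2.0, 0.0, 0.03)]
print(round(mean_cloud_deviation(test_cloud, reference_cloud), 3))
```

The brute-force search is quadratic in the number of points, so real dense clouds would need a spatial index such as an octree or k-d tree.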

APPLICATION, PROCESSING AND RESULTS
Following the tests and evaluations performed, the refined system was used in Kalamoti village in order to capture the necessary imagery for generating the required orthoimages. As already stated, the selected configuration was that of Figure 2b, where the action cameras capture only one façade of the buildings (Figure 8b, c) while the 360° camera captures the whole area (Figure 8a). With this setup, it was necessary to walk through the street twice, so that both façades would be captured. The whole process lasted only a few minutes.

Data acquisition
The study area was divided into three sections, which are depicted as red, blue and green trajectories in Figure 6. Area 1 is shown in red and has a length of 43m. Area 2 is shown in blue and has a length of 105m, while Area 3, shown in green, has a length of 88m.

Image capturing process
The total route captured in each case was a closed path, 86m, 210m and 176m long respectively (Figure 9). Beginning from one point and following the route, the cameras captured the buildings' façades on one side of the street (Figure 7a). Then, the same route was followed backward to capture the other side of the street. The capturing process lasted less than 35 minutes, about 20 minutes for each side. Overall, 472 meters of building façades (about 5600m²) were captured. The average GSD of the acquired dataset was 0.8mm.

Problems faced
Images were taken with the time lapse function every second. This yielded more images of the area, supporting a more accurate and denser model. It is worth mentioning that the image capturing conditions were far from ideal, due to the state of the streets of Kalamoti and also because of the lighting. In almost all streets of Kalamoti, there were plants that hid parts of the façades of the buildings. Moreover, in all streets, various wires hang between the two façades at a height of less than five meters (Figure 7b). As a result, the telescopic pole had to be manoeuvred around these obstacles, and so the data capturing procedure in some cases was not continuous. In addition, because the images were taken during the summer, it was very difficult to find a time of day with suitable lighting. The upper half of the buildings was very intensely lit by the sun, causing high contrast in the colours of the images between the two floors of the same building (Figure 7a, 8b).

Processing
For the production of the 3D model, more than 1400 images were used. 98% of these images (4000 x 3000 pixel resolution) were taken by the action cameras, while the remaining 2% were taken by the 360° camera (5376 x 2688 pixel resolution). The standard photogrammetric (SfM-MVS) pipeline was performed in PhotoScan (AgiSoft PhotoScan Professional 1.2.6, 2016). The pipeline was interrupted for point cloud post-processing and then resumed for the final orthoimage generation.

Standard photogrammetric pipeline
Here again, as in the initial tests, all images were used for the alignment step. In contrast, for generating the dense point cloud, the 360° images were excluded since, as already presented, they introduce noise into the cloud. Following the alignment of the images, some GCPs were marked in the images of the model. The GCPs used for the final model, and thus for the model from which the orthoimage was generated, were only those at the crossroads of the village since, as already presented in Section 3.2 and Figure 3, they are enough for georeferencing and scaling the model without compromising the accuracy. In this way, fieldwork time is reduced by almost half, since traverses in the streets are avoided. The total error of these points was almost 5cm, which is within the initial specification.
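One common way to summarise the error at control or check points is the 3D root mean square error between the coordinates estimated in the model and the surveyed coordinates. The sketch below illustrates that computation; whether the "total error" reported here is exactly this quantity is an assumption, and the coordinates and function names are hypothetical. The default tolerance of 6.25cm is the accuracy specified for the project.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmse_3d(estimated: Sequence[Point], surveyed: Sequence[Point]) -> float:
    """3D root mean square error between model and surveyed coordinates (metres)."""
    if len(estimated) != len(surveyed) or not estimated:
        raise ValueError("point lists must be non-empty and of equal length")
    squared = [math.dist(e, s) ** 2 for e, s in zip(estimated, surveyed)]
    return math.sqrt(sum(squared) / len(squared))


def meets_specification(rmse_m: float, tolerance_m: float = 0.0625) -> bool:
    """True if the error is within the tolerance (6.25 cm by default)."""
    return rmse_m <= tolerance_m


# Hypothetical residuals: two points, off by 3 cm in X and 4 cm in Y respectively.
model_points = [(0.03, 0.0, 0.0), (0.0, 0.04, 0.0)]
survey_points = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
error = rmse_3d(model_points, survey_points)
print(round(error, 4), meets_specification(error))
```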

Point cloud processing
Before the mesh generation, the dense point cloud was processed in Geomagic software in order to remove noise and artifacts such as wires, plants etc. Then, the mesh was created. All stages of the three-dimensional modelling had the eventual goal of producing orthoimages of both sides of the street. The final accuracy of the orthoimages is better than 5cm, and the GSD of the orthoimages was set to 4mm, mainly for visualization purposes.

Point clouds evaluation step
Having completed the dense cloud and mesh generation steps, various areas of the dense cloud were evaluated, again using images from a higher resolution camera to create a control dataset and point cloud. The camera used was the Canon EOS 6D, a full-frame DSLR camera, with a 24mm prime lens. The main aim of this step was to evaluate the point clouds generated with the developed system against more accurate and reliable results. Comparisons in the CloudCompare freeware (Figure 10a) indicate that the mean error between the two dense point clouds is 0.022m (Figure 10b), including the outliers, which appear mainly in window openings, in holes in the wall and in plant covered areas in front of the façades, where the action cameras cannot generate detailed results. The histogram shows the number of points corresponding to each class of deviation. It shows that 75% of the points deviate from the Canon EOS 6D point cloud by less than 4cm.
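Given a list of per-point deviations exported from such a comparison, the share of points under a threshold and the histogram classes can be computed as in the following sketch. The deviation values, bin width and function names are hypothetical and chosen only to illustrate the calculation; they are not the project's data.

```python
from typing import List, Sequence


def fraction_below(deviations: Sequence[float], threshold_m: float) -> float:
    """Share of points whose deviation is strictly below the threshold."""
    if not deviations:
        raise ValueError("deviations must not be empty")
    return sum(1 for d in deviations if d < threshold_m) / len(deviations)


def histogram(deviations: Sequence[float], bin_width_m: float, n_bins: int) -> List[int]:
    """Count points per deviation class; the last bin collects all larger values."""
    counts = [0] * n_bins
    for d in deviations:
        index = min(int(d / bin_width_m), n_bins - 1)
        counts[index] += 1
    return counts


# Hypothetical absolute deviations in metres.
sample = [0.005, 0.012, 0.018, 0.025, 0.031, 0.035, 0.052, 0.090]
print(fraction_below(sample, 0.04))   # share of points under 4 cm
print(histogram(sample, 0.02, 4))     # classes of 2 cm, last class open-ended
```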

Results
Orthoimages were produced with a GSD of 4mm. In terms of resolution alone, the appropriate scale for presenting the orthoimages would therefore be 1:40. However, given their accuracy resulting from the photogrammetric processing, they are only suitable for a scale of 1:250.
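The two scale figures can be reproduced with simple ratios. The sketch below assumes a printed pixel size of 0.1mm, which yields 1:40 for a 4mm GSD, and a graphical accuracy of 0.25mm at plotting scale, which links 6.25cm to 1:250; both constants are inferred from the figures in the text and are assumptions, as are the function names.

```python
def scale_from_gsd(gsd_m: float, printed_pixel_m: float = 0.0001) -> float:
    """Scale denominator at which one ground pixel prints at the given pixel size."""
    return gsd_m / printed_pixel_m


def scale_from_accuracy(accuracy_m: float, graphical_accuracy_m: float = 0.00025) -> float:
    """Largest scale denominator supported by a given ground accuracy."""
    return accuracy_m / graphical_accuracy_m


print(round(scale_from_gsd(0.004)))        # resolution-based scale denominator
print(round(scale_from_accuracy(0.0625)))  # accuracy-based scale denominator
```

With the same assumption, an accuracy of 5cm would correspond to a scale denominator of 200.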

CONCLUDING REMARKS AND FUTURE WORK
In this paper, a low-cost system was presented, designed to facilitate 3D data acquisition over large and dense building areas. The system is undergoing further investigation, and the first, very promising, outcomes were presented here.
The proposed system is far less expensive than laser scanning techniques and much faster than traditional data acquisition with a DSLR camera and/or UAV systems. In our case the latter is almost impossible, since a large number of images would be needed and UAVs cannot fly due to the obstacles. However, the limited control over the 360° camera parameters, such as focal length and distortion coefficients, leads to alignment errors and lower accuracy in the final results. Moreover, due to the height of the buildings and the narrow streets, the variability of illumination conditions becomes an additional problem in the matching process as well as in the uniformity of the orthoimages. Uniform lighting conditions and good colours are quite difficult to obtain when the field of view of the camera is 360° (Barazzetti et al., 2017), whether the camera is used indoors or outdoors. Current results suggest that the system can perform fast 3D data acquisition for deliverables at a scale of 1:200 in cases like the one presented, using the minimum number of control points required. In order to achieve better results, effort should be made at the alignment step to reduce the errors caused by the 360° camera. Nevertheless, the panoramic images proved very useful in our case for combining the two opposite façades into one system, thus reducing the control points required. It is expected that in the near future more results and comparisons with conventional methods will be presented, in order to strengthen these first results. The system is undergoing further development in order to be combined with GNSS and gyro sensors, increasing its autonomy and speeding up the whole process.

Figure 1: The characteristic narrow streets of Kalamoti

Figure 2: Configuration during field test 1 (a) and field test 2 (b)

Figure 3: Test performed regarding the minimum number of points as well as their distribution. (a) The result of the comparison of the sparse point cloud of the area generated using GCPs all over the area with the one created using GCPs only at the crossroads, and the two cross-sections (in purple); (d) the respective histogram. In (b) and (c), typical sections of the cloud of 1.00m width are presented along with their respective histograms (e) and (f)

Figure 4: (a) Dense point cloud using the action cameras and the 360° camera (b) Dense point cloud using only the action cameras and (c) Dense point cloud using the DSLR camera

Figure 6: Documented streets (Image by Google Maps)

Figure 7: (a) Developed system in use (b) Obstacles during image acquisition

Moreover, in order to georeference the final 3D model for future applications as well as give scale and orientation to the model for the orthoimage production process, ground control points and check points were placed along the whole length and

Figure 9: A top view of the 3 documented areas

Figure 10: (a) Comparison between dense point clouds from GoPro and DSLR cameras in CloudCompare (b) Histogram of the point deviations of comparison (a)

Table 1: Cost and weight of system