SIMULATION-BASED DATA AUGMENTATION USING PHYSICAL PRIORS FOR NOISE FILTERING DEEP NEURAL NETWORK

LiDAR (Light Detection and Ranging) mounted on static and mobile vehicles has been rapidly adopted as a primary sensor for mapping natural and built environments for a range of civil and military applications. Recent advances in electro-optical engineering enable acquiring laser returns at high pulse repetition frequency (PRF), from 100Hz to 2MHz for airborne LiDAR, which significantly increases the density of the 3D point cloud. Traditional systems with lower PRF had a single pulse-in-air (PIA) zone large enough to avoid a mismatch between pulse pairs at the receiver. Modern multiple pulses-in-air (MPIA) technology provides multiple windows of operational range within a single flight line and no blind zones; the downside is that atmospheric returns are projected closer to the PIA zone of neighbouring ground points and are more likely to overlap with objects of interest. These noise characteristics compromise the quality of the scene and motivate the use of a noise filtering neural network, as existing filters are not effective. A noise filtering deep neural network requires a considerable volume of diverse annotated data, which is expensive to obtain. We developed simulations for data augmentation based on physical priors and a Gaussian generative function. Our study compares deep learning networks for noise filtering and shows a performance gain with 3D U-Net. We then evaluate 3D U-Net with simulation-based data augmentation, which shows an increase in precision and F1-score. We also provide an analysis of the underlying spatial distribution of points and its impact on data augmentation and noise filtering.


INTRODUCTION
LiDARs have emerged as powerful mapping tools for urban planning, navigation systems, and robotics. Airborne LiDAR has revolutionized data acquisition for topographical surveying. Sometimes, however, the perceived quality of the scene is compromised by unwanted returns. This unwanted atmospheric data, caused by sensor operation or adverse weather conditions such as fog, rain, or snow, is broadly termed noise. Traditional systems with lower PRF can complete a survey in a single pulse-in-air zone, which avoids a mismatch of pulse pairs at the receiver. As the technology advanced, gateless LiDAR sensors were introduced with higher PRF, no restriction to a single window of operational ranges, and no blind zones, guaranteeing a high-density 3D point cloud. As PRF increases, PIA zones become narrower, which requires manufacturers to develop algorithms that automatically track the PIA zone within a single flight line in order to match laser pulse pairs. The downside of the technology is that atmospheric points are more likely to be projected close to, or overlap with, objects of interest. Figure 1 shows a 3D point cloud acquired using Teledyne Optech's Galaxy T1000 airborne LiDAR, illustrating how noise compromises the perception of the scene. Atmospheric returns have always existed, but at lower PRF they could be removed quite effectively with simple height filters or nearest-neighbor algorithms. Once they started mixing with the objects of interest, more sophisticated algorithms became necessary. Previous works have demonstrated encouraging performance in denoising 2D images using statistical, machine learning, and deep learning methods (Goyal et al., 2020). Existing deep learning methods such as PointCleanNet (Rakotosaona et al., 2020) have dealt with sparse noise points for meshes.
There are two significant studies on filtering noise caused by adverse weather conditions for autonomous driving systems (Heinzler et al., 2019; Stanislas et al., 2019); neither deals with sensor noise. Developing a deep neural network for noise filtering requires a thorough investigation of a diverse annotated dataset. We studied airborne LiDAR technology and its operation to understand the sensor noise, and we acknowledge the need for a massive annotated dataset for training a deep neural network. Collecting a new dataset and annotating it manually is labor-intensive and expensive. We developed simulations that replicate the noise mechanism in order to analyze the dataset and augment it for training a noise filtering neural network. These simulations use physical priors and a Gaussian generative function to produce synthetic noise with variable density. They rely on multiple pulses-in-air (MPIA) technology to estimate the distance ROBS traveled by a pulse beyond the maximum range before the next beam was fired (Roth and Thompson, 2008). ROBS helps determine the proximity of each point to its respective PIA zone. This proximity can indicate that a point belongs to the systematic noise pattern. Significant contributions of our work are as follows:

RELATED WORK
In this section, we discuss recent work related to 3D data augmentation. Data augmentation increases the volume and diversity of a dataset. Some data augmentation methods are relatively simple, such as scaling, colouring, shifting, and rotating, while others are much more complex, such as simulations and deep learning. We divide this section into two subsections: deep learning-based and simulation-based data augmentation (Shorten and Khoshgoftaar, 2019).

Deep Learning Networks
Two deep learning approaches are often used for data augmentation: generative adversarial networks (GANs) and adversarial training networks. PC-GAN is a novel approach for generating synthetic 3D object point clouds (Li et al., 2018). PC-GAN proposed a generic framework that uses the underlying distribution of the data to create a point cloud for 3D models. Because it uses local latent variables to capture the neighborhood and spatial location of each point and a global latent variable to interpret the overall shape of the object, it is not suitable for large-scale point cloud generation. Achlioptas et al. (2017) proposed an autoencoder that produces a latent space which effectively increased the accuracy of GANs for 3D point cloud objects. PointFlow learns the distribution of shapes and points and uses an invertible parametrized transformation to learn from these distributions and generate a model for synthetic data (Yang et al., 2019). This approach increases the accuracy of generating a 3D object point cloud. Adversarial examples are objects that look like real objects with a few perturbations. These examples are generated using deep learning or simple geometrical manipulation to create synthetic datasets. GANs proposed for generating synthetic 3D object point clouds are relevant for object detection and recognition (Shu et al., 2019). Existing frameworks do not generate synthetic data for large-scale 3D point clouds. They also face convergence problems and have difficulty producing high-resolution output.

Simulations
Different simulations have been developed to address data scarcity for large-scale point clouds using virtual reality and gaming. One such simulation generates synthetic scenes from a game-based environment (Yue et al., 2018). These scenes can be customized by the user, which can help boost the performance of neural networks trained on a synthetic dataset. The results showed a performance gain on a semantic segmentation task. Sallab et al. (2019) proposed a hybrid technique that uses real data to make simulated data more realistic using CycleGAN. Their work also shows that simulated data is usually not very realistic, especially when produced by off-the-shelf open-source simulations for autonomous driving scenes. A 3D sensor simulation plugin for the open-source tool Blender was used to generate a large-scale 3D point cloud called SynthCity (Griffiths and Boehm, 2019a). Multiple other data augmentation techniques, based on class weights for imbalanced class distributions or random duplication of points, were proposed by Griffiths and Boehm (2019b) and Qi et al. (2017). Research on autonomous driving vehicles that deals with noise due to adverse weather conditions proposed an augmentation model for fog and rain (Heinzler et al., 2019). It utilizes a distance matrix, an intensity matrix, an extinction coefficient, and a point scattering rate for the augmentation of rain and fog. The augmented dataset is used to train the proposed neural network architecture and shows an overall performance increase, although it also caused ambiguity between the rain and fog classes due to their inherently similar nature. Our approach proposes a simulation that generates atmospheric points based on the principles of MPIA and increasing PRF to reflect the uniqueness of modern sensor noise.

METHODOLOGY
In this section, we discuss our proposed synthetic noise simulations. To explain the methodology, we first discuss the basic concepts of LiDAR scanning. We then introduce the physical priors-based simulation and the Gaussian model-based simulation for generating synthetic noise.

Airborne LiDAR Scanning:
An airborne LiDAR system is built from three significant components: a LiDAR sensor, a GPS receiver, and an inertial measurement unit (IMU), all mounted on a vehicle (helicopter or plane), as shown in Figure 2 (El-Sheimy, 2005). The LiDAR scanner emits a laser pulse (echo/beam), which reflects from the target back to the receiver. The system calculates each 3D point from three major measurements: the position of the sensor, the direction in which the signal traveled, and the distance covered by the pulse to hit the target. Trajectory information, along with altitude and orientation, is acquired using a global navigation satellite system receiver mounted on the vehicle. The IMU is used to track the position of the LiDAR using pitch, roll, and yaw angles. The LiDAR signal is deflected by a mirror inside the scanner, and the position of the mirror is stored for every laser pulse shot (Rohrbach, 2015). The Euclidean distance equation is used to calculate the distance R between the target and the sensor:
$R = \sqrt{(x_p - x_l)^2 + (y_p - y_l)^2 + (z_p - z_l)^2}$ (1)
where $x_p, y_p, z_p$ = coordinates of the point and $x_l, y_l, z_l$ = coordinates of the sensor.
One of the most critical parameters of a LiDAR is the pulse repetition frequency (PRF), which is the number of beams shot in one second. MPIA technology enables the sensor to fire the next beam before receiving the last. Manufacturers have proprietary algorithms in place to match these pulse pairs by automatically tracking the PIA zone for a single flight. The negative side of the technology is that atmospheric points are projected closer to the objects of interest. Increasing PRF makes this problem worse, as more shots at the same rate of atmospheric returns per shot produce more atmospheric points. Higher PRF also leads to narrower PIA zones, so atmospheric points are more likely to be close to, or overlap with, regular/non-noise objects. Figure 3 shows the workflow of the simulations. Both simulations use the PIA LiDAR equation for estimating physical priors. P2-Simulation uses the travel time of the laser pulse, while GM-Simulation uses the distance from sensor to target.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B2-2020, 2020 XXIV ISPRS Congress (2020 edition)

Physical Priors-based Simulation (P2-Simulation)
The simulation based on physical priors assumes that the travel time for atmospheric points is less than that for target data. PRF is taken from the LiDAR configuration, and the maximum sensing range RMAX is estimated from the speed of light c and PRF as $R_{MAX} = \frac{c}{2 \times PRF}$. The simulation model takes PRF, c, regular data points, and noise density, and outputs simulated data. Figure 4 shows how the simulation estimates the travel time of the atmospheric return. The opportunity time window $\frac{1}{PRF}$ gives the time limit within which a received signal can be correctly matched with the sent signal without a specific matching algorithm for laser pulses. This time window helps in calculating the travel time difference ∆t between regular and synthetic noise points. ∆t is used to estimate the travel time of the synthetic noise, which is then projected onto a transition point between two PIA zones.
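As a minimal sketch of the two quantities defined above, the following Python snippet computes RMAX and the opportunity time window for a given PRF. The function names and the 500 kHz example value are illustrative assumptions, not part of the original simulation package; the ∆t projection step itself is not reproduced here.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def max_sensing_range(prf_hz: float) -> float:
    """R_MAX = c / (2 * PRF): farthest range whose echo returns before the next pulse fires."""
    return C / (2.0 * prf_hz)


def opportunity_window(prf_hz: float) -> float:
    """1 / PRF: time between two consecutive pulses, in seconds."""
    return 1.0 / prf_hz


def round_trip_time(range_m: float) -> float:
    """Two-way travel time of a pulse to a target at the given range."""
    return 2.0 * range_m / C


prf = 500_000.0  # hypothetical PRF of 500 kHz, chosen only for illustration
r_max = max_sensing_range(prf)
print(f"R_MAX = {r_max:.2f} m, window = {opportunity_window(prf) * 1e6:.2f} microseconds")
```

By construction, the round-trip time to a target at exactly RMAX equals the opportunity window, which is why returns from beyond RMAX arrive after the next pulse has already been fired.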

Gaussian Model-based Simulation (GM-Simulation)
We selected Gaussian generative models for our second simulation to address the limitations of P2-Simulation by extracting range constraints, as shown in Figure 3. The simulation model takes PRF, c, regular data points, flight trajectory, and noise density, and outputs simulated data based on Algorithm 1. Physical priors were calculated using the PIA LiDAR equation, $PIA = \lceil R / R_{MAX} \rceil$, and the distance ROBS traveled by any pulse beyond the maximum range RMAX. The generative noise functions use these physical priors to estimate Gaussian parameters. The distribution of ROBS from manually annotated data is used to generate a random normal distribution for a given noise density, which is then used to calculate the distance RSN between the synthetic noise point and the sensor using the PIA zone of a particular regular/non-noise point. Synthetic noise is projected at the distance RSN along the vector from the sensor to a regular/non-noise point, as shown in Figure 5.
where NRI = total number of raw input points, NAN = total number of actual noise points, NSN = total number of synthetic noise points, and NREG = total number of points in the clean point cloud.
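The projection step of GM-Simulation can be sketched roughly as follows. This is not Algorithm 1 itself: the relation R_OBS = R - (PIA - 1) * R_MAX, the composition R_SN = (PIA - 1) * R_MAX + sampled R_OBS, the clamping of the sample, and all function names are assumptions made here for illustration. The sketch also assumes the regular point does not coincide with the sensor.

```python
import math
import random

C = 299_792_458.0  # speed of light in m/s


def pia_zone(r: float, r_max: float) -> int:
    """PIA = ceil(R / R_MAX), the pulse-in-air zone index of a range R."""
    return math.ceil(r / r_max)


def observed_range(r: float, r_max: float) -> float:
    """ASSUMPTION: R_OBS is the part of R that lies beyond the previous PIA zone boundary."""
    return r - (pia_zone(r, r_max) - 1) * r_max


def synthetic_noise_point(sensor, point, r_max, mu, sigma, rng):
    """Place one synthetic noise point on the sensor-to-point vector at distance R_SN."""
    r = math.dist(sensor, point)  # Euclidean range R, must be > 0
    zone = pia_zone(r, r_max)
    # Sample R_OBS from the Gaussian and clamp it into the zone (assumed handling).
    r_obs = min(max(rng.gauss(mu, sigma), 0.0), r_max)
    r_sn = (zone - 1) * r_max + r_obs  # ASSUMPTION: reuse the regular point's PIA zone
    scale = r_sn / r
    return tuple(s + (p - s) * scale for s, p in zip(sensor, point))


r_max = C / (2.0 * 500_000.0)  # hypothetical PRF of 500 kHz
sensor = (0.0, 0.0, 1000.0)    # hypothetical sensor position from the trajectory
ground = (0.0, 0.0, 0.0)       # hypothetical regular/non-noise point
rng = random.Random(0)
print(synthetic_noise_point(sensor, ground, r_max, mu=50.0, sigma=5.0, rng=rng))
```

Passing the random generator explicitly keeps the sketch reproducible; µ and σ would come from the annotated noise rather than the placeholder values used here.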

Noise Filtering Deep Neural Network
We designed an experimental study using the deep neural network 3D U-Net to observe the results of noise filtering for 3D point clouds. 3D U-Net was selected for its larger receptive field, performance efficiency, and state-of-the-art performance for semantic segmentation on various medical imaging datasets. The architecture of 3D U-Net is shown in Figure 6. It extracts multi-resolution features and decodes them to full resolution using skip connections.
The results of the experiments and simulations help us understand the LiDAR noise characteristics. We also compared the performance of 3D U-Net with a support vector machine (SVM), a denoising autoencoder (DAE) (Palla et al., 2017), and PointNet (Qi et al., 2017). 3D U-Net is a deep network that requires massive, diverse datasets. We generated a simulation-based augmented dataset and trained 3D U-Net with and without augmentation to analyze the performance differences. We selected synthetic data only from the second simulation, as its output is similar to the real dataset and therefore allows a fair comparison, as shown in Figure 3.

Datasets
For the simulation and noise filtering experiments, we acquired a dataset of thirteen scenes of a site using the Teledyne Galaxy T1000. Each scene contains roughly 5 million points and covers an area of approximately 1 km². These are all outdoor rural scenes containing forest and agricultural land. All the scenes are manually labeled into two classes: noise and regular objects. Due [...]

Simulations
We took each manually labeled scene from the dataset and separated the noise and regular objects. The physical priors-based simulation takes the regular objects (the clean LiDAR file) and generates synthetic noise along the transition point of the PIA zone. The Gaussian model-based simulation requires real noise data to estimate the mean µ and standard deviation σ used to generate the Gaussian distribution of ROBS for synthetic noise. The simulation package lets the user either input a noisy ground truth to estimate these parameters or enter them manually based on domain expertise.
Figure 6: 3D U-Net Architecture
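The parameter estimation described above can be illustrated with a short sketch using Python's standard library. The function names are hypothetical, and the rule used for the number of synthetic points (noise density multiplied by the number of regular points) is an assumption made for illustration; the source does not state how the density is converted into a point count.

```python
import random
import statistics


def fit_robs_gaussian(noise_robs):
    """Estimate mean and sample standard deviation of R_OBS from annotated noise points."""
    mu = statistics.fmean(noise_robs)
    sigma = statistics.stdev(noise_robs)
    return mu, sigma


def sample_robs(mu, sigma, n_regular, density, rng):
    """Draw R_OBS values for synthetic noise.

    ASSUMPTION: the number of synthetic points is density * n_regular.
    """
    n_synthetic = round(density * n_regular)
    return [rng.gauss(mu, sigma) for _ in range(n_synthetic)]


# Hypothetical R_OBS values (in metres) of manually labeled noise points.
annotated = [40.0, 42.0, 44.0]
mu, sigma = fit_robs_gaussian(annotated)
samples = sample_robs(mu, sigma, n_regular=1000, density=0.05, rng=random.Random(0))
print(mu, sigma, len(samples))
```

When no noisy ground truth is available, `mu` and `sigma` could instead be supplied directly, which corresponds to the manual input option mentioned above.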

Noise Filtering Using Simulation-based Data Augmentation
We randomly split the dataset into nine scenes for training and four scenes for testing. Each scene was then projected onto a voxel grid, where each voxel is of size 2 m³ and contains the count of points as its feature. These voxels are then fed to the network as cells of 128x128x128 for noise filtering. We also trained 3D U-Net with the synthetic and real datasets together, which increased our dataset to 18 scenes. Synthetic data was generated using GM-Simulation with a 5% density of noise throughout the dataset. Both experiments used similar training settings.
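The voxelization step with a point count feature can be sketched as below. The function name, the grid origin, and the reading of the voxel size as a 2 m edge length are assumptions for illustration; cropping the grid into 128x128x128 cells is omitted.

```python
import math
from collections import Counter


def voxel_counts(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Map each 3D point to its voxel index and count the points per voxel."""
    counts = Counter()
    for x, y, z in points:
        index = (
            math.floor((x - origin[0]) / voxel_size),
            math.floor((y - origin[1]) / voxel_size),
            math.floor((z - origin[2]) / voxel_size),
        )
        counts[index] += 1
    return dict(counts)


# Hypothetical points in metres; 2.0 is an assumed voxel edge length.
points = [(0.5, 0.5, 0.5), (1.9, 0.1, 0.2), (2.1, 0.0, 0.0)]
print(voxel_counts(points, voxel_size=2.0))
```

Storing only occupied voxels in a dictionary keeps the sketch small; a dense array would be needed before passing fixed-size cells to a network.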

RESULT AND DISCUSSION
Our experiments on sensor noise simulation and its use for data augmentation in noise filtering have given interesting results. GM-Simulation produced synthetic noise close to real noise, as shown in Figure 7. To further verify our results, we calculated the probability density function for noise, shown in Figure 8, which validates the similar spatial distribution of noise over ROBS. We also performed experiments on noise filtering. Our first experiment shows that 3D U-Net performs better than the SVM, DAE, and PointNet. We compared the results on recall, precision, and F1-score for the noise class over all test scenes.

$\text{Recall} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}}$ (2)
$\text{Precision} = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}}$ (3)
Our comparative study of noise filtering shows that 3D U-Net has the best recall and F1-score for the noise class and comparable precision, which are critical for noise filtering problems. Recall measures the fraction of actual noise points that are correctly identified. The F1-score provides a balance between completeness and correctness of the predicted noise points, as shown in Table 1. These results help us identify the challenges of noise filtering as compared to semantic segmentation. The main reason for the failure of the other methods is their lack of a larger receptive field, which is critical for context learning with sensor noise that contains a global systematic noise pattern. We then performed another experiment with 3D U-Net to observe the effects of data augmentation on the learning of the network. We trained the network with synthetic and real data and tested it on four real test scenes, as shown in Figure 14. We did not train our network with synthetic data from P2-Simulation. It generated noise on the transition point between two PIA zones, which did not show complex and random overlaps with objects of interest and would not make a fair comparison. Empirically, the network trained using real and synthetic data outperformed vanilla training on precision and F1-score for almost all four cases. It has comparable performance on recall except for test scene 01, as shown in the first row of Figure 14. Test scene 01 has denser noise than the other scenes, and training might have suffered from bias due to the similar noise density throughout the synthetic dataset. On the other hand, test scene 04 shows better performance on all three metrics for the augmentation experiment, as shown in Table 2. The visualization of the results suggests that augmentation helps in highly overlapping cases, as shown in the third and fourth rows of Figure 14. The results clearly show the problematic areas.
In most of these areas, noise and regular objects overlap with each other, or noise points clump together and exhibit the characteristics of complex objects such as trees or bushes. These results motivated us to examine the underlying global and local spatial distribution of points and understand the issues behind these results.
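For reference, the per-class metrics used in this comparison can be computed as in the following minimal sketch. The function name and label encoding (1 for noise, 0 for regular) are assumptions, and the F1-score is given in its standard form as the harmonic mean of precision and recall.

```python
def noise_metrics(y_true, y_pred, noise_label=1):
    """Return (recall, precision, f1) for the noise class from two label sequences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == noise_label and p == noise_label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == noise_label and p != noise_label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != noise_label and p == noise_label)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2.0 * precision * recall / denom if denom else 0.0
    return recall, precision, f1


# Hypothetical labels: 1 = noise, 0 = regular object.
truth = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
print(noise_metrics(truth, predicted))
```

Returning 0.0 when a denominator is zero is a convention chosen here to keep the sketch total; other conventions are possible.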
We calculated the global spatial distribution of noise for all scenes, which confirmed our assumption that noise is similarly distributed throughout the dataset. To understand the problem more clearly, we investigated the local spatial distribution of noise points. We generated 20 m³ voxels, ten times the input voxel size, due to limited memory resources. Figure 9 shows two different cases of noise and regular objects overlapping in a voxel. The graphs in Figure 10 indicate the probability density of noise points over ROBS for the voxels from Figure 9-a) and 9-b). The highest probability of a random point being noise is lower for Figure 9-b) than for Figure 9-a) due to large overlaps and the position of the centroid. The graphs in Figure 11 show that the probability of a regular point lying in the observed range interval of [34-47]m is 0 for Figure 9-a), whereas for Figure 9-b) it is lowest at 34m and highest for the interval of [42-46]m around the transition point of the PIA zone, because the distribution of points in that voxel is very close to that of noise. We observed two more voxels that contain only noise but with different densities. The normal distribution of points in Figure 13 shows that the overall probability is higher for the voxel in Figure 12-b) because of the density of points, and lower for Figure 12-a). We concluded that the noise present in the dataset shows various characteristics such as complexity, randomness, and a global systematic pattern, and that it can be divided into three major types: type I, sparse noise; type II, systematic noise; and type III, complex noise. This can help us compare neural network generalization and performance with respect to noise types at a later stage.
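One way to obtain a per-voxel probability density over ROBS, in the spirit of the analysis above, is to fit a normal distribution to the ROBS values of the points in a voxel and evaluate its density. This is only a sketch under that assumption; the function name is hypothetical and the source does not state how the density curves in the figures were computed.

```python
from statistics import NormalDist


def robs_density(robs_values, query_ranges):
    """Fit a normal distribution to R_OBS samples and evaluate its density at given ranges."""
    dist = NormalDist.from_samples(robs_values)  # needs at least two samples
    return [dist.pdf(r) for r in query_ranges]


# Hypothetical R_OBS values (in metres) of noise points inside one voxel.
voxel_noise_robs = [40.0, 42.0, 44.0]
print(robs_density(voxel_noise_robs, [38.0, 42.0, 46.0]))
```

Comparing the fitted curves of noise and regular points in the same voxel gives a simple indication of how strongly the two classes overlap in range.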

CONCLUSION
In this work, we reviewed the technological advancement of airborne LiDAR systems and their use in data acquisition for scientific and commercial applications. We concluded that physical priors, along with the spatial distribution of points, provide leverage in simulating synthetic noise. We showed that simulation-based data augmentation can improve performance in certain cases. We also evaluated the underlying global and local distribution of noise points to better understand the results obtained for noise filtering. Our analyses indicate the importance of differentiating the noise into types: type I, sparse noise; type II, systematic noise; and type III, complex noise. In future work, we can use these analyses to design a new deep neural network for noise filtering with a physical priors-based attention module.