MICRO UAV BASED GEOREFERENCED ORTHOPHOTO GENERATION IN VIS + NIR FOR PRECISION AGRICULTURE

This paper presents technical details about georeferenced orthophoto generation for precision agriculture with a dedicated self-constructed camera system and a commercial micro UAV as carrier platform. The paper describes the camera system (VIS+NIR) in detail and focuses on three issues concerning the generation and processing of the aerial images: (i) camera exposure time; (ii) vignetting correction; (iii) orthophoto generation.


INTRODUCTION
In the domain of precision agriculture, the contemporary generation of aerial images with high spatial resolution is of great interest. Aerial images in the visible (VIS) and near-infrared (NIR) spectrum are particularly useful. From such data, a number of so-called vegetation indices can be computed which allow conclusions to be drawn about biophysical parameters of the plants (Jensen, 2007, pp. 382-393). In recent years, due to their ease of use and flexibility, micro unmanned aerial vehicles (micro UAVs) have been gaining increasing interest for the generation of such aerial images (Nebiker et al., 2008). This paper focuses on the generation of georeferenced orthophotos in VIS+NIR using such a micro UAV. The research was carried out within a three-year project called agricopter. The overall goal of this project was to develop a flexible, reasonably priced and easy-to-use system for generating georeferenced orthophotos in VIS+NIR as an integral part of a decision support system for fertilization.
We will first describe a self-constructed camera system dedicated to this task. Technical details will be presented that may be helpful for building up a similar system or for drawing comparisons to commercial solutions. Then we will describe issues concerning the generation and processing of the aerial images which might be of general interest: (i) practical and theoretical considerations concerning the proper configuration of the camera exposure time; (ii) a practicable vignetting correction procedure that can be conducted without any special technical equipment; (iii) details about the orthophoto generation including an accuracy measurement.
As prices may be of interest in application-oriented research, we state them (without tax, at their 2011 level) for some of the components.

SYSTEM COMPONENTS
The system used to generate the aerial images and the corresponding georeference information consists of two components: a customized camera system constructed during the project and a commercial micro UAV used as carrier platform.

Camera System
There were three main goals for the camera system.

Multispectral information
The main objective was to gather information in the near-infrared and visible spectrum. The details of the spectral bands should be adjustable with filters.
Suitability for micro UAV
As the camera system was to be used as payload on the micro UAV, the system had to be lightweight and had to cope with high angular velocities to avoid motion blur or other distortions of the images.

Georeferencing information
The aerial images should be augmented with GNSS information to facilitate georeferenced orthophoto generation.
As these goals could not be realized simultaneously with a single commercial solution, we decided to build up our own system from commercial partial solutions. The system was inspired by the one described in (Nebiker et al., 2008). However, it differs in its individual components and is designed for georeferenced orthophoto generation. The main components are the cameras, a GNSS receiver and a single-board computer interfacing them.

Cameras and Lenses
We used two compact board-level cameras (UI-1241LE-C and UI-1241LE-M from IDS Imaging, costs approx. 310 € and 190 €), each weighing about 16 g without lens. The first one is an RGB camera and the second one a monochrome camera with high sensitivity in the NIR range. The infrared cut filter glass in the monochrome camera was exchanged for a daylight cut filter, resulting in a sensitive range of approx. 700 nm to 950 nm. An additional monochrome camera with a band-pass filter glass could easily be added to the system to augment it with a desired spectral band (see figure 4).
Both cameras have a pixel count of 1280 × 1024, resulting in a ground sampling distance of approx. 13 cm at our flight altitude of 100 m. This is more than sufficient for the application case of fertilization support. The advantage of the small pixel count is a large pixel size of 5.3 µm, allowing short exposure times to avoid motion blur. This is especially important on a platform with high angular velocities like a micro UAV (see also section 3.1). For the same reason it is important to use cameras with a global shutter. Flight experiments with a rolling shutter camera resulted in strongly distorted images when exposed during a rotation of the UAV.
We used a compact S-mount lens (weight 19 g) suitable for infrared photography for both cameras (BL-04018MP118 from VD-Optics, costs approx. 105 € each). The focal length is approx. 4 mm, resulting in an angle of view of approx. 81° horizontal and 68° vertical. At our flight altitude of 100 m this corresponds to a ground footprint of 170 m × 136 m. The wide angle has the advantage of highly overlapping image blocks, which facilitates the image registration process.
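For reference, the stated ground sampling distance, angle of view and ground footprint follow directly from the sensor and lens parameters above; a minimal sketch (pinhole approximation, values taken from the text):

```python
import math

# Sensor and lens parameters as stated above
pixel_size_m = 5.3e-6       # 5.3 um pixel pitch
focal_length_m = 4.0e-3     # approx. 4 mm focal length
pixels_h, pixels_v = 1280, 1024
altitude_m = 100.0          # flight altitude

# Ground sampling distance: pixel size projected onto the ground
gsd = pixel_size_m / focal_length_m * altitude_m
print(f"GSD: {gsd * 100:.1f} cm/pixel")                 # approx. 13 cm

# Angle of view and ground footprint
sensor_w = pixels_h * pixel_size_m
sensor_h = pixels_v * pixel_size_m
aov_h = 2 * math.degrees(math.atan(sensor_w / (2 * focal_length_m)))
aov_v = 2 * math.degrees(math.atan(sensor_h / (2 * focal_length_m)))
footprint_w = 2 * altitude_m * math.tan(math.radians(aov_h / 2))
footprint_h = 2 * altitude_m * math.tan(math.radians(aov_v / 2))
print(f"AOV: {aov_h:.0f} x {aov_v:.0f} deg")            # approx. 81 x 68 deg
print(f"Footprint: {footprint_w:.0f} m x {footprint_h:.0f} m")   # approx. 170 m x 136 m
```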
The cameras are mounted on a pan-tilt unit designed by the local company berlinVR (Germany) to ensure the cameras are pointing roughly in nadir direction throughout the flight.
Figure 1 shows raw sample images taken with the described camera-lens combination. Note the variations in the test field and the barrel lens distortion.

GNSS Receiver
The system was augmented with a GNSS receiver (Navilock NL-651EUSB, u-blox 6 chipset, weight 14 g, costs approx. 40 €) to directly associate the images with GNSS data. We did not use the receiver of the UAV in order to have a flexible stand-alone solution which could potentially be used on another UAV.

Single-Board-Computer and Software
We used a single-board computer (BeagleBoard-xM, weight 37 g, costs approx. 160 €) on board the UAV as the common interface for the cameras, the GNSS receiver and the user of the system. The board runs the Ångström embedded Linux distribution, which is open source and has broad community support.
The GNSS receiver was interfaced via USB and the UBX protocol. The cameras were connected via USB, using a software interface from IDS Imaging which is available for the BeagleBoard upon request. An additional trigger line (GPIO) was connected to the cameras to trigger both cameras simultaneously. The user connects to the board via WLAN in ad hoc mode and a Secure Shell (SSH). We implemented a simple software running on the single-board computer, which is started and initialized by the user via SSH. After reaching a user-defined altitude near the final flight altitude, the software starts a brightness calibration for both cameras. When finished, the exposure time is kept constant during the flight. When reaching a user-defined photo altitude, the software starts image capturing, which is done as fast as possible. Each photo is saved together with the GNSS data. Image capturing is stopped when the photo altitude is left. After landing, the flight data can be transferred via WLAN or loaded directly from an SD card to a computer running the orthophoto generation software.
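The control flow of this on-board software can be summarized in a short sketch; all interface functions, types and altitude thresholds below are hypothetical placeholders and do not reproduce the actual IDS camera or u-blox UBX APIs:

```python
from dataclasses import dataclass
import random

# Placeholder stand-ins for the real hardware interfaces; names are illustrative only.
@dataclass
class GnssFix:
    altitude: float
    lat: float = 0.0
    lon: float = 0.0

def read_gnss() -> GnssFix:                  # would parse UBX messages in the real system
    return GnssFix(altitude=random.uniform(0.0, 110.0))

def calibrate_exposure(cameras) -> None:     # brightness calibration, exposure fixed afterwards
    pass

def trigger_cameras(cameras):                # simultaneous GPIO trigger, returns the image pair
    return ("vis_image", "nir_image")

def save_frame(images, fix: GnssFix) -> None:
    pass                                     # write the images together with the GNSS fix

CALIB_ALTITUDE = 90.0   # user-defined altitude at which the brightness calibration starts (m)
PHOTO_ALTITUDE = 95.0   # user-defined altitude above which images are captured (m)

def capture_loop(cameras, n_cycles=1000):
    calibrated = False
    for _ in range(n_cycles):                # in the real system this runs until landing
        fix = read_gnss()
        if not calibrated and fix.altitude >= CALIB_ALTITUDE:
            calibrate_exposure(cameras)      # exposure time kept constant afterwards
            calibrated = True
        if calibrated and fix.altitude >= PHOTO_ALTITUDE:
            save_frame(trigger_cameras(cameras), fix)   # capture as fast as possible
```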

Micro UAV
We used the Oktokopter from HiSystems GmbH (Germany) as carrier platform (figure 3). For the details of this multicopter platform, we refer to the manufacturer's homepage or to (Neitzel and Klonowski, 2011). Throughout the project, the UAV's autonomous GPS waypoint flight functioned reliably at our flight altitude of 100 m up to mean surface wind speeds of approx. 25 km/h. The flight time was restricted to approx. 20 min under our conditions of use (altitude 100 m, payload 500 g, battery capacity 8000 mAh).

SPECIAL ISSUES
This section describes three special issues concerning the generation and processing of the aerial images.

Exposure Time
A key factor for the reliable use of the cameras on board the micro UAV is the exposure time. Exposure times that are too long cause motion blur induced by camera movements, which defines the upper limit. The lower limit is given by the image sensor capabilities and the signal-to-noise ratio. Because we wanted to use the system reliably in a broad range of brightness scenarios (i.e. from early spring to midsummer, in the morning and at high noon, with and without clouds, over fields with different reflectance characteristics), setting up the camera exposure times was a challenge. This section describes some practical and theoretical considerations.

Lower Limit
The first factor concerning the lower limit of the exposure time is the signal-to-noise ratio. In digital cameras the sensitivity of the sensor is usually adjustable through the gain factor (or ISO factor), so it is possible to achieve very short exposure times by increasing the gain. But as this signal gain also amplifies the noise, every reduction of exposure time via the gain degrades the signal-to-noise ratio. It is therefore preferable to use larger pixel sizes to reach shorter exposure times (Farrell et al., 2006) and to keep the gain as low as possible.
The second factor influencing the lower limit is the capability of the image sensor. The lower limit of the exposure time is usually specified for the camera. For the NIR camera used in our system, we reached exposure times below 0.05 ms in the brightest conditions. Although this is theoretically possible with the camera, for technical reasons the bottom lines of the image become brighter in global shutter mode at such short exposure times (see the camera application notes from IDS Imaging). We therefore added a neutral filter glass (SCHOTT NG4, 2 mm, costs approx. 40 € including cutting) to the camera to mechanically decrease the sensitivity by a factor of approx. ten (see figure 4). This can also be a practical solution if the lower exposure limit of the sensor is reached.

Upper Limit
Camera movements during the exposure time can cause motion blur in the image. This effectively causes information loss in general and problems for the image registration process in particular. One way to avoid this is to reduce the exposure time by increasing the gain. As described above for the lower limit, this should be done only as much as necessary. It is therefore beneficial to have an estimate of the upper limit of the exposure time. Knowing this limit, the exposure time from a previous flight can be checked and the gain increased only if necessary. One easy way of approaching this limit is to visually inspect all flight images for motion blur. Another way is to compute the limit from the camera calibration data and flight dynamics measurements of the UAV. This approach is described in the following.
Camera Calibration Data
The cameras were calibrated using the Camera Calibration Toolbox for MATLAB. The toolbox uses Brown's distortion model (Brown, 1971) and offers functions for transferring points from the camera frame to the pixel frame and vice versa, which will be important in the computation of the upper limit.

Flight Dynamics Measurements
The maximum translational velocity $|v_t|_{\max}$ of the UAV can simply be recorded by the GNSS receiver and is about 10 m/s in our system. To measure the maximum angular velocities of the cameras, we mounted an inertial measurement unit (IMU) directly at the cameras and recorded the data on the single-board computer (the velocities could also be approximated from the UAV's IMU data). Figure 5 shows the raw gyroscope data of the angular velocities during an aggressive flight. These angular velocities are hardly reached during a normal photo flight. From this data we set the upper bounds $|\omega_x|_{\max}$ and $|\omega_y|_{\max}$ to 50°/s and $|\omega_z|_{\max}$ to 30°/s. The IMU is mounted in such a way that the x-y plane corresponds to the image plane of the camera and the z axis is parallel to the optical axis.
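Deriving such bounds from a recorded gyroscope log is straightforward; a minimal sketch with synthetic stand-in data (in practice the array would be loaded from the flight recording, e.g. with np.loadtxt):

```python
import numpy as np

# Stand-in for the recorded gyroscope log (columns wx, wy, wz in deg/s).
gyro = np.random.normal(0.0, 15.0, size=(10000, 3))

# Conservative bounds, e.g. a high percentile of the absolute rates,
# so that isolated peaks (such as the landing) do not dominate.
wx_max, wy_max, wz_max = np.percentile(np.abs(gyro), 99.9, axis=0)
print(f"|wx|max ~ {wx_max:.0f} deg/s, |wy|max ~ {wy_max:.0f} deg/s, |wz|max ~ {wz_max:.0f} deg/s")
```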

Computation of Upper Limit
For the computation of the upper limit of exposure time to avoid motion blur, it is useful to define how much motion blur is acceptable. This depends on the application scenario. We therefore define $b_{\max}$ to be the maximum distance in pixels that the content of one pixel is allowed to be shifted by motion blur. For the computed numerical values below we set $b_{\max}$ to one pixel.
The upper limit $e^{v_t}_{\max}$ of the exposure time with respect to the translational velocity can easily be computed from $|v_t|_{\max}$ and the ground resolution of the camera $r_{\mathrm{ground}}$ by

$$e^{v_t}_{\max} = \frac{b_{\max} \cdot r_{\mathrm{ground}}}{|v_t|_{\max}}.$$

With our ground resolution of approx. 10 cm/pixel at a flight altitude of 100 m this results in an upper limit of approx. 10 ms. At this flight altitude the influence of the angular velocities on motion blur is much stronger. We will therefore neglect the influence of the translational velocity in the following.
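As a quick sanity check with the values stated above (a minimal sketch):

```python
b_max = 1.0        # acceptable motion blur in pixels
r_ground = 0.10    # ground resolution in m/pixel at 100 m altitude
v_t_max = 10.0     # maximum translational velocity in m/s

e_max_vt = b_max * r_ground / v_t_max                                   # in seconds
print(f"upper exposure limit (translation): {e_max_vt * 1e3:.0f} ms")   # approx. 10 ms
```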
The upper limit $e^{\omega}_{\max}$ of the exposure time with respect to the angular velocities can be computed from the measured angular velocity limits and the camera calibration data. To this end, we derive the maximum velocity of a single pixel in the pixel frame under rotation of the camera. We first present the results for the idealized pinhole camera model and then for the distortion camera model (using the camera calibration toolbox).
For the pinhole camera model we can use the equations for the rotational pixel velocity presented in (Atashgah and Malaek, 2012). The upper part of figure 6 shows the maximum absolute pixel velocities over the complete image area as a colormap computed with these equations. For every pixel, the camera's angular velocity values $(\omega_x, \omega_y, \omega_z)$ were filled with the eight possible sign combinations $(\pm|\omega_x|_{\max}, \pm|\omega_y|_{\max}, \pm|\omega_z|_{\max})$. Then the maximum absolute value was taken for $\dot{x}$ and $\dot{y}$, respectively. It should be mentioned that representing the extreme angular velocities of the system by a combination of the three values $|\omega_x|_{\max}$, $|\omega_y|_{\max}$ and $|\omega_z|_{\max}$ is only an approximation. In a real system these values are usually not independent of each other, which means that the three extreme values are not reached simultaneously. So we are overestimating the true angular velocities for the sake of simplicity.
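The pinhole-model pixel velocities can be reproduced with the standard rotational optical-flow relations expressed in pixel coordinates; the sketch below illustrates the approach (up to sign conventions) rather than the exact formulas of (Atashgah and Malaek, 2012), and the focal length in pixels is an assumption derived from the stated lens and pixel size:

```python
import numpy as np
from itertools import product

# Camera parameters (approximate, derived from the lens/sensor data above)
f_px = 4.0e-3 / 5.3e-6               # focal length in pixels, approx. 755
width, height = 1280, 1024
cx, cy = width / 2.0, height / 2.0   # principal point approximated by the image center

# Angular velocity bounds from the flight dynamics measurement (rad/s)
wx_max = np.radians(50.0)
wy_max = np.radians(50.0)
wz_max = np.radians(30.0)

def pixel_velocity_pinhole(u, v, wx, wy, wz):
    """Rotational optical flow at pixel (u, v), in pixels per second."""
    x, y = u - cx, v - cy
    du = (x * y / f_px) * wx - (f_px + x**2 / f_px) * wy + y * wz
    dv = (f_px + y**2 / f_px) * wx - (x * y / f_px) * wy - x * wz
    return du, dv

# Maximum absolute pixel velocity over the image and all eight sign combinations
max_speed = 0.0
for u in range(0, width, 32):
    for v in range(0, height, 32):
        for sx, sy, sz in product((-1, 1), repeat=3):
            du, dv = pixel_velocity_pinhole(u, v, sx * wx_max, sy * wy_max, sz * wz_max)
            max_speed = max(max_speed, abs(du), abs(dv))

b_max = 1.0                          # acceptable blur in pixels
print(f"max pixel velocity: {max_speed:.0f} px/s")
print(f"e_max (rotation, pinhole): {b_max / max_speed * 1000:.2f} ms")
```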
From the upper part of figure 6 we can now simply read off the maximum pixel velocity over the complete image and both dimensions and compute $e^{\omega}_{\max}$ for the pinhole camera model.

As we are using a wide-angle lens with a strong distortion (see figure 1), we also compute this limit for the distortion camera model using the camera calibration toolbox. To compute $(\dot{x}, \dot{y})$ we simply map the pixel $P = (x, y)$ from the pixel frame to the camera frame, rotate it for a small time interval $\Delta t$ and map it back to the pixel frame; $(\dot{x}, \dot{y})$ is then approximated by the difference quotient of the pixel positions before and after the rotation. By using the difference quotient we avoid working with the derivative of the mapping functions, which can become quite complicated or may not be computable at all. The mapping function from camera frame to pixel frame $c_{PC}()$ is defined by the distortion model. The inverse function $c_{CP}()$ is often computed numerically when the corresponding distortion model is not algebraically invertible (de Villiers et al., 2008). This is also the case for our calibration toolbox, where both functions are available as MATLAB functions. Irrespective of how the inverse is computed, for computing $(\dot{x}, \dot{y})$ in this way the residual $r = P - c_{PC}(c_{CP}(P))$ has to be far below the pixel shift that can be expected in $\Delta t$. If this condition is met, $(\dot{x}, \dot{y})$ can be computed by

$$(\dot{x}, \dot{y}) = \frac{c_{PC}\left(\Delta\Psi \cdot c_{CP}(P)\right) - P}{\Delta t},$$

where $\Delta\Psi$ is the rotation matrix for small angles (Titterton, 2004, p. 39):

$$\Delta\Psi = \begin{pmatrix} 1 & -\omega_z \Delta t & \omega_y \Delta t \\ \omega_z \Delta t & 1 & -\omega_x \Delta t \\ -\omega_y \Delta t & \omega_x \Delta t & 1 \end{pmatrix}.$$

The lower part of figure 6 shows the absolute pixel velocities as a colormap computed in this way. For every pixel, $(\omega_x, \omega_y, \omega_z)$ was again filled with the eight possible sign combinations $(\pm|\omega_x|_{\max}, \pm|\omega_y|_{\max}, \pm|\omega_z|_{\max})$. The pixel velocity in the dark corners of the image could not be computed reliably, because the residual $r$ defined above exceeded a defined threshold. This is a result of the definition and implementation of $c_{PC}()$ and $c_{CP}()$. The fact that the pixel velocities increase less strongly towards the borders than in the pinhole camera model can be explained by the barrel distortion of the lenses. A wide-angle lens (with barrel distortion) is closer to an ideal fisheye lens, which keeps a constant angular resolution throughout the image (Streckel and Koch, 2005). So, for an ideal fisheye lens, rotations of the camera result in the same pixel velocities throughout the image.
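The same difference-quotient procedure can be reproduced with any distortion model for which forward and inverse mappings are available. The sketch below uses OpenCV's Brown-Conrady implementation (cv2.projectPoints / cv2.undistortPoints) instead of the MATLAB toolbox, with purely illustrative intrinsics and distortion coefficients, so it approximates the procedure rather than reproducing the original calibration:

```python
import numpy as np
import cv2

# Illustrative intrinsics; in practice these come from the camera calibration.
f_px = 755.0
K = np.array([[f_px, 0.0, 640.0],
              [0.0, f_px, 512.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.30, 0.10, 0.0, 0.0, 0.0])   # example barrel distortion (k1, k2, p1, p2, k3)

def pixel_velocity_distorted(u, v, omega, dt=1e-3):
    """Approximate pixel velocity at (u, v) for angular rate omega (rad/s) by mapping the
    pixel to the camera frame, rotating it for a short time dt, and mapping it back."""
    # pixel frame -> normalized camera frame (inverse of the distortion model)
    pt = np.array([[[u, v]]], dtype=np.float64)
    xn = cv2.undistortPoints(pt, K, dist)[0, 0]
    ray = np.array([xn[0], xn[1], 1.0])

    # round-trip residual should be far below the pixel shift expected in dt
    back, _ = cv2.projectPoints(ray.reshape(1, 1, 3), np.zeros((3, 1)), np.zeros((3, 1)), K, dist)
    residual = np.linalg.norm(back[0, 0] - np.array([u, v]))

    # small-angle rotation for the time interval dt
    rot, _ = cv2.Rodrigues((np.asarray(omega, dtype=np.float64) * dt).reshape(3, 1))
    ray_rot = rot @ ray

    # camera frame -> pixel frame (forward distortion model)
    proj, _ = cv2.projectPoints(ray_rot.reshape(1, 1, 3), np.zeros((3, 1)), np.zeros((3, 1)), K, dist)
    u2, v2 = proj[0, 0]
    return (u2 - u) / dt, (v2 - v) / dt, residual    # pixels per second, residual in pixels

# Example: corner pixel under the maximum angular rates
du, dv, res = pixel_velocity_distorted(0.0, 0.0, np.radians([50.0, 50.0, 30.0]))
print(f"pixel velocity at corner: ({du:.0f}, {dv:.0f}) px/s, residual {res:.3f} px")
```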
From the lower part of figure 6 we can approximate $e^{\omega}_{\max}$ for the distortion camera model in the same way. The closer the lens approaches an ideal fisheye lens, the greater the difference in $e^{\omega}_{\max}$ will be between the pinhole camera model and the more realistic distortion camera model.
Knowing this limit, we increased the camera gain stepwise and carefully after flights to keep the exposure time below 1 ms. This way, we avoided motion blur for the brightness conditions occurring in our application scenario without increasing the camera gain more than necessary.

Vignetting Correction
The brightness of a camera image usually declines radially from the image center. This effect, called vignetting, can have various causes (Goldman and Chen, 2005). Vignetting can be particularly important for our system if its extent differs significantly between the spectral bands. It is therefore beneficial to measure the amount of vignetting. If necessary, the camera images can then be corrected with the measured vignetting parameters.
If the camera is at hand, vignetting can be measured accurately with a single image using an integrating sphere. Here we briefly describe a measurement procedure that can be conducted without any special technical equipment and is practicable for wide-angle cameras.

Preliminary Approaches
Many procedures for measuring the vignetting are based on complete images. One possibility is to use a reference target (approaching a Lambertian surface) under homogeneous illumination conditions and to measure the vignetting with one image (e.g. (Edirlsinghe et al., 2001)). In practice, we had difficulties realizing the conditions of this approach. Especially with the wide-angle lenses, it was hard to capture the complete reference target and to assure homogeneous illumination conditions at the same time.
A more practicable approach is to use non-perfect targets and conditions (Hakala et al., 2010) and to reduce all errors by summing up multiple images. In the extreme case, one can try to sum up just normal flight images (Lelong et al., 2008). In our experience, it is hard to avoid systematic errors with such an approach. For instance, the bidirectional reflectance distribution function (BRDF) is usually not symmetric with respect to the 0° zenith viewing angle of the camera (Jensen, 2007, p. 369), thus causing systematic errors in the summed-up image, even if the copter is rotated around its yaw axis during the flight. Moreover, it is hardly practicable to vary influences like sun zenith angle, target heterogeneity or UAV heading in a randomized way.

Final Approach
We adopted a procedure presented in (Debevec et al., 2004). The idea is to capture a target point repeatedly under constant illumination conditions while simply rotating the camera at a fixed position. In doing so, the target point is captured at positions spread over the whole image. The brightness values of all these occurrences of the target point can then be collected as vignetting measurement points. Finally, a preferred vignetting model function can be fitted to these measurement points. In (Debevec et al., 2004) a diffuse light source was used as target point. We used a simple black circle printed on an A3 sheet of recycled paper. The pixel values inside the circle were averaged for a single measurement point. Figure 7 shows a sample image from our measurement series. The series was collected in a normal office environment. As we were using daylight illumination (blue sky), the images were collected within a short total time to minimize illumination variations. For this reason, and for convenience, the images were extracted from a captured video sequence using FFmpeg. The target positions were marked manually in the extracted images.
The exposure time has to be constant while capturing the images and set up in such a way that zero or saturated pixel values are avoided in the target averaging area. Figure 8 shows the result of our calibration measurement series for each of the channels. Similar to (Debevec et al., 2004), we fitted the measurement points to a fourth-order even polynomial $a_0 + a_2 r^2 + a_4 r^4$, where $r$ is the radius from the principal point. After fitting, the polynomial was normalized with respect to $a_0$. Figure 9 shows the fitted polynomials of all channels in one plot. As in (Edirlsinghe et al., 2001), the vignetting differs between the individual spectral bands. The peripheral brightness is reduced by up to 47% and varies by up to 13% between the bands.
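Fitting the even polynomial to the collected measurement points is a small linear least-squares problem; a minimal sketch with synthetic stand-in measurement values (the real values come from the calibration images):

```python
import numpy as np

# Stand-in measurement points: radius from the principal point (pixels) and the
# averaged target brightness at that position (normalized).
r = np.array([40.0, 150.0, 280.0, 400.0, 520.0, 640.0, 760.0])
brightness = np.array([0.99, 0.96, 0.90, 0.81, 0.72, 0.62, 0.55])

# Fit the even polynomial a0 + a2*r^2 + a4*r^4 by linear least squares.
A = np.column_stack([np.ones_like(r), r**2, r**4])
(a0, a2, a4), *_ = np.linalg.lstsq(A, brightness, rcond=None)

# Normalize with respect to a0 so the model equals 1 at the principal point.
def vignetting_ratio(radius):
    return (a0 + a2 * radius**2 + a4 * radius**4) / a0

print(vignetting_ratio(np.array([0.0, 400.0, 760.0])))
```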
Knowing the parameters of the vignetting model, each pixel of a captured image can easily be corrected by dividing its value by the corresponding value of the normalized polynomial (called the vignetting ratio). See figure 10 for an example.
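The per-pixel correction itself can be sketched as follows (continuing from a fitted model such as the one above; if no camera calibration data is available, the principal point can be approximated by the image center):

```python
import numpy as np

def correct_vignetting(image, cx, cy, vignetting_ratio):
    """Divide every pixel by the vignetting ratio at its radius from the principal point."""
    h, w = image.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    radius = np.sqrt((xx - cx)**2 + (yy - cy)**2)
    ratio = vignetting_ratio(radius)
    if image.ndim == 3:                      # apply the same ratio to every channel
        ratio = ratio[..., np.newaxis]
    corrected = image.astype(np.float64) / ratio
    return np.clip(corrected, 0, 255).astype(np.uint8)
```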

Georeferenced Orthophoto Generation
Inspired by the work of (Neitzel and Klonowski, 2011), we tested the orthophoto generation with different tools (Bundler+PMVS2, Pix4D UAV Cloud and Agisoft PhotoScan Professional). After an experimental phase we decided to use PhotoScan due to its satisfying results, ease of use and reasonable price (approx. 500 € for the educational version).
Agisoft recommends 60% lateral overlap and 80% forward overlap for aerial images. To account for camera panning we used 70% lateral overlap, resulting in a distance of 50 m between flight lines. With our maximum frame interval of 3 s, the forward overlap resulted in a maximum flight velocity of 9 m/s, which was not exceeded by the UAV used.
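The flight-line spacing and the maximum flight velocity follow directly from the ground footprint, the chosen overlaps and the frame interval; a minimal sketch with the values stated above (which footprint dimension lies across track is an assumption based on the resulting numbers):

```python
footprint_w, footprint_h = 170.0, 136.0   # ground footprint in m (see section 2)
lateral_overlap = 0.70
forward_overlap = 0.80
frame_interval = 3.0                      # maximum time between frames in s

line_spacing = footprint_w * (1.0 - lateral_overlap)       # approx. 50 m
forward_spacing = footprint_h * (1.0 - forward_overlap)    # approx. 27 m between exposures
max_velocity = forward_spacing / frame_interval            # approx. 9 m/s

print(f"flight line spacing: {line_spacing:.0f} m")
print(f"max flight velocity: {max_velocity:.1f} m/s")
```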
The software was fed with the raw images and the corresponding GNSS data. VIS and NIR orthophotos were generated separately. Although the orthophotos appeared self-consistent, we carried out a ground control measurement with RTK-GNSS to examine the relative and absolute accuracy. We obtained a mean absolute deviation of 3.2 m and a maximum absolute deviation of 6.1 m. This corresponds to the expectations when using a non-differential GNSS receiver on board the UAV.
As can be seen in figure 11, the main error can be ascribed to a rotation and scaling of the complete orthophoto. We therefore repeated the accuracy measurement with a new orthophoto generated with PhotoScan using three ground control points (GCPs) from the borders of the captured area. Thereby we reduced the mean absolute deviation to 0.3 m and the maximum absolute deviation to 1.3 m. The comparatively large absolute deviation remaining with respect to (Neitzel and Klonowski, 2011) can be explained by the fact that we used natural salient points for approx. 50% of the GCPs. Finding the correspondence of those points in the orthophoto can cause errors in the range of the maximum deviations measured.

SUMMARY AND DISCUSSION
The paper first presented a camera system designed for georeferenced VIS+NIR orthophoto generation which was reliably used on a micro UAV. The overall weight is about 500 g including the pan-tilt unit, with a total cost of approx. 1000 €. Together with the UAV described and the orthophoto generation software, we achieved a reasonably priced (total cost approx. 3500 €) and easy-to-use overall system for generating the orthophotos. The paper outlined some key features of the camera system which could also be useful to pre-evaluate commercial solutions.
In addition, the paper presented three special issues that arose during the project and possible solutions.
The first issue concerned the influences on the upper and lower limits of the camera exposure time and possible ways to set up the exposure time suitably. A computation of the upper limit from the camera calibration data and a flight dynamics measurement was presented. The described solutions can be useful if aerial images are to be acquired in a broad range of brightness scenarios.
The second issue presented a simple procedure for vignetting correction that was used in the project. Whether the calibration is worth the effort depends on the application scenario. Especially in the case of orthophoto generation, where the final image is composed of multiple single images, it may be sufficient to leave the issue to the orthophoto generation software.
The third issue described the deployed orthophoto generation solution and presented an accuracy measurement of the generated orthophoto using RTK-GNSS. The mean absolute deviation of 3.2 m may be sufficient for the application case of fertilization support. If not, the accuracy can be improved significantly using three well-distributed GCPs. In summary, we were able to reliably generate VIS+NIR orthophotos throughout the fertilization season (from April to July). Despite the restricted flight range of the UAV used, we could cover an area of approx. 10 ha during a single flight. Figure 12 shows an NDVI (normalized difference vegetation index) map generated with the solutions presented in this paper to illustrate a possible application in precision agriculture. The validation of the spectral quality of the obtained orthophotos is part of ongoing research.
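As an illustration of such a downstream application, the NDVI can be computed pixel-wise from co-registered red and NIR orthophoto bands; a minimal sketch, assuming both bands have been resampled to the same grid:

```python
import numpy as np

def ndvi(red, nir):
    """Normalized difference vegetation index from co-registered red and NIR bands."""
    red = red.astype(np.float64)
    nir = nir.astype(np.float64)
    denom = nir + red
    return np.where(denom > 0, (nir - red) / denom, 0.0)   # avoid division by zero
```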

Figure 1: Raw RGB (left) and NIR (right) sample images of a fertilization test field (altitude approx. 50 m).

Figure 2: Close-up of the constructed camera system: (bottom) two board-level cameras in the pan-tilt unit; (right) GNSS receiver mounted on top of the UAV during flight; (top) single-board computer interfacing the cameras and the GNSS receiver.

Figure 3: The camera system on board the Oktokopter.

Figure 4: NIR camera with detached camera mount and neutral filter glass. The glass can be fixed inside the camera mount in addition to the present daylight cut filter.

Figure 5: Raw gyroscope data during an aggressive flight. The peaks at the end are caused by the landing procedure.

Figure 6: x and y pixel velocities under rotation of the camera (maximum angular velocity) plotted against the pixel distance from the principal point. Results are plotted for the pinhole camera model (top) and the distortion camera model (bottom).

Figure 7: Sample image from our vignetting measurement series. The measurement target is the area inside the black circle on the A3 sheet mounted on the whiteboard.

Figure 8: Results of the calibration measurement series for each channel. Measurement points from the calibration images and the fitted polynomial are plotted against the radius from the principal point (all data normalized).

Figure 9: Fitted vignetting models of all spectral bands in one plot.

Figure 10: Two raw flight images (VIS+NIR) on the left and the corresponding vignetting-corrected images on the right.

Figure 11: Orthophoto accuracy measurement with ground control points (red), corresponding RTK-GNSS measurements (yellow) and deviation vectors scaled by a factor of 10 (blue). The captured area is about 550 m × 550 m.

Figure 12: NDVI map generated with the solutions presented in the paper. The area is about 550 m × 550 m and was captured during three consecutive flights (see figure 11 for the corresponding RGB data).
If no camera calibration data is available, the principal point can be approximated by the center of the image.