FEATURE MATCHING OF HISTORICAL IMAGES BASED ON GEOMETRY OF QUADRILATERALS

This contribution presents an approach for matching historical images from the photo library of the Saxon State and University Library Dresden (SLUB) in the context of a historical three-dimensional city model of Dresden. Compared with recent images, historical photographs exhibit several factors that make automatic image analysis (feature detection, feature matching and relative orientation of images) difficult. Due to, e.g., film grain, dust particles or the digitization process, historical images are often covered by noise that interferes with the image signal needed for robust feature matching. The presented approach uses quadrilaterals in image space, as these are commonly available in man-made structures and façade images (windows, stones, claddings). It is explained how quadrilaterals can generally be detected in images. Subsequently, the properties of the quadrilaterals as well as their relationship to neighbouring quadrilaterals are used for the description and matching of feature points. The results show that most of the matches are robust and correct, but still few in number.


INTRODUCTION
This contribution presents an approach to match historical images in the context of a historical three-dimensional city model of Dresden. The objective is a visualization of historical images and plans in this model. For an overlapping view of model and images, the camera position and orientation (exterior orientation) must be known. The basis for this research is the photo library of the Saxon State and University Library Dresden (SLUB), which currently contains about 2 million images from 88 institutions. The majority of images in this archive were taken between 1940 and 1990 (deutschefotothek.de). The research focuses on historical images of buildings in the city center of Dresden.
Compared with recent images, many factors make an automatic image analysis (feature detection, feature matching and relative orientation) difficult. For example, image information gets lost when digitizing in low resolution. Film grain, dust particles and digitization artifacts can occur in the images. In most cases, no information about the digitization process exists. Thus, if only a part of the original analogue image was scanned, the principal point may lie at the border or even completely outside of the digital image. In most cases the camera used by the photographer is not known and hence inner and exterior orientation are partly or completely unknown. In addition, the radiometric differences between two or more images of the same epoch are usually very large (fig. 1).
Figure 1. Example that shows radiometric differences between three historical images of the same building
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium "Towards Photogrammetry 2020", 4-7 June 2018, Riva del Garda, Italy
In summary, for various reasons digitized historical images show image noise that can mask the texture of the photographed object (signal). In many images, and consequently in image pairs, an unfavourable signal-to-noise ratio (SNR) occurs that makes automatic feature detection and matching difficult.
Using conventional gradient-based descriptors like SIFT (Lowe, 2004), SURF (Bay et al., 2006) and ORB (Rublee et al., 2011) for feature matching of historical images leads in most cases to few or zero correct matches. The presented approach therefore uses exclusively geometric features and semantic constraints to match two or more historical images. In this approach, the robust features are rectilinear structures in object space, which can be detected easily and quickly in image space as general quadrilaterals. Most historical buildings show many rectangles in certain formations (position and arrangement of windows, stones and claddings), which can then be recognized in further images using semantic-topologic relations.

RELATED WORK
Historical images are an important source of information for humanities researchers. Against this background, tasks like finding depictions of specific objects, visually comparing different states of construction or estimating proportions are of relevance. These tasks depend heavily on metadata, e.g. about the position and orientation of images or their time of origin. While public platforms like Europeana (europeana.eu) or Prometheus (prometheus-bildarchiv.de) as well as private platforms like Rephotos (re.photos) and Mapillary (mapillary.com) show increasing numbers of images, their varying metadata quality is often still an issue (Friedrichs et al., 2018). To improve the visualization of the data, an additional spatial approach as shown in Schindler and Dellaert, 2012 can be valuable for users.
Still, most recent scientific projects working with historical images use images not older than a few decades (Grün et al., 2004), (Snavely et al., 2007). Only a small number of projects rely heavily on photographs that are more than 50 years old, and a lot of work is still done manually (Bräuer-Burchhardt and Voss, 2001), (Henze et al., 2009), (Siedler et al., 2011), (Gouveia et al., 2015).
The focus of this research lies on an automatic approach for the relative orientation of the images. Later, these images will be oriented absolutely in the model (semi-)automatically. Approaches like SIFT show poor results in various studies using distinctive landmarks like the Eiffel Tower (Wolfe, 2015), (Ali and Whitehead, 2014). Using these gradient-based feature descriptors on two historical images of the same façade in our dataset led to even worse results and thus to an incorrect fundamental matrix. This approach therefore focuses on geometric features. Geometric features have been used instead of gradient-based features in various approaches. They range from geometric relationships between point features, through line features, to geometric constraints between planes (van den Heuvel, 1998), (Zeng et al., 2008).
Line features are already highly developed and many different approaches exist, such as LJL (Li et al., 2015), LPI (Fan et al., 2012) and MSLD (Wang* et al., 2009). However, not all of these approaches rely exclusively on geometric relations; some also use intensity, gradient and colour information, which is mostly not applicable to historical images. As shown in the benchmark of Li et al., 2016, line segment cluster methods like the one presented in Wang et al., 2009 could be more suitable for historical image matching and will be tested in future research.
Rectilinear structures are used in Micusik et al., 2008 as large support regions of co-planar points, which are later used for matching. Other approaches use structures in single images to calibrate the camera (Wang et al., 2008), (Li et al., 2010) or even to reconstruct three-dimensional objects (Han and Zhu, 2009), (Wefelscheid et al., 2011).
The presented approach will instead focus on the relative orientation between two or more historical images with unknown inner orientation using point features described through quadrilaterals.

WORKFLOW
The workflow is presented in the following subchapters. It is split into four different parts. First, the images are filtered in different ways to ease the detection of edges and consequently of quadrilaterals (fig. 2). In the second step, the quadrilaterals are detected using two different approaches. After that, the centroids are determined. For every centroid (feature point) a descriptor is calculated that describes the quadrilateral according to its neighbourhood. These steps are carried out not only for the original image but also for smaller, resized versions of it. An image pyramid with three octaves (the original image and two smaller blurred copies) was suitable for our application, since the digitized images are in most cases around 1000 x 1000 pixels in size. Finally, the descriptors of the quadrilaterals are matched and outliers are removed. The following steps were implemented in C++ using additional functions of the OpenCV library (OpenCV 3.2.0, opencv.org).

Filtering and equalization of images
For a better comparison and to improve the detection of edges, the images are pre-processed radiometrically. Edges are strengthened and image noise is weakened in order to detect a higher number of quadrilaterals in the next step. To this end, various filtering methods were tested empirically on varying historical images. The highest priority is the development of an automatic approach that does not rely on manual thresholds.
Thus, the contrast of the historical images is enhanced using histogram equalization. After this step, the images are filtered with the bilateral filter (Tomasi and Manduchi, 1998). The bilateral filter preserves edges but reduces textures like stone grain or curtains in the windows. In the filtered images, edges are detected using the Canny edge detector (Canny, 1986) with automatic Otsu thresholding (Otsu, 1979). The Otsu algorithm analyses the histogram of the image, splits the image into foreground and background, and thereby sets the thresholds for the Canny algorithm. The detected edges are strengthened using the morphological operation "closing" (fig. 2).
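The Otsu step can be illustrated with a minimal sketch in Python (the implementation described here was written in C++ with OpenCV; this is not that code). The function names are hypothetical, and the mapping from the Otsu level to the two Canny thresholds (high = Otsu level, low = half of it) is a common heuristic that is assumed here, not taken from the text.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0..255) that maximizes the between-class
    variance when pixels <= t are background and pixels > t are foreground."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue  # no background pixels yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # all pixels are background from here on
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def canny_thresholds(pixels):
    """Assumed heuristic: use the Otsu level as the high Canny threshold
    and half of it as the low threshold."""
    high = otsu_threshold(pixels)
    return 0.5 * high, high


# Usage: a flat list of 8-bit grey values with two clearly separated modes.
low, high = canny_thresholds([10] * 50 + [200] * 50)
```

Because both thresholds follow from the image histogram, no manual parameter has to be set per image, which matches the stated priority of the approach.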

Detection of quadrilaterals
In our investigation, rectangles in object space proved to be very robust and distinctive geometries in the historical images, as these images often show complete or partial façades. These rectangles can be detected in image space as general quadrilaterals, which show very characteristic properties. They can also easily be reduced to a pointwise matching approach by using the four corners or, as a first step, the centroids of the quadrilateral structures. Note that the centroids of two homologous quadrilaterals in two projectively transformed images are not necessarily exactly the same points, but for a first evaluation of the matching approach this made it easier to count the correct matches. For the calculation of a fundamental matrix or the relative orientation, the bottom-left corner of two homologous quadrilaterals was used for higher accuracy.
In a first approach, the quadrilaterals were detected in the Canny image using template matching. For this purpose, contours were calculated using "border following" (Suzuki, 1985), and all detected contours that approximately had the shape of a template rectangle were saved. The problem was that not all quadrilaterals could be detected correctly when the images were resized or when they showed, e.g., just one window. In some images, no quadrilaterals at all were found for matching.
Thus, in the final improved approach, the rough contours are again detected in the Canny image using "border following". Then, only contours that can be approximated with four points using the Douglas-Peucker algorithm (Douglas & Peucker, 1973) and that have a convex shape are kept (fig. 3). Only the $n$ largest quadrilaterals in every pyramid octave are saved. Using $n = 40$ was suitable for images showing a lot of rectangles. For 3 octaves, up to 120 quadrilaterals are thus found per image and are described using the following descriptor.
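The selection of quadrilaterals from contours can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the C++/OpenCV implementation: contours are assumed to be given as lists of (x, y) points, the closed contour is split at the point farthest from its start before simplification, and the tolerance `epsilon` as well as all function names are hypothetical.

```python
import math


def _point_line_distance(p, a, b):
    """Perpendicular distance from p to the line through a and b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.dist(p, a)
    return abs(dx * (p[1] - a[1]) - dy * (p[0] - a[0])) / length


def douglas_peucker(points, epsilon):
    """Simplify an open polyline; both end points are always kept."""
    if len(points) < 3:
        return list(points)
    dists = [_point_line_distance(p, points[0], points[-1]) for p in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] <= epsilon:
        return [points[0], points[-1]]
    left = douglas_peucker(points[: idx + 1], epsilon)
    right = douglas_peucker(points[idx:], epsilon)
    return left[:-1] + right


def simplify_closed(contour, epsilon):
    """Simplify a closed contour by splitting it at the point farthest from its start."""
    far = max(range(len(contour)), key=lambda i: math.dist(contour[0], contour[i]))
    first = douglas_peucker(contour[: far + 1], epsilon)
    second = douglas_peucker(contour[far:] + [contour[0]], epsilon)
    return first[:-1] + second[:-1]


def is_convex(poly):
    """True if the cross products of all consecutive edge pairs share one sign."""
    n = len(poly)
    signs = set()
    for i in range(n):
        (ax, ay), (bx, by), (cx, cy) = poly[i], poly[(i + 1) % n], poly[(i + 2) % n]
        cross = (bx - ax) * (cy - by) - (by - ay) * (cx - bx)
        if cross != 0:
            signs.add(cross > 0)
    return len(signs) == 1


def area(poly):
    """Polygon area from the shoelace formula."""
    n = len(poly)
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def largest_quadrilaterals(contours, epsilon, n):
    """Keep convex four-point approximations and return the n largest by area."""
    quads = []
    for contour in contours:
        poly = simplify_closed(contour, epsilon)
        if len(poly) == 4 and is_convex(poly):
            quads.append(poly)
    return sorted(quads, key=area, reverse=True)[:n]
```

Calling `largest_quadrilaterals(contours, epsilon, 40)` once per pyramid octave would correspond to the setting described above.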

Description of quadrilaterals
For the description of every general quadrilateral, a descriptor was developed. Instead of using intensity-, gradient- or colour-based values, the descriptor uses the properties of the quadrilateral and of the quadrilaterals in its neighbourhood and compares these properties against each other. Thus, the neighbourhood of a quadrilateral defines its descriptor. The properties are mostly projective invariant (Hartley & Zisserman, 2003). Eight different values describe every quadrilateral $Q_i$, where $i \in [1, n \cdot o]$, with $n$ the number of quadrilaterals kept per octave and $o$ the number of octaves. For every quadrilateral the centroid is calculated, and the $k$ nearest neighbours of a quadrilateral (given by the smallest distances between the centroids) are considered.
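The neighbourhood search can be sketched in a few lines of Python. The centroid is computed here as the mean of the four corners, which is an assumption (the text does not state which centroid definition is used), and the function names are hypothetical.

```python
import math


def centroid(quad):
    """Mean of the four corner points (assumed centroid definition)."""
    return (sum(p[0] for p in quad) / 4.0, sum(p[1] for p in quad) / 4.0)


def k_nearest_neighbours(quads, i, k):
    """Indices of the k quadrilaterals whose centroids are closest to quad i."""
    centres = [centroid(q) for q in quads]
    others = [j for j in range(len(quads)) if j != i]
    others.sort(key=lambda j: math.dist(centres[i], centres[j]))
    return others[:k]


def unit_square(x, y):
    """Helper for the usage example: axis-aligned unit square at (x, y)."""
    return [(x, y), (x + 1, y), (x + 1, y + 1), (x, y + 1)]


# Usage: the two squares closest to the first one, nearest first.
quads = [unit_square(0, 0), unit_square(10, 0), unit_square(3, 0), unit_square(50, 0)]
neighbours = k_nearest_neighbours(quads, 0, 2)
```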
Figure 3. Canny image (left), detected closed contours using "border following" (middle) and recursive simplifying of detected contours to quadrilaterals using Douglas-Peucker algorithm (right)
The first four values of the descriptor describe the location of the quadrilaterals in the neighbourhood relative to the centroid of the observed quadrilateral. The neighbours can therefore lie in four different quadrants. It is assumed that, if all quadrilaterals can be detected in two different images, the local neighbourhoods of two homologous quadrilaterals will be similar. For the calculation, the angle $\alpha_j$, where $j \in [1, k]$, between the negative y-axis and the line connecting the centroid of the observed quadrilateral with the centroid of the $j$-th neighbour is determined. The hits in every quadrant A, B, C and D are summed up and normalized with $k$ for a better comparison with the other values of the descriptor (fig. 4), (eq. 1).
In equation 1, $j \in [1, k]$. Figure 5 shows an example of an observed quadrilateral and its 16 neighbours. Some of the quadrilaterals fall into two quadrants and consequently increment both descriptor values by 1 (eq. 1).
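A minimal Python sketch of the quadrant counting is given below. Several details are assumptions made for illustration: image coordinates with y pointing down, the angle measured clockwise from the negative y-axis, and quadrants A to D covering 0° to 90°, 90° to 180°, 180° to 270° and 270° to 360°. The 5° overlap on both sides of the axes follows the description of figure 4. The function name is hypothetical.

```python
import math


def quadrant_histogram(centre, neighbour_centres, margin_deg=5.0):
    """Return the normalized hit counts [A, B, C, D] for one quadrilateral.

    A neighbour within margin_deg of an axis counts for both adjacent
    quadrants, so the four values may sum to more than 1.
    """
    k = len(neighbour_centres)
    if k == 0:
        return [0.0, 0.0, 0.0, 0.0]
    counts = [0, 0, 0, 0]
    cx, cy = centre
    for x, y in neighbour_centres:
        # Angle from the negative y-axis ("up" in the image), clockwise.
        alpha = math.degrees(math.atan2(x - cx, -(y - cy))) % 360.0
        for q in range(4):
            lo = 90.0 * q - margin_deg
            hi = 90.0 * (q + 1) + margin_deg
            # Test alpha and its wrapped copies to handle the 0°/360° seam.
            if any(lo <= a <= hi for a in (alpha, alpha - 360.0, alpha + 360.0)):
                counts[q] += 1
    return [c / k for c in counts]


# Usage: one neighbour per diagonal direction plus one directly above,
# which lies on an axis and therefore counts for both A and D.
values = quadrant_histogram((0, 0), [(1, -1), (1, 1), (-1, 1), (0, -1)])
```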
The values 5 to 8 of the descriptor describe properties of the observed quadrilateral that are compared with those of the quadrilaterals in the neighbourhood. Different relationships were chosen that are locally invariant to projective transformations of the image.
For the calculation of value 5, the area of every quadrilateral $Q_i$ is compared with the areas of its $k$ nearest neighbours. For every neighbour whose area is equal (or almost equal), the value of the descriptor is incremented by 1. We call this value the "ratio of area" (ROA) (eq. 2).
Value 6 performs a similar operation for parallel lines. The "ratio of parallelism" (ROP) checks whether two sides of the observed quadrilateral are parallel to the corresponding sides of the $k$ quadrilaterals in the neighbourhood. If both sides are parallel, the value is incremented by 1 (eq. 3).
Value 7 is the "ratio of aspect ratio" (ROAR). The aspect ratio of the minimum enclosing rectangle of the observed quadrilateral is compared with the corresponding aspect ratio of the $k$ quadrilaterals in the neighbourhood. If the aspect ratios are equal (or almost equal), the value is incremented by 1 (eq. 4).
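The three counting values can be sketched in Python as follows. This is an illustration under several assumptions that are not given in the text: the tolerances for "almost equal" and "parallel", the choice of the first two sides (corner 0 to 1 and corner 1 to 2, with a consistent corner order in all quadrilaterals), and the use of the axis-aligned bounding box as a simpler stand-in for the minimum enclosing rectangle. All names are hypothetical.

```python
import math


def area(quad):
    """Polygon area from the shoelace formula."""
    n = len(quad)
    twice = sum(quad[i][0] * quad[(i + 1) % n][1] - quad[(i + 1) % n][0] * quad[i][1]
                for i in range(n))
    return abs(twice) / 2.0


def almost_equal(a, b, rel_tol):
    """Relative comparison; rel_tol is an assumed tolerance."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


def are_parallel(p1, q1, p2, q2, tol_deg):
    """True if segment p1-q1 and segment p2-q2 differ by at most tol_deg."""
    a1 = math.atan2(q1[1] - p1[1], q1[0] - p1[0])
    a2 = math.atan2(q2[1] - p2[1], q2[0] - p2[0])
    diff = abs(a1 - a2) % math.pi
    return min(diff, math.pi - diff) <= math.radians(tol_deg)


def bbox_aspect(quad):
    """Width / height of the axis-aligned bounding box (stand-in for the
    minimum enclosing rectangle); assumes a non-degenerate quadrilateral."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))


def roa(quad, neighbours, rel_tol=0.1):
    """Share of neighbours with (almost) equal area."""
    if not neighbours:
        return 0.0
    return sum(almost_equal(area(quad), area(n), rel_tol) for n in neighbours) / len(neighbours)


def rop(quad, neighbours, tol_deg=2.0):
    """Share of neighbours whose first two sides are both parallel to those of quad."""
    if not neighbours:
        return 0.0
    hits = sum(are_parallel(quad[0], quad[1], n[0], n[1], tol_deg)
               and are_parallel(quad[1], quad[2], n[1], n[2], tol_deg)
               for n in neighbours)
    return hits / len(neighbours)


def roar(quad, neighbours, rel_tol=0.1):
    """Share of neighbours with (almost) equal bounding-box aspect ratio."""
    if not neighbours:
        return 0.0
    return sum(almost_equal(bbox_aspect(quad), bbox_aspect(n), rel_tol)
               for n in neighbours) / len(neighbours)
```

For a façade with a regular grid of identical windows, all three values would be close to 1 for each window, while the quadrant values separate windows at the border of the grid from those in its interior.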
Value 8 is the "ratio of Hausdorff distance" (ROHD). It compares the observed quadrilateral with its $k$ nearest neighbours by means of the Hausdorff distance. The distances are summed up and normalized with 1000 (eq. 5).
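A minimal Python sketch of this value is shown below. It assumes that the Hausdorff distance is taken between the four corner points of the quadrilaterals in image coordinates; whether the original implementation uses the full contours or aligns the shapes first is not stated. Only the division by 1000 from the text is applied. The function names are hypothetical.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in a to its nearest point in b."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two point sets."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


def rohd(quad, neighbours, scale=1000.0):
    """Sum of Hausdorff distances to all neighbours, divided by scale."""
    return sum(hausdorff(quad, n) for n in neighbours) / scale


# Usage: a unit square and a copy shifted by 3 pixels in x.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
shifted = [(3, 0), (4, 0), (4, 1), (3, 1)]
value = rohd(square, [shifted, square])
```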
All values are normalized with $k$ so that they are easily comparable within a range of 0 to 1. Possible correlations between the values have to be tested in further studies. The descriptor for every quadrilateral (QD) is shown in equation 6.
$QD = [A, B, C, D, ROA, ROP, ROAR, ROHD]$ (6)

Descriptor matching
The centroids of the quadrilaterals described by QD are matched using brute-force matching with the $L_2$ norm. Every descriptor in image 1 is compared to every descriptor in image 2, and the best result (the smallest $L_2$ norm between two descriptors) is saved (fig. 6).
These results are filtered using a symmetry test. Matches from image 1 to image 2 are only accepted if the same points are also matched from image 2 to image 1 (fig. 7).
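Brute-force matching followed by the symmetry test can be sketched in Python as follows. Descriptors are assumed to be plain lists of numbers of equal length (eight values for QD); the function names are hypothetical.

```python
import math


def best_match(desc, candidates):
    """Index of the candidate with the smallest L2 distance to desc."""
    return min(range(len(candidates)), key=lambda j: math.dist(desc, candidates[j]))


def symmetric_matches(desc1, desc2):
    """Return (i, j) pairs where j is the best match for i and i is the best match for j."""
    if not desc1 or not desc2:
        return []
    forward = [best_match(d, desc2) for d in desc1]
    backward = [best_match(d, desc1) for d in desc2]
    return [(i, j) for i, j in enumerate(forward) if backward[j] == i]


# Usage with two-dimensional toy descriptors: the third descriptor of image 1
# also prefers descriptor 0 of image 2, but loses the symmetry test.
image1 = [[0.0, 0.0], [1.0, 1.0], [0.9, 0.9]]
image2 = [[1.0, 1.0], [0.0, 0.1]]
matches = symmetric_matches(image1, image2)
```

The symmetry test removes one-sided matches at the cost of a second pass over all descriptor pairs, which is inexpensive for roughly one hundred quadrilaterals per image.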
Figure 6. Two homologous quadrilaterals in two images and their calculated neighbourhood for k=12 (marked with red dots). Descriptor values are calculated for the two quadrilaterals, and if the $L_2$ norm is low, the descriptors and consequently the quadrilaterals are matched, as in this case.
Figure 7. Feature matching between two different parts of historical images. The image shows the result of the brute-force matching and after the symmetry test for k=20
The remaining matches are used to calculate a fundamental matrix, where outliers are eliminated using RANSAC (Fischler and Bolles, 1981). Depending on image size and number of quadrilaterals, the observed image pairs show around 5 to 20 correct matches (fig. 8).
Figure 8. Matched points after calculation of the fundamental matrix and filtering using RANSAC for $k \in [7, 70]$
If incorrect matches are left after the calculation of the fundamental matrix, they are in most cases just one quadrilateral away from the correct match. This happens because not every quadrilateral can be detected in every image. For images where fewer than 7 quadrilaterals are detected, the calculation of a fundamental matrix is not possible and thus the algorithm fails. Long baselines between the images or strong perspective differences also cause the algorithm to fail in most cases.

RESULTS
As can be seen in figures 7 and 8, the results are good and robust when a lot of quadrilaterals can be detected. The matching using only geometric properties in a descriptor works very well. Radiometric differences like lightness or contrast changes are eliminated once the quadrilaterals are detected. If only a few quadrilaterals can be detected in one or both of the images, the algorithm fails. This happens, e.g., with small images (preview images), occlusions, façade parts without windows or stones, and images that do not show buildings. Table 1 shows that the algorithm is not suitable for every type of image. The number of matches for three of the benchmark images of Li et al., 2016 is very low. Still, the remaining correct matches are in most cases very robust and could be used to calculate a fundamental matrix. Furthermore, fixing the neighbourhood size $k$ to a single value is difficult. In some images a small neighbourhood can be more suitable than a large one, even though a large neighbourhood makes the descriptor more distinctive. We achieved the best results by calculating the descriptors not for just one neighbourhood size $k$ but for an iterating $k$. The algorithm then selects a predefined number of matches with the highest occurrence over all iterations.

CONCLUSION AND FUTURE WORK
The presented approach shows feature matching in historical images based on the geometric properties of quadrilaterals. The matches are calculated in a way that is mostly invariant to projective transformation, periodic structures and a poor SNR. For this purpose, the bottom-left corners of the quadrilaterals are matched using the descriptor QD, which consists of 8 different values. The resulting matches are in most cases few but robust. The algorithm works very well for a specific type of historical image, namely those showing façades with a lot of quadrilaterals. It can, however, also be applied to recent images of buildings or other man-made structures when quadrilaterals can be found.
In further research the presented algorithm will be optimized. Different values could be added to the descriptor to improve the description and consequently the matching of quadrilaterals. These could be cross-ratios or projective invariant distances (Richter-Gebert, 2011). It is also planned to combine different approaches to make it possible to match more diverse historical images. Line matching or the combination with other geometric features could yield more consistent matching results for different façades.

Figure 2. Original historical image (left), image after histogram equalization and bilateral filtering (middle), image after Canny edge detector and closing (right)

Figure 4. Starting from the observed quadrilateral in the middle, quadrilaterals in the neighbourhood (marked by a red dot) fall into one (in overlapping regions two) quadrant(s) A, B, C, or D depending on their angle.
These four values are not rotation invariant, but since digitized historical images are almost always upright, this is negligible. Additionally, a region of 5° on both sides of the axes was added to prevent false assignments. The quadrilaterals in the neighbourhood of two homologous quadrilaterals thus fall into the same quadrants and the descriptor values are equal.

Figure 5. Example for an observed quadrilateral and its 16 neighbours represented with red dots marking the centroids. The quadrilaterals in the neighbourhood lie in different quadrants.

Table 1 shows different image examples (one image of an observed image pair) and their number of correct matches. Matches that are shifted by just one quadrilateral are still treated as incorrect matches.

Table 1. Number of correct and false matches using the presented descriptor on different historical images. Additionally, results on the images of Li et al., 2016 are shown.