LIBRJMCMC : AN OPEN-SOURCE GENERIC C + + LIBRARY FOR STOCHASTIC OPTIMIZATION

The librjmcmc is an open source C++ library that solves optimization problems using a stochastic framework. The library is primarily intended for but not limited to research purposes in computer vision, photogrammetry and remote sensing, as it has initially been developed in the context of extracting building footprints from digital elevation models using a marked point process of rectangles. It has been designed to be both highly modular and extensible, and have computational times comparable to a code specifically designed for a particular application, thanks to the powerful paradigms of metaprogramming and generic programming. The proposed stochastic optimization is built on the coupling of a stochastic Reversible-Jump Markov Chain Monte Carlo (RJMCMC) sampler and a simulated annealing relaxation. This framework allows, with theoretical guarantees, the optimization of an unrestricted objective function without requiring any initial solution. The modularity of our library allows the processing of any kind of input data, whether they are 1D signals (e.g. LiDAR or SAR waveforms), 2D images, 3D point clouds... The library user has just to define a few modules describing its domain specific context: the encoding of a configuration (e.g. its object type in a marked point process context), reversible jump kernels (e.g. birth, death, modifications...), the optimized energies (e.g. data and regularization terms) and the probabilized search space given by the reference process. Similar to this extensibility in the application domain, concepts are clearly and orthogonally separated such that it is straightforward to customize the convergence test, the temperature schedule, or to add visitors enabling visual feedback during the optimization. The library offers dedicated modules for marked point processes, allowing the user to optimize a Maximum A Posteriori (MAP) criterion with an image data term energy on a marked point process of rectangles.


INTRODUCTION
Optimization of an objective function over a given search space is a very important and wide research topic.Optimization problems come however in very different flavors: the objective function may or may not exhibit nice properties like convexity or the search space may be a simple compact of R n , a (possibly infinite) set of discrete values, or even a combination of both: a (possibly infinite) union of spaces embeddable in R n .This article proposes an implementation of a stochastic optimization framework for optimizing arbitrary objective functions over the latter and more complex search spaces.Given the difficulty of these problems that mix combinatorial and variational aspects and the genericity of the proposed library, it does not provide a one-size-fits-all solution.Instead, it implements a theoretically sound stochastic framework that is easily and highly customizable for rapid prototyping and tuning of the optimization process.The librjmcmc is a generic C++ library (Abrahams and Gurtovoy, 2004) designed to be both heavily optimized and highly modular.
In this paper, we first briefly remind the mathematical foundations involved in the chosen stochastic framework (section 2).We then present our implementation choices to build a computationally efficient and generic library (section 3).Before concluding, we demonstrate sample uses of the library, including the discussion of the implementation of the building footprint extraction from satellite imagery proposed in (Tournaire et al., 2010) (section 4).

STOCHASTIC OPTIMIZATION
librjmcmc 's optimization is performed using a coupling of a simulated annealing relaxation (Salamon et al., 2002) with a Reversible Jump Markov Chain Monte Carlo (RJMCMC) sampler (Hastings, 1970;Green, 1995).This framework enables the stochastic minimization of a large class of energies over complex search spaces, only requiring the evaluation of the objective function, rather than e.g.derivatives or convexity properties (Descombes, 2011).

Reversible-Jump Markov Chain Monte Carlo
A Markov Chain Monte Carlo (MCMC) sampler provides a series of samples according to a given unnormalized probability distribution function.RJMCMC is an extension of MCMC that allows the configuration space Ω to be the union of spaces of varying dimensions Ω = n Ωn.Algorithm 1 (Green, 1995) consists in repeating stochastic proposition and acceptance steps to build a series of configurations (i.e.elements of the search space Ω) which stationary distribution is the desired target distribution π.Note that after a sufficient number of iterations, the running configuration Xt is independent of the initial configuration X0.
A well-known property of interest is that the stationary distribution π is not only never sampled directly (as it is the purpose of the algorithm) but also only evaluated up to a multiplicative constant, as it only appear in the term π(X t ) π(X t ) of the Green ratio.This thus enables the sampling of a probability π defined as the normalization of a non-negative integrable function f : π International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia Algorithm 1: RJMCMC sampler: Metropolis-Hastings-Green Algorithm 1 requires a set of reversible kernels (Qi) associated with normalized probabilities q(i|Xt).Each reversible kernel Qi models a simple stochastic modification of the current configuration, based only on the current configuration Xt: i.e.Qi(•|Xt) defines a probability distribution over Ω.For the correctness of the algorithm πQi must have continuity properties and Qi must be reversible: Qi(X|Y ) > 0 ⇒ Qi(Y |X) > 0 (Descombes, 2011).We will consider reversible kernels defined using the following elements (Green, 1995): • A probability pi(n |n) of applying the reversible kernel, with n, n such that Xt ∈ Ωn, X t ∈ Ω n .
• A pair of forward and backward views provide a probabilized set of concurrent extractions of a fixed length vector representation θ of the configuration.For instance it may provide one of many redundant parameterization of the configuration, or a parameterization of a stochastically chosen element of the configuration.
• A pair of forward and backward distributions φi, φ i defined over respectively R N and R N .That completes the vectors θ and θ to match the dimension of the bijective transform.
• A bijective transform Ti : R N → R P , where The kernel contribution to the Green ratio is then:

Marked Point Process
A (possibly multi-) Marked Point Process (MPP) (van Lieshout, 2000) is a stochastic unordered set of geometric objects of one or many types (e.g. points, segments, circles, ellipses or rectangles in 2D, ellipsoids or spheres in 3D).The marked point denomination arises from the usual decomposition of the object parameterization into a point of its embedding space (usually its center) and a vector of values (the marks) that completes the geometric parameterization of the object (e.g.orientation, length or radius parameters), hence the name Marked Point Process.
The librjmcmc currently focuses on (multi) MPP in both sampling and optimization contexts.When direct sampling of the MPP is not possible, as when its probability distribution function (PDF) is not samplable and not normalized, the RJMCMC algorithm 1 may be used to build a Markov Chain which distribution converges to the desired target stationary distribution.
A MPP is defined by a probabilized space (Ω, π).If K denotes the set of possible values of a single object, elements of the configuration space Ω = ∞ n=0 K n are unordered sets of a varying number n of elements in this set K. If the process is multimarked, Ω is then a Cartesian product of such spaces.A simple probabilization of this configuration space may be given by a probabilization of the set K of each object type and a discrete probabilities defined over the natural numbers N for each object type that samples the number of object n of each type.This probabilization allows a direct sampling of the configuration space Ω by sampling both the object counts, and then each object independently.In our context, we will consider more complex probabilizations π of the configuration space Ω, which sampling will require the more advanced RJMCMC framework.

Simulated Annealing
Simulated annealing is a physical process inspired by annealing in metallurgy and is widely used in many communities where global optimization is of importance (Salamon et al., 2002).At each step of the algorithm, the stationary distribution is replaced by a similar stationary distribution that is increasingly more selective around the maxima.Coupled with the RJMCMC framework, it enables the global optimization of an extremely large class of energy functions over complex search spaces of varying dimensions.The goal of the simulated annealing process is to drive the initial RJMCMC sampler from an initial probability distribution function given by an energy-agnostic probabilization of the search space (denoted as the reference process) to a target probability distribution function which support is exactly the set of global minima of the energy.To interpolate the PDF between the initial reference PDF and the final PDF, a Boltzmann distribution is usually used (see section 3.2.2),parameterized by a temperature T which decreases from +∞ to zero.The stationary RJM-CMC PDF then converges in theory to a mixture of Dirac masses at the global minima of the energy.More informally energy increases are allowed which avoid being trapped in a local minima.But as the temperature T decreases, the maximum allowable energy increase decreases.Depending on the temperature decrease rate and its schedule, and its adequacy to the energy landscape, a solution close to optimal is found in practice.

Generic Programming
Generic Programming is a paradigm that offers compile-time polymorphism through templates.It is not as flexible as object-oriented programming but is much more compiler-friendly as it exposes types at compile-time rather than at runtime.Therefore the compiler is given the opportunity to optimize and produce machine code of similar performance than a special-purpose code, while keeping the object-oriented advantage of segregating orthogonal concepts to modularize the library.This even lets us provide various implementations of each orthogonal concept, which are denoted models of these concepts.A concept may be seen as the documentation of the requirements and behavior of a particular template parameter.Concepts have been proposed but not yet accepted as a C++ extension, which would simplify both the documentation and the compiling error reporting of generic libraries.

RJMCMC Concepts
The RJMCMC sampler is implemented in the class sampler, which can be customized using the template parameters: Density, Acceptance and a list of Kernels.It offers access to statistics International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia (such as the acceptance rate of each kernel) and is used through the following member function that modifies the current configuration in place (preventing useless copies), performing one iteration of algorithm 1.The Configuration type and its Modification associated type is highly problem dependent, section 2.2 will detail them in the MPP context.The sampler does only access them through the Kernels, the reference Density and the evaluation of the energy difference ∆U they offer.T is used to introduce the energy bias (through the ∆U term) into the reference PDF, R∞ denoting the Green ratio of the reference process.metropolis_acceptance is the most common choice, using the unnormalized target distribution πe − U T .Other acceptance rules depart from the formulation of a welldefined target distribution (Salamon et al., 2002).A kernel is used to propose atomic moves to explore the configuration space.

Density
Variate Concept It provides the distribution φi that samples a fixed-length vector and evaluates its probability density.
View Concept It encodes the combinatorial aspects of the kernel and the function ψi that extract an iterator over θ values when applied forwards and constructs the new configuration based on the vector (θ , u ) resulting from the Transform when applied backwards.
Transform Concept Transformations encode the bijective differentiable mapping Ti between spaces of equal dimensions.They are thus used to describe the bijection applied during a kernel proposal.Transformations must provide apply, inverse and abs_jacobian functions, which are used respectively in the forward kernel, the backward kernel and in the Green ratio computation.The former is based on a simple std::vector while the latter embeds a boost::graph that allows the caching of unary and non-zero binary energy terms, speeding-up ∆U computations.
The only requirement on the object type is that it is able to provide an iterator over a parameterisation vector.Multi-Marked Point Processes (i.e.processes of heterogeneous object types) are supported by supplying a boost::variant over the types of objects as the Object template argument of the configuration.In this context, all library functions that handle explicitly the objects have them first dispatched through the boost::variant dispatching function so that functors like Energies are boost::variant-agnostic.

MPP Energies
The library currently supports energies that are defined as sums of unary (U1) and binary (U2) energies: Available Energy Models (Tournaire et al., 2010) constant_energy (unary and binary) image_gradient_unary_energy

E0%E1
By overloading the C++ operators, the operator Energy models allow to build an energy such as U2(x, y) = 100−area(x∩y) by intuitively writing the code 100−intersection_area_binary_energy(), resulting in a binary energy of type: minus_energy<constant_energy,intersection_area_binary_energy>

MPP Density
The reference MPP distribution is defined using the class marked_point_process::direct_sampler, which combines a discrete distribution (section 3.2.1)for each object type, probabilizing the number of objects and a uniform object sampler that is used to get independent object samples.

3.3.4
Accelerator Concept When queried with a configuration C and an object X, an accelerator provides a subset of the set of objects in C that interact with X (i.e. that have a non-zero binary energy with it).If the binary energy becomes null or negligible when objects are further apart than a given threshold, an accelerator may only report objects that intersect a suitable ball centered on the query object X.Further releases of the library may include accelerators based for instance on quad-trees, uniform grids or Kd-trees.However, the only model of this concept is currently the trivial_accelerator that returns the whole range of objects of the query configuration.
3.3.5MPP Kernels For efficiency, the energy evolution ∆U of the modifications proposed by the kernels must be easy to compute.Thus MPP Kernels usually only delete and/or insert a small number of objects.A Modification class thus keeps a vector of iterators to the death-proposed objects and a vector of birthproposed new objects.The simplest MPP kernels that are sufficient for convergence are provided: RJMCMC::uniform_birth_kernel and its reverse kernel RJMCMC::uniform_death_kernel.

Simulated Annealing Concepts
The main simulated annealing function is the following free function that performs the optimization loop until the convergence EndTest succeeds, advancing the temperature Schedule and calling the Visitor functor at each iteration: Whereas the logarithmic_schedule is the one that ensures convergence (Descombes, 2011), the geometric_schedule is usually used instead in practice for computational efficiency.The step_schedule wraps another schedule to offer constant temperature periods of τ iterations, enabling statistics computation through a Visitor.
The evaluation of a suitable initial temperature is not trivial.On the one hand, too high a temperature will unnecessarily prolong the initial phase of the simulated annealing where the sampling is unbiased by the target distribution.On the other hand, too low a temperature prevents the exploration of the whole configuration space and results in the convergence to a local minimum only.(Salamon et al., 2002) suggests considering an estimation of the variance of the energy of configurations sampled according to the reference process to estimate the initial temperature.The corresponding salamon_initial_schedule T0 estimation function is provided, which performs well in the absence of hardcore energy terms.

3.4.2
EndTest concept Models of the EndTest concept provide a simple predicate that informs the simulated annealing framework that the process has converged or whatever reason requiring to stop the simulated annealing iterations, such as a user-issued cancellation.The only requirements of this concept is to provide the following member function: Provided models are max_iteration_end_test that fails after a given number of iterations and delta_energy_end_test that fails after a given number of iterations without energy change.They may be combined using the composite_end_test model.Available EndTest Models max_iteration_end_test t ≥ tmax delta_energy_end_test ∀i > t − τ, ∆Ui = 0 composite_end_test<E0, E1, ...> or(E0(), E1(), ...)

3.4.3
Visitor concept Visitors are highly customizable objects passed to the simulated annealing process.They may be used to visualize or to gather statistics on the current state of the optimization.Available models offer console (figure 1) or GUI (based on Gilviewer and WxWidgets, figure 2) feedback, or shapefile saving of the current configuration.They may be combined using a composite_visitor.This extensibility mechanism offer the possibility to compile the same RJMCMC code with a GUI to build intuition and monitor the running algorithm and without for batch timing purposes.

Circle packing
This section details a simple example use of the librjmcmc library.The goal is to find a set of circles that minimizes a simple objective function.Each circle center lie in the unit square and its radius is constrained in an interval [rmin, rmax].Notice that the cardinality of set itself is unknown.The minimized function awards a negative constant value for each circle and penalizes every pair of overlapping circles by its intersection area.Basically, it tries to pack as many circles as possible into the unit square while minimizing their overlap.First, one needs to define the object type of the MPP, with all its geometric methods.Then, the object container is defined as well as an objective function as the sum of a unary term for each object and a binary term for each pair of objects.The set of objects is then probabilized using a Poisson reference process and uniform birth.A basic birth and death RJMCMC sampler is then defined to enable the sampling of the MPP relative to the Poisson reference process, modulated by the score of the objective function and a temperature parameter.Then header files relative to the optimization of the objective function are included.A sufficiently slow temperature decrease morphs the probability distribution function from the one of the reference process to a combination of Dirac masses at the global mimima of the objective function, thus achieving its minimization.A geometric scheduling of the temperature decrease is suboptimal but is more practical and thus more commonly used.ostream_visitor osvisitor ; tex_visitor texvisitor ( " q u i c k s t a r t " ) ; composite_visitor<ostream_visitor , tex_visitor> visitor ( osvisitor , texvisitor ) ; visitor .init ( nbdump , nbsave ) ; optimize ( c , samp , sch , end , visitor ) ; r e t u r n 0 ; } The main entry point starts by parsing the input parameters, providing default values.Once the reference Poisson process is instantiated, an empty configuration is created and initialized with the energy objects.Then, the RJMCMC sampler object samp is constructed.Finally, the simulated_annealing objects are created and the optimization is performed in place on the configuration instance c. Figure 1 shows the output of the program, and the resulting circle packing.Its runtime is 11s for 356 circles and 10 7 iterations.

Building extraction
The method introduced in (Tournaire et al., 2010)

CONCLUSIONS AND FUTURE WORK
Generic programming is very efficient at run-time and this library showcases how modular high-level code performs well at runtime (Abrahams and Gurtovoy, 2004).Once accustomed to the necessary typedef clauses, using the library reduces to simply describing the problem at a high-level rather.Its development is however quite cumbersome, due to the extra burden allocated to the compiler, and its lack of native language support.Until concepts are accepted as part of C++, there are preliminary ways to better document them and to get more legible error reporting when the library is misused.librjmcmc now only checks the validity of the Schedule concept as an input iterator.Future work would be to systematically check template parameters for their intended concept.
Implementations of close relatives of the RJMCMC algorithm, such as jump diffusion or multiple birth and death (Descombes, 2011) would share many code parts.It would be interesting to implement them in a common library to assess their comparative strengths.
In the librjmcmc , complex binary or unary energy terms may be built using expressions on simpler energies.This technique is called a Domain-Specific Embedded Language (DSEL).It would be nice to extend its use the whole energy formulation, kernels, distributions, and even transforms so that, for instance, complex transform would be able to generate their Jacobian computation code at compile-time.This would help hiding the complex type names produced by the generic programming and help offering a more user-friendly interface.boost::proto is a framework to define such DSELs, but its ease of use and runtime costs will have to be assessed.
Finally, the librjmcmc has now only be used within a 2D MPP context.Future interesting work would be to implement other Configuration, Kernel and Energy models, in embedding spaces of other dimensions (1D signals (Hernandez-Marin et al., 2007;Mallet et al., 2010), 3D ellipses (Perrin et al., 2006)...) or even in a non-MPP context, for instance to optimize a 3D triangulation over the noisy heightfield of photogrammetric Digital Surface Model.Our wish is that the open-source license of the librjmcmc library will enable reproducible research.
t e m p l a t e <t y p e n a m e Configuration> v o i d sampler : : o p e r a t o r ( ) ( Configuration &c , d o u b l e temp ) ; Kernel Strategy Pairs of reversible kernels are provided together using the following class, the forward and backward kernel being defined by swaping roles of Views, of Variates and inversing the Transform: t e m p l a t e <t y p e n a m e View0 , t y p e n a m e View1 , t y p e n a m e Variate0 , t y p e n a m e Variate1 , t y p e n a m e Transform> c l a s s kernel ; t e m p l a t e <t y p e n a m e Configuration , t y p e n a m e Sampler> b o o l o p e r a t o r ( ) ( c o n s t Configuration& configuration , c o n s t Sampler& sampler , d o u b l e temperature ) ;

/
/ called once at startup t e m p l a t e <t y p e n a m e Configuration , t y p e n a m e Sampler> v o i d begin ( c o n s t Configuration& configuration , c o n s t Sampler& sampler , d o u b l e temperature ) ; / / called once per iteration t e m p l a t e <t y p e n a m e Configuration , t y p e n a m e Sampler> v o i d visit ( c o n s t Configuration& configuration , c o n s t Sampler& sampler , d o u b l e temperature ) ; / / called once at the end t e m p l a t e <t y p e n a m e Configuration , t y p e n a m e Sampler> v o i d end ( c o n s t Configuration& configuration , c o n s t Sampler& sampler , d o u b l e temperature ) ;

Figure 1 :
Figure 1: Output of L A T E Xvisitor (left) and the console visitor (right) during a 11s optimization code of section 4.1 (356 objects)

Figure 3 :
Figure 3: From left to right (a), (b), (c) and (d).Building extraction results using a MPP of rectangles(Tournaire et al., 2010).from a Digital Surface Model (DSM) using a MPP of rectangles.This implementation, which has been included in the librjmcmc to foster reproducible research, amounts to provide the problemdependent object: the rectangle type, the energies and the kernels.We show here results of this implementation on 4 different DSMs (figure3), from 10 cm to 50cm Ground Sample Distance (GSD), acquired using photogrammetry or Lidar and of various sizes.Figures 3.a and b are on the same area which is also contained in figure3.d.The size of the dataset 3.d shows the scalability of the proposed approach, even in the absence of a proper acceleration (such as a quad tree).
Strategy This strategy provides the reference distribution, by exposing a double pdf_ratio(const Configuration&, const Modification&) member function that evaluates the distribution ratio π(X t )/π(Xt), where Xt is provided as a Configuration and X t as a proposed Modification of the current configuration Xt.The poisson_distribution and uniform_distribution are discrete probabilities over the set of non-negative integers, which are handy as basic blocks to build a Density.Discrete distributions expose the following member functions: d o u b l e pdf_ratio ( i n t n0 , i n t n1 ) c o n s t ; / / PDF ratio evaluation d o u b l e pdf ( i n t n ) c o n s t ; / / PDF evaluation i n t o p e r a t o r ( ) ( ) c o n s t ; / / PDF sampling 3.2.2Acceptance Strategy It encodes how the simulated annealing temperature Configurations MPPs are commonly used to drive a stochastic exploration in a optimization context where the objective function, hereafter called an Energy is composed as the sum of per-object unary terms and per-pair of objects binary terms.For efficiency reasons, Configurations are thus tightly coupled with Energies in this library.A Configuration is a wrapper around a container of objects.The library offers vector_configuration and graph_configuration Configuration models.
Schedule concept Models of the Schedule concept are responsible for providing the evolution of the temperature throughout the simulated annealing process.The Schedule concept is a refinement of the InputIterator concept of the C++ Standard Template Library.Available Schedule Models t e m p l a t e < c l a s s Configuration , c l a s s Sampler , c l a s s Schedule , c l a s s EndTest , c l a s s Visitor > v o i d simulated_annealing : : optimize ( Configuration& config , Sampler& sampler , Schedule& schedule , EndTest& end_test , Visitor& vis ) { d o u b l e t = * schedule ; visitor .begin ( config , sampler , t ) ; f o r ( ; !end_test ( config , sampler , t ) ; t = * (++ schedule ) ) { sampler ( config , t ) ; visitor .visit ( config , sampler , t ) ; } visitor .end ( config , sampler , t ) ; } 3.4.1 International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia has been implemented within the librjmcmc library, which extracts buildings International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XXXIX-B3, 2012 XXII ISPRS Congress, 25 August -01 September 2012, Melbourne, Australia