IMPROVING PATH QUERY PERFORMANCE IN PGROUTING USING A MAP GENERALIZATION APPROACH

The pgRouting library provides functions to compute the shortest path between any two points of a road network, a capability that is in great demand and a topic of interest in the fields of GIS, graph theory and transportation. To compute a path in a road network, pgRouting functions process the entire road network. This is a major bottleneck when routing in large road networks and leads to the requirement of large server resources. A reduction/compression of the input network that is processed for path computation would improve the performance of pgRouting. In this study a map generalization based network model is proposed which extracts a significantly smaller subset of the road network, called the skeleton, which is further used to divide the network into zones that are selectively used in path computation. As a result, a much smaller part of the network is processed to compute the path between any two points, leading to an overall improvement in the query performance of pgRouting, especially on large road networks. To assess this approach and its applicability to large road networks, the paper presents an in-depth analysis of the trade-offs between deviation in the computed path and the performance gain in terms of space and time on road networks of varying sizes and topology. The analysis both demonstrates the utility of the proposed method and shows its implementability within the current model of pgRouting or any other routing platform.


INTRODUCTION
pgRouting is an open source geospatial routing library that extends a PostGIS enabled PostgreSQL database. The pgRouting library provides a variety of routing algorithms such as All Pairs Shortest Path (APSP), Shortest Path, Driving Distance, Traveling Sales Person and Turn Restricted Shortest Path (TRSP). These routing algorithms are in great demand and are a topic of interest in the fields of GIS, graph theory and transportation. pgRouting path algorithms process the full network to compute the path between any two points. This leads to slow path computation, especially in the case of large road networks. A reduction/compression of the network data processed for path computation should enhance the performance of pgRouting path algorithms. A number of approaches have been tried in this context of network reduction, such as network compression, graph contraction, graph partitioning and map generalization, which are discussed below.
(Akimov et al., 2004, Khoshgozaran et al., 2008, Shekhar et al., 2002, Suh et al., 2007, Zhang, 2006) try to compress the vector data or road networks. (Akimov et al., 2004) deals with compression of vector data by removing redundancy in the data. (Khoshgozaran et al., 2008) implements a compression technique that improves the performance of vector data queries. (Suh et al., 2007, Shekhar et al., 2002) propose techniques of vector data compression which can be used to reduce storage and improve data transportation over limited bandwidth. Most of these works talk about compressing vector/road network data but do not deal with path computation on the compressed network, which may lead to improved performance in path computation.
(Geisberger et al., 2008) try to contract the graph by addition of shortcuts and store precomputed paths to achieve speedups in path computation. (Möhring et al., 2005, Jung and Pramanik, 1996, Chondrogiannis and Gamper, 2016) try to partition the graph into clusters and store precomputed paths to reduce the search space and improve path computation. In each of the above mentioned works, the graph is modified due to the addition of shortcuts. The precomputed paths need to be computed and stored, which leads to additional storage requirements. Moreover the path extracted using the modified graph is not complete and needs expansion due to the presence of shortcuts in the path. This may be an overhead when computing longer paths in large road networks. Some of these works also require the design of a special algorithm to compute the path based on the modified graph structure.
In the literature, several generalization approaches for road networks have been proposed. (Thomson and Richardson, 1995, Mackaness and Beard, 1993, Jiang and Claramunt, 2004, Jiang and Harrie, 2004) propose graph theory based generalization methods. (Bjørke and Isaksen, 2005) deals with the applicability of information theory to generalization. (Thomson and Richardson, 1999) proposes a topography based generalization approach. In all the above studies the focus has been mainly on generating a generalized network with reduced size while preserving its overall topography. Not much work has been carried out on exploiting this generalized structure of the network, which may achieve gains in path computation.
The goal of this study is to improve the overall query performance of pgRouting by proposing a map generalization based network model that selectively processes a significantly smaller subset of the road network to compute the path between any two points, without the use of precomputed paths. The proposed approach is evaluated by carrying out an in-depth analysis of the trade-offs between deviation in the computed path and the performance gain in terms of space and time on road networks of varying sizes and topology. It should be noted that none of the above mentioned approaches in the literature are implemented in pgRouting for path computation. Therefore the proposed approach is compared to the pgRouting Dijkstra algorithm to get a better understanding of the utility of the proposed method and also to show its implementability within the current architecture of pgRouting or any other routing platform.

Graphs and Paths
A road network can be represented as a directed graph G = (V, E), where V denotes a set of nodes that represent road intersections and E ⊆ V × V is the set of edges. Each edge e = (va, vb) represents a road segment that connects nodes va and vb. A weight function w: E → R assigns to each edge e = (va, vb) a weight w(e), which captures the cost of moving from va to vb, in terms of travel time or distance. A path p is an ordered set of edges e1, e2, ..., eN where ei ∈ E for all i = 1, 2, ..., N. A path between nodes vx and vy is denoted by p(x→y). The containment of an edge e in a path p is defined by a function σ as follows:
$$\sigma(e, p) = \begin{cases} 1, & \text{if } e \in p \\ 0, & \text{otherwise.} \end{cases} \qquad (1)$$
The length l(p) of a path p equals the sum of the weights of all contained edges, i.e.,
$$l(p) = \sum_{i=1}^{N} w(e_i) \qquad (2)$$
p*(x→y) is a shortest path if there is no path p(x→y) such that l(p) < l(p*).

Subgraphs and Connectivity
A directed graph is said to be connected if every node is reachable from every other node, i.e., a path p exists between each and every pair of nodes. A connected component is a subgraph in which any two nodes are connected to each other by paths, and which is connected to no additional node in its corresponding supergraph. Suppose a graph G = (V, E) is divided into a set of subgraphs {SG1(V1, E1), SG2(V2, E2), ..., SGn(Vn, En)}; then Vi ∩ Vj = φ, where 1 ≤ i, j ≤ n and i ≠ j. A set of connecting edges Econn for a graph G(V, E) is the set of all edges (va, vb) such that va and vb belong to two subgraphs SGa and SGb respectively, where a ≠ b.
For each subgraph SGj(Vj, Ej), a set of connecting edges Econn(SGj) is defined as the subset of Econn whose edges have an endpoint in Vj. The above definitions can be generalized into multiple levels. We use the notation $SG^i_j(V^i_j, E^i_j)$ to denote the jth subgraph in the ith level, $E_{conn}(SG^i_j)$ to denote the set of connecting edges of $SG^i_j(V^i_j, E^i_j)$, and $E^i_{conn}$ to denote the set of all connecting edges in the ith level.

Skeleton Network
The Skeleton Network Gs is a representative network of the original network whose size is much smaller than that of the original network. The above definition of Gs can be generalized into multiple levels. We use the notation $G^i_s$ to denote the Skeleton Network in the ith level.

Residual Network
The above definition can be generalized into multiple levels. We use the notation $G^i_{res}$ to denote the Residual Network in the ith level.

ARCHITECTURE OF PGROUTING
pgRouting follows an SQL based architecture in which the graph/network data is stored in the form of SQL tables in a PostgreSQL database. The graph data for path computation is extracted quickly and efficiently using SQL queries. Figure 1 shows the architecture of pgRouting, comprising two major components, namely the PostgreSQL Database and the pgRouting extension, which are explained below.

PostgreSQL Database
The PostgreSQL database contains the information of edges and vertices of the graph/network G(V, E) in the form of SQL tables. The schema of the edge table is explained in Table 1.

pgRouting Extension
pgRouting is an extension to the PostgreSQL database which contains all the path computation algorithms. Let us say the client wants to find a path between source node vx and target node vy. The client sends a request in the form of an SQL query to the PostgreSQL server to compute a path between vx and vy by specifying a path algorithm, say PA. The pgRouting extension extracts the appropriate graph data from the edge table in the PostgreSQL database using an SQL query. The extracted graph data is then used to find the path between vx and vy by using PA. The computed path is then returned to the client.

Query Structure
edges_sql represents the edges of the graph G on which the path computation is performed, start_vid represents the identifier of the source vx, and end_vid represents the identifier of the target vy.

Path Computation
The path computation procedure can be divided into 3 simple steps as illustrated in Figure 1.

Read Graph Edges
In this step the function extracts the graph specified by the client in the query. The graph corresponding to the edges_sql query is obtained from the edge table present in the PostgreSQL database.

Build Graph
This step uses the graph extracted in Step 1 to build a Boost C++ graph structure internally.

Compute Path
In this step the path between source and target is computed by using the specified path algorithm on the internal C++ graph structure obtained from Step 2.
The pgRouting function in the server takes vx and vy as input, executes steps 1, 2 and 3 to compute the path between them, and returns the path to the client. The output format of the pgRouting path algorithm is given in Table 3. The output of a Dijkstra query to compute the path from source node 4 to target node 2 in the graph shown in Figure 2(a) is given below.
To compute the path between any two points in a network, pgRouting path algorithms use the full network, which leads to performance issues when it comes to routing in large road networks. To improve path computation, either the individual steps explained in Section 3.2.2 or their combination needs to be enhanced. In this work we focus on improving the efficiency of step (1) by a reduction or compression of the network data extracted for path computation, which naturally leads to an improvement in steps (2) and (3).
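The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the list `edge_table` stands in for the rows returned by edges_sql, a dictionary-based adjacency list stands in for the Boost C++ graph, and the function names `read_graph_edges`, `build_graph` and `compute_path` are hypothetical, not part of pgRouting.

```python
import heapq
from collections import defaultdict

def read_graph_edges(edge_table):
    """Step 1: stand-in for running edges_sql; returns (id, source, target, cost) rows."""
    return [(e["id"], e["source"], e["target"], e["cost"]) for e in edge_table]

def build_graph(rows):
    """Step 2: build an in-memory directed adjacency list from the extracted rows."""
    adj = defaultdict(list)
    for eid, src, tgt, cost in rows:
        adj[src].append((tgt, cost, eid))
    return adj

def compute_path(adj, start_vid, end_vid):
    """Step 3: Dijkstra search; returns (node sequence, aggregate cost)."""
    dist = {start_vid: 0.0}
    prev = {}
    heap = [(0.0, start_vid)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == end_vid:
            break
        for v, cost, _eid in adj[u]:
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if end_vid not in dist:
        return None, float("inf")
    path = [end_vid]
    while path[-1] != start_vid:
        path.append(prev[path[-1]])
    return path[::-1], dist[end_vid]

# Toy edge table (illustrative data, not taken from the paper's figures).
edge_table = [
    {"id": 1, "source": 4, "target": 3, "cost": 1.0},
    {"id": 2, "source": 3, "target": 2, "cost": 1.0},
    {"id": 3, "source": 4, "target": 5, "cost": 1.0},
    {"id": 4, "source": 5, "target": 2, "cost": 3.0},
]
path, agg_cost = compute_path(build_graph(read_graph_edges(edge_table)), 4, 2)
```

Every row returned in step 1 is touched again in steps 2 and 3, which is why shrinking the extracted edge set reduces the cost of all three steps.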

Preprocessing
Given a graph G = (V, E), all the disconnectivities and dangles are removed and it is ensured that the graph is well connected.

Choice of Nodes
After the network G is divided into grids, the nodes that belong to each grid are known. From each grid j a set of random nodes V′j is chosen. The number of nodes chosen from each grid is proportional to the number of nodes that belong to the grid. The collection of chosen nodes from all the grids constitutes the special nodes of interest.
The number of nodes chosen from each grid is |V′j| = |Vj| p where p = 0.5. The grids 1, 2, 4, 5, 6, 8, 13, 16 are empty and thus do not contribute towards the special nodes of interest.

Priority Definition
Dijkstra's shortest path algorithm was used to compute the path between every special node of interest and every other special node of interest. Each edge e is assigned a value c(e) which indicates the number of shortest paths that contain e. This value c(e) determines the importance/priority of an edge e.
Using the above definition, the priorities of all the edges are computed. The red colored edges in Figure 2(c) show the edges with c(e) ≥ 28.

Skeleton Construction
Depending on the skeleton that is supposed to be generated, a threshold value cthreshold is chosen.
The cthreshold value eases the selection of edges based on their priorities. Let us suppose we want to select the top 10 edges of a network based on priorities. In order to do so we sort the edges according to their priorities and choose the top 10 among them. Let us say edge et is the 10th edge, which is of least priority among them, with priority value c(et). In such a case, the top 10 edges can be represented by defining a threshold value cthreshold = c(et). Now in order to easily extract the top 10 edges from the network, we choose the edges with priority value greater than or equal to cthreshold.
Gs is initialized with an empty set and the edges e ∈ E with c(e) ≥ cthreshold are added to Gs.

Connectivity
The resultant Gs formed in the previous step need not be connected. Paths are added between the connected components until Gs becomes well connected.
The above mentioned steps are performed and the resultant Gs is as shown in Figure 3(a). Algorithm 1 is used to generate a Skeleton Network given a threshold value cthreshold; its final steps call MakeConnected(Gs) and return Gs. Algorithm 2 explains the procedure to make Gs well connected.
Definition 3. Given a zone Zj(Vzj, Ezj), its extended zone is defined as a subgraph Z*j(V*zj, E*zj).
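The priority definition and the threshold-based edge selection described in this section can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names are hypothetical, the toy edge list is invented, and when several shortest paths exist between a pair of special nodes only one of them is counted (an assumption, since the text does not say how ties are handled).

```python
import heapq
from collections import Counter, defaultdict
from itertools import permutations

def shortest_path_edges(adj, src, dst):
    """Dijkstra from src; returns edge ids on one shortest path to dst ([] if unreachable)."""
    dist, prev, heap, done = {src: 0.0}, {}, [(0.0, src)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w, eid in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = (u, eid)
                heapq.heappush(heap, (d + w, v))
    edges, node = [], dst
    while node != src and node in prev:
        node, eid = prev[node]
        edges.append(eid)
    return edges if node == src else []

def edge_priorities(rows, special_nodes):
    """c(e): number of shortest paths between special nodes that contain edge e."""
    adj = defaultdict(list)
    for eid, s, t, w in rows:
        adj[s].append((t, w, eid))
    c = Counter({eid: 0 for eid, _s, _t, _w in rows})
    for a, b in permutations(special_nodes, 2):
        c.update(shortest_path_edges(adj, a, b))
    return c

def threshold_for_top(c, n):
    """c_threshold = priority of the n-th edge when sorted by decreasing priority."""
    return sorted(c.values(), reverse=True)[n - 1]

def skeleton_edges(c, c_threshold):
    """Edges with c(e) >= c_threshold; ties at the threshold may yield more than n edges."""
    return {eid for eid, value in c.items() if value >= c_threshold}

# Toy directed graph with unit weights: (id, source, target, cost).
rows = [(1, 1, 2, 1.0), (2, 2, 3, 1.0), (3, 3, 4, 1.0), (4, 5, 2, 1.0)]
c = edge_priorities(rows, [1, 5, 3, 4])
c_threshold = threshold_for_top(c, 2)
gs = skeleton_edges(c, c_threshold)
```

The connectivity step (adding paths between components of the selected edge set) is not shown here.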

PGROUTING PATH ALGORITHM BASED ON SKELETAL MODEL
The skeleton of a road network can be used to optimize path computation by limiting the amount of network that is used for path computation, which could solve the bottleneck problem of pgRouting and improve the performance of path computation. This section discusses the applicability of the Skeletal Model to the existing pgRouting architecture to improve path computation.
After the formation of zones as discussed in Section 4.4, zone identifiers are assigned to all edges and nodes of Gres, indicating the zone to which they belong.

Edge Levels
In order to generate skeletons of different sizes, a quantile classification with k intervals is applied over the c(e) values sorted in decreasing order to get a qualitative classification of edges based on their priority in the overall network. Each interval i is associated with a value $c^i_{threshold}$ which denotes the minimum priority value of interval i, where 1 ≤ i ≤ k. Every interval is termed a level such that the ith interval denotes the ith level. Every edge e is assigned a value level(e) which denotes its priority level. Algorithm 3 is used to generate the skeleton at a given level i.

Path Computation Algorithm
The reduced graph concept explained in the previous section is used for computing the path between any two nodes in the network. From Section 4.4 it is ensured that every extended zone is well connected to the skeleton Gs. From Section 2.3, the skeleton Gs is also well connected. Therefore it can be deduced that the reduced graph Gx,y is also well connected and thus contains a path between any two nodes vx and vy. Algorithm 4 illustrates the algorithm that computes the path between a source node vx and target node vy using the reduced graph.

Algorithm 4 Path Computation Algorithm
1: procedure SKELETALPATHALGORITHM(vx, vy)
2: x′ = z(vx)
In Figure 3(c), Z*1 and Z*2 are the extended zones to which node 1 and node 7 belong respectively. From Figure 3(c) it can be noticed that the reduced graph G1,7 does not contain the extended zone Z*3 and therefore a reduction in the graph size can be observed, which cuts down the overall search space for path algorithms.
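The idea behind the path computation algorithm, restricting the search to the skeleton and the extended zones of the source and target, can be sketched as below. The sketch only assembles the reduced edge set; any path algorithm can then be run on it. The function name `reduced_graph`, the edge labels and the zone layout are hypothetical and chosen to mirror the G1,7 example, and treating a node without an extended zone as contributing no extra edges is an assumption.

```python
def reduced_graph(skeleton_edges, extended_zones, zone_of, vx, vy):
    """Edge set of G_{x,y}: skeleton plus the extended zones of vx and vy."""
    zx, zy = zone_of[vx], zone_of[vy]
    return (skeleton_edges
            | extended_zones.get(zx, set())
            | extended_zones.get(zy, set()))

# Illustrative data: two skeleton edges and three extended zones.
skeleton_edges = {"s1", "s2"}
extended_zones = {
    1: {"a1", "a2", "c1"},   # zone 1 edges plus its connecting edge
    2: {"b1", "c2"},         # zone 2 edges plus its connecting edge
    3: {"d1", "d2", "c3"},   # zone 3 edges plus its connecting edge
}
zone_of = {1: 1, 7: 2, 9: 3}  # z(v) for a few nodes

g_1_7 = reduced_graph(skeleton_edges, extended_zones, zone_of, 1, 7)
total_edges = len(skeleton_edges | set().union(*extended_zones.values()))
share = len(g_1_7) / total_edges  # fraction of the network that is processed
```

In this toy layout the edges of zone 3 never enter the search, which is the source of the reduction in search space described above.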

pgRouting Skeleton Path Query
The proposed path algorithm reveals that only a subset of the graph is sufficient to compute the path between any two points. This algorithm, when implemented in pgRouting, reduces the overhead of extracting and processing the total road network for path computation, thus solving the bottleneck problem of pgRouting. Moreover it can be implemented in pgRouting easily by reusing the existing pgRouting path algorithm without any changes to the existing architecture. The reduced graph used for path computation comprises the skeleton Gs and the extended zones (Z*x′, Z*y′) where x′ = z(vx), y′ = z(vy).
k new columns are added to the edge table and vertex table in the PostgreSQL database to store the zone identifier of each edge e and vertex v respectively in the ith level. The columns are named zone_i, which represents the value z^i(e) for each edge and z^i(v) for each vertex, where 1 ≤ i ≤ k. Zone identifier values are assigned to each edge e and node v in the ith level so that the reduced graph Gx,y can be extracted easily in the SQL based architecture: the connecting edges e ∈ Econn connecting Gs and Zj are given the value −j, while, as the queries below show, skeleton edges carry the value 0 and the edges of zone Zj carry the value j.
Given this configuration of column names and zone identifier assignment, we try to understand how the different components of the reduced graph $G^i_{x,y}$, where x′ = z^i(vx) and y′ = z^i(vy), are extracted easily using SQL queries. Figure 3(c) shows the reduced graph used for understanding such queries at level i = 1, where x = 1 and y = 7 and their corresponding zone identifiers are x′ = z^1(1) = 1 and y′ = z^1(7) = 2.
The skeleton $G^1_s$ is extracted using the query below.
    SELECT id, source, target, cost FROM edge_table WHERE zone_1 = 0;
The extended component Z*1 is extracted using the query below. The ABS function in SQL is a mathematical function that returns the absolute (positive) value of the specified numeric expression.
    SELECT id, source, target, cost FROM edge_table WHERE ABS(zone_1) = 1;
Similarly, the extended component Z*2 is extracted using the query below.
    SELECT id, source, target, cost FROM edge_table WHERE ABS(zone_1) = 2;
Combining the above three individual queries, the query to extract the reduced graph G1,7 is given by
    SELECT id, source, target, cost FROM edge_table WHERE zone_1 = 0 OR ABS(zone_1) = 1 OR ABS(zone_1) = 2;
Therefore the path query can be written as
    SELECT * FROM pgr_dijkstra('SELECT id, source, target, cost FROM edge_table WHERE zone_1 = 0 OR ABS(zone_1) = 1 OR ABS(zone_1) = 2', 1, 7);
Algorithm 5 pgRouting Skeletal Path Query
1: procedure PGR_DIJKSTRA(G, vx, vy, i)
2: x′ = z^i(vx)
3: return p
Algorithm 5 illustrates the level based path computation algorithm implemented in pgRouting. The performance of network extraction from PostgreSQL can be further improved by creating indices on the added zone columns zone_i where 1 ≤ i ≤ k. The proposed model also provides a structured way of network storage in the database, leading to efficient retrieval of a subset of network data for path computation.
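The zone column encoding can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the example is self-contained; SQLite also provides ABS), and the helper `reduced_graph_sql` is a hypothetical function that assembles the edges query for a given level and pair of zone identifiers. The table rows are invented for illustration.

```python
import sqlite3

def reduced_graph_sql(level, zx, zy):
    """Build the edges query for G_{x,y} at the given level (hypothetical helper)."""
    col = f"zone_{int(level)}"
    return (
        "SELECT id, source, target, cost FROM edge_table "
        f"WHERE {col} = 0 OR ABS({col}) = {int(zx)} OR ABS({col}) = {int(zy)}"
    )

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE edge_table "
    "(id INTEGER, source INTEGER, target INTEGER, cost REAL, zone_1 INTEGER)"
)
conn.executemany(
    "INSERT INTO edge_table VALUES (?, ?, ?, ?, ?)",
    [
        (1, 3, 4, 1.0, 0),    # skeleton edge
        (2, 1, 2, 1.0, 1),    # edge inside zone 1
        (3, 2, 3, 1.0, -1),   # connecting edge between zone 1 and the skeleton
        (4, 4, 6, 1.0, -2),   # connecting edge between zone 2 and the skeleton
        (5, 6, 7, 1.0, 2),    # edge inside zone 2
        (6, 8, 9, 1.0, 3),    # edge inside zone 3
        (7, 4, 8, 1.0, -3),   # connecting edge between zone 3 and the skeleton
    ],
)

sql = reduced_graph_sql(level=1, zx=1, zy=2)
edge_ids = sorted(row[0] for row in conn.execute(sql))
```

In pgRouting the string returned by such a helper would be passed as the edges_sql argument of pgr_dijkstra, as in the path query above.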

EXPERIMENTS
The road network data of Chandigarh, Hyderabad, NYC and Belgium (see Table 4) made available by OpenStreetMap is used for the experiments. The road networks required for the analysis were extracted using the osm2pgrouting tool. The experiments were carried out on a 64-bit Linux machine with an Intel Xeon Z400 equipped with 16 GB main memory and 8 MB L3 cache.
The experimental evaluation is divided into two sections. In the first section we evaluate the preprocessing time, skeleton sizes and extended zone sizes for k = 10. In the second section we evaluate the proposed pgRouting path computation algorithm that uses the Skeletal Model. For path evaluation we average the path computation time and path error over a set of 1000 randomly generated queries with varying path lengths. The path error we refer to is the difference between the length of the path computed using Algorithm 5 and the length of the same path computed using the pgRouting Dijkstra algorithm that uses the entire network. We divide the set of queries into 5 sets of equal size. The node pairs are generated in such a way that the distance between a node pair in the qth set lies between (q − 1) × dmax/5 and q × dmax/5, where dmax is the distance between the farthest node pair in the network and 1 ≤ q ≤ 5. Therefore the distance between any pair of nodes in the (q+1)th set is greater than the distance between any pair of nodes in the qth set. Table 4 contains the value of dmax for each of the road networks. Table 4 also shows the combined processing times for calculating edge levels, skeleton generation and zone generation for all the 10 levels.
Chandigarh is a uniform gridded network whereas Hyderabad is a non-uniform dense network. In order to highlight the applicability to different types of networks, the proposed method is applied to the Chandigarh and Hyderabad road networks and the observations are explained. In order to highlight the applicability to large networks, the proposed method is applied to the NYC and Belgium road networks and the observations are tabulated in Table 5.
It should be noted that the skeleton at level 10 includes the entire road network, which can be observed from Figures 4(a) and 5(a). This leads to an empty residual network and thus no zones are formed at level 10. Therefore the sizes of extended zones at level 10 are not shown in Figures 4(b) and 5(b).

Query Processing
Figures 4(c) and 5(c) illustrate the variation of the average path computation time with the length of the path. The Y-axis shows the path computation time in milliseconds. The X-axis is numbered with the query set number q where 1 ≤ q ≤ 5. For the sake of convenience the plots are shown for skeletons of level i where i = 1, 2, 3, 4, 6. The curve with the dashed line represents the computation time on the original graph G(V, E). As the path length increases the computation time increases, as more nodes and edges have to be processed in order to find the path. We can also observe that at level 2 the gain in path computation time is at least 4 to 5 times while not using more than 25% of the total network for path computation.
Figures 4(d) and 5(d) illustrate the variation of the average path error with the length of the path. The Y-axis shows the percentage error in the path. The X-axis is numbered with the query set number q where 1 ≤ q ≤ 5. For the sake of convenience the plots are shown for skeletons of level i where i = 1, 2, 3, 4, 6. It can be seen that as the path length increases the path error gradually decreases. This indicates that the proposed model is more suitable for computing longer paths. We can also observe that, at level 2, for larger distances (q ≥ 2), the path error is less than 2% while not using more than 25% of the total network. The optimal skeleton sizes and their respective computation gains and path errors are averaged and tabulated in Table 5 for all the 4 datasets given in Table 4.
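The grouping of queries into distance-based sets and the path error measure used in the evaluation can be written down compactly. The sketch below is illustrative: the function names are hypothetical, the handling of distances that fall exactly on a set boundary is an assumption, and expressing the error as a percentage of the full-network Dijkstra length is also an assumption (the text defines the error as a length difference and plots it as a percentage).

```python
import math

def query_set(distance, d_max, n_sets=5):
    """Return q such that (q - 1) * d_max / n_sets < distance <= q * d_max / n_sets."""
    if not 0 < distance <= d_max:
        raise ValueError("distance must lie in (0, d_max]")
    return max(1, math.ceil(distance * n_sets / d_max))

def path_error_percent(skeletal_length, dijkstra_length):
    """Extra length of the skeletal-model path relative to the full-network path."""
    return 100.0 * (skeletal_length - dijkstra_length) / dijkstra_length

q_short = query_set(10.0, 100.0)
q_mid = query_set(45.0, 100.0)
q_long = query_set(100.0, 100.0)
err = path_error_percent(107.0, 100.0)
```

Averaging `path_error_percent` and the measured query time within each set q gives one point per set on the error and timing curves.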

CONCLUSION
In this paper we present a Skeletal Model that uses a map generalization technique to reduce the size of the input network used for path computation. The proposed model is implemented in pgRouting to provide both improved path computation and a structured way of storing and retrieving the network data used for path computation. A new path algorithm is proposed to compute the path using the Skeletal Model by reusing the existing pgRouting Dijkstra algorithm. Results show that an average speedup of nearly 5X in path computation time was achieved while the processed network data was not more than 30% of the original road network. While these are significant gains, it has to be noted that they came at the cost of an average path deviation error of less than 7%. The query time is lower because the reduced graph is smaller, which cuts down the overall search space for pgRouting path finding algorithms.
Finally, we hope that this work will aid in implementing navigational services on a low resource system too, thus leading to a paradigm shift from a traditional client-server model to the development of a client based path computation model.With the advancement in computing power and storage capacity of hand held devices, the focus should move towards utilizing these devices for path computation while reducing the dependency on network coverage or server response.
Figure 2(a) illustrates a graph G = (V, E). The disconnectivities and dangles are represented by dotted lines. After preprocessing, the dangles are removed and the resulting graph is shown in Figure 2(b). All the edges of G = (V, E) shown in Figure 2(b) have unit weight.
Figure 2. Computation of Edge Priorities

Grid based Division
The graph G(V, E) is divided into g × g grids as shown in Figure 2(b). Every grid is assigned an identifier, i.e., grid_id. Each vertex is populated with a grid_id indicating the grid to which it belongs. Let Vj be the set of nodes that belong to the jth grid. Figure 2(b) shows that the graph G(V, E) is divided into 4 × 4 grids. The yellow colored nodes constitute the special nodes of interest. The identifier of each grid is represented in blue color.
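The grid based division can be sketched as a simple mapping from node coordinates to a grid identifier. The numbering scheme used here (row by row, starting at 1 from the corner with minimum coordinates), the bounding box argument and the function name `grid_id` are assumptions made for illustration; the text does not specify how identifiers are laid out.

```python
from collections import defaultdict

def grid_id(x, y, bbox, g):
    """Map a point to one of g x g grid cells, numbered 1..g*g row by row."""
    min_x, min_y, max_x, max_y = bbox
    col = min(int((x - min_x) / (max_x - min_x) * g), g - 1)
    row = min(int((y - min_y) / (max_y - min_y) * g), g - 1)
    return row * g + col + 1

# Illustrative node coordinates inside a 4 x 4 unit bounding box.
bbox = (0.0, 0.0, 4.0, 4.0)
nodes = {1: (0.5, 0.5), 2: (3.5, 0.5), 3: (0.6, 0.4), 4: (4.0, 4.0)}

# V_j: the nodes that fall in the j-th grid cell.
nodes_by_grid = defaultdict(list)
for node, (x, y) in nodes.items():
    nodes_by_grid[grid_id(x, y, bbox, 4)].append(node)
```

Cells that receive no nodes simply never appear as keys, matching the observation that empty grids contribute no special nodes of interest.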

Algorithm 1 Value Based Skeleton Network Construction Algorithm
1: procedure CONSTRUCTSKELETON(G, cthreshold)

Given the skeleton Gs, the corresponding residual network Gres is obtained and all the zones of Gres are computed. It should be noted that each of the zones Zj is well connected to the skeleton Gs through a set of connecting edges Econn(Zj). The above definitions can be generalized into multiple levels. $Z^i_j$ denotes the jth zone in the ith level. $S(Z^i_j)$ denotes the set of skeletal nodes of $Z^i_j$. $E_{conn}(Z^i_j)$ denotes the set of connecting edges of $Z^i_j$. $Z^{*i}_j$ denotes the extended zone of $Z^i_j$.
The node and edge sets of the extended zone Z*j(V*zj, E*zj) of a zone Zj are
$$V^*_{z_j} = V_{z_j} \cup S(Z_j), \qquad E^*_{z_j} = E_{z_j} \cup E_{conn}(Z_j)$$
Definition 4. Two extended zones Z*a(V*za, E*za) and Z*b(V*zb, E*zb) are connected only when
$$S(Z_a) \cap S(Z_b) \neq \phi, \quad a \neq b$$
z(e) and z(v) indicate the zone to which edge e and vertex v belong, respectively. We use the notation z^i(e) and z^i(v) to denote the zone identifier of edge e and node v respectively in the ith level.
Definition 5. Let vx be the source node and vy be the target node between which the path is to be computed. Let Zx′ and Zy′ be the zones to which vx and vy belong respectively, where x′ = z(vx), y′ = z(vy) and x′ ≠ y′. Let Gs = (Vs, Es) be the Skeleton Network of G = (V, E). The path between vx and vy is computed on the reduced graph Gx,y where
$$G_{x,y} = Z^*_{x'} \cup G_s \cup Z^*_{y'} \qquad (5)$$
The above definition of Gx,y can be generalized to multiple levels, where $G^i_{x,y}$ denotes the reduced graph in the ith level given source node vx and target node vy.
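The extended zone construction and the connectivity condition between extended zones are plain set operations, which the following sketch makes explicit. The function names are hypothetical, nodes are integers and edges are (source, target) tuples purely for illustration, and the skeletal node sets S(Z) are taken as given inputs.

```python
def extended_zone(zone_nodes, zone_edges, skeletal_nodes, connecting_edges):
    """Return (V*, E*) with V* = V_z | S(Z) and E* = E_z | E_conn(Z)."""
    return zone_nodes | skeletal_nodes, zone_edges | connecting_edges

def extended_zones_connected(skeletal_a, skeletal_b):
    """Two extended zones are connected only when S(Z_a) and S(Z_b) intersect."""
    return len(skeletal_a & skeletal_b) > 0

# Zone with nodes {1, 2}, attached to skeleton node 3 through edge (2, 3).
v_star, e_star = extended_zone({1, 2}, {(1, 2)}, {3}, {(2, 3)})

shares_skeleton_node = extended_zones_connected({3}, {3, 4})
disjoint = extended_zones_connected({3}, {4})
```

Two extended zones with no skeletal node in common can still reach each other, but only through the skeleton itself, which is why the reduced graph always includes Gs.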

Figure 3
Figure 3(b) shows the zones Z1, Z2 and Z3 formed as a result of partitioning Gres. The dashed edges connecting the skeleton and the zones represent the connecting edges Econn defined in Section 2.2.

Figure 3
Figure 3(c) illustrates the reduced graph Gx,y, i.e., G1,7, used by the path computation algorithm to compute the path from source node 1 to target node 7 with x′ = z(1) = 1 and y′ = z(7) = 2. The red colored edges represent the skeletal edges Es and the graphs enclosed within the dotted lines represent the extended zones.

Figure 4
Figures 4(a) and 5(a) show the size of the skeleton network generated at each level i as a percentage of the original network G(V, E). Here by size we refer to the number of edges in the graph. We can observe that the size of the skeleton increases as level i increases. This is because as the level increases more and more edges are added to the skeleton, as discussed in Section 4.5.

Figure 4
Figures 4(b) and 5(b) show the average and maximum size of the extended zones generated at each level i as a percentage of the original graph G(V, E). Here by size we refer to the number of edges in the extended zone. We can observe that the maximum size of a zone at level i = 2 is nearly 2.5% and 0.4% of the whole network for Chandigarh and Hyderabad respectively. We also observe that the maximum and average size of the extended zones drops with level i. This is because as the level increases more edges are added to the skeleton Gs, thus dividing the zones in a particular level i into smaller zones.

Table 1. Schema of Edges

Table 2. Mapping between Signature and Query
Given below is the signature of the pgRouting Dijkstra algorithm.
    pgr_dijkstra(TEXT edges_sql, BIGINT start_vid, BIGINT end_vid)
    RETURNS SET OF (seq, path_seq, node, edge, cost, agg_cost) or EMPTY SET
Let us try to understand the signature with the sample query given below.
    SELECT * FROM pgr_dijkstra('SELECT id, source, target, cost FROM edge_table', 4, 2);
The mapping between the algorithm signature and the sample query is shown in Table 2.

Table 3. Output Format of pgRouting Dijkstra Query
Column    Type    Description
node      BIGINT  Identifier of the node in the path from start_vid to end_vid.
edge      BIGINT  Identifier of the edge used to go from node to the next node in the path sequence. -1 indicates the last node of the path.
cost      FLOAT   Cost to traverse from node using edge to the next node in the path sequence.
agg_cost  FLOAT   Aggregate cost from start_vid to node.
    SELECT * FROM pgr_dijkstra('SELECT id, source, target, cost FROM edge_table', 4, 2);

Table 4. Dataset Details
Dataset  V  E  dmax  Preprocessing (min)

Table 5. Performance Gain of Optimal Network Skeleton