A CATEGORY-THEORY APPROACH FOR CONSTRUCTION ONTOLOGIES IN SUBSURFACE MASS TRANSIT

The construction and expansion of subway systems represents an important step towards better livability conditions in a rapidly urbanizing world. However, underground construction has not benefited from well-established ontologies of semantic and geometric representation, such as Building Information Modelling (which is used for standalone structures) and City Geography Markup Language (which is designed for continuous urban elements). To bridge that gap, this paper proposes a novel and highly flexible means to underpin a relevant ontology. The approach uses the ontology log, or olog, a model of knowledge representation based on Category Theory. In an olog, dependencies between objects are restricted to functional relationships (for every object there is a unique correspondence). This robust mathematical formulation allows for a more flexible, yet also informative and user-readable model of the studied entities. In this paper, the olog’s usability is demonstrated through the ontological representation of common items in the fare-control areas of two New York City metro stations. Ologs are shown to capture similar underlying structures both across different stations and within the same station. Importantly, the olog allows for further generalization to incorporate pre-existing data, as well as being a transferable framework for conceptualizations of other metro systems.


INTRODUCTION
There are approximately 220 cities worldwide served by at least one metro system (Liu et al., 2021), with more regularly added due to the saturation of aboveground street networks. In fact, according to the International Association of Public Transport (UITP), over a third of existing metro systems were built entirely since the year 2000 (2018). With further long-term benefits such as reducing air pollution (Carrier et al., 2018), the importance of underground transit is only likely to grow.
However, many cities have only recently expanded their mass transit network into the underground or do not rely on metro systems for transportation at all. Cities of the Global South are often in such scenarios due to financial constraints despite high population density (Fouracre and Gardner, 1993). To illustrate, the first metro system in Sub-Saharan Africa opened only in 2015 in Addis Ababa, and while the number of metros in Asia has risen significantly in India and China, most Southeast Asian countries still lack underground transportation (UITP, 2018). According to the International Tunneling Association, new metro systems in developing cities must be built to accommodate large passenger flows, on the order of 20,000 passengers per hour per direction (2004). Such metro facilities need more amenities and more frequent maintenance due to their high usage, increasing the complexity and costs of maintaining a metro system.
Construction and operation of metro systems are often shared responsibilities between designers, contractors, owners, and policymakers. Facilitating joint creation and management of hundreds of kilometers of tunnels and tracks and their affiliated stations is stymied by the absence of a comprehensive asset management system. Such a system must be able to accommodate incomplete data and heterogeneity between object properties in different metro stations, while providing a robust interface for object documentation. A systematic and transferable framework for asset management could also lower costs and improve the popularity of metro transportation in populous developing cities. The multiplicity of components in metro stations, incomplete or missing historical data and performance indicators, and the lack of a unified data framework pose major obstacles to achieving these goals (Mohammadi et al., 2019).
Aboveground construction projects are aided by ontologies such as Building Information Modelling (BIM) for standalone structures and City Geography Markup Language (CityGML) for city-level assets, but metro systems do not entirely align with either of these asset types. Specifically, each metro station contains intricate structural and architectural components that need to be represented at the fine level of detail readily supported by the Industry Foundation Classes (IFC) in BIM. However, BIM cannot readily accommodate the interconnectedness of geographically distant facilities. Conversely, while the spatial continuity in CityGML could encompass the network's interconnectedness, CityGML does not support the representation of structures, positional relationships, and structural classifications of entities at a BIM level of detail. As metro systems are usually built over many decades and by different designers and contractors, with distinctive design trends, construction techniques, and engineering standards, such representation is crucial. Additionally, during major repairs and rehabilitation of older assets, there may be major modifications to content, appearance, and function. Thus, even where original design documentation exists, it may not be up to date and cannot easily be used as the sole basis for conceptualizing stations.
The age of many of these systems also poses an extra challenge for digital representation, as most assets pre-date any form of Computer-aided Design (CAD), to say nothing of BIM. Consequently, what records exist are not readily integrated and may be outdated. Because BIM requires not only full documentation but also an explicit item-by-item description for every element, the lack of affordable and efficient digitizing techniques appropriate for metros is problematic. For example, the use of a terrestrial laser scanner in a metro is exceptionally time-consuming and difficult because of both the GPS-denied environment underground (which prevents automatic registration) and the plethora of line-of-sight obstructions (which necessitate many more scans than an unobstructed facility, such as a gymnasium, would).
To help overcome these gaps, this paper introduces a new ontological conceptualization of metro systems, which is flexible, yet precise. This representation takes the form of a mathematical object called an olog rather than that of a preexisting construction ontology.

BACKGROUND
There has been previous work in designing ontologies suitable for representing subsurface assets and metro stations. After discussing a few approaches, the olog, the basis of this proposed metro ontology, and its most relevant properties are introduced.

Ontologies in Subsurface Mass Transit
Notably, work on the adoption of BIM and CityGML techniques for asset management in subway stations tends to focus on cases of single stations, lines, or systems with a high degree of standardization. For example, Kim et al. (2015) produced a geometric model of one subway station in South Korea with laser scanning tools. BIM has also been shown to effectively aid the construction of prefabricated stations in China (Liu, 2021). Some subway systems, such as Hong Kong's (Li et al., 2013), have incorporated BIM modelling into many of their stations, but this process is usually done at the time of construction and remains utilized mostly for simulation rather than asset management. All these authors readily highlight the restricted scalability of their methods, problems arising from a lack of standardization for underground parcel numbers or land categories, and the difficulty of incorporating a BIM or GIS component into an already built subway system.
To partially overcome this difficulty of documenting already built underground infrastructure, Marzou and Aty propose dividing the management of a metro station by task, modelling electric, mechanical, and structural components separately (2012). However, their process also requires the use of as-built plans of the metro stations, which may not be readily available. Furthermore, the use of BIM alone is not sufficient for a scalable modeling of a metro system due to the city-wide nature of underground transit. Therefore, integration with CityGML has been proposed (Wang et al., 2019). Achieving that requires extending the IFC by subdividing entities, but the authors envisioned this only with respect to utilities.
The goals of employing an olog for the conceptualization of metro systems are universality and robustness. The level of detail provided by BIM must be permitted in this model, but the new conceptualization must achieve sufficient adaptability so as to be directly applicable to multiple metro systems across the world.

Category Theory and the Olog
Category Theory is a branch of Mathematics invented in the early 1940s to provide a bridging framework between two seemingly unrelated subfields in the discipline: Topology and Algebra. To accomplish such a goal, mathematicians looked for a mechanism that could extract the essential structure and the fundamental nature of these two fields. Category Theory, thus, provides a deeply clarifying language for existing difficult mathematical ideas. It is also useful in more modern fields, such as the evaluation of clustering algorithms (Spivak, 2014).
An olog is a category that functions as a database schema (Spivak and Kent, 2012). A category is composed of a set of objects and a set of morphisms such that: (i) every morphism is a function between objects, (ii) for every object, an identity morphism exists, and (iii) all morphisms can be composed respecting associativity.
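Conditions (ii) and (iii) can be written out as equations. The following is the standard textbook form of the identity and associativity laws; the symbols f, g, h for morphisms and a, b, c, d for objects are notational choices made here and do not appear in the original text.

```latex
\[
f \circ \mathrm{id}_a \;=\; f \;=\; \mathrm{id}_b \circ f
\qquad \text{for every morphism } f\colon a \to b,
\]
\[
h \circ (g \circ f) \;=\; (h \circ g) \circ f
\qquad \text{for all } f\colon a \to b,\ g\colon b \to c,\ h\colon c \to d.
\]
```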
In an olog, objects are called entities, and morphisms are called aspects or relationships. To be an entity, real world examples of the object must exist. These examples are referred to as instances of an entity. Instances of different entities are related to each other by aspects. Ologs are often represented as directed graphs where the nodes are the entities, and the edges are the aspects.
Importantly, item (i) requires that these aspects must be functional: for each instance of an entity there corresponds a unique instance of another entity. For example, every moon orbits a unique planet. Some planets may have more than one moon, but every moon corresponds to a unique planet. Certain entities might need to be further specified or delimited to functionalize practical relationships. Whereas it is wrong to say that there is a functional relationship between "a father" and "a child" because a father could have multiple children, one could say that there is a functional relationship between "a father of an only child" and the "child", or between "a father" and "a maximal set of children who are paternal siblings". These nuances are very important when adapting ologs to real-world systems such as metro stations.
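The functional requirement can be checked mechanically once instances are listed. The sketch below is only an illustration: the helper name `is_functional` and the representation of an aspect as a list of (source, target) pairs are assumptions made here, not part of the olog formalism.

```python
from collections import defaultdict

def is_functional(pairs, domain):
    """Return True if every instance in `domain` is paired with exactly one target."""
    targets = defaultdict(set)
    for source, target in pairs:
        targets[source].add(target)
    return all(len(targets[x]) == 1 for x in domain)

# "a moon" --orbits--> "a planet": every moon has exactly one planet.
orbits = [("Phobos", "Mars"), ("Deimos", "Mars"), ("Moon", "Earth")]
moons = {"Phobos", "Deimos", "Moon"}

# Reversing the arrow breaks functionality: Mars would correspond to two moons.
has_moon = [(planet, moon) for moon, planet in orbits]
planets = {"Mars", "Earth"}

print(is_functional(orbits, moons))      # True
print(is_functional(has_moon, planets))  # False
```

An instance with no target at all also fails the check, which matches the requirement that an aspect be defined for every instance of its source entity.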
Next, two categories (hence two ologs) can be related by a functor. A functor is composed of a pair of maps: a map between objects and a map between morphisms. The map between morphisms must respect the map between objects, as well as the identity morphisms and the composition of morphisms. That is, if the objects a and b are related by a morphism g in category C, any functor F from C to another category D which takes a to F(a) and b to F(b) must take g to an existing morphism between the images F(a) and F(b), denoted F(g). Functors are often used to prove different ologs have the same underlying structure. The ologs described above (fathers and sets of siblings, moons and planets) can be mapped via a functor that takes fathers to moons, sets of siblings to planets, and the aspect "is a parent of" to the aspect "orbits". In the context of ologs, a functor need not define a map between instances; a map between entities (i.e. the objects) and aspects is enough.
Ologs combine a user-friendly interface with mathematical formalism, thereby generating high expressivity. Functional relationships between entities allow for easy extraction of facts and groups within the olog. The language of functors (i.e. maps between categories) allows for the direct comparison between distinct ologs, which addresses the transferability of this ontological conceptualization. Converting an olog to a database schema can also be done formulaically, as this model is computer readable. Finally, ologs can be easily extended as new information is obtained and this process does not require expert knowledge.
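The source-and-target condition on F(g) can be illustrated with the fathers and moons example. This is a minimal sketch under stated assumptions: an olog is stored as a dictionary from aspect names to (source entity, target entity) pairs, the function name is hypothetical, and only the listed aspects are checked. For ologs with no extra path equations, identities and composites are then determined by these generating aspects.

```python
def respects_sources_and_targets(olog_c, olog_d, object_map, aspect_map):
    """Check that each aspect g: a -> b in C is sent to an aspect F(g): F(a) -> F(b) in D."""
    for name, (source, target) in olog_c.items():
        image = aspect_map.get(name)
        if image not in olog_d:
            return False  # F(g) must be an existing aspect of D
        if olog_d[image] != (object_map.get(source), object_map.get(target)):
            return False  # F(g) must run from F(a) to F(b)
    return True

SIBLINGS = "a maximal set of children who are paternal siblings"
family = {"is a parent of": ("a father", SIBLINGS)}
astronomy = {"orbits": ("a moon", "a planet")}

object_map = {"a father": "a moon", SIBLINGS: "a planet"}
aspect_map = {"is a parent of": "orbits"}
print(respects_sources_and_targets(family, astronomy, object_map, aspect_map))  # True

# Swapping the object map sends "is a parent of" to an arrow pointing the wrong way.
swapped = {"a father": "a planet", SIBLINGS: "a moon"}
print(respects_sources_and_targets(family, astronomy, swapped, aspect_map))  # False
```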
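One plausible reading of the formulaic conversion is that each entity becomes a table and each aspect becomes a mandatory foreign-key column on its source table. The sketch below follows that reading using Python's standard `sqlite3` module. The function `olog_to_sql`, the `id` column, and the convention that a half-height turnstile keeps its ID when viewed as a turnstile are illustrative assumptions, not the procedure of Spivak and Kent.

```python
import sqlite3

def olog_to_sql(entities, aspects):
    """Build one CREATE TABLE per entity, with one foreign-key column per outgoing aspect."""
    statements = []
    for entity in entities:
        columns = ['"id" TEXT PRIMARY KEY']
        for name, (source, target) in aspects.items():
            if source == entity:
                # NOT NULL plus a single column makes the aspect total and single-valued.
                columns.append(f'"{name}" TEXT NOT NULL REFERENCES "{target}"("id")')
        body = ", ".join(columns)
        statements.append(f'CREATE TABLE "{entity}" ({body});')
    return statements

entities = ["a half-height turnstile", "a turnstile"]
aspects = {"is a type of": ("a half-height turnstile", "a turnstile")}

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
for statement in olog_to_sql(entities, aspects):
    conn.execute(statement)

# HHT.A follows the ID convention described in the paper; the row itself is illustrative.
conn.execute('INSERT INTO "a turnstile" VALUES (?)', ("HHT.A",))
conn.execute('INSERT INTO "a half-height turnstile" VALUES (?, ?)', ("HHT.A", "HHT.A"))
rows = conn.execute('SELECT "id", "is a type of" FROM "a half-height turnstile"').fetchall()
print(rows)  # [('HHT.A', 'HHT.A')]
```

Storing the aspect as a single non-null column is what enforces functionality at the database level: a row cannot point to zero targets or to more than one.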
Among other applications, ologs were shown to aid in documenting gene functions (Wu, 2019), improving the reproducibility of scientific experiments (Burgos, 2020), and integrating manufacturing service databases (Wisnesky et al., 2017). However, ologs have not been systematically applied to Engineering. Herring and associates proposed that Category Theory could be used to better document GIS models by formalizing spatial and geometric relationships (Herring et al., 1990), but their proposal was not adopted. Recently, Mabrok and Ryan (2017) proposed using ologs to model water distribution systems and demonstrated the transferability of ologs by applying them successfully to multiple water systems.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLVI-4/W4-2021 16th 3D GeoInfo Conference 2021, 11-14 October 2021, New York City, USA

SCOPE AND METHODS
This paper illustrates the use of ologs for documenting metro systems by building ologs for portions of the passenger areas of two New York City metro stations: the 14th Street-Union Square station and the 86th Street-Central Park West station. For the purposes of this paper, only the fare-control areas of the stations (the underground prepayment regions) were considered. Such areas are designated by their aboveground access points when distinction is needed.
A dataset containing specific information about more than 1,300 objects across 8 stations in Manhattan and Brooklyn, annotated with images, is used as a source for building the olog of the 14th Street-Union Square station. Personal photographs of the fare-control areas of the 86th Street station are used for the second olog.
To build the ontologies, first all entities present in the fare-control areas were listed, as they appear in the dataset with corresponding instances. For the scope of this paper, each instance received a human-readable ID (half-height turnstiles are called HHT.A, HHT.B, etc.) that uniquely identified it. Some of these entities were then grouped together to create another one. For example, a half-height turnstile and a full-height turnstile were considered both to be turnstiles, so a third entity "turnstile" was created and added to the list. Such grouping, in terms of morphisms, can be represented by a (pseudo-)identity morphism: "is a type of". Note that this "is a type of" does not formally correspond to the identity morphism, for it maps instances of two different entities rather than an instance back to itself. However, "is a type of" has identity-like properties as it vanishes in composition: if a novel "is a type of" book and every book "is written by" a set of authors, then a novel "is written by" a set of authors.
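The way "is a type of" vanishes in composition can be sketched with aspects stored as dictionaries from instance IDs to instance IDs. HHT.A and HHT.B follow the ID convention above. The IDs FHT.A, SET.1, and SET.2, the aspect "composes" as used here, and the assignments themselves are hypothetical placeholders rather than values from the dataset.

```python
def compose(first, second):
    """Compose two aspects stored as dicts: apply `first`, then `second`."""
    return {x: second[y] for x, y in first.items()}

# "a half-height turnstile" --is a type of--> "a turnstile"
# The same physical object is simply viewed as an instance of the broader entity.
is_a_type_of = {"HHT.A": "HHT.A", "HHT.B": "HHT.B"}

# "a turnstile" --composes--> "a maximal set of turnstiles in the same fare-control area"
composes = {"HHT.A": "SET.1", "HHT.B": "SET.1", "FHT.A": "SET.2"}

# Composing the two gives "a half-height turnstile" --composes--> the same set entity.
hht_composes = compose(is_a_type_of, composes)
print(hht_composes)  # {'HHT.A': 'SET.1', 'HHT.B': 'SET.1'}
```

The composite agrees with "composes" on every half-height turnstile, which is the identity-like behaviour described above.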
After all relevant entities were extracted and pseudo-identity morphisms were documented, essential relationships between these entities were listed. Such relationships can capture the level of detail desired by the ontologist: positional relationships could identify physical distance between objects, or higher-level rules could describe relative placement of these objects in the metro station. For the purposes of this paper, entities that have a relationship to either a turnstile or a metrocard vending machine were selected, and relationships were mostly of the latter form.
Finally, identified relationships had to be turned into morphisms. Whereas in the previous step the relationship might capture a true albeit general statement, the morphisms must capture a functional relationship. For example, the relationship "is separated from the platform by" is not a morphism between the entities "a metrocard-vending machine" and "a turnstile". In all cases considered, more than one turnstile satisfied the above relationship for every metrocard-vending machine. Then the relationship had to be refined with quantifiers and specifiers in order to define a crisper entity that is in a functional relationship with "a metrocard-vending machine": "a maximal set of turnstiles in the same fare-control area". The details of this process are shown in Figure 1.
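The refinement step can be sketched as a grouping operation: turnstiles are collected into one maximal set per fare-control area, and each vending machine is then sent to the single set in its own area. The placement records, the area labels "north" and "south", and the IDs MVM.A, MVM.B, and FHT.A are hypothetical and serve only to illustrate the construction.

```python
from collections import defaultdict

# Hypothetical placement records: instance ID -> fare-control area.
turnstile_area = {"HHT.A": "north", "HHT.B": "north", "FHT.A": "south"}
mvm_area = {"MVM.A": "north", "MVM.B": "south"}

def maximal_sets(placement):
    """Group instances into one maximal set per fare-control area."""
    groups = defaultdict(set)
    for instance, area in placement.items():
        groups[area].add(instance)
    return {area: frozenset(members) for area, members in groups.items()}

sets_by_area = maximal_sets(turnstile_area)

# Functional aspect: each vending machine maps to exactly one set of turnstiles.
# A machine in an area with no turnstiles would raise KeyError in this sketch.
is_separated_from_the_platform_by = {
    machine: sets_by_area[area] for machine, area in mvm_area.items()
}

print(sorted(is_separated_from_the_platform_by["MVM.A"]))  # ['HHT.A', 'HHT.B']
print(sorted(is_separated_from_the_platform_by["MVM.B"]))  # ['FHT.A']
```

Because the target is now a set rather than an individual turnstile, the one-to-many relationship becomes a function without discarding any of the underlying instances.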
To validate the transferability and universality of the model, the fare-control areas in the 86th Street-Central Park West station were subjected to this method. In this case, entity extraction was performed by visually inspecting photographs of the station.

14th Street-Union Square Station
In enumerating entities from the 14th Street-Union Square data, there are 4 full-height turnstiles, 4 half-height turnstiles, 1 no-cash metrocard-vending machine, and 2 cash-and-card metrocard-vending machines. The metrocard-vending machines can be grouped and so can the turnstiles. Between the metrocard-vending machines and the turnstiles one notices the relationship "is separated from the platform by" constructed in Figure 1. Between the two types of metrocard-vending machines one also recognizes a constraint, since no-cash machines are placed only in fare-control areas where a cash-and-card machine exists.
The olog obtained from this information is shown in Figure 2. Instances of some of these entities and instances of the aspect "composes" are illustrated in Figure 3.

86th Street-Central Park West Station
The olog produced for the fare-control areas of the 86th Street station is the same as shown in Figure 2. Instances of some entities for this station are documented in Figure 4. The structure of the olog remains consistent between the 14th Street and the 86th Street stations. Notably, the 14th Street-Union Square station was among the original 28 stations of the NYC metro system from the 1900 Interborough Rapid Transit system, while the 86th Street-Central Park West station opened in 1932 under the Independent Subway System (a second private company) and was renovated in 2018. Despite such construction differences, the olog encompasses similarities between the two stations.
Figure 1. Turning the relationship between a metrocard vending machine and a turnstile into a morphism. Note that this process revealed the existence of another entity, "a maximal set of turnstiles in the same fare-control area", inferable from the original instances, and another morphism, "composes".
In Figure 5, three ologs are shown. Each represents one of the three fare-control areas of the 86th Street station: one accessible at 86th Street, one at 87th Street, and the other at 88th Street. The three separate ologs are created via inspection of the "maximal set of turnstiles in the same fare-control area" instances. As there are no metrocard vending machines in the 87th Street fare-control area, the corresponding olog has only five entities instead of all nine shown in Figure 2. This illustrates how a reduction in the number of instances translates into a reduction in the complexity of the olog because an entity relies upon the existence of real-world examples (instances).
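The reduction described above can be sketched as a pruning step: an entity is kept only if it has at least one instance, and an aspect is kept only if both of its ends survive. The three entities below are a small subset chosen for illustration, not the nine entities of Figure 2, and the instance lists are hypothetical apart from the absence of vending machines at the 87th Street entrance.

```python
def prune(entities, aspects, instances):
    """Drop entities without instances, and aspects touching a dropped entity."""
    kept = [e for e in entities if instances.get(e)]
    kept_aspects = {
        name: (source, target)
        for name, (source, target) in aspects.items()
        if source in kept and target in kept
    }
    return kept, kept_aspects

TURNSTILE_SET = "a maximal set of turnstiles in the same fare-control area"
entities = ["a turnstile", TURNSTILE_SET, "a metrocard-vending machine"]
aspects = {
    "composes": ("a turnstile", TURNSTILE_SET),
    "is separated from the platform by": ("a metrocard-vending machine", TURNSTILE_SET),
}

# Hypothetical instance lists for the 87th Street fare-control area (no vending machines).
instances_87th = {
    "a turnstile": ["HHT.A"],
    TURNSTILE_SET: ["SET.87"],
    "a metrocard-vending machine": [],
}

kept, kept_aspects = prune(entities, aspects, instances_87th)
print(len(kept), list(kept_aspects))  # 2 ['composes']
```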

DISCUSSION
The first observed advantage of the olog over existing construction ontologies is ease of design. Translating collected data can be improved by automatically reading physical relationships from laser scans, images, or existing BIM documentation. The olog can be enhanced by the addition of such types of information but does not fail in their absence. For example, because footage of passenger areas in metro stations is usually available, an olog built from a video still could have its entities extracted by an image recognition model, and the basic relationships could be provided by an image captioning model. The refinement of functional relationships must be further examined and must be ground-truthed. Notably, different ontologists might conceptualize the same metro station differently. Therefore, deciding on a set of relationships a priori could be a step towards standardizing this process.

Figure 2. Olog representing multiple types of turnstiles and vending machines on the 14th St-Union Square station, where each solid box is an entity and each arrow is an aspect. The flattening of the structure allows for the distinction between different kinds of metrocard-vending machines and turnstiles, while noting the relationships between these entities and their commonalities (via associativity). The use of quantifiers and determiners (maximal, non-empty) allows the transformation of one-to-many relationships into functional relationships. The colors refer to the further information provided in Fig. 3.

Figure 3. Instances of turnstile-related entities and of the aspect "composes". The two instances of "maximal set of turnstiles in the same fare-control area" correspond to the two separated fare-control areas in the station. Notice that the composition of "is a type of" and "composes" returns the aspect "composes" (which confirms the definition of "is a type of" as a pseudo-identity).
Within the limited scope of the two stations and four fare-control areas studied in this paper, the olog achieves the goal of transferability. The different construction history of the 14th Street and the 86th Street stations demonstrates an olog's flexibility in encompassing high-level similarities between stations even when design and materials visibly differ. Further work considering the New York City metro could focus on testing transferability for the entirety of the stations, including platform areas, transfer zones, mezzanines, and shopping amenities.
We note that there are three instances of "a maximal set of turnstiles in the same fare-control area" but only two instances of "a maximal non-empty set of cash-and-card metrocard-vending machines in the same fare-control area", which implies that there is a fare-control area in the 86th Street station where users cannot buy a new metrocard. Indeed, the image of the aspect "is separated from the platforms by" contains only two elements. Mathematically, one can argue that this aspect is not surjective (i.e. the image is not the entire codomain) if and only if such a fare-control area exists. The same aspect would be surjective in the olog designed for the 14th St-Union Square subway station.
This concept of transferability is further advanced by the idea of functoriality. There is a trivial functor relating the olog of fare-control areas in 14th Street to its counterpart in 86th Street, since these ologs coincide. Conversely, non-functoriality may play an even more important role in asset management. The three types of fare-control areas in the 86th Street station cannot be related by a one-to-one functor. Therefore, they do not serve the same function (i.e. do not have the same underlying structure).
This translates to distinct passenger experiences, as a passenger who must buy a metrocard will not be able to use the 87th Street entrance. Arguably, this property of the olog is useful in asset management. For example, it may be the operator's aim to always have an accessible metrocard-vending machine in the fare-control area at the 86th Street station.
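The surjectivity argument lends itself to an automated check that an operator could run over an olog's instance data. In the sketch below the set IDs (TS.86, MVM.86, and so on) are hypothetical labels for the three fare-control areas of the 86th Street station. Only the fact that the 87th Street area has no metrocard-vending machines is taken from the text.

```python
def is_surjective(aspect, codomain):
    """Return True if every element of `codomain` is the image of some instance."""
    return set(aspect.values()) == set(codomain)

# Instances of "a maximal set of turnstiles in the same fare-control area".
turnstile_sets = {"TS.86", "TS.87", "TS.88"}

# "is separated from the platforms by": vending-machine set -> turnstile set.
is_separated_from_the_platforms_by = {"MVM.86": "TS.86", "MVM.88": "TS.88"}

# Turnstile sets outside the image mark fare-control areas with no vending machine.
unserved = turnstile_sets - set(is_separated_from_the_platforms_by.values())

print(is_surjective(is_separated_from_the_platforms_by, turnstile_sets))  # False
print(unserved)  # {'TS.87'}
```

An empty `unserved` set would correspond to the surjective case, in which every fare-control area offers a place to buy a metrocard.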

CONCLUSIONS
Overall, ologs provide a novel and suitable framework for knowledge representation in metro stations. Their main advantages are transferability, flexibility, and ease of design. The appropriate level of detail can always be incorporated, if data are available. The main limitation of the olog is the arbitrariness in defining some relationships, which could be fixed within a metro system by a priori standardization. Further work should focus on representing other areas of the stations, such as platforms and shopping amenities, and automating the olog design process when there are pre-existing images or data.