AUTOMATIC ONTOLOGY GENERATION OF BIM AND GIS DATA

Data representing geospatial context and detailed building information are increasingly driving infrastructure development and smart city applications. Bringing open formats from the data acquisition level to information engineering accelerates geospatial technologies towards urban sustainability and knowledge-based systems. BIM and GIS technologies are known to excel in this domain. However, fundamental differences exist between their data formats, which has prompted the development of integration methods to bridge the gap between these distinct domains. Several studies have conducted data-, process-, and application-level integration, considering the significance of collaboration among these information systems. Although integration methods have narrowed the gap of geometric dissimilarity, semantic inconsistency, and information loss, they still add constraints towards achieving interoperability. Integration using semantic web technology is more flexible and enables process-level integration without changing data format and structure. However, due to its developing nature and the complexity of BIM and GIS data formats, most adopted approaches require human intervention. This paper presents a method, named OGGD (Ontology Generation for Geospatial Data), that implements a formal method for automatic ontology generation from XSD documents using transformation patterns in three processes: the first formalizes XSD elements and transformation patterns; the second identifies the corresponding patterns explicitly; and the last generates the ontology for the XSD schema. XSD elements from open-standard data models of BIM and GIS, ifcXML and CityGML, are manipulated and transformed into a semantically rich OWL model. The ontology models created are applicable to information-based integration systems that will support knowledge discovery and urban applications.


INTRODUCTION
The future of urban development relies on information collected from the Architecture, Engineering, Construction and Facility Management (AEC/FM) and geospatial domains (Song et al., 2017). Building Information Modelling (BIM) and Geographic Information System (GIS) are the conventional representations of these systems. Each field was developed to address its own problem domain. However, with technological advancements and changing user requirements, they have advanced individually and now have overlapping features. Data in each domain represent vital information, and thus bridging the gap between their heterogeneous data sets without information loss becomes critical (Wang et al., 2019). Traditional methods for integrating BIM and GIS have exposed issues of data incompatibility, misinterpretation, and missing information (Liu et al., 2017). Integration using semantic web technology has shown promising results because it is naturally suited to heterogeneous data integration (Zhu et al., 2018). However, interoperability between BIM and GIS through existing semantic integration is limited by the manual or semi-automatic process of generating ontologies, one of the main building blocks of the semantic web.
Data integration systems require a global schema generated from heterogeneous data sources, and correspondences between schema elements based on names, data types, constraints, and semantic properties. Rich ontologies that express the semantics of data exploit such techniques. Thus, the semantic web technique qualifies for achieving semantic interoperability by interlinking heterogeneous data sources, like BIM and GIS, and representing semantic information in a common RDF (Resource Description Framework) format (Hor et al., 2018). Integration methods based on semantic web technologies have enabled bidirectional information exchange between BIM and GIS (Karan et al., 2016). However, these approaches to achieving interoperability between BIM and GIS are ongoing processes that are mainly manual and time-consuming (Liu et al., 2017). In particular, enriching ontology definitions from the data formats of a particular domain is a slow process at an early stage. In contrast, multiple approaches (Hacherouf et al., 2015) have been proposed for transforming the XML data format to OWL (Web Ontology Language) schemas by matching XSD (XML Schema Definition) constructs (i.e. xs:element, xs:complexType, xs:simpleType, etc.) with their valid OWL representations (class, ObjectProperty, DatatypeProperty, etc.).
This study presents a method named OGGD (Ontology Generation for Geospatial Data) that implements a formal method for automatic ontology generation from XSD documents using transformation patterns in three processes. The first process formalizes XSD elements and transformation patterns; the second identifies the corresponding patterns explicitly; and the last generates the ontology for the XSD schema. XSD elements are extracted from the XML-based open-standard data models of BIM and GIS, ifcXML and CityGML respectively, and transformed into an OWL model (Hitzler et al., 2012).

METHODOLOGY
Our methodology develops ontologies for XML-based geospatial data by extending Janus (Bedini et al., 2011) and PIXCO (Pattern Identification for XSD Conversion to OWL) (Hacherouf et al., 2019), and proposes formal automatic ontology generation for BIM and GIS data. Janus defines sets of transformation patterns that serve as correspondence rules between XML Schema and OWL constructs under the OWL2-RL profile (Motik et al., 2012). Using these sets of patterns, PIXCO provides an extension to generate well-structured ontologies. Taken together, however, these approaches provide an incomplete population process and overlook some elements of an XSD schema, such as list and length. The idea behind extending these approaches in OGGD is that the more XSD components the transformation process considers, the more semantically rich and exhaustive the generated ontology becomes.
The proposed framework, illustrated in Fig. 1, aims at the maximum transformation of XSD constructs to OWL. The methodology implements multiple algorithms and is based on three main processes. The first process uses the XSD schema of XML documents (ifcXML and CityGML) to develop a mathematical model, FS(XS) (Formal Structure of XML Schema) (Hacherouf et al., 2019), for the formal modelling of XSD constructs. We also formalize the transformation patterns (Bedini et al., 2011, Hacherouf et al., 2019) in the context of FCA (Formal Concept Analysis) (Ganter, Wille, 1999). The second process identifies candidate patterns for each XSD construct and then narrows them down to the pertinent patterns for that construct. The results of the second process, along with the corresponding rules from the transformation patterns, are used in the next step. In the third process, an OWL model is generated for each XSD construct from its identified transformation patterns, and the OWL models are combined into a composite OWL file.
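As a rough illustration of how the three processes fit together, the following minimal sketch extracts named constructs from an XSD document, looks up an OWL term for each, and emits Turtle-style statements. It is not the OGGD implementation: the schema fragment, the `TAG_TO_OWL` lookup, the `ex` prefix, and both function names are assumptions made for this example, and the lookup is far coarser than the FS(XS) model and the Janus/PIXCO transformation patterns.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical schema fragment, loosely in the style of ifcXML (not from the paper).
XSD_TEXT = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="IfcWall">
    <xs:sequence>
      <xs:element name="Name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
  <xs:simpleType name="IfcLabel">
    <xs:restriction base="xs:string">
      <xs:maxLength value="255"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""

# Illustrative tag -> OWL term lookup (an assumption for this sketch only).
TAG_TO_OWL = {
    "complexType": "owl:Class",
    "simpleType": "rdfs:Datatype",
    "element": "owl:DatatypeProperty",
}


def extract_constructs(xsd_text):
    """Process 1 (simplified): list named XSD constructs as (tag, name) pairs."""
    root = ET.fromstring(xsd_text)
    constructs = []
    for node in root.iter():
        # ElementTree stores qualified tags as "{namespace}local".
        if node.tag.startswith("{" + XS + "}") and "name" in node.attrib:
            local = node.tag.split("}", 1)[1]
            constructs.append((local, node.attrib["name"]))
    return constructs


def generate_turtle(constructs, prefix="ex"):
    """Processes 2 and 3 (simplified): look up a term, emit one statement each."""
    lines = []
    for tag, name in constructs:
        owl_term = TAG_TO_OWL.get(tag)
        if owl_term is not None:
            lines.append(f"{prefix}:{name} a {owl_term} .")
    return lines


constructs = extract_constructs(XSD_TEXT)
turtle_lines = generate_turtle(constructs)
print("\n".join(turtle_lines))
```

Note that the `xs:maxLength` facet is silently skipped here because it carries no `name` attribute; constructs of this kind are among those the text says existing approaches overlook.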
OGGD is a Python-based generalized solution for transforming XSD schemas to OWL models using the RDFLib library (Krech, 2020), and it consumes BIM and GIS open formats for its validation. In this study we conduct an experiment on an example ifcXML schema (Fig. 2) for prototyping. The ontology model generated for Fig. 2 using the OGGD architecture is validated with the Protégé tool (Musen, 2015) and shown in Fig. 5.

EXPERIMENT AND RESULTS
OGGD accepts an XSD schema and performs a series of operations: extraction, formalization, pattern identification, and OWL model generation. We produced an example ifcXML schema (Fig. 2), and the ontology generated from it is exported in RDF/XML (Fig. 3) and RDF/Turtle (Fig. 4).
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2020, 2020 XXIV ISPRS Congress (2020 edition)
A single XSD construct can be matched by multiple transformation patterns (e.g. C1 in Table 1). To identify the pertinent pattern for a given XSD construct, the construct's attributes and the types of their values are used to select the correct pattern.
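The disambiguation step described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the pattern set of Table 1: the function name `identify_pattern`, the `XSD_BUILTINS` subset, the `xs` prefix check, and the rule that a reference to a named complex type yields an ObjectProperty are all hypothetical choices made for the example.

```python
# Small illustrative subset of XSD built-in simple types (assumption).
XSD_BUILTINS = {"string", "integer", "decimal", "boolean", "double", "date", "dateTime"}


def identify_pattern(tag, attrs, complex_type_names):
    """Pick one OWL term for an XSD construct from its attributes.

    tag: local name of the construct, e.g. "element".
    attrs: dict of the construct's XML attributes.
    complex_type_names: names of complexType constructs declared in the schema.
    Returns None when these simplified rules cannot decide.
    """
    if tag == "complexType":
        return "owl:Class"
    if tag == "simpleType":
        return "rdfs:Datatype"
    if tag == "element":
        type_ref = attrs.get("type")
        if type_ref is None:
            # Anonymous type: child nodes would have to be inspected instead.
            return None
        prefix, _, local = type_ref.rpartition(":")
        # Assumes the XML Schema namespace is bound to the "xs" prefix.
        if prefix == "xs" and local in XSD_BUILTINS:
            return "owl:DatatypeProperty"
        if local in complex_type_names:
            return "owl:ObjectProperty"
    return None


declared = {"IfcWall"}
literal_case = identify_pattern("element", {"name": "Name", "type": "xs:string"}, declared)
object_case = identify_pattern("element", {"name": "RelatedWall", "type": "ifc:IfcWall"}, declared)
undecided = identify_pattern("element", {"name": "Anonymous"}, declared)
print(literal_case, object_case, undecided)
```

The same `xs:element` tag resolves to two different OWL terms depending only on the value of its `type` attribute, which is the kind of ambiguity the attribute-based selection is meant to resolve.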
The prototype was also tested on the schema specifications of the IFC formats¹ and the CityGML base schema². Initial results show that each IFC schema format has a different number of XSD constructs (Fig. 6), and that the time required to execute the pattern identification algorithm (for the patterns listed in Table 3) is not proportional to the number of XSD constructs. The elapsed time for ontology generation, however, remains similar across the IFC schema formats. For the CityGML base schema, OGGD identifies 145 XSD constructs and needs only a short time to identify patterns and generate an introductory ontology.
Preliminary results show that the ontologies generated by the OGGD prototype exploit and evaluate the Janus and PIXCO frameworks, and contain axioms including class, ObjectProperty and DatatypeProperty that demonstrate the OGGD implementation. However, significant effort is still required to define an automated framework that maps XSD schema elements to OWL, even when the transformation patterns are defined.
Figure 5. Model of ontology generated from example in Fig. 2
¹ https://technical.buildingsmart.org/standards/ifc/ifc-schemaspecifications/
² http://schemas.opengis.net/citygml/

DISCUSSION
Generating an OWL/RDF graph is an intricate task with critical edge cases in its syntax and reasoning. The OWL API itself supports only a few platforms, as semantic web technologies are still developing and maturing; thus, the implementation depends on related libraries such as RDFLib. The Python-based RDFLib is also limited in its support for OWL profiles in RDF format. For example, a list of data values given as a collection needs to be specified using basic list constructs (Klyne, Carroll, 2004). Hence the process of defining concepts (subjects) and properties (objects) linked by relationships (predicates) for the corresponding pattern transformation from XSD structure to OWL becomes challenging. Nevertheless, the library gives low-level access for generating a refined ontology with a detailed OWL profile that exposes the potentially rich semantics of subjects, predicates, and objects.
Figure 6. Results comparing execution time for pattern identification and ontology generation of IFC schemas for listed transformation patterns in Table 3
Few formal methods have been defined for ontology generation, and the prototype shows potential for automating OWL modeling. The semantic web approach is promising but time-consuming and still developing. The Janus and PIXCO methods have processed XSD schemas from the B2B (Business to Business) and UBL (Universal Business Language) domains for validation. OGGD builds on these approaches with a focus on covering XSD elements that they do not process (such as list, maxLength and minLength) although these are components of geospatial schemas. Additionally, there are limitations in XSD notation, and there is no direct mapping of XML constraints into OWL. Thus, OGGD proposes to cover as many XSD components as possible by introducing supplementary transformation patterns.
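To make the remark about lists concrete, the sketch below spells out an RDF collection as the basic `rdf:first` / `rdf:rest` / `rdf:nil` structure, using plain tuples rather than RDFLib objects. The function name `rdf_collection`, the blank-node labels, and the sample values are assumptions for illustration; how OGGD itself builds such lists is not described in the text.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def rdf_collection(values, label="list"):
    """Encode values as an RDF collection.

    Returns (head, triples): head is the first list node (or rdf:nil for an
    empty list) and triples are (subject, predicate, object) string tuples.
    """
    if not values:
        return RDF + "nil", []
    # One blank node per list cell; labels are hypothetical.
    nodes = [f"_:{label}{i}" for i in range(len(values))]
    triples = []
    for i, value in enumerate(values):
        # The last cell points to rdf:nil, every other cell to the next one.
        rest = nodes[i + 1] if i + 1 < len(values) else RDF + "nil"
        triples.append((nodes[i], RDF + "first", value))
        triples.append((nodes[i], RDF + "rest", rest))
    return nodes[0], triples


# Hypothetical enumeration values, e.g. from an xs:simpleType restriction.
head, triples = rdf_collection(["ELEMENT", "TYPE"])
for triple in triples:
    print(triple)
```

A two-value list therefore expands to four triples, which gives a sense of why emitting list-valued XSD constructs by hand is laborious.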
Currently, the algorithmic complexity of the OGGD implementation is polynomial, and it depends on the number of XSD constructs to be processed by each algorithm for OWL transformation. If n is the number of XSD constructs and k is the number of nested algorithms, the complexity of the algorithms can be expressed in terms of n and k. The complexity of automating ontology generation should be assessed with care, as at an early stage ontology development itself requires human intervention and the process is time-consuming. It involves the development, transformation, and validation of OWL sub-structures, which are then integrated into a complete model.
Table 3. Preparatory patterns implemented by OGGD for transforming XSD elements into OWL schema
Implementation of the proposed methodology is ongoing, and the study is novel in implementing transformation patterns for the XSD elements outlined for the prototype in Table 3. This provides an opportunity to deliver adaptive results. OGGD is designed as a generalized architecture that supports any XSD document with a valid schema. The architecture is also scalable: more transformation patterns can be added, and the corresponding rules between XSD constructs and OWL representations can be implemented or modified to generate semantically exhaustive ontologies. However, newly developed ontology models require a certain level of revision before they can be used for integration and knowledge generation.

CONCLUSION
In this study, we presented the OGGD method, which provides a potentially generalized solution for the automatic generation of OWL from XSD schemas, using transformation patterns to cover as many XSD attributes of BIM and GIS data as possible. We extended the three-step method for ontology development to geospatial data: XSD formalization, pattern identification, and ontology generation. The results of the current OGGD implementation indicate that rich semantic information can be extracted by adding supplementary transformation patterns, which promises semantically rich geospatial ontologies. Future work will focus on implementing the extended transformation patterns and validating the completeness, consistency, and correctness of the generated ontology models. Furthermore, the OWL models will be investigated, and performance tests will be conducted to determine the complexity of the algorithms and apply recommended optimizations. RDF generated from the ontology models can further support knowledge discovery for smart city applications such as facility management and urban environment analysis, by retrieving information from cross-domain integrated RDF graphs.