A FRAMEWORK FOR LOW-COST MULTI-PLATFORM VR AND AR SITE EXPERIENCES

ABSTRACT: Low-cost consumer-level immersive solutions have the potential to revolutionize education and research in many fields by providing virtual experiences of sites that are either inaccessible, too dangerous, or too expensive to visit, or by augmenting in-situ experiences using augmented and mixed reality methods. We present our approach for creating low-cost multi-platform virtual and augmented reality site experiences of real-world places for education and research purposes, making extensive use of Structure-from-Motion methods as well as 360° photography and videography. We discuss several example projects, for the Mayan City of Cahal Pech, Iceland’s Thrihnukar volcano, the Santa Marta informal settlement in Rio, and for the Penn State Campus, and we propose a framework for creating and maintaining such applications by combining declarative content specification methods with a central linked-data based spatio-temporal information system.


INTRODUCTION
Immersive technologies have the potential to significantly improve future education and research alike. Here we use xR to refer to the vast spectrum of immersive technologies that are becoming available, described with different terms including augmented reality (AR) at one end of the spectrum and virtual reality (VR) at the other (Milgram, 1994). xR, as an interactive communication medium, is seeing a resurgence in popularity thanks to massively improved and more cost-effective technologies, exemplified by commercial head-mounted displays (HMDs) such as the Oculus Rift and HTC Vive as well as mobile xR solutions based on the Google Cardboard or Daydream, the Samsung GearVR, or similar smartphone-based approaches. Low-cost consumer-level xR solutions have the potential to revolutionize education in many fields by providing immersive experiences of locations that are either inaccessible, too dangerous, or too expensive to visit. While field trips are an essential part of many disciplines taught in STEM (Science, Technology, Engineering, and Math) education and foster embodied experiences of places where students can be situated in an informal learning environment, they are underutilized due to numerous constraints, a situation that the current mass development of immersive technologies and new ways to create xR content promise to eliminate. Moreover, xR methods hold the potential for augmenting in-situ experiences of sites by providing access to additional information or media, and by linking to related places. On the scientific side, immersive xR research and visual analytics share a common interest in creating intuitive interfaces and digital immersive analytics workbenches (Chandler, 2015).
In our work, we are creating and evaluating educational xR site experiences as well as xR workbenches for researchers in different domains and for different platforms using low-cost but effective content creation methods based on Structure-from-Motion (SfM) techniques and 360° photography and videography. One focus of our work is on the development of an efficient framework to create and maintain these xR applications for different platforms using a central linked-data based spatio-temporal information system that stores available 3D models, media resources, and domain-specific background information, and a content authoring approach in which scenes, views, and transitions between them are declaratively defined with the help of associated queries to the central information system. In the following, we provide an overview of the employed data capture and content creation methods, of the xR experiences we are developing together with the challenges of evaluating these experiences, and of the application creation framework that is currently under development.

CONTENT CREATION
New technology and software methods and tools have drastically reduced the costs and time needed to create content for xR applications, allowing even laypeople (e.g., the experts from the modeled domain) to create xR content themselves. This promises to help foster generation of content for immersive xR applications in education and research. In our work, we are making strong use of 360° photography and videography for creating xR scenes as well as SfM techniques to create 3D models.

360° Photography & Videography
360° images taken with one of the many available 360° cameras can be projected onto the inside of a sphere surrounding the user in an xR scene and thus allow users to view the scene from a single standpoint by turning their heads (Holmes, 2017). This approach is illustrated in Figure 1, which shows a 360° image created with the Panono 360° camera (the camera we mainly use in our work) and a 3D scene created from it in Unity3D (https://unity3d.com/), a widely used game engine. The visuo-motor coupling along with the curvature of the sphere creates a 2.5D experience for users even though the images are only 2D without depth information. Moreover, stereoscopic 360° images can be used to texture two spheres, with each sphere visible to only one eye, resulting in a real 3D experience. With 360° videography, the videos are similarly projected onto the inside of the sphere, but users experience continuous visual flow without actually walking. Users can still turn their heads to look around. Together, 360° photos and videos provide an extremely simple and efficient way to create entire scenes for xR applications, with the obvious limitation that no direct interactions with objects shown in the scene are possible due to the lack of actual 3D models.

Structure-from-Motion (SfM)
SfM is a technique that allows for the construction of 3D models from photographic images based on photogrammetric techniques. SfM has been used in multiple fields, such as architecture, archaeology, and geology. The basic principle of SfM is parallax imaging, similar to human vision of 3D objects: two images of an object taken from different perspectives can generate a stereo view according to the depth created by the parallax between them through identifying the location of zero parallax (Zhao, 2017). By deriving 3D measurements from 2D images, SfM allows for the construction of models of large areas, such as a digital elevation model (DEM) of some terrain or a 3D model of a site, but also for creating models of small objects, such as individual artifacts.
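To make the parallax principle concrete, the following minimal sketch computes the depth of one matched point from its disparity in an idealized, rectified two-camera setup. The pinhole relation Z = f * B / d, the function name, and all numbers are illustrative assumptions; this is not how PhotoScan or any particular SfM package works internally, since real SfM must also estimate the camera poses from the images.

```python
def depth_from_disparity(focal_px: float, baseline_m: float,
                         x_left_px: float, x_right_px: float) -> float:
    """Depth (in meters) of a point seen in two rectified images.

    Assumes two identical pinhole cameras separated horizontally by
    baseline_m, with the same point observed at column x_left_px in the
    left image and x_right_px in the right image.
    """
    disparity = x_left_px - x_right_px  # parallax in pixels
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return focal_px * baseline_m / disparity


# Hypothetical values: a larger parallax means the point is closer.
far_point = depth_from_disparity(1000.0, 0.5, 620.0, 600.0)
near_point = depth_from_disparity(1000.0, 0.5, 700.0, 600.0)
print(far_point, near_point)
```

The inverse relation between disparity and depth is also why images taken from well-separated viewpoints tend to constrain depth better than images taken from nearly the same position.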
Agisoft PhotoScan is a widely used rapid 3D modeling software that intuitively stitches together photos to form 3D geometries and the main tool we are using for SfM in our work (Zhao, 2017). The workflow of PhotoScan is fully automated once the images have been collected (Agisoft, n.d.a). It comprises several stages, including image capturing, camera alignment, point-cloud building, mesh building, and texture building. Although precise measurement and detailed modeling require higher-end photographic equipment, typical consumer-level phones or cameras with resolutions greater than 5 MPix can be used for SfM (Agisoft, n.d.a), which makes it a low-cost data capturing method.
Figure 2 shows an example from our work (Wallgrün, 2018) in which we used SfM to produce models in the archaeology domain to create xR site experiences for the Mayan site of Cahal Pech located in Belize, a site that is particularly important for the information it provides on the early Mayan population of western Belize (Gentle, n.d.). Since PhotoScan is able to build a model from arbitrary images as long as the same part of the object appears in at least two photos (Agisoft, n.d.a), it is possible to capture images at different distances from the object to capture both the overall structure and fine details of the object. Cahal Pech prohibited the use of unmanned aerial vehicles (UAVs) to take photos from the top. However, because the site was fully accessible by foot with climbable stairs and platforms, we were able to use cameras (a Nikon DSLR camera and a Sony ILCE camera) mounted on a 27' adjustable pole to get images from different heights and construct the shown models in PhotoScan. Structures B1, B2 and B3 (shown together in Figure 2a) were constructed using about 1000 images. The smaller structure A2 (shown in Figure 2b) was derived from 387 images.
In another project (Zhao, 2017), we used imagery taken inside the Thrihnukar volcano in Iceland and applied SfM methods to model the volcano infrastructure (see Figure 3). In this case, 280 photos were imported into PhotoScan to generate a dense point cloud of the cave in the interior of the volcano.
Because the position of each point belonging to the point cloud was identified and associated with aligned pixels of the imported photos, the color information from the photographs could be accurately projected to the point cloud maintaining vividness and precise geometric features of the volcano, making it suitable for a virtual site experience and to conduct 3D measurements of large-scale geologic entities remotely in a VR setup (see Section 3.2).
PhotoScan can also perform distance, area, and volume measurement. In research, these kinds of measurements often serve as a basis for the study of the site or object. To achieve this, the model should be georeferenced using precise control points.
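As a rough illustration of what such measurements involve, the sketch below computes a point-to-point distance, a surface area, and an enclosed volume for a small closed triangle mesh. It is a generic textbook calculation (triangle areas and signed tetrahedron volumes) and makes no claim about how PhotoScan implements its measurement tools; the function names and the tetrahedron test mesh are assumptions chosen for illustration. The results are only meaningful in real-world units if the mesh coordinates are scaled or georeferenced.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def distance(p, q):
    """Euclidean distance between two 3D points."""
    d = _sub(p, q)
    return math.sqrt(_dot(d, d))


def surface_area(vertices, faces):
    """Sum of the areas of all triangles (faces are vertex index triples)."""
    total = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        n = _cross(_sub(b, a), _sub(c, a))
        total += 0.5 * math.sqrt(_dot(n, n))
    return total


def volume(vertices, faces):
    """Volume enclosed by a closed, consistently oriented triangle mesh."""
    signed = 0.0
    for i, j, k in faces:
        a, b, c = vertices[i], vertices[j], vertices[k]
        signed += _dot(a, _cross(b, c)) / 6.0  # signed tetrahedron volume
    return abs(signed)


# Hypothetical test mesh: a unit right-angled tetrahedron, outward-facing triangles.
tetra_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
tetra_faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]
print(distance((0, 0, 0), (3, 4, 0)),
      surface_area(tetra_vertices, tetra_faces),
      volume(tetra_vertices, tetra_faces))
```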
The model can also be scaled to its actual size using a reference distance (e.g., when georeferencing is not possible), but georeferenced models are preferable because they are more geometrically precise (Agisoft, n.d.b). As we describe in Section 3.2, georeferencing the model also made it easy to combine it with other geodata, in our case LiDAR data collected with a laser scanner.

In the different projects we are integrating heterogeneous datasets (i.e., tabular data, 360° photos and videos) with 2D geospatial datasets and map visualizations and with 3D photorealistic models of real-world features to create geo-visual immersive learning platforms and analytical workbenches (Figure 5).
In the following, we provide a brief overview of these applications and how they were created from the content discussed in the previous section, and we discuss the challenges of evaluating such low-cost xR site experiences empirically to assess their suitability and continuously enhance and extend them.

In the 360° view, users can use different GUI elements and 3D objects placed in the scene to navigate between the image/video locations, either in a predefined order or based on spatial adjacency with arrows pointing towards neighboring locations. In addition, for certain image locations, it is possible to switch from a 360° view of how the place currently looks to one showing a future vision for the place designed by the students participating in the studio. In the mobile version, these interactive elements are operated via gaze control, while in the HTC Vive version the hand controllers of the Vive can be used to either touch or point at the interactive elements to trigger them.
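The adjacency-based navigation described above can be sketched as follows: for the current 360° location, find all other locations within some distance and compute the direction in which each navigation arrow should point. The location names, the planar coordinates in meters, the 15 m threshold, and both function names are hypothetical; the paper does not state how adjacency is determined in the actual application.

```python
import math


def bearing_deg(src, dst):
    """Compass-style bearing from src to dst (0 = north/+y, 90 = east/+x)."""
    dx = dst[0] - src[0]
    dy = dst[1] - src[1]
    return math.degrees(math.atan2(dx, dy)) % 360.0


def neighbor_arrows(locations, current, max_dist_m):
    """Map each location within max_dist_m of `current` to an arrow bearing."""
    origin = locations[current]
    arrows = {}
    for name, position in locations.items():
        if name == current:
            continue
        dist = math.hypot(position[0] - origin[0], position[1] - origin[1])
        if dist <= max_dist_m:
            arrows[name] = bearing_deg(origin, position)
    return arrows


# Hypothetical 360° image locations in a local planar coordinate system.
locations = {
    "plaza": (0.0, 0.0),
    "stairs": (0.0, 10.0),
    "market": (10.0, 0.0),
    "hilltop": (100.0, 100.0),
}
arrows = neighbor_arrows(locations, "plaza", 15.0)
print(arrows)
```

In a game engine, each bearing would typically be converted to a rotation about the vertical axis for the corresponding arrow object.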

HTC Vive VR Experience of Thrihnukar Volcano
The VR application shown in Figure 4  and immersive learning experience. However, evaluating xR experiences and technology is a significant challenge. We therefore briefly discuss some of the challenges and our approach to xR evaluation in the next section.

Evaluating xR Site Experiences
With any development of novel tools, it is important to understand fundamentally how a human user is accommodated. Evaluation of xR experiences presents a number of challenges, however. Traditional issues of usability including ease of use, future intention to use, and usefulness (Davis, 1989) always remain an underlying component of evaluating technology mediation, in this case mediation of a site experience. The next challenge comes from the nature of xR. Unlike traditional interfaces, xR also needs to be evaluated based on features of the experience itself: presence (Slater, 1999), or being in the space; embodiment (Kilteni, 2012); and interactivity (Sundar, 2004). These concepts play a crucial role in measuring the success of an immersive technology in engaging and involving a human user in simulated content. Lastly, there is the challenge of capturing how much knowledge is derived. Measures of comprehension provide insight into the success of xR in conveying information in an understandable way to a human user, enabling transfer of that knowledge into real application.
Challenges also exist in how evaluation is conducted on xR. xR consists of a number of components which vary depending on the system, whether it is AR or VR, or whether it uses a Google Cardboard or an HTC Vive. These distinct differences present the challenge of identifying which characteristics are impacting any evaluation of a system. Considering that xR could consist of any number of characteristics, it is necessary for us to examine each system by its key components. We aim to build on this idea (Oprean, 2017), referred to as the foundational approach (Oprean, 2014), to better distill a specific xR system into its key components (e.g., field of view, agency) and to identify the individual contribution of each component to each measure. Using this approach, we aim to establish a better understanding of xR by identifying which characteristics perform better than others.

APPLICATION CREATION AND DEPLOYMENT FRAMEWORK
The methods described in Section 2 allow for creating content for xR applications comparatively cheaply and efficiently. However, often a large amount of manual work is still required to create and maintain the actual xR experiences, typically within a game engine such as Unity3D. Since in our work we are creating xR applications for a large number of different devices and setups, including consumer-level HMDs such as the HTC Vive and Oculus Rift, mobile devices such as Android- and iOS-based mobile phones in combination with Cardboard devices, AR applications for tablets, and WebVR-based web presentations, we are working on a framework that simplifies the workflows needed and makes the creation of new builds much more efficient.
Figure 6 illustrates the approach taken in our framework: All collected data, media resources (including 360° photography and videography), created 3D models, and general background information about the domain are stored in a central linked-data based information system. A stSPARQL (Kyzirakos, 2012) based query interface allows for querying the central information system with mixed semantic-spatio-temporal queries such as 'Provide all 3D Models of Historic Artifacts found within 100m of Cahal Pech that are from the Postclassic Period' using both quantitative ('within 100m') and qualitative ('from') spatial and temporal relations. stSPARQL is a spatiotemporal extension of the SPARQL query language and provides support for qualitative spatial relationships (e.g., topological relations from the 9-Intersection Model (Egenhofer, 1991)) and temporal relationships (e.g., relations from Allen's Interval Algebra (Allen, 1983)). The stSPARQL query for the above query example is given in Figure 7.
The declarative content description file provides a generic specification of the xR application in terms of different scenes and transitions between them, potentially based on queries to the information system. To give an example: A simple site experience similar to the Santa Marta one from Section 3.1 can be defined as consisting of (1) an overview scene in which each 360° image returned by a particular query is represented by a simple sphere object located on a 2D map of the modeled area and (2) n image view scenes, one for each of the 360° images returned by the query, using the approach described in Section 2.1 of employing the image as the texture of the inside of a sphere surrounding the camera. The defined transitions would state that interacting with a sphere on the overview map would switch to the image view scene of that particular 360° image, while in the image view certain interactions (e.g., interacting with a button in the navigation menu) would cause a transition back to the overview scene.
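As a rough illustration of the overview-plus-image-view example above, the following Python sketch expands a list of query results into scenes and transitions. The class names, the trigger strings, and the `build_site_experience` helper are hypothetical and chosen for illustration only; the framework itself runs inside Unity3D, and its specification format is not reproduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Scene:
    name: str
    kind: str                    # "overview" or "image_view"
    image: Optional[str] = None  # 360° image used as sphere texture, if any


@dataclass
class Transition:
    source: str   # scene in which the interaction happens
    trigger: str  # hypothetical interaction identifier
    target: str   # scene to switch to


def build_site_experience(image_ids: List[str]) -> Tuple[List[Scene], List[Transition]]:
    """Create one overview scene plus one image view scene per query result."""
    scenes = [Scene(name="overview", kind="overview")]
    transitions: List[Transition] = []
    for image_id in image_ids:
        view_name = f"view_{image_id}"
        scenes.append(Scene(name=view_name, kind="image_view", image=image_id))
        # Selecting the sphere on the overview map opens the image view ...
        transitions.append(Transition("overview", f"select_sphere:{image_id}", view_name))
        # ... and a menu button in the image view leads back to the overview.
        transitions.append(Transition(view_name, "menu_button:back", "overview"))
    return scenes, transitions


# Stand-in for the 360° images returned by a query to the information system.
scenes, transitions = build_site_experience(["img_001", "img_002", "img_003"])
print(len(scenes), len(transitions))
```

Because the scenes are derived from the query result rather than listed by hand, adding a new 360° image to the information system would change the generated experience without editing the specification.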
The central application creation tool (realized as an editor script inside Unity3D) reads the content specification and interprets it to create the xR experiences for the different platforms. It incorporates models and media from the central information system as specified to create the different scenes, produces the designated interactive elements such as exploration tools and GUI components, and establishes the links between different scenes based on the transitions described in the specification file. Moreover, it adds build-specific components for the different xR platforms to the applications, such as platform-specific assets, particular camera setups for the different platforms, different interaction tools, etc. At the moment, the specification is a simple JSON file with embedded queries and still limited expressivity that is continuously being extended. In the future, creating the specification of xR experiences will be supported by visual tools running inside the Unity3D editor.
The ontology required for organizing and querying the linked data storage needs to cover general concepts and background knowledge from the application domain as well as concepts and relationships related to observation data, models, and media resources. We have recently started to investigate connecting the domain-specific knowledge in our application domains to existing upper ontologies such as the CIDOC Conceptual Reference Model (CRM) ISO standard (Doerr, 2003) and DOLCE (Borgo, 2010). While the application creation framework described here currently only exists as a very first prototype, the declarative content authoring approach based on linked data and mixed semantic-spatio-temporal querying abilities is very flexible and has already saved us large amounts of work in maintaining our different xR applications. Nevertheless, there exist challenges in the spatio-temporal and ontological modeling needed to apply this approach in complex domains, for instance in the area of archaeology and cultural heritage (Belussi, 2014) or geoscientific domains such as the Volcano (Zhao, 2017) application from Section 3.2, to be addressed as part of future work.
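The following sketch shows, under stated assumptions, how a JSON specification with embedded queries might be read and turned into a per-platform build plan. The key names (`scenes`, `platforms`, `input`), the placeholder query string, and the `plan_build` function are hypothetical, since the schema of the specification file is not given here, and the sketch is written in Python rather than as a Unity3D editor script. The gaze and controller input modes follow the interaction styles described for the mobile and HTC Vive versions of the Santa Marta experience.

```python
import json

# Hypothetical specification; the query text is a placeholder, not a real query.
SPEC_JSON = """
{
  "scenes": [
    {"name": "overview", "type": "map", "query": "<stSPARQL query for 360 images>"},
    {"name": "image_view", "type": "sphere360", "forEach": "overview"}
  ],
  "platforms": {
    "cardboard": {"input": "gaze"},
    "vive": {"input": "controller"}
  }
}
"""


def plan_build(spec: dict, platform: str) -> dict:
    """Combine the shared scene list with platform-specific settings."""
    if platform not in spec["platforms"]:
        raise KeyError(f"no build settings for platform '{platform}'")
    return {
        "platform": platform,
        "input": spec["platforms"][platform]["input"],
        "scenes": [scene["name"] for scene in spec["scenes"]],
    }


spec = json.loads(SPEC_JSON)
for platform in spec["platforms"]:
    print(plan_build(spec, platform))
```

Keeping the scene definitions separate from the platform settings is one way a single specification could serve several builds.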

CONCLUSIONS
We presented our approach to creating low-cost immersive site and field-trip-like experiences, strongly based on SfM photogrammetry methods and 360° photography and videography. While some of the described projects are still under development, the approach has already proven to be very effective in practice from the perspective of creating content for xR applications comparatively cheaply and quickly. Remaining challenges we discussed pertain to the evaluation of the created xR experiences to demonstrate their effectiveness in education and research, and to efficiently creating and managing xR applications for many different xR platforms and settings based on this content. We proposed an approach based on declarative content specification and a linked-data based spatio-temporal information system that we believe to be particularly suited to address this second challenge. Further developing this approach by incrementally increasing the expressivity of the content specification language and designing visual tools for creating content specifications is one of the main goals for our future work.
Figure 1. (a) Panono 360° image. (b) Scene in Unity3D with the image used to texture the inside of a sphere surrounding the camera and the corresponding rendered left-right eye camera images on the right.
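Texturing the inside of a sphere with a 360° image amounts to mapping each image pixel to a viewing direction from the sphere's center. The sketch below shows this mapping under the assumption that the image is stored in equirectangular layout (longitude along the horizontal axis, latitude along the vertical axis) and that the scene uses a y-up coordinate system with the image center facing +z. Both the layout and the axis convention are assumptions for illustration; game engines normally perform this mapping through the sphere's texture coordinates rather than per pixel.

```python
import math


def equirect_pixel_to_direction(u: float, v: float, width: int, height: int):
    """Unit viewing direction for pixel (u, v) of an equirectangular image."""
    lon = (u / width) * 2.0 * math.pi - math.pi   # -pi (left) .. +pi (right)
    lat = math.pi / 2.0 - (v / height) * math.pi  # +pi/2 (top) .. -pi/2 (bottom)
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


# Hypothetical 4096 x 2048 image: the center pixel looks straight ahead.
center = equirect_pixel_to_direction(2048, 1024, 4096, 2048)
top = equirect_pixel_to_direction(2048, 0, 4096, 2048)
print(center, top)
```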

Figure 3. Dense point cloud model of the infrastructure of Thrihnukar volcano displayed in PhotoScan.

XR SITE EXPERIENCES & THEIR EVALUATION
Figure 4 shows three exemplary xR site experiences we created in our work for different xR platforms, including VR, AR, and mixed reality approaches.

Figure 4(a) shows a VR site experience of the Santa Marta informal settlement in Rio, Brazil, that we developed for mobile devices in combination with the Google Cardboard as well as for the HTC Vive to conduct a study with students in a joint architecture and landscape architecture studio course (Oprean, 2018). The experience has been built entirely from 360° image and video material collected at the site. It uses two different views: (1) an overview map view (left side of Figure 4(a)) showing locations of 360° images as points and 360° videos as polylines (the polylines are not visible in Figure 4) on a 2D overview map of Santa Marta; and (2) the 360° view (right part of Figure 4(a)) in which users are placed in the center of a 360° image or video and can immerse themselves in the respective scene. Users can select the points or polylines in the overview map to change to the corresponding 360° scene. In the 360° view, users can open additional elements such as a zoomed-in map or an informational text display.

The VR application shown in Figure 4(b) has been developed for the HTC Vive with the goal of allowing geoscientists and students to apply and practice real-world skills inside a virtual environment simulating the scientific workflow of geoscientific fieldwork. Leveraging high-resolution geoscience data from the Icelandic Thrihnukar volcano and SfM mapping, we created a high-fidelity VR environment in which users can apply measurement tools and visual approaches to detect geologic features (Zhao, 2017). Both a LiDAR model and a photorealistic model created via SfM (Figure 3) of the inside of the volcano are presented as point clouds in the VR environment but using different visualization approaches: The point colors of the LiDAR model represent intensity values indicating rock types, whereas the photorealistic model provides a direct representation of the volcano's actual appearance with the color values extracted from the texture generated by the photo alignment process of the SfM mapping. We designed model transformation, information search, and geometric measurement tools for the application that are arranged in different virtual panels. Figure 4(b) shows the interface for changing the way the volcano is visualized in the application. The project is not limited to the exploration of Thrihnukar volcano but is intended to be generalized and extended to other geologic entities and domains by establishing workflows to import data in different formats and from different open data portals into xR applications and by providing tool support that will allow geoscience students, teachers, and non-expert researchers to create user-defined and customized xR content for teaching, learning, and research purposes.

Mixed AR/VR Experience: The Penn State Obelisk
This project, illustrated in Figure 4(c) and (d), leverages the Penn State Obelisk (Penn State News, n.d.), one of the oldest landmarks on the Pennsylvania State University campus, to provide an immersive experience of an artistic expression of the geological history of Pennsylvania. It is an ongoing project that offers an interactive immersive learning platform (specifically, for K-12 and introductory-level geoscience students) in which the learning process can be enhanced through seamless navigation between dynamically linked 3D real-world objects, data (both spatial and non-spatial), and multimedia contents (e.g., 360° images). In the pure VR version for the HTC Vive, the user can physically walk around a 3D model of the Obelisk created via SfM methods. Upon gaining insights about the 3D object's physical characteristics (i.e., size, shape, color, and texture), the user can use the HTC Vive controller to select one of the stones making up the obelisk to see attribute information (both aspatial and spatial) retrieved from a database and have the origin of the stone displayed on a 2D map of Pennsylvania. Next, users can use the hand controller to point at a map location to experience the natural environment of the stone's origin through 360° imagery and video (Figure 4(d)).
Figure 4. (a) Mobile VR experience of an informal settlement from a study with (landscape) architecture students. (b) VR experience of the Icelandic volcano Thrihnukar for geoscientists including different measuring tools. (c) Marker-based AR prototype displaying historic information for the Penn State Obelisk. (d) 360° view in the AR prototype showing the natural environment that a stone in the Obelisk originated from.

Figure 6. Sketch of our xR application building framework

otype rdfs:subClassOf* own:Model3D .
?artf rdf:type ?atype ;
      own:foundAt ?geomA ;
      own:fromTime ?timeA .
?atype rdfs:subClassOf* own:HistoricArtifact .
own:CahalPech own:locatedAt ?geomCP .
own:PostClassicalPeriod own:timePeriod ?timePCP .
FILTER(strdf:distance(?geomA, ?geomCP) < 100 && strdf:during(?timeA, ?timePCP))
}
Figure 7. Example of a semantic-spatio-temporal stSPARQL query

Querying the central information system can happen either during the application building process, for instance to create scenes with certain kinds of entities or models shown in certain ways, or at runtime. The second option allows for creating xR experiences that provide a high level of flexibility and configurability, for instance to provide researchers with the tools to freely explore all available data by choosing what they want to see and how.
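To clarify what the FILTER condition of the query in Figure 7 expresses, the sketch below evaluates the same two conditions in plain Python: a quantitative distance threshold and the qualitative "during" relation from Allen's Interval Algebra. It does not use a SPARQL engine; the record layout, the planar coordinates, and the time values are invented placeholders (not archaeological dates), and the function names are hypothetical.

```python
import math


def during(inner, outer):
    """Allen's 'during': inner starts after outer starts and ends before outer ends."""
    return outer[0] < inner[0] and inner[1] < outer[1]


def within_distance(a, b, max_m):
    """True if planar points a and b are less than max_m meters apart."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) < max_m


def matches(artifact, site_xy, period, max_m=100.0):
    """Conjunction of the spatial and the temporal condition."""
    return (within_distance(artifact["xy"], site_xy, max_m)
            and during(artifact["time"], period))


# Placeholder data: positions in meters, time intervals in arbitrary units.
site_xy = (0.0, 0.0)
period = (100, 200)
artifacts = [
    {"name": "a1", "xy": (10.0, 20.0), "time": (120, 150)},   # near, during
    {"name": "a2", "xy": (300.0, 0.0), "time": (120, 150)},   # too far away
    {"name": "a3", "xy": (5.0, 5.0), "time": (90, 150)},      # starts too early
]
selected = [a["name"] for a in artifacts if matches(a, site_xy, period)]
print(selected)
```

In the framework, such conditions are evaluated by the query engine of the central information system rather than in application code.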