BEHAVIOUR CONTROL WITH AUGMENTED REALITY SYSTEMS FOR SHARED SPACES

Augmented Reality (AR) in a traffic context has mainly been used for navigation with path augmentation, focused on safely guiding the user given prior knowledge of the route and the destination. Other works warn drivers by visualizing other traffic participants or dangers that are currently out of sight. However, they do not cover mediating control by recommending actions to users, even though such efforts are expected to foster collaboration in a multiagent environment. To the best of our knowledge, AR has not yet been applied to visualize virtual control information, e.g. virtual lanes or signposts, notably in the context of shared spaces. Such an environment should support spatial understanding of proximate participants, with adaptive augmented controls to recommend actions to each user. However, when such systems work in a context where a conflict of interest may arise, a rule-based control logic centered on priority should be accounted for. Traditionally, these rules are defined by traffic management. This paper presents a framework based on Behaviour Control with AR (BCAR) Systems for controlling user behaviour in a shared space via augmentation, and proposes how a control logic can be part of it. The framework, which incorporates navigation, focuses on mapping users from the real to the virtual world. It also enables simulation and visualization of multiagent interactions and proposes controls for user actions, leveraging the reduction in environment complexity achieved through the real-to-virtual transfer. A prototype implementation of the proposed framework with ARCore and Unity3D has been evaluated for pedestrian behaviour control to assess its feasibility.


INTRODUCTION
Information about pedestrian movement in cities is relevant not only for urban planners and government officials but also for retailers, advertising agents and those involved in the management of urban space. A pedestrian wants to move in the most convenient way, minimizing delays when avoiding obstacles and other pedestrians, and intends to take an optimal path and walk at a velocity that allows the destination to be reached at a certain time. The optimal behaviour for a given situation can be derived by plausibility considerations. However, pedestrians normally do not think about these optimal behaviours; they have learned them automatically by trial and error and use the most successful behavioural strategy when confronted with standard situations. These strategies also differ between cultures.
Modelling pedestrian behaviour has its own challenges (Helbing, 1991). Firstly, considering behaviours modelled to suit specific environments, a pedestrian model may find itself in a nonstandard situation. Secondly, it probably has not learned the optimal strategy yet. Thirdly, sometimes emotions or other reasons may lead to a sub-optimal behaviour concerning its movement. Every behaviour shows a certain degree of imperfection or irregularity and these reasons lead to deviations from the optimal behaviour modelled.
The environment can have an impact on user behaviour, and dynamic spaces like shared spaces can be even more challenging. In shared space designs, the segregation between motorized and non-motorized traffic is removed, creating an integrated space without traffic signs or signals, curbs and road markings (see Figure 1). Instead, traffic flows are controlled by social interactions and supported by infrastructure measures like colored road surfaces and the thoughtful placement of road furniture. Due to this lack of legally binding elements like pedestrian crossings and signs, people are said to be more safety-conscious and to pay more attention to the behaviour of other traffic participants (Hamilton-Baillie, 2008).
However, shared spaces also have been criticised for many reasons. The elderly and disabled people feel less safe, attributed to the lack of vertical separation between pedestrian and vehicle movement regions. As all traffic participants are expected to use the same shared surfaces, disabled people have stated that they are not confident navigating such a space (Thomas, 2008).
Shared space advocates have suggested that speeds in most shared spaces should be around 30 km/h. This is because death and serious injury become much more likely at speeds above 32 km/h. Hamilton-Baillie and Jones (2005) have used evolutionary biology to explain the significance of this speed, emphasizing that 30 km/h is approximately the maximum speed of a running human being, the speed our body has evolved to adapt to. Even though shared spaces have appeared in multiple configurations, the primary focus has been to keep driving speeds low.
Such design changes with speed limits are more likely to be useful in countries where the share of pedestrians and cyclists is comparable to that of other vehicle users. For other countries and scenarios (Shearer, 2011), where the demographics are more inclined towards motor transport for social or economic reasons (Zealand, 2003), low limits would not be encouraged. However, cyclists and pedestrians will not use a road if they perceive it to be unsafe because of a high speed limit. In such scenarios it would be hard to do justice to them, as providing the necessary infrastructure for pedestrians and cyclists would cost too much. There are also legal aspects, in terms of changes to existing pedestrian traffic rules and laws, to facilitate this in a convenient and safe manner.
Another concern is the interaction between cyclists and pedestrians within the shared space. The primary focus of cyclist lobby groups has been on separated cyclist facilities since the invention of motorised traffic, so the suggestion that separated facilities may not always be the best option is sometimes not welcomed. The fears cyclists have concerning the removal of separated cycle lanes may be unfounded: evidence shows that most cycle crashes do not involve automobiles, and that those that do occur largely at intersections, where effective separation infrastructure is difficult to provide and largely ineffective (Koorey, 2005). People fall off or hit objects for various reasons. In addition, cyclists also have many crashes on paths shared with pedestrians.
A few of these concerns have motivated research into visual aids based on Augmented Reality for pedestrians as priority road users, to safeguard them from accidents and give them an increased sense of control and security.
The goal of the research described in this paper is to investigate the potential of AR to realize a virtual infrastructure by superimposing virtual graphics onto the existing environment in real time. In this way, it could be used to realize adaptive virtual control signals that give traffic participants right of way in high vehicular traffic flow. This calls for systems similar to traffic signal infrastructure to realize a virtual signal infrastructure in shared spaces. We also foresee that, since AR by its nature can be used for virtual information overlay, it can facilitate virtual lane marking between pedestrians, cyclists and car drivers in future.
By its design AR ties closely with the real world, a unique feature that other applications such as virtual reality do not have (Zhang et al., 2018). As a result, users viewing overlapping scenes are in close proximity and can communicate locally. This is in alignment with the concept of shared spaces, where the prime focus has been on building social spaces for more interaction. In this work, we propose the BCAR Systems as a possible use of Augmented Reality to introduce controls that support road users in collaboratively achieving common goals (e.g. crossing an intersection, increasing safety). The paper describes the components of the framework and realizes a prototype of it based on available software tools such as ARCore and Unity3D.

RELATED WORK
Modelling shared spaces has always been a difficult task, and ad hoc rules for control have been part of the simulation efforts to avoid conflicting scenarios between the different traffic participants. Gibb (2015) modelled pedestrian-motorist interaction in shared spaces using the VISSIM micro-simulation software by substituting cars with groups of moving pedestrians (called dummy pedestrians) and introducing specific priority rules for conflict areas between road users. Another work (Anvari et al., 2015) modelled potential encounters between a vehicle and a pedestrian and resolved them with conflict avoidance strategies including speed change, steering change or a combination of both. Two other research projects have dealt with shared space modelling, namely the research at Imperial College London (Anvari et al., 2015) and the research project MixME carried out in Austria (Schönauer et al., 2012). Even though, to the knowledge of the authors, not much work has been done on augmentation in shared spaces, Augmented Reality has had relevance on multiple fronts for realizing shared space control.
Augmented reality head-up displays (HUDs) have been instrumental in control actions by safely guiding a vehicle driver to yield to pedestrians: the system of Beckwith and Ng-Thow-Hing (2015) determines a turn lane based upon proximity to a vehicle, detects the presence or absence of one or more pedestrians entering or present in the turn lane, and determines a crosswalk path across the turn lane. Other work (Kim et al., 2016) has shown that spatial information provided in the form of a virtual shadow of pedestrians on the HUD resulted not only in better driver performance but also in smoother braking behaviour.
To achieve the objectives of spatial understanding and to recommend control actions using adaptive visual systems, multiple spheres of AR have been explored.

Tracking and positioning for AR application
Most AR positioning and tracking technologies were borrowed from autonomous robotics and computer vision. Vision-based systems for tracking using markers or natural features generally perform well in scenarios with slow camera motion (Nöll et al., 2011). However, in situations where the image quality is compromised, for example during fast camera movements that cause blurring or during sudden illumination changes, pure visual tracking systems tend to fail. On the other hand, pose tracking using inertial sensors (accelerometers and gyroscopes) is more suitable for following fast motion, since the sensors can operate at a much higher frequency, but it usually provides biased measurements with high noise levels. For this reason, there has been a lot of research on sensor fusion pose tracking systems that combine measurements from visual trackers and inertial sensors in order to achieve more robust tracking (Schön and Gustafsson, 2005; Bleser et al., 2006; Bleser, 2009). Rambach et al. (2016) proposed an approach to sensor fusion that uses a deep learning method to learn the relation between camera poses and inertial sensor measurements. A long short-term memory (LSTM) model is trained to provide an estimate of the current pose based on previous poses and inertial measurements. This estimate is then combined with the output of a visual tracking system using a linear Kalman filter to provide a robust final pose estimate.
Outdoor Augmented Reality typically requires tracking in unprepared environments. Schall et al. (2009) designed and developed a hardware tracking module using Differential GPS (DGPS) or Real-Time Kinematic (RTK) based GPS. Orientation is estimated with a second Kalman filter, which estimates the orientation, velocities, accelerations and sensor biases by processing measurements obtained from gyroscopes (angular velocities), accelerometers (linear accelerations) and magnetometers (magnetic field). To compensate for the drift of the inertial sensor and for the magnetic deviation induced by electromagnetic influences, the system additionally applies a drift-free, deviation-free visual tracker that allows for online learning of natural features.
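The fusion step described above can be illustrated with a minimal sketch of a linear Kalman measurement update. This is not the implementation of Rambach et al. (2016): the predicted position is simply taken as given (it stands in for the LSTM output), each axis is treated independently, and the function names and variance values are assumptions made for illustration.

```python
def kalman_update(pred, pred_var, meas, meas_var):
    """Scalar Kalman measurement update: fuse a prediction with a measurement."""
    gain = pred_var / (pred_var + meas_var)   # Kalman gain, between 0 and 1
    fused = pred + gain * (meas - pred)       # pull the prediction towards the measurement
    fused_var = (1.0 - gain) * pred_var       # uncertainty shrinks after fusion
    return fused, fused_var


def fuse_position(pred_xyz, pred_var, visual_xyz, visual_var):
    """Apply the scalar update to each position axis (assumes uncorrelated axes)."""
    return [kalman_update(p, pred_var, z, visual_var)[0]
            for p, z in zip(pred_xyz, visual_xyz)]
```

When both sources are equally uncertain the result is their average; a noisier visual measurement (larger `visual_var`) moves the fused estimate less.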

Navigation with AR applications
The literature on navigation has followed multiple approaches. Collaborative navigation and information browsing tasks have used differential GPS based systems in an urban environment (Reitmayr and Schmalstieg (2004)). Among the different classes of navigation, map-based navigation methods require a global map of the environment to make decisions for navigation (e.g. Borenstein and Koren (1989), Borenstein et al. (1991), Oriolo et al. (1995)). Another class of navigation methods reconstructs a map on the fly and uses it for navigation (Dayoub et al. (2013), Sim and Little (2006), Wooden (2006)).
A Project Tango tablet with no extra sensors has been shown to be instrumental in navigation, using the on-board depth sensor to support area learning (Li et al., 2018), but at a high computational cost. Assistive applications have used a combination of CAD maps and path planning algorithms to achieve the objective (Zhang et al., 2019). AR frameworks for navigation have also proved useful in robotics (Corotan and Irgen-Gioro), where the authors used one as an all-in-one solution for indoor routing, localization, and object detection.

Adaptive Augmented Systems
Adaptive AR involves modifying the visual holograms to match the context. Uratani et al. (2005) studied a system that dynamically modifies the visual attributes of each annotation in an AR display, such as the color or line style of its frame, based on its distance to the user or its occlusion relationship with the real environment, to facilitate perception of the annotations' locations and spatial relationships. Absolute depth in this study refers to the Euclidean distance, in the Z direction, between the user and the annotation. However, the study lacked a visualization framework, i.e. a rendering engine that is aware of the real environment so that the visual attributes of annotations can be adjusted dynamically. Ghouaiel et al. (2014) developed an AR application that adapts to the distance to a target object in the scene. The user stands in front of the Basque museum and the AR application overlays a digital sign on top of the museum's facade. As the user steps away from the museum, the application increases the size of the digital sign, and vice versa. The size variation of the virtual sign indicates to the user whether they are close to or far away from the museum. The magnitude of the translation vector returned by the tracking algorithm was used to compute the Euclidean distance to a target object (e.g., a house). Furthermore, their system adapts the brightness of the virtual scene according to measurements of the illumination of the physical environment (obtained from the ambient light sensor of a smartphone) and adapts to ambient noise.
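The distance-based adaptation described for the museum sign can be sketched as follows. This is a hedged illustration, not the code of Ghouaiel et al. (2014): the reference distance, the clamping limits and the function names are hypothetical, and the only elements taken from the text are that the distance is the magnitude of the tracker's translation vector and that the sign grows as the user moves away.

```python
import math


def distance_to_target(translation):
    """Euclidean distance as the magnitude of the tracker's translation vector."""
    return math.sqrt(sum(c * c for c in translation))


def sign_scale(distance, ref_distance=10.0, min_scale=0.5, max_scale=4.0):
    """Scale factor for the virtual sign: grows linearly with distance, then clamps.

    ref_distance is the (assumed) distance at which the sign has scale 1.0.
    """
    raw = distance / ref_distance
    return max(min_scale, min(max_scale, raw))
```

Clamping keeps the sign legible up close and prevents it from filling the view at large distances; a real application would tune these limits to the display.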

Control Systems in AR
Some of the early works on the use of rule-based systems focused on learning in the assembly domain. Wiedenmaier et al. (2003) did not use expert systems, but compared AR with paper instructions and expert guidance for a typical industrial assembly task; they found that the assembly was completed in the shortest time when the user was guided by an expert, followed by AR support, with paper instructions in last place. Dynamic operator instructions have been used (Syberfeldt et al., 2016), utilizing the concepts of workbooks, worksheets, and tables to build repositories for control actions and rendering AR objects by parsing the rules in an if-then manner. An e-learning platform (Martínez et al., 2011) was designed with a virtual tutor whose behaviour evolves and adapts according to the actions of the user. The system was based on natural language rules (fuzzy rules).
However, much of the work on rules and policies has focused on reducing the visual clutter caused by multiple AR augmentations presented simultaneously to the user. The use of reinforcement learning to formulate a set of policies has been successfully demonstrated in building a system that adapts its output based on prerequisites (DeChicchis et al., 2019). For the simplicity of our demonstration, however, we propose to build a priority-based control system, in which one participant is given priority and the decision is made in that participant's favour, while the system communicates control via augmentation to all users.

DESIGN OF BCAR SYSTEMS
The BCAR Systems (see Figure 2) consist of multiple clients with pose tracking and augmentation capabilities communicating with a server that hosts a virtual environment of the real-world space. Each client updates its position, navigation waypoints and destination via the network communication interface. The virtual world on the server mimics the real world, as experienced in virtual reality. This virtual world spawns a 3D virtual avatar every time a new client connects to the system and updates it thereafter. The server observes interactions between the connected 3D client avatars, considering their location, navigation path information and destination, and looks for possible conflicts arising from the present state and from predictions of possible future states. To resolve any conflict between participants, it consults a control engine to infer whether a client is to be given priority and to arrive at control actions for the priority user. The control engine holds a set of rules or conditions for mediating conflict resolution, learned from predefined conflict resolution models. These control actions are communicated to the client augmentation layer to manipulate the virtual control objects viewed through the visualization medium in real time. A HoloLens-based see-through HMD could be an example of a visualization medium that supports this functionality in an unobtrusive, hands-free manner.
The mapped virtual world on the server is supported by a visualization interface for viewing the impact of the proposed control actions on participants and their acceptance of the controls proposed by the system. It would also serve as a feedback medium for improving the control recommendations for all users. This is analogous to a present-day traffic manager viewing traffic control data to understand what went wrong and what should be done to avoid future failures. This visual interface would also support traffic control and monitoring. When applied in the context of shared spaces, such a system would prove beneficial to a disabled pedestrian wishing to cross the space. A similar use case in an indoor environment is a fire evacuation procedure in which persons at the far end of the floor space are given priority over others to avoid bottlenecks at exits (Yi-Fan et al., 2011).

Figure 2: System Overview of BCAR Systems
The working logic of the BCAR system for a connected client is explained in Figure 3. As our approach supports real-time location updates for multiple users and needs to identify participants with common destination points, an accurately referenced model of the environment is created first. A scaled map of the environment is imported into the framework. This could be any model of the environment (ground plan, CAD model or point cloud), and the scale map convertor is initialized with the scaling parameters.
For the current prototype framework we assume that the user is aware of his initial position on the world map. Once the client is initialized, the client communication manager communicates with the BCARS server and exchanges a unique id for each user. This id is used to reference the user to his virtual counterpart. The spawn manager spawns the avatar of the real-world user at his defined location and is responsible for the life cycle of the avatar. Every tracking update from the client user tracking updates the position of his avatar. The scale map convertor module converts all real-world information to virtual world coordinates for any client position update. The input module on the client handles navigation path planning and passes this information to the intention module. The intention inference module overlays navigation waypoint information to help the system understand the expected client intention. As the system is aware of the intention of the user and of his spatial location relative to other users sharing a common intention (destination), real multi-user environments are remodelled as multi-user controlled virtual environments. The framework provides a control engine that maps expected actions for possible conflicting use cases in the interaction of priority users with other users. The scene assessment module communicates with the control engine in such conflicting scenarios to propose expected user actions. The control object manipulation module overlays augmented control signals, adapted for the scene under observation, for all users that are part of the assessment and passes this information on for client visualization.
As the system focuses on resolving conflicts at a future point in time, an assumption is made that all participants use a client terminal to connect to the system. In the absence of such a terminal, however, participants would be tracked and the rules could be projected using central projection systems. The system also accounts for violations of the augmented rules by resolving them in a socially acceptable manner.
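The interplay of the unique id, the spawn manager and the scale map convertor described above could be sketched as follows. This is an illustrative server-side model only: the class and method names are hypothetical, the conversion is assumed to be a uniform scale plus an offset in 2D, and the network layer is omitted.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ScaleMapConvertor:
    """Maps real-world metres to virtual-world units (assumed: uniform scale + offset)."""
    scale: float = 1.0
    origin: tuple = (0.0, 0.0)   # virtual position of the real-world origin

    def to_virtual(self, real_xy):
        return (self.origin[0] + self.scale * real_xy[0],
                self.origin[1] + self.scale * real_xy[1])


@dataclass
class Avatar:
    client_id: int
    position: tuple


class SpawnManager:
    """Owns the avatar life cycle: spawn on connect, move on update, remove on disconnect."""

    def __init__(self, convertor):
        self.convertor = convertor
        self.avatars = {}
        self._ids = itertools.count(1)   # source of unique client ids

    def connect(self, real_start_xy):
        client_id = next(self._ids)
        self.avatars[client_id] = Avatar(client_id, self.convertor.to_virtual(real_start_xy))
        return client_id

    def on_tracking_update(self, client_id, real_xy):
        self.avatars[client_id].position = self.convertor.to_virtual(real_xy)

    def disconnect(self, client_id):
        self.avatars.pop(client_id, None)
```

Keeping the conversion in one place means every module downstream (scene assessment, control engine) can work purely in virtual-world coordinates.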

IMPLEMENTATION OF BCAR SYSTEMS
We prototype a client-server based AR application with client capabilities for tracking and adaptive augmentation based on ad hoc rules (see Figure 4(a)). As this is our first study, we have focused on the feasibility of the system and hence do not consider scalability. The prototype also does not cover all the dangerous interactions that would have to be covered in real traffic encounters. Nor do we demonstrate how the system handles violations of rules, although a visual observation of the virtual world visualisation would partially account for them. The application uses ARCore for tracking and is supported by a Unity multiplayer environment for the shared multi-user experience.
As an initial experiment for the proof of concept for our system, we have chosen an indoor scenario to create an intersection of two participants and show how the control signals can be recommended to all users. Smartphones are used for tracking and to augment controls in the demonstration. The scenario simulates two pedestrians whose trajectories will eventually collide if no regulation is adopted. The system has to
• Identify possible conflicts by analyzing the paths of traffic participants in a certain environment
• Provide a solution in terms of a regulation (giving priority for one participant)
• Visualize the situation to the users, including their paths, the possible conflict and the imposed prioritization rule.
This seemingly simple example can easily be generalized to complex scenarios, where the necessity of such a regulation becomes more evident, e.g. several cyclists may be informed to form a group so that they will be allocated a virtual cycle track; this group's interaction with neighboring or potentially crossing pedestrians will also be monitored, organized and communicated via AR.
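The first requirement, identifying a possible conflict from the paths of two participants, can be illustrated with a closest-point-of-approach check. This is a sketch under simplifying assumptions that are not taken from the prototype: both participants move in 2D at constant velocity, and the safety radius and prediction horizon are hypothetical parameters.

```python
def closest_approach(p1, v1, p2, v2):
    """Return (time, distance) of closest approach for two constant-velocity agents."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]   # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]   # relative velocity
    speed_sq = vx * vx + vy * vy
    if speed_sq == 0.0:
        t = 0.0                              # same velocity: the gap never changes
    else:
        t = max(0.0, -(rx * vx + ry * vy) / speed_sq)   # only look into the future
    dx, dy = rx + vx * t, ry + vy * t
    return t, (dx * dx + dy * dy) ** 0.5


def has_conflict(p1, v1, p2, v2, safety_radius=1.0, horizon=10.0):
    """True if the agents come within safety_radius of each other within the horizon."""
    t, d = closest_approach(p1, v1, p2, v2)
    return t <= horizon and d < safety_radius
```

Two pedestrians approaching an intersection at right angles with equal speed and equal distance to go are flagged as a conflict, whereas two walking side by side three metres apart are not.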

ARCore for tracking and navigation
ARCore is an augmented reality framework for smartphones running the Android operating system. It includes features such as motion tracking, environment understanding, and light estimation, providing developers with information that can be used for numerous navigation tasks. It is an advanced substitute for the deprecated Project Tango. Without the extra depth sensor, an ARCore-powered cell phone is able to track its pose and build a map of the surroundings in real time using visual-inertial odometry (VIO).
The framework returns the pose of the physical camera in world space for the latest frame. This is an OpenGL camera pose with +X pointing right, +Y pointing up, and -Z pointing in the direction the camera is looking, with "right" and "up" being relative to the image readout in the usual left-to-right, top-to-bottom order. Specifically, this is the camera pose at the center of exposure of the center row of the image.
An indoor map of our office space is scaled in Unity and used as a world map for the scene. Using the camera pose information from ARCore, we can track user movements in world space over time relative to where the tracking initially began. The device translation from frame to frame can be retrieved, then properly scaled and used to track movement on the building floor plan in the map view (Roberto Lopez Mendez). ARCore SLAM is applied as the base for visual odometry.
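The frame-to-frame tracking on the floor plan can be sketched as a simple accumulation of scaled translation deltas. The sketch is plain Python rather than ARCore or Unity code, and it rests on assumptions not stated in the text: the tracking frame has Y up, its X and Z axes are aligned with the map's x and y axes, and `units_per_metre` is a hypothetical scaling parameter.

```python
class FloorPlanTracker:
    """Accumulates frame-to-frame camera translation onto a 2D floor plan."""

    def __init__(self, start_xy, units_per_metre):
        self.position = start_xy              # known start position on the map
        self.units_per_metre = units_per_metre
        self._last_pose = None                # camera position from the previous frame

    def update(self, camera_xyz):
        """Feed the camera position of the latest frame; returns the map position."""
        if self._last_pose is not None:
            dx = camera_xyz[0] - self._last_pose[0]
            dz = camera_xyz[2] - self._last_pose[2]   # height (Y) is ignored on a floor plan
            self.position = (self.position[0] + dx * self.units_per_metre,
                             self.position[1] + dz * self.units_per_metre)
        self._last_pose = camera_xyz
        return self.position
```

Because only deltas are accumulated, any tracking drift is carried forward into the map position, which is consistent with the drift sensitivity of this kind of dead reckoning.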

Unity for Multiplayer Experience
Unity supports multi-user experiences through both a High Level API and a Network Transport API for advanced multiplayer games. For the current scope of our work, however, we use a high-level implementation, with each user joining the multiplayer environment in Unity.
In the scaled map of the environment in Unity space, user avatars are placed at fixed positions. This is based on the assumption that the user already knows his start location. Each user launches the application and walks with the smartphone camera enabled, guided by the AR system (Figure 4b). As the user moves, his movements in the real world are replicated in the virtual world by moving the avatar with the live pose data from ARCore. As the users move in the physical world, their virtual avatars translate in the virtual space with no real-world boundaries, providing a test bench for the spatial relationships of objects of interest. The raycasting technique discussed in the simulation approach is implemented as a rule to signal users of possible collisions. The developed application was installed on two ARCore-supported smartphones (OnePlus 3T and Redmi Note 7), which were connected to the server. Two users were instructed to exit an intersection guided by the navigation system after launching the application at predefined positions. The user approaching the intersection from further away from the destination (see Figure 6) was designated the priority participant (P2), and the other was assigned a lower priority (P1). The user screen on the smartphone provides a navigation view, a virtual world view and a virtual 3D traffic light that conveys the control signal for user action.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLIII-B4-2020, 2020 XXIV ISPRS Congress (2020 edition)
Multiple views are showcased here to demonstrate the technical implementation of the system.
The first participant, P1, starts to move towards the destination (Figure 5 (a)); at the same time P2 starts to walk independently towards his destination (Figure 5 (b)). The system continuously monitors the immediate vicinity of the avatars by raycasting with a predefined radius. Their interactions are captured in the virtual world view (Figure 5 (c),(d)). Once P1 approaches P2 at the intersection, the control logic senses that the two participants are close to each other, signals P2 to proceed and instructs P1 to stop and give way to avoid a conflict. The augmented traffic light is shown in red for P1 to stop (Figure 5 (e)), giving way to P2, who sees a green signal (Figure 5 (f)).
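The priority rule of the experiment can be summarized in a short sketch. It is not the Unity implementation: the raycast is replaced here by a plain distance check against a hypothetical radius, and the function names and signal labels are assumptions. The rule itself follows the experiment description, in which the user further from the destination is the priority participant.

```python
import math


def assign_priority(positions, destination):
    """Return the id of the user currently furthest from the destination."""
    return max(positions, key=lambda uid: math.dist(positions[uid], destination))


def control_signals(positions, destination, radius=2.0):
    """Map each user id to 'GREEN' or 'RED' for the augmented traffic light.

    A non-priority user within `radius` of the priority user is told to stop.
    """
    signals = {uid: "GREEN" for uid in positions}
    priority = assign_priority(positions, destination)
    for uid, pos in positions.items():
        if uid != priority and math.dist(pos, positions[priority]) <= radius:
            signals[uid] = "RED"
    return signals
```

With P1 at (1.0, 0.0), P2 at (0.0, 1.5) and the destination at (5.0, 0.0), P2 is further from the destination and within the radius of P1, so P1 is shown red and P2 green, matching the behaviour described for Figure 5 (e) and (f).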

RESULTS AND CONCLUSION
The prototype system was implemented and used to conduct the desired experiments. Thus, the proof of concept is achieved. However, the system showed unexpected behaviour in multiple instances. During the experimental phase, a significant drift of the participants' positions at random points was noted; it affected the accuracy of the positioning system and hence the spatial information in the virtual world. In the absence of accurate spatial information, multiple false control signals were noted. The system performance on the device in terms of memory and CPU usage was found to be satisfactory.

OUTLOOK ON FUTURE WORK
The proposed framework provides an environment in which to view the interactions of the participants involved. However, accurate spatial information is key to realizing the general objective. Thus, several improvements are planned for the near future. A basic prerequisite is the exact position of the users, which we intend to tackle by mapping environmental features (see Schlichting and Feuerhake (2018)). This requires a map with 3D features of the environment. As shared spaces are centered on complex, multi-user interactions, the feasibility of the system accounting for all participants (pedestrians, vehicles and other road users) has to be studied. In order to define adequate rules, the scene has to be understood by the system; this involves mechanisms to identify potential groups of users (e.g. Cheng and Sester (2018)) who could be treated together and could potentially get priority in certain situations (e.g. by creating virtual platoons). Finally, it has to be studied which visual cues are intuitively understood by people, so that they can easily follow them.