APPROACH AND EVALUATION OF A MOBILE VIDEO-BASED AND LOCATION-BASED AUGMENTED REALITY PLATFORM FOR INFORMATION BROKERAGE

Providing mobile location-based information for pedestrians faces many challenges. On the one hand, the accuracy of localisation indoors and outdoors is restricted by the technical limitations of GPS and Beacons. On the other hand, only a small display is available to present information and to host a user interface. In addition, the software has to take the hardware characteristics of mobile devices into account during implementation in order to achieve minimum latency. This paper describes our approach, which combines image tracking with GPS or Beacons to ensure orientation and precise localisation. To communicate information on Points of Interest (POIs), we chose Augmented Reality (AR). For this concept of operations, we used not only the display but also the acceleration and position sensors as a user interface. This paper goes into particular detail on the optimization of the image tracking algorithms, the development of the video-based AR player for the Android platform and the evaluation of videos as an AR element with regard to providing a good user experience. For setting up content for the POIs, or even generating a tour, we used and extended the Open Geospatial Consortium (OGC) standard Augmented Reality Markup Language (ARML).


INTRODUCTION

1.1 Specifics of mobile devices
Providing information on mobile devices differs significantly from providing it on desktop computers. Ruuska-Kalliokulju et al. (2001) list the following aspects of differentiation:
1. The physical, social and cultural context of use influences the way of interaction.
2. Personalization of the mobile device is a key design issue.
3. The quality and quantity of applications and services differ.
4. Devices of mobile information and communication technology become more specific to a certain task.
To keep tasks transparent and to relieve the user, there is a growing demand for communication, transmission and synchronization between devices.
In the following we assume that the user is en route as a pedestrian. Due to their physical state as well as the environment, users are exposed to various forms of distraction, which affect their ability to concentrate and to take in information. To simplify the reception of information and the interaction, we aim for a context-sensitive solution. In this paper we pursue this aim by focusing on video-based and location-based AR. Following Tönnis (2010), we divide AR into three categories: tracking, interaction and presentation.

Tracking
One key aspect of AR is accurate tracking of the user. GPS is necessary and is used, although it is limited to outdoor use, its accuracy varies by several meters, and the GPS signal provides no information about the orientation of the device unless the device is moving. To provide navigational information indoors we used Beacons. For orientation and for increased accuracy indoors and outdoors, we additionally used image tracking. Our approach matches the camera view of the mobile device against a database of prepared reference pictures. If the camera view matches a reference picture, an event is triggered: an AR element is positioned and displayed on the camera view and started automatically. Image tracking allows the AR element to be placed at its exact position.

Interaction
For interaction this approach uses the position of the user, the camera view of the mobile device and its sensors. For positioning and starting AR elements on the display, a combination of localisation and image tracking is used. To switch between the camera view and the radar view, the built-in sensors are checked. If the user holds the mobile device vertically, the camera view is shown and the scanning process of the image tracking is active. If the user holds the device horizontally, the radar view is displayed. Alternatively, the user can operate the app through a minimal user interface.
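The switch between camera view and radar view can be illustrated with a minimal sketch. It assumes that the accelerometer delivers a gravity vector (ax, ay, az) in device coordinates, with the z-axis pointing out of the screen, and that a tilt of more than 45 degrees counts as "vertical". The function name `choose_view` and the threshold are hypothetical and are not taken from the app described here.

```python
import math


def choose_view(ax, ay, az, threshold_deg=45.0):
    """Return "camera" if the device is held upright, otherwise "radar".

    (ax, ay, az) is the gravity vector in device coordinates (assumption:
    z points out of the screen). threshold_deg is an illustrative value.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0.0:
        raise ValueError("gravity vector must not be zero")
    # Angle between the screen normal and the gravity direction:
    # 0 degrees = lying flat, 90 degrees = held upright.
    tilt_deg = math.degrees(math.acos(min(1.0, abs(az) / norm)))
    return "camera" if tilt_deg > threshold_deg else "radar"
```

A real implementation would probably add some hysteresis around the threshold so that the view does not flicker when the device is held at roughly 45 degrees.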

Presentation
Displaying text-heavy documents on small displays places high cognitive demands on the user's short-term memory. In a study on the effects of small displays on retrieval tasks, Jones et al. (1999) showed a discrepancy of 50% between mobile devices and desktop computers, in favour of desktop computers. Therefore, we used the video format as an alternative to the text format.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4/W1, 2016. International Conference on Geomatic and Geospatial Technology (GGT) 2016, 3-5 October 2016, Kuala Lumpur, Malaysia. This contribution has been peer-reviewed. doi:10.5194/isprs-archives-XLII-4-W1-151-2016
For editing videos on a desktop computer and displaying them on an Android smartphone we developed a video editor. For each platform (Windows, Linux and macOS) dedicated software prepares the files by cropping and resizing the videos. Further formats such as hyperlinks, websites, audio, haptic feedback or combinations of these have been implemented as AR elements but are not the main focus.

Preparing a tour
Our approach to building an AR platform consists of a website and an Android app. The website storytellAR.de is the main platform for creating a POI or a sequence of POIs as a linear or non-linear tour. The user can set up POIs, specify the GPS coordinates, upload the reference picture, define a first and a final POI and optionally determine an order of the POIs. All the information is saved in the ARML format. The Android app is an AR browser which can request the ARML file, download the reference pictures and display the AR elements on the camera view of the mobile device.

TRACKING
For location-based AR our approach can be divided into three technical steps:
1. Localisation of the user.
2. If the user reaches a predefined POI, events are triggered.
3. Virtual information is communicated to the user via the mobile device.
This approach uses a combination of image tracking and GPS or Beacons for the localisation of the user.

GPS
If there is GPS reception, GPS is used. To increase the accuracy and to obtain the orientation, image tracking is applied additionally. The number of reference pictures is not restricted. However, for every matching process the camera view has to be compared with every reference picture, so performance may decrease as the number of reference pictures grows. For that reason, all reference pictures are geo-referenced. When the user reaches a POI, only pictures of nearby POIs are used for the image tracking process. Fewer reference pictures are therefore compared and latency is reduced.
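The geo-referenced preselection of reference pictures can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the app: the great-circle distance is computed with the haversine formula, and the search radius of 100 m, the dictionary layout and the names `haversine_m` and `nearby_references` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two WGS84 coordinates."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_references(user, references, radius_m=100.0):
    """Keep only reference pictures whose POI lies within radius_m of the user.

    user is a (lat, lon) tuple; each reference is a dict with the
    illustrative keys "file", "lat" and "lon".
    """
    lat, lon = user
    return [r for r in references
            if haversine_m(lat, lon, r["lat"], r["lon"]) <= radius_m]
```

Only the pictures returned by `nearby_references` would then be passed to the image matching step, which keeps the number of comparisons per camera frame small.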

Beacons
Indoors, where the GPS signal is absent or weak, a navigation system based on Beacon technology is used. A Bluetooth Beacon is a small device that transmits radio signals over a maximum distance of 30 meters.
A basic type of indoor navigation using Bluetooth Beacons is proximity information. Events can be triggered in the application when the user enters or leaves the discovery zone of a Beacon. We used a proximity locking approach in which we defined two zones for each Beacon, an entry zone and an exit zone. The entry zone is always smaller than the exit zone (e.g. entry zone = 2 m, exit zone = 5 m) in order to lock a user's position in the area he entered (e.g. a room). When a user enters the entry zone of a Beacon, image tracking is started in the same way as with GPS tracking. The user can browse the area of interest freely. When the user leaves the exit zone, we assume he has left the area of interest, and the image tracking process stops.
Figure 1. Beacon entry and exit zones
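The proximity locking with two zones behaves like a hysteresis switch. The sketch below illustrates this behaviour under the assumption that an estimated distance to the Beacon in meters is available. The class name `ProximityLock` is hypothetical, and the default radii simply reuse the example values of 2 m and 5 m given above.

```python
class ProximityLock:
    """Hysteresis between an entry zone and a larger exit zone of one Beacon."""

    def __init__(self, entry_radius_m=2.0, exit_radius_m=5.0):
        if entry_radius_m >= exit_radius_m:
            raise ValueError("entry zone must be smaller than exit zone")
        self.entry_radius_m = entry_radius_m
        self.exit_radius_m = exit_radius_m
        self.locked = False  # True while image tracking should be running

    def update(self, distance_m):
        """Feed a new distance estimate and return the current lock state."""
        if not self.locked and distance_m <= self.entry_radius_m:
            self.locked = True   # user entered the entry zone
        elif self.locked and distance_m > self.exit_radius_m:
            self.locked = False  # user left the exit zone
        return self.locked
```

Because the exit zone is larger than the entry zone, small fluctuations of the distance estimate between 2 m and 5 m do not switch image tracking on and off repeatedly.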

Image Tracking
This approach to AR is independent of markers or QR codes. Only reference pictures of predefined POIs are required. Alan (2013) called this practice of using only distinctive landmarks natural feature tracking (NFT). This enables AR to be used also on heritage buildings, protected monuments and so forth.
In addition, image tracking significantly improved the accuracy of tracking the user and placing the AR elements. From the matched reference picture we know exactly where the user is standing and in which direction the smartphone is facing.
Hence the AR elements can be placed at an exact position predefined relative to the reference picture. This adds orientation and fidelity which tracking using GPS or Beacons alone may not provide.

ORB algorithm:
The image tracking procedure consists of three steps. The first step is the image matching process. We used the ORB (Oriented FAST and Rotated BRIEF) algorithm of Rublee et al. (2011) from the Open Source Computer Vision Library (Itseez, 2016). Image sections of the camera view, called keypoints, are randomly selected and matched continuously against the reference picture. The larger the match value, the smaller the difference between the pictures.
To increase performance, the reference picture and the camera view are converted to monochrome images and resized to a low resolution (Dastageeri et al., 2015). In addition, the low resolution decreases the error rate regarding the uniqueness of the keypoints. This is because, at a low resolution, a patch of a given size covers a larger part of the scene, so more information is taken into account when the keypoints are calculated (Figure 2, left).
Figure 2. Excerpt of the camera view image with highlighted patch: left, 8x8 pixel patch (400x300 px); right, 8x8 pixel patch (1600x1200 px) (Kampa et al., 2014)
At a high resolution, a patch of the same size contains less information (Figure 2, right). To achieve the same results, the size of each patch has to be increased exponentially (Figure 3).
Figure 3. 64x64 pixel patch (1600x1200 px) (Kampa et al., 2014)

2.3.2 Filtering method:
However, this comparison of the keypoints alone was not sufficient. The results showed that too many camera view images were wrongly assigned to the reference picture. Therefore, three filtering methods were added. The first method is a symmetric comparison: only keypoints that are matched when comparing the camera view image with the reference picture and also vice versa are taken into consideration; keypoints without this property are removed. The second filtering method examines the match values of the ORB algorithm: keypoints with a very low match value are eliminated. The third filtering method checks for outliers.
To identify and delete outliers, the RANSAC filter was used (Fischler et al., 1981).
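The first two filtering methods can be illustrated with a small, library-free sketch. It is not the OpenCV-based implementation of the paper: binary descriptors are modelled as plain integers, and the match value is assumed to be one minus the normalized Hamming distance, so that a larger value means a smaller difference. The function names, the 8-bit descriptor length and the threshold of 0.75 are hypothetical. The RANSAC step is omitted.

```python
def match_value(a, b, bits=8):
    """Similarity of two binary descriptors: 1.0 = identical, 0.0 = all bits differ."""
    hamming = bin(a ^ b).count("1")
    return 1.0 - hamming / bits


def best_match(descriptor, candidates, bits=8):
    """Index and match value of the most similar candidate descriptor."""
    values = [match_value(descriptor, c, bits) for c in candidates]
    best_index = max(range(len(values)), key=values.__getitem__)
    return best_index, values[best_index]


def filter_matches(camera, reference, min_value=0.75, bits=8):
    """Apply the symmetric comparison and the match value threshold.

    Returns (camera_index, reference_index, match_value) tuples.
    """
    if not camera or not reference:
        return []
    kept = []
    for i, descriptor in enumerate(camera):
        j, value = best_match(descriptor, reference, bits)
        # Symmetric comparison: the reference keypoint must pick the same
        # camera keypoint when matching in the opposite direction.
        back, _ = best_match(reference[j], camera, bits)
        if back == i and value >= min_value:
            kept.append((i, j, value))
    return kept
```

For example, with the camera descriptors `[0b11110000, 0b00001111, 0b10101010]` and the reference descriptors `[0b00001110, 0b11110001, 0b01010101]`, the third camera keypoint is discarded by the symmetric comparison because its best reference keypoint prefers a different camera keypoint.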

Defining the number of keypoints:
The significant parameter for the result and performance of the image tracking is the number of keypoints (Dastageeri et al., 2015).

3. INTERACTION

User interface for mobile devices
The main difference between a mobile device and a desktop computer is independence of the geographical location (Franz, 2005), which has to be considered in the concept of operations. "(…) handheld devices are used by people on the go. Attention spans are limited, as the devices are brought into situations where they are secondary to the user's focus. Desktop computers receive dedicated focus, but handheld devices are given only fragmented bits of attention" (Weiss, 2005). Hence the user is distracted, which affects concentration, in particular visual and mental attention (Duh et al., 2006).
Thus complex user interface actions or permanent requests for user input make interaction difficult and have to be avoided.
In contrast to users of desktop computers, users of mobile devices also have very limited or no influence on the temperature, lighting conditions and loudness of their environment. Moreover, for mobile devices there are no standards regarding the layout of the user interface or software in general. Operating system vendors do recommend design guidelines (Google Inc., 2016; Apple Inc., 2016), but these are not obligatory and are not applied extensively by developers.
Nonetheless, the user interface should be equally appealing and usable for user groups of all ages as well as for people with disabilities (Ruuska-Kalliokulju et al., 2001). To meet these requirements, we reduced the concept of operations to a minimum. "The history of interface and interaction design is a path from complexity to simplicity (…)" (Valli, 2006). After starting the Android app, the user has to pick an ARML file. There are no further menu options or buttons on the start screen.

Scope of operations
The scan process starts immediately after an ARML file is chosen. The lower the complexity of the menu structure, the faster a task can be completed successfully (Ziefle, 2002). The current concept shows only three buttons in total on the main screen:
1. Menu button (top left): within the menu the user has a playlist of all provided videos and can start them manually.
2. Radar (bottom left): the radar shows the nearby POIs relative to the user's position.
3. Help (top right): pressing the help icon starts a tutorial video on handling the app (only visible during the scanning process, not while a video is playing).
The three icons are styled according to the minimalist user interface design genre flat design. "Aesthetics and form are only decisive at first sight. Though at second sight functionality and ease of use are more important" (Kiljander, 2004). Therefore, the design as well as the scope of operations is reduced to a minimum.
The scan process keeps searching for matches between the camera view picture and the reference pictures. After successful image tracking an event can be triggered automatically. No further action by the user is needed. To test our approach of the minimalist user interface, we developed a mobile AR app together with the administration of the University of Applied Sciences Stuttgart for the exhibition of study paths on 16 November 2015. During this annual event every bachelor programme was introduced through an oral presentation, an information booth and a banner. We decided to use the banner as a reference picture and prepared a video for every bachelor programme. Scanning the banner triggered a video describing the bachelor programme (Figure 4).
In addition, we set up an information booth for the AR app to demonstrate the app, talk to the users and collect feedback. We also provided the app for the Android platform at the booth and installed it on devices on request on site. To be able to watch the AR videos after the exhibition, the user also had the option to start each video manually through the menu.
Because of the general interest on the part of the visitors as well as the administration of the university, we decided to incorporate the feedback and launch an updated version of the app for the next exhibition in November 2016.

Usability
Besides the buttons on the main screen, the user can interact simply through the way the mobile device is held. When the device is held vertically, the camera view is activated and tracking starts. When the device is held horizontally, a radar is shown. We wanted to allow several ways of achieving the same result. The user can decide which way he prefers.
"Usability is the measure of the quality of the user experience when interacting with something - whether a Web site, a traditional software application, or any other device the user can operate in some way or another." (Nielsen 1994, cited in Schweibenz et al., 2003) The options for interaction are restricted, but this is on purpose. Following the 80/20 rule, we tried to identify the 20% of functions that cover 80% of the users' tasks. "For each application or feature set, it's helpful to identify the 20% of the functions that will meet 80% of the users' task needs." (Mohageg et al., 2000)

AUGMENTED REALITY
For creating an AR user experience, several aspects have to be taken into consideration. Azuma (1997) defines AR by three characteristics:
1. Combination of reality and virtuality
2. Interactivity in real time
3. Registration in 3D
According to Alan (2013) there are four key aspects regarding the combination of reality and virtuality.
1. The physical world is enhanced by overlaying digital information on a view of the physical world.
Alan (2013) particularly emphasizes the correct usage of AR. The selection of the AR element as well as its position has to suit the situation.

AR elements
Since our approach is meant to be a general solution for information brokerage, our application allows the use of videos, pictures, texts, audio, websites, web links, vibration or a combination of these as an AR element. Considering the loudness outdoors, subtitles can optionally be added. We chose the SubRip text (SRT) format, which fulfils our requirements, such as allowing the time span of each text to be defined.

Video as an AR element
We chose the video format as our main AR element. Similar to city or museum guides, a video of an actor appears and gives information about the POIs. On the one hand this concept is familiar to the user; on the other hand the way it is delivered is new. The user is more involved in the process, as the AR app responds to the user's behaviour. If a topic is not interesting, the user may change the orientation of the mobile device; if more information is desired, the user keeps the current position.

System requirements
The requirement for showing video-based AR on mobile devices depends on the support of videos on textures on the respective platform. According to the Android 4.4 Compatibility Definition Document (27 Nov. 2013), the support of various codecs such as VP8 is obligatory and can therefore be considered as given by the operating system.

Integration of AR video and sound
The AR video is positioned on the camera view of the mobile device. The position of the AR element is fixed in the environment. When the user changes position, the AR element remains at its location and, if necessary, changes its size. We used the "User Defined Targets" process with extended tracking of Vuforia (Qualcomm, 2015). However, the Android MediaPlayer does not support videos with an alpha channel.
Therefore, we had to implement our own approach. Wikitude solved this issue by doubling the height of the video: in the upper half the video is stored as an RGB video, and in the lower half the alpha value is represented by shades of grey, where black stands for invisible and white for completely visible (Wikitude, 2015). Our implementation is similar, though it is specific to the Android platform. The sound source can also be assigned to a location. When using headphones, the user is able to locate the sound source through the stereo sound.
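The double-height layout can be made concrete with a small sketch that recombines one decoded frame into RGBA pixels. It is an illustration only and not the Android implementation, which would typically do this work on the GPU. The frame is assumed to be a list of rows of (r, g, b) tuples, and the alpha value is assumed to be the average of the three grey channels. The function name `merge_alpha_frame` is hypothetical.

```python
def merge_alpha_frame(frame):
    """Combine a double-height frame into RGBA rows.

    The upper half holds the colour image, the lower half a grey mask
    (black = invisible, white = completely visible).
    """
    if len(frame) % 2 != 0:
        raise ValueError("frame height must be even")
    half = len(frame) // 2
    colour_rows, mask_rows = frame[:half], frame[half:]
    merged = []
    for colour_row, mask_row in zip(colour_rows, mask_rows):
        row = []
        for (r, g, b), (mr, mg, mb) in zip(colour_row, mask_row):
            alpha = (mr + mg + mb) // 3  # grey level used as opacity
            row.append((r, g, b, alpha))
        merged.append(row)
    return merged
```

A frame with one red and one green pixel in the upper half and a white and a black pixel in the lower half thus yields an opaque red pixel and a fully transparent green pixel.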

Editing of AR videos
Several editing steps are necessary to achieve good results when presenting the videos on the mobile screen. It should be taken into consideration that the protagonist of the video is displayed without a background and placed on the camera view of the smartphone. Therefore, the background has to be removed during preprocessing. We used the blue screen and green screen technology. Both techniques are common for this purpose. In principle any background color can be used when shooting the protagonist. The key aspect is that the background color differs as much as possible from the skin color, hair color and clothing of the protagonist. Furthermore, shadows cast by the protagonist should be avoided.
After the shooting, a post-production consisting of five steps is required:
1. Cropping the protagonist using the chroma key effect
2. Fading the outline of the protagonist
3. Increasing the brightness
4. Exporting the file using the ProRes 4444 codec
5. Converting the video format from .mov to .mp4
If desired, further effects can be added, as shown in Figure 5.
Figure 5. Steps of preparing a video for video-based AR (Dastageeri et al., 2015)
To automate this workflow, the procedure was implemented on a web server using the free software ffmpeg (Bellard, F., 2016). The main features of the web service were uploading the videos, choosing the background color for removal, executing all five steps of the above-mentioned workflow and finally providing a download link for the videos. This solution was platform-independent and no further software was needed (Dastageeri et al., 2016). However, processing large video files was time-consuming.
In a second approach we omitted the upload and download process by implementing stand-alone desktop software for the Windows, macOS and Linux platforms. This increased the performance significantly and simplified the handling of the software. Previews of the output video show the results beforehand without executing the workflow. The results can be improved by adjusting the parameters for the background color and the outline of the protagonist. After installing this tool, no technical know-how or further specific software is necessary to shoot and edit AR videos. With this tool an evaluation became feasible and was carried out by students of the degree program Business Psychology.

Evaluation of the AR videos
In the context of a semester project in the summer term 2016 of the degree program Business Psychology at the University of Applied Sciences in Stuttgart, the concept of our approach was developed and evaluated. The semester project was coached by H. Dastageeri and M. Storz with regard to the technical aspects (Frommknecht et al., 2016).

Length of the AR videos:
To define the length of a video, it is crucial first to determine how long things that were heard or seen remain in memory. Furthermore, the information should remain conscious and available for a specific time. One relevant theory is that of the awareness margin. It states that the content of texts read or heard without deliberate attention remains conscious if the text is no longer than ten seconds or 40 syllables (University Paderborn, 2005).
Specifically for videos, another study (learn2use, 2016) showed the following characteristics of viewers of online videos:
1. 10% of viewers watched the video for only 10 seconds
2. 80% kept watching for longer than 20 seconds
3. Fewer than 50% kept watching for longer than 60 seconds
Moreover, videos with a total length of 30 seconds are reportedly watched by 85% of viewers, whereas with a total length of two minutes only 50% kept watching.
Specifically for online marketing videos, the Stuttgart Media University (2011) found 18 seconds to be the optimal length. Hornung (2016) came to a different result. He also considered the purpose of the video and drew two conclusions regarding its length:
1. As short as possible
2. It depends on the video type
Tutorial videos should have a length of 45 to a maximum of 90 seconds. This time span is sufficient to make a product interesting and to present some of its important features.
Commercial videos, by contrast, should have a length of 15 to a maximum of 59 seconds.
Frommknecht et al. (2016) concluded in their work that for AR videos the first aim is to create awareness and attention. Such videos try to win the consciousness and concentration of the user and, above all, to raise curiosity. For this type of video, a length of 30 to 50 seconds is recommended. The user can then decide whether or not to watch further videos. Frommknecht et al. (2016) considered these further videos to be tutorial videos in the sense of Hornung (2016) and recommended a length of 45 to 90 seconds. If the user shows further interest, the next video on the same POI should have a length of 90 to 120 seconds.

ARML use case
In the context of the research project SPIRIT of the University of Applied Sciences Stuttgart and the RheinMain University of Applied Sciences, a prototype was implemented. The project aims at an interactive way of communicating information on the basis of mobile AR, for and together with the reconstructed Roman fort and archaeological museum Saalburg. Its visitors will be guided through an edutainment experience using our video-based and location-based AR app for the Android platform. In this respect the following aspects had to be taken into consideration. Saalburg belongs to the UNESCO World Cultural Heritage; hence no markers can be placed, and there is no or only a very weak mobile network available. A further requirement is that the technical solution should be suitable for general use in museums and exhibitions.
To fulfill the requirements adequately, we had to separate the aspects strictly. Therefore, we implemented a framework for mobile devices that is decoupled from its content. That way the technical team can work and test independently of the content. At the same time the design team can generate, specify and test the content autonomously. We use OGC ARML 2.0 (Open Geospatial Consortium, 2015) as the link between the teams. The framework is implemented in Java using the Qualcomm Vuforia library and the free open-source libraries OpenCV and libGDX. We extended the framework with an XML Pull Parser and created an ARML file containing a feature that includes, amongst others, name, location, picture and video. The following ARML file has been created according to the ARML 2.0 XML grammar.
Source code 1: ARML 2.0 example
If the user reaches a certain GPS location marked by the tag <pos> and the camera view of the smartphone matches the reference picture "Picture of the HFT Building 1", the video "Main Entrance_2.mp4" is played automatically. The predefined video then overlays the real-time camera view on the smartphone display. Any further changes of the content can be made without recompiling the source code or reinstalling the .apk file of the Android app. The new content can be set up simply by replacing the ARML file. Thus the design team is much more flexible, which speeds up the development process.
This approach enabled test-driven development. Together with a classical author, specific use cases were created for the Roman fort Saalburg. Based on the tests, missing features of ARML were specified and then added and tested iteratively. Currently the extended ARML parser handles 19 additional tags, such as <activeArea>, which defines the radius of <gml:Point>.
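How a client can read the name and the <pos> coordinates of a feature from an ARML file is sketched below. The embedded fragment is a simplified, hypothetical example and not the authors' listing: the identifiers and the coordinates are placeholders, the namespace URIs are assumed to be those of ARML 2.0 and GML 3.2, and the reference picture, the video and the extended tags such as <activeArea> are omitted. The sketch uses the Python standard library instead of the XML Pull Parser of the Java framework.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URIs for ARML 2.0 and GML 3.2.
NS = {
    "arml": "http://www.opengis.net/arml/2.0",
    "gml": "http://www.opengis.net/gml/3.2",
}

# Hypothetical, simplified fragment; id values and coordinates are placeholders.
EXAMPLE_ARML = """
<arml xmlns="http://www.opengis.net/arml/2.0"
      xmlns:gml="http://www.opengis.net/gml/3.2">
  <ARElements>
    <Feature id="hft_building_1">
      <name>HFT Building 1</name>
      <anchors>
        <Geometry>
          <gml:Point gml:id="point_1">
            <gml:pos>48.7800 9.1730</gml:pos>
          </gml:Point>
        </Geometry>
      </anchors>
    </Feature>
  </ARElements>
</arml>
"""


def parse_features(xml_text):
    """Return a list of dicts with name, lat and lon for every Feature."""
    root = ET.fromstring(xml_text)
    features = []
    for feature in root.iter("{%s}Feature" % NS["arml"]):
        name = feature.find("arml:name", NS)
        pos = feature.find(".//gml:pos", NS)
        if name is None or pos is None:
            continue  # skip features without a name or a position
        lat, lon = (float(v) for v in pos.text.split())
        features.append({"name": name.text, "lat": lat, "lon": lon})
    return features
```

Calling `parse_features(EXAMPLE_ARML)` yields one entry for "HFT Building 1". Because the content lives entirely in the file, replacing the file changes the POIs without touching the code.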

CONCLUSION
AR offers new ways of familiarizing people with their nearby environment. It has the potential to communicate information in a multimedia-based way. However, it is crucial to use it appropriately. Without immediate feedback, or with drawbacks such as a long scanning process during image tracking, a good user experience is not achieved. It is also particularly important which information is placed where. First the user needs summarized information on the POIs. If desired, more information should be available. The AR element may start with a short teaser that describes the main content of the POI and introduces further available content that might be of interest regarding the POI.
Furthermore, the user interface has to consider the situation of mobile users. Transferring the design guidelines or experience of desktop computers is not helpful. The user interface has to be adapted to the needs and demands of pedestrians. Especially the outdoor environment should be considered.
Future implementations may adapt the display brightness to the environment by using the camera, or the sound volume to the background noise. The aim should be to automate as much as possible and to relieve the user.
For an easy and fast exchange of content the ARML standard proved to be very helpful.For linear and non-linear tours an extension of the standard ARML might be beneficial.Based on our experience we aim to propose an extended ARML standard helping to design linear and non-linear tours in the future.

Figure 4. Demonstration of the approach during the exhibition of study paths at the University of Applied Sciences Stuttgart

2. The AR software autonomously finds the position in the real world where the digital information should be shown.
3. The information is displayed according to the local position and orientation of the user in the physical world.
4. AR is an interactive experience: a person can notice a piece of information and then change the information to the desired one. The level of interaction may vary from a simple change of orientation up to influencing or even generating new information.
Alan (2013) compiles AR consisting of six ingredients: 1

Table 1 shows an excerpt of the measured values for a varied number of keypoints. For this time measurement the reference pictures had a resolution of 395x175 pixels and the camera view 400x300 pixels. We used for the test a
Table 1. Excerpt of the test results of the time measurement by increasing keypoints

The methods needed for showing videos on textures are provided from Android 4.0.3 (min. API level 15) onwards. Further requirements are the support of OpenGL ES 2.0, sensors (compass, acceleration and position sensors, Bluetooth and GPS receivers), a camera and the necessary codecs for showing videos and playing audio. The high degree of device fragmentation may lengthen the time needed to develop the software, but problems occur only in areas of programming close to the hardware level, such as OpenGL, or sporadically.