CLASSIFICATION OF BUILDING FUNCTION USING AVAILABLE SOURCES OF VGI

This paper examines the feasibility of using data from OpenStreetMap (OSM), Facebook and Foursquare as a source of information on the function of buildings. Such information is rarely openly available and, where it is available, its nomenclature varies between cities, making comparisons between places difficult. Volunteered Geographic Information (VGI), including data from social media, represents a new potential source of building function data that has not yet been exploited for this purpose. Using a part of the city of Milan as the study area, building data from OSM and points of interest (POIs) from OSM, Facebook and Foursquare were extracted to derive the building function. This resulted in the classification of building function for more than 80% of the buildings and demonstrated that both Facebook and Foursquare can complement the building function derived from OSM, helping to fill gaps. This preliminary study has demonstrated the potential of this approach for deriving building function information from open data in a simple way, but it still requires independent validation with alternative sources as well as extension to other areas that have different amounts of OSM and social media coverage.


INTRODUCTION
Information on the function of buildings, i.e. whether a building is commercial, residential, industrial, mixed use, etc., is sometimes recorded by local authorities or by national mapping agencies. However, such data are rarely openly available, yet they are highly valuable for a number of different applications, including the determination of energy demand (Caputo et al., 2013), the evaluation of contributions to greenhouse gas emissions (Lucon et al., 2014) and the provision of inputs to urban climate and energy balance modelling (Tornay et al., 2017). Furthermore, data on building function can be used in risk assessments of exposure, urban planning applications, smart cities development and the efficient allocation of resources.
In lieu of actual data from authoritative databases, we can turn to Volunteered Geographic Information (VGI) (Goodchild, 2007) as a potential source of building function data. OpenStreetMap (OSM), which is one of the most popular and successful examples of VGI (Mooney and Minghini, 2017), contains abundant information on buildings, including tags that can be used to derive the building function. OSM also contains Points of Interest (POIs), which can be used to complement the building information found in OSM. In addition to OSM, passive or ambient VGI from social media can be mined for information on building function. For example, Facebook has many pages devoted to places that contain a geographic location. Foursquare is a mobile application that connects people with geographic locations, including buildings that have specific functions such as a restaurant or cafe. Together these different sources can be integrated to produce a building function data set, which has not been previously investigated. Moreover, such an approach could prove valuable in cities where databases on building function do not currently exist.
Hence this paper aims to demonstrate how data extracted from different sources of VGI can be combined to produce a data set on building function. This approach is applied to an area within the city of Milan. After brief descriptions of the main VGI sources used in this study, i.e. OSM, Facebook and Foursquare, the main workflow is outlined, from data extraction to integration into a single building function database. The results are then presented showing the contribution of each VGI source in mapping building function for the study area. Finally, limitations are outlined followed by plans for further work in this area.

STUDY AREA AND VGI DATA SETS
The area selected for the study is a 2km x 2km section corresponding to the city center of Milan, Italy (see Figure 1). The area is characterized by a high density of buildings, including commercial activities such as shops and company offices, accommodation, public buildings and government offices, churches, schools and universities, residential buildings and even a castle (Sforzesco Castle). To automatically classify the function of these buildings, three VGI sources were exploited: OSM, Facebook and Foursquare. The following subsections describe each of these sources in more detail.

OpenStreetMap
OpenStreetMap (OSM, https://www.openstreetmap.org) is the most popular VGI project to date. Started in 2004 as a crowdsourcing project for road mapping, its focus was then extended to any physical object located on the Earth's surface. As a result, OSM is currently the largest, most diverse, most complete and most up-to-date geospatial database of the world. Due to the richness of its content and the free and open access granted by its license, OSM is frequently studied by researchers and used heavily in cartographic, business, educational, governmental, humanitarian and leisure applications (Mooney and Minghini, 2017). The OSM database consists of vector features, i.e. geometric data types and related attributes. Geometric data types consist of nodes, which are used to describe objects that are points; ways, used to describe both linear and polygon objects; and relations, which describe relationships between two or more nodes, ways and/or other relations (https://wiki.openstreetmap.org/wiki/Elements). The attributes associated with the geometric data types are called tags and consist of the combination of a key (defining the object property) and a value (specifying the object value for that property). Each OSM object must have at least one tag, but there is no limit to the number of tags an object can have (https://wiki.openstreetmap.org/wiki/Tags). The OSM mapping guidelines, which list the tags to be used for any specific object, have been agreed upon over the years by the OSM community and are maintained on the Map Features wiki page (https://wiki.openstreetmap.org/wiki/Map_Features). The evolution of these guidelines over time has been recently studied from a data quality perspective (Antoniou and Skopeliti, 2017). Statistics on the current usage of each OSM tag are provided by the popular Taginfo service (https://taginfo.openstreetmap.org).

Facebook
Facebook (http://facebook.com) is a digital platform that allows its users to communicate with each other and share information in multiple formats, including text, images and video, related to any topic of human activity. This massive social network has millions of users spread across all continents. Thus, it provides an enormous amount of information that can be used in multiple applications. Among its diverse uses, Facebook can help to identify physical entities, such as buildings with some type of use. Facebook also provides the explicit location (with coordinates) of these entities as well as other useful information, such as functionality, services provided, profile of the users or the quality of the services. Facebook provides a Graph API (https://developers.facebook.com/docs/graph-api), which offers tools for searching and downloading data. This digital social network is organized as a graph: "nodes" are individual objects, such as a user, a photograph, a publication or a comment; and "arcs" are links associating a user with a publication, photograph or place. This makes the search for Facebook data challenging because it forces the use of stratified search strategies. For example, if the goal is to identify the user ratings of restaurant services located in a particular geographical area, the first step is to identify and locate each service. Only after that is it possible to establish, through a further query, a link between those nodes and other elements with relevant information (e.g. publications or comments).

Foursquare
Launched in 2009, Foursquare (https://foursquare.com) is a mobile app offering location-based services to record consumer experiences and business solutions. Until 2014, it was primarily a social networking tool through which users could share their current location (e.g. a restaurant, a museum or a shop) with friends through the so-called "check-in" function. In 2014, the focus of the Foursquare app changed to become primarily a local search and discovery tool (https://en.wikipedia.org/wiki/Foursquare). Using a proprietary technology to detect the location of users, Foursquare allows them to search for places of interest in their surroundings and displays personalized recommendations based on the time of day, user check-in history, their tastes and previous ratings. Users can write tips, i.e. short messages about a location, which can also include a photograph. Tips can be liked and saved by any user. A user writing quality tips, i.e. with several likes and saves, is awarded with experience points, which makes their content more prominent. Foursquare allows users to specify their tastes, in particular food items, styles of cuisine or environmental aspects; these are matched to the available tips at surrounding venues to provide customized searches and recommend places that match the user tastes. Finally, Foursquare allows users to rate places of interest by answering questions. In addition to determining the popularity of a venue, ratings are also used to get complete information about a venue (e.g. whether a restaurant takes credit cards). Foursquare, which currently includes 105 million venues around the world, is one of the most studied examples of location-based services that have adopted a gamification approach (Frith, 2013).

METHODOLOGY
The target classes selected for building functions are the following: residential, commercial, industrial, educational, healthcare, cultural, entertainment, religious, accommodation, safety authorities, agricultural, civic amenities/government offices, transportation, non-governmental/non-profit, and military. The workflow adopted to associate one of these classes with the buildings in Milan first extracts the building data (i.e. polygons) from OSM and the POIs (i.e. points) from OSM, Facebook and Foursquare. The OSM buildings and the OSM, Facebook and Foursquare POIs are then separately mapped onto the building function classification. Figure 2 shows the workflow used to associate each POI with the corresponding building, which requires checking whether the POI is inside a building or not; if it is outside but within a maximum distance (5m) of a building, it is assigned to the closest building. Finally, the classification results are computed for each single VGI source and for their combination. Each of these steps is discussed in more detail in the following subsections.
Figure 2. Workflow used to assign the POI function to the buildings

Extraction and classification of OSM buildings
As mentioned above, both buildings and POIs were extracted from OSM. The extraction was performed in July 2018 using the HOT Export Tool (https://export.hotosm.org/en/v3), which relies on a version of the OSM database updated every few minutes and allows users to perform customized downloads in terms of the object tags. The main OSM key defining the building function, which is attached to the polygon outlining the building (a "way" in the OSM data model), is building; it can assume all the values listed in https://wiki.openstreetmap.org/wiki/Key:building. Some values of the building key do not specify any building function; the most frequently used value is yes, which simply indicates the presence of a building and is typically adopted when the building is digitized from satellite imagery, which usually makes it impossible to define its function. Table 1 shows the values of the building key associated with the selected classes of building function.
In some cases, the tag building=yes is combined with another tag that can reveal its function, e.g. amenity=restaurant to indicate that the whole building is used as a restaurant. These additional tags, not included in Table 1 and mainly corresponding to the keys amenity, shop, tourism, office, historic and man_made, were also considered for deriving the building function.
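The tag-based rule described above can be sketched as a two-stage lookup. This is a minimal illustration only: Table 1 is not reproduced here, so the value-to-class pairs below are assumptions chosen for the example, and the names BUILDING_VALUE_TO_CLASS, SECONDARY_TAG_TO_CLASS and classify_building are hypothetical.

```python
from typing import Optional

# Illustrative subset only: the paper's Table 1 is not reproduced here,
# so these value-to-class pairs are assumptions made for this sketch.
BUILDING_VALUE_TO_CLASS = {
    "apartments": "residential",
    "house": "residential",
    "church": "religious",
    "school": "educational",
    "hospital": "healthcare",
    "retail": "commercial",
}

# Hypothetical fallback rules for the secondary tags, keyed by (key, value).
SECONDARY_TAG_TO_CLASS = {
    ("amenity", "restaurant"): "commercial",
    ("tourism", "hotel"): "accommodation",
    ("tourism", "museum"): "cultural",
}

# Secondary keys named in the text, checked in this (assumed) order.
SECONDARY_KEYS = ("amenity", "shop", "tourism", "office", "historic", "man_made")


def classify_building(tags: dict) -> Optional[str]:
    """Return a building function class, or None if the tags reveal none."""
    # Stage 1: a specific value of the building key decides the class.
    value = tags.get("building")
    if value in BUILDING_VALUE_TO_CLASS:
        return BUILDING_VALUE_TO_CLASS[value]
    # Stage 2: building=yes (or an unmapped value) falls back to other tags.
    for key in SECONDARY_KEYS:
        function = SECONDARY_TAG_TO_CLASS.get((key, tags.get(key)))
        if function is not None:
            return function
    return None
```

With these example rules, a polygon tagged only building=yes stays unclassified, while building=yes combined with amenity=restaurant is classified as commercial.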

Extraction and classification of OSM POIs
The OSM POIs (consisting of nodes in the OSM data model) were also extracted in July 2018 using the HOT Export Tool.

Extraction and classification of Facebook POIs
The extraction of POIs from Facebook was accomplished using the "Place Search" endpoint of the Facebook Graph API and the GeoPandas Python package. Given an area of interest (defined by a central point and a maximum distance from it in meters), the "Place Search" was used to retrieve the location of the Facebook pages of type Place and some of their attributes given by the API, including name, category list, about and description. To use this tool, the search area needs to be a circle. As the study area is not circular, a buffer was defined around a point located in the center of the region under analysis, ensuring that the buffer contains the entire study area. Only one request was required to extract all available data within the search area limits. The data available in the extracted attributes were used to assign one of the target building function classes to each POI, using keywords. For example, store, shop or restaurant were assigned to the class commercial, while ballroom or club would be assigned to the class entertainment.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-4, 2018 ISPRS TC IV Mid-term Symposium "3D Spatial Information Science -The Engine of Change", 1-5 October 2018, Delft, The Netherlands This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-4-209-2018 | © Authors 2018. CC BY 4.0 License.
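The two steps described for Facebook, i.e. choosing a circular search area that covers the study area and assigning classes by keyword, can be sketched as follows. This is a hedged illustration, not the authors' implementation: only the five keywords quoted in the text are used, the order in which the fields are scanned is an assumption, the category list is simplified to a list of strings, and the function names are hypothetical. No call to the Graph API is made.

```python
import math
import re
from typing import Optional

# Keywords quoted in the text; the full keyword list is not given there.
KEYWORDS_TO_CLASS = {
    "store": "commercial",
    "shop": "commercial",
    "restaurant": "commercial",
    "ballroom": "entertainment",
    "club": "entertainment",
}

# Assumed scanning order of the extracted attributes.
FIELDS = ("category_list", "name", "about", "description")


def enclosing_radius_m(width_m: float, height_m: float) -> float:
    """Smallest radius of a circle, centred on a rectangle, that covers it."""
    return math.hypot(width_m, height_m) / 2.0


def classify_place(place: dict) -> Optional[str]:
    """Return the class of the first whole-word keyword match, else None."""
    for field in FIELDS:
        value = place.get(field) or ""
        # category_list is simplified here to a list of plain strings.
        text = " ".join(value) if isinstance(value, list) else value
        for token in re.findall(r"[a-z]+", text.lower()):
            if token in KEYWORDS_TO_CLASS:
                return KEYWORDS_TO_CLASS[token]
    return None
```

For the 2km x 2km study area, enclosing_radius_m(2000, 2000) gives the half-diagonal of the square, about 1414 m, which is the smallest search distance that still covers every corner. The radius actually used in the study is not stated.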

Extraction and classification of Foursquare POIs
The Foursquare POIs were also extracted in July 2018 using the Personal/Non-Commercial Places API provided for free by Foursquare (https://developer.foursquare.com/places-api). The "Search for Venues" request was used to retrieve the points. Due to the API limitations, a maximum of 50 POIs (i.e. Foursquare venues) could be returned in each request. Thus, given the bounding box of the study area, several requests were submitted for different locations inside the study area. The requests were densely allocated so that no venue was omitted. This process resulted in multiple entries for some venues, and the duplicates had to be removed before the analysis. Although the API provides rich information for each venue, the focus here was limited to their geometry, name and classification. The latter can be derived from the content of the category field. Again, some values occurring in this field are not related to buildings and were thus excluded from the analysis, e.g. park, road and scenic lookout. The values found in this field were then mapped to the target classes for building functions selected in this study. The full mapping is not included here due to space limitations, but some examples are the classification of the categories restaurant, bar or shop as commercial, Art Museum as cultural and church as religious.
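The request strategy described above can be sketched as a regular grid of request locations followed by de-duplication. The sketch makes several assumptions: the grid layout and its density are illustrative (the paper does not state how the request locations were chosen), each venue is represented as a dictionary with a unique "id" field, and the function names are hypothetical. The API itself is not called.

```python
def request_centers(south, west, north, east, rows, cols):
    """Centres (lat, lon) of a rows x cols grid of cells covering the box."""
    dlat = (north - south) / rows
    dlon = (east - west) / cols
    return [
        (south + (i + 0.5) * dlat, west + (j + 0.5) * dlon)
        for i in range(rows)
        for j in range(cols)
    ]


def is_saturated(venues, limit=50):
    """True if a response hit the per-request cap, so venues may be missing."""
    return len(venues) >= limit


def deduplicate(venues):
    """Keep the first occurrence of each venue, identified by its id."""
    unique = {}
    for venue in venues:
        unique.setdefault(venue["id"], venue)
    return list(unique.values())
```

One possible way to check that the requests are dense enough is to refine the grid wherever is_saturated reports that a response reached the 50-venue cap; this check is an assumption of the sketch rather than a step reported in the study.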

Association of POIs with buildings
To assign the POI characteristics to the building function classes, it is necessary to establish a relation between the POIs and the building polygons. This was done using the spatial location of the POIs. If a POI was located inside the polygon of a building, it was considered that its characteristics could be associated with the building. However, in many situations, volunteers do not place a POI inside the building polygon, but close to the building entrance instead; this highlights access to what the POI refers to, such as a particular shop. Therefore, a proximity analysis was additionally used to associate the POIs with the buildings. For the purposes of this study, if a POI was within a 5m buffer of a particular building (and not located inside any other building), the information from that POI was associated with that building. Tests were made considering different distances. The 5m value was used in the analysis because it proved to be a good compromise between including POIs that are, in fact, not associated with the building and discarding useful POIs that are further away from buildings.
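The containment and proximity rule can be illustrated with a dependency-free sketch. It assumes that all coordinates are already in a projected coordinate system with units of meters and that building polygons are simple rings without holes; in practice a GIS library would normally be used instead. The function names are hypothetical and the code is not the authors' implementation.

```python
import math


def point_in_polygon(pt, poly):
    """Ray-casting test for a point against a simple polygon (vertex list)."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Count edges that a horizontal ray to the right of pt crosses.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(pt, a, b):
    """Euclidean distance from pt to the segment a-b."""
    px, py = pt
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Project pt onto the segment and clamp to its end points.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_polygon(pt, poly):
    """Distance from pt to the polygon boundary."""
    n = len(poly)
    return min(distance_to_segment(pt, poly[i], poly[(i + 1) % n]) for i in range(n))


def assign_poi(pt, buildings, max_dist=5.0):
    """Return (building_id, 'inside' | 'near') or (None, 'unassigned')."""
    # Containment is checked for all buildings before any proximity test,
    # so a POI inside one building is never attached to a neighbour.
    for building_id, poly in buildings.items():
        if point_in_polygon(pt, poly):
            return building_id, "inside"
    best_id, best_dist = None, float("inf")
    for building_id, poly in buildings.items():
        d = distance_to_polygon(pt, poly)
        if d <= max_dist and d < best_dist:
            best_id, best_dist = building_id, d
    if best_id is None:
        return None, "unassigned"
    return best_id, "near"
```

Changing max_dist reproduces the kind of sensitivity test mentioned in the text: a larger value attaches more POIs to buildings, at the cost of more doubtful associations.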
This approach for associating POIs with the buildings will, in many cases, associate several POIs with one building. As these POIs may have different functions, an additional building function category called multiple was added to the analysis. It is assigned to buildings that are either associated with several POIs with different functions or associated with POIs whose functions differ from the function extracted from the building tag.
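The rule for the multiple category can be written as a small function over the set of distinct functions found for a building. This is a sketch under the assumption that unclassified inputs are represented by None (for the building tag) or simply omitted (for the POIs); the function name is hypothetical.

```python
def building_function(tag_function, poi_functions):
    """Combine the tag-derived function with the functions of associated POIs.

    Returns None when nothing is known, the single function when all
    sources agree, and 'multiple' when two or more functions are present.
    """
    functions = set(poi_functions)
    if tag_function is not None:
        functions.add(tag_function)
    if not functions:
        return None
    if len(functions) == 1:
        return next(iter(functions))
    return "multiple"
```

For example, a building tagged as residential with one commercial POI inside it is labelled multiple, whereas a building with three commercial POIs and no informative tag is labelled commercial.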

RESULTS

Extraction and classification of OSM buildings
In total, 2902 buildings were extracted for the area depicted in Figure 1. In a previous study, a comparison against the authoritative building data set showed that the OSM buildings located in the city center of Milan are characterized by a very high completeness and positional accuracy (Brovelli et al., 2016). The classification of these buildings into the selected target functions was performed according to the matching rules of Table 1 and is displayed in Figure 3. Using only the tags associated with the OSM building polygons, 1249 buildings (43% of the total) could be associated with a specific class of building function. The remaining 1653 buildings (57% of the total), represented in gray in Figure 3, remained unclassified. The fractions of classified buildings associated with the target functions are the following: residential (75%), commercial (15%), religious (4%), educational (2%), healthcare (1%) and civic amenities/government offices (1%). Accommodation, cultural, industrial and transportation appear with percentages lower than 1%, while no buildings with entertainment, safety authorities, agricultural, non-governmental/non-profit and military functions were found.

Extraction and classification of POIs

OSM POIs:
2197 POIs were extracted from OSM using the methodology outlined above under "Extraction and classification of OSM POIs". The tags of 2193 POIs could be associated with a building function, while only 4 remained unclassified, corresponding to the tag historic=ruins, which describes the remains of buildings fallen into partial or complete disrepair. The percentages of classified POIs associated with the target functions are the following: commercial (86%), civic amenities/government offices (6%), accommodation (3%), cultural (3%) and educational (1%). The healthcare, entertainment, religious, safety authorities and non-governmental/non-profit functions appear with percentages lower than 1%, while no POIs were found with residential, industrial, agricultural, transportation and military functions. Of these, 90% were located inside buildings (see Table 2) and 6% within 5m of buildings. The remaining 4% were further away from buildings and were, therefore, not considered in this analysis.
Several of the extracted POIs, some of them with different classifications in terms of function, are located inside the same building. Figure 4 shows the number of POIs extracted from OSM located inside each building, while Figure 5 shows the number of different functions that can be associated with the building using the OSM POIs that are located inside the building. Even though some buildings have many POIs (with a maximum of 30 POIs inside one building), the number of different functions obtained from this source is in most cases only one, and only a few buildings have three different function categories.

Facebook POIs:
For the POIs extracted from Facebook (see Table 2), only three could not be associated with a building function. One corresponds to the city center of Milan, another to a "Plaza" and the other to a "historical place". The other POIs, which are associated with the target building functions considered here, are split between commercial (50%), cultural (14%), entertainment (13%), religious (5%), non-governmental/non-profit (5%), educational (4%), civic amenities/government offices (4%) and the remaining functions (2%). Of the extracted POIs, only 64% were associated with buildings (52% inside and 12% in the 5m vicinity) (Table 2).

Foursquare POIs:
From Foursquare, 312 POIs were extracted (see Table 2). Of these, 11% could not be associated with any building function as they provided data about the location of plazas, gardens or parks. For the POIs with associated building functions, the great majority have the function commercial (85%), followed by accommodation (8%), cultural (5%) and entertainment (2%). No other functions were found in the Foursquare POIs, except for religious, with only one POI (0.4%). Only 53% of the POIs extracted from Foursquare were inside buildings (29%) or in their 5m vicinity (24%), and therefore only these were used in this study.

Association of POIs with buildings
Figure 6 shows the building function classification obtained with the POIs extracted from OSM (inside and in the 5m proximity). It can be seen that many buildings (1019, corresponding to 35% of the buildings in the study area) have at least one function assigned. Figure 7 shows the building classification obtained with the POIs extracted from Facebook. It can be seen that only 32 buildings were classified (1% of the buildings in the study area). However, a few buildings that did not have an assigned function from OSM have now been assigned to one or multiple functions. Figure 8 shows the building classification obtained with the POIs extracted from Foursquare. These POIs allowed functions to be assigned to 163 buildings (6% of the buildings in the study area). It can be seen that multiple functions are associated with some of the buildings. Figure 9 shows all of the buildings in the study area that were assigned a function using either the OSM building tags or the POIs

CONCLUSIONS
This study has shown how different sources of VGI, including data derived from social media, can be used to classify buildings according to their function. More than 80% of the buildings in the study area could be classified using a detailed set of building function classes. This information is rarely openly available yet has many potential applications. Such an approach provides a simple yet powerful way of extracting this information.
Most of the data about building function was extracted from OSM, either from the building tags or the POIs. However, the addition of data obtained from Facebook and Foursquare allowed functions to be assigned to some buildings that were not classified with OSM data, filling in some gaps and demonstrating the complementarity of these different data sources. In contrast to the OSM building tags, most POIs identified buildings with some kind of commercial activity. Moreover, POIs do not provide data on residential buildings, so OSM building tags are the only source of information for this type of function. We also acknowledge that the building function characterization considers activities that can be associated with parts of a building, e.g. the ground floor, and does not currently allow the specification of different building functions per building section, e.g. commercial shops on the ground floor with residential units above, as no information about the building height or the separation of use within the building is considered.
At present this approach has not been validated using an independent data source. In the future, an accuracy assessment will be undertaken using data from authoritative databases or other independent sources. The latter could include photographs from online repositories such as Panoramio and Flickr, Google StreetView, Mapillary or field visits. In addition to validation, these latter sources of information could also complement or enrich the analysis, e.g. by providing missing data on building function.
In the future we will also map the buildings with multiple functions to each of the functions actually assigned to them, and assign a degree of confidence to each building and function. This value would be based on the number of data sources that produced the classification relative to all data sources used. The more data sources that provide the same information, the higher the confidence. This could also help to determine if the building really has multiple functions or if a single function dominates.
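One possible reading of this confidence value is the share of data sources that report a given function for a building. The sketch below is an assumption about how such a measure could be computed, not a method defined in this study; in particular, treating the OSM building tags and the OSM POIs as two separate sources is a choice made only for the example, and the function name is hypothetical.

```python
def function_confidence(functions_by_source):
    """Share of sources reporting each function for one building.

    functions_by_source maps a source name to the set of functions that
    source assigned to the building (an empty set if it assigned none).
    """
    total = len(functions_by_source)
    counts = {}
    for functions in functions_by_source.values():
        # Each source counts at most once per function.
        for function in set(functions):
            counts[function] = counts.get(function, 0) + 1
    return {function: count / total for function, count in counts.items()}
```

With four sources, a function reported by two of them receives a confidence of 0.5, while a function reported by only one receives 0.25, which would suggest which function dominates in a building labelled multiple.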
Finally, we plan to apply this procedure to other cities in order to assess the usefulness of each source of data in other countries and regions that have different characteristics, e.g. different coverage and tagging behavior in OSM or greater use of social media in documenting places.