ANALYSING RELATIONSHIPS BETWEEN URBAN LAND USE FRAGMENTATION METRICS AND SOCIO-ECONOMIC VARIABLES

Analysing urban regions is essential for their correct monitoring and planning. This is mainly due to the sharp increase in the number of people living in urban areas and the consequent need to manage them. At the same time there has been a rise in the use of spatial and statistical datasets, such as the Urban Atlas, which offers high-resolution urban land use maps obtained from satellite imagery, and the Urban Audit, which provides statistics on European cities and their surroundings. In this study, we analyse the relations between urban fragmentation metrics derived from Land Use and Land Cover (LULC) data from the Urban Atlas dataset, and socio-economic data from the Urban Audit, for the reference years 2006 and 2012. We conducted the analysis on a sample of sixty-eight Functional Urban Areas (FUAs). One-date and two-date based fragmentation indices were computed for each FUA, land use class and date. Correlation tests and principal component analysis were then applied to select the most representative indices. Finally, multiple regression models were tested to explore the prediction of socio-economic variables, using different combinations of land use metrics as explanatory variables, both at a given date and in a dynamic context. The outcomes show that demography, living conditions, labour, and transportation variables have a clear relation with the morphology of the FUAs. This methodology allows us to compare European FUAs in terms of the spatial distribution of the land use classes, their complexity, and their structural changes, as well as to preview and model different growth patterns and socio-economic indicators.


INTRODUCTION
Currently, nearly 75% of the population in Europe lives in urban areas (EEA, 2013a). Urban population and urban land growth are ongoing processes that are expected to continue in the years to come (Ribeiro Barranco et al., 2014). The urbanization process has impacts beyond city boundaries, and managing urban and periurban areas is therefore now essential (EEA, 2013a). Land use planning is a political challenge that must reconcile urban land use with environmental issues in order to prevent impacts on quality of life (EEA, 2013b; Kompil et al., 2015). As European institutions and national and local authorities establish policies, knowing their effects in advance is becoming a major factor (Kompil et al., 2015).
Many recent studies have analysed urban spatial development in European cities (Uuemaa et al., 2013). Several official institutions have published reports that analyse in detail the current situation of landscape fragmentation, soil sealing and urban growth in Europe, and how to measure, understand and mitigate it (COM, 2012; Jaeger et al., 2011). The comparison of cities across European countries is gaining attention in recent studies, which categorise cities according to their urban form and sprawl (Arribas-Bel et al., 2011; Schwarz, 2010), their development, and their population indicators (Haase et al., 2013; Ribeiro Barranco et al., 2014). These authors point out that cities are growing faster than their populations are increasing. Other studies model urban development and economic and demographic change in an attempt to predict future development scenarios and to further the understanding of the potential growth of cities (Kompil et al., 2015; Uuemaa et al., 2013; Wissen Hayek et al., 2015).

Land use fragmentation is a spatial segregation process that derives mainly from human and socio-economic activities. Its study has become essential to safeguard quality of life and prevent ecological impacts (Wei and Zhang, 2012). Urban fragmentation reflects the way in which urban areas spread towards rural areas (Angel et al., 2010), and its quantification reveals urban expansion processes. Landscape metrics and socio-economic variables are widely used to measure urban configuration and growth (Arribas-Bel et al., 2011; Dewan et al., 2012; Jiao et al., 2015; Schwarz, 2010). Accordingly, a wide variety of tools is available to compute such metrics (MacLean and Congalton, 2015). Only a few of them focus on urban studies and work with vector datasets, although such tools are required for the ongoing development of multi-temporal databases of land use and land cover (LULC) data in this format (Sapena and Ruiz, 2015a).
Copernicus is the European earth observation programme, which, by means of remote sensing techniques, provides reliable up-to-date land information. The European Environment Agency (EEA) coordinates it at the local scale; its first implementation was the Urban Atlas (UA). UA provides two-date, detailed and harmonised LULC maps (scale 1:10,000) for large EU Functional Urban Areas (FUAs). UA data for the reference year 2006 are available for 305 FUAs with more than 100,000 inhabitants, while the UA 2012 consists of an update and extension of the former to include 697 FUAs with more than 50,000 inhabitants. The UA 2012 has been partially validated and much of its data is now accessible. It is expected to be completed by the end of 2016 (Copernicus, 2010). For its part, the Urban Audit project represents the statistical counterpart of the UA cartography (Eurostat, 2015). The National Statistical Institutes (NSI), the Directorate-General for Regional and Urban Policy (DG REGIO) and Eurostat provide the statistical data, on a voluntary basis, for many European cities and FUAs. The Urban Audit project includes a set of variables covering several aspects of quality of life (demography, housing, health, etc.) on different dates for ease of data comparison. The ultimate aim of this initiative is to contribute to improving the quality of urban life. These two multi-temporal datasets are considered complementary, and are valuable instruments for monitoring urban planning policies across Europe.
Fragmentation is considered to be a powerful instrument to quantify the geographical efficiency of the urban expansion process. It is a multidimensional term capable of revealing urban spatial patterns (Inostroza et al., 2013; Irwin and Bockstael, 2007), thus enriching understanding of how and where urban areas are growing.
In this study, our objectives are: (1) to obtain an exhaustive set of urban fragmentation metrics derived from LULC data on two dates for a set of European FUAs; (2) to evaluate the relation between this set of metrics and socio-economic variables at a given date, and between their changes across the two dates; and (3) to explore the potential of predicting simple and composite socio-economic variables through multiple regression analyses. To this end, we condensed a large computed set of fragmentation metrics and statistics and then attempted to explain the behaviour of demographic and socio-economic variables by modelling them using a reduced set of metrics.

Datasets
As noted above, multi-temporal databases of LULC and statistical data make the study and monitoring of urban fragmentation more attainable. The UA and Urban Audit datasets deliver data at city and FUA spatial levels on different dates. The FUA consists of the city and its commuting zone. In this study we use FUA level data, since the border between urban and rural areas is becoming ambiguous and periurban areas are being built up faster than cities (EEA, 2013a). This level is therefore more suitable for our goals.
The study was conducted for the years 2006 and 2012 because of UA availability. In this period, some FUAs underwent boundary changes. For this reason, we use the revised version of UA 2006 (from the UA 2012 dataset), the boundaries of which have been revised and updated. However, these discrepancies need to be considered when matching the two datasets, since Urban Audit statistical data have not been updated to the new boundaries. Thus, depending on data availability, convenience sampling was performed, since data were lacking for some countries and variables. Only FUAs with at least 85% of common area on both dates and with available statistical data were considered. This sampling was performed using the statistical unit datasets from GISCO (Geographical Information System of the Commission).
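The selection rule above can be sketched as follows. This is only an illustration: the source does not say how the share of common area is measured, so the ratio used here (common area over the larger of the two extents) is an assumption, and the FUA codes, field names and areas are hypothetical.

```python
def common_area_share(area_2006, area_2012, area_common):
    """Share of common area, relative to the larger extent (assumed definition)."""
    return area_common / max(area_2006, area_2012)


def select_fuas(fuas, min_share=0.85):
    """Keep FUAs that meet the overlap threshold and have statistical data."""
    return [
        f["code"]
        for f in fuas
        if f["has_stats"]
        and common_area_share(f["a06"], f["a12"], f["common"]) >= min_share
    ]


# Hypothetical records (areas in km2); codes and values are made up.
fuas = [
    {"code": "XX001", "a06": 1000.0, "a12": 1000.0, "common": 1000.0, "has_stats": True},
    {"code": "XX002", "a06": 800.0, "a12": 1200.0, "common": 800.0, "has_stats": True},
    {"code": "XX003", "a06": 500.0, "a12": 520.0, "common": 495.0, "has_stats": False},
    {"code": "XX004", "a06": 900.0, "a12": 950.0, "common": 880.0, "has_stats": True},
]
selected = select_fuas(fuas)
```

In this toy input, XX002 fails the overlap threshold and XX003 lacks statistical data, so only XX001 and XX004 remain.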
Ultimately, subject to these restrictions and limitations, the analysis was conducted on a sample of sixty-eight European FUAs from twelve different countries (Figure 1).
Once the data had been downloaded, the legend was adapted. The initial twenty and twenty-seven land uses from UA 2006 and UA 2012 (Meirich, 2008) were aggregated into the following nine classes: residential, which represents urban fabric; commercial, also covering industrial areas, public buildings, ports and airports; green areas, representing urban areas of vegetation; leisure areas, including sport facilities; roads; barren land, meaning unused land, construction sites and mineral sites; and the remaining agricultural, forest, and water classes.
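A legend aggregation of this kind amounts to a lookup table. The sketch below is partial and illustrative: the source class names only approximate the Urban Atlas legend, and the assignments follow the description above rather than the authors' actual reclassification table, which is not given here. Classes not mentioned in the text (railways, for example) are deliberately left out.

```python
# Partial, illustrative lookup. Source class names approximate the UA legend;
# the assignments are inferred from the text, not taken from the authors' table.
AGGREGATION = {
    "Continuous urban fabric": "residential",
    "Discontinuous dense urban fabric": "residential",
    "Industrial, commercial, public, military and private units": "commercial",
    "Port areas": "commercial",
    "Airports": "commercial",
    "Green urban areas": "green areas",
    "Sports and leisure facilities": "leisure",
    "Other roads and associated land": "roads",
    "Land without current use": "barren land",
    "Construction sites": "barren land",
    "Mineral extraction and dump sites": "barren land",
    "Agricultural areas": "agricultural",
    "Forests": "forest",
    "Water bodies": "water",
}


def aggregate(ua_class):
    """Return the aggregated class, or None when the lookup does not cover it."""
    return AGGREGATION.get(ua_class)


aggregated_classes = sorted(set(AGGREGATION.values()))
```

Returning None for uncovered classes makes gaps in the lookup visible instead of silently assigning a default class.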

Land use fragmentation metrics
One-date and two-date urban fragmentation metrics were computed with the tool IndiFrag (Sapena and Ruiz, 2015a, 2015b). This tool calculates a variety of indices to assess urban fragmentation from LULC data in vector format at three hierarchical levels: object, class and super-object. In this case, the complete set of metrics was applied for each date at FUA (super-object) and land use (class) levels. Then, the difference between the fragmentation indices (FI) of the two dates was obtained, henceforth referred to as the fragmentation index change (FIC), together with the set of two-date based multi-temporal indices (MI) included in IndiFrag, designed to highlight growth patterns.
We thus produced two types of metrics: uni-temporal and bi-temporal (Figure 2).
The metrics collection is made up of twenty FI at FUA level, plus twenty more for each land use, for the reference years 2006 and 2012, their respective FIC and, separately, twelve MI for each class, plus one at FUA level. We attempted to summarise the initial two hundred uni-temporal metrics per date and three hundred and nine bi-temporal metrics for each FUA.
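The totals quoted above follow directly from the counts of indices and classes. The short check below uses only figures stated in the text; the variable names are ours.

```python
N_CLASSES = 9        # aggregated land use classes
FI_PER_LEVEL = 20    # fragmentation indices at FUA level and per class
MI_PER_CLASS = 12    # multi-temporal indices per class
MI_FUA = 1           # multi-temporal index at FUA level

# Uni-temporal metrics per date: FUA level plus every class.
uni_temporal = FI_PER_LEVEL * (1 + N_CLASSES)

# Bi-temporal metrics: one difference (FIC) per FI, plus the MI.
fic = uni_temporal
mi = MI_PER_CLASS * N_CLASSES + MI_FUA
bi_temporal = fic + mi
```

This gives 200 uni-temporal metrics per date and 200 + 109 = 309 bi-temporal metrics, matching the figures in the text.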

Selection of indices
Two statistical methods were applied to reduce the number of indices and the redundant information present in the initial set: correlation analysis and principal component analysis. The study was restricted to four urban land uses or classes, namely those most directly related to socio-economic variables: residential, commercial, green areas and leisure.

Correlation analysis:
Spearman's correlations between indices were analysed to detect redundant information. This statistical analysis has been widely used to discard highly correlated variables (Gong et al., 2013; Ren et al., 2013; Schwarz, 2010). Considering their level and date, one test per group of metrics was performed. Metrics with strong correlations were examined and only one metric per group of correlated indices was used for further analysis.
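A minimal sketch of such a redundancy filter is given below. It is not the authors' procedure: they examined the correlated metrics before choosing which to retain, whereas this sketch simply keeps the first metric of each correlated group. The threshold default, the use of the absolute value of ρ, and the metric names and values are assumptions made for illustration. A metric with constant values would cause a division by zero and would need to be removed beforehand.

```python
def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_redundant(metrics, threshold=0.9):
    """Greedy filter: keep a metric only if |rho| <= threshold with all kept ones."""
    kept = []
    for name in metrics:
        if all(abs(spearman(metrics[name], metrics[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical metric values for five FUAs.
metrics = {
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 4.1, 6.3, 8.0, 11.0],  # monotonic with A, so rho = 1
    "C": [3.0, 1.0, 4.0, 2.0, 5.0],   # rho = 0.5 with A
}
kept = drop_redundant(metrics)
```

Here B is discarded as redundant with A, while C is retained.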

Principal component analysis (PCA):
After the first screening, a PCA was applied to reduce the dimensionality of the data. Factor analysis and PCA are widely used to reduce a multitude of metrics to a meaningful subset, but also to create new synthetic indices by interpreting each component (Gong et al., 2013; Plexida et al., 2014; Schwarz, 2010). This method was used not only to select FI, FIC and MI metrics, but also to interpret components as new indicators derived from the original data. Their interpretation is not straightforward, but each component represents different fragmentation aspects. By analysing the spatial distribution of the loadings of each metric in the space defined by the components with eigenvalues greater than one, metrics were clustered according to their pairs of component loadings (Figure 3); then one metric per cluster was selected.
After the statistical analyses, the selected indices were categorised into five semantic groups: area and perimeter, shape, aggregation, diversity, and multi-temporal (Table 1). A detailed description of these metrics can be found in Sapena and Ruiz (2015a, 2015b).
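The retention rule and the loadings used for clustering can be written compactly. The formulation below assumes that the PCA is performed on the correlation matrix R of the p standardised metrics, which is the setting in which the eigenvalue-greater-than-one rule is conventionally applied; the source does not state this explicitly, and the symbols are ours.

```latex
R = V \Lambda V^{\top}, \qquad
\Lambda = \operatorname{diag}(\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p), \qquad
\sum_{k=1}^{p} \lambda_k = p

\ell_{jk} = v_{jk} \sqrt{\lambda_k}
\quad \text{(loading of metric } j \text{ on component } k\text{)}, \qquad
\text{retain component } k \text{ if } \lambda_k > 1

\sigma^2_k = \frac{\lambda_k}{p}
\quad \text{(share of total variance explained by component } k\text{)}
```

Under this assumption, each metric j is a point with coordinates (ℓ_j1, ℓ_j2, ...) in the space of retained components, and metrics that fall close together carry similar information.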

Socio-economic variables
The socio-economic data were extracted from Urban Audit for the years 2006 and 2012. The variables collected were restricted to those available in at least twenty-five FUAs, in order to have enough observations for the statistical analysis.
Subsequently, to eliminate redundant information, a set of variables was selected that was representative of the different dimensions covered in the Urban Audit database (Table 2). We refer to these variables as URAU. Differences between the two dates were calculated to show changes in quality of life variables over the studied period, denoted URAU changes (URAUC).

Modelling socio-economic variables
We performed an exploratory analysis of the prediction of the socio-economic variables based on urban land use fragmentation indices. Stepwise multiple regression was applied, with final variable selection based on the Akaike Information Criterion (AIC), and implemented as a script in the R statistical software (R Core Team, 2015). The AIC balances model complexity, in terms of the number of predictors included in the equation, with goodness of fit, as measured by the residual sum of squares (Akaike, 1974).
The regression models based on one date were built to predict socio-economic variables (URAU) from fragmentation metrics (FI) as predictors, and they were computed for 2006 and 2012 independently. In bi-temporal models, differences of each socio-economic variable between the two dates (URAUC) were used as response variables in the prediction models, and the two-date based fragmentation metrics (FIC and MI) were used as predictors.
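To make the trade-off concrete, the sketch below implements a forward stepwise selection driven by the least-squares form of the AIC, n ln(RSS/n) + 2k, where k counts the fitted coefficients including the intercept. It is an illustration under stated assumptions, not the authors' R script: the source does not specify the search direction, and the predictor names and data are hypothetical.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def rss(columns, y):
    """Residual sum of squares of an OLS fit with intercept (normal equations)."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(row[a] * y[i] for i, row in enumerate(design)) for a in range(p)]
    beta = solve(xtx, xty)
    return sum(
        (y[i] - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
        for i, row in enumerate(design)
    )


def aic(columns, y):
    """Least-squares AIC, up to an additive constant."""
    n = len(y)
    k = len(columns) + 1  # coefficients, including the intercept
    return n * math.log(rss(columns, y) / n) + 2 * k


def forward_stepwise(predictors, y):
    """Add the predictor that lowers the AIC most; stop when none does."""
    selected, best = [], aic([], y)
    while True:
        candidates = [
            (aic([predictors[s] for s in selected + [name]], y), name)
            for name in predictors
            if name not in selected
        ]
        if not candidates:
            break
        score, name = min(candidates)
        if score >= best:
            break
        selected.append(name)
        best = score
    return selected, best


# Hypothetical data: y depends on x1 only; the noise is orthogonal to x1 and x2.
predictors = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0],
}
y = [2.1, 3.9, 5.9, 8.1, 10.1, 11.9, 13.9, 16.1]
selected, best = forward_stepwise(predictors, y)
```

In this toy case only x1 is selected: adding x2 leaves the residual sum of squares unchanged, so the penalty of 2 per extra coefficient raises the AIC.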
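The response variables of the bi-temporal models are simple per-FUA differences. The sketch below assumes that the change is computed as the 2012 value minus the 2006 value, a sign convention the source does not state, and that FUAs missing on either date are dropped; the FUA codes and values are hypothetical.

```python
def differences(values_2006, values_2012):
    """Per-FUA change (2012 minus 2006) for FUAs present on both dates."""
    return {
        fua: values_2012[fua] - values_2006[fua]
        for fua in values_2006
        if fua in values_2012
    }


# Hypothetical population values; codes and numbers are made up.
pop_2006 = {"XX001": 500000, "XX002": 120000, "XX003": 80000}
pop_2012 = {"XX001": 520000, "XX002": 118000}
d_pop = differences(pop_2006, pop_2012)
```

The same function would apply to a fragmentation index to obtain its FIC, so that responses and predictors share one convention.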

Index selection
A series of correlation tests was applied to the fragmentation metrics to determine collinearity and avoid duplication. An initial test was done for each group (FUA and each land use), and the most correlated indices were omitted (ρ > 0.9). Then, the complete metrics set was tested to detect the main relationships between different groups. This process was conducted equally in […] (Figure 4). After the selection process, these indices explained different perspectives of urban fragmentation; they were not redundant, and were consequently employed as predictors of socio-economic variables.
The interpretation of the first four principal components (PC) from the FUA group (Table 3) revealed a semantic clustering of FUAs according to their spatial configuration. They seem to cluster by country, with the exception of the UK (Figure 5). In Table 3, the first four components with eigenvalues higher than one are shown, along with their partial and cumulative variances (σ²). Values in bold correspond to the highest loadings of each component, and indices marked with * are included in the final metrics subset.
Thus, at FUA level, PC1 mainly represents diversity and evenness (DSHAN), but also the size of the objects (inversely TM, and COHE). An increase in diversity means a more homogeneously distributed land use configuration and therefore a smaller mean size of objects (a comparable amount of urban surface to agricultural or forest surface would significantly decrease the mean, due to the reduced size of urban objects). PC2 describes the aggregation level of the FUA, but it is also influenced by size, since high values of DEP are found in large FUAs and, conversely, IFFA is lower in large FUAs with a greater number of objects. PC3 represents the shape and size of the objects (RMPA), and as observed for PC1, it is inversely related to diversity (DD). PC4 reveals the shape complexity of the objects (IF); the simplest objects are usually related to urban elements, while agricultural, forest, and water objects are more complex in shape.
In order to ascertain whether it is feasible to interpret socio-economic variables through fragmentation metrics, and hence to explore the potential of this approach, correlation analyses were conducted; they indicate that the approach is workable, as shown in Figure 6.

Exploratory analysis of prediction models
Several models were generated to assess further relationships between urban fragmentation metrics and socio-economic variables. Table 4 shows those with a coefficient of determination (R²) higher than 0.5 (in some cases the adjusted R² might be lower). The number of observations ranges from twenty-five to sixty-five, depending on the number of FUAs with available data. Models with around sixty observations are more robust, as they represent data from twelve different countries.
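The remark that the adjusted R² may fall below the reported R² follows from the usual correction for the number of predictors, which weighs more heavily when observations are few. The sketch below applies the standard formula; the R² value and the number of predictors are hypothetical and chosen only to contrast the two sample sizes mentioned above.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted coefficient of determination for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Same hypothetical fit quality and model size, two different sample sizes.
adj_small = adjusted_r2(0.55, 25, 5)   # about 0.43
adj_large = adjusted_r2(0.55, 65, 5)   # about 0.51
```

With twenty-five observations the same R² of 0.55 drops below 0.5 after adjustment, whereas with sixty-five observations it stays above it.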
In general, the models showed that demographic variables are highly associated with area-dependent metrics. For example, pop1 depends heavily (R² = 0.90 and 0.94) on the number of residential objects and their size (Nob_R and TM_R), among others. These indices express the total amount of residential surface. When they are omitted from the analysis to avoid spurious relationships, other indices related to distances between objects and diversity are included (DEP, DSHAN), reaching an R² of around 0.75, although they are still influenced by scale. Population density variables (popurb and popresi) are not affected by size indices, but instead by shape and configuration indices (RMPA, DEM_C, IF, etc.). Fragmentation metrics are not reliable predictors of population increase by themselves, suggesting that there may not be a straightforward relationship between changes in urban areas and population growth. (See Tables 1 and 2 for the label descriptions; labels beginning with "d" denote differences.)
For the living condition models, the observations are reduced to twenty-five FUAs in Germany, one in Belgium, one in Estonia, and three in Greece, with some variations; ndwe is the exception. The predictors of ndwe are almost the same as those of pop1 (ρ = 0.99 between the two variables). However, when their respective differences are compared, a much lower correlation (ρ = 0.22) is obtained. This supports the earlier interpretation that urban area growth and population change do not keep pace with each other. The models for inc, in turn, rest on few observations and might not be robust, but to a certain extent inc can be explained by the heterogeneity, medium size and density of green urban areas (DSHAN, TM, DO_G) in Germany. The relationship between median income and urban metrics, especially diversity and urban green objects, seems to be robust across periods and merits further investigation. It is particularly worth exploring whether this relation is due to the high presence of German FUAs or whether it holds more generally across Europe. In addition, growth of green urban areas (RC_G) seems to be related to the number of households with children (prchild).
Labour variable rates are more interesting than absolute values, considering that the latter are highly dependent on size and their changes are not reflected by fragmentation changes. While the models for unemployment do not have an obvious interpretation, it is interesting to note that activity seems positively related to urban diversity. A similar trend is found with median household income, signalling a relationship between urban diversity and standards of living. On the other hand, rate changes (rtact and rtunem) reveal a positive relationship with green areas and leisure facilities.
In relation to transport variables, the models are quite robust across time, and urban morphology clearly bears some relation to commuting transport modes. While public and private transport depend, as expected, on object size (TM), distances from the city centre (DEP) and residential land (Nob), journeys to work on foot clearly depend on urban fragmentation. The relation between urban metrics and commuting to work should be further explored in future research. Furthermore, avlen tends to change with residential and commercial distance from the city centre and with change in expansion intensity (dDEP_R, RC_C), similar to the case of the number of cars (prcar).
Some socio-economic variables had no significant correlation with fragmentation indices, and thus they were not analysed; examples include mortality and fertility rates, and level of education. In addition, some variables are not yet available but will soon appear in the Urban Audit database, such as basic amenity indicators or poverty-related variables. This will make it more feasible to study standard composite socio-economic variables, related to quality of life and human development, using different combinations of land use metrics as explanatory variables.

CONCLUSION
The present paper is based on a methodology that allows us to compare European FUAs in terms of the spatial distribution of the land use classes, their complexity and structural changes, as well as to preview and model different growth patterns and socio-economic indicators. This study may help policymakers with sustainable development policies at city level, since urban areas expand according to demographic demands and economic growth, and this research has revealed their relationship.
A large set of one-date and two-date based fragmentation metrics computed for sixty-eight European FUAs was reduced to an uncorrelated subset by statistical methods. These methods were applied intra-group and inter-group, since high correlation of the same index between different land uses was detected. The PCA was employed as a variable selection method, but also to interpret each component, which may be seen as a potential method for clustering urban areas. Then, stepwise regression models were assessed, providing further explanation of the linear relationship between urban land use fragmentation metrics and socio-economic variables. Our results show that demography, living conditions, labour and transportation variables have a clear relationship with the morphology of FUAs, and also with its evolution between two dates, even over a short period of time. When absolute values are compared, the relationships might be influenced by size-related indices. This paper reveals the notable relation between morphology and FUA statistics. However, some aspects should be further analysed, such as median household income or journeys to work, since there are still many missing values in the Urban Audit database.
This study also highlights the potential of remote sensing for LULC mapping, and of the fragmentation metrics derived from it for modelling socio-economic variables. Thus, the interrelation of the Urban Atlas and Urban Audit datasets has been shown to have great potential for monitoring urban areas across Europe. Future research may attempt to improve these models, and create standard composite variables related to quality of life and human development.

Figure 2. Step-by-step work flow from data collection to modelling. (S1) Urban Atlas and Urban Audit datasets are obtained from Copernicus and Eurostat databases; then (S2) an analysis is conducted to determine functional urban areas (FUA) with comparable boundaries and socio-economic data (URAU); (S3) using IndiFrag, metrics are computed for 2006 (FI 2006), 2012 (FI 2012), together with their differences (FIC) and multi-temporal indices (MI); (S4) the most correlated metrics are filtered out; and (S5) the uncorrelated indices are extracted; (S6) URAU variables based on the amount of data are selected and their differences are assessed (URAUC); finally, (S7) uni-temporal models (URAU 2006 with FI 2006, and URAU 2012 with FI 2012), and bi-temporal models (URAUC with FIC and MI) are conducted.

Figure 3. Spatial distribution of the principal component loadings of the FI 2006 at FUA level, after the correlation analysis.
Group  Label     Description
…
       wom75     Women per 100 men, aged 75 years and over
       depen     Age dependency ratio
       oldepn    Old age dependency ratio (65 and over)
M      imrt      Infant mortality rate (per 1000 live births)
LC     ndwe      Number of conventional dwellings
       inc       Median disposable annual household income
       pr1h      Prop. of 1-person households
       prpen     Prop. of lone-pensioner households
       prchild   Prop. of households with children aged 0-17
E      dcarepop  Prop. of children 0-4 in day care or school
       humcap    Prop. of persons (aged 25-64) with ISCED level 5 or 6 as the highest level of education
…
       …         … to work by car (%)
       jwpub     Share of journeys to work by pub. transp. (%)
       jwfoot    Share of journeys to work on foot (%)
       avtime    Average time of journey to work (min)
       avlen     Average length of journey to work by private car (km)
       prcar     Number of registered cars per 1000 pop
       killac    People killed in road accidents per 10000 pop

Table 2. Label and description of the final subset of socio-economic variables (URAU), grouped as: demography (D); mortality (M); living conditions (LC); education (E); labour (L); …

Figure 4. Correlation coefficients among the final subset of bi-temporal metrics after PCA selection, where the sub-index refers to each land use: commercial (C), green areas (G), leisure (L), and residential (R), and its absence means FUA level. Label names are as in Table 1. Metrics with prefix "d" correspond to FIC, and the rest to MI.

Figure 6. Correlation coefficients between the fragmentation metrics and the socio-economic variables from 2006. The sub-index refers to each land use: commercial (C), green areas (G), leisure (L), residential (R), and its absence means FUA level. Label names are as in Table 1 and Table 2.

Figure 5. Spatial distribution of FUAs according to the first and second principal components (PC1 and PC2). The abscissa axis divides FUAs based on diversity and uniformity features. The ordinate axis clusters FUAs according to their configuration and size. Country codes are as in Figure 1.
Table 1. Selected fragmentation metrics categorised in groups: area and perimeter (AP), shape (S), aggregation (A), diversity (D), and multi-temporal (MI), and their level of application in this study: FUA (f), and land use or class (lu). Further description is available in …

Table 3. Results of principal component analysis (PCA) at FUA level.