Semantic‐based urban growth prediction

Urban growth is a spatial process which has a significant impact on the earth’s environment. Research on predicting this complex process is therefore especially valuable for decision‐making on a global scale, as it enables the introduction of more sustainable urban development. This article presents a novel method of urban growth prediction. The method utilizes geospatial semantics in order to predict urban growth for a set of random areas in Europe. For this purpose, a feature space representing geospatial configurations was introduced which embeds semantic information. Data in this feature space were then used to perform deep learning, which ultimately enables the prediction of urban growth with high accuracy. The final results reveal that geospatial semantics hold great potential for spatial prediction tasks.

further spatial phenomena (e.g., land cover classes) by using existing semantic knowledge. Therefore, the methodology enables support for decision-making processes in urban planning and management.
The article is structured as follows. Section 2 outlines the related work on urban growth prediction, analyzing the applied machine learning algorithms as well as the data they rely on. It also identifies a research gap and defines the contribution and motivation of this work in more detail. Section 3 describes the methodology and delineates how geospatial semantics were used to train a multilayer perceptron (MLP) in order to predict urban growth. Section 4 presents the results for the urban growth predictions along with an accuracy assessment and the corresponding analyses. Section 5 discusses the findings and the potential of the proposed method. It examines the potential of geospatial semantics as an information source and its advancements for spatial prediction tasks. In particular, it discusses in detail the effect of the amount of available geospatial semantics on the prediction accuracy. Finally, Section 6 provides a conclusion as well as an outlook on additional applications for which the proposed method can be utilized and on novel research directions where our findings might be useful.

| RELATED WORK
In the last two decades, several methods have been developed and tested for the detection and prediction of land cover changes and urban growth within the complex nature of urbanization. The foundations of urban growth as a part of the urbanization process build upon the city's geospatial characteristics as well as upon complex institutional and economic agents at local as well as national scales (Batty, 2008). According to Black and Henderson (1999), without these agents, cities would tend to grow larger and become fewer in total. Batty (2013) shows that a city which is growing in size impacts economic agglomerations negatively and positively and generates new socioeconomic attributes. To support effective urban management processes, a series of studies investigate several machine learning methods for predicting land cover change in general, as well as urban growth specifically, such as cellular automata (CA) and logistic regression (LR). These studies can be categorized into three major groups: studies comparing the predictions of different machine learning methods (Berberoğlu, Akín, & Clarke, 2016; Xu, Gao, & Coco, 2019); studies investigating the integration of several machine learning algorithms to perform urban growth prediction (Chaudhuri & Clarke, 2014; Guan, Wang, & Clarke, 2005; Lin & Li, 2016; Triantakonstantis & Stathakis, 2015; Xia, Zhang, Wang, & Zhang, 2019; Xu et al., 2019); and studies discussing the effect of auxiliary data on prediction accuracy (Stanilov & Batty, 2011; Xia et al., 2019; Zhou, Varquez, & Kanda, 2019). In addition to machine learning strategies, these studies discuss the definition of urban as well as non-urban areas. This section is structured in the following manner. First, the three major groups of machine learning strategies within the literature are discussed. Then the common definition of urban and non-urban areas is reviewed.
Subsequently, approaches in the literature utilizing deep learning within the domain of geographic information science are presented. Berberoğlu et al. (2016) and Xu et al. (2019) compare different machine learning approaches to predicting urban growth. Berberoğlu et al. (2016) performed an urban growth prediction using an artificial neural network (ANN), a Markov chain (MC), and an LR. For this purpose they utilized remotely sensed imagery from 1967 to 2007 with a spatial resolution of 10 m × 10 m as input and aimed to predict the urban growth for 2023 in Adana (Turkey). They reported that their ANN scored a higher prediction accuracy than their LR (see Table 1). Shafizadeh-Moghadam, Asghari, Tayyebi, and Taleai (2017)
TABLE 1 The different methods as well as their highest values for the corresponding accuracy indices are listed. "N/A" (not available) indicates that the measure is not reported for accuracy assessment. It can be seen that many prediction methods lack common indices for assessing the accuracy of their models. CA and SLEUTH are widely used for urban growth prediction. The spatial resolution indicates the cell size for which the urban growth prediction was made.
model predicted with an overall accuracy of 74.1%. Triantakonstantis and Stathakis (2015) Zhu and Liu (2018), an attempt was made to provide a general framework for geographical site selection, based on graph-structured spatial data and a graph convolutional neural network (GCNN). However, they provide unstable results and no accuracy assessment. Yan and Ai (2018) used a GCNN for the detection of regular and irregular building structures.
They scored an overall accuracy of 98% in performing this binary classification. However, no further quantitative accuracy assessment was provided, such as a confusion matrix, which would have given a more detailed understanding of the classification accuracy for each of the classes.  stated that including spatial context information, such as spatial relatedness, spatial co-location, and spatial sequence patterns, yields significantly better classification results in prediction. Mc Cutchan and Giannopoulos (2018) applied association analysis to demonstrate that CORINE land cover classes co-locate with geo-objects of certain classes. They highlighted the feasibility of using geospatial semantics to predict land cover classes. These classes of the geo-objects were encoded in OWL and were extracted from LinkedGeoData (Stadler et al., 2012). Grekousis (2019) provides a review of 45 papers published between 1997 and 2016 which utilize ANN and deep learning in urban geography. According to his meta-analysis, previous studies suggested that deep learning has the ability to model complex urban problems. Grekousis (2019) additionally suggested that there is a strong need for novel and innovative methodologies in urban geography.
In most studies, urban areas were defined as areas with built environment (e.g., houses) and non-urban areas as areas with unbuilt environment (e.g., forest) (Berberoğlu et al., 2016; Guan et al., 2005; Triantakonstantis & Stathakis, 2015). Unbuilt environments can also include the open unconstructed spaces within a city, such as parks. The urban growth predictions mostly aimed to distinguish between no change, change from urban to non-urban, and change from non-urban to urban (Berberoğlu et al., 2016; Santé, García, Miranda, & Crecente, 2010; Triantakonstantis & Stathakis, 2015).
To the best of the authors' knowledge and after conducting a thorough literature review, no work on urban growth predictions is known that utilizes extensive information on geospatial semantics including class information on geo-objects such as buildings, streets, or points of interest. However, recent results have suggested that both urban and non-urban land cover has spatial associations with geo-objects of different classes (Mc Cutchan & Giannopoulos, 2018). Finally, no research has been found that performed experiments with the incorporation of extensive semantic annotations of geo-objects and their spatial relationships to spatial phenomena which are required to be predicted. Thus, the potential impact of given geospatial semantics concerning the accuracy of a spatial prediction remains an open question.

| URBAN GROWTH PREDICTION METHOD
The proposed methodology relies on three main steps: (1) data pre-processing; (2) development of the urban growth prediction model; and (3) the validation of the model (see Figure 1).
Two data sets, namely a vector data set with semantic descriptions as well as a raster data set, were used as input for the procedure. The vector data set is based on OpenStreetMap (OSM) data, which describe geo-objects for 2012 for the whole of Europe. This specific ROI was chosen since the utilized ground truth data (raster data) on A feature vector for each of these cells was generated and labeled with the corresponding urban change class.
There are four possible urban change classes, as already mentioned: NU (non-urban to urban), UN (urban to non-urban), UU (urban no change), and NN (non-urban no change). The feature vector was computed based on the geo-objects which are within a defined proximity of the selected cell. Each feature vector was added into a matrix which will be referred to as the geospatial configuration matrix (GSCM) throughout this article. The GSCM describes the feature vector of each cell as well as the label of the corresponding urban change class. Subsequently, the GSCM was divided into two data sets. The first data set was used for training, and the second data set was used for testing the MLP. The training data set was made up of 90% of the feature vectors of the GSCM, while the test data set was made up of the remaining 10% of the GSCM.
The MLP was trained using the training data set and evaluated using the test data set. A confusion matrix and a kappa coefficient are the result of such a model test. This training and testing procedure was repeated 10 times (10-fold cross-validation; Kohavi, 1995), resulting in a final confusion matrix and kappa coefficient, allowing the performance of the applied approach as well as the effects of over- and underfitting to be assessed. Thus, the proposed method is ultimately able to predict the urban growth class for areas with the same spatial resolution as the input imperviousness data set (i.e., 20 m × 20 m).
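The cross-validation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the implementation used in this work: all function names are hypothetical, and a trivial majority-class predictor stands in for the MLP so that the example stays self-contained.

```python
import random

LABELS = ["NN", "NU", "UN", "UU"]  # the four urban change classes


def k_fold_indices(n_rows, k=10, seed=0):
    """Shuffle the row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(rows, labels, fit, predict, k=10, seed=0):
    """Sum the confusion matrices of all folds (rows: true class, columns: predicted)."""
    position = {label: i for i, label in enumerate(LABELS)}
    total = [[0] * len(LABELS) for _ in LABELS]
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in held_out]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        for i in test_idx:
            predicted = predict(model, rows[i])
            total[position[labels[i]]][position[predicted]] += 1
    return total


def fit_majority(train_rows, train_labels):
    """Placeholder for MLP training: remember the most frequent training label."""
    return max(sorted(set(train_labels)), key=train_labels.count)


def predict_majority(model, row):
    """Placeholder for MLP inference: always return the remembered label."""
    return model


# Toy GSCM with 100 rows: each fold trains on 90 rows and tests on 10.
rows = [[float(i)] for i in range(100)]
labels = ["UU"] * 70 + ["NN"] * 30
confusion = cross_validate(rows, labels, fit_majority, predict_majority)
```

Every row is used for testing exactly once, so the entries of the summed confusion matrix add up to the number of rows in the GSCM.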

| Data preprocessing
The procedure utilizes vector and raster data sets as inputs for the selected ROI which is defined by the geographical extent of Europe. In order to be able to make predictions, the data were preprocessed. Therefore, geo-objects from LinkedGeoData were first transferred into a local database along with their semantic descriptions. Next, the imperviousness data set from Copernicus was reclassified into four classes: NU, UN, UU, and NN. Based on these two preprocessed data sets, a GSCM was computed. This matrix describes the geospatial configuration of every grid cell of the imperviousness data set with respect to the geospatial semantics.

| Vector data and geospatial semantics
FIGURE 1 The overall workflow of the methodology presented. The method takes vector data from LinkedGeoData as well as grid cells from the imperviousness change data set. Both data sets are then preprocessed. Afterwards the data from LinkedGeoData are used to predict the urban growth described by the imperviousness data set. The results are then finally evaluated using an accuracy assessment
The vector data set consists of geo-objects. Each geo-object can be either a polygon, a point or a linestring. In addition to its geometric information, each geo-object contains an attribute which describes its class (e.g., restaurant, street, supermarket, highway, airport). Each of these classes is part of a tree-structured OWL ontology, which is defined by LinkedGeoData. Therefore, every class can have one parent and multiple children. The vector data set and its ontology were obtained in the following manner. LinkedGeoData (Stadler et al., 2012) was used to transform OSM data into linked data. For this purpose, OSM data were downloaded covering the entire area of Europe for September 2012 and were converted into linked data. The linked data generated were then exposed to a local endpoint using Sparqlify (SmartDataAnalytics, 2018 an ID, the OWL classname, and the column for the well-known binary, which describes the geometry of the geo-object. The OWL ontology enables the superclass of a geo-object to be determined. Thus, it makes it possible to derive class and subclass relations. A Java program was written within the scope of this work to query the SPARQL endpoint with SPARQL and to insert the geo-objects into the corresponding tables using SQL.
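Because every class in the tree-structured ontology has at most one parent, all superclasses of a geo-object can be derived by following parent links upwards. The sketch below illustrates this idea; the `PARENT` table is a hypothetical excerpt with assumed class spellings, not the actual LinkedGeoData ontology, and the function name is likewise an assumption.

```python
# Hypothetical child -> parent excerpt of a tree-structured class hierarchy.
PARENT = {
    "JewelryStore": "Craft",
    "Craft": "Shop",
    "Shop": "Amenity",
    "Restaurant": "Amenity",
}


def class_with_ancestors(owl_class, parent=PARENT):
    """Return the class followed by all of its superclasses, most specific first."""
    chain = [owl_class]
    seen = {owl_class}
    while chain[-1] in parent:
        superclass = parent[chain[-1]]
        if superclass in seen:  # guard against a malformed, cyclic hierarchy
            raise ValueError("cycle in class hierarchy at " + superclass)
        chain.append(superclass)
        seen.add(superclass)
    return chain


jewelry_classes = class_with_ancestors("JewelryStore")
```

A geo-object annotated only with its most specific class can thus be treated as a member of every class on its path to the root.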

| Raster data on imperviousness
FIGURE 2 Screenshot of the change of imperviousness data set for the south of Vienna, Austria. Some of the available urban growth classes can be seen: areas in which imperviousness did not change are shown in dark grey; new imperviousness cover areas are shown in red; and areas which remained without imperviousness are shown in light grey
The second data set which is required as input for the proposed procedure is the raster data set of the impervious- It can be seen that class 4 corresponds to built areas which did not change from 2012 to 2015. Class 1 corresponds to non-built areas which did not change to built areas from 2012 to 2015. Class 2 represents areas which changed from non-built to built areas (e.g., from green area to building). Classes 3, 5, 6, and 7 cannot be seen in Figure 2. However, class 3 corresponds to areas in which the built areas have vanished. Class 5 represents areas in which the built areas have increased (e.g., more buildings). Class 6 represents areas in which the built areas have decreased (e.g., fewer buildings). Class 7 represents areas which could not be classified. Classes 3, 5, 6, and 7 are minority classes. Within this article, built areas are treated as urban areas and non-built areas are treated as non-urban areas. The four classes relevant for this work are NU, UN, UU, and NN. In order to obtain these four classes, a reclassification was performed on the raster data set. The reclassification scheme can be seen in Table 2. Increased imperviousness (class 5) and decreased imperviousness (class 6) were reclassified to UU since the built-up area was maintained from 2012 to 2015 (only a change in density). Cells that were not classified (class 7) were excluded from the data, as it was not possible to reclassify them. The reclassified raster data were loaded into PostGIS. After importing the vector and raster data into PostGIS, the GSCM was created.
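The reclassification described above amounts to a simple lookup. The following sketch is an illustration under assumptions: the mapping of classes 1, 2, 4, 5, and 6 follows the class descriptions in the text, whereas assigning class 3 (vanished built areas) to UN is inferred from its description, and the names `RECLASSIFY` and `reclassify` are hypothetical.

```python
# Source class code -> urban change class.
# Assumption: class 3 (built areas vanished) is treated as UN.
RECLASSIFY = {1: "NN", 2: "NU", 3: "UN", 4: "UU", 5: "UU", 6: "UU"}
UNCLASSIFIED = 7  # cells with this code are excluded from the data


def reclassify(cell_codes):
    """Map raw class codes to urban change classes, dropping unclassified cells."""
    result = []
    for code in cell_codes:
        if code == UNCLASSIFIED:
            continue
        result.append(RECLASSIFY[code])
    return result


example = reclassify([1, 2, 3, 4, 5, 6, 7])
```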

| Geospatial configuration matrix
The GSCM was computed based on the reclassified cells, which have the same spatial extent of 20 m × 20 m. For each cell ce_j, a feature vector f_j was created in the following manner. All geo-objects which were within distance d_max of the centroids of the cells were determined. Based on these geo-objects, the distinct set of their OWL classes (restaurant, street, etc.) was extracted, as well as the maximum number of available geo-object classes, which is defined by the cardinality of the set of distinct classes. The distinct set of available classes is denoted by C, the maximum number of classes is c_max (c_max = |C|), and a single class is denoted by c_k. Subsequently, all geo-objects for each cell ce_j which were within distance d_max were loaded. The resulting set of geo-objects which were within d_max of the cell ce_j is denoted by G_j. In order to keep the computational complexity feasible for this query, only the centroids of both the geo-objects and cells were considered. It should be noted that increasing d_max will increase the computational complexity as well as the potential number of geo-objects which will be within that proximity.
TABLE 2 The reclassification scheme for the imperviousness raster data set
The set of all distances from the center of a cell ce_j to all geo-objects in G_j of class c_k was provided by a function denoted by dist(ce_j, G_j, c_k). Additionally, the set of all azimuths from the center of a cell ce_j to all geo-objects in G_j of class c_k was provided by a function denoted by azimuth(ce_j, G_j, c_k). Both functions would return an empty set ∅ if no geo-object in G_j of the class c_k existed. A third function, denoted by count(ce_j, G_j, c_k), returned the number of geo-objects of class c_k in G_j. It is important to note that each geo-object can be a part of multiple OWL classes. To take the example of a "jewelry store" geo-object: "jewelry store" is a subclass of "craft", "craft" is a subclass of "shop", and "shop" is a subclass of "amenity", the class which describes the highest degree of abstraction. Thus, a jewelry store is a member of all four of these classes. This fact is considered for computing the sets of azimuths, distances, and the number of classes. A single geo-object is likely to appear in multiple computations: every jewelry store was considered not only when computing the distances to all jewelry stores within d_max, but also when computing the distances to all shops. This enabled us to include all available semantics for each geo-object. Figure 3 provides a visual explanation of the three functions dist(ce_j, G_j, c_k), azimuth(ce_j, G_j, c_k), and count(ce_j, G_j, c_k). It shows a cell ce_j in the center, with a circular area around it defined by the distance d_max. This area contains the centroids of the different geo-objects, which originally were points, polygons or linestrings.
Points of the same shape indicate that the centroids belong to geo-objects of the same class. Thus, three different classes can be observed. All of these points correspond to the set G_j. The three points in the hexagon shape correspond to class c_k. Three azimuths are measured, namely a_1, a_2, a_3, as well as three distances, d_1, d_2, d_3. Applying function dist(ce_j, G_j, c_k) thus yields {d_1, d_2, d_3}, and we have azimuth(ce_j, G_j, c_k) = {a_1, a_2, a_3} and count(ce_j, G_j, c_k) = 3.
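The three functions can be sketched in a few lines. The following is an illustration under assumptions, not the implementation used in this work: centroids are taken to be planar coordinates in meters, azimuths are measured clockwise from north in degrees, each geo-object is a dictionary that already lists all of its classes including superclasses, and lists are returned instead of sets so that equal values are preserved. The helper `within` and all example data are hypothetical.

```python
import math


def within(cell_xy, geo_objects, d_max):
    """G_j: geo-objects whose centroid lies within d_max of the cell centroid."""
    return [g for g in geo_objects if math.dist(cell_xy, g["xy"]) <= d_max]


def dist(cell_xy, G_j, c_k):
    """Distances from the cell centroid to all geo-objects of class c_k."""
    return [math.dist(cell_xy, g["xy"]) for g in G_j if c_k in g["classes"]]


def azimuth(cell_xy, G_j, c_k):
    """Azimuths (degrees, clockwise from north) to all geo-objects of class c_k."""
    angles = []
    for g in G_j:
        if c_k in g["classes"]:
            dx = g["xy"][0] - cell_xy[0]
            dy = g["xy"][1] - cell_xy[1]
            angles.append(math.degrees(math.atan2(dx, dy)) % 360.0)
    return angles


def count(cell_xy, G_j, c_k):
    """Number of geo-objects of class c_k in G_j."""
    return sum(1 for g in G_j if c_k in g["classes"])


cell = (0.0, 0.0)
objects = [
    {"xy": (0.0, 100.0), "classes": {"JewelryStore", "Craft", "Shop", "Amenity"}},
    {"xy": (100.0, 0.0), "classes": {"Shop", "Amenity"}},
    {"xy": (0.0, 1000.0), "classes": {"Amenity"}},  # outside d_max = 500 m
]
G = within(cell, objects, 500.0)
shop_distances = dist(cell, G, "Shop")
shop_azimuths = azimuth(cell, G, "Shop")
```

In this toy configuration the jewelry store contributes to the results for "JewelryStore" as well as for "Shop", which mirrors the treatment of superclasses described above.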
These three functions were utilized in order to create the feature vector f_j for cell ce_j. The feature vector contained the minimum, maximum, and standard deviation of the set of all distances for each class c_m, returned by dist(ce_j, G_j, c_m). It additionally contained the minimum, maximum, and standard deviation of the set of all azimuths for each class c_m, returned by azimuth(ce_j, G_j, c_m). The final element of the feature vector was the count of every class c_m, which was returned by count(ce_j, G_j, c_m). Thus, the feature vector f_j for cell ce_j was defined as:

$$f_j = \Big(\min \mathrm{dist}(ce_j, G_j, c_m),\ \max \mathrm{dist}(ce_j, G_j, c_m),\ \sigma\big(\mathrm{dist}(ce_j, G_j, c_m)\big),\ \min \mathrm{azimuth}(ce_j, G_j, c_m),\ \max \mathrm{azimuth}(ce_j, G_j, c_m),\ \sigma\big(\mathrm{azimuth}(ce_j, G_j, c_m)\big),\ \mathrm{count}(ce_j, G_j, c_m)\Big)_{m = 1, \dots, c_{max}} \tag{1}$$

The GSCM was then created by computing f_j for every given cell ce_j (see Equation 2):

$$\mathrm{GSCM} = \begin{pmatrix} f_1 & \mathrm{label}(ce_1) \\ \vdots & \vdots \\ f_n & \mathrm{label}(ce_n) \end{pmatrix} \tag{2}$$

where n denotes the number of cells included in the GSCM. The label function extracted the label of the corresponding cell, which was one of the four urban growth classes. The labels were required by the MLP in order to determine which feature vector corresponded to which urban growth class. If G_j was empty (i.e., there were no geo-objects in the proximity of d_max around cell ce_j), no prediction could be made for this cell as no information on its surrounding geospatial configuration was available. Consequently, such a ce_j was not considered for training and testing purposes and f_j was not included in the GSCM. The GSCM was computed for seven d_max values: 20 m, 50 m, 500 m, 1 km, 5 km, 10 km, and 30 km. This enabled us to investigate the impact of the search radius d_max on the prediction accuracy. The GSCM was then utilized to train and validate an MLP in order to predict the urban growth change.
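The assembly of the feature vectors and of the GSCM can be sketched as follows. This is a hedged illustration: the text does not state how classes without any geo-object within d_max are encoded or which standard deviation estimator was used, so the sketch assumes zeros for absent classes and the population standard deviation. The ordering of the features, all function names, and the example data are likewise assumptions.

```python
import statistics


def summarize(values):
    """Minimum, maximum and standard deviation of a list of numbers.

    Assumptions: population standard deviation; an empty list (class absent
    within d_max) is encoded as three zeros.
    """
    if not values:
        return [0.0, 0.0, 0.0]
    return [min(values), max(values), statistics.pstdev(values)]


def feature_vector(distances, azimuths, classes):
    """Seven numbers per class: distance and azimuth statistics plus the count."""
    features = []
    for c in classes:
        d = distances.get(c, [])
        a = azimuths.get(c, [])
        features += summarize(d) + summarize(a) + [float(len(d))]
    return features


def build_gscm(cells, classes):
    """One (feature vector, label) row per cell; cells with empty G_j are skipped."""
    gscm = []
    for cell in cells:
        if not any(cell["distances"].values()):
            continue  # no geo-object within d_max, so the cell is excluded
        row = feature_vector(cell["distances"], cell["azimuths"], classes)
        gscm.append((row, cell["label"]))
    return gscm


classes = ["Shop", "Airport"]  # hypothetical class set C
cells = [
    {"distances": {"Shop": [100.0, 300.0]},
     "azimuths": {"Shop": [0.0, 90.0]},
     "label": "UU"},
    {"distances": {}, "azimuths": {}, "label": "NN"},  # empty G_j
]
gscm = build_gscm(cells, classes)
```

Under these assumptions each class contributes seven features, so the width of the GSCM grows linearly with the number of distinct classes found within d_max.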

| Creation and validation of the urban growth model
Once each of the seven GSCMs (i.e., one for each distance band) was computed, an MLP was defined (see Figure 4). Subsequently, the MLP was trained and validated with each of the seven GSCMs by means of a 10-fold cross-validation, which splits the data set into training and test data sets at every iteration in order to perform an accuracy assessment.
An MLP aims to minimize the prediction error (LeCun, Bengio, & Hinton, 2015). Unlike a conventional ANN, an MLP contains multiple hidden layers. Introducing hidden layers increases the complexity of the predictive model and therefore enables it to separate more complex data sets by modeling nonlinear relationships. Additionally, an ANN which has no hidden layers fails to model nonlinear relationships, such as the XOR function (Minsky & Papert, 1969). However, introducing more hidden layers requires more training data. An MLP is trained in multiple epochs in an iterative manner in order to lower the classification error. In every epoch, a forward propagation and a backward propagation are computed. The forward propagation computes a prediction based on the activation functions and weights. The weights are then changed by the backpropagation in order to minimize the overall classification error. For the minimization of this error, a gradient descent problem has to be solved which can be tackled by optimization procedures such as the stochastic gradient descent method, Adam or Adadelta (Ruder, 2016). The weights adjust in every epoch while the optimization procedure converges to a local minimum. Once this local minimum is reached, the final set of weights enable the computation of an optimized classification result.
As previous work on urban growth predictions strongly indicates that ANNs are superior to other machine learning models, we used an MLP, a specific type of ANN. However, we applied support vector machines, gradient boosting, logistic regression as well as an MLP on randomly selected GSCMs prior to our experiments to measure their performance. The MLP scored the highest overall accuracy during these informal trials and we therefore did not consider the other machine learning approaches for further experiments. This was done since the prime objective of this work is the investigation of geospatial semantics as an information source for spatial prediction tasks, such as urban growth prediction. The MLP used in this work was composed of one input layer, four hidden layers, and one output layer (see Figure 4). It utilized dropout (Baldi & Sadowski, 2013) in most of the layers in order to reduce overfitting. The first layer contained a batch normalization (Ioffe & Szegedy, 2015) in order to convert the original feature space (which was set up in three different units, namely meters (distance), degrees (azimuth), and a count of geo-objects) into a feature space consisting of a single dimensionless unit. All layers, except the last one, used exponential linear unit activation functions (Clevert, Unterthiner, & Hochreiter, 2015). The last layer used a softmax activation function. The proposed MLP architecture was created in an iterative manner: different MLP architectures were tested and evaluated according to their prediction accuracy. The proposed MLP architecture scored the best results. The specific parameters used to train the MLP can be seen in Table 3. To optimize the weights of the MLP the Adamax algorithm (Kingma & Ba, 2014) was used.
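The building blocks named above (exponential linear units, a softmax output, and Adamax weight updates) can be illustrated with a short, dependency-free sketch. It is not the network used in this work: the layer sizes, weights, and inputs are hypothetical, dropout and batch normalization are omitted, and the hyperparameter defaults are those suggested in the original description of Adamax rather than the values of Table 3.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def softmax(logits):
    """Turn raw output scores into class probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dense(inputs, weights, biases):
    """Fully connected layer: one weighted sum plus bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999):
    """One Adamax update of the parameters theta at time step t (t starts at 1)."""
    m = [beta1 * mi + (1.0 - beta1) * g for mi, g in zip(m, grad)]
    u = [max(beta2 * ui, abs(g)) for ui, g in zip(u, grad)]
    step = lr / (1.0 - beta1 ** t)
    theta = [p - step * mi / ui if ui > 0 else p
             for p, mi, ui in zip(theta, m, u)]
    return theta, m, u


# Hypothetical toy network: 2 inputs -> 3 hidden units (ELU) -> 4 classes (softmax).
hidden = [elu(z) for z in dense([0.5, -1.0],
                                [[0.1, 0.2], [-0.3, 0.4], [0.5, -0.6]],
                                [0.0, 0.0, 0.0])]
probabilities = softmax(dense(hidden,
                              [[0.2, -0.1, 0.3]] * 4,
                              [0.0, 0.1, 0.2, 0.3]))
new_theta, m, u = adamax_step([1.0], [0.5], [0.0], [0.0], t=1)
```

The four output probabilities correspond to the four urban growth classes; the class with the highest probability is the prediction.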

FIGURE 4 The architecture of the MLP used in this work. There are six layers in total and four hidden layers. The last two layers do not have a dropout
The accuracy assessment contains the confusion matrix as well as an overall accuracy measure and a kappa coefficient along with their fluctuations. The overall accuracy (OA) was computed as the ratio of the number of all correct predictions to the total number of predictions:

$$\mathrm{OA} = \frac{\text{number of correct predictions}}{\text{number of all predictions}}$$

Kappa is defined according to Cohen (1960):

$$\kappa = \frac{p_0 - p_c}{1 - p_c}$$

where p_0 is the proportion of correct predictions and p_c is the expected proportion of predictions that are correct due to chance.
Furthermore, recall and precision were computed for each urban growth class for each GSCM. Recall and precision measure the prediction accuracy for each urban growth class as:

$$\mathrm{recall} = \frac{t_p}{t_p + f_n}, \qquad \mathrm{precision} = \frac{t_p}{t_p + f_p}$$

where t_p is the number of true positive, f_p the number of false positive, and f_n the number of false negative predictions. Additionally, the development of the training and validation loss as well as the testing accuracy were computed for each training and validation epoch of the MLP along with their overall R^2 values. This approach allows any potential over- or underfitting to be detected.
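These accuracy measures can all be derived from a confusion matrix. The sketch below assumes a square matrix whose rows hold the true classes and whose columns hold the predicted classes; the function names and the example matrix are hypothetical and serve only as an illustration.

```python
def overall_accuracy(cm):
    """Share of correct predictions: diagonal sum divided by the total count."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def cohen_kappa(cm):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = sum(map(sum, cm))
    p0 = overall_accuracy(cm)
    row_totals = [sum(row) for row in cm]
    col_totals = [sum(row[j] for row in cm) for j in range(len(cm))]
    pc = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (p0 - pc) / (1.0 - pc)


def precision_recall(cm, k):
    """Precision and recall of class k (rows: true class, columns: predicted class)."""
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(len(cm))) - tp
    fn = sum(cm[k]) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical two-class confusion matrix with 100 predictions.
cm = [[40, 10],
      [5, 45]]
oa = overall_accuracy(cm)
kappa = cohen_kappa(cm)
precision, recall = precision_recall(cm, 0)
```

For this example, 85 of 100 predictions are correct and the chance agreement p_c is 0.5, which gives a kappa of 0.7.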

| RESULTS AND ANALYSIS
This section presents the computed results. Table 4 shows the computed values for each of the seven GSCMs.  in order to minimize the computation time but also to have enough information on every urban growth class. Row 2 in Table 4 states the overall number of cells which had geo-objects within the maximum search distance d_max.
The distribution of the urban growth classes among these cells can be seen in rows 3-6 of  (1,137). This can be explained by the fact that increasing d_max increases the likelihood of having a larger number of different geo-objects, which corresponds to a higher c_max. As c_max increases, the number of features of the corresponding GSCM increases with it. Rows 9 and 10 in Table 4 show the values for the overall accuracy as well as the kappa coefficient of the 10-fold cross-validation. Their fluctuations are shown in parentheses. The values of the fluctuations suggest that the kappa and overall accuracy are stable. A visual representation of the development of the overall accuracy and kappa coefficient with respect to the distance d_max can be seen in Figures 5 and 6. Based on Figures 5 and 6 and the overall accuracy and kappa values in Table 4, multiple observations can be made. The overall accuracy and the kappa coefficient rise and fall similarly depending on d_max. The highest overall accuracy and kappa coefficient for predicting urban growth are scored using a d_max value of 5 km (GSCM V). The highest increase in both overall accuracy and kappa coefficient values can be observed when raising d_max from 50 to 500 m (GSCM II and GSCM III). However, the overall accuracy and kappa coefficient fall once d_max increases beyond 5 km. Thus, GSCM VI and GSCM VII exhibit lower overall accuracy and kappa coefficients than GSCM V. This indicates that the most relevant geospatial information that determines the class of urban growth can be found within a distance of 5 km, with respect to the proposed model. However, it must be noted that increasing d_max automatically increases the number of features used to predict the urban growth class.
FIGURE 5 Comparison of overall accuracy for the different GSCMs. The overall accuracy changes depending on the d_max value used for the GSCM. The maximum overall accuracy that can be scored is 1
Additionally, the coefficients of determination can be seen at the bottom of Table 4. They were computed for the training and test losses over all the epochs for every MLP. They indicate that no overfitting was present. Figure 7 illustrates the confusion matrices for all GSCMs. It can be observed that increasing d_max reduces the confusion between the predicted and true urban growth class. The NN urban growth class is predicted with the highest accuracy using 5 km or 10 km for d_max. The NU class is predicted most accurately with GSCM VII using 30 km for d_max. NN and NU are the only two classes which exhibit a significant confusion using GSCM VI or GSCM VII. The UN class is predicted with the highest accuracy using GSCM VI and GSCM VII. The UU class is predicted with the highest accuracy using any GSCM. Additionally, it can be observed that increasing d_max changes the different accuracies with different magnitudes. UN has the lowest accuracy for GSCM I, but one of the highest for GSCM VII. Thus, increasing d_max increased the prediction accuracy for this class. Although the prediction accuracy of NU increases in two major steps (i.e., changing d_max from 20 to 500 m as well as from 500 m to 30 km), it can be observed that its accuracy did not increase as strongly as for UN and UU. The same holds for NN. Its prediction accuracy rises by increasing d_max from 20 m to 10 km, but not with the same magnitude as for UN and UU.
Additionally, it can be observed that its prediction accuracy decreases with an increase in d_max from 10 to 30 km as its confusion with NU increases. Figure 8 shows the precision of the prediction results for each GSCM. Figure 9 shows the recall of the prediction results for each GSCM. The exact numerical values of Figure 9 can be found in Table 5 and the exact values of Figure 8 in Table 6. Both figures show the corresponding indicator depending on the distance factor d_max. Several trends can be observed. The NU class was predicted with an increasing recall once d_max was increased from 10 to 30 km, where it reached its maximum. However, its precision decreased when d_max was increased from 10 to 30 km. Thus, increasing the distance d_max to 30 km decreased the fraction of areas correctly identified as changing from non-urban to urban among those predicted as such. The increasing recall, in turn, shows that more of the areas that actually changed from non-urban to urban were correctly predicted as such. The same holds for the prediction of areas which belong to the UN class, as its precision decreased once d_max was increased from 10 to 30 km. Its recall increased when d_max was increased from 1 to 5 km. The prediction precision and recall of the NN class exhibited an inverted behavior. Recall for the prediction of this class decreased when d_max was increased from 10 to 30 km, but the precision increased. The UU class has the highest recall and highest precision overall.

| DISCUSSION
The results demonstrate that geospatial semantics can be effectively used for urban growth prediction and further suggest that spatial prediction tasks can rely on geospatial semantics. The results provide further insights concerning the quality of the prediction performance, depending on the amount of information considered.
Additionally, the results enable comparison of performance with existing methods for predicting urban growth.

FIGURE 6 Comparison of kappa values for the different GSCMs. The overall accuracy changes depending on the d_max value used for the GSCM
This section discusses the benefits of the proposed method and the role of geospatial semantics. Next, the results are discussed with respect to the different prediction performances of each class. Finally, the proposed approach is compared to existing methods for urban growth prediction.
FIGURE 7 The seven confusion matrices for the corresponding GSCMs. A change in the prediction accuracy for each class depending on d max can be observed. The elements of the confusion matrices are normalized by the total number of correct and incorrect classifications per row (NN = non-urban no change, NU = non-urban to urban, UU = urban no change, and UN = urban to non-urban).

It can be seen that the precision for each urban growth class depends on the distance d max.

FIGURE 9 Recall versus distance d max (NN = non-urban no change, NU = non-urban to urban, UU = urban no change, and UN = urban to non-urban). It can be seen that the recall for each urban growth class depends on the distance d max.

The results demonstrate that the proximity in which geo-objects are considered for predicting the urban growth class is essential. Unlike a purely pixel-based approach, the approach presented is not spatially limited to a focal, zonal, or local pixel neighborhood, but models spatial relationships through distances and azimuths to geo-objects of a certain class. The geospatial semantics describe these classes and are therefore paramount for the proposed method. Fewer geospatial semantics yield fewer features, whereas more geospatial semantics yield more features for predicting the urban growth. The results show that more features can increase recall and precision of the predictions. Thus, more geospatial semantic knowledge can improve the prediction accuracy. Unlike urban growth prediction methods that require the factors which potentially impact urban growth to be modeled explicitly, the proposed method enables a series of factors to be incorporated in an automated manner: a potential factor could be the distances to roads or even the azimuths to industrial sites. When the radius d max around a cell ce j was increased, more geo-objects and therefore more classes were considered for the spatial prediction task. Thus, geo-objects which were further away were considered and the number of features was increased. This increase, however, can introduce unnecessary features, as geo-objects which are further away might not have a causal relationship with the urban growth class of cell ce j.
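The effect of d max on the number of features can be illustrated with a small sketch. It is not the GSCM construction used in this work: it assumes planar coordinates in metres, point-shaped geo-objects, and a single feature pair per class (distance and azimuth of the nearest object). The names `GeoObject`, `azimuth_deg`, and `cell_features` are hypothetical and introduced only for illustration.

```python
import math
from typing import NamedTuple


class GeoObject(NamedTuple):
    cls: str   # semantic class label, e.g. "road" (illustrative)
    x: float   # planar easting in metres (assumption)
    y: float   # planar northing in metres (assumption)


def azimuth_deg(dx: float, dy: float) -> float:
    """Clockwise angle from north in degrees, normalised to the 0-360 range."""
    return math.degrees(math.atan2(dx, dy)) % 360.0


def cell_features(cx, cy, objects, d_max):
    """Map each class within d_max to (distance, azimuth) of its nearest object."""
    nearest = {}
    for obj in objects:
        dx, dy = obj.x - cx, obj.y - cy
        dist = math.hypot(dx, dy)
        if dist > d_max:
            continue  # object lies outside the radius and adds no feature
        if obj.cls not in nearest or dist < nearest[obj.cls][0]:
            nearest[obj.cls] = (dist, azimuth_deg(dx, dy))
    return nearest


# Hypothetical geo-objects around a cell located at the origin.
objects = [
    GeoObject("shop", 0.0, 1_000.0),
    GeoObject("road", 3_000.0, 0.0),
    GeoObject("church", 0.0, -20_000.0),
]
small = cell_features(0.0, 0.0, objects, d_max=5_000.0)
large = cell_features(0.0, 0.0, objects, d_max=30_000.0)
print(sorted(small))  # classes found within 5 km
print(sorted(large))  # classes found within 30 km
```

In this toy setting the larger radius adds the distant church as a further class, which mirrors the observation that a larger d max yields more features, some of which may be unrelated to the urban growth class of the cell.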
The precision and recall of the prediction of UU areas were the highest. There are two potential reasons for this. First, urban areas are likely to share similar features, such as distances to multiple objects which can mostly be found in cities, such as roads, restaurants, or supermarkets. Non-urban areas might be close to some objects of these types (such as a family house or a remote gasoline station); however, there are unlikely to be as many of them in the same proximity as in an urban area. Moreover, non-urban areas which are close to urban areas might have similar objects within the same distance, but these objects might be scattered around them at different azimuths. For example, an area of forest which borders a city at its very south might be as close to shops, streets, and houses as an urban area, but they are all located north of it. This first reason explains why urban areas are less likely to be confused with areas which are non-urban or which change from non-urban to urban, such as green areas close to cities. Second, the UU class could still be confused with the UN class. However, there are only 1,390 areas of this type within the imperviousness data set. This decreases potential confusion accordingly.

The NU urban growth class was predicted with the highest accuracy using 30 km for d max. However, precision decreased and recall increased when d max was changed from 10 to 30 km. This indicates that more non-urban areas which changed to urban were identified, but the rate of false positives also increased. Thus, more areas were confused with the NU class. Looking at the confusion matrix for d max = 30 km (see Figure 7g), it can be observed that this confusion is mostly present with the NN class. Thus, areas which remained urban or changed from urban to non-urban were predicted more accurately than areas which remained non-urban or changed from non-urban to urban.
Considering that urban areas contain more significant associations to geospatial semantics, described by LinkedGeoData, than non-urban areas (Mc Cutchan & Giannopoulos, 2018), it can be said that a higher amount of available geospatial semantics yields higher prediction accuracy for areas which are urban or change from urban.
Thus, the results strongly indicate that more geospatial semantics improve the prediction accuracy for determining the urban growth class.
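The trade-off between precision and recall described above for the NU class can be made concrete with a minimal sketch. The two confusion matrices below contain invented counts chosen purely for illustration; they are not results of this work. Rows are assumed to hold the true class and columns the predicted class, in the order NN, NU, UU, UN, and the function name `precision_recall` is hypothetical.

```python
LABELS = ["NN", "NU", "UU", "UN"]


def precision_recall(cm, labels=LABELS):
    """Return {label: (precision, recall)} for a square confusion matrix.

    cm[i][j] counts cells of true class i that were predicted as class j.
    """
    result = {}
    for k, label in enumerate(labels):
        true_pos = cm[k][k]
        actual = sum(cm[k])                    # all cells truly in class k
        predicted = sum(row[k] for row in cm)  # all cells predicted as class k
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        result[label] = (precision, recall)
    return result


# Invented counts: in the second matrix more NU cells are found,
# but more NN cells are also wrongly predicted as NU.
cm_small_radius = [
    [90, 10, 0, 0],
    [40, 60, 0, 0],
    [0, 0, 95, 5],
    [0, 0, 10, 90],
]
cm_large_radius = [
    [70, 30, 0, 0],
    [20, 80, 0, 0],
    [0, 0, 95, 5],
    [0, 0, 10, 90],
]

for name, cm in [("small", cm_small_radius), ("large", cm_large_radius)]:
    p, r = precision_recall(cm)["NU"]
    print(f"{name}: NU precision={p:.3f}, recall={r:.3f}")
```

With these invented counts the recall of NU rises while its precision falls, because the additional NU predictions include NN cells. This is the same pattern, in miniature, as the confusion between NU and NN discussed above.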
Related research on urban growth prediction provides a limited assessment of the accuracy of the various models (see Table 1). This makes it more difficult to provide a meaningful comparison of the proposed model (as well as future research) to those models. Additionally, the ROIs used might not be the same, which further increases this difficulty. However, the novelty of our approach can be justified based on two aspects: first, our proposed method for urban growth prediction scores the highest overall accuracy and kappa value for an ROI of the size and heterogeneity of Europe; and second, it is shown how geospatial semantics can be utilized for urban growth predictions resulting in promising prediction accuracies. Our work therefore demonstrates that geospatial semantics are a rich information source for spatial prediction tasks such as urban growth prediction.
Our approach scored an overall accuracy of 88.60% and a kappa value of 0.833 for the ROI, Europe. Two of the eight reviewed articles provide an overall accuracy for their model assessment. Our method scored an overall accuracy higher than these two approaches. Five of the eight reviewed articles on urban growth prediction provide a kappa coefficient for their accuracy assessment. Our proposed method scored a higher kappa value than all of these approaches, except that presented by Xu et al. (2019). They provide a model with an almost perfect kappa value of 0.94 (overall accuracy is not provided). However, their approach is limited to the region of the south of Auckland, which is a significantly smaller and more homogeneous ROI than Europe (which contains multiple countries). Consequently, the performance of their model for an ROI of a size and heterogeneity such as Europe is unknown. As their model is based on a CA, which uses predictions of an ANN, hyperparameters must be set for both. This increased number of hyperparameters might provide sufficient results for a specific and limited region such as the south of Auckland; however, it can lead to an overfitted model and create incorrect predictions elsewhere. In particular, it has to be considered that the behavior of CA can be complex and unstable. In contrast, our model is based on an MLP (a specific type of ANN) only, and consequently only needs the corresponding hyperparameters, which decreases the chances of overfitting accordingly. In essence, our model scores the highest overall accuracy and kappa value, given our ROI of the size and heterogeneity of Europe. This also makes our method ideal for predicting urban growth at a continental scale.
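Since the comparison above relies on overall accuracy and the kappa coefficient, the following sketch recalls how both are commonly derived from a confusion matrix. It uses the standard definition of Cohen's kappa, which is assumed here to be the coefficient reported in the reviewed articles; the function name and the example counts are illustrative and not taken from this work.

```python
def overall_accuracy_and_kappa(cm):
    """Return (overall accuracy, Cohen's kappa) for a square confusion matrix.

    cm[i][j] counts samples of true class i that were predicted as class j.
    """
    size = len(cm)
    total = sum(sum(row) for row in cm)
    # Observed agreement: share of samples on the main diagonal.
    observed = sum(cm[k][k] for k in range(size)) / total
    # Chance agreement: product of row and column marginals per class.
    expected = sum(
        sum(cm[k]) * sum(row[k] for row in cm) for k in range(size)
    ) / (total * total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Invented two-class example for illustration only.
accuracy, kappa = overall_accuracy_and_kappa([[40, 10], [10, 40]])
print(f"overall accuracy={accuracy:.2f}, kappa={kappa:.2f}")
```

Kappa corrects the observed agreement for the agreement expected by chance, which is why it is lower than the overall accuracy in this example and why the two measures are reported side by side.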
Our accuracy assessment demonstrates that geospatial semantics are a rich information source for performing a spatial prediction task such as urban growth prediction. The proposed method incorporates 1,300 different OWL classes of different types of geo-objects in order to compute the features for the GSCM. As each of these features represents a different geospatial factor (such as the minimum distance to a river or the maximum distance to a church), the MLP is able to learn which of these geospatial factors are more relevant for specific urban growth classes. In contrast, approaches such as SLEUTH or decision trees require an explicit modeling of the geospatial factors which potentially influence the process of urban growth. This might limit the performance of the predictions, because the variety of landscapes, and therefore the number of potentially influential geospatial factors, increases as the ROI gets bigger. Our proposed method avoids this problem by using the GSCM (which stores thousands of features) as an input for the MLP, which automatically determines which of the geospatial factors provided are more relevant for a specific urban growth class. In summary, the proposed method introduces advancements for predicting urban growth by providing a feature space representation (GSCM) which enables accurate prediction for an ROI of the size of Europe. Additionally, the proposed method demonstrates that geospatial semantics can be used for spatial predictions, such as urban growth prediction, which exhibit a promising overall accuracy and kappa coefficient.

6 | CONCLUSIONS AND OUTLOOK
Within this work a novel method for predicting urban growth was presented. The method predicts four possible classes: NU, UN, UU, and NN. The method relies on vector data with geospatial semantic enhancements as well as deep learning. The data were retrieved from a local linked data endpoint and then used to create a feature space which embedded geospatial semantics. The method was evaluated using a 10-fold cross-validation, which enabled an accuracy assessment. The accuracy assessment revealed the promising results of the method. It was shown that geospatial semantics can be used for spatial prediction tasks such as urban growth prediction. Additionally, it was shown that the four classes differ in complexity in terms of the defined geospatial configuration.
Based on the methodology presented, further research problems can be tackled. The first is the impact of the level of abstraction in an ontology on the prediction accuracy. It is important to investigate whether an ontology with a higher or lower level of abstraction yields better results. For this purpose, information from the original ontology can be removed. For example, an Asian restaurant becomes a restaurant; a school becomes a building; or a shop becomes an amenity. Secondly, data fusion with remotely sensed imagery could be performed. This could potentially improve the prediction accuracy. Thirdly, further land cover changes other than urban growth could be predicted in order to gain more insights into the underlying spatial processes which describe these changes and their complexity. Fourthly, the methodology presented could be used to predict land use classes which rely not only on optical electromagnetic reflectances, but also on functions of geo-objects, that is, how they are being used and therefore semantically represented. There are many potential novel applications and research questions. This article has revealed the potential of geospatial semantics for spatial predictions. Future research can therefore utilize these novel insights in order to explore more efficient decision support systems, which can model the world