Benefits of a multiple‐solution approach in land change models

Land change (LC) models are dedicated to a better understanding of land use and land cover dynamics. A fundamental aspect of those models lies in the calibration of spatial parameters underlying such dynamics. Although there are many studies on the calibration of LC models, current efforts have a common goal of seeking to find a single global optimum solution, even though land change dynamics may be inherently heterogeneous throughout a given space. This article presents a calibration approach for finding multiple optimal solutions. A crowding niching genetic algorithm (CNGA) is incorporated into a cellular automata LC model. The model is applied to simulate urban expansion in Wallonia (Belgium) as a case study. Our findings demonstrate the ability of the model to locate multiple solutions simultaneously. In addition, the CNGA performs better than the standard genetic algorithm—besides, the CNGA helps to better understand the properties of land change dynamics within a given landscape.

Genetic algorithms (GAs) are powerful optimization methods, proposed by Holland in the 1960 and 1970s (Holland, 1975). They have already been successfully applied to many LC models (e.g., Al-Ahmadi, See, Heppenstall, & Hogg, 2009;García et al., 2013;Liu et al., 2015). Pareto front methods are often integrated with GAs to illustrate the trade-off among conflicting optimization objectives, which assists in optimizing the allocation of different, conflicting, land uses simultaneously (Mustafa, Heppenstall et al., 2018). A typical GA consists of a genetic representation of the problem space, an objective function to evaluate each candidate solution, and selection, crossover, and mutation operators. A GA starts by generating a first generation of individuals, often randomly. Each generation has a set of individuals (population) that are evaluated through an objective function. For each new generation, the GA selects a portion of the existing population to act as parents to breed a new generation using a crossover operator in which two parents are combined to create a new child. Each child is altered by a mutation operator that adds a small random number to each gene. The GA hence tries to maintain genetic diversity from one generation to the next. Owing to the use of a population, a GA can theoretically find multiple local optimal solutions. However, most standard GAs tend to prematurely converge to one optimal solution, because they have difficulty maintaining population diversity during evolution (Črepinšek, Liu, & Mernik, 2013;Sheng et al., 2016).
Accordingly, various approaches were proposed to overcome this shortcoming and to maintain stable subpopulations at all peaks in the search space, which is referred to as multimodal optimization. Li, Li, and Wood (2010) grouped these approaches into three main types. The first type involves sequential approaches in which the algorithm is repeated several times so that various solutions are expected to be found at each time. The second type is a parallel approach in which the population is divided into subpopulations and multiple solutions are expected to be found. The parallel approach can be considered as a kind of iterative method. The third type performs division of the population based on several techniques, amongst which the niching technique is the most common one (Das, Maity, Qu, & Suganthan, 2011;Hall, 2012;Mengshoel & Goldberg, 2008). Niching is a technique for finding and maintaining multiple favorable parts of the search space around many optima in parallel, by reducing individual drift effects resulting from selection and crossover operators in the standard GA. In other words, the term "niche" in the optimization domain refers to an area of the search space where only one peak resides (Das et al., 2011;Singh & Deb, 2006).
The idea of coupling GAs with niching approaches dates back to the work of Goldberg and Richardson (1987).
The crowding niching method was initially proposed by De Jong (1975) with the aim of avoiding premature convergence by preserving population diversity. In the De Jong (1975) method, a child is compared to a small number of parents randomly taken from the current population, and the child replaces the most similar parent.
Deterministic crowding (Mahfoud, 1992) is a modification of the De Jong (1975) method, in which the method randomly takes two parents from the current population to generate two children. Each child then replaces the nearest parent if it is of greater fitness. Chen, Liu, and Chou (2014) proposed the twin-space crowding GA, which efficiently explores multiple solutions. One of the main features of the  method is that it determines the competence between a child and a set of nearest parents and randomly replaces one of these parents with poorer fitness.
To the best of our knowledge, no research exists within the LC modeling domain dealing with the task of finding multiple optimal solutions and not just one single optimum. Finding multiple "nearly" comparable quality solutions may help to reveal hidden properties in the case under study, such as spatial variations of LC drivers. This is especially important in those cases where LC dynamics are characterized by divergent spatial parameters across a reference area. Identifying and revealing these divergences provides significant information in terms of LC. This information is concealed by all models based on one single global optimum calculated for the entire reference area.
In addition, multiple solutions constitute a step toward providing more robust solutions for future LC predictions.
Many LC models use an independent validation process such that the data used in the validation is different from the data used in the model calibration, by covering different time periods. For example, Mustafa, Heppenstall et al. (2018) calibrated their LC model using data for the 1990-2000 reference period and the calibration results were subsequently used to validate the model performance using 2000-2010 data. When divergent sets of spatial parameters are identified during the calibration of the LC model, these can be tested at the validation stage. If confirmed at the validation stage, such divergent spatial parameters should somehow be considered for forecasting future LC.
This article applies a CNGA to search for multiple configurations of spatial parameters to optimize the transition rules of LC model. Our CNGA is most similar to the work of Mahfoud (1992) and . Our LC model is a cellular automata (CA)-based model that is applied to simulate urban expansion in Wallonia, as an example case study, for time intervals 1990-2000 (calibration) and 2000-2010 (validation).

| DE SCRIP TI ON OF THE L AND CHANG E MODEL
In this article, a CA-based LC model is implemented. The model converts a number of non-urban cells into urban cells until it meets the required quantity of urban expansion. The conversion of cells at each time step is based on a transition rule.

| Definition of the transition potentials
The model allocates new urban cells based on a transition rule which has two components according to Wu (2002) and Mustafa, Heppenstall et al. (2018). The first component depends on a set of urbanization driving forces. The second component considers the local neighborhood characteristics. The transition potentials P for a cell ij changing its state are calculated as: (1) where (P d ) ij is the urban probability based on a set of urbanization driving forces, (P n ) ij is the neighborhood effect on the cell ij, σ is the relative importance of the (P n ) ij , and con(.) is the restrictive conditions for land use change.
Here, (P d ) ij is defined based on a set of forces that drive the change process and can be calculated as: where α is a constant, (X 1 , …, X n ) are the driving forces, and (β 1 , …, β n ) are the estimated parameters (weights) of the driving forces.
The value of (P n ) ij is calculated according to the following equation (Wu, 2002;Feng et al., 2011): where count(s = urban) is the number of urban cells amongst the Moore n × n neighborhood.

| Calibration
Model calibration is the process of finding a best-fit set of spatial parameters presented in Equations (1), (2), and (3).
Often, the driving force weights, Equation (2), are defined with a logistic regression (e.g., Achmad, Hasyim, Dahlan, & Aulia, 2015;Mustafa et al., 2017). However, some studies used optimization research algorithms to define the weights of driving forces (e.g., Feng et al., 2011). In our model, all parameters are calibrated through a CNGA in order to consider multiple optimal solutions. The details of the CNGA are presented in Figure 1.
At the beginning of the CNGA, a parent generation is created randomly. Thereafter, the GA evaluates every individual in the generation (population) and uses three operators to generate children with the same size of parent generation: selection, crossover, and mutation. The selection operator randomly selects a subset of parents and sets the best two as parents to generate two new children via crossover. Crossover randomly selects proportions of genes, one gene representing one parameter that requires calibration, from each parent (e.g., 70% from one parent, 30% from the other). The mutation operator randomly alters some genes in each child, which gives the algorithm the capability to widely explore the search space. However, the mutation operator reduces the convergence ability of the algorithm toward optimal places. In our algorithm, an adaptive mutation operator is used. It adds a random number from a Gaussian distribution with a center of zero and a specific standard deviation value at the first generation. The standard deviation is shrunk to zero linearly with generations. Thus, the CNGA explores much more search space at the beginning of the optimization process and ensures convergence by the end of the process.
The key to searching for multiple optimal solutions is using the crowding niching method to produce the new generation. For each child, C i , the algorithm finds the closest parent, P i ′, determined by a proximity function. We use the Euclidean genotype distance to determine the closest parent to every new child. The algorithm uses a two-stage replacement procedure to determine every individual in the new generation. Firstly, the algorithm replaces P i ′ with C i if C i is better than P i ′ in terms of fitness score. Next, if C i is skipped in the first stage, the algorithm finds all other parents located in a circle where the center is P i ′ and the radius is the distance between C i and P i ′.
C i randomly replaces one of the poorer-fitted parents in the circle. This step is a self-adaptive niche method as the algorithm dynamically selects the crowding space range for the survival by competence . If C i is skipped in the two replacement stages, the algorithm disposes of it and keeps P i ′ in the new generation.

| Fitness function and validation
The fitness function is the maximization of the fuzziness similarity rate between simulated and observed maps. The comparison considers only new urban transitions during the calibration interval (1990)(1991)(1992)(1993)(1994)(1995)(1996)(1997)(1998)(1999)(2000). The fuzziness similarity rate is calculated as follows (Mustafa, Heppenstall et al., 2018): where FSR k (0 ≤ FSR k ≤ 100) is the fuzziness similarity rate for class k; I i k d is 1 if cell i k in the simulated map at zone d (0 ≤ d ≤ 4) has similar land use class to one cell at zone d in the observed map, otherwise it is 0; X k,sim equals the change amount of class k in the simulated map; and X k,actul equals the change amount of class k in the observed map. (4) F I G U R E 1 Crowding niching genetic algorithm

| Study area
Wallonia, the French-speaking region in Belgium, is proposed as the case study area to simulate the spatio-temporal dynamics of urban expansion between 1990 and 2010. Wallonia (Figure 2) occupies an area of 16,844 km 2 with 3,498,384 inhabitants in 2010 (Belgian Federal Government, 2013). Urban development has experienced a continuous expansion in the last two decades, which has taken place in relatively small, scattered plots. Urban development has traditionally been concentrated in the northern part of the region, the remainder of the region being more rural and less urbanized. The territory of the region is quite heterogeneous, as it is characterized by a significant difference between three basic configurations: a linear network of cities (Liège, Namur, Charleroi, Mons) following the industrial axis that developed along the main river channels; a southern area dominated by forests and dispersed rural settlements; and, finally, a dense peri-urban network of small and medium-sized cities developing in close relation with Brussels at the north of the industrial axis.

| Data
Two types of data are considered by the model. Firstly, Belgian cadastral data (CAD) is used to generate three urban maps for 1990, 2000, and 2010 using building construction dates. We rasterized the vector CAD at a cell dimension of 2 × 2 m and then aggregated the rasterized datasets up to 100 × 100 m raster maps following Mustafa, Heppenstall et al. (2018). Each 100 × 100 m cell represents a number of 2 × 2 m cells within it. We consider as urban cells all 100 × 100 m cells that contain 25 or more 2 × 2 m cells, corresponding to an average-sized building of 100 m 2 , as in Mustafa, Heppenstall et al. (2018). The second type of data is a set of various urbanization driving forces that are commonly used in urban expansion models. Table 1 lists all driving forces used in this case study.

The slope degree is computed based on the digital elevation map (DEM) made available by the Belgian National
Geographic Institute with a resolution of 10 × 10 m. Distances to the road network are derived from vector data F I G U R E 2 Study area: Wallonia (urban data is generated based on the Belgian cadastral data) provided by Navteq Co. Roads are categorized into four types: Road1 (highways), Road2 (main roads), Road3 (secondary roads), and Road4 (local roads). The major cities are all Belgian cities with a population of more than 90,000 inhabitants. All distance-based drivers are based on the Euclidean distance. The employment rate is provided by Wallonia Institute of Evaluation, Prospective and Statistics (Institute Walloon de L'évaluation, de la Prospective et de la Statistique) as a percentage for each municipality. The zoning map is derived from the Wallonia official zoning plan in a vector format. The map is developed by discriminating between zones where urban development is legally permitted (code 1) and those where it is not permitted (code 0). All driving forces were standardized and resampled to the same cell resolution of 100 × 100 m prior to model calibration.

| Settings
The urban configuration in our case study represents the entire urban system, suburbs, and rural areas. The urbani- Markov chain model (e.g., Yang, Zheng, & Chen, 2014), linear extrapolation (e.g., Mustafa, Heppenstall et al., 2018), and/or based on socio-economic factors (e.g., White & Engelen, 2000). Our main goal is to evaluate the allocation ability of the model, and therefore the amount of new urban cells equals the observed number of new urban cells for the study period divided by time steps.
In Equation (1), con(.) is 0 if cell ij is occupied by water (defined using the zoning plan) or 1 otherwise. In this case study, a 3 × 3 neighborhood window is used to represent neighborhood interactions (Equation 3).
The settings of various CNGA parameters are determined by undertaking empirical experiments on several values of the parameters with low number of generations and individuals. Accordingly, we set the parameters of the final run of CNGA as follows: (i) number of individuals (population size) 100; (ii) selection operator randomly picks a set of four parents and then selects the best two, according to the fitness score; (iii) crossover operator picks 60% of genes from the parent with better fitness score at random and 40% from the other parent to generate two children; and (iv) mutation operator alters one or more gene/s in each child by a random number selected from a Gaussian distribution with a standard deviation of 0.7 and linearly decreases the standard deviation to 0 on reaching generation 100. The number of generations is flexible. However, we set a convergence stopping criterion to force the algorithm to stop when the best fitness score for successive generations becomes similar. This has been defined as the moment when the change in the best fitness score over 25 successive generations is less than 0.001. We compared the performance of the CNGA with a standard GA which finds a single global solution. The settings of GA parameters (population size, selection, crossover, mutation operators, and number of generations) are identical to the CNGA parameters.

| Results and discussion
In this section, the calibration results and the validation of the model are presented and discussed. The model stopped working at generation 117 in the case of CNGA and at generation 109 in the case of GA. The average run time for each individual solution is 9.6 s in the case of CNGA and 9.2 s in the case of GA using a good PC (Intel Core i7-4700 CPU @2.4 GHz).
The CNGA discovered four different optimal solutions for Wallonia. Table 2 gives the parameter weights that result from optimization using the CNGA. It is noticeable that X 5 (distance to secondary roads), X 3 (distance to highways), X 6 (distance to local roads), X 7 (distance to major cities), and X 2 (slope degree) are the most divergent factors. Table 2 also shows the parameter weights of the global solution using the standard GA.
The performance of each candidate solution is reported in Table 3. The CNGA calibration reveals that solution #1 is the best-performing one. However, the other solutions show comparable performance, with very marginal FSR differences between 0.01 and 0.14. This indicates that each solution works well in specific locations across the study area, which enables us to explore the spatial heterogeneity in the relationship between the LC process and its driving forces. Table 3 also shows that the CNGA slightly outperformed the standard GA. This can be explained by the fact that the CNGA decreases the potential of trapping into a local optimum. existing cities, and near to roads. In contrast, the Solution #3 allocation process is mainly affected by the slope, the distance to major cities, and the number of local urban neighbors. Figure 3 confirms that Solution #3 works better within urban cores, whereas Solution #1 outperforms Solution #3 in rural areas.
In order to find the locations in which each solution outperforms the others, we adopted a moving window approach. The window dimensions were first estimated through an incremental spatial autocorrelation test. This test measures the global Moran's index at a series of increasing distances, measuring the intensity of clustering and/or dispersion. The higher positive z-score values when statistically significant, the higher the degree of clustering for a given distance (Tighe, Fillingim, & Hurley, 2014). Figure 4 depicts the incremental spatial autocorrelation peaking at a distance of about 12,000 m. Consequently, we set the dimensions of the moving window to 12,000 × 12,000 m (120 × 120 cell). Figure 5 shows that Solution #4 is the best solution within dense urban areas (i.e., Liege, Charleroi, and Mons).
Solution #3 better suits medium-sized cities (e.g., Namur), as well as smaller urban nodes. According to

| CON CLUS I ON S AND FUTURE WORK
This article has explored the advantage of implementing niching methods that extend the genetic algorithm to identify multiple solutions in LC models. We have implemented the CNGA to calibrate a cellular automata LC model. The CNGA allowed us to distinguish four divergent optimal solutions with comparable performance results, evaluated by means of the fuzziness similarity rate.
We believe that this work is important for many reasons. First, finding multiple optima is important to discover the relationship between specific subspaces and LC drivers. Our results provide evidence that the effect of each driver changes magnitude in the various subspaces. Second, the existence of competing solutions should be considered in future forecasting, as each solution may not be stable over time. The case study shows that Solution #2 performs better than other solutions during the period 2000-2010, while it was Solution #1 that performed best during the period 1990-2000. This opens an avenue toward sensitivity analysis, taking into consideration various solutions for future predictions. Third, the findings are important for policy makers, as they will help evaluate and adopt the location-based spatial planning strategies for a given region considering several subspaces and their specificities.
Many further research directions are raised by this study. We examined one of the common approaches to searching for several optimal solutions. As mentioned above, many other algorithms can be employed to discover several solutions. Future studies could examine other algorithms to identify their advantages and shortcomings.
In this study, we have used a moving window to define several zones and determine which solution works best for each subspace. The z score of the global Moran's index for incremental distances has been introduced to determine the appropriate window dimensions. More research on how to subdivide large areas into smaller geographic zones needs to be conducted in order to finetune the partition of a given space into subspaces where the various solution are most adapted.

ACK N OWLED G M ENTS
The research was funded through the European Regional Development Fund -FEDER (Wal-e-Cities Project).