## 1. Introduction

### 1.1. Weather monitoring planning as a location problem

Planning a network of monitoring stations has attracted the attention of planners and researchers, since the advantages of proper planning and management of these facilities translate into both economic and scientific benefits. Two concurrent but conflicting goals can be identified in any planning action for such networks: the economic aspect favours a network with few stations, whereas an efficient network able to provide a good, accurate estimate of an observed variable is likely to require more stations. A balance between these two requirements must therefore be found. Even once this number is established, another important question is how to locate a set of stations that guarantees the best possible estimation results. These two questions, concerning the number and location of monitoring stations, have no immediate answer and can be difficult to resolve, given the large number of possibilities inherent in combinatorial problems of this type, known as normative location problems.

These problems, whose formulation and solution methodologies are the subject of locational analysis, occur in many and varied contexts and address questions such as the quantity, shape, size and internal relations of the facilities being located (Daskin, 1995; Church, 2002; Church and Murray, 2009). All normative location problems share some similarities, and diverse taxonomies have been suggested. These classifications explore the fundamental components that can be identified (Brandeau and Chiu, 1989; Laurini and Thompson, 1992; Hamacher and Nickel, 1998), and some of them extend the scope beyond point location problems. In general, location problems are formulated in an economic context where *demand* and *offer* relate to each other in a specific *space*, and the location decision concerns the placement of offer while optimizing a cost and service function, called the *objective function*. The most common objective functions (covering, median, centre and many more) are used to designate the various problems and to classify them accordingly. The elements of location problems can appear in diverse forms. Traditional location problems involve previously known sets of offer and demand elements, related by some specified distance function, which turns the problem into a combinatorial search for the best solutions.
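The combinatorial character of these problems can be illustrated with a minimal sketch of the p-median objective: choose *p* offer sites so that the total distance from demand points to their nearest selected site is minimal. The brute-force enumeration below is an illustrative assumption, not a practical solution method (real instances require heuristics precisely because this search grows exponentially).

```python
from itertools import combinations

def p_median(demand_pts, candidate_sites, p, dist):
    """Brute-force p-median: pick p sites minimizing the total
    demand-to-nearest-site distance. Exponential in the number of
    candidates; illustrative only."""
    best_cost, best_sites = float("inf"), None
    for sites in combinations(candidate_sites, p):
        cost = sum(min(dist(d, s) for s in sites) for d in demand_pts)
        if cost < best_cost:
            best_cost, best_sites = cost, sites
    return best_sites, best_cost

# Toy example: demand and candidate sites on a line, Euclidean distance.
demand = [0.0, 1.0, 2.0, 8.0, 9.0, 10.0]
candidates = [0.0, 2.0, 5.0, 9.0]
sites, cost = p_median(demand, candidates, p=2,
                       dist=lambda a, b: abs(a - b))
```

With two clusters of demand, the search selects one site near each cluster, which is exactly the behaviour a median-type objective function rewards.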

### 1.2. Background

There are several studies focusing on methods for optimizing networks that monitor natural or anthropogenic phenomena, many of which deal with networks for monitoring groundwater quality. Prakash and Singh (2000) test the selection of optimal locations to expand a groundwater-level monitoring network in a region of India by minimizing the kriging estimation variance and determining where the greatest estimation errors occur. Cameron and Hunter (2002) present an algorithm for optimizing groundwater-quality monitoring networks based on kriging geostatistical estimators and their estimation variance; their aim was to find spatial and temporal redundancy scenarios that could justify the exclusion of sampling locations and the consequent reduction of the network. Also aiming to reduce the spatial density of sampling sites in a groundwater-quality monitoring network, once again for budgetary reasons, Li and Hilton (2005) apply an ant colony optimization algorithm (inspired by the behaviour of ant colonies) to spot cases of spatial redundancy. A more recent study by Yeh *et al.* (2006) shows how combining multivariate geostatistical analysis with genetic algorithms is effective in defining an optimal groundwater-quality monitoring network; this minimizes the estimation variance of the spatial factor and, in the authors' opinion, provides enough information to fully understand the spatial phenomenon. The broader issue of environmental monitoring networks has also generated a large number of studies. Caeiro *et al.* (2002) and Nunes *et al.* (2004a, 2004b) investigated the optimization of sediment sampling sites in the Sado River estuary, Portugal, applying the simulated annealing (SA) optimization algorithm to search for solutions that minimize both the variance and the average estimation error resulting from kriging interpolators.

Regarding the optimization of weather station networks, several studies have been carried out, with very distinct methodological approaches. Periago *et al.* (1997) develop a methodology to optimize the udometer network for rainfall observation in Catalonia. It consists of building a multiple linear regression (MLR) model of rainfall from several explanatory variables. The regression residuals are interpolated by kriging, and the locations with the largest MLR estimation errors are identified and evaluated. The authors claim that some parts of the country are not well covered by the udometer network and that there are few criteria for choosing new locations. They also state that criteria calling for a network with a homogeneous spatial distribution become ineffective owing to the complex topography of the country. Pardo-Igúzquiza (1998) also addressed this topic, presenting a method that defines the optimal network for estimating rainfall events in river basins. Applying SA optimization, he used two different approaches: first he selected a subgroup of stations from a group of existing ones, and then a second analysis tested the extension of that network. Each new solution was evaluated in terms of minimizing both the kriging estimation variance and the cost of acquiring the data. The main purpose of the work was to select the location and number of pluviometers that provide the greatest estimation accuracy at the lowest cost. The author emphasizes the increase in spatial variability for events lasting one day and its consequences for designing an optimal network, concluding that the optimal configuration of a network for monthly observations is not identical to that of an optimal network for daily observations.
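The SA scheme used in these studies can be sketched in a few lines: perturb the current subset of stations, accept improvements always and deteriorations with a temperature-dependent probability, and cool gradually. The sketch below is a generic illustration under stated assumptions: the objective is a simple mean-coverage-distance surrogate, standing in for the kriging estimation variance (plus data-acquisition cost) minimized in the works discussed.

```python
import math
import random

def sa_select(stations, k, objective, n_iter=2000, t0=1.0,
              cooling=0.995, seed=0):
    """Simulated annealing over k-subsets of candidate stations.
    `objective` scores a subset (lower is better)."""
    rng = random.Random(seed)
    current = rng.sample(stations, k)
    best = list(current)
    cur_cost = best_cost = objective(current)
    t = t0
    for _ in range(n_iter):
        # Neighbour move: swap one selected station for an unselected one.
        cand = list(current)
        cand[rng.randrange(k)] = rng.choice(
            [s for s in stations if s not in current])
        cost = objective(cand)
        # Accept improvements; accept worse moves with probability exp(-d/t).
        if cost < cur_cost or rng.random() < math.exp((cur_cost - cost) / t):
            current, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = list(cand), cost
        t *= cooling  # geometric cooling schedule
    return best, best_cost

# Toy surrogate objective: mean distance from grid points to the
# nearest selected station (a stand-in for kriging variance).
grid = [(x, y) for x in range(5) for y in range(5)]
cands = [(0, 0), (0, 4), (4, 0), (4, 4), (2, 2), (1, 1), (3, 3)]

def coverage(subset):
    return sum(min(math.dist(g, s) for s in subset) for g in grid) / len(grid)

subset, cost = sa_select(cands, k=3, objective=coverage)
```

The swap neighbourhood mirrors the subgroup-selection step described above; extending an existing network would simply fix part of the subset and swap only the free slots.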

Geostatistical estimation is an issue that has attracted much attention. The uncertainty associated with spatial estimation can be derived from the estimation variance of the estimates. However, this approach rests on two strong assumptions: that the estimation errors are Gaussian distributed and that they are independent of the data values (see Goovaerts, 1997, pp. 259–262). Another approach is based on estimating the local probability distribution function conditioned on the data values (see Goovaerts, 1997, pp. 262–264). These local distributions can be derived from stochastic simulations, which produce several realizations of the variable, or, more simply, estimated directly using an indicator approach.
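The simulation route can be sketched as follows: each realization provides one plausible value at a target location, and the ensemble of realizations defines an empirical local distribution from which exceedance probabilities can be read directly. The realizations below are synthetic placeholders drawn from an assumed Gaussian, not the output of an actual conditional simulator.

```python
import random

# Synthetic stand-in for simulation output: 1000 realizations of the
# variable at one target location (assumed Gaussian, mean 50, sd 10).
random.seed(42)
realizations = [random.gauss(50.0, 10.0) for _ in range(1000)]

# The empirical local distribution gives exceedance probabilities
# directly, with no Gaussian-error assumption on the estimate itself.
threshold = 60.0
p_exceed = sum(v > threshold for v in realizations) / len(realizations)
```

For this synthetic ensemble the empirical exceedance probability is close to the theoretical Gaussian tail probability of about 0.16, and the same frequency count works for any threshold or distribution shape.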

The introduction of estimation uncertainty through the indicator formalism has found a wide range of applications, from evaluating the probability distribution of a particular phenomenon to more complex calculations. To select optimal locations for sampling environmental variables, Pardo-Igúzquiza and Dowd (2004) apply indicator co-kriging to quantify the probability of occurrence while simultaneously calculating estimation variances. Optimality is achieved by minimizing both the uncertainty of belonging to a cumulative distribution function class and the uncertainty of the estimation variance. The authors emphasize the advantage of the indicator formalism in enabling the modelling of estimation uncertainty, as opposed to traditional kriging, which expresses only the estimated values and the respective estimation variance.

A simpler approach involving less post-processing, proposed by Bastante *et al.* (2005), evaluates estimation uncertainty in the context of decision support and makes use of information obtained from geological surveys in areas classified as exploitable/nonexploitable. Through indicator kriging, they estimated the mathematical expectation and determined the priority areas for intervention by probability of occurrence. A similar approach is developed by Diodato *et al.* (2004), who design a methodology based on indicator kriging for determining areas susceptible to soil degradation. They calculated and distributed probabilities through the indicator formalism, seeking to determine the occurrence of erosive phenomena; the assessment of uncertainty allowed the delimitation of areas potentially affected by land degradation. Addressing a location problem where the goal is to site a toxic waste deposit, Rosenbaum *et al.* (1996) present a study on lithology classification based on periodically collected samples. Following the indicator kriging methodology, the authors convert each lithological type into a binary variable and subsequently estimate the cumulative distribution function. Their results show not only the areas of estimation uncertainty but also the probability of a particular lithology belonging to a given class. Several authors are concerned mainly with modelling the cumulative distribution function and the aid it can provide in evaluating and solving various problems. The study by Amini *et al.* (2004) delineates areas contaminated by heavy metals, a task that generally involves an inherent uncertainty which must be taken into account when making decisions. Accordingly, the authors use the indicator formalism to estimate the probability of exceeding concentration limits for cadmium and lead.
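The core of the indicator formalism in these threshold-exceedance applications is simple: each observation is coded as 1 or 0 according to whether it exceeds the threshold, and a weighted average of the coded samples estimates the local exceedance probability. The sketch below is a hedged illustration: the concentrations are hypothetical, and inverse-distance weights stand in for the kriging weights used in the indicator-kriging literature.

```python
def indicator(value, threshold):
    """Indicator coding: 1 if the observation exceeds the threshold."""
    return 1.0 if value > threshold else 0.0

def idw_exceedance(samples, target, threshold, power=2.0):
    """Estimate P(Z(target) > threshold) as a weighted average of
    indicator-coded samples, using inverse-distance weights as a
    stand-in for kriging weights."""
    num = den = 0.0
    for (x, y), z in samples:
        d2 = (x - target[0]) ** 2 + (y - target[1]) ** 2
        if d2 == 0.0:
            return indicator(z, threshold)  # target coincides with a sample
        w = 1.0 / d2 ** (power / 2.0)
        num += w * indicator(z, threshold)
        den += w
    return num / den

# Hypothetical heavy-metal concentrations (mg/kg) at sampled locations.
samples = [((0, 0), 0.5), ((1, 0), 1.2), ((0, 1), 2.4), ((1, 1), 3.1)]
p = idw_exceedance(samples, target=(0.5, 0.5), threshold=1.0)
```

Because three of the four equidistant samples exceed the threshold, the estimated exceedance probability at the centre is 0.75; mapping such probabilities over a grid is what delimits the contaminated or susceptible areas in the studies above.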

The work by Fabbri and Trevisani (2005) raises the question of which methodology is appropriate for assessing the feasibility of exploiting geothermal waters. These authors also propose an approach using indicator kriging, describing its formalism. They state that indicator kriging is a good estimator of the cumulative distribution function, and they subsequently calculate the estimation uncertainty by simulation. Although several references converge on the idea that this is a suitable approach for modelling the uncertainty of various phenomena, others draw attention to some constraints. Juang *et al.* (2003) designate indicator kriging as one of the main methods for modelling estimation uncertainty; however, they stress its smoothing effect (overestimation of low values and underestimation of high values) in map production, emphasizing the particular case of pollutant modelling. In their work, the cumulative distribution function is calculated at the sample sites and then estimated by kriging and by conditional simulation. When analysing the results, the authors found that kriging produces a much smoother distribution and that the estimated values vary within the range of the observed values. Lloyd and Atkinson (2000) evaluated three different geostatistical approaches for estimating elevation: ordinary kriging, kriging with external drift (KED) and indicator kriging. They noticed that indicator kriging consumed significantly more processing time than the other methods. However, despite slower processing and a more elaborate formalism, this technique allowed the assessment of the associated estimation uncertainty, whereas the other methods could be evaluated only through their estimation variance.