## 1. Introduction

Statistical models for data collected over a spatial region are widely available and heavily used in an enormous range of applications. However, the majority of these models assume that the spatial region of interest is a straightforward subset of ℝ² where Euclidean distance is the natural metric. One interesting example of spatial data which does not have these characteristics arises from measurements made over a network consisting of continuous, connected curved line segments. The sample space is intrinsically one dimensional, although embedded in two-dimensional space. River catchments are a particular, and commonly occurring, example of this. Fig. 1 illustrates both the network and a series of point sampling stations for the River Tweed, which spans the border between Scotland and England. (Note that the picture shows some apparently unconnected stream segments. This is simply because some small lochs and other types of water body are not shown.)

Models for this type of spatial data require different constructions. In particular, Euclidean distance needs to be replaced by ‘stream distance’, which was defined by Ver Hoef *et al*. (2006) as ‘the shortest distance between two locations, where distance is only computed along the stream network’. This approach has been used in geostatistical models for stream networks for some time, e.g. by Cressie and Majure (1997) and Gardner *et al*. (2003). However, Ver Hoef *et al*. (2006) showed that substituting stream distance for Euclidean distance in standard geostatistical theory does not produce a valid spatial covariance model except when the exponential covariance structure is used. Ver Hoef *et al*. (2006) and Cressie *et al*. (2006) used moving average constructs to define a much broader class of valid spatial covariance models which use stream distance as well as other information, such as flow volume and the flow connectedness of locations. One of the defining properties of these models is that they assign a correlation of zero to pairs of locations which are not flow connected. Ver Hoef and Peterson (2010) developed the theory that had been set out in these earlier papers by defining both ‘tail-up’ and ‘tail-down’ moving average constructions, to allow for correlation between pairs of locations which are not flow connected. A variety of applications have subsequently been built on this theoretical structure; see Peterson and Ver Hoef (2010), Peterson and Urquhart (2006), Peterson *et al*. (2006) and Garreta *et al*. (2010) for examples.
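To make the notions of stream distance and flow connectedness concrete, the following is a minimal sketch on a toy network. The node names, the segment lengths and the `DOWNSTREAM` representation are all hypothetical and chosen only for illustration; they are not taken from the River Tweed data or from any of the cited papers. The sketch assumes a tree-structured network in which every node drains to a single outlet, so that the shortest path between two locations passes through their nearest common downstream node.

```python
# Hypothetical toy network (an assumption for illustration only):
# each node maps to (next node downstream, length of the connecting segment).
# The outlet has no downstream node.
DOWNSTREAM = {
    "outlet": (None, 0.0),
    "confluence": ("outlet", 4.0),
    "a": ("confluence", 3.0),
    "b": ("confluence", 2.0),
    "a_upper": ("a", 5.0),
}


def path_to_outlet(node):
    """Map every node downstream of `node` (and `node` itself) to its distance from `node`."""
    distances = {}
    travelled = 0.0
    while node is not None:
        distances[node] = travelled
        next_node, length = DOWNSTREAM[node]
        travelled += length
        node = next_node
    return distances


def stream_distance(u, v):
    """Shortest distance between u and v, measured only along the network."""
    from_u = path_to_outlet(u)
    from_v = path_to_outlet(v)
    # In a tree, the shortest route goes via the nearest common downstream node,
    # which is the common node minimising the summed distances.
    return min(from_u[n] + from_v[n] for n in from_u if n in from_v)


def flow_connected(u, v):
    """True if water flows from one location to the other."""
    return v in path_to_outlet(u) or u in path_to_outlet(v)


# Two locations on different branches: distance is measured through the confluence.
d_branches = stream_distance("a_upper", "b")
# Two locations on the same branch: one lies directly downstream of the other.
d_same = stream_distance("a_upper", "a")
```

In this toy example the pair `("a_upper", "b")` lies on different branches, so it is not flow connected, and its stream distance is accumulated through the confluence rather than measured across the intervening land, as Euclidean distance would be.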

Covariance functions, and the use of kriging for prediction at locations which have not been monitored, provide a very well-established approach to the construction of statistical models for spatial data. However, in some applications the principal focus is on the presence and nature of underlying trends, created by effects such as land use, geological patterns, dominant weather patterns or other influences with a strong systematic component which persists over repeated sampling of the same spatial region. Linear trends can be accommodated easily in covariance function models but in environmental settings trends often take the form of more flexible, non-parametric patterns. An attractive approach is then to place the emphasis on the direct modelling of these trends, using suitable forms of flexible regression, incorporating appropriate forms of spatial error where necessary. This line of thinking is also well established and expressed, for example, in the geoadditive models of Kammann and Wand (2003) and the more general semiparametric and additive modelling frameworks that have been described by Ruppert *et al*. (2003) and Wood (2006) among others. Bowman *et al*. (2009) described a model of this type for spatiotemporal data.

The aim of the present paper is to develop methods of flexible regression for data over a network. In common with all spatial models, a regression approach allows estimates to be constructed over an entire spatial region from point-located data, but it also provides a framework within which spatial, temporal and other covariate effects can be treated simultaneously. Smoothing techniques form the basis of flexible regression methods and these have been applied to a variety of data structures. However, the published literature shows very little evidence of their use in a network setting. The challenge is to devise methods that are built on the concept of ‘borrowing strength’ locally, while respecting the specific topology of a network and the additional complications of directionality and size of flow. A key issue in meeting this challenge is how to deal with confluence points, where different branches of the network combine. It is shown below that successful treatment of these issues leads to significant improvements over more standard smoothing techniques in this setting. In particular, the estimators exhibit features, such as sharp changes which are often expected at confluence points, but which cannot easily be reproduced by more standard approaches.

Monitoring systems which are designed to collect data spatially also commonly record data over time. In fact, in many applications, the detection of changes over time is as important as the identification of spatial pattern and this has motivated a large body of research in spatiotemporal modelling. However, very little of this work is directed at a network setting. This provides an opportunity for a successful network flexible regression approach to be extended into the spatiotemporal setting, where spatial, temporal and interaction terms can all be identified informatively.

Different approaches to flexible regression over a network, including local fitting and penalized methods, are discussed in Section 'Network smoothing'. A spatiotemporal model, including main effects and interactions, is constructed in Section 'Kernel functions' where a correlated error structure is also considered. Visualization of the complex nature of the interactions is also discussed here. Throughout the paper, the methods and models are applied to data from the River Tweed. Some final discussion is given in Section 'Discussion'.