Two‐dimensional P‐spline smoothing for spatial analysis of plant breeding trials

Large agricultural field trials may display irregular spatial trends that cannot be fully captured by a purely randomization‐based analysis. For this reason, paralleling the development of analysis‐of‐variance procedures for randomized field trials, there is a long history of spatial modeling for field trials, starting with the early work of Papadakis on nearest neighbor analysis, which can be cast in terms of first or second differences among neighboring plot values. This kind of spatial modeling is amenable to a natural extension using splines, as has been demonstrated in recent publications in the field. Here, we consider the P‐spline framework, focusing on model options that are easy to implement in linear mixed model packages. Two examples serve to illustrate and evaluate the methods. A key conclusion is that first differences are rather competitive with second differences. A further key observation is that second differences require special attention regarding the representation of the null space of the smooth terms for spatial interaction, and that an unstructured variance–covariance structure is required to ensure invariance to translation and rotation of eigenvectors associated with that null space. We develop a strategy that permits fitting this model with ease, but the approach is more demanding than that needed for fitting models using first differences. Hence, even though in other areas, second differences are very commonly used in the application of P‐splines, our conclusion is that with field trials, first differences have advantages for routine use.

and Wilkinson et al. (1983) and subsequently developed further by other authors to cover a wide spectrum of spatial variance-covariance models in common usage today (for a review, see Piepho et al., 2008). Spatial models allow making adjustments for irregular spatial heterogeneity and have been shown to improve efficiency in cases where a purely randomization-based mixed model cannot adequately capture the major heterogeneity across the trial area.
Splines provide another option for making spatial adjustments, though their application is in no way restricted to spatial smoothing. They can be used to model several quantitative covariates simultaneously, and perhaps one of their most important applications is in spatiotemporal modeling for large environmental datasets (Lee, 2010;Lee & Durbán, 2011;Lee et al., 2013;Wood, 2017). In agricultural field trials, splines have been used for modeling the joint effect of various quantitative treatment factors and their interaction with temporal development (Verbyla et al., 1999;Verbyla et al., 2018). The popularity of splines mainly stems from their great versatility and flexibility. Here, we will consider the application of splines for spatial smoothing of data from a single-field trial and traits assessed at a single point in time. Under this modest scenario, some simplifications become possible, and we are particularly interested in the effect of these simplifications on the outcome of the analysis and their relation to other spatial modeling options commonly used for field trials.
We will focus on a particular class of splines, so-called P-splines (Eilers & Marx, 1996). One of the earliest papers applying P-splines in field trials, though only in one dimension, is Currie and Durbán (2002). Recently, the use of P-splines has been suggested for two-dimensional spatial analysis of field trials (Rodríguez-Álvarez et al., 2018). In this framework, spatial covariance translates into smooth spatial trend. P-splines have a close connection with linear variance (LV) and random walk models (Boer et al., 2020). They are based on the use of B-spline bases for regression on covariates and provide a very flexible framework for smoothing of trends in multiple dimensions, and some of the most prominent applications for this type of modeling are spatial and spatiotemporal smoothing for large environmental datasets (Lee, 2010;Lee & Durbán, 2011;Lee et al., 2013). The simplest approach to spatial smoothing is by geoadditive models (Ruppert et al., 2003, p. 258), where each spatial dimension, that is, geographical longitude and latitude, is smoothed by separate terms in an additive fashion. Often, however, nonadditive extensions are needed to allow for interaction. A key challenge with environmental data is that measurements may be irregularly spaced, and hence, special care must be taken when modeling the interaction between dimensions. In field trials, matters are considerably simplified by the fact that plots are usually arranged on a regular grid (Appendix A.1).
Here we will focus on P-spline approaches that allow the partitioning of total variance in an analysis-of-variance (ANOVA)-type fashion (Lee & Durbán, 2011;Wood, 2017;Wood et al., 2013), which is also at the heart of the recently proposed P-spline-based approaches for field trials (Rodríguez-Álvarez et al., 2018). We seek to use a P-spline framework that is flexible enough for agricultural field trials, yet simple enough for straightforward implementation using mixed model packages. In fact, it is a key feature of P-splines that they have a mixed model representation, meaning that parameters are amenable to estimation by residual maximum likelihood (REML) (Patterson & Thompson, 1971). This makes them particularly appealing because the mixed model framework allows accommodating other features of plant breeding trial data, including genetic correlation among related breeding lines, which can be modeled using genetic marker data (Meuwissen et al., 2001), and random design effects arising from the randomization layout of the trial, as others have pointed out before us (e.g., Verbyla et al., 2018). Also, embedding in an REML framework allows a likelihood-based comparison with other mixed models accounting for spatial correlation.
We consider B-splines with a penalty term of the form , where is a penalty parameter, is a vector of regression coefficients with corresponding design matrix , and is a differencing matrix (Wood, 2017). In the simplest case, the B-spline basis provides a smooth in just one spatial dimension, but it may also be extended in two dimensions using the Kronecker product of bases for rows and columns (see Appendix A.1). The penalty determines the smoothness of the coefficients along the spatial dimensions, because it strives for small values of . The mixed model representation of P-splines rests on the fact that the singular penalty matrix = is formally equivalent to a precision matrix of a random effect . To exploit that equivalence when using a mixed model package for fitting P-splines, usually needs to be converted to a suitable variance-covariance matrix. In the mixed model representation, this leads to a fixed-effects component, representing an unpenalized part, and a random-effects component, representing the penalized part (Currie & Durbán, 2002;Lee & Durbán, 2011;Wand & Omerod, 2008). The resulting models are very similar to those proposed in Verbyla et al. (2018) for smoothing using two covariates based on an integrated squared second derivative penalty used in cubic spline smoothing; but, as will be explained, there are slight differences in the way the unpenalized terms are handled. As our main impetus for this paper is to provide a P-spline framework that is conveniently implemented in a general REML-based mixed model package, we closely follow the philosophy set out in Wood et al. (2013), primarily focusing on such penalties that have just a single parameter and upon conversion give rise to a variance-covariance matrix for the observed data that is linear in the parameters. It is acknowledged that some penalty matrices that have been proposed TA B L E 1 Notation for matrices and vectors used for P-splines for two-dimensional P-splines do not have this desirable property and therefore are not considered here in detail. In our discussion, we will briefly review a few important examples of such penalties in order to put our framework into perspective.
A particular focus will be on the choice of differencing matrix . In the context of field trials, the early development of NN methods was based on second differences (Papadakis, 1937;Wilkinson et al., 1983), but later the focus turned to first differences (Besag & Kempton, 1986;Kempton et al., 1994) and the closely related LV model (Williams, 1986), because they are simpler and often provide a good fit. Second differences were rarely used subsequently (but see Green et al., 1985). The recent proposal to use P-splines (Rodríguez-Álvarez et al., 2018) in field trials, however, has led to a revival of this option. In light of the new options that P-splines provide, a revisit of the long-standing question in field trials whether first or second differences are preferable is in order. This paper has several objectives: (i) to provide a thorough derivation of the two-dimensional P-spline representation for field trials as a mixed model, focusing on an ANOVA-type decomposition and paying particular attention to the separation between penalized and unpenalized terms, (ii) to show how such P-splines are intimately related to other commonly used spatial models, such as LV models, (iii) to compare first and second difference penalties in several field trial datasets, (iv) to compare P-splines empirically with other spatial models such as the first-order autoregressive (AR1) model (Gilmour et al., 1997), and (v) to provide guidance to researchers wanting to implement these methods in current mixed model packages. The rest of the paper is structured as follows. Section 2 considers a single column of plots to introduce key concepts and notation. This is extended to smooth marginal (main) effects for rows and columns in Section 3. Section 4 shows how interaction between row and column smooths may be added. The framework will be illustrated using four examples in Section 5 and compared to other spatial models. The paper ends with a discussion in Section 6.

SMOOTHING ALONG A SINGLE COLUMN OF PLOTS
In this section, we will set the stage considering a single column of k plots. Key elements of the notation that will be introduced step-by-step, also for use throughout subsequent sections, are summarized in Table 1 for ease of reference. Let be a × matrix of th degree (( + 1)th order) B-spline bases (Eilers & Marx, 1996) for the row-coordinates ℎ of the plots (typically the row numbers). Generally, the first interior knot will be placed at the first plot and the last interior knot at the last plot of the column of plots. Often, the remaining interior knots are placed at the other plots. Alternatively, the remaining interior knots can be placed at larger distances between the first and last plots, which reduces the number of random coefficients and hence the dimension of the mixed model equations. The number of B-spline bases, , will generally be given by the number of interior knots ( ) plus ( − 1). Thus, if the interior knots are placed at every plot, we will have = + − 1.
The trend down the column may be modeled by the regression term , where is a vector of regression coefficients. To penalize these, we employ the ( − ) × matrix of pth differences, from which the penalty term is , where is a penalty parameter and = is the penalty matrix. Classical spline smoothing adds this penalty to the residual sum of squares (Green et al., 1985). If a mixed model representation of P-splines is used, the penalty is added to the (residual) log-likelihood (Paul Eilers, in discussion of Verbyla et al., 1999). The penalty is recognized to be equivalent to the term in the log-likelihood for normal random effects, when = −2 , where 2 is a variance parameter, and hence, the precision matrix is −2 . To fit this with a mixed model package, the precision matrix must usually be converted to a variance-covariance matrix. This conversion is not unique because the precision matrix is singular. To resolve this issue, one may use the spectral decomposition = ( ) , where the columns of are the eigenvectors and is the vector of eigenvalues. Denote the subvector of containing the − positive eigenvalues by + and the corresponding eigenvectors by the columns of + . Further, denote the eigenvectors corresponding to the zero eigenvalues in by 0 . Then observing that = , the regression term may be reparameterized by a one-to-one transformation as where = 0 , = 0 , = + , and = + . Finally, using the fact that = + ( + ) + and replacing + by , the penalty becomes Thus, in the transformed model, only is penalized and is formally equivalent to a random effect, whereas , representing the null space of the penalty, is unpenalized and therefore is formally equivalent to a fixed effect in a mixed model. There are several ways to utilize these key results to fit the P-splines using a mixed model package. In all of these, as a direct consequence of the derivation in (1), it is required to fit the unpenalized fixed-effect term (Lee et al., 2021;Wood et al., 2013). As regards the random term, we may explicitly fit with variance-covariance matrix var( ) = 2 ( + ). This is equivalent to fitting as random with singular variance-covariance matrix var( ) = 2 + , where + = + ( −1 + ) + is the Moore-Penrose inverse of (Boer et al., 2020), because {var( )} + = ( + ) as in (2). In fact, it is proved in Lee et al. (2021) that we may use any generalized inverse of the precision matrix as our variance-covariance matrix and that this provides a unique fit if we simultaneously fit , representing the null space or unpenalized space of the penalty matrix. For example, if a positive-definite variance-covariance matrix is required by the mixed model package, one could use the g-inverse − = + + 0 0 . In what follows, we will use the Moore-Penrose inverse + , computed from the spectral decomposition, for convenience and refer to a unique fit whenever the null space of is fully represented by fixed effects. It is pointed out here that for > 1, 0 , and hence , is determined only up to an orthogonal rotation, that is, 0 can be replaced by 0 for any orthogonal rotation matrix obeying = = (Harville, 1997, §21.5f). In fact, statistical packages may differ in the particular form of 0 obtained in the spectral decomposition of a singular matrix. This indeterminacy is of no consequence, however, because the fit of by fixed effects, as well as the residual likelihood for the random effects, is invariant to the rotation . In later sections, we will consider two-dimensional extensions of such models that fit specific components of by random effects for smoothing, and in this case, the fit is not rotation invariant.
We note in passing that in addition to the smooth plot effect , we will generally fit an independently distributed residual vector with variance-covariance matrix 2 , corresponding to a "nugget" variance in geostatistical parlance (Piepho & Williams, 2010). The full mixed model then takes the usual form = + + , where the null-space term is integrated with the fixed effects and the smooth term is integrated with the random effects .

Special cases
(i) When = 1 (first differences), then √ 0 = 1 and from the properties of B-spline bases √ = √ 0 = 1 , corresponding to the general intercept. (ii) When = 2 (second differences), = 0 can be replaced with (1 ⋮ ℎ ), where ℎ = (1, 2, … , ), because (1 ⋮ ℎ ) is in the column space of (Lee, 2010;Lee & Durbán, 2011;Lee et al., 2021). Thus, to achieve a unique smooth, we may add a regression on the serial plot number in the fixed part of the model. Although the representa-tion of as a regression on plot numbers has some intuitive appeal and is therefore very popular, there is no strict need to use this representation, and one can just use the parameterization in terms of the original eigenvectors, that is, = 0 . We think that when it comes to a two-dimensional extension of the smoothing approach, it is, in fact, preferable to stick with this parameterization (see Section 4.1), because may form part of a term that needs to be smoothed. (iii) If = 1 and = 1, and the interior knots are placed at the plots, we have = + − 1 = and = . This model is equivalent to the LV model of Williams (1986) and also to the NN model based on first differences by Besag and Kempton (1986), as well as a first-order random walk model (Boer et al., 2020;Appendix A.2). Specifically, the spatial variance-covariance matrix of the LV model can be shown to be another generalized inverse of (see Boer et al., 2020). (iv) If = 2 and = 1, and the interior knots are placed at the plots, we have = + − 1 = and = . This model is equivalent to the second-differences NN model of Green et al. (1985). For a revealing link of this case with the second-order random walk model (Durbin & Koopman, 2001, p. 39), see Appendix A.2.

SMOOTH MARGINAL EFFECTS FOR ROWS AND COLUMNS ON A RECTANGULAR GRID OF PLOTS
Now assume that we have plots placed on a rectangular grid of rows and columns (possibly with some missing data). In row-column layouts, it is sensible to fit effects for rows and columns. First consider effects for rows. These may be smoothed using random effects. If observations are first ordered by rows and then by columns within rows, this corresponds to fitting random effects with the same specifications for the penalty as given in Section 2 with design matrix ⊗ 1 . To make the smooth unique, we require fitting the corresponding fixed effects ⊗ 1 . The same kind of effects may be fitted for columns. Thus, we fit random effects with design matrix 1 ⊗ and variance-covariance matrix 2 + . This will be unique if we add fixed effects 1 ⊗ , where = 0 . We may interpret the terms ( ⊗ 1 ) and (1 ⊗ ) as "smooth" marginal effects for rows and columns (Wood, 2017, p. 232). When = 1 (first differences), then the two unpenalized terms coincide, that is, ⊗ 1 = 1 ⊗ = 1 ⊗ 1 , corresponding to the overall intercept. When = 2 (second differences), the two unpenalized effects have an alternate representation which, apart from the overall intercept, involves a linear regression on row and column numbers with design matrices ℎ ⊗ 1 and 1 ⊗ ℎ , respectively. We reiterate here that while this representation is very popular and is fine for unpenalized terms, we do not advocate this representation for penalized terms as will be needed in the next section.
We note here that when the trial is randomized according to a resolvable row-column design, independent random effects for rows and columns within replicates would be routinely fitted in a randomization-based analysis. Thus, we may regard such a model as a baseline for any spatial extensions, including P-splines. If marginal smooths for rows and columns are added to a randomization-based baseline model, as suggested in this section, they compete with the independent row and column effects. Another way of looking at this is that the independent random effects for rows and columns capture any remaining nonsmooth trend not represented by the smooth. Also, the associated variances can be interpreted as nugget for rows and column effects in geostatistical terminology.

EXTENDING THE SMOOTH TO COVER INTERACTION
Analogous to linear models for two-way analysis of variance (ANOVA), we may augment the smooth marginal effects ( ⊗ 1 ) and (1 ⊗ ) with a term of the form ( ⊗ ) , which will expand the spline space to cover interaction (Wood, 2017). There are various choices for the penalty, and special attention needs to be paid to its null space because this may be quite large, for example, when the penalty is derived from a single Kronecker product of the two marginal penalty matrices and , as considered in Section 4.1. In Section 4.2, we consider a penalty involving a sum of two Kronecker products that is very popular in smoothing (Lee & Durbán, 2011;Rodríguez-Álvarez et al., 2018;Wood, 2017) and is closely related to the intrinsic autoregressive (IAR) model considered in Besag and Higdon (1999). Before going into details, we would like to stress that our philosophy here is to consider the interaction smooth and its null space as a point of departure. This view is different from other derivations that start with the marginal smooths and then from these derive the interaction smooth terms (e.g., Verbyla et al., 2018;Wood, 2017;Wood et al., 2013). We believe that it is helpful to consider the interaction term in isolation initially because this helps to understand the associated null space and make sure that it is fully accounted for in the overall model. As it turns out, in Section 4.1, the null space involves terms that are confounded with the smooth marginal terms introduced in Section 3. This fact makes the derivation a bit involved, but we think that it is necessary to take this approach to be clear about the origin and fate of all terms representing the null space of smooth terms for the interaction.

Penalties derived from a single Kronecker product of the two marginal penalties
In this section, we consider penalized differences , where = ⊗ , and hence the penalty (3) with = = ⊗ = ⊗ , corresponding to random effects with variance-covariance matrix 2 + , where + = + ⊗ + . The product nature of the penalty already hints that its null space is substantial, so substantial, in fact, that a smoothing of that null space may be useful, requiring additional penalties, as will be discussed later in this section. To determine the fixed effects needed to represent the null space of , note that ( ⊗ )( ⊗ ) = ⊗ and ⊗ = ( 0 ⊗ 0 ⋮ 0 ⊗ + ⋮ + ⊗ 0 ⋮ + ⊗ + ), such that using the same approach as in the one-dimensional case (see Equation ( (1)), the term ( ⊗ ) can be transformed one-to-one as where are the corresponding regression coefficients. Analogous to the derivation in Section 2, it emerges that only the term is penalized because it is associated with the positive eigenvalues + and + , whereas the terms 00 00 , 0 0 , and 0 0 represent the null space because they are associated with zero eigenvalues and hence need to be fitted as fixed effects to obtain a unique smooth. Note that the dimension of the column space of is ( − ) × ( − ), whereas the null space has dimension × − ( − ) × ( − ), which can be quite large (Wood, 2017, p. 232). This suggests that should be chosen as small as possible.
Importantly, the unpenalized terms 0 0 and 0 0 will generally show confounding with the marginal smooths introduced in Section 3. It should also be stressed that for > 1, the dimension of the space represented by these terms is larger than that of the marginal terms in Section 3. This is why we think that it is crucial in the derivation to initially consider the interaction smooth and its null space in its own right. If we insist that these fixed-effects terms be fitted to ensure uniqueness of the smooth , then the marginal effect smooths of Section 3 are absorbed into the fixed effects and hence may be dropped. Alternatively, it may be worthwhile to replace the fixed terms 0 0 and 0 0 by smooth equivalents, which are fitted as random. In this case, however, the smooth is no longer invariant to the choice of generalized inverse of the penalty matrix (Lee et al., 2021). Also, because of the confounding, the smooth marginal effects are absorbed by the smooth versions of 0 0 and 0 0 (and not vice versa!) and hence may also be dropped in this case. It follows that either way, with or without smoothing of 0 0 and 0 0 , there is no need to explicitly add the marginal-effect smooths described in Section 3; these are accounted for by the terms in the null space of the interaction smooth for ( ⊗ ) .
In what follows, we will consider a few important special cases paying particular attention to the null space of ( ⊗ ) and consider how these can be turned into penalized terms. Scrutiny of these special cases then leads to our suggested general approach for two-dimensional P-spline smoothing for field trials.

4.1.1
Special cases (i) If = 1, then √ = 1 , √ = 1 and √ 00 = 1 ⊗ 1 , coinciding with the general intercept. The design matrices √ 0 = ⊗ 1 and √ 0 = 1 ⊗ have dimensions ( − 1) and ( − 1), respectively, which are usually so substantial that smoothing becomes worthwhile, though this will make the smooth nonunique. For example, for design matrix 0 , we may consider random effects 0 with variance 2 0 ( −1 + ), which is equivalent to assuming design matrix ⊗ 1 and random effects 0 with variance-covariance matrix 2 0 + . This is recognized as the smooth marginal effect for rows (Section 3). Hence, in this case, both 0 and 0 may be absorbed into the smooth marginal effects for rows and columns, respectively. Conversely, if we insist that 0 and 0 are fitted as fixed, they will absorb the smooth marginal effects for rows and columns, respectively, meaning that the latter can be dropped. (ii) If = 1 and = 1, and interior knots are placed at the plots, we have = + − 1 = and = + − 1 = . Moreover, = and = . Assuming that the marginal terms 0 0 and 0 0 are modeled as fixed, this model is equivalent to the LV⊗LV model of Piepho and Williams (2010) and also to the NN model based on first differences along rows and columns by Kempton et al. (1994) (also see Appendix A.3; Besag & Higdon, 1999;Boer et al., 2020), and all of these models can be viewed as limiting cases of the separable AR1⊗AR1 model (Gilmour et al., 1997;Piepho & Williams, 2010). The terms 00 00 , 0 0 , and 0 0 correspond to general intercept and row and column effects, respectively. The row and column effects may be many, and it may therefore be worthwhile to fit these as random for recovery of interblock information (Lee et al., 2021), though it must be realized that this changes the model fit and sacrifices the invariance. Again, the corresponding smooth marginal effects in Section 3 may be regarded as representing these random effects already. However, when 0 0 and 0 0 are replaced by smooths, the smooth is no longer unique, and also there is no longer equivalence with the LV⊗LV model and two-dimensional first differences. Conversely, if we insist on fixed row and column effects to ensure uniqueness, the smooth marginal effects from Section 3 need to be dropped. (iii) If = 2, then the column space of 00 has dimension 2 = 4 and can be represented by fixed effects for regressions on the row and column numbers and their cross products (Lee & Durbán, 2011). The column spaces of design matrices 0 and 0 have dimensions 2 × ( − 2) and ( − 2) × 2, and smoothing is worthwhile. For example, 0 = ⊗ could be smoothed separately for the two columns of , which are in the same column space with 1 . Hence, the smooth would be confounded with the smooth marginal effect for rows. It has random effect vector with variance-covariance matrix ( −1 ) ⊗ 2 0 2 , which is equivalent to design matrix ⊗ with random effect 0 having variance-covariance matrix + ⊗ 2 0 2 . This smooth would represent the smooth marginal effect for rows. It is important to point out here that if was replaced by (1 ⋮ ℎ ), the covariance structure would change, that is, the model is not invariant to linear transformations with respect to , as is well known for random-coefficient models (Longford, 1993;Wolfinger, 1996). Also note that our suggestion here involves fitting a single penalty for both columns of , rather than two, as is commonly done (Verbyla et al., 2018;Wood et al., 2013). Fitting two separate penalties for 0 = ⊗ amounts to fitting the variance-covariance matrix ( −1 ) ⊗ ( 2 0(1) , 2 0(2) ) for the random effect 0 , where the variance 2 0(1) is associated with the first column of and the variance 2 0(2) is associated with the second column of . This structure may be compared to a random coefficient model with random intercepts and slopes. In such models, fitting a diagonal variance-covariance for intercepts and slopes means that the model is not invariant to linear transformations of the covariate. This is also the main reason, why we do not recommend replacing by (1 ⋮ ℎ ). Invariance is achieved if we allow a covariance between intercept and slope. Such a covariance will also ensure invariance to a rotation of 0 and hence of . Thus, we can use the variancecovariance matrix ( −1 ) ⊗ Σ 0 , where the two variances on the diagonal of Σ 0 are 2 0(1) and 2 0(2) and the covariance is 0(1,2) . It is also important to reiterate that for > 1, the null-space eigenvectors in 0 and 0 , and hence, the matrices and are only determined up to an arbitrary orthogonal rotation of the eigenvectors (see Appendix A.4 for the special case = 2).
(iv) If = 2 and = 1 (a special case of (iii)), the null space can be written in a form involving fixed-effects regressions on row numbers within columns and on column numbers within rows. If these effects are modeled as random for recovery of information, we obviously have a random-coefficient model (Longford, 1993), and this is known to require a covariance among intercept and slope to ensure invariance to linear transformations of the plot coordinates (row and column numbers). We may either model the intercepts and slopes as independent between rows and between columns, or we may use a smooth across the rows and across the columns.

General case
The confounding of the smooth marginal effect introduced in Section 3 and of effects in the null space of requires adjustment of our notation and actually allows some simplification. The general approach emerging here for ANOVAtype smoothing is to fit fixed effects with design matrix ⊗ , smooth row marginal effects with design matrix ⊗ and variance-covariance matrix + ⊗ 2 , smooth column marginal effects with design matrix ⊗ and variance-covariance matrix 2 ⊗ + , and smooth interaction effects with design matrix ⊗ and variancecovariance matrix 2 + ⊗ + . Note that we have relabeled the effects and variance components here to reflect the absorption of marginal-effect smooths of Section 3 by null-space terms arising from the interaction smooth. We also note that the whole smooth only has three variance components, regardless of the order of differencing, that is, one for rows, one for columns, and one for the interaction. For > 1, there are two further alternatives for smoothing the two marginal effects, which are described here just for the row smooth: In place of + ⊗ 2 with homogeneous variance corresponding to the columns of , we may either use the diagonal structure + ⊗ Ω where Ω = ( 2 (1) , … , 2 ( ) ) or + ⊗ Σ where Σ is an unstructured p-dimensional variance-covariance matrix. The former makes the smooth invariant to rescaling of the spatial coordinates (Wood, 2017, p. 236), but not to translations or rotations (Appendix A.4). The latter has the advantage of full invariance regarding the representation of and . The lack of invariance to translations or rotations of the diagonal model is particularly relevant, because for > 1, the null-space eigenvectors spanning and are determined only up to an orthogonal rotation. In difference to the smooth terms, the fixed-effects term with design matrix ⊗ is always rotation invariant for > 1. In summary, for any value of p, there are always four terms, that is, the fixed-effect ⊗ , the two marginal smooths for ⊗ and ⊗ , and the pure interaction smooth for ⊗ . What was just described as different options for > 1 can also be taken to subsume = 1 as a special case. For example, for general , the marginal smooth for ⊗ has variance-covariance matrix ( ⊗ )( + ⊗ Σ )( ⊗ ) = + ⊗ Σ . Then for = 1, this reduces to + ⊗ 1 1 2 . This same reduction also holds for the other two options for > 1, that is, diagonal and homogeneous. For = 1, this reduces further to + ⊗ 1 1 2 , and for knots placed at the plots, this corresponds to one of the marginal terms for rows of the LV⊗LV model (Piepho & Williams, 2010).
Despite the invariance of the unstructured model to linear transformations of and , the structure is notoriously difficult to fit, as is well known for random-coefficient models in general (Longford, 1993). This is because with poor scaling the variance-covariance matrix Σ or Σ can have correlations close to the boundary of the parameter space, and it can therefore be difficult to ensure that the matrix remains positive definite. Further problems arise when a variance converges to zero during iterations.
For = 2, we may address these issues by considering an orthogonal rotation of the null-space eigenvectors. To illustrate this, consider the design matrix = 0 (the subscript or is dropped here for simplicity) as needed on the marginal smooth for columns. One possible set of null-space eigenvectors 0 is (Lee & Durbán, 2011) where is an m-dimensional vector of 1′s, scaled to have unit length, and is the vector (1, 2, … , ), centered and also scaled to have unit length. This can be orthogonally rotated to any other permissible set of eigenvectors by 0 ( ), where for ∈ [0, 2 ]. When 0 is involved in a marginal smooth using the diagonal variance-covariance structure Ω, we may find the optimal rotation by a grid search over . This is equivalent to fitting a reparameterized version of the unstructured model Σ, with acting as a third parameter over which the likelihood is profiled. Thus, our proposed procedure for fitting the unstructured model is to perform a grid search identifying a rotation angle that at least nearly maximizes the residual likelihood for the diagonal model, and then use that rotation to fit the unstructured model. The optimization for the diagonal model will ensure that the correlation between the two regression terms associated with the columns of or will be close to zero and safely removed from the boundary of the parameter space. For this purpose, it is not crucial that the correlation is zero exactly, which is why we only need a good guess of the that would nullify the correlation.

4.2
Penalties derived from sum of Kronecker products Lee and Durbán (2011) consider a penalty of the form Wood (2017, p. 232) points out that penalties of the form (7) can provide smoother fits than those outlined in Section 4.1. A major advantage of (7) over the interaction smooth in Section 4.1 is that its null space is ⊗ and only has dimension 2 . For = 1, the null space corresponds to just the intercept (Dutta & Mondal, 2015), whereas for = 2, it also comprises the regression on the row and column numbers and their products (Rodríguez-Álvarez et al., 2018). These null spaces can always be accommodated in the fixed-effects part of the mixed model at virtually no cost. A further important consequence of the low dimensionality of the null space is that there is no confounding with the marginal smooths introduced in Section 3, which may be seen as a major advantage of the model. Interestingly, for = 1, = 1, and knots placed at the plots, taking the conditional expectation of , , the fertility value for the interior plot in the ith row and jth column (i.e., the i,jth element of vector ), given all other ′ , ′ -values, one obtains with 2var( , | ⋯) = 1∕( 1 + 2 ), where = 2 ∕( 1 + 2 ) (modified from Dutta & Mondal, 2015). This is recognized as an NN model, where the central plot is regressed on the nearest row and column neighbors (Julian Besag, in discussion of Bartlett, 1978;Kempton & Howes, 1981). Besag and Higdon (1999) refer to this as the IAR model. Despite this appealing connection with NN models, the penalty (7), as well as variations considered for field trials (Mao et al., 2020;McCullagh & Clifford, 2006), is more difficult to translate to a standard mixed model framework because more than one parameter is involved, and hence, the inverse of the precision matrix is not linear in the parameters (Wood et al., 2013, p. 345). A convenient option to fit (7) using a mixed model package is by profiling the residual log-likelihood for α, that is, via a grid search over ∈ [0, 1] (Besag & Kooperberg, 1995). It must just be kept in mind that when α maximizes the residual likelihood at either = 0 or = 1, the penalty in (7) changes to one with a single penalty parameter having a much larger null space, thus sacrificing its desirable properties. This is perhaps the main disadvantage of the IAR model. To obviate the problem at the boundary, a constraint may be imposed such that 0 < < 1, as is done in the separation of anisotropic penalties (SAP) algorithm of Rodríguez-Álvarez et al. (2015). Rodríguez-Álvarez et al. (2018) consider a simplified version of (7), with an associated simplification of (8), which only has a single penalty parameter, allowing this to be fitted easily with a mixed model package. The models implemented in SpATS involve adding marginal smooths for rows and columns with diagonal variance-covariance structures as described in Section 4.1. Durbán et al. (2003) consider a trial with 252 barley lines laid out as a resolvable row-column design with two replicates, and eight rows and 34 columns per replicate (Figure 1a). The replicates were adjacent so that the trial had = 16 rows F I G U R E 1 Heatmap of barley data of Durbán et al. (2003): (a) raw data and (b) smooth trend fitted by model M26 (Table 2) and = 34 columns in total. We reanalyze this data using several special cases of our P-spline approach. In all models with spatial covariance, we allow the spatial covariance to extend across replicates. For P-splines with both = 1 and = 2, and also for other models, we fit row and column coordinates and their product as fixed regression terms so that the deviance and Akaike information criterion (AIC) based on the residual likelihood, using twice the number of variance parameters as penalty term, can be used to compare all models (Wolfinger, 1996). Table 2 shows several models' fits for = 2, = , and = , that is, interior knots are placed at the plots. Model M12 with = 3 and diagonal structures (Ω) for both marginal smooths was used to perform a grid search determining rotation angles maximizing the deviance approximately ( Figure 2). These values were then used to fit the marginal smooths with unstructured covariance (Σ). We considered = 3 (cubic splines) and = 1 (ordinary second differences; Green et al., 1985). A few observations are as follows:

A barley trial
(i) The unstructured model is confirmed to ensure translation invariance, as opposed to the models with diagonal or constant variance. This can be seen by comparing models M3-M6 to M6-M8, where only is fitted. The diagonal models M6 and M8 fitted quite well compared to the invariant unstructured models (M3-M5). When also fitting , we were not able to fit models M16-M17 with unstructured Σ and Σ because positive definiteness of these matrices could not be enforced during iterations. Fixing the covariance in Σ at zero, convergence was achieved. Note, however, that M16-M17 are equivalent to M15, which we were able to fit and which fitted best among the models in Table 2. Durbán et al. (2003) and wheat data of Stroup et al. (1994) using P-spline approach with = 2 (second differences), = , = , and the penalty in (3) for . All models have fixed effects for replicates, genotypes, row numbers ℎ , column numbers ℎ , and the product of row and column numbers

F I G U R E 2
Deviance profiled for and for barley data using model M12 with = 3 and diagonal structures (Ω) for both marginal smooths (Table 2). Minimum at = 27 • and = 11 • TA B L E 3 Analysis of barley data of Durbán et al. (2003) and wheat data of Stroup et al. (1994) using P-spline approach with = 1 (first differences) and the penalty in (3) for . All models have fixed effects for replicates, genotypes, row numbers ℎ , column numbers ℎ , and the product of row and column numbers. The marginal smooths use var( ) = + ⊗ 2 , = 1 , var( ) = 2 ⊗ + and = 1 for all models (ii) M20 could not be fitted with = 3 (again fixing the covariance in Σ at zero allowed convergence), but fitted second best with = 1, giving a fit that was almost identical to M15. (iii) Using the 1 ⋮ ℎ as the representation of in the marginal smooth gave poorer fits and also was more difficult to get to converge. (iv) There is only a very small difference between fits for = 1 and = 3, suggesting that the B-splines with > 1 are not strictly needed and we can revert to simple first or second differences (Eilers, 2003) as in classical NN (Besag & Kempton, 1986;Wilkinson et al., 1983;Williams, 1986). Table 3 shows results for first differences = 1. These are quite competitive with second differences ( = 2). Reducing the number of knots had hardly any effect as expected (Eilers & Marx, 1996). The best fit is obtained for = 1 and knots placed at the plots (M26) (see Figure 1b for the smoothed trend). Table 4 shows the fits obtained with the IAR model with penalty (7) (Section 4.2) for a step size of 0.1 for in the grid search. Again, first differences seemed to do better than second differences. Model M41 with = 1 fitted best among all models considered for this dataset. For = 2, we also fitted TA B L E 4 Analysis of barley data of Durbán et al. (2003) and wheat data of Stroup et al. (1994) using the IAR model with the penalty in (7) for with = and = with and without marginal smooths added. All models have fixed effects for replicates, genotypes, row numbers ℎ , column numbers ℎ , and the product of row and column numbers. Fits obtained by a grid search over = 0, 1(0.1) : obtained using near optimal orthogonal rotation (Figures 2 and 4). b Fit for fixed = 0.5; using ⌢ and ⌢ for and , respectively. c Fit for fixed = 0.5; using ⋮ and ⋮ for and , respectively (Rodríguez-Álvarez et al., 2018). smooth marginal terms involving the full and , though this would not be necessary to represent the null space of the interactions smooth. We faced the same problems when trying to fit the marginal penalty for columns with unstructured Σ . When fixing the covariance to zero, convergence was easy to achieve. With the diagonal models for the marginal smooths, convergence was also unproblematic. For this model, we also report the fit for fixed = 0.5 (M45 and M46), corresponding to the penalty in (9), for the case = 3. This model is the one used in Rodríguez-Álvarez et al. (2018) and implemented in the SpATS R package, available from CRAN (https://CRAN.R-project.org/package=SpATS). The model is doing quite well in terms of AIC, but several others reported in Tables 3 and 4 fare slightly better. The best model was often obtained for = 1.0, in which case the null space is not fully represented by the model; these models would need to be extended to fully cover the null space, which we have not pursued here. We also fitted the LV⊗LV model (Piepho & Williams, 2010), which is linear in the parameters, and the AR1⊗AR1 model, which is nonlinear (Gilmour et al., 1997) ( Table 5). Both models also have good fits, with an edge in favor of the latter when a nugget was added. That model (M56) had a very similar fit to the best P-spline model in Table 3 (M26) but was inferior to the best model in Table 4 (M41). Stroup et al. (1994) report a trial for 56 wheat varieties laid out in = 11 rows and = 22 columns (Figure 3) (this is the "Alliance" trial in their paper). The layout was a resolvable incomplete block design with a single blocking factor nested within replicates. The dataset displays a strong spatial trend, which is described by Stroup (2002) in these terms: "The TA B L E 5 Analysis of barley data of Durbán et al. (2003) and wheat data of Stroup et al. (1994) using other common models. All models have fixed effects for replicates, genotypes, row numbers ℎ , column numbers ℎ , and the product of row and column numbers pattern is typical of spatial variability. In this case, relatively low yields tend to be clustered in the northwest corner. This pattern is explained by a low rise in this part of the field causing increased exposure to winter kill from wind damage, and hence depressed yield." We here use the version of the dataset that is available in the "agridat" package of R (Wright, 2021; https://CRAN.R-project.org/package=agridat), which has the yield expressed in bushels per acre. The four replicates are not arranged as a rectangular array, with some rows divided up between two replicates. We fitted the exact same models as for the barley data (Tables 2-5). The general picture emerging from this second example is rather similar to that from the first, so details will not be dwelt on here. It is worth mentioning that M46, which corresponds to SpATS (Rodríguez-Álvarez et al., 2018), gives a good fit in terms of AIC, landing in the midrange of the other models reported in Tables 3 and 4. The unstructured models for the marginal smooths could only be fitted with the rotation determined by a two-dimensional grid search for the diagonal model ( Figure 4). Even with this rotation, it was not easy to achieve convergence, mainly due to a variance component approaching zero. All other models were easy to fit. The results for = 2 do confirm that the diagonal model is not invariant to rotations but the unstructured model is. One conclusion from this example is that first differences work quite well.

Two uniformity trials with barley
To shed further light on the relative merits of first and second differences, we consider two uniformity trials. A uniformity trial has the same treatment on all plots, and it is mainly used to explore different design options, which can be superimposed onto the trial layout (Petersen, 1994). The first experiment (Roseworthy; Williams & Luckett, 1988) had = 15 rows and = 48 columns. The second (Ihinger Hof; Piepho & Williams, 2010) had = 30 rows and = 36 columns. A subset of the models used in the first two examples is considered here. As the data are from uniformity trials, the models have no treatment or replicate effects. We used models M25 and M26 with first differences ( = 1) and models M1, M2, M15, and M20 with second differences ( = 2). In addition, we used the models M53 and M54 as a baseline. For the Roseworthy trial, there was a systematic pattern in the column effects resulting from the three-plot seeder used. It was observed that the first of the three plots sown simultaneously yielded systematically below the other two (Williams & Luckett, 1988). Thus, we fitted a fixed effect with two levels to account for this systematic pattern, that is, one level for columns 1, 4, 7, and so on, and a second level for all other columns (Rodríguez-Álvarez et al., 2018). The results in Table 6 show that the P-spline models led to an improved fit compared to the baseline and also that first differences are quite competitive compared to second differences. It is noteworthy that the baseline model M54 with independent random effects for rows and columns fits best with the Ihinger Hof trial, showing that there is no one-fits-all spatial model.

DISCUSSION
We considered a framework that allows the use of P-splines to model spatial trends in field trials (Rodríguez-Álvarez et al., 2018). The methodology requires making several choices, that is, the order of differencing ( ), the degree of the B-spline basis ( ), and the number of knots ( , ), which in field trials are always placed on a regular grid. As the two examples illustrate, there is indeed a very large number of options, and this means that model choice could easily become an overwhelming exercise. It would be advantageous for routine use if a limited set of models could be identified that F I G U R E 3 Heatmap of wheat data of Stroup et al. (1994): (a) raw data and (b) smooth trend fitted by model M26 (Table 2) works well over a broad range of settings, not requiring an "intricate process of tailoring models to individual datasets" that always entails "an element of subjectivity which will be difficult to eliminate" (Rob Kempton, in discussion of Verbyla et al., 1999). A natural starting point for the placement of knots are the plot locations, and this is our general recommendation. Our results for the barley and wheat data suggest that there is little gain from trying other options. We also found little difference in goodness-of-fit for different choices of the degree of the B-spline basis, with the simplest choice of = 1 (implying the simplification = ) working quite well, corresponding to the classical NN modeling with first or second differences for = 1 and = 2, respectively. The examples further showed that first differences seemed to be sufficient for smoothing the spatial trend. Certainly, more empirical experience is needed to further provide guidance for routine use of the framework. We do conjecture based on our experience so far that a viable recommendation is to stick with the most parsimonious options, such as = 1, = 1, = , and = . These models are particularly easy to fit. This option is also very closely related to earlier approaches to NN analysis of field trials and so will be easy to communicate to researchers wanting to use P-splines. For practical implementation, one may either use the specialized SpATS R package, which implements a subset of the models considered in this paper, or a general linear mixed model package such as SAS, for which sample code is provided in the Supporting Information. Careful consideration needs to be given to the null space of the interaction smooth, especially for the models described in Section 4.1. As detailed there, this null space can have a rather larger number of dimensions for = 2 than for = 1 when the penalty in (3) is used. If one insists on representing this null space by fixed effects to make the fit invariant to F I G U R E 4 Deviance profiled for and for wheat data using model M12 (  Piepho & Williams, 2010) using P-spline approach with = 1 (first differences) for models M25 and M26 (see Table 1 for details) and = 2 (second differences) for models M1, M2, M15, and M20 (see Table 2 for details). All models have fixed effects for intercept, row numbers ℎ , column numbers ℎ , and the product of row and column numbers. All models have = 1, = , and = . For the Roseworthy trial, we fitted a fixed effect with a common level for every third column starting with the first (1, 4, 7, . . . , 46) and a second level for the remaining columns (details on the background are in the main text)

Roseworthy
Ihinger the choice of g-inverse for the singular precision matrix rc , efficiency of the analysis may be adversely affected, and it is therefore worthwhile to fit these effects as random. In particular, the P-spline framework allows smoothing these terms, combining them with the marginal smooth for rows and columns as prescribed by the ANOVA-type approach to spline smoothing (Wood, 2017;Wood et al., 2013). It must be acknowledged, however, that this sacrifices the invariance to the choice of g-inverse for rc . We note in this context that the LV⊗LV model gives the same fit as the P-spline approach for = 1, = 1, = , and = , when the null space is modeled by fixed effects for rows and columns, but not when these effects are modeled as random (Boer et al., 2020), or are smoothed (Tables 3 and 5). An advantage of the IAR model with the penalty in (7) is the small size of the null space for the interaction smooth, even for = 2, with a simple fixedeffects regression on row and column numbers to cover that null space. Thus, there is no confounding with the marginal smooths, and so, these can be considered in their own right, there being no strict need to fit them. Care is needed with the interaction smooth, however, when converges to either 0 or 1 (this happened with both examples), because in that event, the null space changes, increasing considerably, which may be seen as a disadvantage of the IAR model. One way out is to constrain the grid search so that stays clear of the lower and upper bound, or use specialized procedures such as the SAP algorithm outside standard mixed model packages (Rodríguez-Álvarez et al., 2015), and implemented in the SpATS R-package. We have included fixed-effects regression terms for row and column numbers and their interaction in all models, even those that do not involve P-splines, and also for P-splines with first-difference penalty, where these terms are not needed to represent the null space of the smooth. This was done so we could use the REML-based likelihood for comparing all models on the same basis. These terms only cost three degrees of freedom in the fixed part of the model, and it may therefore be worth including them in their own right to capture global trends, even if they are not needed to represent the null space of the smooth terms.
A general advantage of the P-spline framework is that most of the variance-covariance structures are linear in the parameters, which makes them easy to fit and minimizes convergence problems that may befit spatial variance-covariance models that are nonlinear in the parameters (Piepho et al., 2015;Velazco et al., 2017). This plays out most when the random terms are all independent, that is, the smooth terms correspond to a variance components model with no covariances among random effects. As we have shown, when > 1, the marginal smooth associated with the penalty (3) for the interaction may require allowing a covariance among random intercept and slope terms, a key result that seems to have been overlooked by others, and this can be difficult to fit on account of the need to ensure positive definiteness of the variance-covariance structure. Such difficulties pose an obstacle for routine use, and they speak in favor of the simpler Pspline options based on first differences. In this regard, it is interesting to observe that the AR1⊗AR1 model with a nugget variance (M56), which fitted quite well, had relatively large correlations along rows and columns (0.8598 and 0.8952 for the barley data and 0.7488 and 0.9064 for the wheat data), in which case the covariance is well approximated by an LV model (Piepho & Williams, 2010;Williams, 1986), corresponding to the simpler P-spline for = 1, = 1, = , and = .
The two-dimensional P-splines considered for anisotropic spatial modeling in this paper are relatively simple ones. A limitation of P-splines based on tensor products in other spatial contexts is their dependence on the orientation of the coordinate axes and the quick increase of the number of basis functions given by the tensor product. But what may be seen as a limitation in other contexts is a necessity in the context of a field trials, where the natural orientation of axes for anisotropic models are the row and column dimensions. This is because all management operations (e.g., tillage, sowing, fertilization, spraying of pesticides, harvesting) are performed along the rows or along the columns. These operations are a major anthropogenic source of spatial variation. Due to their presence, an anisotropic model will usually be the most plausible, even if the nonanthropogenic variation is deemed reasonably close to isotropic (McCullagh & Clifford, 2006). A further point to consider is that plots are often longish rather than square. Rotational invariance and parsimony could be achieved by employing isotropic models, for example, by the use of (low-rank) radial smoothers (Ruppert et al., 2003, Chapter 13), as used, for example, for Kriging. But for the reasons just given, an isotropic model is not expected to hold in agricultural field trials, which is why such models are rarely if ever used in that area.
In this paper, we were concerned with the implementation of P-splines for field trials using mixed model software. This required converting a singular precision matrix into a variance-covariance matrix. When the variance-covariance matrix is singular, mixed model packages such as the MIXED procedure of SAS use a variant of the mixed model equations, due to Harville (1976), which involves the singular variance-covariance matrix (Piepho et al., 2012). These methods work well for plant breeding trials, which typically involve between several hundred up to 2000 or 3000 plots, which is a modest size compared to some other applications with a much larger number of observational units. If P-splines are implemented using Bayesian methods, the model may be formulated directly in terms of the singular precision matrix (Dutta & Mondal, 2015;Lang & Brezger, 2004;Selle et al., 2019), but this does not annihilate the need for identifiability constraints (Goicoa et al., 2018). Also, Henderson's (1950Henderson's ( , 1975 h-likelihood method allows frequentist inference even when the precision matrix is singular as described in Dutta and Mondal (2015). Both the Bayesian and h-likelihood methods may be computationally advantageous with much larger datasets.
On a final note, we would like to stress the importance of design, which often does not receive the attention it deserves. Sometimes, the large number of modeling options for spatial analysis may raise the false impression that design does not matter, and that a sophisticated analysis takes care of everything. Nothing could be further from the truth. Certainly, given the large number of options for analysis, the design issue becomes more challenging and is far from resolved. Rather than focusing the design on a specific route of analysis, one may consider searching for a good design without a specific analysis in mind , striving for robustness to model choice.
Trials laid out according to a classical randomized design may be analyzed by two-dimensional P-splines as illustrated in this paper. The randomization design should be taken into account in such analyses. In our second example, the design was a resolvable incomplete block design. Thus, our models always included a fixed effect for complete replicates. The models did not, however, include an effect for incomplete blocks. We would normally also include such an effect to fully account for the randomization layout and consider spatial components as an add-on (Piepho & Williams, 2010). Here, we omitted this effect to facilitate comparison between examples having different blocking structures and also because the effect competes with spline components in modeling spatial trend within replicates. We note that an analysis omitting an incomplete block effect is still covered by the randomization layout (Speed et al., 1985). One way of interpreting these analyses is that we let the spline components take up the task to model the heterogeneity within replicates and hence to recover the interblock information. The design in the first example had rows and columns nested within replicates as incomplete blocks. Again, all models had an effect for complete replicates. Any smooth effects for the row and column blocks are taken up by the spline components. Model M54 fits random effects for rows and columns, thus fully representing the randomization layout for this example. This model is outperformed by spline models, however, which model row and column effects using the spline components. Although model M54 (Table 5) assumes that row and column effects are independent random effects, the spline-based models assume that these random effects are spatially correlated along the row and column dimensions, and the AIC values reported in Tables 2-3 show that these models fit better than the classical randomization-based model M54. The results for the Ihinger Hof uniformity trial, where M54 fits better than the smooth models, hint that it may be generally important to account for nonsmooth effects separately, and also to choose a design that caters for such effects .

A C K N O W L E D G M E N T S
We thank two reviewers and the Associate Editor for their very helpful comments.

C O N F L I C T O F I N T E R E S T
The authors have declared no conflict of interest.

D ATA AVA I L A B I L I T Y S TAT E M E N T
All data is made available with the supplementary files.

O P E N R E S E A R C H B A D G E S
This article has earned an Open Data badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available in the Supporting Information section. This article has earned an open data badge "Reproducible Research" for making publicly available the code necessary to reproduce the reported results. The results reported in this article could fully be reproduced.

R E F E R E N C E S S U P P O R T I N G I N F O R M AT I O N
Additional supporting information may be found in the online version of the article at the publisher's website. Row-wise Kronecker products Eilers and Marx (2003) proposed the use of row-wise Kronecker products, also known as box products or tensor products of B-spline bases for nonadditive smoothing. A salient feature of field trials, however, is that plots are usually placed on a regular grid, for which the tensor product coincides with the usual Kronecker product (Lee & Durbán, 2011). Thus, if ⊗ denotes the row-wise Kronecker product and ⊗ denotes the ordinary Kronecker product, then for a B-spline basis matrix with k rows and arbitrary number of columns corresponding to regularly spaced knots and a B-spline basis matrix with s rows and an arbitrary number of columns (knots), we have This important fact can hardly be overemphasized for those wanting to use P-splines for analysis of spatial data on a regular grid in general and on a regular grid of field trial plots in particular, but drawing on a literature that seeks to embrace irregularly gridded data as the dominant application case, for example, in longitudinal studies. In the case of field trials, this means that a row-wise Kronecker product can be written as an ordinary Kronecker product of matrices associated with rows and columns (Verbyla et al., 2018), which greatly simplifies the notation and also clarifies the relationship with other spatial modeling approaches for field trials, such as the LV⊗LV model (Piepho & Williams, 2010) and the AR1⊗AR1 model (Gilmour et al., 1997). We make use of the important Equation (A1) as a design matrix throughout this paper to model spatial interactions between rows and columns. Readers less familiar with P-splines should be aware that the simplified notation arising from (A1) may look somewhat unfamiliar compared to results for P-splines applicable to irregular spatial grids (Lee & Durbán, 2011;Wood, 2017, p. 232), even though the regular case is just a special version of the irregular one.

A.2
Relations with state-space models The use of differences among neighboring plots (Besag & Kempton, 1986) has close ties with state-space models (Durbin & Koopman, 2001;Harvey, 1989), which, in turn, can also be represented as mixed models (Piepho & Ogutu, 2007;Tsimikas & Ledolter, 1997). Specifically, we may assume that the response on the ith plot is related to a latent level according to = + , where ∼ (0, 2 ). If we further assume that the state on the neighboring (i+1)th plot is related to that on the ith plot by +1 = + , where ∼ (0, 2 ), then follows a first-order random walk model, which is a simple form of state-space model. For first differences, we then have +1 − = ∼ (0, 2 ), demonstrating the connection between the use of first differences and state-space models as used mostly for time series. By comparison, the AR1 model is known as damped model in state-space modeling and written as +1 = + with ∼ (0, 2 ), where 0 < < 1 is the autocorrelation (Harvey, 1989, p. 46).
The first-order random walk model can be extended to what is known in state-space modeling as local linear trend model. The starting point is a simple linear regression of the form = 0 + 1 . One idea to make this deterministic trend stochastic is to let both 0 and 1 vary randomly, but this would lead to discontinuities at the plots (Harvey, 1989, p. 37). A better option is to work directly with the current level, in place of the intercept 0 . Noting that the deterministic trend can be rewritten as +1 = + 1 with 0 = 0 , we may consider updating both the current level and the slope term by a random walk as follows: where is the random slope at the ith plot with 0 = 1 . Thus, on the ith plot, we have linear trend , and this is updated by a random increment 1 , that is, by a random walk, as we move to the (i+1)th plot. The intercept at the ith plot is = − , and hence +1 = − (1 − ) + 0 . Essentially, we have randomly varying regression lines, pieced together at the plots. Due to the update 0 , however, there are discontinuities at the plots (Figure 1). In the limiting case when 2 0 = 0, the trend is a polygon with breaks at the plots but no discontinuities. Interestingly, the model then is equivalent to +1 − 2 + −1 = 1 , which is a second-order random walk and involves second differences (Durbin & Koopman, 2001, p. 39). To see this, we may subtract the state equation 0 = − + −1 + −1 from +1 = + , yielding +1 = 2 − −1 + − −1 = 2 − −1 + 1, −1 . By comparison, the first-order random walk model for can be depicted as horizontal regression lines with jumps at the plots ( Figure A1). It is worth observing that both the local linear trend and F I G U R E A 1 Simulated trend for three state-space models. RW1, first-order random walk; RW2, second-order random walk; LLT, local linear trend; 0 = 0 = 0; 2 0 = 2 1 = 0.1 its limiting case with 2 0 = 0, which is equivalent to use of second differences, have the fixed trend component 0 + 1 , corresponding to the null space of P-splines with second differences penalty. Thus, the state-space view provides a very natural explanation why a fixed intercept as well as a fixed slope are needed with second differences.

A.3
Two-dimensional first differences Kempton et al. (1994) formulate a correlation structure that is separable between rows and columns. Their assumption is that first differences along rows and along variances have variances = 2 and = 2 , respectively, and that plot differences calculated across both rows and columns, that is, , − +1, − , +1 + +1, +1 , where , is the plot value in the ith row and jth column, have variance rc = (2 + )(2 + ) 2 . In what follows, we will relate this model to the LV⊗LV model (Piepho & Williams, 2010) and P-splines with = 1, = 1, = , and = . We may collect plot values for spatial trend into a random vector . Following Kempton et al. (1994), the first differences across rows can then be assumed to have variance In line with our P-spline approach, the variance-covariance matrix for may be assumed to take the form var ( ) = ⊗ 2 + ⊗ 2 + ⊗ 2 (A7) for suitable variance-covariance matrices and . This can also be expressed in terms an ANOVA-type decomposition = ⊗ 1 + 1 ⊗ + (A8) with var( ) = 2 , var( ) = 2 and var( ) = ⊗ 2 . We now need to find and such that = − and = − . For , for example, we can meet these assumptions using = + or = + + 0 0 , corresponding to our P-spline approach, but this does not yield the desired form of . One (but not the only) choice that will yield the desired form of is 2 = = ( − 1) − , where is a × matrix with (i 1 , i 2 )th element equal to | 1 − 2 |, corresponding to the LV⊗LV model (Piepho & Williams, 2010). To see this, we first use (A7) to find

A.4
Orthogonal rotations of the null-space eigenvectors for = 2 It is interesting to observe the connection of a diagonal variance-covariance model with rotation parameter with an unstructured model. Let Ω be a diagonal matrix