Genotype × environment interaction analysis of North American shrub willow yield trials confirms superior performance of triploid hybrids

Development of dedicated bioenergy crop production systems will require accurate yield estimates, which will be important for determining many of the associated environmental and economic impacts of their production. Shrub willow (Salix spp) is being promoted in areas of the USA and Canada due to its adaption to cool climates and wide genetic diversity available for breeding improvement. Willow breeding in North America is in an early stage, and selection of elite genotypes for commercialization will require testing across broad geographic regions to gain an understanding of how shrub willow interacts with the environment. We analyzed a dataset of first‐rotation shrub willow yields of 16 genotypes across 10 trial environments in the USA and Canada for genotype‐by‐environment interactions using the additive main effects and multiplicative interactions (AMMI) model. Mean genotype yields ranged from 5.22 to 8.58 oven‐dry Mg ha−1 yr−1. Analysis of the main effect of genotype showed that one round of breeding improved yields by as much as 20% over check cultivars and that triploid hybrids, most notably Salix viminalis × S. miyabeana, exhibited superior yields. We also found important variability in genotypic response to environments, which suggests specific adaptability could be exploited among 16 genotypes for yield gains. Strong positive correlations were found between environment main effects and AMMI parameters and growing environment temperatures. These findings demonstrate yield improvements are possible in one generation and will be important for developing cultivar recommendations and for future breeding efforts.


Introduction
If dedicated bioenergy crops are to play a significant role in climate change mitigation strategies, a clear understanding of yield potential on a regional basis must be established. Development and delivery of highyielding, well-adapted crops is a key underlying assumption in the estimation of bioenergy production capacity in the USA (U. S. Department of Energy, 2011). Crop management and breeding will play crucial roles in meeting the challenges of producing more biomass on limited land in a sustainable manner (Karp & Shield, 2008). Shrub willow (Salix spp.) grown in short rotation has shown promise as a viable, regionally based feedstock for marginal land due to its adaptability to cool, moist climates with short growing seasons (Kuzovkina & Quigley, 2005) and large potential for genetic improvement through breeding (Smart et al., 2005;. This substantial yet mostly underexploited genetic variability among taxa is expected to provide a basis for developing key traits desirable for sustainable bioenergy production, both through traditional breeding and with advanced molecular techniques (Karp et al., 2011).
Willow breeding for biomass production has been most extensively researched in the UK (Lindegaard & Barker, 1997) and Sweden (Larsson, 1998). A thorough account of the global breeding history of shrub willow is provided by Kuzovkina et al. (2008). In North America, shrub willow breeding efforts began at the University of Toronto in the early 1980s with collection of native species and exchange of plant material with the UK and European countries (Zsuffa, 1990), but efforts were focused largely on hybridizations between North American native species (Mosseler, 1990). In the USA, acquisition of plant material from collaborators in Canada, as well as China, Japan, New Zealand, Ukraine and Sweden provided the basis of a breeding program focused largely on novel interspecific hybridizations displaying heterosis for biomass yield traits (Kopp et al., 2001;Smart et al., 2005;Smart & Cameron, 2012). Polyploidy is common in Salix and novel triploid species hybrids have been linked to improved yields and biomass quality in early selection trials Serapiglia et al., 2014Serapiglia et al., , 2015. Cultivar development, however, is a multistage screening process requiring substantial investments in time and resources prior to commercialization (Hanley & Karp, 2014). An accurate evaluation of cultivar yield potentials is ultimately assessed through testing on multiple sites with a diverse range of environmental characteristics.
Biomass yield is an important factor determining the environmental and economic impacts associated with growing shrub willow. Life cycle analyses have shown that yield assumptions can significantly impact net energy ratios and greenhouse gas balances (Heller et al., 2003;Keoleian & Volk, 2005;Caputo et al., 2014), as well as economic returns on investment and production costs (Buchholz & Volk, 2011;Hauk et al., 2014). There have been numerous research trials conducted over the past two decades in regions of the northeastern USA and Central and Eastern Canada that have quantified yields. Some studies have analyzed the differential response of genotype to contrasting environmental conditions (Labrecque & Teodorescu, 2003;Wang & Macfarlane, 2012;Serapiglia et al., 2013;Mosseler et al., 2014a); however, the limited number of test sites and use of diverse cultivars of various levels of genetic improvement makes it difficult to generalize specific genotypic responses to larger growing regions. Others have summarized mean yields from multiple test sites across geo-graphical regions (Kiernan et al., 2003;Volk et al., 2011), but due to unequal representation of genotypes among trials, little insight into the genotypic contribution to yield variability can be gained. Furthermore, efforts to model yields across broad geographic ranges may use general yield estimates from obsolete cultivars or ones that may not be well adapted for a particular region (Walsh et al., 2003;Wang et al., 2015). Plant breeders and agronomists are therefore challenged with assessing genotypic sensitivity to certain edaphic and climatic conditions. This process is also important for assuring continued improvement within a breeding program and for providing a basis for recommending cultivars for broadscale production.
These recommendations become complicated when significant crossovers occur in genotype (GEN) yield rankings, as a response to contrasting environments (ENV). These genotype-by-environment (G9E) interactions are prominent and important phenomena in agriculture, which present both challenges and opportunities for plant breeders and agronomists. Selection and deployment of elite cultivars must be based on results of rigorous testing through coordinated multilocation trials, combined with appropriate statistical analyses for assessing the adaptability of genotypes and predicting performance in untested ENV (Annicchiarico, 2002). A long-standing theme in plant breeding is to focus the search for GEN that exhibit stable yields or broad adaptability across ENV in the targeted growing region. This concept was popularized by Finlay & Wilkinson (1963) and Eberhart & Russell (1966), who developed regression parameters that seek to identify superior yielding GEN that maintain stable performance across ENV. Selection on the basis of stability can help to minimize the complicating effects of G9E interactions, adding efficiency to the selection process by focusing resources on material that has the best promise for widespread optimal performance (Eberhart & Russell, 1966). Selection based on stability should also guard against a potentially detrimental tendency to select GEN based on greater yields in only favorable ENV (Simmonds, 1991;Annicchiarico, 2002). However, when the G9E component is strong and meaningful, there is also a counter argument that contrasting performance among GEN can be capitalized upon, and breeding efforts should focus on specific adaptation (Cooper & Hammer, 1996;Piepho & M€ ohring, 2005). Thus, G9E interactions are viewed as a useful and informative aspect of cultivar testing, and subdividing a growing region into smaller areas and targeting GEN at those areas can improve overall yields (Gauch & Zobel, 1997). This argument is reinforced by the fact that much of the world's crop production occurs on land that is less favorable than that where the crops were developed (Simmonds, 1991;Gauch & Zobel, 1997). As dedicated bioenergy crops are presumably targeted for marginal lands (Richards et al., 2014;Stoof et al., 2015), perhaps this concept is particularly relevant. The relative merits of these two perspectives, exploiting only broad adaptations or else both broad and specific adaptations, vary from case to case and depend substantially on the relative magnitudes of genotype main effects and G9E interaction effects.
We present an analysis of G9E interactions in firstrotation yields across a network of shrub willow yield trials in North America covering 16 GEN and 10 ENV. The objectives of this study were (1) to identify shrub willow genotypes with broad adaptability in biomass yield across target growing regions in North America, (2) analyze G9E interactions for identifying and characterizing specific adaptation of certain genotypes and (3) identify edaphic and climatic variables that are most closely associated with G9E interaction patterns. This represents the most comprehensive analysis of North American shrub willow yields to date and will serve as a basis for making cultivar-site matching recommendations and will inform future breeding efforts.

Breeding material and yield trial network
Foundational breeding material used in developing the GEN tested in these yield trials was obtained from the University of Toronto and from accessions of native and naturalized species collected in the northeastern USA and eastern Canada in the 1980s to mid-1990s (Kopp et al., 2001;. Crosses were performed between 1998 and 1999 at SUNY-ESF, and after initial family screening in field trials for biomass yield traits, a group of genetically diverse individuals were deployed in regional yield trials. Between 2005 and 2011, 23 trials were established mainly in the northeastern and Midwestern USA and contained between six and 30 genotypes. To provide an unbiased comparison of genotype yields, we restricted our analysis to 16 cultivars ( Table 1) that were all present in each of 10 environments ( Table 2). The yield trials were planted between 2006 and 2009 and hosted by institutions located in six US states and two Canadian provinces, and the cultivars have been placed into diversity groups based on pedigree. Trials were established and maintained generally in a consistent manner across sites according to a standardized protocol and methods followed those described in Serapiglia et al. (2013) and Volk et al. (2011). Conventional tillage was used to prepare the sites in either the fall or spring prior to planting, which generally occurred between May and June. All trials were planted by hand using dormant 25 cm cuttings sourced from nursery beds at the Tully Genetics Field Station of SUNY-ESF in Tully, NY. Trials were laid out in a double-row configuration with 1.52 m between double rows, 0.76 m within the double rows and 0.61 m between plants along the row, for a planting density of 14 400 plants ha À1 . Within each yield trial, genotype was the experimental treatment and the experimental units were plots consisting of three double rows, each 13 plants long, with the outer double rows serving as border rows. Genotypes were replicated four times in a randomized complete block design. Preemergence herbicides, generally oxyfluorfen (1.1 kg ai ha À1 ) and simazine (2.2 kg ai ha À1 ), were applied prior to budbreak, except at Boisbriand, QC, where no herbicide was used. Periodic Lower case letters indicates that these are subgroups of diversity group 6. *Diversity group refers to Species/Pedigree. †Ploidy level estimated by flow cytometry (Serapiglia et al., 2015). ‡Collected in Ontario Canada, but possibly the cultivated hybrid S. viminalis 9 (S. caprea 9 S. cinerea) (Stott, 1991). mechanical or spot chemical weed control was performed as needed. After the first year of growth, all stems were cut back during dormancy to promote coppice regrowth the following spring, at which time 112 kg N ha À1 was applied, normally as ammonium sulfate, except for the trial located in Saskatoon, SK, where no fertilizer was applied. Harvests were conducted three years postcoppice (4-year-old root systems) during dormancy, except at Brimley, MI, which was harvested in July at 3.5 years postcoppice due to extreme moisture conditions in the field. Two to four plants on each end of the middle double row of each plot were excluded from harvest measurements to avoid plot-toplot edge effects, resulting in 18-22 plants available for measurements. All stems from the 18-22 plants from the middle double row of each plot were cut with brush saws or cut and chipped with a mechanical harvester and weighed in a bin with weigh cells in the field. A subsample from each plot consisting of either whole stems or chips was collected, weighed fresh, dried at 65°C to a constant weight and reweighed to determine moisture content. Moisture content was used to calculate dry matter yield for each plot based on the area occupied by the harvested plants across a 3-year rotation. All yields reported here are expressed as oven-dried Mg ha À1 yr À1 . Survival data was also recorded on the harvested plants at the time of harvest, except Boisbriand, QC, where no survival data was collected, but survival was >90% for each plot (M. Labrecque, Personal observation) and Savoy, IL where survival was assessed in the middle of the second rotation, but most of the mortality occurred early in the first rotation (G. Kling, personal observation).

Site environmental characteristics
Daily temperature and precipitation data for all 4 years of the harvest cycle were obtained from weather stations nearest to each trial location with the most complete records using publically accessible databases, National Oceanic and Atmospheric Agency, National Centers for Environmental Information (NOAA NCEI, 2015) for US trials and Canadian National Climate Data (Environment Canada, 2015) for CA trials. Daily solar radiation estimates at a 1 o by 1 o grid scale were obtained from National Aeronautics and Space Adminstration, Prediction of Worldwide Energy Resource (NASA POWER, 2015). Soil samples were generally collected at the time of planting, or occasionally soon after harvest. Due to some differences between soil extraction methods, only pH (1 : 1 soil/water by weight) and % organic matter (typically by loss on ignition) were considered for data analysis.

Dataset refinement
Initially, our dataset was comprised of 640 independent observations where the harvested plot was the experimental unit. While overall mean survival was >90% across the yield trial network, some trials and individual plots experienced greater mortality. Two trials, Escanaba, MI, and Saskatoon, SK, had mean survival values below 80%. Damage from herbicide drift appeared to be the main cause of mortality in Saskatchewan (Amichev et al., 2015), while at Escanaba mortality could not be associated with any particular issue (R. Miller, personal observation). In total, 39 independent observations (experimental units) with >65% mortality at the time of harvest were removed from the original dataset of 640 observations. Data were also inspected for extreme moisture content values, which can impact dry matter yield calculations. Two likely sources of error were 1) premature removal of samples from the drying oven, resulting in excessively low moisture content estimates, and 2) extraneous moisture contamination at harvest during wet conditions, leading to excessively high moisture values. Moisture values that were greater or less than three standard deviations of the mean across all samples were considered outliers and were removed from the dataset. To retain a yield value for the particular observation, the mean of the remaining three replicates was applied to the plot fresh weight of the outlying moisture content. This was performed for 15 samples in total from four of nine trials, which represented 2.3% of the total observations considered in this analysis. For the Boisbriand, QC trial, a single wood sample was collected for each cultivar and the moisture content was applied to all four replicates of that cultivar. The mean moisture content of these samples was reported at 32.7% (SD 2.90%), and it was assumed that all samples had not dried sufficiently, considering the overall mean moisture content across all other trials was 46.9% (SD 4.77%). A correction factor equaling the difference between these two mean values was added to each of the observations originally reported for the Boisbriand trial. This value was 14.2%, which was also very close to the value of 3 standard deviations of the overall mean (14.3%). After accounting for survival and moisture content adjustments, 601 observations were available for analysis from the original 640 from 16 GEN in 10 ENV.

Statistical analysis
We chose to analyze G9E interactions in our first-rotation yield dataset using the additive main effects and multiplicative interactions (AMMI) model. The AMMI model is a combination of analysis of variance (ANOVA) and principal components analysis (PCA), where the G9E interaction, contained in the residual of the additive GEN and ENV main effects model, is subjected to PCA and the interaction sum of squares (SS) are partitioned into a series of interaction principal component (IPC) axes, where IPC1 is the first interaction PC axis, and so on. The AMMI0 model indicates that no IPC axes are included and the model is equivalent to the additive (main effects only) ANOVA. AMMI1 includes the first IPC axis, and so on, while AMMIF is the full model that includes all axes and is equal to the raw data. The maximum number of IPC axes for a given dataset is one less the minimum number of GEN or ENV (Gauch, 1992;p. 85), but typically the majority of the interaction signal is captured in the first few IPC axes, with later IPC axes containing increasing amounts of interaction noise, and decreasing amounts of signal (Gauch, 2012). Therefore, a more parsimonious model containing a lower number of axes is favored and the remaining axes are relegated to a pooled residual. The statistical significance of each axis is often assessed with an F-test where the degrees of freedom (DF) for each axis are calculated according to Gollob (1968) (see also Gauch, 1988). More recently, Piepho (1995) demonstrated that the traditional F-test can be too liberal and suggested a more conservative F R -test for determining the significance of each IPC axis. Also, Gauch (1992, p. 147) suggested a simple test for model diagnosis and selection, which involves estimation of the G9E noise SS by multiplying the G9E DF by the mean square error. Consequently, the signal G9E SS can be estimated by subtracting the G9E noise SS from the G9E total SS. Therefore, IPC axes can be assessed by the amount of G9E signal SS that are recovered, instead of the total G9E SS. Selection of higher order models that include greater numbers of axes must be weighed not only in terms of statistical significance, but also in terms of practicality, because higher order AMMI models and especially the noisy AMMIF tend to have a large roster of genotypes that win in at least one environment, and hence, such models produce an unmanageable number of mega-environments (Gauch, 2013). Parsimonious models are often preferred, and AMMI models are most useful when the multiplicative terms have agricultural interpretability (Gauch, 2013). Equation 1 gives the general form of the AMMI model: where Y ger is the yield of the gth genotype in the eth environment for the rth replicate, l is the grand mean, a g is the genotype g mean deviation (genotype mean minus grand mean), b e is the environment e mean deviation, k n is the singular value for nth IPCA axis, c gn is the genotype g eigenvector value for IPCA axis n, d en is the environment e eigenvector value for IPCA axis n, q ge is the residual, and e ger is the experimental error. The AMMI analysis was performed using MATMODEL V3.0, an open source statistical program designed specifically for the analysis of G9E interactions (Gauch & Furnas, 1991;Gauch, 2007). MATMODEL performs the combined ANOVA/ PCA and also delineates mega-environments, which allows for the exploration of specific adaptation to particular edaphic or climatic conditions (Gauch & Zobel, 1997).
MATMODEL does not analyze the experimental design; hence, we first analyzed the experimental design using PROC MIXED in SAS â version 9.4 (SAS Institute Inc., 2013), with block nested within ENV, and coded as a random effect. The main effects of GEN and ENV and the G9E interaction term were considered fixed effects. The TYPE3 option in PROC MIXED was invoked to obtain expected SS for the fixed effects. Tukey's studentized range (HSD) post hoc test was performed for means separation among GEN. The least squared means from the SAS output were then supplied to MATMODEL to perform the PCA of the interaction and for the mega-environment analysis. The random error variance from the SAS PROC MIXED output was used to calculate the F-tests in the AMMI analysis. Finally, the environment main effects and IPC scores were used in correlation analyses with the various environmental variables using PROC CORR in SAS â version 9.4.

ANOVA and genotype main effects
The grand mean for first-rotation yields for this base case dataset based on the mixed model analysis of vari-ance was 6.96 Mg ha À1 yr À1 . The main effects of ENV and GEN and the G9E interaction were all highly significant (P < 0.0001; Table 3) and accounted for 82, 6 and 12% of treatment SS, respectively. The blocking effect (nested within ENV) accounted for a relatively large amount of random error variance (Table 3).
Although the main effect of genotype accounted for only 6% of the treatment SS, it contains important information about patterns of broad adaptation. The cultivar Salix viminalis 9 S. miyabeana 'Fabius', a triploid hybrid in diversity group 8 (DG8) was the overall greatest yielding GEN across the 10 ENV tested, with a mean yield of 8.58 Mg ha À1 yr À1 (Table 4). Two other Salix viminalis 9 S. miyabeana triploid hybrids in DG8, 'Otisco' and 'Tully Champion', ranked 2nd and 4th in overall yield, respectively (Table 4, Fig. 1a). Two of the top five GEN were tetraploid S. miyabeana (DG5), including 'Canastota' a progeny selected from a cross between 'SX61' and 'SX64'. Two triploid S. purpurea 9 S. miyabeana cultivars from DG9, 'Oneida' and 'Millbrook', were ranked 6th and 7th overall, respectively (Fig. 1a). Triploids showed the greatest relative yields (GEN yield/ENV mean yield averaged over all ENV), with the exception of two cultivars, 'Truxton' and 'Owasco', which performed at or below ENV means (Fig. 1b). The cultivars S. miyabeana 'SX61', 'SX64' and S. 9 dasyclados 'SV1' were cultivars that showed promise in earlier trials, and subsequently were used as check cultivars in this yield trial network. The top two improved cultivars, 'Fabius' and 'Otisco', per-formed better than all three check cultivars, and four other improved cultivars performed better than the check clone mean yield of 7.14 Mg ha À1 yr À1 . The GEN with the lowest mean yields were diploid S. eriocephala 'S25' (DG4), a North American native willow species, and two hybrid cultivars of S. koriyanagi 9 S. purpurea 'Onondaga' and 'Allegany' (DG6b). ENV mean yields ranged from 2.57 to 11.3 Mg ha À1 yr À1 , with the greatest yields in eastern USA and Canada, and the lowest yields occurring in the Upper Peninsula of MI, USA, and Saskatoon, SK, CA.

AMMI analysis
The AMMI2 model ANOVA revealed that the main effects of GEN and ENV plus the first two IPC axes accounted for 95.5% of the treatment SS with only 68 (of 159) treatment DF (Table 5). Using the error variance obtained from the SAS PROC MIXED output (Table 3), both the F-tests and the more conservative F R -tests proved IPC1 and IPC2 to be significant (P < 0.0002); however, all subsequent axes were not significant and were therefore pooled into a discarded residual. The G9E interaction term is inherently a mixture of true signal in the data (i.e., systematic rank changes amount GEN) and noise. The amount of G9E noise can be estimated by multiplying the G9E DF by the mean square error (Gauch, 2013). Using the mean square error variance obtained from the SAS PROC MIXED output (Table 3), the G9E noise SS for this dataset was estimated to be 373 (46%), while the G9E signal accounts for 438.5 (54%) of the total G9E SS. IPC1 captured 291 SS, or 66.4% of the G9E signal SS. The cumulative SS accounted for by the first two IPC axes is 507, which is greater than the estimated G9E signal SS, suggesting that IPC2 contains a combination of mostly signal and some noise. Therefore, AMMI1 does a reasonable job of capturing the majority of the G9E signal in a parsimonious model. However, in terms of accurately describing our data, AMMI2 might be slightly better, although the more important consideration is that AMMI1 (and AMMI2) is considerably more accurate than the raw data AMMIF as its additional IPC3 and higher components (which are combined in the residual in Table 5) capture a SS of 304, which is mostly noise. A useful tool for interpreting the AMMI analysis is the AMMI1 biplot, where the additive main effects of GEN and ENV are plotted on the x-axis in units of yield (Mg ha À1 yr À1 ), and the IPC1 scores are plotted on the y-axis, which have units expressed as the square root of the yield (Gauch, 1992;p. 85). By representing both the main effects and the majority of the interaction signal in one projection, the AMMI1 biplot captures 92.3% of the treatment SS, and the relationships between main effects and interaction can be observed simultaneously (Fig. 2). The vertical reference line represents the grand mean yield of the dataset, and the horizontal line is placed at zero for the IPC scores, where points farther away from this line indicate GEN or ENV with larger interactions. Genotypes that occur close to the horizontal line can be regarded as having relatively stable yields across ENV. Genotypes and ENV having the same sign for their IPC1 scores have positive interactions, whereas opposite signs give negative interactions. For instance, 'Fabius' was the highest yielding GEN, and consequently, it is farthest along the right side of the x-axis. It also had one of the largest IPC1 scores, generating a positive interaction in Storrs, CT, but a negative interaction in Skandia, MI. Indeed, the lowest yield for 'Fabius' of 1.95 Mg ha À1 yr À1 occurred at that location, in comparison with Storrs, CT, where 'Fabius' yielded 13.27 Mg ha À1 yr À1 (Table 4). The IPC1 scores for 'SV1'  Table 1. Bars in (a) with different letters indicate significant (P < 0.05) differences according to Tukey's studentized range test (HSD). Error bars in (a) are AE one standard error of the mean. Proportions in (b) were calculated as the yield of a genotype divided by the environment mean, averaged across the 10 environments. and 'Tully Champion' are similar and were the most extreme negative scores out of the 16 GEN, so their interaction patterns are the opposite of 'Fabius'. However, 'Tully Champion' has a substantially greater mean yield compared to 'SV1', and it was the top-yielding cultivar in many low-yielding ENV (Table 4).
When IPC2 is significant (Table 5), the AMMI2 biplot can be used to investigate the interaction pattern in IPC1 and IPC2 together (Fig. 3). The reference lines in Fig. 3 are drawn through zero for both axes. The crossing of these two lines in the middle of the graph indicates no interaction, and therefore, a GEN close to this point would be characterized as having stable yields across ENV, and for this dataset, the triploid 'Otisco' and the tetraploids 'SX61' and 'SX64' are closest to the origin. 'Fabius' and 'Tully Champion are on the opposite ends of IPC1 scores, but have very similar IPC2 scores. In contrast, 'Onondaga' and 'Canastota' were on the opposite extremes of IPC2 scores, with 'Canastota' having a strong positive interaction with Boisbriand, QC in terms of IPC2, as their IPC scores are of the same sign. Unlike Fig. 2, there is no information about main effects in Fig. 3, only interactions, but it is useful for understanding the relationships between AMMI1 and AMMI2.

Mega-environment delineations
Another useful graphical representation of AMMI1 analysis is a plot of the AMMI1 nominal yields (AMMI1  model yields for each GEN, without the ENV deviation, see equation 1), against the ENV IPC1 scores (Gauch & Zobel, 1997). Figure 4 shows nominal yields for the 16 GEN across the 10 ENV plotted as straight lines, and the vertical positions of the lines relative to one another at a given ENV IPC1 score indicate the yield rankings. This plot is very useful for visualizing GEN crossovers, and for ease of interpretation, only the most relevant GEN have been highlighted and their names provided. For instance, 'Fabius' had the greatest yields in five   The degree of environmental sensitivity of a GEN can also be inferred by observing the slopes in Fig. 4. 'Fabius' and 'Tully Champion' have steep, but opposite slopes due to their opposite interactions. 'Otisco' was ranked 2nd in overall yield and was relatively insensitive to ENV, as evidenced by its near zero slope (Fig. 4). Also, similarities among GEN can be observed where slopes are nearly parallel. The line just below 'Fabius' on the right side of Fig. 4 is that of 'Oneida', which is similar to 'Fabius', suggesting that they reacted to ENV similarly. However, the mean yields of 'Fabius' were considerably greater than 'Oneida', making 'Fabius' an obvious superior choice. Similarly, 'SV1' had an interaction pattern that mirrored 'Tully Champion', but again, the predicted AMMI1 yields were lower for 'SV1'.
AMMI can also be used to delineate mega-environments, which are defined as subdivisions of a crop's potential growing range that share similar genotype winners, presumably due to similar biotic and abiotic stresses, but that are not necessarily contiguous (Gauch & Zobel, 1997;Yan et al., 2000). Subdivision of a crop's growing region can help to improve yields by targeting GEN with specific adaptation to ENV where they are likely to perform best (Gauch & Zobel, 1997), especially if there are identifiable environmental or biological patterns associated with the IPC analysis that can be extended to ENV beyond those where crop yields have been tested (Gauch, 2013). MATMODEL uses key switch points in nominal yields among winning GEN across IPC1 scores to delineate mega-environments. For this shrub willow yield dataset, the main switch point for the AMMI1 model is clearly illustrated in Fig. 4, where 'Fabius' and 'Tully Champion' switch between first and second rank at a value of about À0.44 along the ENV IPC1 range. Therefore, AMMI1 defines two winners which happen to divide the 10 ENV equally between them, where 'Fabius' is declared the overall winner, but 'Tully Champion' is superior in the lower yielding ENV.
As the AMMI analysis consists of a family of models, with higher order models incorporating greater numbers of interaction components, higher order models generally result in an increased number of mega-environments and winners (Gauch, 2013). To illustrate this point, Table 6 shows the top five rankings for three AMMI model family members. The ENV are listed by IPC1 scores. AMMI1 divides the growing region into two mega-environments as described above, with only two-first place winners, 'Fabius' and 'Tully Champion'. 'Fabius' takes second place in most ENV where 'Tully Champion' was ranked first. 'Oneida' ranks behind 'Fabius' in more favorable ENV, while 'Otisco' ranks third in many of the less favorable ENV. The main differences between AMMI1 and AMMI2 are that Boisbriand, QC is declared as a separate mega-environment with 'Canastota' the top-ranking GEN, and to some degree, 'Fabius' is relegated to lower ranks in the less favorable ENV. The AMMIF model, which is equivalent to the raw data, is also included for comparisons. *ENV, Environment. Names are truncated to the first four letters. Full names can be found in Table 2. †Genotype names are truncated to four letters, with full names presented in Table 1. The overall top-yielding cultivar, 'Fabius' is underlined throughout the table, and the second mega-environment winner from AMMI1, 'Tully Champion' is italicized throughout the table.
Should the full data be accepted as most accurate, it would result in a very complex array of mega-environments with seven different cultivars being declared winners in seven narrowly defined subregions. AMMI1 with its simplified mega-environment winner pattern is likely best for making cultivar selections far less complicated. AMMI2 is likely the most accurate choice for modeling overall genotypic performance in this dataset, but the inclusion of one winner in one ENV complicates recommendations and justification for its inclusion would likely need to be verified with more testing.

Edaphic and climatic variables
The AMMI analysis of the G9E interaction operates solely on yield data, with an implicit interpretation that general patterns in yields are a reflection of underlying environmental characteristics of test locations. However, it may be desirable to associate yield patterns with known environmental characteristics, and IPC scores can be used in simple correlations when environmental variables for test locations are available (Van Eeuwijk, 1995). ENV yield was significantly (P < 0.05) correlated with mean annual growing degree-days and marginally significantly (P < 0.10) associated with longitude and growing season precipitation (Table 7). ENV IPC1 scores were also significantly correlated with growing degree-days, mean annual minimum temperatures and latitude (Table 7), which would covary with each other, but suggest a strong relationship between interaction patterns and temperature. IPC2 scores were positively correlated with elevation (Table 7).

Expanded datasets and analyses
As noted earlier, our yield trial network contained more GEN and test ENV than those included in the above AMMI analysis (601 observations of~1530 available). However, the MATMODEL program allows for imputation of missing treatment cells in the two-way table of means using the expectation maximization (EM) algorithm (Gauch & Zobel, 1990). To more fully explore the yield database, two additional datasets were assembled that expanded the original complete set of 16 GEN in 10 ENV. The first dataset had an additional 6 GEN that included 10 missing treatments (4.5%) of a possible 220, for a total of 22 GEN in 10 ENV. The second dataset added 6 ENV with 15 missing treatments (5.9%) of 256. It has recently been shown that using the EM method for data imputation with missing data proportions of 5-10% do not significantly reduce the predictive ability of the AMMI model, especially when a small number of IPC axes (one or two) are chosen (Rodrigues et al., 2011;Paderewski & Rodrigues, 2014). The ANOVA tables and mega-environment winners for the two expanded datasets are provided in Table S1. The ANOVA for the two expanded datasets show greater overall variance in the treatments, as expected, but the G9E signal SS were similar at 38.7% for the increased GEN and 36.7% for the increased ENV scenarios (Table S1). The IPC1 axes accounted for 82.4 and 76% of the G9E signal SS for the expanded GEN and expanded ENV scenarios, respectively. Interestingly, the mega-environment analyses remain remarkably similar to those of the base case scenario (Table S2). Even with six additional GEN, 'Fabius' remains the winner in six ENV and 'Tully Champion' is the winner in four of the lowest yielding ENV. When six ENV were added for a total of 16, 'Fabius' wins in seven and 'Tully Champion' in eight (Table S2). 'SV1' wins in one environment, Belleville, NY, but 'Tully Champion' is a close second, and often combining a small mega-environment with a larger one occupied by a close winner is justified (Gauch, 2013). The addition of these two scenarios reinforces the superiority of 'Fabius' and 'Tully Champion' relative to the cultivars tested in the analysis. Incidentally, MATMODEL can also provide the Finlay-Wilkinson joint regressions analysis (Zobel et al., 1988;Gauch, 2007), which we applied to the original dataset of 16 GEN in 10 ENV. The joint regression analysis only accounted for 23.2% of the G9E SS, compared to 35.9% for AMMI IPC1 (Table S3). Based on GEN slopes, the joint regression analysis of the GEN with the best combination of stable slopes and greater yields would be as 'Otisco' and 'SX64', with the high yields and slopes slightly above 1, and 'Tully Champion' and 'Oneida' with somewhat lower yields and slopes slightly less than 1 (Table S3). These results are somewhat in agreement with AMMI, but selection based on stability would likely rule out the potential gains from including 'Fabius' in optimal locations.

Discussion
Our analysis of this North American shrub willow yield trial network data has demonstrated the importance of G9E interactions in short-rotation woody crop productivity. Our evaluation of first-rotation yields using the AMMI model has accomplished the main objectives of the study, which were to identify GEN with broad adaptation, to define subregions where particular GEN exhibit specific adaptation and to identify environmental variables that help to explain patterns in yield. Despite the proven success of AMMI in accurately diagnosing G9E interactions from a range of agronomic and horticultural crops, we have not seen any other published reports of its use in short-rotation coppice research. AMMI helped to resolve signal from noise, and we selected the more parsimonious model, AMMI1, as the best representation of the data, simplifying the mega-environment analysis with just two winning GEN, 'Fabius' and 'Tully Champion', as opposed to the raw data with seven of the 16 GEN in our dataset winning in at least one environment (Table 6). In further support of the AMMI1 analysis winners, inclusion of additional GEN or additional test locations did not change the mega-environment analysis outcomes for AMMI1 (Table S1).
'Fabius' was the top-yielding cultivar with an overall mean of 8.58 Mg ha À1 yr À1 , setting a standard for optimal yields and broad adaptability. Also, because of its positive interaction with high-yielding ENV, it is well adapted to favorable growing conditions. 'Tully Champion' had an opposite pattern of interaction and appears to be adapted to less desirable growing conditions. 'Tully Champion' and 'SV1' interacted with the test ENV similarly (Fig. 4), but 'Tully Champion' consistently outperformed 'SV1' make it a much better check cultivar for evaluating future breeding material. Similarly, 'Otisco' and 'SX64' had fairly stable yields, showing little interaction with environment regardless of quality, but 'Otisco' was ranked second in overall mean yield. All three of these superior cultivars ('Fabius', 'Tully Champion' and 'Otisco') are of the triploid pedigree of S. viminalis 9 S. miyabeana, and they outperformed the mean of the check cultivars by 5 to 20%, demonstrating that yield gains can be achieved through novel hybridization in shrub willow. This affirms the findings of Serapiglia et al. (2014) for early selections of triploids on a single site, but extends it more broadly across a large set of diverse ENV.
It should be noted that 'Fabius' and 'Tully Champion' are siblings from the same cross, which makes their divergent patterns in yield across our sites rather remarkable. These two were selected after an evaluation of the family in a replicated trial at a single location in central NY . There was considerable variability in growth potential across this family, and clearly the single test location was not adequate to detect differiential response to broad environmental conditions. The cultivar with the most stable yields in our dataset, 'Otisco', shares the same S. viminalis mother, 'SV2', with 'Fabius' and 'Tully Champion' (Table 1). 'Owasco' and 'Truxton' are also siblings that belong to the Salix viminalis 9 S. miyabeana pedigree (DG8). They share the same father with 'Otisco', 'SX64', but their overall yields were much lower, suggesting superior combining ability of 'SV2' for yield traits. This suggests that interspecific triploid hybridization in Salix can confer substantial gains in yield, and future breeding efforts will continue to exploit this potential.
By correlating environment mean yields and IPC scores with edaphic and climatic variables from the test sites, we were able to demonstrate that increased temperatures were positively correlated with both mean yields and IPC1 scores, while the correlation between precipitation and mean yield was marginally significant. We have observed that the triploid hybrids tend to retain a leaf canopy much later in the season compared to other pedigrees. This could be an indication of lower sensitivity to frost or an ability to exploit solar resources at the tail end of the growing season. Labrecque & Teodorescu (2003) examined yields of two willow species at two contrasting sites in southern Qu ebec and suggested that growth in that region may be limited by precipitation, but not likely growing season temperatures, based on reported optimal growing conditions in Sweden. Wang & Macfarlane (2012) analyzed poplar (Populus spp.) and shrub willow yields of cultivars sourced from Ontario and New York collections at two locations in Michigan and estimated that the difference in growing degree-days between northern and southern locations could explain 60% of the variation in yield. In the UK, Aylott et al. (2008) analyzed yields of three willow GEN of differing pedigrees in relation to edaphic and climatic variables from 49 locations. In general, temporal rainfall patterns were found to be the significant factors affecting yields, but the response varied by GEN, with one being particularly sensitive to soil pH. We were unable to find significant correlations with soil pH or organic matter; however, because of the broad geographic range covered in this analysis, perhaps the greatest amount of variability among the test sites was dominated by climatic factors. The subject warrants additional investigation and testing in locations with higher temperatures and moderate rainfall may reveal a greater geographic production range than previously proposed (Walsh et al., 2003).
G9E interactions have been reported elsewhere in recent studies examining willow yields in multiple locations. In eastern Canada, Mosseler et al. (2014a) studied biomass traits of multiple clones from natural accessions of two species, S. discolor and S. eriocephala, planted on three sites of contrasting fertility. They found no significant rank changes between the two species groups, but large differences in genotypic sensitivities among the GEN within species, suggesting a high degree of intraspecific genetic variability in adaptation that could be exploited for biomass yield improvement. In Denmark, Larsen et al. (2014) studied yields of Swedish and UK commercial cultivars across five locations and reported significant G9E interactions, but there was one clear winner, 'Tordis' (S. viminalis 9 S. schwerinii), with a mean yield of 6.7 Mg ha À1 yr À1 . In earlier work involving numerous GEN of three willow species across a large number of sites in Sweden, R€ onnberg-W€ astljung & Thors en (1988) used a variant of the Finlay-Wilkinson regression to analyze G9E interactions. As in our analysis, linear regressions did not always adequately explain clonal response to ENV. The authors identified stable GEN using regression coefficients and mean yields, but they did not address specific adaptability. In poplar, Zalesny et al. (2009) performed an extensive analysis of 53-79 GEN across four sites and multiple ages in the upper Midwestern USA. They assessed genotypic rankings and segregated GEN based on generalists (consistently high rankings) and specialist (especially variable rankings). More recent attempts to analyze G9E interactions in poplar have incorporated some of the more contemporary statistical techniques used in plant breeding and agronomy. In an analysis of first-year growth of nine GEN in five locations in Spain, Sixto et al. (2011) applied the genotype and genotype x environment (GGE) biplot (Yan et al., 2000), which has some similarities with AMMI biplots and is used to visualize interactions (Gauch, 2006a;Gauch et al., 2008). More recently, Sixto et al. (2014) analyzed biomass production at the end of a 3-year rotation for nine GEN in four of the locations from their previous study. They applied mixed model variants of stability parameter models including Finlay-Wilkinson and Eberhart-Russell and identified patterns of broad and specific adaptation among GEN and pedigrees. These mixed model approaches often consider ENV as random and have the advantage of handling unbalanced datasets (Smith et al., 2005).
The importance of developing cultivars with specific adaptation can be emphasized when willow cultivation is placed into broader biological and geographical contexts. In the UK and Swedish breeding programs, improved S. viminalis and hybrids such as S. viminalis 9 S. schwerinii are highly productive and have demonstrated superior yields in numerous trials compared to other pedigrees (Aylott et al., 2008;Lindegaard et al., 2011;Larsen et al., 2014). In the USA, potato leafhopper (Empoasca fabae Harris) has proven to be extremely damaging to imported European S. viminalis cultivars . Hybrid crosses of S. viminalis with S. miyabeana have introduced varying degrees of resistance to potato leafhopper, in addition to improved yields. We have observed more severe damage to 'Tully Champion' compared with 'Fabius' in NY locations , which may have limited its yield potential on sites of more southern latitude and more eastern longitude. Potentially lower potato leafhopper pressure in northerly ENV may explain the superiority of 'Tully Champion' and 'SV1' at those sites. In Canada, S. eriocephala and S. discolor have been evaluated as North American native species appropriate for biomass and phytoremediation applications (Mosseler et al., 2014a,b). However, testing in the USA has shown that natural accessions as well as improved cultivars of S. eriocephala can be highly susceptible to leaf rust (Melampsora spp) Serapiglia et al., 2013). The improved S. eriocephala cultivar 'S25' in our analysis had the lowest overall yield, and it is possible that rust infection in combination with deer browse contributed to low performance. In Europe, Melampsora rust is a major growth-limiting pathogen for multiple willow species and species hybrids ( Ahman, 1998;McCracken & Dawson, 2003;Aylott et al., 2008). Salix purpurea and hybrid S. koriyanagi 9 S. purpurea cultivars are also susceptible to rust, but based on visual observations in the field and the yields of 'Oneida' and 'Millbrook' in this analysis, the S. purpurea 9 S. miyabeana diversity group likely has increased rust resistance. Breeding for rust resistance is an important focus in the USA given the European experience.
Inclusion of some ENV in this analysis caused overall yields to be lower compared to other studies, but collectively they provided strong contrasts for discriminating among GEN. In turn, this analysis likely provides a more realistic assessment of yield potentials on marginal lands, given some of the exceptionally low-yielding ENV. As trials are often performed at experiment stations, where growing conditions are optimized or at least less variable, it may seem obvious that many reported yields are often on the high end and they may not be representative (Simmonds, 1991). This becomes even more relevant in the realm of dedicated bioenergy crops, as the premise is that these are purposed for marginal lands; more specifically, where the economic returns are marginal for production of traditional field crops (Stoof et al., 2015), and thus, there is reduced competition between food and energy production. Evaluations of shrub willow yield potentials in the USA have been largely geographically restricted to regions where institutional knowledge exists, and the true extent of production potential has likely not be adequately tested (Walsh et al., 2003). Given the positive correlations of yield and IPC1 scores with factors relating to increased temperature, perhaps growing willow at lower latitudes will produce greater yields. But genotypic variation in water-use efficiency and drought resistance will be important and may help to inform breeding for improved adaptation to warmer climates (Bonosi et al., 2013).
This study has demonstrated the importance of identifying genotypic adaptability for developing cultivar recommendations and guiding future breeding efforts. Genotypes with high yields but varying sensitivities to environmental conditions have been identified as important check cultivars for testing the next generation of promising genotypes, which is currently underway. More work is needed in exploring the underlying environmental causes for the observed yields, and this will be the focus of a future analysis, which will incorporate an expanded dataset of North American willow yields.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Table S1. Additive Main effects and Multiplicative Interactions (AMMI) ANOVA tables including the first IPC axis for six additional genotypes and six additional environments. Table S2. Additive Main effects and Multiplicative Interactions (AMMI) mega-environment analysis for 6 additional genotypes and 6 additional environments. Table S3. Joint regressions ANOVA table and regression coefficients for 16 shrub willow genotypes in 10 environments.