Association genetics in Corymbia citriodora subsp. variegata identifies single nucleotide polymorphisms affecting wood growth and cellulosic pulp yield


Author for correspondence:
Shannon Dillon
Tel: +61 2 62464834


  • Wood is an important biological resource which contributes to nutrient and hydrology cycles through ecosystems, and provides structural support at the plant level. Thousands of genes are involved in wood development, yet their effects on phenotype are not well understood. We have exploited the low genomic linkage disequilibrium (LD) and abundant phenotypic variation of forest trees to explore allelic diversity underlying wood traits in an association study.
  • Candidate gene allelic diversity was modelled against quantitative variation to identify SNPs influencing wood properties, growth and disease resistance across three populations of Corymbia citriodora subsp. variegata, a forest tree of eastern Australia.
  • Nine single nucleotide polymorphism (SNP) associations from six genes were identified in a discovery population (833 individuals). Associations were subsequently tested in two smaller populations (130–160 individuals), ‘validating’ our findings in three cases for actin 7 (ACT7) and COP1 interacting protein 7 (CIP7).
  • The results imply a functional role for these genes in mediating wood chemical composition and growth, respectively. A flip in the effect of ACT7 on pulp yield between populations suggests gene by environment interactions are at play. Existing evidence of gene function lends strength to the observed associations, and in the case of CIP7 supports a role in cortical photosynthesis.


The development of woody tissue, which is comprised largely of the secondary walls of xylem cells, has been fundamental to the evolution of land plants, most notably trees (Spicer & Groover, 2010). Trees are foundation species in ecosystems that cover vast areas of the earth’s surface, and the formation of wood contributes substantially to primary production across climatically diverse environments (Petit & Hampe, 2006). The physiological and ecological importance of wood is reflected in the abundance of natural variation for wood traits among and within tree species and populations, arising from both genetic and environmental factors. Wood is a complex structure and thousands of genes have been identified as being important to its development (Allona et al., 1998; Paux et al., 2005; Pavy et al., 2005; Cato et al., 2006; Yuan et al., 2007; Qiu et al., 2008; Li et al., 2009), yet the specific genetic variants resulting in variation in wood phenotypes are only starting to be understood.

Quantifying genetic variation causal to phenotypic differences has traditionally been achieved by characterizing quantitative trait loci (QTLs) via pedigree based mapping (Grattapaglia & Kirst, 2008). However, the markers identified via QTL studies typically have low resolution with the causative genetic variant, because linkage disequilibrium (LD) in pedigrees is inherently high. Consequently, QTLs have had limited application in breeding and gene functional studies. Genomic selection (GS) is a recently pioneered approach which uses genome scale molecular data to estimate breeding values for trait selection (Hayes et al., 2009). GS has proved to be highly advantageous in breeding programmes in diverse species including trees (Grattapaglia & Resende, 2011; Resende et al., 2012), as it potentially captures all available QTLs, and hence a large proportion of the variance for a given trait. However, this approach similarly relies on populations with extended LD. As such, resolution between the marker and gene is likely to be low, and linkage with the QTL can be lost through recombination in advanced generations. For the same reason, markers identified by GS are often not transferable between populations with different genetic backgrounds (Resende et al., 2012).

LD (or association) mapping applies populations of unrelated individuals to map gene variants (SNPs) that affect phenotype, or are closely linked to the causative variant (Neale & Savolainen, 2004). This is achieved via statistical inference of co-segregation of genotype and phenotype data (Oraguzie et al., 2007). The approach has been applied widely in plants and animals and is well suited to forest tree populations, many of which exhibit abundant genetic diversity and rapidly decaying genome-wide LD (Neale & Savolainen, 2004; Thumma et al., 2005; Külheim et al., 2009). Low LD demands the use of dense SNP marker sets, which in trees have been practicably constructed using a candidate gene approach, resulting in tight linkage of the marker and quantitative trait nucleotide (QTN) (Neale & Savolainen, 2004). This is advantageous for breeding application, as the predictive ability of a marker will be robust through generations and across populations. High-resolution mapping via association studies also provides important insights into gene function. Confirmation of association in one or more additional populations, or ‘validation’, is practised commonly in human association studies (Hinohara et al., 2009; Pasche & Yi, 2010; Konig, 2011), but has to date been reported in only a few studies in plants (Thumma et al., 2005, 2009; Dillon et al., 2010). This approach is a valuable tool in studies where false positive rates as a result of multiple testing, or other biases, are expected to be high, even when statistical corrections are applied (Greene et al., 2009).

To date, SNPs affecting wood properties have been identified in several tree species using this approach. The cinnamoyl CoA reductase gene in Eucalyptus nitens (Thumma et al., 2005) and an MYB transcription factor and pectin methyltransferase in Eucalyptus pilularis (Sexton et al., 2010, 2011) were shown to influence variation in wood quality traits including cellulose microfibril angle, wood collapse and radial shrinkage. Recently, an SNP in a cobra-like gene in E. nitens was found to be part of a cis-acting regulatory element that is directly associated with cellulose content and pulp yield (Thumma et al., 2009). In conifers, SNPs from five cell wall genes have been shown to influence variation in growth, density, microfibril angle (MFA), early wood specific gravity and late wood proportion in Pinus taeda (Yu et al., 2006; Gonzalez-Martinez et al., 2007); two lignin biosynthetic genes, phenylalanine ammonia lyase and phenylcoumaran benzylic ether reductase, were associated with wood density in multiple populations of radiata pine (Pinus radiata) (Dillon et al., 2010); and in a study of over 500 genes in Picea glauca, 13 cell wall genes were associated with nine wood traits (Beaulieu et al., 2011). Recent advances have also been made in Populus, where SNPs from 11 cell wall genes were associated with lignocellulosic traits in Populus trichocarpa (Wegrzyn et al., 2010).

In the present study, we extend the investigation of wood developmental genetics into the genus Corymbia. The spotted gum, Corymbia citriodora subsp. variegata (CCV), is a large tree (growing up to 50 m) that is locally abundant throughout its range. It occurs naturally along the subtropical Australian coast east of the Great Dividing Range between Maryborough (south-east QLD: 25°32′S, 152°42′E) and Taree (northern NSW: 31°46′S, 152°26′E) in a replacement series with several closely related species (Corymbia maculata, Corymbia henryi and Corymbia citriodora subsp. citriodora) (Brooker & Kleinig, 1999; McDonald et al., 2000; Shepherd et al., 2008).

The spotted gums, including CCV, are emerging as important forestry species (Lee et al., 2010). In Queensland, CCV is a source of hardwood, as both natural stands and plantations, and has become the most commonly harvested native in the state (Bacles et al., 2009; Lee et al., 2010). The species has been widely planted as an ornamental in many regions of the world and commercial plantations have been established in South America, southern China, India, Sri Lanka, Congo, Kenya and most countries in southern Africa. Provenance trials indicate that CCV performs better compared with other eucalypts and corymbias for growth, wood properties, survival and tolerance to pests and diseases (Lee et al., 2010). There is considerable interest in developing CCV for plantation forestry in marginal zones because of its tolerance to cold and drought (Larmour et al., 2000; Lee et al., 2010).

Inheritance of wood, growth and disease traits in CCV has been examined to assess its potential for improvement via breeding programmes (Lee et al., 2009; Brawner et al., 2011). High wood density and cellulosic pulp yield indicate that CCV may be well suited for pulpwood production. Inheritance of pulping properties was recently examined in multiple CCV trials, indicating ample phenotypic variation with high heritability, implying a sizeable genetic component (J. T. Brawner et al., unpublished). Growth has been extensively studied in this species, and CCV trees exhibit variation within and between provenances, although heritability for growth is typically lower (0.20) relative to density or pulp yield (0.50 and 0.30) (Lee et al., 2009). Lastly, CCV exhibits heritable variation for resistance to Quambalaria piterika, a fungus causing Quambalaria shoot blight (QSB) that is endemic to coastal forests of eastern Australia (Brawner et al., 2011; Pegg et al., 2011a). QSB infects the leaves, stems and woody tissue of seedlings and young trees. The disease severely affects the growth and form of infected plants and reduces wood quality (Pegg et al., 2011b), and variation in the severity of the disease may be related to variation in wood properties in addition to gene-for-gene mediated resistance.

High heritabilities and ample phenotypic variation suggest that there is potential for improvement of CCV wood phenotypes through breeding. In addition to quantitative selection, there is an opportunity to explore genetic variation underlying traits at the gene level via association mapping, providing insights into the genetic mechanisms underlying wood development and identifying robust markers for phenotypic selection in tree breeding programmes. Using such an approach, we explore SNP variation underlying wood properties, growth and disease resistance for the first time in natural populations of CCV. We examine correlations between SNP variation within candidate genes and phenotypes in first-generation CCV provenance progeny trials established at three locations in southeast Queensland. Candidate genes involved in cambial division, differentiation, photosynthesis, expansion and secondary cell wall biosynthesis were identified from a previous gene expression study in developing xylem in E. nitens (Qiu et al., 2008). The application of three populations allowed verification of detected associations in trees growing under different conditions, and strengthens the argument for gene–phenotype association in several compelling cases.

Materials and Methods

Genetic material

A ‘discovery’ population was sampled from a Corymbia citriodora subsp. variegata (CCV) provenance progeny trial (Bakers Logging Area) (Table 1). Up to five genotypes were sampled from c. 250 families derived from four provenances that occur near the centre of the species range (Brooyar (40), Home (70), Wolvi (104) and Woondum (619)) (Fig. 1).

Table 1. Description of progeny provenance trials sampled for Corymbia citriodora subsp. variegata

| Location | Established | Material (a) | Design (b) | Latitude | Longitude | Trees | Rain (ml) (d) | Rain (ml) (e) | Alt (m) | Slope | Soil | Aspect |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Tiaro Camp | March 1999 | 3 (38) | 8, 5, 2 | 25.78° S | 152.63° E | 164 | 93.28 (c) | 113.25 (c) | 65 | | RP | North |
| Bakers | July 2000 | 4 (203) | 7, 21, 4 | 25.76° S | 152.68° E | 833 | 93.28 (c) | 75.07 (c) | 40 | | RP, YP | North |
| St Mary | May 2002 | 4 (44) | 5, 12, 4 | 25.67° S | 152.52° E | 131 | 89.14 (d) | 69.6 (d) | 60 | | RP | West |

RP, Red Podzolic; YP, Yellow Podzolic.
(a) Number of provenances (open pollinated families).
(b) Number of replications, incomplete blocks per replication and trees per contiguous family plot.
(c) Monthly average rainfall – Bauple, QLD, Bureau of Meteorology.
(d) Monthly average rainfall – Mungar J, QLD, Bureau of Meteorology.
(e) Monthly average rainfall for the year in which each trial was established.

Figure 1. Location of the native provenances and three provenance progeny trials of Corymbia citriodora subsp. variegata (CCV) that were employed in this study. The natural distribution of CCV is shown above (blue shading) (Brooker & Kleinig, 1999).

Leaf material was initially sampled from 202 CCV individuals (from 200 different families) (Brooyar (12), Home (16), Wolvi (25) and Woondum (149)). DNA was extracted from silica gel-dried leaves using a Qiagen 96 plant kit following the manufacturer’s instructions. An aliquot of the purified DNA for each individual was bulked in equimolar proportions and used as a template for PCR amplification of candidate gene sequences.

The remaining 631 CCV trees in the discovery population were sampled 15 months later. The final population consisted of 833 individuals from 203 families and four provenances (Brooyar (40), Home (70), Wolvi (104) and Woondum (619)). Bark shavings were collected from each individual by scraping a thin layer of tissue from the outer chlorophyllous layer of photosynthetic tissue in the bark, into labelled 5 cm × 10 cm paper envelopes, and dried on silica gel. Genomic DNA was extracted using a Qiagen 96 plant kit, following a modified protocol, from 10 to 20 mg of dried bark shavings.

Two ‘validation’ populations located at different sites were also sampled (Tiaro Camp and St Mary; Table 1). Genotypes were sampled across 38 families represented by three provenances at Tiaro Camp (Curra (29), Wolvi (11) and Woondum (124)), and 44 families represented by four provenances at St Mary (Brooyar (8), Home (26), Wolvi (37) and Woondum (60)). Genetic material was extracted from bark scrapes following the method outlined in the previous paragraph. The purpose of including these two populations was to allow independent validation of associations discovered in the discovery population.

Candidate gene sequencing, SNP discovery and genotyping

Twenty-nine candidate genes were selected for sequencing to identify SNPs segregating in the discovery population. These candidates were identified from an expression study in wood-forming tissues of E. nitens (Qiu et al., 2008). Candidate genes were amplified from the CCV DNA bulk via touch-down polymerase chain reaction (PCR) with Phire® Hot Start DNA Polymerase (Finnzymes, Vantaa, Finland) using primers designed from the 5′ and 3′ ends of E. nitens cDNAs generated by Qiu et al. (2008) (Table 2). Amplified fragments were visually checked on 1% agarose gels, and quantified using a Qubit® fluorometer (Invitrogen) before pooling in equimolar quantities for high-throughput sequencing from a single library using the 454 platform (Roche). Assembly of amplified products was performed de novo using the CLC Genomics Workbench (CLC Bio, Aarhus, Denmark). Contigs were subsequently aligned with the corresponding E. nitens cDNA reference sequence. Gene orthology was identified by reciprocal best hit of the corresponding Eucalyptus grandis sequence (Phytozome: Eucalyptus grandis Genome Project 2010) against the TAIR (The Arabidopsis Information Resource) database. Contigs were annotated based on previously annotated orthologues in public databases, GenBank and TAIR. Most contigs were partial sequences, representing between 70 and 100% of the full-length gene in each case.

Table 2. List of Corymbia citriodora subsp. variegata genes targeted for single nucleotide polymorphism (SNP) discovery and association testing

| Gene abbreviation | Gene name | SNPs (a) | SNPs (b) | SNPs (c) | References |
| --- | --- | --- | --- | --- | --- |
| EnACT7 | Actin | 12 | 12 | 12 | An et al. (1996) |
| EnADH1 | Alcohol dehydrogenase | 8 | 7 | 5 | Hoeren et al. (1998) |
| EnBTF3 | Basic transcription factor | 8 | 7 | 3 | Freire (2005) |
| EnCel2 | Endo-1,4-beta glucanase | 4 | 4 | 3 | Yung et al. (1999) |
| EnCesA3 | Secondary cell wall cellulose synthase | 3 | 3 | 0 | Daras et al. (2009) |
| EnCIP7B | COP interacting protein | 11 | 10 | 9 | Persson et al. (2005) |
| EnCSLA9 | Mannan synthase | 2 | 2 | 0 | Davis et al. (2010) |
| EnCSLA9 | Mannan synthase | 3 | 2 | 1 | Davis et al. (2010) |
| EnDehydRP | Dehydrin | 8 | 8 | 7 | Rorat (2006) |
| EnEXT-1 | Extensin | 6 | 4 | 2 | Merkouropoulos & Shirsat (2003) |
| EnFAH1 | Ferulic acid 5-hydroxylase 1 | 5 | 1 | 0 | Aguade (2001) |
| EnGH17 | Glycoside hydrolase family | 3 | 1 | 0 | Wu et al. (2008) |
| EnGH5 | Glycoside hydrolase family | 6 | 5 | 0 | Wu et al. (2008) |
| EnLIM1 | Lim transcription factor | 2 | 0 | 0 | Papuga et al. (2010) |
| EnMT1 | Metallothionein | 8 | 3 | 0 | Zimeri et al. (2005) |
| EnMYB83 | myb transcription factor | 8 | 8 | 7 | McCarthy et al. (2009) |
| EnNAC1 | NAC domain containing protein | 5 | 1 | 0 | Riechmann et al. (2000) |
| EnNAC8 | NAC domain containing protein | 3 | 3 | 0 | Riechmann et al. (2000) |
| EnRAB6 | RAB GTP-binding protein | 16 | 16 | 9 | Bednarek et al. (1994) |
| EnUDG1 | UDP-D-glucose dehydrogenase | 5 | 5 | 5 | Klinghammer & Tenhaken (2007) |
| EnACL5 | S-adenosyl-l-methionine dep methyltransferase | 6 | 6 | 2 | Panicot et al. (2002) |
| EnESK1 | ESKIMO1 | 4 | 1 | 0 | Lefebvre et al. (2011) |
| EnZnf1 | Zinc finger protein | 11 | 11 | 9 | Tague & Goodman (1995) |
| EnLOV1 | NBS-LRR protein | 3 | 0 | 0 | Lorang et al. (2007) |
| EnHB8 | Homeobox gene 8 | np | 0 | 0 | Pullen et al. (2010) |
| EnHB1 | Homeobox gene 1 | np | 0 | 0 | Ruberti et al. (1991) |
| EnCesA1 | Secondary cell wall cellulose synthase | np | 0 | 0 | Chen et al. (2010) |
| EnFLA71B | Fasciclin-like arabinogalactan protein 7 | np | 0 | 0 | MacMillan et al. (2010) |
| ss121 | Fasciclin-like arabinogalactan protein 17 | np | 0 | 0 | MacMillan et al. (2010) |
| Total | | 150 | 120 | 74 | |

COP1, Constitutive Photomorphogenic 1; NBS-LRR, Nucleotide Binding Site-Leucine Rich Repeat; np, nonpolymorphic.
(a) SNPs selected for genotyping.
(b) SNPs successfully genotyped.
(c) SNPs yielding genotype data suitable for statistical analysis (30 SNPs failed, 36 were monomorphic and 10 had excessive missing data).

Candidate gene contigs were scanned for biallelic SNPs with minor allele frequencies ≥ 5%. A preliminary set of 200 SNPs was selected using criteria that included a minimum distance between SNPs of 200 bp to reduce redundancy (Eucalyptus typically exhibits low LD that decays within 500 bp (Thumma et al., 2005; Külheim et al., 2009)), a minimum coverage of 100 reads, and representation of both silent and nonsynonymous sites. A Sequenom® genotyping assay was performed using MassArray software (Sequenom Inc., San Diego, CA, USA), giving a final set of 150 SNPs that were subsequently typed across all DNA samples. LD between unphased SNP genotype data in the discovery population was assessed using HelixTree software 6.3.6 (Golden Helix Inc., Bozeman, MT, USA).
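The selection criteria above can be illustrated with a minimal sketch. The greedy left-to-right pass, the `CandidateSNP` record and the demonstration values are all assumptions made for illustration; the text does not state how the criteria were applied in practice, and the balancing of silent and nonsynonymous sites is not modelled here.

```python
from dataclasses import dataclass


@dataclass
class CandidateSNP:
    contig: str      # candidate gene contig carrying the SNP
    position: int    # position on the contig (bp)
    maf: float       # minor allele frequency estimated from the pooled reads
    coverage: int    # number of reads covering the site


def select_snps(snps, min_maf=0.05, min_coverage=100, min_spacing=200):
    """Keep SNPs passing the MAF and coverage thresholds, spaced at least
    min_spacing bp from the previously kept SNP on the same contig."""
    selected = []
    last_kept = {}  # contig -> position of the most recently kept SNP
    for snp in sorted(snps, key=lambda s: (s.contig, s.position)):
        if snp.maf < min_maf or snp.coverage < min_coverage:
            continue
        previous = last_kept.get(snp.contig)
        if previous is not None and snp.position - previous < min_spacing:
            continue
        selected.append(snp)
        last_kept[snp.contig] = snp.position
    return selected


# Invented example sites (not data from the study).
demo = [
    CandidateSNP("EnACT7", 120, 0.06, 450),
    CandidateSNP("EnACT7", 250, 0.30, 500),   # only 130 bp from the kept SNP
    CandidateSNP("EnACT7", 400, 0.12, 80),    # coverage below 100 reads
    CandidateSNP("EnACT7", 520, 0.02, 900),   # MAF below 5%
    CandidateSNP("EnCIP7B", 90, 0.40, 300),
]
kept = select_snps(demo)
print([(s.contig, s.position) for s in kept])
```

A greedy pass favours the first qualifying SNP on each contig; a different tie-breaking rule (for example, preferring nonsynonymous sites) would give a different but equally valid set.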

Phenotypic variation

Twelve phenotypes were measured on both the discovery and validation populations. These included six key wood, growth and disease traits, namely, Quambalaria response (QRES), wood density (DEN; g cm−3), modulus of elasticity (MOE; N m−2), cellulose microfibril angle (MFA; degrees), Kraft pulp yield (KPY; %, correlated with wood cellulose content) and diameter at breast height (DBH; cm) at the time of the study. Six additional growth-related traits were measured, namely, height (m) at 1, 3 and 4–6 years (Height1, Height3 and Height4–6, respectively), and DBH (cm) at 3, 4–6 and 7–8 years (DBH3, DBH4–6 and DBH7–8, respectively).

Wood swarf samples were collected from the outer 5 cm of the under-bark wood, using a 16-mm spade drill bit drilled into the trees at c. 1.3 m. Previous optimization of sampling protocols (data not presented) had determined that consistently sampling outer wood from one side of the stem provided repeatable estimates of wood quality traits. A laboratory-based Near Infrared (NIR) spectrometer (Bruker MPA; Bruker Optik, Ettlingen, Germany) was used to acquire full NIR spectra in the range 4000–10 000 cm−1 (1000–2500 nm) at 8 cm−1 spectral resolution on each of the swarf samples. NIR spectral analysis was performed using a dedicated multivariate data analysis package, The Unscrambler v9.8 (Camo A/S, Oslo, Norway). Partial least squares (PLS) calibration models employed transformation using a Savitzky–Golay second derivative (Savitzky & Golay, 1964). The KPY and MFA models were previously described for multiple hardwood species (Downes et al., 2010; Meder et al., 2010). The calibration models used for density and MOE are unpublished but follow the methodology of Thumm & Meder (2001) and the summary statistics for the calibrations are given in Supporting Information Table S1. The calibrations were developed using 72 CCV samples taken from the Bakers Logging Area trial at 7.5 years of age.

QRES was measured using a qualitative scoring system where individuals were assessed for percentage of tips (young leaves and shoots) damaged by Quambalaria piterika on a one- to six-point scale. The six-point scale used for the QRES score in all three trials classified the percentage of tips damaged by Q. piterika as: 1 = 76–100%; 2 = 51–75%; 3 = 26–50%; 4 = 12–25%; 5 = 1–11% and 6 = 0%.
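The six-point scale can be written as a simple lookup, sketched below. The function name is hypothetical, and the handling of non-integer percentages that fall between the published class limits (for example 11.5%) is an assumption: such values are assigned to the more heavily damaged class.

```python
def qres_score(percent_tips_damaged):
    """Map the percentage of damaged tips (0-100) to the six-point QRES score,
    where 6 means no damage and 1 means 76-100% of tips damaged."""
    if not 0 <= percent_tips_damaged <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_tips_damaged == 0:
        return 6
    # Upper limit of each damage class paired with its score.
    for upper_limit, score in ((11, 5), (25, 4), (50, 3), (75, 2), (100, 1)):
        if percent_tips_damaged <= upper_limit:
            return score


print([qres_score(p) for p in (0, 5, 12, 40, 60, 90)])
```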

Narrow-sense heritability estimates for each trait in each population were calculated in ASReml (VSN International, Hemel Hempstead, UK) as the ratio of additive genetic variance to the within-provenance phenotypic variance:

$$h^2_{io} = \frac{\sigma^2_{a_{io}}}{\sigma^2_{p_{io}}}$$

where the additive genetic variance $\sigma^2_{a}$ is estimated from the between-family variance $\sigma^2_{f}$, $\sigma^2_{e}$ is the error variance, and $\sigma^2_{p}$ is the phenotypic variance, with each variance specific to trial i or trait o.

Principal components describing the uncorrelated variation represented among six traits (QRES, density, MOE, MFA, KPY and DBH) were defined using StatistiXL 1.8 software (StatistiXL, Broadway, WA, Australia), applying an eigenvalue cut-off of 1.
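As a small illustration of the cut-off rule, the sketch below retains components whose eigenvalue exceeds 1 and reports the share of variance they explain. The eigenvalues are invented for illustration (for six standardised traits they sum to 6) and are not values from this study; whether a component with an eigenvalue of exactly 1 is retained is also an assumption.

```python
def retain_components(eigenvalues, cutoff=1.0):
    """Return the eigenvalues above the cut-off (largest first) and the
    proportion of total variance they account for."""
    ranked = sorted(eigenvalues, reverse=True)
    kept = [value for value in ranked if value > cutoff]
    explained = sum(kept) / sum(ranked)
    return kept, explained


# Hypothetical eigenvalues of a six-trait correlation matrix.
example_eigenvalues = [2.1, 1.4, 0.9, 0.8, 0.5, 0.3]
kept_components, explained = retain_components(example_eigenvalues)
print(len(kept_components), round(explained, 3))
```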

Genetic structure

Structure among the Corymbia citriodora subspecies was previously reported to be low based on putatively neutral markers (McDonald et al., 2000; Shepherd et al., 2008). Cluster analysis using Structure software V2.1 (Pritchard et al., 2000) was employed to test population genetic structure among the five CCV provenances across the three trials using all available SNP genotype data that passed quality and frequency thresholds. Proportional membership in hypothetical ancestral clusters (K) ranging from 1 to 10 was estimated for individuals and provenances across three replicate Markov Chain Monte Carlo (MCMC) runs. Proportional memberships were maximized over 100 000 MCMC steps following an optimized burn-in of 10 000 steps. Population admixture was assumed. Allele frequencies were treated as independent between populations, even though divergence was expected to be low, to avoid overestimation of K. Population and individual proportional memberships (Q-matrices) were computed from three replicates using Clumpp (Jakobsson & Rosenberg, 2007), and plotted using Distruct (Rosenberg, 2004). The log probability of the data, ln Pr(K), was also averaged over three replicates, and plotted as a function of K to provide an ad hoc indicator of the number of populations according to Pritchard et al. (2000). The modal value for ΔK, as described by Evanno et al. (2005), was also calculated over the three replicates and plotted as a function of K. This served as an alternative ad hoc indicator of the true population number.
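The ΔK statistic can be sketched as follows. This version takes the second-order difference of the replicate means of the log probability and divides it by the standard deviation across replicates at each K, which is one common way of implementing the Evanno et al. (2005) statistic and is an assumption here, not a description of the software used. The log probabilities in the example are invented.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """lnp_by_k maps K to the list of log probabilities from replicate runs.
    Returns a dict mapping K to delta K for every K with both neighbours."""
    ks = sorted(lnp_by_k)
    average = {k: mean(lnp_by_k[k]) for k in ks}
    result = {}
    for k in ks:
        if k - 1 not in average or k + 1 not in average:
            continue  # delta K is undefined at the ends of the K range
        spread = stdev(lnp_by_k[k])
        if spread == 0:
            continue  # identical replicates would give a division by zero
        second_difference = abs(average[k + 1] - 2 * average[k] + average[k - 1])
        result[k] = second_difference / spread
    return result


# Invented log probabilities for three replicate runs at K = 1..4.
example_runs = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -894, -886],
    4: [-885, -880, -890],
}
dk = delta_k(example_runs)
print(dk)
```

In this invented example ΔK peaks at K = 2; a flat ΔK profile with no clear mode is what would be expected when the sampled provenances behave as a single population.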

Because both the discovery and validation populations contain multiple individuals sampled from the same family across replicates, pedigree structure, or kinship, was incorporated into the association analyses. We initially performed analysis of pairwise kinship coefficients among the same 833 trees and 74 SNPs in Spagedi (Hardy & Vekemans, 2002), using the Loiselle estimator (Loiselle et al., 1995), and imported this matrix into the mixed model. However, given the limitations of using a small number of markers to estimate kinship, we also generated a kinship matrix from known relationships among the half-sib families, where the coancestry or coefficient of kinship between sibs within the numerator relationship matrix is equal to 0.25. This relationship matrix was similarly used for association analysis with the mixed model.
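A pedigree-based matrix of this kind can be sketched directly from family membership. The value of 0.25 between sibs is taken from the text; the diagonal of 1 (non-inbred individuals) and the value of 0 between members of different families are assumptions, and the function name is hypothetical.

```python
def half_sib_relationship_matrix(family_ids, within_family=0.25):
    """Build a square relationship matrix from a list of family labels,
    one label per tree, in the order the trees appear in the data set."""
    n = len(family_ids)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = 1.0            # assumed: individuals are not inbred
            elif family_ids[i] == family_ids[j]:
                matrix[i][j] = within_family  # sibs from the same family
    return matrix


# Three trees: two from family F1 and one from family F2 (invented labels).
relationship = half_sib_relationship_matrix(["F1", "F1", "F2"])
for row in relationship:
    print(row)
```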

Association tests

Associations between 12 traits and 74 SNP markers in the discovery population (833 individuals) were tested via a least squares fixed effects general linear model (GLM), and a mixed linear model (MLM), both implemented in Tassel (Bradbury et al., 2007). The statistical model for the GLM is described by y = Xβ + e, where y is a vector for the observed dependent variable (trait); β is a vector containing independent fixed effects, including genetic marker and population structure matrices; X is the known design matrix; and e is the unobserved vector for the random residual (error) (Henderson, 1975). Significant population structure was not detected in Structure, and therefore no population matrix was applied. Analysis under the GLM was performed both with and without permutation of the data set to adjust for multiple testing error (1000 permutations). Under the MLM, association tests incorporated a kinship matrix, where y = Xβ + Zu + e, with X and Z being known design matrices and u an unknown vector of random additive genetic effects from multiple background QTLs. Replicate effect was included in both models as a factor (coded by replicate number, i.e. 6) in an attempt to account for phenotypic covariation with site conditions. Nine significant SNP–trait associations identified in the discovery population were independently assessed using the same methods in the validation populations at Tiaro Camp and St Mary. The raw genotype and trait files are presented in Notes S1.
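The logic of a fixed-effects marker test with a permutation-based P-value can be sketched for a single SNP–trait pair as below. This is a simplified stand-in and not the Tassel implementation: it fits genotype class only (no replicate effect), and it permutes one test at a time instead of adjusting across all markers. The function names and the example data are invented.

```python
import random


def f_statistic(trait, genotype):
    """One-way ANOVA F statistic for trait values grouped by genotype class."""
    groups = {}
    for value, label in zip(trait, genotype):
        groups.setdefault(label, []).append(value)
    n, k = len(trait), len(groups)
    grand_mean = sum(trait) / n
    ss_between = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                     for v in groups.values())
    ss_within = sum((y - sum(v) / len(v)) ** 2
                    for v in groups.values() for y in v)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p(trait, genotype, n_perm=1000, seed=0):
    """Share of trait permutations giving an F statistic at least as large
    as the observed one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = f_statistic(trait, genotype)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if f_statistic(shuffled, genotype) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Invented data: 12 trees, three genotype classes, trait rising with allele dose.
genotypes = ["AA"] * 4 + ["AG"] * 4 + ["GG"] * 4
trait_values = [50, 51, 52, 51, 52, 53, 54, 53, 54, 55, 56, 55]
observed_f = f_statistic(trait_values, genotypes)
p_perm = permutation_p(trait_values, genotypes)
print(observed_f, p_perm)
```

Shuffling the trait while holding genotypes fixed breaks any genotype–phenotype link but preserves both marginal distributions, which is why the permuted F statistics approximate the null distribution without assuming normality.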


Results

Phenotypic variation

Ample quantitative variation was observed for all traits measured over the 1128 individuals in the discovery and validation populations. Analysis of heritability for density, growth and disease resistance in this trial previously indicated that the observed variation in these traits has a significant genetic component (Brawner et al., 2011). Phenotypic values for density, MOE, MFA, KPY, DBH and height were distributed normally (Fig. S1). The distribution of disease scores for QRES was skewed towards the lower and upper ends of the disease damage scale, suggesting a single dominant gene model for disease resistance, as nearly 75% of individuals scored between 5 and 6. The coefficient of variation for all 12 traits was moderate, ranging between 0.04 and 0.22. Trait variances within populations are presented in Table S2.

Variation in phenotypic values among the five CCV provenances for QRES, density, MOE, MFA, KPY and DBH was significant for at least some combinations of populations in each case by one-way ANOVA (Fig. S2). Three principal components were identified explaining 80% of the combined variance in QRES, density, MOE, MFA, KPY and DBH. The distribution of case-wise principal components analysis (PCA) scores for individuals is presented graphically in Fig. S3. Component loadings indicate that the main traits contributing to each component were: PCA1 (MOE, MFA and KPY); PCA2 (QRES and DBH); and PCA3 (density).

Candidate gene sequencing, SNP discovery and genotyping

Of 150 SNPs typed across all individuals, 74 yielded genotype data suitable for statistical analysis (30 SNPs failed, 36 were monomorphic and 10 had excessive missing data). Repeat assays independently tested for 87 genotypes indicated a success rate of between 95 and 100% (with 81 genotypes having correspondence rates > 99%). The overall level of LD between SNPs in the data set was low, with < 1% of pairwise correlations between sites (r²) exceeding 0.2. Hardy–Weinberg equilibrium (HWE) was tested for all 74 loci and revealed that 27 markers departed the equilibrium expectation (P < 0.05) in the discovery population. It is possible that these cases represent a degree of genotyping error, though this is countered by the above-mentioned high level of success for repeat assays. Alternatively, a low level of cryptic population structure could explain the departure, as in most cases this was attributable to a deficiency of heterozygotes. Although there was no evidence for genetic structure detected in this sample (see section ‘Genetic structure’ below), the bar plot of individual cluster assignments was suggestive of two genetic groupings (data not shown). No departure from HWE was observed (P < 0.05) in the three cases where a significant association was detected (Table 3). Allele frequencies for 74 loci were also compared among the three populations via a χ² test, and no significant differences were detected when a threshold type 1 error rate of α = 0.01 was applied.
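A Hardy–Weinberg test of the kind described can be sketched as a χ² goodness-of-fit test on genotype counts. The sketch assumes a biallelic, polymorphic locus (a monomorphic locus would give expected counts of zero) and uses the standard 1-d.f. critical value of 3.841 for P = 0.05; the function name and the example counts are invented.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return the chi-square statistic for departure from Hardy-Weinberg
    proportions and the expected heterozygosity (He = 2pq)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, 2 * p * q


# Invented counts showing a deficiency of heterozygotes (40 observed, 50 expected).
chi2_value, expected_het = hwe_chi_square(30, 40, 30)
departs_at_005 = chi2_value > 3.841   # 1 d.f. critical value for P = 0.05
print(chi2_value, expected_het, departs_at_005)
```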

Table 3. Significant associations called in the Bakers Logging Area Corymbia citriodora subsp. variegata discovery population

| SNP | Gene | Trait | Sub | P-value | P-value (a) | FST | He | HWE | R² | % change | MAF | Type |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SNP7 (b) | Actin (AtActin7) | KPY | 4 | 0.001 | 0.002 | 0.016 | 0.136 | ns | 0.016 | 8% | 0.061 | Silent (syn) |
| SNP38 (b) | COP1 interacting protein (AtCIP7) | DBH3 | 4 | 0.002 | 0.009 | 0.040 | 0.465 | ns | 0.014 | 3.4% | 0.388 | Silent (syn) |
| SNP39 (b) | COP1 interacting protein (AtCIP7) | DBH3 | 4 | 0.002 | 0.007 | 0.037 | 0.462 | ns | 0.014 | 3.4% | 0.376 | Silent (syn) |
| SNP105 | SAM dep methyltransferase (AtACL5) | DBH | 4 | 0.003 | 0.005 | 0.002 | 0.161 | ns | 0.013 | | 0.086 | Silent (nc) |
| SNP105 | SAM dep methyltransferase (AtACL5) | DBH4-6 | 3 | 0.005 | 0.028 | 0.002 | 0.161 | ns | 0.012 | | 0.086 | Silent (nc) |
| SNP14 | Alcohol dehydrogenase (AtADH1) | MOE | 4 | 0.005 | 0.021 | 0.007 | 0.482 | ns | 0.013 | | 0.432 | Silent (syn) |
| SNP3 | Actin (AtActin7) | DEN | 1 | 0.005 | 0.025 | 0.031 | 0.440 | 4.99 | 0.013 | | 0.426 | Silent (nc) |
| SNP30 | Endo-1,4-beta glucanase (AtCEL2) | QRES | 4 | 0.007 | 0.069 | 0.012 | 0.121 | 6.47 | 0.012 | | 0.087 | Silent (nc) |
| SNP38 | COP1 interacting protein (AtCIP7) | PCA2 (DBH, QRES) | | 0.011 | 0.041 | 0.040 | 0.465 | ns | 0.011 | | 0.388 | Silent (syn) |

HWE, χ² statistic for significant (P < 0.05 and > 0.01) departures from Hardy–Weinberg proportions; ns, not significant; % change, percentage change in phenotypic value (compared with the mean of the other genotype classes) resulting from the favourable allele; MAF, minor allele frequency determined from genotype data; sub, number of times associated out of 20 when tested in a random subsample of 150 individuals; nc, noncoding site; syn, synonymous site; PCA, principal components analysis; QRES, Quambalaria response; DEN, wood density; MOE, modulus of elasticity; KPY, Kraft pulp yield; COP1, Constitutive Photomorphogenic 1; DBH, diameter at breast height; SAM, S-adenosyl-l-methionine; dep, dependent.
(a) After permutation.
(b) Significant in validation population (P < 0.05).

Genetic structure

Genetic structure among the five provenances of CCV in this study, inferred from the parameters and proportional memberships generated in Structure, was not significant. Despite ample phenotypic structure among provenances, the absence of any signal in the ΔK plot, and low FST for individual markers (FST between 0 and 0.065; median of 0.01) support the presence of a panmictic population among the provenances sampled based on SNP data. In addition, mean FST values compared for associated and nonassociated SNPs for the whole data set, as well as for a random sample with similar heterozygosity, were not significantly different (ANOVA; P = 0.137–0.583). This indicates that subtle structure at individual loci was unlikely to account for associations identified in this study.

This supports findings based on other marker systems (McDonald et al., 2000; Shepherd et al., 2008), and long-distance gene flow associated with parrot and flying fox pollinations reported for this species (Bacles et al., 2009). Consequently, there was no need to consider sample-wide structure to avoid detection of spurious associations. The absence of genetic structure is an appealing feature for any species being used in an association study, as genetic stratification can otherwise lead to substantial increases in the false positive rate, and cannot always be accounted for (Zhao et al., 2007).

Association tests

Correlations of uncorrected P-values generated for individual SNP–trait association tests in the discovery population using three different statistical approaches were high (R² = 0.82–0.93). The results indicate that association tests which accounted for family-based structure (mixed model) delivered similar outcomes to the standard F-test (GLM) (R² = 0.92). Results from the mixed model incorporating the pedigree matrix were also similar to the GLM (R² = 0.85). The general linear model, including replicate effect, was applied as the final test for association in all populations because this model permitted adjustment of P-values following permutation to account for multiple testing error.

In total, 64 associations were found to be significant under this model before permutation, and subsequent adjustment of P-values reduced the list of candidate SNP–trait associations to nine (Table 3), suggesting a high false positive rate. There was a strong degree of agreement between these results and both of the mixed models, with only one association (SNP14–MOE) not appearing when the pedigree file was applied, possibly indicating a spurious association attributable to family structure in this case. The cumulative P-value distribution between 0 and 1 plotted as a density histogram revealed a skew in P-values near zero compared to the null distribution (not shown), although q-values estimated for these associations were not significant. Individual SNP markers accounted for a small proportion, between 1.3 and 1.6%, of phenotypic variance for seven traits. Negative correlations between SNPs yielding significant associations and growth were not observed. When association tests were performed using case-wise scores for principal components derived from six traits (QRES, density, MOE, MFA, KPY and DBH), the results revealed only one SNP association (Table 3).

Direct correspondence of significant associations in the discovery population and validation populations was observed in three cases at Tiaro Camp (Table 4) (validation population 1); however, no SNP associations were validated in the St Mary population (validation population 2). Rankings of P-values for associations tested in Tiaro Camp were similar to those in Bakers (R2 = 0.93), but ranks for these populations correlated poorly with St Mary (R2 = 0.3–0.5). Trait variances for the St Mary population were substantially lower than in Tiaro Camp and Bakers for Height1, Quambalaria and DBH3 (30–110% lower). This may have reduced the likelihood of detecting associations with DBH3 and SNPs 38 and 39 in St Mary. Trait heritabilities (h2) for both validation populations were on average lower than for the discovery population. Heritabilities at St Mary (on average 37% of discovery h2) were lower than those detected at Tiaro Camp (on average 74% of discovery h2), potentially reducing the ability to detect associations of small effect at St Mary. The reason for reduced h2 at St Mary is not clear, but could be attributable to a number of factors including smaller sample size (20% smaller than Tiaro Camp). Environment may also have been a factor; the St Mary site receives lower long-term rainfall compared with the other sites and has a west-facing aspect (Table 1).

Table 4.   Associations repeated in the Tiaro Camp and St Mary Corymbia citriodora subsp. variegata populations
| SNP | Gene | Trait | P-value (a) | P-value (b) | FST (a) | He (a) | HWE (a) | R2 (a) | % change (a) | MAF GT (a) | Type |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SNP7 | Actin (AtACT7) | KPY | 0.025 | 0.588 | 0.022 | 0.149 | 5.79 | 0.048 | 4 | 0.056 | Silent (syn) |
| SNP38 | COP1 interacting protein (AtCIP7) | DBH3 | 0.017 | 0.151 | 0.003 | 0.474 | ns | 0.052 | 5 | 0.398 | Silent (syn) |
| SNP39 | COP1 interacting protein (AtCIP7) | DBH3 | 0.046 | 0.516 | 0.004 | 0.468 | ns | 0.038 | 2 | 0.387 | Silent (syn) |
| SNP105 | SAM dep methyltransferase (AtACL5) | DBH | 0.113 | 0.599 | 0.017 | 0.174 | | | | | |
| SNP105 | SAM dep methyltransferase (AtACL5) | DBH4-6 | 0.109 | 0.613 | 0.017 | 0.174 | | | | | |
| SNP14 | Alcohol dehydrogenase (AtADH1) | MOE | 0.201 | 0.809 | 0.012 | 0.493 | | | | | |
| SNP3 | Actin (AtActin7) | DEN | 0.803 | 0.333 | 0.002 | 0.474 | | | | | |
| SNP30 | Endo-1,4-beta glucanase (AtCEL2) | QRES | 0.341 | c | 0.061 | 0.214 | | | | | |

  1. HWE, χ2 statistic for significant (P < 0.05 and > 0.01) departures from Hardy–Weinberg proportions; ns, not significant; % change, percentage change in phenotypic value (compared with the mean of the other genotype classes) resulting from the favourable allele; GT, genotype minor allele frequency; syn, synonymous site; QRES, Quambalaria response; DEN, wood density; MOE, modulus of elasticity; KPY, Kraft pulp yield; COP1, Constitutive Photomorphogenic 1; S-SAM, S-adenosyl-l-methionine; dep, dependent; DBH, diameter at breast height.

  2. (a) Tiaro Camp.

  3. (b) St Mary.

  4. (c) Failed genotyping.
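The He and HWE quantities reported in Table 4 can be illustrated from genotype counts. The sketch below assumes the standard definitions (He = 2pq, and a Pearson χ2 comparing observed genotype counts with Hardy–Weinberg expectations, 1 degree of freedom); the function name and the counts are hypothetical and are not data from the study.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi-square, expected heterozygosity) for one biallelic SNP."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2, 2 * p * q


# Hypothetical genotype counts for 100 trees
chi2, he = hwe_chi_square(50, 40, 10)
```

With 1 degree of freedom, the critical values are 3.84 (P = 0.05) and 6.63 (P = 0.01), so a statistic between these two values falls in the range flagged as a significant departure in the table footnote.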

The validated SNPs originate from two genes, actin (SNP7) and COP1 (Constitutive Photomorphogenic 1) interacting protein 7 (CIP7) (SNPs 38 and 39) (Tables 3, 4; Fig. 2). These associations were highly significant following permutation testing in the discovery population. In Tiaro Camp, associations were significant before high-stringency permutation testing only. The percentage of phenotypic variation explained by each SNP was larger in the validation population, ranging from 3.7 to 6.7%, probably as a result of increased variance in phenotypic values caused by the smaller population size. The effect of CIP7 SNPs 38 and 39 on DBH appears to be age dependent. Although stem growth tended to be higher for the heterozygote classes in older trees (i.e. ages 4–6 and 7–8 years) (data not shown), the trend was only significant in trees aged 3 years.

Figure 2.

Box and whisker plots illustrating trait medians and distributions for each allelic class for three single nucleotide polymorphism (SNP) markers from two genes significantly associated in both the Corymbia citriodora subsp. variegata (CCV) discovery population and the Tiaro Camp validation population. Lower and upper edges of each box represent the first and third quartiles of the distribution, respectively, and the central line indicates the median. Error bars indicate maximum and minimum values. CIP7, COP1 (Constitutive Photomorphogenic 1) interacting protein 7; KPY, Kraft pulp yield.

The allelic effect of SNP7 on KPY was reversed at Tiaro Camp compared with Bakers Logging Area. When a detected association is real, modelling experiments have shown that reversals in allelic effect can reflect complex associations that are influenced by interaction between the associated SNP and other factors (Lin et al., 2007; Greene et al., 2009; Shibata et al., 2009). In the presence of such interactions, the variance of a trait is expected to differ between genotype classes at a single locus (Shibata et al., 2009). To test for evidence of interactions, the variance among trait values for each genotype was compared according to Shibata et al. (2009) for all seven SNPs found to be associated in the discovery population. The analysis revealed significant differences in genotype–trait variances for three SNPs, including SNP7 (Table 5).
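The idea of contrasting trait variance among genotype classes can be sketched as below. This is not the procedure of Shibata et al. (2009), whose details are not given here; as a stand-in, it uses a Brown–Forsythe statistic (an ANOVA F on absolute deviations from class medians) with a permutation P-value. Function names and the simulated data are hypothetical.

```python
import random
from statistics import median


def brown_forsythe(groups):
    """ANOVA F statistic computed on absolute deviations from group medians."""
    devs = [[abs(y - median(g)) for y in g] for g in groups]
    n = sum(len(d) for d in devs)
    k = len(devs)
    grand = sum(sum(d) for d in devs) / n
    means = [sum(d) / len(d) for d in devs]
    between = sum(len(d) * (m - grand) ** 2 for d, m in zip(devs, means))
    within = sum((z - m) ** 2 for d, m in zip(devs, means) for z in d)
    return (between / (k - 1)) / (within / (n - k))


def variance_contrast_p(groups, n_perm=500, seed=1):
    """Permutation P-value for unequal spread among genotype classes."""
    rng = random.Random(seed)
    # Centre each class on its median so the shuffle tests spread, not location
    centred = [[y - median(g) for y in g] for g in groups]
    observed = brown_forsythe(centred)
    sizes = [len(g) for g in centred]
    pooled = [y for g in centred for y in g]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        start, perm = 0, []
        for s in sizes:
            perm.append(pooled[start:start + s])
            start += s
        if brown_forsythe(perm) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Simulated trait values (hypothetical): one homozygote class is more variable
sim = random.Random(0)
cc = [sim.gauss(50.0, 1.0) for _ in range(60)]
cg = [sim.gauss(50.0, 1.0) for _ in range(60)]
gg = [sim.gauss(50.0, 3.0) for _ in range(60)]
p_var = variance_contrast_p([cc, cg, gg])
```

A small P-value indicates that at least one genotype class differs in spread, which is the pattern the text interprets as a possible signature of interaction.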

Table 5.   Trait variances for each of three genotypic classes for all seven single nucleotide polymorphisms (SNPs) associated in the Bakers Logging Area Corymbia citriodora subsp. variegata discovery population
SNP | Variance by genotype class | P-value | Validated (a)
  1. Trait variances and significance calculated according to Shibata et al. (2009).

  2. 0, homozygote 1; 1, heterozygote; 2, homozygote 2; –, trait variances did not differ.

  3. (a) Significant in two populations.



Associations with wood quality

Genes influencing wood properties have been identified in at least nine independent association studies, in six tree species including eucalypts, pines, poplar and spruce (Thumma et al., 2005, 2009; Yu et al., 2006; Gonzalez-Martinez et al., 2007; Dillon et al., 2010; Sexton et al., 2010, 2011; Wegrzyn et al., 2010; Beaulieu et al., 2011). These studies reveal a list of SNPs from diverse gene families. Collectively the results might be used to better understand gene networks underlying complex wood traits. For example, cellulose MFA has been associated in diverse taxa with genes from lignin and cellulose biosynthetic pathways, and those encoding cytoskeletal proteins and cell wall proteins (Thumma et al., 2005; Gonzalez-Martinez et al., 2007; Beaulieu et al., 2011). Conversely, wood density has only been linked with variation in lignin biosynthetic genes (Gonzalez-Martinez et al., 2007; Dillon et al., 2010).

The present study revealed nine significant SNP–trait associations, with six genes that have diverse roles in cambial development that have not previously been associated with wood or growth traits (Table 3). This equates to 0.9% of all tests performed, or c. 10% of all SNPs, which is similar to association studies in Pinus taeda (Gonzalez-Martinez et al., 2007), Pinus radiata (Dillon et al., 2010) and E. nitens (Thumma et al., 2005, 2009). In Picea glauca and E. pilularis the numbers of significant associations were lower (c. 1–3% of all SNPs; Sexton et al., 2010; Beaulieu et al., 2011). The effects attributed to individual SNP alleles were small (1–5%), consistent with earlier studies. Small effects are also consistent with the quantitative mode of inheritance assumed for many wood traits, which are expected to be influenced by many genes (Neale & Savolainen, 2004). The small number of markers typically identified in candidate gene association studies, and hence the proportion of phenotypic variation accounted for, has been a limitation for application of SNPs in tree breeding (Resende et al., 2012). Next-generation sequencing and genotyping technologies are allowing construction of denser candidate gene SNP sets (1000s to 100 000s), that in combination with existing results promise more practical outcomes for tree breeders.

Associations with several SNP markers were detected in two of the three populations examined. This approach, known as validation, has become a gold standard for assessing statistical results from association studies with large numbers of independent tests (Greene et al., 2009). The majority of associations detected in the discovery population were not repeatable, and may represent false positive associations. However, these tests are likely to be underpowered because of small validation population sizes (Purcell et al., 2003; Gordon & Finch, 2005). Indeed, when validation was replicated by subsampling 150 individuals from the larger Bakers population (20 times), the power to detect the same eight associations, estimated as the proportion of replicates in which an effect is detected, dropped by 80%. It may not therefore be possible to conclusively ‘rule out’ associations on the basis of no validation in this study. However, the presence of validation can assist in ‘ruling in’ associations, and provided additional evidence for SNP effects in three cases.
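The subsampling estimate of power described above can be sketched as follows: draw repeated random subsets of a fixed size from the larger population, test the association in each, and report the proportion of subsets in which it is detected. The test used here (a linear trend on allele dosage with a normal approximation) is a simplification chosen to keep the example self-contained, not the model fitted in the study; all names and data are hypothetical.

```python
import math
import random


def dosage_p_value(genotypes, phenotypes):
    """Two-sided P-value for a linear trend on allele dosage (normal approx.)."""
    n = len(genotypes)
    mx = sum(genotypes) / n
    my = sum(phenotypes) / n
    sxx = sum((x - mx) ** 2 for x in genotypes)
    syy = sum((y - my) ** 2 for y in phenotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotypes, phenotypes))
    if sxx == 0.0 or syy == 0.0:
        return 1.0  # no variation, nothing to test
    r = max(-0.999999, min(0.999999, sxy / math.sqrt(sxx * syy)))
    t = r * math.sqrt((n - 2) / (1.0 - r * r))
    return math.erfc(abs(t) / math.sqrt(2.0))


def subsample_power(genotypes, phenotypes, n_sub=150, n_rep=20, alpha=0.05, seed=1):
    """Share of random subsamples of size n_sub in which P < alpha."""
    rng = random.Random(seed)
    idx = list(range(len(genotypes)))
    detected = 0
    for _ in range(n_rep):
        pick = rng.sample(idx, n_sub)
        p = dosage_p_value([genotypes[i] for i in pick],
                           [phenotypes[i] for i in pick])
        if p < alpha:
            detected += 1
    return detected / n_rep


# Simulated population of 833 trees (hypothetical effect sizes)
sim = random.Random(0)
geno = [sim.choice([0, 1, 2]) for _ in range(833)]
weak = [0.15 * g + sim.gauss(0.0, 1.0) for g in geno]    # small SNP effect
strong = [2.0 * g + sim.gauss(0.0, 1.0) for g in geno]   # large SNP effect
power_weak = subsample_power(geno, weak)
power_strong = subsample_power(geno, strong)
```

Small effects of the kind reported here are expected to be detected in only a fraction of subsamples of 150 individuals, whereas a large effect is detected in essentially all of them.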

COP1 interacting protein 7 (CIP7)

A candidate with homology to the nuclear regulatory gene CIP7 (Yamamoto et al., 1998) harboured two SNPs (SNPs 38 and 39) associated with growth in CCV trees at 3 years of age. Both SNPs were found to be associated with the same trait in the Tiaro Camp validation population. In both cases, increased stem growth was associated with the heterozygous genotype, suggesting a co-dominant mode of gene action. The two loci exhibit moderate LD (R2 = 0.53), and occur in a linkage block spanning 1563 bp with SNPs 43 and 35 (Fig. 3), suggesting that the effects of SNPs 38 and 39 represent a single functional variant within CIP7. This is supported by a lack of linkage with CIP7 SNPs flanking this region, and the overall low level of LD expected in CCV based on rapid LD decay in eucalypts (Külheim et al., 2009).
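Pairwise LD between two SNPs typed on the same trees can be approximated from unphased genotypes as the squared correlation of allele counts. The sketch below assumes this composite measure; the estimator used in the study is not specified in this section, and the genotype vectors are invented for illustration.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    if saa == 0.0 or sbb == 0.0:
        raise ValueError("monomorphic SNP: r2 is undefined")
    return sab * sab / (saa * sbb)


# Hypothetical genotypes for ten trees at two neighbouring SNPs
snp_a = [0, 1, 2, 1, 0, 2, 1, 1, 0, 2]
snp_b = [0, 1, 2, 1, 1, 2, 0, 1, 0, 2]
r2 = genotype_r2(snp_a, snp_b)
```

Values near 1 indicate that the two SNPs carry largely redundant information, which is why moderately linked markers such as SNPs 38 and 39 are interpreted as tagging a single functional variant.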

Figure 3.

Position of the two single nucleotide polymorphism (SNP) loci, 38 and 39, which were significantly associated with growth within the gene model for COP1 (Constitutive Photomorphogenic 1) interacting protein 7 (CIP7) (based on the Eucalyptus grandis genome). The two SNPs are in moderate linkage disequilibrium (LD) with each other (R2 = 0.53), as well as two additional SNPs which were not associated with any trait (35 and 43). Allele frequencies for these four SNPs are high, with minor allele frequency (MAF) ranging between 0.26 and 0.39. Several intervening loci occurring in this region are not part of the linkage block; however, low diversity at these loci (MAF 0.01, 0.05, 0.07, 0.08, 0.09 and 0.17) suggests that the linkage block is likely to be recombinationally driven, with unlinked lower frequency alleles reflecting recent mutations that have not reached equilibrium.

Association of the CIP7 gene and tree growth is supported in the context of existing gene functional and expression studies. The translated gene product has been shown to function as a positive regulator of light-related genes involved in photosynthesis (Yamamoto et al., 1998). CIP7 is a single-copy gene in Arabidopsis thaliana and E. grandis (as determined by BLAST against the genome sequence: Eucalyptus grandis Genome Project 2010). CIP7 was first identified in seedlings and adult leaves of A. thaliana, but is highly expressed in stems compared with other tissues (The Bio-Array Resource for Plant Biology: Winter et al., 2007). The gene has since been detected in leaves (Eucalyptus camaldulensis; B. R. Thumma et al., unpublished) and branches (E. nitens; Qiu et al., 2008) of two Eucalyptus species. The occurrence of this gene in diverse tissues suggests an important role in photosynthetic regulation plant-wide that underlies the association with growth.

The apparent age dependence of this association in CCV might reflect temporal limitations on CIP7 activity specific to the wood. Photosynthesis occurs in the bark and living cells of wood in a range of plant species (Pfanz, 2008). Fixation of CO2 in the wood is important for the maintenance of stem internal CO2 produced by respiration and contributes to growth (Aschan et al., 2001; Pfanz, 2008; Teskey et al., 2008; Saveyn et al., 2010). Respiration is typically higher in stems of young, rapidly growing plants (Pfanz et al., 2002), hence the demand on CIP7 activity may be higher in young trees compared with later developmental stages, and variations in the effect of CIP7 on growth more penetrant. Similarly, the amount of light penetrating the periderm, which is related to bark thickness and is age dependent (Aschan et al., 2001; Pfanz et al., 2002), could lead to lower rates of CIP7 activity and penetrance of SNP effects in older trees.

If CIP7 is contributing to photosynthesis in the woody tissues or leaves of CCV, DNA variants that affect the abundance or activity of CIP7 may produce variations in growth. Both CIP7 SNPs are positioned in exonic regions, but do not code an amino acid change. At the sequence level, SNP38 lies within a predicted cis-acting regulatory element which is part of a conserved rbcA-CMA1 array involved in light responsiveness in plants (Arguello-Astorga & Herrera-Estrella, 1996; Lescot et al., 2002; Janaki & Joshi, 2004), providing a possible mechanism for manipulation of CIP7 gene expression via SNP38.

Actin cytoskeletal protein

A single marker (SNP7) from an actin gene family member (ACT7) was associated with cellulosic pulp yield (KPY) in both the discovery population and the Tiaro Camp validation population. Higher plants contain families of actin-encoding genes with specialized roles in cytoskeleton formation (McDowell et al., 1996). ACT7 appears to have an important function in plant vascular development as it is expressed not only in developing xylem of Eucalyptus sp. (Qiu et al., 2008) but also in stems of A. thaliana and the wood xylem of Populus (The Bio-Array Resource for Plant Biology: Winter et al., 2007). A potential functional link between actin and pulp yield lies in the role of actin filaments in intracellular trafficking of cellulose synthase complexes (CSCs). In developing xylem, CSC-containing Golgi move along actin cables which direct the insertion of CSCs into the cell membrane during primary and secondary cell wall biosynthesis, depolymerization of which results in cessation of Golgi movement and uneven distribution of cellulose synthesis around the cell cortex (Crowell et al., 2009; Gutierrez et al., 2009; Wightman & Turner, 2010). SNPs that affect the function or amount of actin may impact the amount or distribution of cellulose in the cell wall, and hence KPY (Kien et al., 2009). SNP7 is a synonymous variant positioned within the second exon, but does not overlap with known regulatory motifs at the DNA sequence level or functional domains of the translated protein (based on the NCBI Conserved Domains Database). Low LD between SNP7 and other flanking polymorphisms typed in this region suggests that it may be the functional variant.

In the discovery population the actin SNP7 GG homozygote was associated with decreased pulp yield, suggesting a dominant mode of gene action (Fig. 2). The same mode is observed in the Tiaro Camp validation population; however, the SNP effect is ‘flipped’, where the GG homozygote affords higher pulp yield. Reversal of allele effect, or ‘flip-flop’, has been reported in numerous human association studies. One explanation is that the detected effect is not real. Such reversals have also been attributed to truly heterogeneous effects of an SNP resulting from interaction with other factors that vary between populations, such as genetic background (G × G) or environment (G × E) (Lin et al., 2007; Greene et al., 2009; Shibata et al., 2009). This phenomenon has been examined using theoretical models, and the probability of identifying a flip-flop association by chance is low when studies are underpowered (Clarke & Cardon, 2010).

The possibility of an interaction between the SNP7 locus and another factor is suggested by the variance contrast test of Shibata et al. (2009), which found phenotypic variance for the GG homozygote to be large and significantly different from the other genotype classes (CC and CG). Similar reversals have been observed in genes influencing KPY in E. nitens where environmental conditions varied significantly between populations (Southerton et al., 2010). In the case of CCV, variation in long-term environmental parameters between trials is limited because of their close range (Fig. 1, Table 1). The Tiaro Camp and Bakers populations were established in years with variable rainfall (113.25 mm at Tiaro Camp vs 75.0 mm at Bakers Logging Area; Bureau of Meteorology), forming a possible basis for the interaction. Despite their close proximity, other conditions also vary between the sites, including altitude, slope and soil (Table 1). It is unlikely that the reversal resulted from a change in allele frequency at an interacting locus because allele frequencies for SNPs typed in both populations were statistically similar.


Progress has been made towards understanding the genetic mechanisms contributing to variation in wood phenotypes via association studies in several tree species. The associations identified in CCV add to this, specifically providing insights into the contribution of two genes, actin and COP1 interacting protein. From an applied perspective, this study reveals SNPs that have potential in marker-assisted selection. SNPs from CIP7 could be applied to select for growth at 3 years of age, affording increases in growth associated with the favourable genotype of between 2 and 5%. Because these SNPs are moderately linked, and probably reflect a single functional variant, their effects are not expected to be additive. In the case of SNP7 (actin) the percentage increase in pulp yield associated with the beneficial allele was between 4 and 8%. The fact that this SNP flip-flopped between the Bakers and Tiaro Camp populations, together with similar observations in E. nitens, highlights the need to examine the behaviour of SNPs in contrasting environments before applying them operationally.


The authors would like to sincerely thank Paul MacDonald and Paul Warburton for invaluable field support; Guanghua Huo for technical assistance with gene fragment amplification; Helen Wallace and team from the University of the Sunshine Coast for their involvement in the project; and the Department of Employment, Economic Development and Innovations (QLD), National and International Research Alliances Program (NIRAP)/Research-Industry Partnerships Program (RIPP) for supporting this work.