Genome-wide association study and network analysis of in vitro transformation in Populus trichocarpa support key roles of diverse phytohormone pathways and cross talk

in amenability to transformation and regeneration (TR) among many


Introduction
By enabling the introduction, transfer and manipulation of genetic material within and across genomes, plant transformation has become an important tool for scientific research and plant improvement.Agrobacterium is commonly used as a vector to introduce foreign DNA to plant genomes, followed by in vitro tissue culture methods to promote regeneration of whole plants.These treatments often employ a high-auxin, low-cytokinin media to promote the formation of dedifferentiated and pluripotent 'callus,' followed by high-cytokinin, low-auxin media to promote redifferentiation of pluripotent tissue into shoot (Thorpe, 2007).However, effective treatments often vary widely across species and genotypes, a challenge that remains a limiting factor in the application of plant transformation and regeneration (TR; Altpeter et al., 2016).
Similarly to other developmental processes in plants, regeneration depends on phytohormones and complex polygenic networks involving transcriptional regulation, protein-protein interaction, post-translational modifications, and other means of gene-gene interactions functioning both upstream and downstream of the phytohormones (Ikeuchi et al., 2018).Auxin is a canonical regulator of cell expansion, long believed to function largely via downstream proton pumps that influence pHdependent action of enzymes catalyzing loosening of cell walls (Hager, 2003;Majda & Robert, 2018).In contrast to auxin's patterns of localization (Taiz et al., 2015), cytokinin is a canonical regulator of cell division via mechanisms including regulation of cyclin expression (Argueso & Kieber, 2024).Cross talk between auxin and cytokinin signaling includes both antagonistic and cooperative relationships through numerous points of interaction spanning hormone synthesis, transport, and downstream regulatory cascades, encouraging a view of the complete auxin-cytokinin axis as a regulator of both cell division and expansion (Kotov & Kotova, 2023).
When plant tissue is wounded, either in nature or in vitro tissue culture, the auxin/cytokinin axis can be activated by upstream wound response signaling directed by jasmonates (JA; Zhang et al., 2023) in concert with ethylene (Bouchez et al., 2007), while salicylic acid (SA) signaling antagonizes this process (Li et al., 2019).Highly relevant intersections between the JA/ethylene/SA axis and cytokinin/auxin axis include regulation of ENHANCER OF SHOOT REGENERATION 2 and an auxin biosynthesis gene by ETHYLENE RESPONSE FACTOR 1 (Mao et al., 2016;Ikeuchi et al., 2018).Competing and antagonistic roles of JA/ethylene and SA likely evolved in response to a need for distinct defense mechanisms against biotrophic and necrotrophic pathogens, in both cases with precise control of necrosis of infected and nearby tissues being essential for the plant to contain infection (Spoel et al., 2007).In the context of in vitro regeneration, the wounding of explants (excised plant tissues) may lead to either necrosis or regeneration at the site of wounding, with the patterns of necrosis and regeneration being highly genotypedependent, including in poplar (Ma et al., 2022a,b).Moreover, a third key phytohormone axis involves the competitive actions of gibberellins and abscisic acid (ABA), often studied in the context of their roles in seed and vegetative development, as well as in abiotic stress response (Shu et al., 2018).This axis is also highly relevant to in vitro regeneration for reasons including the action of downstream DELLA transcription factors that mediate cross talk with the auxin-cytokinin and JA/ethylene/SA axes (Davière & Achard, 2016).
The genetic discovery of additional MRs, particularly those involving phytohormone cross talk and signaling outside of the auxin-cytokinin axis, may be essential for TR systems that are robust across diverse species, genotypes, and tissue culture environments.Association mapping, such as with genome-wide association studies (GWAS), is a prominent approach to discover genetic variants controlling important agronomic traits (Cortes et al., 2021).This approach has been used to a limited extent to study rates of in vitro regeneration; however, population sizes and thus the statistical power of those studies has been limited (Tuskan et al., 2018;Lardon & Geelen, 2020;Nguyen et al., 2020;Zhang et al., 2020;Dai et al., 2022), and none have studied regeneration of transgenic tissues.
Network analysis methods involving integration of multi-omic datasets can provide powerful insights into regulatory modules and hubs, such as was demonstrated by a recent study in wheat that revealed epigenetic regulators and transcription factors had major influences on hormone response during TR (Liu et al., 2023).The study of transformation introduces additional challenges beyond those of regeneration alone, as the use of gene transfer treatments such as wounding-associated Agrobacterium co-cultivation followed by antibiotic selection produce additional complex biological responses.In addition, transgenic tissues often develop slowly through multiple stages (e.g.microcallus, macrocallus, and shoot), thus phenotypes must be determined over time while maintaining sterile conditions.In addition, transgenic tissue size and frequency is challenging to quantify, especially in early stages of development.For these reasons, we spent several years developing a phenomic system for measuring transgenic tissue growth and regeneration.We previously reported a GWAS of in planta regeneration of Populus trichocarpa (black cottonwood) (Tuskan et al., 2006) absent transformation, utilizing a red, green, and blue (RGB) computer vision system for high-throughput measurement of callus and shoot during regeneration (M.F. Nagle et al., 2024).In the present work, we built upon this RGB-based methodology by adding fluorescenthyperspectral imaging, which enabled the efficient use of fluorescent proteins as positive markers of transformation and thus enabled us to determine when regenerating tissues were either transgenic or escapes from negative selection (i.e. from antibiotics used in tissue culture media).
Populus trichocarpa represents an ideal model to study in vitro TR in a GWAS context due to a wealth of genomic resources, as well as detailed in vitro culture (Kang et al., 2009;Ma et al., 2022a,b) and transformation methods (Han et al., 1997;Ma et al., 2004;Song et al., 2006;Li et al., 2015).A diverse population of 1323 re-sequenced poplar clones, collected from across its natural range in northwestern North America, provides high levels of genetic diversity and statistical power for analysis.A single-nucleotide polymorphism (SNP) dataset for this population features over 7 million SNPs with polymorphism in at least 5% of genotypes and nonmissing data in at least 90% of New Phytologist (2024) www.newphytologist.comgenotypes.The density of these SNPs relative to genes is also high, with a mean frequency of one SNP every c.50 bases on contiguous assembled chromosomes.In addition, linkage disequilibrium (LD) decays rapidly to 0.2 (R 2 ) within 2 kb, enabling powerful and high-resolution association mapping.Portions of this study population have been previously used to study diverse traits including in vitro callus regeneration (Tuskan et al., 2018), in planta callus and shoot regeneration (M.F. Nagle et al., 2024), cell wall development (Furches et al., 2019), and metabolic traits (Zhang et al., 2018), among others (Muchero et al., 2018;Bdeir et al., 2019;Chhetri et al., 2019Chhetri et al., , 2020;;Weighill et al., 2019;Nagle et al., 2023).
Here, we report over 400 candidate genes, with gene ontology overrepresentation of transcriptional regulators and proteinbinding proteins (among other categories), providing evidence for the role of a complex genetic regulatory network (GRN) that controls regeneration of transgenic tissues.Integration of published high-throughput datasets, including differentially expressed genes, promoter binding, and protein-protein interactions, indicated that over 200 of our candidate genes are members of a GRN involving diverse phytohormone signaling pathways and cross talk thereof.Our network analysis, alongside epistasis tests that help to validate and extend our GWAS results, indicate that many candidate genes are members of contiguous GRNs that span diverse pathways involving hormone, wound and biotic stress response.

Plant materials
We examined in vitro TR in clones from a re-sequenced clone bank of P. trichocarpa (Tuskan et al., 2018;Zhang et al., 2018;Bdeir et al., 2019;Chhetri et al., 2019Chhetri et al., , 2020;;Furches et al., 2019;Weighill et al., 2019;M. F. Nagle et al., 2024), with 1307 clones available at a field location in Corvallis, OR.Cuttings from the field were propagated in glasshouse conditions, and plants grown in the glasshouse were used to provide explant material to undergo TR (Supporting Information Methods S1).Stem explants underwent sterilization (as detailed in Table S1) and were transformed using an enhanced Green Fluorescent Protein (eGFP) reporter plasmid with a minimal matrix attachment region (Fig. S1), introduced via a proprietary strain of Agrobacterium (LBA4404-derived) featuring thymidine auxotrophy (thy À , provided courtesy of Corteva Agrisciences).All Agrobacterium was treated with an induction media featuring acetosyringone (AS), and each genotype was cultured both with and without AS added to the subsequent co-cultivation media.Following co-cultivation, stem explants underwent callus induction and shoot induction media treatments (Methods S2).Given the large number of genotypes, tissue culturing and phenotyping were conducted across 69 phases spanning roughly 20 months.Contamination of tissue cultures was an ongoing challenge that required our sterilization procedures to evolve over the course of this multiple-year study.Phase identifiers were recorded such that they could be used as covariates in downstream statistical models (Table S1; Methods S1-S9).

Phenotyping with RGB and hyperspectral computer vision
We developed a computer vision workflow where semantic segmentation of RGB images provided maps of tissue identity (e.g.callus, shoot and unregenerated stem) and CUBEGLM analysis of fluorescence hyperspectral data provided maps of eGFP signal.These two data layers were aligned, stacked, and used to provide an array of TR statistics via the GMODETECTOR workflow (https://github.com/naglemi/GMOnotebook). We collected images using a custom-designed RGB and hyperspectral imaging platform (macroPhor Array imager from Middleton Spectral Vision, Middleton, WI, USA).Fluorescent-hyperspectral images were collected using a single-frequency 488 nm laser as an excitation source to enable excitation of reporter proteins (such as eGFP used here, and others) as well as chlorophyll (Methods S3).In addition, our models included a 'noise' coefficient that controlled for autofluorescence and enhanced our ability to distinguish low levels of reporter protein signal from autofluorescence, which might otherwise lead to false positives when relying solely on fluorescent microscopy with filters.RGB image segmentation was enabled by a training set prepared using our previously reported IDEAS web-based image annotation interface (Yuan et al., 2022) and by transfer learning to retrain a DEEPLABV3+ model (Chen et al., 2018) previously used in our GWAS of in planta regeneration (M.F. Nagle et al., 2024).The retrained model was then deployed to segment 19 961 Petri dish images into background, callus, shoot, contaminated/necrotic tissue, and unregenerated stem explant material.Of note, we believe that the category of unregenerated stem includes microcalli that are not visible at the macroscopic scale of imaging.
To provide an additional filter for contaminated material, we also trained a DENSENET binary classification model (Huang et al., 2018) to filter markedly contaminated explants and spots with no explants (removed during subculturing due to contamination).eGFP signal was quantified on a per-pixel basis using our CUBEGLM PYTHON package (https://github.com/naglemi/cubeglm/) to project hypercubes (3D array representations of hyperspectral images) onto design matrices of eGFP, chlorophyll, and noise components represented by standard spectra obtained from KEMOQUANT (Middleton Spectral Vision).
To enable cross-referencing of RGB and hyperspectral data collected from different cameras, these two image layers were aligned using a homography transformation.This transformation involved the imaging of grids with both RGB and hyperspectral cameras, followed by the selection of matching points on complementary images and the computation of transformation matrices using the selected indices.Once computed, these transformation matrices were applied on a high-throughput scale for alignment of all images collected with the same camera positions and settings.
We sought to classify given tissues and explants as transgenic or not based on whether hyperspectral signal indicative of a fluorescent reporter protein met a given x threshold in at least y pixels of hyperspectral images.To determine appropriate parameters for x and y, we performed fluorescent microscopy to examine 4423 explants that underwent transformation and recorded whether each appeared to display fluorescent protein or not.Hyperspectral Ó 2024 The Authors New Phytologist Ó 2024 New Phytologist Foundation New Phytologist (2024) www.newphytologist.comweights were computed as previously described and images were divided into 12 portions, as each plate featured 12 explants in specific positions.We next constructed a cost function based on the difference between true positive rates and false positive rates and performed a parameter sweep, finding that parameters of 3 pixels meeting a reporter threshold of 38 performed well with a true positive rate of 79.0% and false positive rate of 10.4%.Other parameter sets with similarly low cost are given on the GitHub; the ideal parameter set for the user depends on their level of interest in very small areas of signal (few pixels with fluorescent reporter signal) and tolerance for false positives.Manual inspection of putatively errant classifications, via microscopy and false-color visualization of hyperspectral data with CUBEGLM, indicated that all observed disagreements were due to explant overlap, human error, or microcalli that were more readily visible under microscopy than with hyperspectral imaging (Fig. 1).
The various modules for image analysis were integrated to produce the GMODETECTOR pipeline for high-throughput TR phenotyping (Fig. 2; code and documentation at https://github.com/naglemi/GMOnotebook).The types of trait statistics obtained from this workflow are described in Table 1.Out of 19 961 samples, 16 107 passed quality control filtering criteria and were deemed suitable for downstream analysis.A total of 1266 genotypes were represented in at least one phenotype dataset (Table 1), while the remainder of the 1307 genotypes from the Corvallis replicate of the clone bank were not represented for reasons including failure to thrive in the glasshouse, significant fungal/bacterial contamination of all replicates, and/or technical errors during data collection.Further details for phenotyping are given in Methods S4.Visual examples of regeneration phenotypes across all experimental timepoints (week 3, week 7 and week 10) are given in Fig. S2.

Statistical analysis and network inference
We conducted a principal component analysis (PCA) across five trait groupings, employing unit vector scaling or Pareto scaling as appropriate.PCA was performed using PRCOMP in R (Methods S5) and scree plots (Figs S3-S7) were consulted to determine which principal components (PCs) explained major portions of variance and were thus used for downstream association mapping.Association mapping of computer vision traits (Table 1) and PCs derived from them was performed by mirroring methods from our prior studies (Nagle et al., 2023;M. F. Nagle et al., 2024).Our first study using these methods, for GWAS of in planta regeneration in poplar, includes a detailed explanation of the methods and comparisons of their relative advantages and disadvantages (M.New Phytologist (2024) www.newphytologist.comÓ 2024 The Authors New Phytologist Ó 2024 New Phytologist Foundation deployed using high-performance cluster resources provided by the NSF XSEDE and ACCESS programs.To identify and rank significance of quantitative trait loci (QTLs), we applied multiple-testing corrections using the Bonferroni and Benjamini-Hochberg false discovery rate (FDR) methods, as well as applying the Augmented Rank Truncation (Vsevolozhskaya et al., 2019) to identify associations represented by relatively weak signals (Methods S6).To obtain annotation data for candidate genes, we sourced gene annotations from PHYTOZOME (Tuskan et al., 2006), Arabidopsis homolog information from The Arabidopsis Information Resource (TAIR), and predicted lincRNA data from the GreeNC database (Di Marsico et al., 2022).
We performed gene ontology (GO) and pathway tests for groups of candidate genes categorized by their association with early or later TR stages, and used the PANTHER tool for overrepresentation tests (Mi et al., 2019;Thomas et al., 2022).Additionally, KEGG metabolic pathway enrichment tests were conducted using the CLUSTERPROFILER package in R (Methods Most shoot in this image is transgenic, although an 'escape' leaf with no eGFP is visible (orange arrow).(c) Segmentation output given RGB image data (segments: shoot (green), callus (blue), unregenerated stem (red), contamination (yellow)).(d) Cross-referencing of results from RGB-based segmentation and hyperspectral analysis to measure transgenic shoot.In phenomics outputs (b-d), shoot tissue is circled and highlighted with a manually added circle to aid visualization.A grid was also manually added to all these figures to aid visualization.
Table 1 Statistics collected from phenomics workflow and used for downstream analysis.A given trait vector used for downstream analysis represents a specific trait (a, b) for a specific tissue or tissue group (c) at one of three timepoints (d), either with or without the acetosyringone (AS) treatment.For example, the first row corresponds to two trait vectors: (1) frequency of transgenic stem/ callus at week 3 with AS; or (2) without AS in co-cultivation media.eGFP, enhanced green fluorescent protein.
Ó 2024 The Authors New Phytologist Ó 2024 New Phytologist Foundation New Phytologist (2024) www.newphytologist.comS7).To construct a network model to inform candidate gene prioritization based on multi-omic lines of evidence, we integrated data from published studies (Table 2) involving Arabidopsis homologs of candidate genes.Using IGRAPH in R, we computed statistics for each node including degree and a Random Walk with Restart (RWR) score (Methods S8).The network was visualized using CYTOSCAPE, focusing on paths connecting candidate gene orthologs.
To support epistasis tests, we first used PLINK to produce a filtered SNP set in which 400 candidate genes were represented by c. five of the most significant corresponding SNPs.Epistasis tests were conducted using the epistasis function of the FAST-LMM PYTHON package (Lippert et al., 2011) (Methods S9).

PCA informs contrasts across stages and types of regeneration
For each of five PCA groupings, we obtained two components appearing before 'elbows', all of which accounted for > 10% of variance.For the three groupings of callus regeneration, callus transformation, and shoot transformation, inspection of loadings indicated that the first components of each represent a general positive trend across input traits, while the second components represent contrasts (e.g. between treatments, or frequency and size of tissues; Figs S3, S4, S6).For instance, for the callus regeneration trait PCA, PC1 displayed similar values across the four traits of callus size and callus frequency with and without added acetosyringone (AS) treatment, although callus size traits appear to contribute slightly more than callus frequency traits.PC2 displayed a clear contrast between callus size traits (positive values) and callus frequency traits (negative values), enabling genetic mapping of the contrast between initial callus emergence and subsequent proliferation.
For the two groupings with traits spanning multiple tissue types (transgenic frequency batch and comprehensive batch), PC1 and PC2 represented contrasts across stages of regeneration and/or types of phenotype measures.Although PCs for these groupings are not simple to interpret, clusters of values are clearly visible corresponding to distinct measures and/or stages of regeneration.For the transgenic tissue frequency PCA, both PC1 and PC2 display clusters of values for three groups: (1) frequencies of transgenic shoots; (2) frequencies of transgenic tissues labeled as callus; and (3) frequencies of transgenic tissues labeled as either callus or stem, which we determined to include very early or stunted stages of microcallus development that were not clearly visible and classifiable by our system as callus (Fig. S7).No significant PCs for any of the five PCA groupings appear to represent clear contrasts between media treatments with and without added AS.

Large number of associations supported by combination of GWAS methods
Among traits with a h 2 SNP above 0.10, all related to callus alone or combinations/contrasts of callus/shoot were investigated for candidate genes, with the exception of a borderline-heritable trait representing a contrast of callus and shoot regeneration with and without extra AS (Table S2).Traits representing shoot alone could not be mapped successfully due to problems with model-fitting for these highly sparse traits.Where multiple loci within 30 kb passed significance thresholds, the candidate locus was taken as that of greatest significance (henceforth, 'QTL peak').MTMC-SKAT yielded 97 QTL peaks passing the conservative Bonferroni threshold and 319 passing this and/or the FDR (α = 0.10) threshold.Only nine QTL peaks passing the FDR (α = 0.10) threshold were identified by GEMMA, and none by GMMAT.The use of ART greatly improved statistical power for GEMMA and GMMAT, enabling the identification of 203 and 22 QTL peaks, respectively (Fig. 3; Table S3).Across methods, traits, and significance thresholds, we report a total of 409 unique poplar candidate genes implicated by a QTL peak within 5 kb, and 271 of these are annotated as being putatively orthologous with a specific Arabidopsis gene (Tables S4, S5).Descriptions and sources are provided for each curated dataset related to Arabidopsis homologs of candidate genes and their regulation.These published datasets provide information on differentially expressed genes (DEGs), and interactions supported by yeast one-hybrid (Y1H) and yeast two-hybrid (Y2H) assays.This published data was integrated to support the construction and analysis of a regulatory network comprising homologs of our genes and genes with which they interact.

Gene ontology analysis reveals significantly overrepresented categories
The 271 Arabidopsis orthologs of candidate genes within 5 kb of a QTL peak were evaluated with GO and pathway enrichment tests.Of the three groups of candidates tested (week 3 machine vision traits, weeks 7-10 machine vision traits, and contrasts/ trends from PCA), only the first group (week 3) yielded significantly overrepresented GO terms (Fig. 4).These indicate prominent roles for genes involved in outer parts of cells ('cell periphery' and 'plasma membrane'), protein-protein interactions ('protein binding'), 'gene expression' and metabolic, catabolic or biosynthetic processes, including terms suggesting involvement in RNA interference and protein degradation.

Network analysis supports the identification of likely regulatory hubs
Network analysis was performed using the 271 Arabidopsis orthologs of poplar candidate genes implicated by a QTL peak within 5 kb, with a network constructed given edges curated from literature (Table 2).Among these genes, 206 had at least one  S3.
Fig. 4 Fold-enrichment and false discovery rate (FDR)-corrected P-values for overrepresented gene ontology (GO) terms found among candidate genes.Gene ontology overrepresentation analysis was performed using PANTHER (https://www.pantherdb.org/),using Arabidopsis genes labeled in the Populus trichocarpa genome annotation as being orthologous with candidate genes supported by a quantitative trait locus peak inside the gene or within 5 kb.Color represents GO term aspects: cellular component (orange), molecular function (green), and biological process (purple).
Ó 2024 The Authors New Phytologist Ó 2024 New Phytologist Foundation New Phytologist (2024) www.newphytologist.comedge and 82 had more than one edge (degree > 1; Fig. 5).A contiguous network included 202 of these genes.Among these, 79 were differentially expressed in pi-4kβ1, 2 mutants, 183 implicated as being regulated by WUS, HAN and/or GRF1 via differential expression and/or promoter binding, and 62 appear in the region of overlap as being regulated by WUS/HAN/GRF1 and the P14Kβ family (Fig. 6; Zhang et al., 2013;Ma et al., 2019;Piya et al., 2020;Starodubtseva et al., 2022).RWR scores were computed with WUS as a seed node and displayed two clusters, where genes in the cluster with greater scores have direct connections to WUS and others do not (Fig. 5b,c).

Epistasis testing provides evidence for interactions relevant to regeneration in poplar
We used Factored Spectrally Transformed Linear Mixed Models (FAST-LMM) to screen for epistatic interactions among 400 candidate genes with respect to 19 traits (the Materials and Methods section).Results passing Bonferroni and/or FDR (α = 0.10) significance thresholds were only found for a single trait, representing total callus size at week 3 under the control treatment (without added AS).We report 45 SNP-SNP interactions passing the Bonferroni threshold, representing 18 pairs of distinct gene-gene interactions involving 20 genes.With the stringency of multiple-testing correction relaxed to FDR (α = 0.10), we report 294 SNP-SNP interactions, representing 115 gene-gene interactions involving 81 genes (Figs 7, 8; Tables S6, S7).
We cross-referenced results from epistasis testing together with the interaction network curated from literature review.Among the 81 candidate genes indicated by FAST-LMM to be involved in epistatic interactions, 59 were annotated as having Arabidopsis orthologs, 38 of which were represented in network data curated from literature, and 27 of which were in a contiguous network constructed from the literature-curated data (which featured 202 Arabidopsis orthologs of candidate genes in total).We observed no cases where parallel direct edges were found across the FAST-LMM epistasis and literature-curated network datasets (e.g.Gene X and Gene Y directly linked by our epistasis test in addition to published yeast one-hybrid interaction).

Candidate genes imply important roles of hormone signaling and cross talk
We undertook an extensive literature review to investigate candidate genes prioritized based on h 2 SNP of traits, proximity of SNPs to genes, overrepresented GO categories, and multiple lines of evidence from network analysis and epistasis testing.We only considered 271 candidate genes which had an Arabidopsis homolog as indicated in the P. trichocarpa genome annotation.
Our most notable candidate genes represent regulators of hormone-regulated gene expression, biosynthesis of hormones and other metabolites, and wound response, pH, nutrient and hormone transport, cell structure and more.These associations and the body of literature on their Arabidopsis homologs demonstrate that capacity for in vitro regeneration depends on highly complex, nonlinear and interdependent hormonal pathways, Fig. 5 Summary statistics from network analysis of candidate genes.Edges connecting genes (nodes) based on in vitro assays or their differential expression in transgenic lines were curated from literature (Table 2).Statistics are shown for 201 Arabidopsis orthologs of candidate genes that were connected into a contiguous network, except for NF-YC9, HAN, OBP1 and PI-4Kβ1 due to their outlier status as a result of being the focus of curated datasets (Table 1).All network statistics are shown in Supporting Information Tables S4 and S5 2).For simplicity and readability, this figure is limited to curated datasets closely linked to auxin/cytokinin signaling (those for WUS, GRF1 and HAN ), or salicylic acid signaling (via the PI-4Kβ family).The complete set of edges used in our upstream network construction and analysis are given in Supporting Information Tables S4 and S5.In addition to candidate genes (orange and purple), other genes relevant to pathways connecting candidates are also shown (gray).Genes represented by identical edges are placed adjacently and represented by with a single nonredundant edge or edges for the cluster.For genes represented by at least some degree of functional characterization and which have a common name on the Arabidopsis Information Resource (arabidopsis.org;Berardini et al., 2015), the common name acronym is given as a label.For genes for which a common name was not retrieved, typically due to little or no characterization of the gene, the accession ID is given instead.This figure was produced using CYTOSCAPE.
Ó 2024 The Authors New Phytologist Ó 2024 New Phytologist Foundation New Phytologist (2024) www.newphytologist.comalong with downstream regulators of cell division, expansion, and structure (Fig. 9).Here, we emphasize the candidate genes we identified that are most likely to act as regulatory hubs of hormone signaling, as transgenic perturbations of these may show the most promise in helping to improve regeneration of recalcitrant plants.Summaries of additional selected candidate genes are provided in Notes S1.

Auxin and cytokinin network
Key regulators of auxin-related epigenetic reprogramming are in the GRF family (Szczygieł-Sommer & Gaj, 2019), members of which appear to be involved in recruitment of SWITCH/SUCROSE NONFERMENTING (SWI/SNF) chromatin remodeling complex (Debernardi et al., 2014) and include our candidate Potri.002G115100 (homolog of GRF1).This poplar homolog is a likely target of a stem-expressed microRNA transcriptionally regulated downstream of auxin (Yang et al., 2023).In addition to acting both upstream (Szczygieł-Sommer & Gaj, 2019) and downstream (Beltramino et al., 2021;Yang et al., 2023) of auxin, the GRF family has been implicated downstream of gibberellin signaling (van der Knaap et al., 2000) and upstream of cytokinin, JA and SA, among other phytohormone pathways (Piya et al., 2020).A key role of GRF1 as a regeneration regulatory hub is supported by our epistasis results, showing significant interactions with 18 other candidates.
HANABA TANARU (HAN, homolog of Potri.005G122700)represents another, likely major nexus between auxin/cytokinin signaling and other phytohormones.A transcriptomic study of wood development in Populus alba × glandulosa found the poplar homolog to be differentially expressed in cambium of mature stems (Kim et al., 2019).Arabidopsis overexpression lines of HAN display differential expression of hundreds of genes representing diverse phytohormone pathways, including that of auxin, cytokinin, ethylene, JA, and SA, among others (Zhang et al., 2013).These include orthologs of many of our candidate genes Fig. 7 Tallies of epistatic interactions detected, shown for the 12 candidate genes for which pairwise interactions were detected with six or more candidate genes.Interactions were tallied by the most stringent multiple-testing threshold each passed for a given gene pair, including Bonferroni (more stringent) and the false discovery rate (FDR; less stringent).All interactions passing the Bonferroni threshold also passed the FDR threshold, and are only tallied toward the more stringent Bonferroni threshold in this figure .New Phytologist (2024) www.newphytologist.comÓ 2024 The Authors New Phytologist Ó 2024 New Phytologist Foundation (Fig. 6).These widespread effects on gene expression may result largely from direct transcriptional regulation of CYTOKININ OXIDASE 3, as well as through other targets (Ding et al., 2015).HAN loss-of-function leads to disrupted apical/basal patterning and meristem formation during embryogenesis, indicated by meristematic development markers including WOX5 (Nawy et al., 2010), a poplar homolog of which influences adventitious rooting when overexpressed (Li et al., 2018).Our epistasis results support an interaction of poplar HAN with Potri.010G161700(homolog of MITOCHONDRIAL EDITING FACTOR 21), Genes of greatest degree (most connections) include homologs of phytohormone signaling regulators GRF1, GA2OX1 and ZRF1A, as well as alternative splicing regulator PTB3, each of which were targeted for literature review (the Discussion section), as well as two genes of unknown function.Genes are labeled by common names for the Populus trichocarpa (poplar) gene and/or Arabidopsis homolog where such names are found in the poplar genome annotation, while uncharacterized or otherwise unnamed genes without annotated putative homologs are instead represented by accession IDs (Tuskan et al., 2006).which may regulate RNA splicing in mitochondria in a manner responsive to abiotic stress and ABA signaling (Takenaka et al., 2010;J-M. Liu et al., 2016).
PI-4Kβ1 (homolog of Potri.007G109000) is believed to affect SA (Janda et al., 2014;Šašek et al., 2014) via biosynthesis of phosphoinositides, which function as cofactors for enzymes involved in membranes and vesicles responsible in part for signaling molecule transport (Mayinger, 2012;Starodubtseva et al., 2022).In addition to regulating levels of SA itself, phosphoinositides may regulate the action of downstream SA-signaling genes, which are highly represented among targets of the phosphoinositide-dependent phospholipase C (Kalachova et al., 2015).Double loss-of-function mutants of PI-4Kβ1 and the related PI-4Kβ2 (pi-4kβ1, 2) show phenotypes including overaccumulation of SA (Janda et al., 2014;Šašek et al., 2014) and insensitivity to exogenous auxin (Starodubtseva et al., 2022).Our epistasis results support four interactions for PI-4Kβ1, including with two uncharacterized genes, an elongation factor and the phosphate scavenger PURPLE ACID PHOSPATASE 12 (Hurley et al., 2010;Robinson et al., 2012) (Fig. 8; Notes S1), and our network integration supports the co-regulation of diverse genes by PI-4Kβ1 and regulators of the auxin-cytokinin axis (Fig. 6).

Abscisic acid and gibberellin signaling
Other candidate genes suggest a role for ABA and gibberellins.NF-YC9 (homolog of Potri.005G035800)encodes a member of a transcriptional complex responsible for ABA and gibberellin signaling, with loss-of-function and overexpression lines displaying altered sensitivity to one or both hormones (X.Liu et al., 2016;Bi et al., 2017).In addition, one other candidate Fig. 9 Proposed model integrating roles of in vitro transformation and regeneration candidate genes in phytohormone signaling, crosstalk, and biosynthesis.Evidence for direct and indirect mechanistic relationships between genes was collected from studies in model organisms (Supporting Information Table S8).Interactions are assumed to be indirect except where strong functional evidence suggests a direct interaction.Key candidate genes with roles in gene expression regulation or hormone biosynthesis/metabolism are presented here, from both the current study and our prior genome-wide association study of in planta regeneration.This proposed model is simplified by combining connections with respect to seven gene families (ARF, PLT, SWI/SNF, HDIII-ZIP, WIND, ESR1/2, JAZ ) with partially redundant and/or sequential roles.This model is simplified and nonexhaustive, intended to summarize likely roles of key candidate genes in these pathways rather than comprehensively explaining the pathways.This figure was produced using BIORENDER.

Research
New Phytologist gene (putative RNA-splicing regulator PTB3; Potri.002G066000) was indicated by our epistasis tests to interact with poplar NF-YC9.We identified two candidate genes involved in metabolism of gibberellins.GIBBERELLIN 2-OXIDASE 2 (homolog of Potri.004G065000) and GIBBERELLIN 20-OXIDASE 1 (homolog of Potri.014G073700)encode multiple steps of a gibberellin metabolic pathway (Fig. S8; Kanehisa et al., 2023).Roles for these genes in TR may involve the regulation of GRF family genes by gibberellin signaling (van der Knaap et al., 2000), and/or other functions of gibberellins in regulating plant growth and development.

Cell periphery regulators or structural proteins
In addition to transcriptional and metabolic regulators, components and regulators of the cell periphery were shown by GO analysis to be significantly overrepresented among our candidates.Cell expansion is also believed to be influenced by EXOR-DIUM (EXO) and related genes (including EXORDIUM-LIKE 3, a homolog of Potri.015G129700), which appear to bridge brassinosteroid signaling with downstream expression of expansins and other growth regulators (Schröder et al., 2009).Overexpression of EXO was reported to enhance in vivo root and shoot growth (Coll-Garcia et al., 2004).Additionally, cell expansion can be affected by stiffening of cell walls with lignin; Potri.016G107900 is a homolog of LACCASE 7 (LAC7), known along with other LACCASE genes to catalyze the biosynthesis of lignin in diverse plants including poplar (Sterjiades et al., 1992;Zhao et al., 2015;Qin et al., 2020;Yu et al., 2020) while producing ROS byproducts (relevance discussed below; Perna et al., 2020).

Prioritization and functional validation of candidate genes and combinations thereof
Functional characterization of morphogenic regulators that can enhance regeneration when overexpressed has shown that these genes tend to function as regulatory hubs that control cascades of transcriptional regulation closely associated with auxin and cytokinin (Ikeuchi et al., 2018;Nagle et al., 2018;Gordon-Kamm et al., 2019;Lee & Wang, 2023), for example in the cases of BBM and the downstream LEAFY COTYLEDON family (Horstman et al., 2017), WUS (Ma et al., 2019;Jha et al., 2020), the WIND family and downstream ESR family (Iwase et al., 2017(Iwase et al., , 2021)), the PLT family (Kareem et al., 2015), and the GRF family (Beltramino et al., 2021).Together with our GO evidence for a significant role of transcription factors, we propose that transgenic perturbation of other candidate genes that function as auxin/cytokinin-signaling transcription factors, including OBP1, SLK2, AGL8, and HAN, may enhance regeneration.
The diverse roles of Arabidopsis homologs of our candidate genes across phytohormone pathways suggests that transgenic perturbation of ethylene, JA and SA signaling via candidates such as WRKY7 and WRKY50, or other transcription factors, may help to improve TR.Stress and wounding response downstream of these hormones may act as 'bottlenecks' of regeneration via cross talk with auxin/cytokinin signaling, and/or through roles in necrosis, ROS homeostasis, growth and metabolism.
Reliable validation of highly polygenic traits, especially those marked by complex gene interactions and cell-or temporalspecific expression, presents a challenge for the commonly used functional validation approaches that employ simple gain-or loss-of-function methods.As an alternative and supplement to mutant studies, multi-omic integration and network analysis have become key in GWAS candidate gene validation, with a recent meta-analysis underscoring the roles of these in silico approaches to provide added support for noncoding variants in human GWAS (Alsheikh et al., 2022).In addition, the availability of many published high-throughput datasets from Arabidopsis enabled us to construct a network model that informs possible functional relationships for over 200 of our candidate genes.
However, as published datasets and genome annotations are limited to known and characterized genes, there is likely to be confirmation bias with respect to known pathways.Among candidate genes putatively implicated by 409 QTLs in this work, 138 lacked known Arabidopsis homologs in the genome annotation.This risk of bias can be mitigated in future work via comprehensive and untargeted genome-wide gene/protein interaction datasets, along with the continued characterization and annotation of genes in reference genomes.
The key contributions from our epistasis and network analyses are the insights they provide into prioritizing candidate genes for further research, particularly with regard to design of experimental strategies to confirm or characterize their functions.For instance, in planta epistasis mutant studies and in vitro interaction assays can take advantage of putative gene-gene interactions to clarify conservation of gene function across poplar and model species homologs.In addition to functional validation, analysis of gene interactions in diverse genotypes should offer clues regarding the high genetic diversity in amenability to TR, and thus may help to develop 'gene cocktails' that will enhance TR in diverse genotypes.Methods S7 Gene ontology and pathway overrepresentation testing.
Methods S8 Network analysis.
Notes S1 Literature review of additional candidate genes.
Table S1 Overview of three sterilization methods used for tissue culture and listing of sterilization methods for each experimental phase.
Table S2 Metadata for in vitro traits, including methods and results of transformations and SNP heritability as computed by Genome-wide Efficient Mixed Model Association.
Table S3 Summary statistics of in vitro associations with respect to methods, significance thresholds, and genes most closely implicated by quantitative trait locus peaks.
Table S4 In vitro callus and shoot regeneration and transformation associations passing Bonferroni and/or Benjamini-Hochberg False Discovery Rate (FDR; α = 0.10) thresholds.
Table S5 In vitro callus and shoot regeneration and transformation associations passing Augmented Rank Truncation-Bonferroni threshold.
Table S6 Numbers of significant epistatic interactions detected by Factored Spectrally Transformed Linear Mixed Models, tallied by candidate gene and significance threshold.
Table S7 Detailed results from Factored Spectrally Transformed Linear Mixed Models tests for epistatic interactions among candidate genes from genome-wide association study.

Table S8
Interactions among Arabidopsis homologs of gene candidates and their intermediates.
Please note: Wiley is not responsible for the content or functionality of any Supporting Information supplied by the authors.Any queries (other than missing material) should be directed to the New Phytologist Central Office.

Fig. 1
Fig. 1 Classification of explants as transgenic or not based on thresholding with fluorescent reporter weights.(a-d) Results are shown for selected parameters (transgenic if 3 pixels have fluorescent reporter weight of 38 or greater), which were first obtained by tuning for DsRed and then confirmed to also work for enhanced Green Fluorescent Protein (eGFP).Hyperspectral image data for explants is shown in false color, with DsRed weight in green and chlorophyll weight in red.The model was trained with a ground truth dataset based on human classifying explants as transgenic or not using fluorescent microscopy.(a) True positives typically displayed clear and strong reporter signal in callus and/or shoot.(b) False negatives typically featured very small areas of reporter signal, which were more readily identified by microscopy than our macroscopic hyperspectral system.A box (dotted line) displays a 'zoomed-in' view of an area with few pixels (orange arrow) displaying reporter signal.(c) False positives were found to represent cases of one explant growing into a grid position assigned to an adjacent explant on the same Petri dish (blue arrow) in all cases inspected.(d) True negatives displayed no fluorescent signal.(e) We performed a parameter sweep to identify parameters that worked well to provide high true positive rates (blue) with low false positive rates (red).Parameters include thresholds for weight of fluorescent signal intensity representative of the reporter protein (signal weight, x-axis) and for the number of pixels with this signal weight (pixel count, z-axis).Datapoints for the selected heuristic (pixel count of 3 and signal weight of 38) are enlarged and opaque.This plot was produced with PLOTLY in R.

Fig. 2
Fig.2GMODETECTOR integration of red, green, and blue (RGB)-based segmentation and hyperspectral analysis to measure frequencies of transformation and regeneration of specific tissues.(a) Raw RGB image data for a given explant.(b) False-color representation of hyperspectral image data for given explant, where green and red represent weights indicative of enhanced Green Fluorescent Protein (eGFP) and chlorophyll, respectively.Most shoot in this image is transgenic, although an 'escape' leaf with no eGFP is visible (orange arrow).(c) Segmentation output given RGB image data (segments: shoot (green), callus (blue), unregenerated stem (red), contamination (yellow)).(d) Cross-referencing of results from RGB-based segmentation and hyperspectral analysis to measure transgenic shoot.In phenomics outputs (b-d), shoot tissue is circled and highlighted with a manually added circle to aid visualization.A grid was also manually added to all these figures to aid visualization.

Fig. 3
Fig. 3 Tallies of in vitro callus and shoot associations across genome-wide association study methods.(a) Conservative Bonferroni correction, given assumption of independence between all single-nucleotide polymorphisms (SNPs); Genome-wide Efficient Mixed Model Association (GEMMA); Generalized Mixed Model Association Test (GMMAT) or SNP windows for Multi-Threaded Monte Carlo SNP-Set Sequence Kernel Association Test (MTMC-SKAT).(b) False discovery rate (α = 0.10) correction computed by Benjamini-Hochberg method, also given assumption of independence.(c) Augmented Rank Truncation (ART)-Bonferroni, where the number of tests is considered as the number of possible 1 kb ART windows in the genome.Additional statistics on candidate genes are given in Supporting Information TableS3.
Fig.5Summary statistics from network analysis of candidate genes.Edges connecting genes (nodes) based on in vitro assays or their differential expression in transgenic lines were curated from literature (Table2).Statistics are shown for 201 Arabidopsis orthologs of candidate genes that were connected into a contiguous network, except for NF-YC9, HAN, OBP1 and PI-4Kβ1 due to their outlier status as a result of being the focus of curated datasets (Table1).All network statistics are shown in Supporting Information TablesS4 and S5.Summary statistics are shown for the collapsed version of the network in which parallel edges between any two given genes are reduced to single edges (the Materials and Methods section).(a) Degree (k) provides a measure of the number of genes that a given gene is connected to.(b) The random walk with restart (RWR) score provides measure of relevance of candidate genes with respect to WUS (used as seed).(c) Relationship between k and RWR score.

Fig. 6
Fig.6Sub-network for candidate genes with Arabidopsis homologs.Arabidopsis homologs of candidate genes are presented along with connections from published transcriptomic, ChIP-seq, yeast one-hybrid (Y1H) and yeast two-hybrid (Y2H) datasets (Table2).For simplicity and readability, this figure is limited to curated datasets closely linked to auxin/cytokinin signaling (those for WUS, GRF1 and HAN ), or salicylic acid signaling (via the PI-4Kβ family).The complete set of edges used in our upstream network construction and analysis are given in Supporting Information TablesS4 and S5.In addition to candidate genes (orange and purple), other genes relevant to pathways connecting candidates are also shown (gray).Genes represented by identical edges are placed adjacently and represented by with a single nonredundant edge or edges for the cluster.For genes represented by at least some degree of functional characterization and which have a common name on the Arabidopsis Information Resource (arabidopsis.org;Berardini et al., 2015), the common name acronym is given as a label.For genes for which a common name was not retrieved, typically due to little or no characterization of the gene, the accession ID is given instead.This figure was produced using CYTOSCAPE.

Fig. 8
Fig. 8 Epistatic interactions detected by Factored Spectrally Transformed Linear Mixed Models (FAST-LMM) are shown for (a) large contiguous network; and (b) islands disconnected from the larger network.The width of connections represents significance, while green connections represent passing the Bonferroni threshold and yellow represents passing the false discovery rate (FDR) threshold (α = 0.10) computed by the Benjamini-Hochberg method.Genes of greatest degree (most connections) include homologs of phytohormone signaling regulators GRF1, GA2OX1 and ZRF1A, as well as alternative splicing regulator PTB3, each of which were targeted for literature review (the Discussion section), as well as two genes of unknown function.Genes are labeled by common names for the Populus trichocarpa (poplar) gene and/or Arabidopsis homolog where such names are found in the poplar genome annotation, while uncharacterized or otherwise unnamed genes without annotated putative homologs are instead represented by accession IDs(Tuskan et al., 2006).
Downloaded from https://nph.onlinelibrary.wiley.com/doi/10.1111/nph.19737by Oregon State University, Wiley Online Library on [23/04/2024].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License

Fig. S8
Fig. S8 Portion of gibberellin metabolic pathway regulated by homologs of two candidate genes.Methods S1 GWAS population.Methods S2 Assays of in vitro regeneration and transformation.Methods S3 Image acquisition.Methods S4 Computer vision workflows.Methods S5 Principal component analysis.Methods S6 Association mapping.

Table 2
Published datasets that were integrated for network analysis.