Unraveling the role of MADS transcription factor complexes in apple tree dormancy

Summary A group of MADS transcription factors (TFs) are believed to control temperature‐mediated bud dormancy. These TFs, called DORMANCY‐ASSOCIATED MADS‐BOX (DAM), are encoded by genes similar to SHORT VEGETATIVE PHASE (SVP) from Arabidopsis. MADS proteins form transcriptional complexes whose combinatory composition defines their molecular function. However, how MADS multimeric complexes control the dormancy cycle in trees is unclear. Apple MdDAM and other dormancy‐related MADS proteins form complexes with MdSVPa, which is essential for the ability of transcriptional complexes to bind to DNA. Sequential DNA‐affinity purification sequencing (seq‐DAP‐seq) was performed to identify the genome‐wide binding sites of apple MADS TF complexes. Target genes associated with the binding sites were identified by combining seq‐DAP‐seq data with transcriptomics datasets obtained using a glucocorticoid receptor fusion system, and RNA‐seq data related to apple dormancy. We describe a gene regulatory network (GRN) formed by MdSVPa‐containing complexes, which regulate the dormancy cycle in response to environmental cues and hormonal signaling pathways. Additionally, novel molecular evidence regarding the evolutionary functional segregation between DAM and SVP proteins in the Rosaceae is presented. MdSVPa sequentially forms complexes with the MADS TFs that predominate at each dormancy phase, altering its DNA‐binding specificity and, therefore, the transcriptional regulation of its target genes.


Fig. S3 Protein-protein interactions among apple DAM-, SVP-and FLC-like proteins. a)
Yeast two-hybrid assay using full-length apple proteins. MdDAM1, MdDAM2 and MdDAM4 showed strong autoactivation even in the drop-out media supplemented with 5 mM of 3AT. To circumvent this issue, the C-terminal region of all proteins was removed, as it is known that this region is responsible for the generation of autoactivation in MADS-domain proteins. b) Yeast two-hybrid assays using truncated protein versions. Interactions were tested in both directions, unless a positive interaction was identified before. In this case, the other direction was not tested (blank spaces with nt -not tested). Proteins that did not produce autoactivation were not evaluated in the drop-out media supplemented with 3AT. The negative controls are representative pictures of several independent controls that were used in this assay. , MdSVPb (f) and MdFLC (g) were translationally fused to GFP and Nicotiana benthamiana leaves were agroinfiltrated and observed in a confocal microscope after 3 days of incubation. h) VENUS fused to a nuclear localization signal (NLS). Scale bar 100 µm.

Fig. S5 Flowering time under LD of Arabidopsis F1 plants in comparison to WT and homozygous lines. a)
Total leaf number (including both cauline and rosette leaves) was scored prior to flowering. b) Number of days from germination to bolting (elongation of the first internode around 0.5 cm). c) Number of days from germination to the opening of the first flower. The box extends from the 25th to 75th percentiles, the line in the middle is plotted at the median, and the whiskers are drawn down to the 10th and up to the 90th percentile. The outliers below and above the whiskers are drawn as individual points. One-way ANOVA followed by Tukey's test was used for the statistical tests. Letters shared in common between the genotypes indicate no significant differences (for P ≤ 0.05).

Fig. S6
Chilling hours accumulated during the dormancy cycle. The accumulation of chilling hours (number of hours with temperatures below 7.2 °C) was monitored by daily recording the air temperature in field trials. Red arrows represent sampling points for RT-qPCR studies.

Fig. S7 Complementary information about the seq-DAP-seq assays. a)
In vitro coimmunoprecipitation of proteins. MdDAM1, MdDAM4, MdFLC and MdSVPa were translationally fused to FLAG, whereas MdDAM1, MdDAM4, MdDAMb, MdFLC and MdSVPa were translationally fused to MyC. Protein fusions were synthesized in vitro and tested in pairs. The input was composed of total proteins recovered before immunoprecipitation. FLAG-fused proteins were immunoprecipitated using anti-FLAG beads and immunoblotted using anti-FLAG or anti-MyC antibody. b) Logos of the enriched CArG-box motifs identified with the degenerated CArG-box strategy (see Methods S1). c) Frequency distribution of CArG-box motifs in peaks associated to genes for each complex and to 1,000 random sets. Bars represent 95% confidence interval. d) GO term enrichment analysis of target genes containing a CArG-box for each apple transcriptional complex. For data visualization, the best P-value was transformed using log. Note that all P-values higher than 0.05 were replaced by zero (white boxes). e) DNA-binding profiles of the four complexes and the control (input) to the locus region of MdDAM1, MdDAMb and MdFLC. Horizontal bars below the plots represent the position of the peak regions. The color code between the plots and the bars is preserved. The Integrated Genome Browser (IGB) was used for visualization.

Fig. S8 Complementary information about the GR DEX-inducible assays. a)
Dendrogram illustrating the distribution of the RNA-seq replicates in the GR DEX-inducible assay performed in transgenic apple calli. b) Venn diagram illustrating common DEGs between the four TFs using the GR DEX-inducible system. c) GO term enrichment analysis of genes identified in transgenic apple calli after 8 hours of DEX induction. Enrichment tests were performed separately for up-and downregulated gene sets and only the best P-value (smallest) was kept. For data visualization, the best P-value was transformed using -log or log when it belonged to the up (orange gradient) or downregulated (blue gradient) gene set, respectively. Note that all P-values higher than 0.05 were replaced by zero (white boxes).  Moser et al. 2020. b) Expression analysis of apple bud samples harvested from 'Fuji' trees in the field and exposed to controlled chilling conditions according to Takeuchi et al. 2018. In (a) and (b), bars represent the minimum and maximum values obtained. c) Venn diagram illustrating the overlap between dormancy-related DEGs. Only genes present in both datasets were considered in this analysis. The obtained P-value (hypergeometric test) for the intersection between datasets was equal to 0. Time-course expression analysis of apple buds harvested from field-grown 'Golden delicious' trees according to Moser et al. 2020. Bars represent the minimum and maximum values obtained. The Integrated Genome Browser (IGB) was used for visualization. The arrows represent induction or repression according to the expression data obtained in the calli assay for genes belonging to the high-confidence list of target genes (Data S4).  (Tabuenca, 1964).

Gene expression studies
Total RNA was isolated using the Spectrum™ Plant Total RNA kit (Sigma-Aldrich), and

DNase-treated using the TURBO DNA-free Kit (Ambion). The SuperScript™ III First-Strand
Synthesis System (Thermo Fisher Scientific) was used for cDNA synthesis according to manufacturer's instructions. Real-time PCR was performed using the LightCycler 480 instrument (Roche), and relative expression was calculated using the 2( -ΔΔ Ct) method as described (Livak & Schmittgen, 2001;Falavigna et al., 2018).

Complementation assay in Arabidopsis
The CDS of MdDAM1, MdDAM2, MdDAM4, MdDAMb, MdSVPa, MdSVPb, SVP and Venus were independently introduced into a modified pGreen0229 Gateway vector (pYB187, kindly provided by Dr. Youbong Hyun, MPIPZ, Cologne) (Hellens et al., 2000) in which the p35S promoter was replaced by the Arabidopsis SVP promoter. To this end, overlapping primers were designed to amplify a region of 3 kb upstream of the TSS of SVP with complementary borders to the pYB187 vector (Table S1). In parallel, PCR reactions were performed to amplify the pYB187 plasmids carrying the cloned CDS sequences. Overlapping fragments were assembled by Polymerase Incomplete Primer Extension (Stevenson et al., 2013).

Protein production and purification for EMSA and seq-DAP-seq experiments
The full-length CDS sequences of MdDAM1, MdDAM4, MdDAMb, MdFLC and MdSVPa were amplified without the stop codon and the restriction sites of EcoRI and SalI were added to their 5' and 3', respectively (Table S1). Each gene was amplified and cloned into the double EcoRIand SalI-digested XLp34 and XLp39 vectors (Lai et al., 2020), fusing the genes to 3xFLAG and 5xMyC, respectively, to avoid epitope-tag interference to the N-terminal MADS DNA-binding domain. In pairs, tagged proteins were simultaneously produced in vitro using the TNT® SP6 High-Yield Wheat Germ Protein Expression System (Promega). Protein complexes were purified using anti-FLAG magnetic beads (Merck Millipore), and Western blots were performed as

Calli transformation and RNA-seq
In vitro cuttings of apple cultivar Gala were subcultured, and apple leaf transformation was carried out to produce transformed calli as described elsewhere (Estevan et al., 2020).
Positive transformed calli were selected in media supplemented with antibiotics and the observation of GFP fluorescence. Six months after leaf transformation, 30 transformed calli were obtained for each construct. Transformed calli were pretreated with 40 μM CHX (Sigma-Aldrich) for 30 min, rinsed with distilled water, and treated with 10 μM DEX (Sigma-Aldrich) or mock (ethanol). After 1 h, samples were rinsed with distilled water, incubated at room temperature for 7 h and then were frozen in liquid nitrogen and stored at −80 °C.

Read cleaning and mapping of seq-DAP-seq assays
Raw seq-DAP-seq reads were pre-processed by removing potential sequence adapters with Cutadapt (Martin, 2011) and trimming of low-quality bases (Q < 15) at the ends using Trimmomatic (Bolger et al., 2014). All pre-processed reads with final lengths smaller than 50 bases were discarded. Cleaned reads were mapped to the apple double-haploid genome version 1.1 (Daccord et al., 2017) using BWA (Li & Durbin, 2009) with default settings. Raw BWA alignments were filtered by only keeping alignment pairs with mapping quality of at least 20 and by removing secondary alignments.

Peak calling and annotation
Raw seq-DAP-seq peaks were individually called together with a single control sample (input) using the program MACS2 (Zhang et al., 2008) (parameters: -p 0.05 -g 550601767).
Individual peak sets were called for all three replicates of the four different complexes.
Reproducibility between the replicate peak sets was assessed using the irreproducible discovery rate (IDR) of 0.01 with the framework available from the BioConda channel of the conda package manager (Dale et al., 2018). The final peak sets were generated by merging overlapping peaks that passed the IDR assessment. Peaks were annotated to apple genes (3 kb up-and 1 kb downstream) using the Bioconductor R package ChIPpeakAnno (Zhu et al., 2010).

Peak position analysis
The preferred location of peaks relative to genes was analyzed using a previously described approach (Romera-Branchat et al., 2020). Briefly, for every peak annotated to a gene, the distance between the peak center to the TSS or TES of the respective gene was calculated depending on whether the peak was located upstream or downstream of it, respectively. For those peaks residing within the gene body, a normalized distance was calculated in relation to the TSS. The normalized distance was obtained using a gene-specific length factor that sets the gene length to 2,000 nucleotides. The observed peak location distribution was compared to that of 1,000 random peak sets consisting of the exact number of peaks with the same length as the observed peak set.

Identification of enriched motifs
De novo motif enrichment analysis was performed using MEME-chip (Ma et al., 2014). For that, the central 100 nucleotides of the peaks were used for motif discovery (parameters: memechip -meme-minw 8 -meme-maxw 15 -meme-nmotifs 15 -meme-mod anr -db JASPAR2018_CORE_plants_non-redundant_pfms_meme.txt -centrimo-local -centrimo-ethresh 1), provided with binding sites of plant-specific transcription factors obtained from the JASPAR webpage (http://jaspar.genereg.net). The occurrences of predicted motifs were extracted from the corresponding FIMO output produced by the MEME-ChIP suite. Complementarily, a custom C++ code was written to manually search the peaks for the presence of a previously described degenerated CArG-box motif, MYHWAWWWRGWWW (Mateos et al., 2017;Tilmes et al., 2019).
To each complex, we compared the number of peaks associated with genes and containing degenerated CArG-box motifs to the frequency distribution of CArG boxes in 1,000 random sets.
To generate each random set, the distance from the start of the peak to the nearest TSS was recorded for each observed peak. Next, a random gene was selected, and a sequence with the same size and with the same distance from the TSS of a corresponding observed peak was obtained. Finally, the number of peaks with degenerated CArG-box sequences was determined in each random set.

RNA-seq analyses
The RNA-seq reads of the GR DEX-inducible assays were pre-processed as described above (see "Read cleaning and mapping of seq-DAP-seq assays"). Cleaned reads were mapped to the apple genome using TopHat2 (Kim et al., 2013) (parameters: -i 50 -I 10000 -N 5 --read-edit-dist 5 --read-gap-length 2). The apple genome annotation was used for intron hints. The number of reads mapped to each gene was determined using the HTSeq count program (Anders et al., 2015). Differential gene expression analyses were carried out using the DESeq2 R package (Love et al., 2014). The publicly available transcriptome samples PRJNA374502 (dataset A (Moser et al., 2020)) and PRJDB6779 (dataset B (Takeuchi et al., 2018)) were downloaded from the NCBI sequence read archive and quantified using the Salmon program (Patro et al., 2017). Differential gene expression analyses were performed as previously described.  (Moser et al., 2020) were obtained using the fpkm function of DESeq2 (Love et al., 2014). The expression values were transformed into the zscale and clustered using the scale and hclust functions of R (R, 2020), respectively. The dendrograms from the heatmaps generated from the z-transformed expression data were visually inspected and a cutoff value was arbitrarily chosen in order to obtain the gene clusters.
The expression data for each cluster was summarized by the eigen gene, which was obtained using the moduleEigengenes function of the R package WGCNA (Langfelder & Horvath, 2008).

GO enrichment tests
A BLAST (Altschul et al., 1997) similarity search (parameters: -F F -e 1e-10 -b 1 -v 1) was performed using the apple proteins as a query against the Arabidopsis TAIR10 proteome (www.arabidopsis.org). The apple proteins/genes inherited the GO term annotation of their best Arabidopsis counterpart.
The 'plant GO slim' subset was obtained from the GO website (http://current.geneontology.org/ontology/subsets/goslim_plant.obo). All enrichment tests using the 'plant GO slim' annotations were performed in MATLAB. By using the geneont class from the MATLAB Bioinformatics toolbox, genes were annotated with a particular GO slim term if one of its originally blast-based GO terms corresponds to or is a descendant of that GO slim category. The GO term enrichment in a gene set was assessed using the hypergeometric test.
Raw P-values were adjusted for multiple testing using the Benjamin Hochberg method (Benjamini & Hochberg, 1995). For the enrichment analysis of the seq-DAP-seq targets of the four MADS complexes, the background consisted of all genes in the apple genome with at least one GO slim annotation. The test sets encompassed the target genes of each MADS complex having the specific GO slim terms to be tested. The results were displayed as heatmaps with a grayscale reflecting the P-value. For the enrichment analysis using expression data from the calli assay, the background was determined as the set of genes tested by DESeq2 (in a MADS TF-dependent manner) and assigned to at least one GO slim term. The test sets consisting of GO-annotated DEGs were then separated into up-or downregulated gene sets, and both sets were individually tested for GO slim term enrichment. The results were plotted as a heatmap in which an orange or blue gradient was assigned in the event that the upregulated or the downregulated set, respectively, was enriched (P ≤ 0.05). In case both sets were enriched, the set with the best Pvalue was kept.
The GO term enrichment analysis for the gene clusters obtained from dataset A (Moser et al., 2020) was performed using the Cytoscape plugin BiNGO (Maere et al., 2005). For these tests, the background consisted of all GO annotated apple genes. The test sets consisted of the genes within a particular cluster having the GO term being tested.