A genome‐scale TF–DNA interaction network of transcriptional regulation of Arabidopsis primary and specialized metabolism

Abstract Plant metabolism is more complex relative to individual microbes. In single‐celled microbes, transcriptional regulation by single transcription factors (TFs) is sufficient to shift primary metabolism. Corresponding genome‐level transcriptional regulatory maps of metabolism reveal the underlying design principles responsible for these shifts as a model in which master regulators largely coordinate specific metabolic pathways. Plant primary and specialized metabolism occur within innumerable cell types, and their reactions shift depending on internal and external cues. Given the importance of plants and their metabolites in providing humanity with food, fiber, and medicine, we set out to develop a genome‐scale transcriptional regulatory map of Arabidopsis metabolic genes. A comprehensive set of protein–DNA interactions between Arabidopsis thaliana TFs and gene promoters in primary and specialized metabolic pathways were mapped. To demonstrate the utility of this resource, we identified and functionally validated regulators of the tricarboxylic acid (TCA) cycle. The resulting network suggests that plant metabolic design principles are distinct from those of microbes. Instead, metabolism appears to be transcriptionally coordinated via developmental‐ and stress‐conditional processes that can coordinate across primary and specialized metabolism. These data represent the most comprehensive resource of interactions between TFs and metabolic genes in plants.

The manuscript is well-thought-out and all experiments are executed with greater details. No major points are listed by this reviewer.
Minor Points: I may have missed that did the authors compare their Y1H-based network vs co-expression network constructed from salt and light-dependent stresses? Such comparison would further substantiate the findings of this manuscript.

Reviewer #2:
This is a really nice study that demonstrates the principle of distributed control in biological networks. The focus on contextspecific regulation of the TCA cycle is an interesting choice since TCA cycle intermediates contribute to the synthesis of proteins, lipids and other macromolecules, but also act in the regulation of chromatin modifications and post-translational modifications by contributing acetyl groups. Thus, the TCA cycle itself is considered a signaling hub, and is an indicator of cellular energy supplies. The study itself lays out that there are a bunch of tfs that regulate many genes in many pathways, but not all tfs are regulating all the genes all the time. Some tfs will only regulate a pathway in a certain subcellular compartment, and some tfs will only regulate a pathway in response to a particular developmental or environmental cue. The salt stress analysis demonstrates this nicely. The dark-grown experiment is a little confusing though. With the hypothesis of context-specific regulation of the TCA cycle, and the absence of a universal set of TF/TCA target interactions, was it expected that the same 17 TFs would so strongly influence phenotypes under dark-grown and salt stress conditions? This seems in conflict with the distributed control idea.
Minor points: Several times throughout the ms promoters are referred to in the context of metabolic pathways (eg: line 28 "promoters of primary and specialized metabolism" and line 115 "promoters of multiple metabolic pathways") rather than genes. It would be more precise to always say "gene promoters in primary and specialized metabolic pathways." Using the subcellular localization to constrain TF-target interactions is a really nice idea. Did you also look at organ-specific TF expression (eFP browser) as part of the "context specific" information? Would root vs shoot expression have local effects on hypocotyl and root phenotypes, or do you think the phenotypes are the result of global changes in TCA cycle derived metabolites?
A small issue with the TCA cycle-focused part: since the above sections demonstrate that single TFs regulate several genes across several pathways, how can the observed changes in plant growth, development, and response to the environment to TF perturbation be specifically linked to changes in the TCA cycle without measuring the organic acids in the TCA cycle (as validation of direct impact), or without examining changes in expression in genes in the other affected pathways? For example wri3: 8 gene promoters in glycolysis/gluconeogenesis and the 3 gene promoters in PPP.
Line 276: extra word in the sentence. Figure 5C and EV4D are showing the same thing, but the relative effect scale is different. 5C makes the wri3 result look much more significant than it does in EV4D. It just looks less sketchy if they're the same.
Lines 386-388 state that the authors propose that plant metabolism is controlled via a distributed system... large collections of TFs (similar statement in lines 405-407). That's kinda the central thesis of biological networks that biological systems are dominated by distributed control (very few "hub" genes), so this research provides support for that rather than being a novel proposed mechanism.

Summary
The study aims to systematically characterize the complexity of metabolic networks from the perspective of transcriptional regulation in multicellular organisms. Specifically, the authors hypothesized that Arabidopsis metabolism is orchestrated by TFs that regulate both primary and specialized metabolism.
Although Y1H suffers from false positive and false negatives, the author performed Y1H screening between 1930 TFs and 220 promoters across 11 central carbon metabolism pathways and one specialized metabolism pathway (aliphatic glucosinolate biosynthesis). The authors revealed that 90% of TFs bind to promoters of metabolic genes and all TFs bind to promoters of multiple metabolic pathways.
Then the authors tested the regulatory capacity of 4 selected TFs by performing transcriptome analyses of DEX-inducible lines of 4TFs. The results showed that the predicted target enzyme genes were enriched among the DEGs in the respective DEX-inducible lines.
The authors focused on TCA cycle isozymes and showed the enrichment of certain TF families in isozymes localized to specific cellular compartments. Coexpression relationship between TF and TCA cycle isozyme genes across 5 microarray datasets supported the idea that regulation of TCA cycle genes is highly conditional.
The phenotypic analyses of mutant alleles of 17 TFs within the TCA cycle sub-network demonstrated the importance of the regulatory module in plant growth and its response to salt stress as well as the importance of TFs in coordination of TCA cycle function in different organs and conditions. General remarks In the case of primary metabolism in plant, transcriptional regulation is not obvious compared to specialized secondary metabolism in which a relatively small number of TFs regulate the most part of enzyme genes involved. This study clarified interaction of TFs and promoters of central carbon metabolism genes (and aliphatic glucosinolate biosynthetic genes) in a genome-scale. It is surprising that an average of 125 transcription factors are bound to one promoter. The authors performed well-designed phenotypic analyses focusing of sub-network of TCA cycle regulation to validate the finding from Y1H. This study provides useful datasets to understand complicated regulatory mechanisms of primary and specialized metabolism.
Minor comments -Please carefully check the numbers of promoters and TFs used. For example, line130 says "224 promoters", while 203 genes (except 22 aliphatic glucosinolate genes) are listed in Table EV1. Similarly, line138 says "additional 1227 TFs", while 1225 TF genes are listed in TableEV2.
-line167-177 If I understand it correctly, regional enrichment can be clarified by a hypergeometric test. I missed concrete description of the method and result of this test.
-line203-217 I got a bit confused in understanding of Fig.3C-E. First of all, Panel C and Panel D seem reversed. "Gene ontologies associated with the reductive pentose phosphate pathway, glycine biosynthesis, and cysteine biosynthesis were over-represented more than expected by chance in the significantly up-regulated DEGs upon GR induction of CHA19, LBD16 and WRI3". I could not find such a result in Panel D. There is no GO terms "glycine biosynthesis" and "cysteine biosynthesis" in the figure. "reductive pentose phosphate pathway" does not seem enriched in GR-LBD16. More detailed explanation is better to be added in Fig.3E legend.
-line266 I think "TF/TCA target interaction" is better to be "TF/TCA target coexpression" to avoid readers' confusion. "interaction" is also used to explain TF-promoter binding in Y1H.
-line281-282 "root length of the wri3 loss-of-function mutant allele is shorter compared to wild type (Fig 5C-E)." In Fig.5E, the root length of wri3 looks longer than that of WT in control condition. -line618 "rifampin" may be "refampicin".
-On which database did gene annotation depend? In Panel A, "Total Biomass" is better to be "Shoot Biomass".
-English proofreading may be recommended for some points.
- Table EV3 For example, MYB28 bound to a limited number of aliphatic GSL genes. I wonder why MYB28 did not bind to CYP79F1, MAM1, etc.
- Table EV5 More detailed explanation is desirable.

Reviewer #1:
Summary This manuscript "A Genome-Scale TF-DNA Interaction Network of Transcriptional Regulation of Arabidopsis Primary and Specialized Metabolism" by Tang et al. focuses on the construction of a TF-target network on genes involving in primary and specialized metabolism. The authors have used a global collection of Arabidopsis transcription factors (TFs) and 224 promoters pertinent to primary and specialized metabolism and perform a large-scale Y1H experiment. This led to the identification of 7,485 interactions between 1,930 TFs and 220 promoters across the 12 pathways. The authors chose four TFs CHROMATIN REMODELING 19 (CHA19), EIN2 NUCLEAR ASSOCIATED PROTEIN 1 (ENAP1), LATERAL ORGAN BOUNDARIES-DOMAIN 16 (LBD16) and WRINKLED 3 (WRI3) and tested their regulatory capacity. Using GR-TF method, they look for direct and indirect targets of these four TFs. Subsequently, the authors focused on TCA cycle and identified 17 TFs that can contribute to TCA-cycle dependent plant growth using reverse genetics approach. Specifically, they obtained 31 insertional mutant alleles of 17 TFs. Importantly, Of the seventeen tested TF and their corresponding mutant alleles, twelve TFs have significant genotype-dependent 306 hypocotyl growth effects. Next, the authors focused on salt stress. Mutations in fifteen of the TCA cycle-linked TFs led to small but significant effects on shoot biomass, flowering time, growth rate and seed yield in both control and salt conditions relative to wild-type seedlings General remarks: Overall, the manuscript addresses an essential process of biology using a systems biology approach. The experiments are performed unbiasedly to identify novel players in plant stress biology that are interconnected with plant metabolism. This manuscript is definitely a step forward in our understanding of an essential area of biology that is common to every plant species. The knowledge gained from this work can be extrapolated on important crop plants for better crop production as well as protection against other environmental stresses including light and salt.

Major Points:
The manuscript is well-thought-out and all experiments are executed with greater details. No major points are listed by this reviewer.

Thank you for your positive comments!
Minor Points: I may have missed that did the authors compare their Y1H-based network vs co-expression network constructed from salt and light-dependent stresses? Such comparison would further substantiate the findings of this manuscript.
We calculated co-expression values for TFs and their target genes using connections predicted by the Y1H network. The results of this analysis are summarized in Dataset EV8. The reviewer is correct that this comparison further substantiated the findings from the Y1H network. We then chose TFs that were strongly co-expressed with their target genes for further functional analysis.

Reviewer #2:
This is a really nice study that demonstrates the principle of distributed control in biological networks. The focus on context-specific regulation of the TCA cycle is an interesting choice since TCA cycle intermediates contribute to the synthesis of proteins, lipids and other macromolecules, but also act in the regulation of chromatin modifications and post-translational modifications by contributing acetyl groups. Thus, the TCA cycle itself is considered a signaling hub, and is an indicator of cellular energy supplies. The study itself lays out that there are a bunch of tfs that regulate many genes in many pathways, but not all tfs are regulating all the genes all the time. Some tfs will only regulate a pathway in a certain subcellular compartment, and some tfs will only regulate a pathway in response to a particular developmental or environmental cue. The salt stress analysis demonstrates this nicely.
The dark-grown experiment is a little confusing though. With the hypothesis of context-specific regulation of the TCA cycle, and the absence of a universal set of TF/TCA target interactions, was it expected that the same 17 TFs would so strongly influence phenotypes under dark-grown and salt stress conditions? This seems in conflict with the distributed control idea.
We had some inclination that these 17 TFs would affect the phenotypes under the conditions we tested due to the co-expression analysis results utilizing cell-/tissuespecific and abiotic stress expression datasets (Dataset EV8). We consider our results in agreement with the distributed control idea since these 17 TFs had significant but small effects on the phenotypes we measured indicating that no single TF had large effects on the higher-order phenotypes (Dataset EV10-12).
Minor points: Several times throughout the ms promoters are referred to in the context of metabolic pathways (eg: line 28 "promoters of primary and specialized metabolism" and line 115 "promoters of multiple metabolic pathways") rather than genes. It would be more precise to always say "gene promoters in primary and specialized metabolic pathways." References to promoters are updated to say "gene promoters in . . . metabolic pathways" or "promoters of genes in . . . pathways" in lines 28, 119, 162, 168, 272, 418.
Using the subcellular localization to constrain TF-target interactions is a really nice idea. Did you also look at organ-specific TF expression (eFP browser) as part of the "context specific" information? Would root vs shoot expression have local effects on hypocotyl and root phenotypes, or do you think the phenotypes are the result of global changes in TCA cycle derived metabolites?
We thank the reviewer for this suggestion. Although datasets profiling expression in both the root and the shoot were included in our co-expression analyses for TFs and targets, we did not explicitly choose TFs for validation based on their organ-specific TF expression. Looking at the hypocotyl and root expression of the TFs we had biologically validated on the eFP browser, we did not see any obvious similarities between organspecific expression pattern and whether the mutants of the TFs will significantly influence the traits we measured. We saw significant effects in hypocotyl or root when the expression of TFs were both high or below the limit of detection (likely indicating very low expression or specific spatiotemporal expression not captured by whole organ profiling). TF expression levels were similar in the hypocotyl and in the root for the majority of the TFs we functionally validated. An exception to this is for the cases where TF expression was higher in one of the organs for GATA9 and GATA12, hypocotyl and root phenotypes, respectively, were significantly influenced by the TF.
A small issue with the TCA cycle-focused part: since the above sections demonstrate that single TFs regulate several genes across several pathways, how can the observed changes in plant growth, development, and response to the environment to TF perturbation be specifically linked to changes in the TCA cycle without measuring the organic acids in the TCA cycle (as validation of direct impact), or without examining changes in expression in genes in the other affected pathways? For example wri3: 8 gene promoters in glycolysis/gluconeogenesis and the 3 gene promoters in PPP.
As suggested by the reviewer, we had measured the expression changes in genes in other pathways in GR-CHA19, GR-ENAP1, GR-LBD16 and GR-WRI3 lines and that data can be found in Fig 3 and Dataset EV6. We appreciate the reviewer's suggestion about measuring the organic acids in the TCA cycle, however the interpretation of direct causality is complicated given the anaplerotic nature of the TCA cycle. This leads to the potential for steady state levels of the metabolite and the fluxes to be significantly disconnected and the organellar and even cellular specific expression patterns of these TFs and the target genes indicate that it

isn't clear what specific metabolic measurements would be appropriate. Further, given the connection of the TCA cycle to a wide range of metabolites, it isn't clear if the organic acid changes themselves would be expected to be directly causal. We feel that this question is critical for future studies where it would need a combination of steady state and flux-based measurements on organs if not single cells to map causality. And even then there may not be single causalities as for example, if the TCA cycle alters the shikimate pathway, this could lead to both auxin and lignin alterations. This complexity is the reason we instead focused on growth based assays to try and capture the higherorder consequences of changes in the CHA19, ENAP1, LBD16 and WRI3 TFs that bind to promoters of genes in multiple metabolic pathways.
Line 276: extra word in the sentence.
The extra word (currently) in line 301 has been removed. Figure 5C and EV4D are showing the same thing, but the relative effect scale is different. 5C makes the wri3 result look much more significant than it does in EV4D. It just looks less sketchy if they're the same.
While the data underlying Figure 5C and Figure EV4D are the same, the plots are showing different aspects of the results. Figure 5C is a summary heatmap, showing the averages of all the effects in Figure EV4D) whereas Figure EV4D displays the effects of the mutants under each of the conditions tested. The scales are set based on the maximum and minimum relative effect for each plot, which are different between the two plots.
Lines 386-388 state that the authors propose that plant metabolism is controlled via a distributed system... large collections of TFs (similar statement in lines 405-407). That's kinda the central thesis of biological networks that biological systems are dominated by distributed control (very few "hub" genes), so this research provides support for that rather than being a novel proposed mechanism.

Summary
The study aims to systematically characterize the complexity of metabolic networks from the perspective of transcriptional regulation in multicellular organisms. Specifically, the authors hypothesized that Arabidopsis metabolism is orchestrated by TFs that regulate both primary and specialized metabolism.
Although Y1H suffers from false positive and false negatives, the author performed Y1H screening between 1930 TFs and 220 promoters across 11 central carbon metabolism pathways and one specialized metabolism pathway (aliphatic glucosinolate biosynthesis). The authors revealed that 90% of TFs bind to promoters of metabolic genes and all TFs bind to promoters of multiple metabolic pathways.
Then the authors tested the regulatory capacity of 4 selected TFs by performing transcriptome analyses of DEX-inducible lines of 4TFs. The results showed that the predicted target enzyme genes were enriched among the DEGs in the respective DEX-inducible lines.
The authors focused on TCA cycle isozymes and showed the enrichment of certain TF families in isozymes localized to specific cellular compartments. Coexpression relationship between TF and TCA cycle isozyme genes across 5 microarray datasets supported the idea that regulation of TCA cycle genes is highly conditional.
The phenotypic analyses of mutant alleles of 17 TFs within the TCA cycle sub-network demonstrated the importance of the regulatory module in plant growth and its response to salt stress as well as the importance of TFs in coordination of TCA cycle function in different organs and conditions.

General remarks
In the case of primary metabolism in plant, transcriptional regulation is not obvious compared to specialized secondary metabolism in which a relatively small number of TFs regulate the most part of enzyme genes involved. This study clarified interaction of TFs and promoters of central carbon metabolism genes (and aliphatic glucosinolate biosynthetic genes) in a genome-scale. It is surprising that an average of 125 transcription factors are bound to one promoter. The authors performed well-designed phenotypic analyses focusing of sub-network of TCA cycle regulation to validate the finding from Y1H. This study provides useful datasets to understand complicated regulatory mechanisms of primary and specialized metabolism.
Minor comments -Please carefully check the numbers of promoters and TFs used. For example, line130 says "224 promoters", while 203 genes (except 22 aliphatic glucosinolate genes) are listed in Table EV1. Similarly, line138 says "additional 1227 TFs", while 1225 TF genes are listed in TableEV2.
We thank the reviewer for their careful review of our data. Table EV1 summarizes AGIs, gene names, promoter lengths and cloning primers for 203 genes. One of the gene promoters, promoter of AT3G54050, was cloned with different primers and subsequently screened. We included both versions and the resulting Y1H contains TFs detected by both bait strains. We also corrected these numbers in lines 133 and 143 to reflect the number in Table EV2 (1224 TFs + one duplicate).
-line167-177 If I understand it correctly, regional enrichment can be clarified by a hypergeometric test. I missed concrete description of the method and result of this test.
This summary of how aliphatic GSL is integrated with central carbon metabolism is an extension of the previous paragraph. The method involves calculating the significance of association between two pathways, for all pairwise combinations of pathways, followed by ordered ranking of the adjusted p-values of the Fisher's exact test (hypergeometric distribution) to determine the metabolic pathways that share the most TFs with aliphatic GSL. We have added a reference to the result table (Dataset EV4) as well included a description of our methods in the section "Pairwise Pathway Comparison" (lines 548 to 556).
-line203-217 I got a bit confused in understanding of Fig.3C-E. First of all, Panel C and Panel D seem reversed. "Gene ontologies associated with the reductive pentose phosphate pathway, glycine biosynthesis, and cysteine biosynthesis were over-represented more than expected by chance in the significantly up-regulated DEGs upon GR induction of CHA19, LBD16 and WRI3". I could not find such a result in Panel D. There is no GO terms "glycine biosynthesis" and "cysteine biosynthesis" in the figure. "reductive pentose phosphate pathway" does not seem enriched in GR-LBD16. More detailed explanation is better to be added in Fig.3E legend.
We have updated Panel C and Panel D to match the figure legend and to focus on GO terms related to metabolism. We have edited the manuscript text to call out explicitly the enriched GO terms for each GR-TF in line 217 to 222. We included more detailed explanation for Fig. 3E legend (line 770 to 775).
-line266 I think "TF/TCA target interaction" is better to be "TF/TCA target coexpression" to avoid readers' confusion. "interaction" is also used to explain TF-promoter binding in Y1H.
We edited the text to clarify that we are describing both TF-promoter binding interactions and co-expression (line 286 to 287).
-line281-282 "root length of the wri3 loss-of-function mutant allele is shorter compared to wild type (Fig 5C-E)." In Fig.5E, the root length of wri3 looks longer than that of WT in control condition. -line316-317 (Fig.5C) may be (Fig.5D).
The text has been revised to call out the correct figure panels (Fig 5B & 5D) (line 345).  Fig. 6A have been added (line 377 to 379).

Abbreviations of those genes as shown in
-line618 "rifampin" may be "refampicin". This error has been corrected to "rifampicin" in line 670.
-On which database did gene annotation depend?
The methods section, Gene Ontology Enrichment Analysis, has been updated to include version of gene annotation database (2019-10-08) (see lines 594 to 595).
- Fig.1 Aliphatic glucosinolate pathway was not found in Panel A. Fig 1A. -In Fig.5C, the relative effect of atbpc4 on root length is almost zero, while that in Fig  - Fig.EV3 legend TF-TCA target gene "interactions" is better to be "coexpression".

Aliphatic glucosinolate (Aliphatic GSL) has been added to
Text has been changed to "co-expression" in line 861.
-English proofreading may be recommended for some points.
The co-corresponding authors have reviewed the text for remaining grammatical errors.
- Table EV3 For example, MYB28 bound to a limited number of aliphatic GSL genes. I wonder why MYB28 did not bind to CYP79F1, MAM1, etc.
As the reviewer points out, in other in planta assays published by several groups, MYB28 does bind most of these promoters. Our current thoughts about why MYB28 seems to have a false negative issue in Y1H is that MYB28 may require bHLHs like MYC2, AP2/ERFs like RAP2.2 or a complex of these TFs to maximize its binding strength. This would be akin to the flavonoid and anthocyanin MYBs that use a tripartite TF complex to maximize binding. -

B-Statistics and general methods
the assay(s) and method(s) used to carry out the reported observations and measurements an explicit mention of the biological and chemical entity(ies) that are being measured. an explicit mention of the biological and chemical entity(ies) that are altered/varied/perturbed in a controlled manner.
a statement of how many times the experiment shown was independently replicated in the laboratory.
Any descriptions too long for the figure legend should be included in the methods section and/or with the source data.
In the pink boxes below, please ensure that the answers to the following questions are reported in the manuscript itself. Every question should be answered. If the question is not relevant to your research, please write NA (non applicable). We encourage you to include a specific subsection in the methods section for statistics, reagents, animal models and human subjects.
definitions of statistical methods and measures: a description of the sample collection allowing the reader to understand whether the samples represent technical or biological replicates (including how many animals, litters, cultures, etc.).
The data shown in figures should satisfy the following conditions: Source Data should be included to report the data underlying graphs. Please follow the guidelines set out in the author ship guidelines on Data Presentation.
Please fill out these boxes ê (Do not worry if you cannot see all your text once you press return) a specification of the experimental system investigated (eg cell line, species name).
Replication numbers for all experiments were designed to provide significant power for moderate effect sizes and modest power for small effect sizes using a presumed broad-sense heritability of about 15% based on previous publications and data for these traits.
graphs include clearly labeled error bars for independent experiments and sample sizes. Unless justified, error bars should not be shown for technical replicates. if n< 5, the individual data points from each experiment should be plotted and any statistical test employed should be justified the exact sample size (n) for each experimental group/condition, given as a number, not a range; Each figure caption should contain the following information, for each panel where they are relevant:

Captions
NA Any plants that did not survive the full duration of phenotyping assay were excluded from analysis. Two 35S:GR-WRI3 RNA-Seq libraries, L8_dex_rep1 and L10_mock_rep2, were excluded from downstream analysis due to being outliers in the principal component analysis of normalized read counts.
Mutant lines were assigned random aliases for experiments and analysis. Gene mutant names were linked to the data after the analysis.

Manuscript Number: MSB-2021-10625
Yes We used generalized linear mixed models when determining statistical significance in response variables between groups because generalized linear mixed models allow for more flexibility in the data, particularly when distribution of data deviates from normal and variance is not constant between groups.

NA
Phenotyping assays were performed in a semi-random block design. For both the TCA feeding and the salt response assays, each plate/flat had a wild type Col-0 control to which each mutant was compared. Samples for RNA-Seq were plated on individual plates and placed in a random order in the dark growth chamber. After mock or Dex treatment, plates were placed in the same random order as before treatment.

Data
the data were obtained and processed according to the field's best practice and are presented to reflect the results of the experiments in an accurate and unbiased manner. figure panels include only data points, measurements or observations that can be compared to each other in a scientifically meaningful way.