Single‐cell dynamics of chromatin activity during cell lineage differentiation in Caenorhabditis elegans embryos

Abstract Elucidating the chromatin dynamics that orchestrate embryogenesis is a fundamental question in developmental biology. Here, we exploit position effects on expression as an indicator of chromatin activity and infer the chromatin activity landscape in every lineaged cell during Caenorhabditis elegans early embryogenesis. Systems‐level analyses reveal that chromatin activity distinguishes cellular states and correlates with fate patterning in the early embryos. As cell lineage unfolds, chromatin activity diversifies in a lineage‐dependent manner, with switch‐like changes accompanying anterior–posterior fate asymmetry and characteristic landscapes being established in different cell lineages. Upon tissue differentiation, cellular chromatin from distinct lineages converges according to tissue types but retains stable memories of lineage history, contributing to intra‐tissue cell heterogeneity. However, the chromatin landscapes of cells organized in a left–right symmetric pattern are predetermined to be analogous in early progenitors so as to pre‐set equivalent states. Finally, genome‐wide analysis identifies many regions exhibiting concordant chromatin activity changes that mediate the co‐regulation of functionally related genes during differentiation. Collectively, our study reveals the developmental and genomic dynamics of chromatin activity at the single‐cell level.

Transaction Report: (Note: With the exception of the correction of typographical or spelling errors that could be a source of ambiguity, letters and reports are not edited. Depending on transfer agreements, referee reports obtained elsewhere may or may not be included in this compilation. Referee reports are anonymous unless the Referee chooses to sign their reports.)
Figure 1L - why pick magic number 5 with 67%? And the conclusion drawn from this, 'Given that many embryonic cells have similar developmental fates, this means the cellular CAL is sufficient for defining cellular states at a sub-tissue level', is a bit inflated?
Figure 2C/S8A, B - 'Thus, cellular CAL diversifies as lineage unfolds and recapitulates the kinetics of lineage-coupled fate differentiation' - 'recapitulates kinetics' here is a bit overstated, as the data don't show that the CAL has similar information content to expression.
'Furthermore, analysis of the cellular gene expression of L1 stage animals (Liu et al., 2009) consistently obtained an identical pattern (Appendix Fig S15A). This result thus reveals a chromatin basis for L-R functional asymmetry.' Identical pattern or just similar pattern?
Other suggestions
Some of the early figures (1,2,3) could be broken into 2 smaller figures to improve their readability.
Appendix Figure S10 - the most informative number of lineage groups could be used - and the heatmap combined with another figure?
Appendix Figure S1A: If possible, include in the legend that green indicates the 113 strains examined and that the 18 shown were PCR verified in this paper.
In methods, the 'Analysis of the cellular dynamics and implications of CAL during cell lineage differentiation' section can be moved after the 'Comparison of position-effects on GFP with chromatin features' to keep with the flow of the paper. Similarly the Perturbation of writers of histone modifications could be moved ahead.
Figure 3C - would be better to show CAL divergence plots instead of Circos?
Page 4 "As compared to existing sequencing-based epigenomic approaches" - position effect analysis is not mutually exclusive with sequencing; really what this section needs is a brief mention of the advantages (e.g. cell ID/position) of imaging based vs sequencing approaches.
Page 10 "Together, these results revealed a chromatin basis for the tissue-based convergence of regulatory states in cells originating from diverse lineages." - this isn't quite right. The authors showed that fate is correlated with chromatin activity convergence, not that chromatin controls convergent expression (could just as easily be the reverse or independent control by a third factor given the data presented).
Page 14 Charest et al (2020) showed that several other neurons thought previously to be symmetric have some molecular asymmetry, including AUA (listed here as an example of a symmetric neuron).
Existence and extent of TADs in C. elegans is controversial - indeed the paper cited identified "TAD-like" regions on the X chromosome but not autosomes, but it is unclear if these are analogous to TADs seen in other organisms. More recent work has suggested that C. elegans TADs may exist on autosomes but be much smaller (on the order of 5-15 kb). This should be addressed in the text of this section so it is clear what scale of "TADs" were analyzed here.
Reviewer #2: The authors take a powerful approach to investigating the influence of chromatin environment on gene expression. They analysed the expression of the same reporter integrated into 113 sites at the level of single lineage-resolved cells during embryogenesis. They found abundant position effects on reporter expression level and used the expression data to infer a chromatin activity landscape (CAL) value for individual cells at each insertion site. The authors then used similarities and differences in CAL across lineages and cell types to make conclusions about the effects of chromatin regulation on fate patterning. This is a good approach and the data on gene expression values appear to be of high quality. However, many analyses are difficult to understand, often because of insufficient information in the text, legends, and methods. In addition, many conclusions made are stronger than the data warrant.
The authors need to be more careful and considered in their conclusions and to explain their analyses more clearly. With considerable re-writing, this paper could be of wide interest.
Below I give examples of issues that the author need to improve.
1. Most analyses in the paper rely on their CAL metric, but this metric is not clearly explained. As far as I can see, CAL is a metric based on the expression level of the same reporter gene (ubiquitous eef-1 promoter driving GFP) in embryonic cells when integrated into 113 genomic sites. In the first part of the results, the authors need to include much more explanation of what CAL is. In the text, they simply say that their assays "allowed a cell-by-cell integration of GFP expression levels at different genomic positions in lineage-equivalent cells (Fig 1F). This enabled us to construct a chromatin activity landscape (CAL) in each lineaged cell (Fig 1G, Table EV3)." The method for calculating CAL is unclear. Is it simply the expression level? For ABalpaapaa in SS343 (Chr I, 0.53), there are expression values for two embryos: .650484648 and 2.968127764. The CAL for this cell is 2.2318772. How was this derived? The authors also need to explicitly explain that they use gene expression measurements to infer effects of the chromatin landscape and that they will use CAL values as a proxy for chromatin state.
2. Some additional basic information about the reporter is needed. For example, was the eef-1 reporter indeed ubiquitously expressed as expected, at least in some integration positions? In the methods, they refer to S2F to support that the eef-1A reporter is expressed in most cells from the 350 cell stage and onwards. Indeed, nearly every cell does appear to show expression from at least one integration site, but in their analyses they only considered cells where expression was seen in 60% of the embryos: "only cells in which GFP is expressed in more than 60% of the embryos were considered expressing cells", implying non-ubiquitous expression. This cutoff and its rationale need to be discussed and justified in the main text. From first principles, it seems to me that cells that show a large divergence in expression would be highly informative. In addition, the authors should explain what types of expression changes they observed. Was expression from some insertions completely lost in specific cells and/or lineages, or were effects more often changes in levels?
3. In many places, the authors use strong language for their conclusion coupled with vague statements. The nature and limits of their conclusions need to be more precisely and accurately made. Clarifying what CAL means early on in the paper will help. Examples: p. 9 we "found that the inferred CAL transitions immediately result in significantly larger transcriptome divergences" Here they strongly conclude a cause and effect relationship. They have shown a correlation, not a causation. At the end of this paragraph, they write "Thus, chromatin transitions predict anterior-posterior asymmetry during lineage progression." This should be "inferred chromatin transitions." On p. 13, they write "This result thus reveals a chromatin basis for L-R functional asymmetry." Here it would be more appropriate to change "reveals" to "supports." Similar issues are found throughout the paper.
4. I could not follow the first paragraph of the section "Chromatin diversification predicts lineage-coupled fate determination."
5. I didn't understand their concluding sentence of the first paragraph on p. 12: "Intriguingly, analysis of lineage-resolved expression data from cells of L1 stage animals (Liu, Long et al., 2009) demonstrated that the lineage effects not only result in gene expression heterogeneity but also are stably maintained after hatching (Appendix Fig S13D)."
6. In the discussion, "Through a multidimensional analysis of the lineage-resolved chromatin landscape, we reveal that the regulation of lineage commitment, anterior-posterior asymmetry, tissue fate specification, cell heterogeneity, and bilateral symmetry establishment are readily inferable from cellular chromatin." implies that the authors have specific information on cellular chromatin state.
7. The legends need much more information in order to understand the figures, e.g., Figure 1G "CAL of representative embryonic cells." There is no explanation of colors. 1H on histone modifications is impossible to understand without reading the methods. Most figure legends need more information.
8. In the explanation of how GFP was quantified in the methods, the authors say that GFP intensities of the same cell at multiple time points were averaged. How many time points per cell? What is the range?
9. Which TADs were used from Crane et al 2015?
10. Images in Figure S5 B and C are hard to see. What statistical test was performed when comparing wt and mutant data?
However, I am not convinced by many of the other conclusions drawn by the authors, which I detail below. Most importantly, the authors should temper their conclusions because this is a reporter-based system with a strong promoter and it is not known if the conclusions will hold true for endogenous sequences. In fact, from Figure S4F, the overall correlation with endogenous gene expression appears poor. This result casts some doubt on the physiological relevance of the reporter data. One practical solution may be to only consider reporter data from an integrated location where there was high correlation with nearby unmodified endogenous genes.
Throughout the manuscript, the authors should not assume that chromatin changes precede gene expression and fate changes as the embryo develops, because this has not been proven. They should avoid statements to this end (e.g. chromatin regulation of cell differentiation etc), and instead state their apparent correlations plainly.
General comments: The language used in the title and abstract is vague. These elements should be concise and clearly written, and avoid new unexplained terminology.
Not all figure panels are referenced in the main text, leaving the reader wondering what conclusions can be drawn from these missing panels. Figure legends are under-described, especially in the supplement. They should explain how each experiment was done briefly but clearly and what conclusions can be directly drawn from each.
New terminology like CAL and CDCA which are critical to the message of the manuscript need to be thoroughly and clearly described in the main text.
The code used to analyze the data and produce the figures should be deposited in an appropriate online location.
Specific comments: Pg 5 "to explore chromatin regulation of cell differentiation in C. elegans," This is very vague, and the data is only correlative, so the authors should be careful not to imply causation without evidence. This wording implies chromatin regulates cell differentiation. What they are doing is cataloging the expression levels of a reporter in ~100 different locations using previously generated strains, not measuring cell differentiation. They can say that cells of different lineages and developmental ages have different reporter expression patterns (or chromatin activity landscapes to use the authors' term).
Pg 5 "These findings will contribute to a systems-level mechanistic understanding of how chromatin regulates cell lineage differentiation in a metazoan embryo across cell lineages, tissue types, and symmetric morphological organization." I disagree with this statement. The data in the manuscript does not support this claim. There are no proofs in the manuscript of chromatin regulating lineage differentiation.
Pg 6 "In total, we measured GFP expression in 268 embryos to quantify chromatin activity at 113 genomic positions" In main text explicitly state how many times each strain was recorded and analyzed. It appears that only duplicates were performed (from Figure S2). From the duplicate data in Figure S2, the correlation can range from 0 to 1. The authors should discuss this variation in the main text. These experiments form the basis of the entire manuscript and should be conducted in biological triplicates to be convincing. The replicates should be added to Figure S3 so that readers can visually assess the variability.
All images of the GFP reporters should have the nuclear mCherry marker side-by-side as a control.
It is perplexing that reporter expression correlates with histone modification status of the whole embryo (as in mutant and RNAi experiments Figure S5) but not the expression levels of nearby genes ( Figure 1I, S4). I do not agree with statements in the text such as "the measured chromatin activity is highly concordant with endogenous gene expression in the same cells" (pg 7) because the data show low Pearson R values 0.3-0.6. It appears that the reporter activity level at its various integrated locations is a poor predictor of endogenous gene activity.
The term CAL and what data it encompasses should be more clearly stated in the main text.
I am not convinced by the statement that "Given that many embryonic cells have similar developmental fates, this means the cellular CAL is sufficient for defining cellular states at a subtissue level." (pg 7) How does the data in Figure 1L prove this?
The authors state that "chromatin transitions predict anterior-posterior asymmetry during lineage progression", however I see no data showing CAL divergence predicts the fate of anterior cells or posterior cells in the embryo.
Figure S10 is quite interesting. Can this data (fraction of positions with distinct on/off states) be normalized to lineage distance? This will help to show which results are expected/surprising, and both cases should be discussed.
The authors should avoid statements such as the underlined: "Collectively, these findings support that cellular chromatin diversifies considerably during early lineage progression and systematically predicts lineage-coupled fate differentiation, including the global lineage-dependent diversification of cell fates, anterior-posterior fate asymmetry, and the establishment of lineage-specific fates." (pg 9) I see no evidence at this point in the manuscript that a particular CAL can predict a particular fate. It is unclear how different CALs are between different lineages. The authors should describe more clearly how CAL divergence is calculated in the main text.
The data in Fig S14 is

Response: We thank the reviewer for the kind words and for the accurate summary of our work.
As you can see in the revised manuscript, we have significantly revised the text and figures to present our findings more clearly and accurately.
18th Jan 2021 1st Authors' Response to Reviewers
My major concern that should be addressed before acceptance is that there is not sufficient discussion and analysis of the relative importance of global changes between insertions in expression levels that do not alter the underlying pattern vs changes that cause some cells to express the reporter more than others. It seems plausible that the reporter itself is biased to certain lineages or fates (such as intestine and skin), and that some of the "lineage differences" identified here are actually artifacts of lower-expressing lineages fluctuating above or below the detection threshold depending on global scaling of the reporter activity, compounded with measurement and biological noise. Several figures (1F, 1D, S1, S2F, S3, S5) as well as an examination of Table EV2 seem to support the idea that the overall pattern is fairly consistent between integrants. At a minimum, the authors should show how the distribution of Pearson correlation (as a proxy for "pattern" as opposed to "level") varies between integrants as opposed to between replicates for the same integrant (in other words a version of Fig. S2E but between integrants instead of between replicates). To be clear, I don't think even if global effects are dominant that this would render the paper uninteresting; in contrast, I think this might even be a more fundamental result if true. However, regardless, I think this issue must be carefully considered and addressed in the text.

Response:
Thank you for raising this concern. In the revised manuscript, we have systematically addressed this concern in the Results section (see Cellular chromatin is dynamic and informative for defining cellular states), and also added several figure panels in Fig 2. First, as suggested by the reviewer, we directly compared GFP expression divergence (measured as Euclidean distance) between replicates and between different positions. Both quantitative levels and on/off expression patterns showed that the GFP expression divergences between different integration sites are significantly larger than those between replicates at a given position (Fig 2B). This result suggests that the eef-1A.1 promoter sequence does not significantly dominate the position-effects on GFP expression. Second, we took another approach to assess potential bias of the eef-1A.1 promoter. If the promoter sequence is biased towards certain cells, we would expect GFP to be constitutively
3) Each panel corresponds to the result for cells at a certain lineage distance. We have added lineage distance labeling in these figure panels (Fig 3D).
Transition points of CAL and inter/intra CAL divergence: I was a bit unclear on whether transition points are calculated for mother-daughter pairs only or for mother-granddaughter and mother-great-granddaughter pairs also (as implied by Fig S9A/B).
(1) We did not directly compare a mother cell to its descendants. Instead, we applied a retrospective approach to infer chromatin transitions by comparing chromatin landscapes observed in all traced terminal cells. We have added this information (previously described in the Methods) to the Results section for clarity. "A cell division is defined as a transition point for the chromatin landscape if the chromatin divergences between terminal cells generated by different daughter cells (inter-divergence) are significantly higher than those between terminal cells generated by the same daughter cell (intra-divergence) (Fig EV3D)."
(2) The scatter plot compares the chromatin transition score (x-axis) and A/P fate divergence score (y-axis) associated with each of the 90 early cell divisions. Detailed information can be found in Table EV6.
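The transition-point definition quoted in the response can be illustrated with a small sketch. It assumes each terminal cell is represented by a vector of reporter levels across integration sites, uses Euclidean distance as the divergence measure, and substitutes a label-permutation test for the significance test, which this letter does not name. The function names, the number of permutations, and the seed are illustrative assumptions, not the authors' implementation.

```python
import itertools
import math
import random
from statistics import mean


def euclidean(u, v):
    """Euclidean distance between two equal-length landscape vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def divergence_gap(group_a, group_b):
    """Mean inter-divergence minus mean intra-divergence.

    group_a / group_b hold the landscape vectors of the terminal cells
    descended from each of the two daughter cells of one division.
    """
    inter = [euclidean(u, v) for u in group_a for v in group_b]
    intra = [euclidean(u, v)
             for group in (group_a, group_b)
             for u, v in itertools.combinations(group, 2)]
    if not inter or not intra:
        raise ValueError("need cells on both sides and at least one intra pair")
    return mean(inter) - mean(intra)


def transition_p_value(group_a, group_b, n_perm=2000, seed=0):
    """One-sided permutation p-value for inter > intra divergence.

    Assumption: daughter labels are shuffled among the terminal cells;
    the source does not state which statistical test was used.
    """
    rng = random.Random(seed)
    observed = divergence_gap(group_a, group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if divergence_gap(pooled[:n_a], pooled[n_a:]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Under this sketch, a division would be called a transition point when the p-value falls below a chosen threshold; the threshold itself is also an assumption, and multiple-testing correction across divisions is not shown.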
Can we find an easier term/way to refer to CDCA? The number of acronyms in the paper make it harder to read.

Response: Thank you for the suggestion. (1) We have used the term "chromatin co-dynamic regions" to replace "CDCA". (2) We have also used the spelled-out version of "chromatin activity landscape" instead of "CAL" to reduce the number of acronyms.
A generalized model that could differentiate between convergence and predetermination (mothers are less similar compared to intra-tissue daughters in convergence in contrast to mothers of L/R daughters being similar to each other and the daughters in predetermination). If a cohesive model can be used in
(2) Each bar plot shows the Pnhr-2::GFP expression levels across 13 genomic positions in a cell.

We have added axis labels in the figure.
More clarification of what is being shown in Fig

Response: We meant to use the Circos plots to illustrate the point that cells of the same tissue type exhibit similar chromatin landscapes because these would be more intuitive than a divergence plot. Following the suggestion, we have added a divergence plot and statistics to illustrate the convergence of the chromatin landscapes.
Page 4 "As compared to existing sequencing-based epigenomic approaches" -position effect analysis is not mutually exclusive with sequencing, really what this section needs is a brief mention of the advantages (e.g. cell ID/position) of imaging based vs sequencing approaches Response: Thanks for the suggestion. A brief discussion has been added to mention the advantages of the imaging-based approach.
Page 10 "Together, these results revealed a chromatin basis for the tissue-based convergence of regulatory states in cells originating from diverse lineages." -this isn't quite right. The authors showed that fate is correlated with chromatin activity convergence, not that chromatin controls convergent expression (could just as easily be the reverse or independent control by a third factor given the data presented).
Response: Thank you for this suggestion. We have now softened our statements throughout the manuscript. Specifically, the words "correlate" or "accompanying" are used to replace "chromatin basis" or "chromatin regulation".

Page 14 Charest et al (2020) showed that several other neurons thought previously to be symmetric have some molecular asymmetry, including AUA (listed here as an example of a symmetric neuron)

Response:
We have added this new reference, thanks. Please note that we have moved this section into the
Existence and extent of TADs in C. elegans is controversial - indeed the paper cited identified "TAD-like" regions on the X chromosome but not autosomes, but it is unclear if these are analogous to TADs seen in other organisms. More recent work has suggested that C. elegans TADs may exist on autosomes but be much smaller (on the order of 5-15 kb). This should be addressed in the text of this section so it is clear what scale of "TADs" were analyzed here.
This is a good approach and the data on gene expression values appear to be of high quality. However, many analyses are difficult to understand, often because of insufficient information in the text, legends, and methods. In addition, many conclusions made are stronger than the data warrant. The authors need to be more careful and considered in their conclusions and to explain their analyses more clearly. With considerable re-writing, this paper could be of wide interest.
Response: We thank the reviewer for these positive comments about the approach and findings and for the valuable suggestions to improve the manuscript. We have significantly revised and expanded the manuscript by providing necessary background information and analysis details in the main text and figure legends. Furthermore, we have systematically softened our language when describing the conclusions to accurately represent the data. Specifically, the words "correlate" or "accompanying" are used to replace "chromatin basis" or "chromatin regulation".
Below I give examples of issues that the author need to improve.
1. Most analyses in the paper rely on their CAL metric, but this metric is not clearly explained. As far as I can see, CAL is a metric based on the expression level of the same reporter gene (ubiquitous eef-1 promoter driving GFP) in embryonic cells when integrated into 113 genomic sites. In the first part of the results, the authors need to include much more explanation of what CAL is. In the text, they simply say that their assays "allowed a cell-by-cell integration of GFP expression levels at different genomic positions in lineage-equivalent cells (Fig 1F). This enabled us to construct a chromatin activity landscape (CAL) in each lineaged cell (Fig 1G, Table EV3)." The method for calculating CAL is unclear. Is it simply the expression level? For ABalpaapaa in SS343 (Chr I, 0.53), there are expression values for two embryos: .650484648 and 2.968127764. The CAL for this cell is 2.2318772. How was this derived? The authors also need to explicitly explain that they use gene expression measurements to infer effects of the chromatin landscape and that they will use CAL values as a proxy for chromatin state.

Response:
(1) It is correct that the chromatin activity landscape described in this study refers to the
2. Some additional basic information about the reporter is needed. For example, was the eef-1 reporter indeed ubiquitously expressed as expected, at least in some integration positions? In the methods, they refer to S2F to support that the eef-1A reporter is expressed in most cells from the 350 cell stage and onwards. Indeed, nearly every cell does appear to show expression from at least one integration site, but in their analyses they only considered cells where expression was seen in 60% of the embryos: "only cells in which GFP is expressed in more than 60% of the embryos were considered expressing cells", implying non-ubiquitous expression. This cutoff and its rationale need to be discussed and justified in the main text. From first principles, it seems to me that cells that show a large divergence in expression would be highly informative. In addition, the authors should explain what types of expression changes they observed. Was expression from some insertions completely lost in specific cells and/or lineages, or were effects more often changes in levels?

Response:
Thank you for raising this concern.
(1) Yes, when integrated into certain genomic positions, the reporter is expressed in most traced terminal cells, supporting that the promoter at least has the potential to be ubiquitously expressed. In addition, in all cells, we found that the reporter is expressed at a substantial fraction of all integration sites. These results support the ubiquitous nature of the eef-1A.1 promoter.
(2) "only cells in which GFP is expressed in more than 60% of the embryos were considered  (Fig 2A).
Both quantitative levels and on/off expression patterns showed that the GFP expression divergences between different integration sites were significantly larger than those between replicates at a given position (Fig 2B). This result suggests that the eef-1A.1 promoter sequence does not significantly dominate the position-effects on GFP expression.
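The replicate-versus-position comparison described above can be sketched as follows. This is a toy illustration only: it assumes each embryo is summarized as a vector of GFP levels over the same ordered set of cells, and the site labels, data layout, and function names are hypothetical rather than taken from the authors' code. The significance test applied to the two distributions is not shown, since the response does not name it.

```python
import itertools
import math
from statistics import median


def euclidean(u, v):
    """Euclidean distance between two equal-length expression vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def replicate_and_position_divergences(expr):
    """Split pairwise embryo distances into two groups.

    expr maps an integration-site label to a list of replicate embryos;
    each replicate is a list of GFP levels in a fixed cell order.
    Returns (within_site, between_site) lists of distances.
    """
    within_site = []
    for replicates in expr.values():
        within_site += [euclidean(u, v)
                        for u, v in itertools.combinations(replicates, 2)]
    between_site = []
    for reps_a, reps_b in itertools.combinations(expr.values(), 2):
        between_site += [euclidean(u, v) for u in reps_a for v in reps_b]
    return within_site, between_site


# Toy data (invented for illustration): two sites, two replicates each.
toy = {
    "siteA": [[1.0, 2.0, 0.0], [1.1, 1.9, 0.1]],
    "siteB": [[0.0, 0.5, 3.0], [0.1, 0.4, 3.1]],
}
within, between = replicate_and_position_divergences(toy)
```

If position effects dominate over replicate noise, the between-site distances should sit well above the within-site ones, which is the pattern the response reports for Fig 2B.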
While GFP expression was generally consistent between experiment replicates (Fig EV1G) (Fig 2E and F), indicating the information conferred by chromatin is rich."
3. In many places, the authors use strong language for their conclusion coupled with vague statements. The nature and limits of their conclusions need to be more precisely and accurately made. Clarifying what CAL means early on in the paper will help. Examples: p. 9 we "found that the inferred CAL transitions immediately result in significantly larger transcriptome divergences" Here they strongly conclude a cause and effect relationship. They have shown a correlation, not a causation. At the end of this paragraph, they write "Thus, chromatin transitions predict anterior-posterior asymmetry during lineage progression." This should be "inferred chromatin transitions." On p. 13, they write "This result thus reveals a chromatin basis for L-R functional asymmetry." Here it would be more appropriate to change "reveals" to "supports." Similar issues are found throughout the paper.
Response: Thank you for pointing this out. We have systematically revised or removed overstatements throughout the manuscript to acknowledge that correlation rather than causality was observed between chromatin dynamics and cell differentiation. In particular, we have revised the three statements the reviewer pointed out as follows:

"Indeed, transcriptome divergences between daughter cells showing chromatin transitions were significantly larger than between those without transitions (Fig EV3F)."
(Fig 3A). Lineage relationship was quantified as cell lineage distance, which was defined as the total number of cell divisions separating cells from their lowest common ancestor (Fig EV3A). In the majority of cases, higher chromatin divergences were observed between cells with a large lineage distance and, globally, chromatin divergence increased progressively with cell lineage distance (Fig 3B and C). Thus, in general, chromatin diversifies gradually across cells during lineage progression.
Fig EV3B).
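The lineage distance defined in the quoted passage can be computed directly from standard C. elegans lineage names. This is a hedged sketch that assumes both cells descend from the same founder blastomere, so that each extra letter in a name (a, p, l, r, d, v) corresponds to one division. It reads "total number of cell divisions separating cells from their lowest common ancestor" as the sum over both branches, which is an interpretation. The constants and function names are illustrative.

```python
# Founder blastomeres whose descendants are named by appended letters.
FOUNDERS = ("AB", "MS", "E", "C", "D")
DIVISION_LETTERS = set("aplrdv")


def split_name(name):
    """Split a lineage name into (founder, division path)."""
    for founder in FOUNDERS:
        if name.startswith(founder):
            path = name[len(founder):]
            if set(path) <= DIVISION_LETTERS:
                return founder, path
    raise ValueError(f"not a founder-derived lineage name: {name!r}")


def lineage_distance(cell_a, cell_b):
    """Divisions from the lowest common ancestor, summed over both cells.

    Assumption: cells from different founders are not handled, because
    that would require the full early lineage tree (P0, EMS, P2, ...).
    """
    founder_a, path_a = split_name(cell_a)
    founder_b, path_b = split_name(cell_b)
    if founder_a != founder_b:
        raise ValueError("cells from different founders need the full tree")
    shared = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)
```

For example, two sister cells such as ABalpaapaa and ABalpaapap share the mother ABalpaapa, so each is one division away from the common ancestor.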

This analysis showed that, generally, as the cell lineage unfolds, cells differentiate progressively. The fate divergences between cells were proportional to their lineage distances at different developmental stages, similar to what was observed with chromatin divergences (Fig 3C and Fig EV3C). To further demonstrate that chromatin dynamics were associated with fate changes, we directly analyzed the relationship between the two using cells with identical lineage distances. The results showed that a higher chromatin divergence was generally associated with a higher fate divergence, especially between cells at a modest lineage distance (from 6 to 14) (Fig 3D). Thus, chromatin dynamics during lineage progression correlate with lineage-coupled cell differentiation".
7. The legends need much more information in order to understand the figures, e.g., Figure 1G "CAL of representative embryonic cells." There is no explanation of colors. 1H on histone modifications is impossible to understand without reading the methods. Most figure legends need more information.

Response: We apologize for the confusion. The color gradient represents GFP expression level, which is used to indicate chromatin activity. We have moved all text that describes quality controls from the Methods to the Results section to make the figures in this section easy to follow. Finally, we have significantly expanded all figure legends to provide more context for understanding the results.
8. In the explanation of how GFP was quantified in the methods, the authors say that GFP intensities of the same cell at multiple time points were averaged. How many time points per cell? What is the range?

Response: We have provided information on the time points in the main text and in a figure panel (Fig EV1A).
"On average, GFP expression was measured at 38 consecutive time points (range 13-83) for each traced cell (Fig 1D and Fig EV1A)
10. Images in Figure S5 B and C are hard to see. What statistical test was performed when comparing wt and mutant data?

Response: We have enlarged these figure panels (Fig EV2E and F) to better present the results. Cell-by-cell comparisons of GFP expression levels were performed between wt and perturbed embryos using the Wilcoxon signed-rank test. We have included this information in
However, I am not convinced by many of the other conclusions drawn by the authors, which I detail below. Most importantly, the authors should temper their conclusions because this is a reporter-based system with a strong promoter and it is not known if the conclusions will hold true for endogenous sequences. In fact, from Figure S4F, the overall correlation with endogenous gene expression appears poor. This result casts some doubt on the physiological relevance of the reporter data. One practical solution may be to only consider reporter data from an integrated location where there was high correlation with nearby unmodified endogenous genes.

Response: Thank you for the suggestion. We have systematically tempered our conclusions to more accurately summarize the presented data. Fig S4F is two-fold. First,

Response: We have significantly expanded the legends for all figures to provide all essential information.
New terminology like CAL and CDCA which are critical to the message of the manuscript need to be thoroughly and clearly described in the main text.

Response: Thanks for this reminder. We have now added a detailed definition and description of these new terms when they are first presented.
The code used to analyze the data and produce the figures should be deposited in an appropriate online location.

Response: We have provided all code on GitHub (https://github.com/IGDB-DuLab/Zhaochromatin)
Specific comments: Pg 5 "to explore chromatin regulation of cell differentiation in C. elegans," This is very vague, and the data is only correlative, so the authors should be careful not to imply causation without evidence. This wording implies chromatin regulates cell differentiation.
Response: Thank you for pointing this out. We have revised this statement and other related ones.
What they are doing is cataloging the expression levels of a reporter in ~100 different locations using previously generated strains, not measuring cell differentiation. They can say that cells of different lineages and developmental ages have different reporter expression patterns (or chromatin activity landscapes to use the authors' term).

Response: We have changed all statements accordingly.
Pg 5 "These findings will contribute to a systems-level mechanistic understanding of how chromatin regulates cell lineage differentiation in a metazoan embryo across cell lineages, tissue types, and symmetric morphological organization." I disagree with this statement. The data in the manuscript does not support this claim. There are no proofs in the manuscript of chromatin regulating lineage differentiation.

Response: Thank you for pointing this out. This statement has been replaced with "Our findings contribute to a systems-level understanding of the developmental and genomic dynamics of chromatin at the single-cell level."
Pg 6 "In total, we measured GFP expression in 268 embryos to quantify chromatin activity at 113 genomic positions" In main text explicitly state how many times each strain was recorded and analyzed. It appears that only duplicates were performed (from Figure S2). From the duplicate data in Figure S2, the correlation can range from 0 to 1. The authors should discuss this variation in the main text. These experiments form the basis of the entire manuscript and should be conducted in biological triplicates to be convincing. The replicates should be added to Figure S3 so that readers can visually assess the variability.

Response:
(1) In the main text, we have added a discussion on the variability of reporter expression at certain integration sites in certain cells and added a figure (Appendix Fig S2) to illustrate this.
(2) We hope it can be recognized that lineage tracing and curation are very labor-intensive; we thus performed only duplicates in the majority of the cases to balance workload and data reproducibility. Nevertheless, we found that the Pearson correlation coefficient of GFP expression between duplicates is highly comparable to those having a larger number of replicates, suggesting that the estimated reproducibility between replicates is generally accurate.
(3) We have included a 3D visualization of cellular GFP expression for all replicates (Appendix Fig 1) as suggested.
All images of the GFP reporters should have the nuclear mCherry marker side-by-side as a control.

Response: Thanks for the suggestion. We have added the nuclear mCherry images side-by-side in all figure panels showing GFP expression images (Fig 1B, Fig 1C, Fig 5C, Fig EV2E, Fig EV2F).
It is perplexing that reporter expression correlates with histone modification status of the whole embryo (as in mutant and RNAi experiments Figure S5) but not the expression levels of nearby genes (Figure 1I, S4). I do not agree with statements in the text such as "the measured chromatin activity is highly concordant with endogenous gene expression in the same cells" (pg 7) because the data show low Pearson R values 0.3-0.6. It appears that the reporter activity level at its various integrated locations is a poor predictor of endogenous gene activity.
Response: As detailed in our response to another relevant concern raised by this reviewer (P26), the correlation between GFP and endogenous expression is complicated by there being differential cis-elements associated with the reporter and endogenous genes. Regarding the correlation coefficient over a large genome interval (500 kb) in single cells, we meant to use the average endogenous gene expression over large genomic regions as proxy for chromatin activity because in doing so, the effects of differential promoter activities would be normalized.
Although the correlation is not very high, it is considerable and significant, making it reasonable to use this result to support that the measured position-effects indicate chromatin activity. We would also like to point out that single-cell gene expression data generated by scRNA-seq tend to be less reliable than bulk cell data, likely affecting the robustness of the correlation.
The term CAL and what data it encompasses should be more clearly stated in the main text.

Response: We have added an explanation of the chromatin activity landscape in the main text.
P7 -"Using GFP expression levels at different genomic positions as a sensor of chromatin activity, we constructed the distribution of chromatin activity across 113 genomic positions (termed the chromatin activity landscape) for all lineage-traced cells (Fig 1F-H and Table EV3). For multiple replicates of GFP expression at the same integration site (range 2-8), only those expressed in more than 60% of replicates were considered as being expressed, and levels were averaged to represent the consensus chromatin activity."

I am not convinced by the statement that "Given that many embryonic cells have similar developmental fates, this means the cellular CAL is sufficient for defining cellular states at a sub-tissue level." (pg 7) How does the data in Figure 1L prove this?

Response:
We apologize for this misunderstanding. We have significantly revised this section to better illustrate the point that cellular chromatin landscapes could distinguish cells at the sub-tissue level.

P12 -"We finally examined to what extent the cellular chromatin landscape can distinguish individual cells. We compared the GFP expression patterns across cells and calculated for each pair-wise comparison the number of integration sites at which the GFP expression status is distinct. Intriguingly, for most cell-cell comparisons, the binary GFP expression at many integration sites was distinct, and at a considerable number of integration sites, the expression status can distinguish a cell from many other cells (Fig 2G) (Table EV7)

The methodology for CDCA calculations is unclear and therefore the relevance of this data is not apparent. The authors should avoid vague terminology like "inter-cell dynamics".
Response: Sorry for the confusion. We have revised the text and added a figure panel (Fig 6A) to explain the idea better. (Fig 6A), and identified nine clusters of genomic regions (720 pairs) exhibiting similar chromatin changes across cells, which we termed chromatin co-dynamic regions (Fig 6B and Table EV10)."

26th Feb 2021 1st Revision -Editorial Decision
Thank you for sending us your revised manuscript. We have now heard back from two of the three reviewers who were asked to evaluate your study. Unfortunately, after a series of reminders we did not manage to obtain a report from Reviewer #2. In the interest of time, and since the other two reviewers' recommendations are quite similar, I prefer to make a decision now rather than further delaying the process. You will see from the comments below that the reviewers think that while the majority of the concerns have been addressed, several issues remain. We would therefore ask you to address these issues in the revised manuscript. On a more editorial level, please address the following issues.

Reviewer #1:
Overall, clarification of terms and addition of some more methodology in the text has improved the manuscript. Addition of specific examples to illustrate the point is also helpful. More details and labeling in the figures also helps.
My main continuing concern is regarding my major point. The figure shown in response doesn't really address this point as Euclidean distance often can be a function of differences in magnitude as well as pattern. This can also be true for binarized data due to threshold effects. A similar analysis but using Pearson correlation instead of Euclidean distance would be required to answer the question. I also suggest showing a clustered heatmap of the quantitative measurements (e.g. with cells on one axis and integrants on the other axis).
To illustrate how this might look, I've included a heatmap of expression (as reported in Table EV3) with integrants (rows) clustered and cells (columns) ordered as in the table. From this table I think it is clear that the vast majority of variance in the dataset is differences in the overall intensity of the same underlying pattern (a global scaling effect, e.g. integrant 1 is about twice as bright as integrant 2 in every cell). A very small number of integrants (2-3) show dramatically different patterns, and these integration sites might be quite interesting. It does appear that there may also be some (more subtle) differences in the overall pattern between integrants, but this looks to be on average at least an order of magnitude smaller than the global effect.
One other small point -With regard to "unique identities," -is there evidence for substantial left-right asymmetry between otherwise symmetric cells in this dataset? If not, it would be appropriate to average these to allow more comparisons with the left-right symmetric pairs from the Packer data.

Reviewer #3:
Zhao et al. have made significant efforts to improve the clarity of the text and the data visualization, which has greatly benefitted the manuscript as a whole. I think this study will be of high interest to the chromatin field, as it assesses genomic activity potential in the developing C. elegans embryo at the single-cell level using live imaging at an impressive scope. Most of my previous concerns about the robustness and accurate interpretation of the data have been addressed well by the authors and I support publication with some minor revisions (detailed below).
Minor points: The title and abstract are much improved. However, "chromatin dynamics" is still vague, I would suggest "chromatin activity dynamics" or something along those lines to be more specific, and immediately allow readers to differentiate from chromatin conformational/accessibility dynamics. Throughout the manuscript, "chromatin dynamics" and "chromatin co-dynamics" should also be changed/specified.
Pg.6: It should be stated here in the main text that cell lineages were reconstructed with StarryNite and AceTree with the appropriate citations.
Fig1G, 4C: It would be useful to have the quantification of GFP expression below each image (in the same colormap as 1F) because it is difficult to appreciate by eye the differences between integration sites.
When discussing the correlations between the chromatin activity landscapes/integration sites with epigenetic status, it would be interesting to point out when the authors found unexpected activity landscapes. Which cells have the most divergent landscapes from the population-based modENCODE epigenetic modification datasets? This would exploit a great aspect of their single-cell data -to find something new beyond global correlations (which are already interesting).
Pg. 13 The authors state: "All told, we generated a lineage-resolved chromatin landscape that is dynamic, informative, and biologically relevant in indicating the functional state of chromatin and in defining cellular states." I don't think "defining cellular state" is accurate at this point in the manuscript, "reflecting cell type" or something similar would work.

15th Mar 2021 2nd Authors' Response to Reviewers

Reviewer #1:
Overall, clarification of terms and addition of some more methodology in the text has improved the manuscript. Addition of specific examples to illustrate the point is also helpful. More details and labeling in the figures also helps.
Response: Thank you for these positive comments on the revised manuscript.
My main continuing concern is regarding my major point. The figure shown in response doesn't really address this point as Euclidean distance often can be a function of differences in magnitude as well as pattern. This can also be true for binarized data due to threshold effects.
A similar analysis but using Pearson correlation instead of Euclidean distance would be required to answer the question. I also suggest showing a clustered heatmap of the quantitative measurements (e.g. with cells on one axis and integrants on the other axis).
To illustrate how this might look, I've included a heatmap of expression (as reported in Table EV3) with integrants (rows) clustered and cells (columns) ordered as in the table. From this table I think it is clear that the vast majority of variance in the dataset is differences in the overall intensity of the same underlying pattern (a global scaling effect, e.g. integrant 1 is about twice as bright as integrant 2 in every cell). A very small number of integrants (2-3) show dramatically different patterns, and these integration sites might be quite interesting. It does appear that there may also be some (more subtle) differences in the overall pattern between integrants, but this looks to be on average at least an order of magnitude smaller than the global effect.
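The distinction drawn above between magnitude and pattern can be illustrated with a small sketch. The Python below uses made-up per-cell values (not data from Table EV3) for three hypothetical integrants: one that is exactly twice as bright as the reference in every cell, and one with a similar overall brightness but a different cellular pattern. Under these assumed values, Euclidean distance ranks the scaled copy as the more distant profile, whereas Pearson correlation treats it as identical in pattern.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-cell GFP levels (illustrative values only, one entry per cell).
integrant_2 = [100.0, 250.0, 400.0, 150.0, 300.0]
integrant_1 = [2 * v for v in integrant_2]          # same pattern, twice as bright
integrant_3 = [400.0, 150.0, 100.0, 300.0, 250.0]   # similar brightness, different pattern

d_scaled = euclidean(integrant_1, integrant_2)      # dominated by global scaling
r_scaled = pearson(integrant_1, integrant_2)        # insensitive to global scaling
d_shuffled = euclidean(integrant_3, integrant_2)
r_shuffled = pearson(integrant_3, integrant_2)

print(f"scaled copy:       distance={d_scaled:.1f}, R={r_scaled:.2f}")
print(f"different pattern: distance={d_shuffled:.1f}, R={r_shuffled:.2f}")
```

Because Pearson correlation is invariant to a positive rescaling of either profile, it separates pattern differences from the global scaling effect, which is the reason it is requested here in place of Euclidean distance.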

Response:
Thanks for pointing this out. As requested by this reviewer, we have now provided the results based on calculating Pearson correlation coefficients for both quantitative and binarized expression levels. As shown in Fig 2C and 2D, the correlation coefficient of cellular GFP expression between replicates is significantly higher than between different integrants whether using quantitative (2C) or binarized (2D) expression data. We have also provided a heatmap in Figure 2A.

"Having established that the inferred chromatin landscape is reliable, we next determined the extent to which chromatin activity changes across positions and cells (Fig 2A). We first examined whether the expression of Peef-1A.1::GFP changes considerably with genomic position, taking the expression variability at each position into account (Fig 2B). For both quantitative and binarized expression, analysis revealed that the correlation of GFP expression between different integration sites was significantly lower than that between replicates at a given position (Fig 2C and D). Pair-wise comparisons likewise showed that the cellular pattern of GFP expression at a given integration site was, on average, distinct (R < 0.5) from 40% (quantitative expression) and 92% (binarized expression) of the patterns resulting from other integration sites. These results suggest that the eef-1A.1 promoter sequence does not significantly dominate the position effects on GFP expression."

In addition, we would like to take the opportunity to clarify further why our results support that cellular expression of GFP changes (a readout of chromatin activity) considerably across integrants. First, the promoter used in this study (eef-1A.1, also known as eft-3) is a well-known strong promoter, which might account for quantitative reduction in expression being more frequently observed than on/off changes.
It is reasonable to imagine that a considerable quantitative reduction of chromatin activity assayed by a strong promoter could correspond to more dramatic (on/off) changes in many endogenous contexts. Thus, quantitative reduction of Peef-1A.1::GFP could be biologically relevant. Second, while selection of a binary cutoff could be relatively arbitrary, our cutoff for calling expression was carefully determined using the background expression in a strain without the GFP transgene (as described in the Materials and Methods). Thus, a value of 0 in our dataset explicitly indicates that GFP is not expressed.
In this regard, the results from calculating the fraction of integrants expressed in each cell (the last paragraph on Page 12) further support that on/off expression changes are frequent. Finally, using the Pearson correlation coefficient (R) as a measurement of similarity, we compared cellular expression (quantitative and binarized) among all integrants and determined that, on average, cellular expression at an integration site is distinct (defined as having an R < 0.5) from 40% (using quantitative data) and 92% (using binarized data) of the cellular expression patterns of other integrants.
Together, we hope the new figure panels and the added clarification could at least partially address the concern raised by this reviewer regarding the level of expression change in individual integrants relative to the global pattern.
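The pair-wise comparison summarized above (the fraction of other integrants from which a given integrant is "distinct", defined as R < 0.5) can be sketched as follows. This is a minimal illustration and not the authors' code: the input layout (a dictionary mapping each integration site to its per-cell expression profile), the function names, and the toy values are all assumptions; only the R < 0.5 cutoff is taken from the response.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fraction_distinct(profiles, cutoff=0.5):
    """For each integrant, the fraction of other integrants with R < cutoff."""
    result = {}
    for name, profile in profiles.items():
        others = [p for other, p in profiles.items() if other != name]
        n_distinct = sum(1 for p in others if pearson(profile, p) < cutoff)
        result[name] = n_distinct / len(others)
    return result

# Hypothetical cellular expression profiles (rows: integration sites, entries: cells).
profiles = {
    "site_A": [1.0, 2.0, 3.0, 4.0],
    "site_B": [2.0, 4.0, 6.0, 8.0],   # scaled copy of site_A
    "site_C": [4.0, 3.0, 2.0, 1.0],   # opposite pattern
    "site_D": [1.0, 3.0, 2.0, 4.0],   # broadly similar to site_A
}

frac = fraction_distinct(profiles)
mean_fraction = sum(frac.values()) / len(frac)
print(frac, mean_fraction)
```

Binarized profiles can be constant across cells (all 0 or all 1), in which case the correlation is undefined; a real analysis would need an explicit rule for such sites, which this sketch does not attempt to supply.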
One other small point -With regard to "unique identities," -is there evidence for substantial left-right asymmetry between otherwise symmetric cells in this dataset? If not, it would be appropriate to average these to allow more comparisons with the left-right symmetric pairs from the Packer data.

Reviewer #3:
Zhao et al. have made significant efforts to improve the clarity of the text and the data visualization, which has greatly benefitted the manuscript as a whole. I think this study will be of high interest to the chromatin field, as it assesses genomic activity potential in the developing C. elegans embryo at the single-cell level using live imaging at an impressive scope. Most of my previous concerns about the robustness and accurate interpretation of the data have been addressed well by the authors and I support publication with some minor revisions (detailed below).

Response:
We thank this reviewer for the kind words and for supporting the publication of our manuscript.
Minor points: The title and abstract are much improved. However, "chromatin dynamics" is still vague, I would suggest "chromatin activity dynamics" or something along those lines to be more specific, and immediately allow readers to differentiate from chromatin conformational/accessibility dynamics. Throughout the manuscript, "chromatin dynamics" and "chromatin co-dynamics" should also be changed/specified.

Response:
Thank you for the suggestion. We have now changed "chromatin dynamics" to "chromatin activity dynamics", "chromatin divergence" to "chromatin activity divergence", and "chromatin co-dynamics" to "chromatin activity co-dynamics" throughout the manuscript.
Furthermore, to better highlight the data and findings of this work, we have changed the title to

"Finally, the cellular resolution of our data enabled identifying potential cell- and position-specific chromatin activity that is unable to be obtained from the bulk data of cell populations.
Although the chromatin activity described here is globally concordant with previously-defined chromatin regions for which a silent/active state is evident in bulk epigenomic data (Fig EV2G and H), this activity exhibited considerable cell-specificity at given genomic positions. For example, certain integrants located in regions with a global silent/active state exhibited divergent activity in specific cells (Fig 2I). Moreover, certain cells also exhibited chromatin activity landscapes that diverged from those predicted by cell population-based histone modification datasets (Ho et al., 2014). Correlation analysis comparing the observed chromatin activity landscape in each cell with that predicted by a combination of 19 types of histone modifications revealed that in certain cells, the predictive power of bulk histone modifications was low (Fig 2J). Interestingly, these cells were significantly enriched for neuronal cells from the ABal lineage (Fig 2K, 3

Pg. 13 The authors state: "All told, we generated a lineage-resolved chromatin landscape that is dynamic, informative, and biologically relevant in indicating the functional state of chromatin and in defining cellular states." I don't think "defining cellular state" is accurate at this point in the manuscript, "reflecting cell type" or something similar would work.
Response: Thank you for the suggestion. We have changed the word "define" to "distinguish" throughout the manuscript.
Pg. 25: State what percentage of co-dynamic regions were within 1Mb vs further apart/different chromosomes.

Response:
We have now included the percentage in the main text.
"A small fraction (4%) of chromatin activity co-dynamic regions were linked on the chromosome at a distance of less than one Mb; most (80%) co-dynamic regions were located on different chromosomes."

22nd Mar 2021 2nd Revision -Editorial Decision
Thank you for sending us your revised manuscript. We have now heard back from the reviewer who was asked to evaluate your study. As you will see the reviewer is satisfied with the modifications made and thinks that the study is now suitable for publication.
Before we can formally accept your manuscript, we would ask you to address the remaining minor issues raised by the reviewer.

I thank the authors for the thoughtful response to my major concern. I'm satisfied with the current version although I suggest a small textual change:
1) The most compelling argument in the authors' more detailed response is that quantitative changes are likely meaningful (especially given that they are consistent between replicates). I agree with this and suggest that a concise version of this description be added either immediately before or after paragraph 1 of the section "Cellular chromatin activity is dynamic and informative for distinguishing cellular states"
2) This is especially important because the other point made ("our cutoff for calling expression was carefully determined using the background expression in a strain without the GFP transgene...Thus, a value of 0 in our dataset explicitly indicates that GFP is not expressed.") isn't quite valid. As I understand this procedure, the threshold is conservative, so a value of 1 'explicitly indicates that GFP is expressed' but a zero value could indicate sub-detection expression. Indeed all live imaging microscopy platforms have a nonzero true expression level that cannot be detected. Consistent with this, most of the cells that have many "Zero" values in Figure 2A also have low quantitative levels in other integrants. For this reason I think that the quantitative changes are a stronger argument than the binary changes.