Evaluating the Arrhenius equation for developmental processes

Abstract The famous Arrhenius equation is well suited to describing the temperature dependence of chemical reactions but has also been used for complicated biological processes. Here, we evaluate how well the simple Arrhenius equation predicts complex multi‐step biological processes, using frog and fruit fly embryogenesis as two canonical models. We find that the Arrhenius equation provides a good approximation for the temperature dependence of embryogenesis, even though individual developmental intervals scale differently with temperature. At low and high temperatures, however, we observed significant departures from idealized Arrhenius Law behavior. When we model multi‐step reactions of idealized chemical networks, we are unable to generate comparable deviations from linearity. In contrast, we find the two enzymes GAPDH and β‐galactosidase show non‐linearity in the Arrhenius plot similar to our observations of embryonic development. Thus, we find that complex embryonic development can be well approximated by the simple Arrhenius equation regardless of non‐uniform developmental scaling and propose that the observed departure from this law likely results more from non‐idealized individual steps rather than from the complexity of the system.

Previous meta-analysis of developmental time and temperature (Gillooly et al. 2002, Nature) has indicated that developmental rates and temperature follow the Arrhenius law across the animal kingdom. More recent and careful studies of Drosophila by Eisen in 2014 find an Arrhenius-like behavior for fly embryogenesis. Interestingly, this present manuscript finds that the apparent Ea for fly embryogenesis is not constant for all stages, which is contradictory to the Eisen study. They find significant but small variation in Ea between stages that might have been missed or overlooked by the previous study. One thing they observed was that the estimated Ea for fly versus frog embryos were somewhat different. Gillooly, West and colleagues had proposed a generalized model for scaling of developmental rate with total body mass, in which this rate scaled inversely as M^0.25. If one normalizes for the comparative mass difference between a frog and fly embryo using this scaling factor, do the values of Ea estimated in this study become more similar? In any event, this could be done and reported, even if it does not appear normalized by mass. The authors should discuss their findings within the context of that West and colleagues study.
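A short worked equation may help frame the mass-normalization question. It assumes the quarter-power form mentioned above, an Arrhenius temperature term written per mole with the gas constant R, and an embryo mass M that does not itself change with temperature. The symbols r (developmental rate) and r_0 (prefactor) are introduced here for illustration and are not taken from the manuscript.

```latex
r(T, M) = r_0 \, M^{-1/4} \, e^{-E_a / (R T)}
\quad\Longrightarrow\quad
\ln\!\left( r \, M^{1/4} \right) = \ln r_0 - \frac{E_a}{R} \, \frac{1}{T}
```

Under these assumptions the mass term only shifts the intercept of the Arrhenius plot by (1/4) ln M and leaves the slope -E_a/R untouched, so mass normalization alone would not be expected to bring the fly and frog Ea values closer together.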
They convincingly show that their data does not exactly fit the Arrhenius model but deviates significantly at temperature extremes, making a quadratic fit better. This was most convincing for the fly data. However, in Fig 3BD, the BIC scores are lower for frog than fly by an order of magnitude or more. So the quadratic is not really much of a better fit for the frog. Do the authors have an explanation for the difference? The authors should tone down the generality of the nonlinearity of the Arrhenius plotting, and not emphasize it is some universal property of animal development.
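The linear-versus-quadratic comparison discussed here can be sketched with a small BIC calculation. This is a minimal illustration on synthetic data, not the authors' pipeline: the slope, curvature, noise level, temperature grid, and the helper names `fit_poly` and `bic` are all assumptions made for the example, and Gaussian residuals are assumed when writing BIC as n ln(RSS/n) + k ln(n).

```python
import math
import random

def fit_poly(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (lowest order first)."""
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coef = [0.0] * m
    for i in reversed(range(m)):
        rest = sum(A[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - rest) / A[i][i]
    return coef

def bic(xs, ys, degree):
    """BIC for a polynomial fit, assuming independent Gaussian residuals."""
    coef = fit_poly(xs, ys, degree)
    rss = sum((y - sum(c * x ** k for k, c in enumerate(coef))) ** 2
              for x, y in zip(xs, ys))
    n = len(xs)
    n_params = degree + 2  # polynomial coefficients plus the noise variance
    return n * math.log(rss / n) + n_params * math.log(n)

# Synthetic Arrhenius plot: x is 1000/T (centred), y is ln(rate).
rng = random.Random(0)
temps_c = list(range(12, 28))                      # hypothetical temperature grid
inv_t = [1000.0 / (t + 273.15) for t in temps_c]
mean_x = sum(inv_t) / len(inv_t)
xs = [x - mean_x for x in inv_t]
# Hypothetical slope and concave-downward curvature, with small measurement noise
ln_rates = [-8.4 * x - 20.0 * x * x + rng.gauss(0.0, 0.02) for x in xs]

bic_linear = bic(xs, ln_rates, 1)
bic_quadratic = bic(xs, ln_rates, 2)
delta_bic = bic_linear - bic_quadratic
print("BIC(linear) - BIC(quadratic) =", round(delta_bic, 2))
```

A positive difference favours the quadratic model. How large it is depends on both the curvature and the noise, which is one way to see why noisier frog data could yield a weaker preference than fly data even if the underlying curvature were similar.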
They briefly explore a couple of explanations why there is nonlinearity. One is a coupled model that I am not equipped to judge, and so will leave to other reviewers. The other explanation is a rather offhand experiment using purified GAPDH (a key metabolic enzyme) to perform enzymatic reactions at different temperatures in vitro. They find a similar nonlinear trend comparing reaction rate to temperature. This result is pretty weak. They use a fixed concentration of substrates and enzyme, not accounting for enzyme kinetic properties. The reaction will display Michaelis-Menten kinetics, meaning at certain substrate concentrations relative to enzyme concentration, the rate will be more highly dependent on Km, the binding affinity of substrates for enzyme. The experiment needs to be done with saturating substrates so it is zero-order. Then the value of rate, Vmax, will be proportional to kcat, which is the rate they really want to measure. Otherwise, they intrude into the thermodynamics of substrate binding, which can also be temperature dependent but not in the manner of their interpretation. The authors should either:
1. remove this result and experiment
2. codify that the enzyme activities and substrate concentrations used are zero order across the ENTIRE temperature range
3. repeat these experiments with titration of substrate and determine Vmax across the temperature range.
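The concern about substrate saturation can be illustrated numerically. The sketch below assumes Michaelis-Menten kinetics with an Arrhenius kcat and a Km that rises with temperature following a van't Hoff-like form. All parameter values (Ea of 50 kJ/mol, an enthalpy term of 30 kJ/mol, Km of 0.1 mM at 25 °C) and the function names are hypothetical and chosen only to make the effect visible.

```python
import math

R = 8.314        # gas constant, J/(mol*K)
T_REF = 298.15   # reference temperature for the hypothetical Km, K

def kcat(T, A=1.0e9, Ea=50e3):
    """Catalytic rate constant with Arrhenius temperature dependence (hypothetical values)."""
    return A * math.exp(-Ea / (R * T))

def km(T, km_ref=0.1, dH=30e3):
    """Hypothetical Km (mM) that increases with temperature."""
    return km_ref * math.exp(-dH / R * (1.0 / T - 1.0 / T_REF))

def mm_rate(T, S, enzyme=1.0):
    """Michaelis-Menten initial rate v = kcat * E * S / (Km + S)."""
    return kcat(T) * enzyme * S / (km(T) + S)

def apparent_ea(S, T1=283.15, T2=303.15):
    """Apparent activation energy from the two-point slope of ln(v) against 1/T."""
    slope = (math.log(mm_rate(T2, S)) - math.log(mm_rate(T1, S))) / (1.0 / T2 - 1.0 / T1)
    return -R * slope

ea_saturating = apparent_ea(S=100.0)      # S >> Km: rate tracks kcat
ea_subsaturating = apparent_ea(S=0.001)   # S << Km: rate tracks kcat / Km

print("apparent Ea, saturating substrate (kJ/mol):", round(ea_saturating / 1000, 1))
print("apparent Ea, sub-saturating substrate (kJ/mol):", round(ea_subsaturating / 1000, 1))
```

With saturating substrate the apparent activation energy stays close to that of kcat, whereas far below Km it approaches the difference between the kcat and Km terms. This is the mixing of catalysis and binding thermodynamics that the zero-order requirement is meant to avoid.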
Other comments:
Fix the Chong et al. citation/reference and other references missing information.
Use formal and standard means to present equations 1 (E0?) and 2 (E1) on page 6 in the body of the text.
Define what i and n are in those equations.
In Figs 3 and 4, what do the error bars represent? And why are there large error bars for temperature? Was it not a highly controlled and invariant variable?
In Methods it is not stated how temperature was measured in the room. Hopefully not relying on a thermostat. Was there a thermometer placed on the stage? More information is needed on how they measured this important variable and how often over the course of an experiment.
Why did they use a Drosophila klarsicht mutant for their experiments? Why not a simple wildtype lab strain? Since the mutant affects microtubule-dependent cargo transport inside cells, are the results a possible artefact of the mutant? And what genotype were the fathers? It was not stated.
Show actual images of the frog and fly embryos at the different stages captured from typical movies. I cannot judge the quality of imaging, which is critical if one wants to unambiguously define the time when a particular stage or event is reached. Charcoal sketches of embryos at different stages in the figures are not really acceptable.

Here, the authors expand on this by providing a more detailed characterisation of the activation energy of different developmental stages. They then demonstrate that the behaviour, while providing a close fit to an Arrhenius-like process, is in fact more correctly described by a more non-linear approximation. Finally, they include analysis of frog early development and draw interesting parallels in the behaviours between the species.
Overall, this is an interesting piece of work that adds substantially to our knowledge of how temperature affects the timing of developmental processes. However, I have a number of concerns that need addressing.

Major issues
The explanation of the experimental results needs significant improvement. In particular, the authors are claiming evidence for relatively subtle variations in developmental time. Yet, I cannot find, in the main paper or the SI, any mention of number of samples. The legends are also similarly opaque. In the SI it is stated that 3-4 fly embryos were imaged at each time. How many in total for each temperature? What was the variability in measurements between independent experiments at the same temperature? Was a power analysis performed to estimate beforehand the number of embryos required for analysis? As shown by Chong et al. there is significant temporal variability that needs to be carefully accounted for. For example, the measured Ea are likely unreliable without sufficient n (likely > 40 embryos per condition). With the current methodology as presented, I do not have confidence that the results are supported with sufficient statistical rigour.
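The question about power can be made concrete with a rough calculation. The sketch below uses a normal approximation for a two-sided, two-sample comparison with equal group sizes. It is not the authors' analysis: the function names, the standardized effect sizes, and the choice of a z-test rather than a comparison of fitted activation energies are all assumptions for illustration.

```python
from statistics import NormalDist

def power_two_sample(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    effect_size is the difference in means divided by the common standard deviation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = effect_size * (n_per_group / 2.0) ** 0.5
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

def min_n_for_power(effect_size, target=0.8, alpha=0.05, n_max=10000):
    """Smallest per-group sample size reaching the target power, or None if not found."""
    for n in range(2, n_max + 1):
        if power_two_sample(effect_size, n, alpha) >= target:
            return n
    return None

for d in (0.5, 1.0, 1.5):  # hypothetical standardized effect sizes
    print("effect size", d, "-> n per group for 80% power:", min_n_for_power(d))
```

The required sample size falls quickly as the standardized effect grows, so whether a given n is adequate depends on the size of the Ea differences relative to embryo-to-embryo variability, which is exactly the information requested above.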
In the modelling, there is reference to the "curvature of the system". However, as far as I can see, this is not defined. Is it referring to the curving away from linearity? This lack of clarity makes this section very challenging to read, as it's not particularly clear what the authors are trying to conclude.
In the simulations, a range of EA and A are used to simulate different conditions. But, what about stochasticity? It is likely that even two "identical" embryos would show variability in the timing of developmental events due to noise in processes such as transcription factor binding. How does the inclusion of intrinsic noise affect the predictions? This is important, as the authors claim that it is "impossible" to achieve a fit to their data only with Arrhenius-like behaviour. But, they haven't (as far as I can see) accounted for potential stochastic variation that is likely present.
It would help to find a better way to describe the linear and quadratic fittings. The current use of "Arrhenius space" is non-standard and likely to cause more confusion. Given that MSB is a broad biology journal, the authors need to improve the description of these results.
In the Introduction, the authors clearly lay out the open questions and how their work extends on existing results. However, at times there is some redundancy with existing work that should be clarified. For example, the first paragraph of the Discussion states: "We have shown that embryonic fly and frog development can be well approximated by the Arrhenius equation over each organism's core viable temperature range." But this is already well known (Kuntz and Eisen 2014, Chong 2018) and is not the new result of this manuscript. At a minimum, previous work should be more appropriately cited.
Figure 1D, stage "L" in the frog is highly variable. Is this because of low "n", or is this a consistently reproducible phenotype? Further discussion would help here. Relatedly, the frog data is much noisier than the Drosophila. Is this due to low statistical power or is the variability a feature of the early frog development? Currently, given the lack of important details (see point 1), it is difficult to ascertain.
Figure 2C. Given the large error bars (and the uncertainty about number of experiments), only D-E seems like it really could be described as meaningfully different from the other stages. This suggests that EA is actually quite constrained during development, with perhaps one or two exceptions. It would be interesting to see further analysis of this, particularly why D-E is different.
It would be helpful to add the average time between the events and the variability (as quantified in Chong et al. 2018). The error in developmental timing behaves non-trivially with temperature, and this does not seem to have been carefully accounted for here.
I would suggest using the figures from Figure S4 instead in Figure 3. They are more convincing that the quadratic behaviour is real as they more precisely deal with the question of outliers.

Minor issues
In the last sentence of the abstract, the conclusion given is too strong. While the modelling and experiments support that individual steps, rather than system complexity, drive the Arrhenius-like behaviour, this is far from definitive. This statement needs to be more suitably expressed to acknowledge uncertainty in the results.
The paper is littered with grammatical errors. In the revision, these should be addressed. For example, the first paragraph of the Introduction has multiple tense confusions, making reading more challenging than it should be.
The legends are overly wordy. For example, in Figure 4D, there is a description of why the experiment was done, which effectively repeats what's in the main text. This is unhelpful, as it makes finding out key information about what is actually plotted more laborious.
The references are littered with errors.
Reviewer #3: One of the striking features about embryonic development is its robustness against genetic and environmental insults. One such insult, of special relevance to species that develop outside their mother, is that of temperature fluctuations. Specifically, various previous experiments had suggested that the timing of development scales with temperature in a manner consistent with the Arrhenius equation. This scaling suggested that development is dominated by one overarching rate-limiting step.
Crapse et al. set out to test whether it is true that overall development scales following the Arrhenius relation. To make this possible, they measured the timing of developmental features during fruit fly and Xenopus embryogenesis. They found that, at first glance, the timing between each developmental stage follows an Arrhenius equation, albeit with different parameters suggesting the existence of distinct rate-limiting steps. However, they also identified deviations from this simple scaling at the extremes of the observed temperature ranges, indicating a breakdown of Arrhenius in some of the fundamental reactions dictating development.
While the data for the fly is quite similar to that obtained by Kuntz and Eisen (PLoS Genetics 10:e1004293, 2014), the data for frogs is novel. Further, the BIC-based analysis to determine whether alternative models to Arrhenius can better explain the data, and the simulations meant to prove how multiple rate-limiting steps conspire to dictate the overall temperature scaling of development are novel. These analyses constitute an exciting opportunity to dig deeper in the nature of the fundamental biochemical reactions dictating animal development.
Major Comments: 1) The bulk of this paper focuses on re-doing the experiments done in the Kuntz and Eisen paper using only Drosophila melanogaster. They subdivided the developmental stages quite differently and, with these different divisions, found conflicting results with the Eisen paper. Perhaps the difference just comes down to the fact that the authors are looking at much smaller time windows, as they suggest. Did the authors attempt to compare the Kuntz and Eisen data more directly? We believe those data are publicly available. Additionally, though the authors claim that they looked at a broader range of temperatures, this doesn't seem to be the case. Kuntz and Eisen measured development times between 15-30 ºC (and actually began with a wider range of temperatures at the outset), whereas the authors of this paper look at 12-27 ºC.
2) The manuscript makes comparisons between the concavity of the different data sets and of the simulations. It would be great to show those comparisons explicitly in, for example, a plot.
3) We found the argument connected to the in vitro enzymatic assay to be somewhat uncompelling. Though it's true that many enzymes do not have an Arrhenius-like relationship to temperature across their full range of activity, and they specifically show that to be the case for GAPDH, the direct connection to the temperature-dependence of the overall developmental rate seems tenuous. Why do the authors assume that the underlying molecular scale processes are the same across all temperatures, including stress conditions? What if heat stress or cold does something beyond affecting the rates of the processes that occur at the core temperatures by activating different pathways entirely? Additionally, when considering coupled reactions, which step is rate limiting often depends on the temperature because differences in activation energy can cause different steps of a single multi-step process (e.g. transcription) to be slower at different temperatures. This causes the concave downward temperature relationship described, even though it arises from the coupling of two Arrhenius temperature relationships as shown, for example, by Roe et al., J Mol Biol 184:441 (1985).
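The point about coupled steps can be checked with a toy calculation. The sketch below assumes two strictly sequential steps, each obeying the Arrhenius equation, with the overall rate taken as the inverse of the summed step durations. The activation energies (30 and 120 kJ/mol), the choice to equalize the two steps at 20 °C, and the function names are hypothetical.

```python
import math

R = 8.314        # gas constant, J/(mol*K)
T_REF = 293.15   # temperature at which both hypothetical steps are equally fast, K

def make_step(ea):
    """Return (A, Ea) with the prefactor chosen so that k = 1 at T_REF."""
    return (math.exp(ea / (R * T_REF)), ea)

def ln_total_rate(T, steps):
    """ln of the overall rate of a sequential process: rate = 1 / sum(1 / k_i)."""
    total_time = sum(1.0 / (a * math.exp(-ea / (R * T))) for a, ea in steps)
    return -math.log(total_time)

def second_difference(steps, t_low=283.15, t_high=303.15):
    """Second difference of ln(rate) at three points equally spaced in 1/T."""
    x_hot, x_cold = 1.0 / t_high, 1.0 / t_low
    x_mid = 0.5 * (x_hot + x_cold)
    y_hot, y_mid, y_cold = (ln_total_rate(1.0 / x, steps) for x in (x_hot, x_mid, x_cold))
    return y_hot - 2.0 * y_mid + y_cold

two_steps = [make_step(30e3), make_step(120e3)]
one_step = [make_step(60e3)]

curvature_two = second_difference(two_steps)
curvature_one = second_difference(one_step)
print("second difference, two coupled steps:", curvature_two)
print("second difference, single step:", curvature_one)
```

A negative second difference means the Arrhenius plot bends downward: the high-Ea step dominates in the cold and the low-Ea step in the heat. A single step gives a value of essentially zero. The sketch shows only the sign of the effect and says nothing about whether its magnitude matches the embryo data.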
Minor comments: 1) Line numbers would have been helpful.
2) There are various typos and grammatical errors throughout the manuscript.
3) Broken reference: Chong et al.
4) Is the "integrated frequency factor" defined anywhere in the text?
5) Figure 1A: Is it "yolk" instead of "yoke"? "Yoke" is mentioned in the movie as well.
6) Much of the important data are presented in Figures 1C and D, but we found them extremely hard to read, even when they are blown up in one of the supplementary figures. The color scheme and the legend are also amplifying this particular issue. Could the authors plot them with a log scale on the x-axis or as a proportion of total developmental time as was done in the Kuntz and Eisen paper?
7) Figure 2C, caption: You're showing the interval, not the endpoint on the x-axis labels, right?
8) Figure 4C: Some graphical way to show the difference in curvatures to make it clear that there is a modest divergence would be helpful.
9) The methods section needs to be significantly overhauled:
9.1) The math is very hard to follow with variables left undefined in the text before they appear in the equations.
9.2) The computational approach is not described in sufficient detail.
9.3) Figure S2A: What's PMG?
9.4) Figure S5A: The labels on the plot are pretty obscure.
9.5) The description of temperature control and monitoring throughout the experiments was also insufficient in our opinion. Why did the authors abandon the microfluidic setup used in the same research groups for previous papers? Further, given that the room temperature was controlled, did the authors also monitor the sample temperature? How is the temperature of acclimated water monitored and maintained?
1st Authors' Response to Reviewers, 12th Jan 2021

This is an interesting study of embryonic development and temperature, specifically looking at the tempo or rate of development as a function of temperature. It has long been observed that many biological processes follow a temperature dependence that can be approximated or predicted by the Arrhenius equation. It has been thought that the underlying chemistry governing these biological processes is responsible for the Arrhenius effect. The authors use videomicroscopy to time the appearance of specific morphological landmarks during Drosophila and Xenopus embryogenesis. They then measure the timing of these events as embryos are raised at different temperatures. The study is fairly rigorous and quantitative.
We thank the reviewer for his/her thoughtful comments. Please find detailed responses to the individual points raised below shown in blue.
Previous meta analysis of developmental time and temperature (Gillooly et al 2002 Nature) has indicated that developmental rates and temperature follow the Arrhenius law across the animal kingdom. More recent and careful studies of Drosophila by Eisen in 2014 find an Arrhenius like behavior for fly embryogenesis. Interestingly, this present manuscript finds that the apparent Ea for fly embryogenesis is not constant for all stages, which is contradictory to the Eisen study.
They find significant but small variation in Ea between stages that might have been missed or overlooked by the previous study. One thing they observed was that the estimated Ea for fly versus frog embryos were somewhat different. Gillooly, West and colleagues had proposed a generalized model for scaling of developmental rate with total body mass, in which this rate scaled inversely as M^0.25. If one normalizes for the comparative mass difference between a frog and fly embryo using this scaling factor, do the values of Ea estimated in this study become more similar? In any event, this could be done and reported, even if it does not appear normalized by mass. The authors should discuss their findings within the context of that West and colleagues study.
Thank you for the suggestion. Please note that when scaling developmental rates by
They convincingly show that their data does not exactly fit the Arrhenius model but deviates significantly at temperature extremes, making a quadratic fit better. This was most convincing for the fly data. However, in Fig 3BD, the BIC scores are lower for frog than fly by an order of magnitude or more. So the quadratic is not really much of a better fit for the frog. Do the authors have an explanation for the difference? The authors should tone down the generality of the non-linearity of the Arrhenius plotting, and not emphasize it is some universal property of animal development.
We believe the BIC values in the previous submission were more convincingly non-linear for fly than for frog mostly because our fly data are of more consistent quality than our frog data. We observed that frog embryo developmental progression seems to be somewhat clutch dependent (embryos from the same mother tend to develop at similar speeds). For technical reasons, in our study frog embryos observed at the same temperature often share the same mother, while fly embryos observed at the same temperature tend to come from different mothers. We added the following statement to line 130 in our manuscript explaining this: "Compared to the fly data our frog data appears to be noisier, likely due to similarities within a clutch. For technical reasons, frog embryos observed in this study under the same temperature tend to share a common mother while observed fly embryos were from different mothers".
We strengthened our frog embryo analysis by collecting and analyzing additional embryos, raising our average n for these BIC calculations from 90 to 135 embryos per developmental stage. Furthermore, we removed some clutches from our data at 18.2 °C, where the observed developmental timing is clearly discontinuous compared to the time series of other embryos at nearby temperatures. We suspect that this clutch either consisted of sick embryos or that the temperature control might have been off. In the graph below, the embryos that were removed from further analysis are highlighted with a red rectangle.
Legend: Different colored lines show the developmental time in frog since 3rd cleavage (T0) for each stage we investigated. Error bars show the standard deviation among data points. Error in temperature is the standard error of the temperature recorder as reported by the manufacturer.
Additionally, we noticed that we had initially scored stage "I" (beginning of gastrulation) poorly. For this reason, we rescored stage I for our "old" data. With these improvements the frog BIC values became convincingly more quadratic, as seen in our new Figure 3 (line 698).
We also increased our fly embryo sample size from 52 to 73. The additional data strengthened the quadratic BIC preference for fly data. Overall, the fly data remain more convincingly non-linear, as they remain of higher quality than the frog data.
Nevertheless, we believe that we now have strong statistical evidence that both frog and fly data are clearly quadratic, as the majority of all developmental intervals now have double-digit ln(Lq/Ll).
They briefly explore a couple of explanations why there is nonlinearity. One is a coupled model that I am not equipped to judge, and so will leave to other reviewers. The other explanation is a rather offhand experiment using purified GAPDH (a key metabolic enzyme) to perform enzymatic reactions at different temperatures in vitro. They find a similar nonlinear trend comparing reaction rate to temperature. This result is pretty weak. They use a fixed concentration of substrates and enzyme, not accounting for enzyme kinetic properties. The reaction will display Michaelis-Menten kinetics, meaning at certain substrate concentrations relative to enzyme concentration, the rate will be more highly dependent on Km, the binding affinity of substrates for enzyme. The experiment needs to be done with saturating substrates so it is zero-order. Then the value of rate, Vmax, will be proportional to kcat, which is the rate they really want to measure. Otherwise, they intrude into the thermodynamics of substrate binding, which can also be temperature dependent but not in the manner of their interpretation.
The authors should either: 1. remove this result and experiment 2. codify that the enzyme activities and substrate concentrations used are zero order across the ENTIRE temperature range 3. repeat these experiments with titration of substrate and determine Vmax across the temperature range.
Thanks for the suggestion. As suggested, we have performed the experiment (3), titrating substrates into the saturated regime, and added a similar activity assay for another enzyme (beta-galactosidase). However, we would like to point out that we are agnostic about the molecular mechanism that leads enzyme and embryo kinetics to show non-linear behaviour in the Arrhenius plot. For example, a change of enzyme-substrate affinity with temperature would be entirely consistent with our observations and could contribute to the observed non-linearity in the Arrhenius plot in embryonic development.
We added a supplementary figure showing that over the entire temperature range of our activity assay GAPDH is approximately in the saturated regime. Please note that for technical reasons we were only able to increase the concentration of GAP to ~0.5 mM, which is approximately the physiological GAP concentration (PubMed ID: 2200929, 4578278). At this concentration, GAPDH seems to be saturated over most of the temperature range assayed, except at 50 °C, as shown in the appendix figure seen on line 67.
To further address the reviewer's comment and to generalize our findings, we added an enzymatic assay of beta-galactosidase activity on ortho-nitrophenyl-β-galactoside (ONPG), which we found to be easy to assay in the saturated regime. We ensured that the substrate concentration (10 mM) was at saturation throughout the entire temperature range by sampling several temperatures over said range at twice the concentration (and near ONPG solubility limits) (Fig. Appendix S4B, line 71 in the appendix).
This enzyme (at zero order) also shows strong non-linearity in the enzymatic assay. We would like to stress that these two enzymes are the only ones we tested, and both are clearly non-linear in the Arrhenius plot. Shown below are updated graphs for GAPDH (
We introduce beta-galactosidase in our text at line 288 with the following sentences: "Additionally we have assayed another common enzyme, beta-Galactosidase, monitoring the 0-order conversion of Ortho-Nitrophenyl-β-galactoside at 420 nm (Fig. EV5), where we find similar results to our GAPDH assay. As with developmental data we find that GAPDH and beta-Galactosidase activity follows concave downward behavior."
In Methods it is not stated how temperature was measured in the room. Hopefully not relying on a thermostat. Was there a thermometer placed on the stage? More information is needed on how they measured this important variable and how often over the course of an experiment.
As the reviewer suspects, we controlled the temperature to the best of our ability and monitored temperature stability by placing a thermometer directly next to the sample on the stage. We added additional text to the supplement to clarify this point. As seen starting on line
Both fathers and mothers were klarsicht mutants. This mutant was used because it is much easier to score developmental stages compared to wildtype embryos, allowing for increased specificity, repeatability, and accuracy. Furthermore, various previous studies suggest that developmental progression is similar to wild type flies despite divergent biology associated with the mutant (ISBN-10: 0947946454).
Show actual images of the frog and fly embryos at the different stages captured from typical movies. I cannot judge the quality of imaging, which is critical if one wants to unambiguously define the time when a particular stage or event is reached. Charcoal sketches of embryos at different stages in the figures are not really acceptable.
We believe the exaggerated sketches make it easy to understand the scoring criteria in the figures. Of course, this does not replace the need to show the actual data. We have uploaded all time-lapse raw data underlying this study to the ASCB imaging server (Frog: http://cellimagelibrary.org/groups/53201, Fly: http://cellimagelibrary.org/groups/53226). We are still working with the server administrators to improve available image quality and adjust the metadata. In the meantime our data can be accessed via the following anonymous Google Drive link: https://drive.google.com/drive/folders/1zdMgagG6YJo8i7y2haCfq4Ian2QSTTP3?usp=sharing.
Please note the drive preview does not display the videos in full resolution. To access high quality movies please download the videos.
Lastly, we provide one example time-lapse for frog and fly development together with this manuscript as supplemental movies (Appendix Movies S1 and S2). In these supplemental movies, we added stamps and halted the movies, clearly defining the timing and criteria of the scored stages.

Here, the authors expand on this by providing a more detailed characterisation of the activation energy of different developmental stages. They then demonstrate that the behaviour, while providing a close fit to an Arrhenius-like process, is in fact more correctly described by a more non-linear approximation. Finally, they include analysis of frog early development and draw interesting parallels in the behaviours between the species.
Overall, this is an interesting piece of work that adds substantially to our knowledge of how temperature affects the timing of developmental processes. However, I have a number of concerns that need addressing.

Major issues
The explanation of the experimental results needs significant improvement. In particular, the authors are claiming evidence for relatively subtle variations in developmental time. Yet, I cannot find, in the main paper or the SI, any mention of the number of samples. The legends are also similarly opaque. In the SI it is stated that 3-4 fly embryos were imaged at each time. How many in total for each temperature? What was the variability in measurements between independent experiments at the same temperature? Was a power analysis performed to estimate beforehand the number of embryos required for analysis? As shown by Chong et al. there is significant temporal variability that needs to be carefully accounted for. For example, the measured Ea are likely unreliable without sufficient n (likely > 40 embryos per condition). With the current methodology as presented, I do not have confidence that the results are supported with sufficient statistical rigour.
Thank you very much for bringing this to our attention. We have made sure that all results are clearly accompanied by their sample number (along with other important statistics and information) and uploaded Appendix Tables 3 and 4 with the time-intervals of the scored embryos.
Additionally, we have shown below the power with respect to the number of samples for the given effect size where we claim a significant difference between activation energies (for an example in fly, comparing stages D-E to J-K). As seen in the graph, a sample size of 10 should be sufficient to give a power well above 0.8, whereas our data for this comparison has a sample size of [100, 135] with an average of 5 embryos per measured temperature spanning ~12-26 °C. We have reported power in the main text where we claim significant differences between stages in flies (line 153).
In the simulations, a range of EA and A are used to simulate different conditions. But, what about stochasticity? It is likely that even two "identical" embryos would show variability in the timing of developmental events due to noise in processes such as transcription factor binding. How does the inclusion of intrinsic noise affect the predictions? This is important, as the authors claim that it is "impossible" to achieve a fit to their data only with Arrhenius-like behaviour. But, they haven't (as far as I can see) accounted for potential stochastic variation that is likely present.
We had indeed not taken stochasticity into account when previously running the simulations of our model.
To simulate the effect of intrinsic noise on our model we have introduced Gaussian noise to the prefactors with a standard deviation of 10%. This 10% is a bit more than the CVs we observe for the variability of time-intervals in our biological data (Reference Figure here).
Shown below are the mean and standard deviation (blue error bars) for these simulations. In magenta, we show the underlying deterministic model. It appears that the simulation of intrinsic noise does not result in a systematic deviation but rather introduces noise around the deterministic model.
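The kind of noise test described in this response can be sketched as follows. This is a toy version, not the authors' simulation code: it assumes four sequential Arrhenius steps with hypothetical activation energies, prefactors normalized so that each step has unit rate at 20 °C, and independent Gaussian multiplicative noise with a 10% standard deviation on every prefactor in every replicate.

```python
import math
import random
import statistics

R = 8.314                          # gas constant, J/(mol*K)
T_REF = 293.15                     # each hypothetical step has k = 1 here, K
EAS = [40e3, 60e3, 80e3, 100e3]    # hypothetical activation energies, J/mol
PREFACTORS = [math.exp(ea / (R * T_REF)) for ea in EAS]

def ln_rate(T, prefactors):
    """ln of the overall rate of the sequential process at temperature T."""
    total_time = sum(1.0 / (a * math.exp(-ea / (R * T)))
                     for a, ea in zip(prefactors, EAS))
    return -math.log(total_time)

def noisy_ln_rates(T, n_rep, cv, rng):
    """Replicate ln(rate) values with Gaussian noise applied to each prefactor."""
    results = []
    for _ in range(n_rep):
        # Clamp the multiplier to stay positive; with cv = 0.1 this is a safeguard only.
        noisy = [a * max(1e-3, rng.gauss(1.0, cv)) for a in PREFACTORS]
        results.append(ln_rate(T, noisy))
    return results

rng = random.Random(1)
summary = {}
for T in (285.15, 293.15, 300.15):
    sims = noisy_ln_rates(T, n_rep=500, cv=0.10, rng=rng)
    summary[T] = (ln_rate(T, PREFACTORS), statistics.mean(sims), statistics.stdev(sims))
    print(T, "deterministic:", round(summary[T][0], 3),
          "noisy mean:", round(summary[T][1], 3),
          "noisy sd:", round(summary[T][2], 3))
```

In this toy setting the noisy means stay close to the deterministic values at every temperature, so the noise adds scatter without visibly changing the shape of the Arrhenius plot. Noise that itself depended on temperature would need a separate test.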
It would help to find a better way to describe the linear and quadratic fittings. The current use of "Arrhenius space" is non-standard and likely to cause more confusion. Given that MSB is a broad biology journal, the authors need to improve the description of these results.
Thank you for pointing this out. We have revised the text to remove all mentions of "Arrhenius space" in the paper. We now discuss linear or quadratic fits in our Arrhenius plots, for example (line 317): "we observe that the relationship between temperature and developmental rates in both species is confidently better described by a concave downward quadratic
Figure 1D, stage "L" in the frog is highly variable. Is this because of low "n", or is this a consistently reproducible phenotype? Further discussion would help here. Relatedly, the frog data is much noisier than the Drosophila. Is this due to low statistical power or is the variability a feature of the early frog development? Currently, given the lack of important details (see point 1), it is difficult to ascertain.
Stage L is the latest point in development we score in frog. Developmental issues therefore have more time to accumulate, potentially terminating some of our scored embryos. As a result, stage L does have a lower "n" than other stages; however, we do not feel this "n" is low by any standards.
We have additionally taken care to add more data, specifically 27 additional embryos at stage L. Furthermore, we have updated the error bars to the commonly used standard deviations, rather than the previously used 95% (~2x standard deviation). Finally, to represent our data's trend more adequately, we combined previous data of similar temperature, correcting the error in time and temperature to account for this combination.
This updated data can be seen in the following:
We apologize for not having clearly defined the error bars in our previous submission. Those indicated 2x standard deviation (~95%). In the updated manuscript we clearly defined each error bar. In Figure 2C&D we now use 68% confidence intervals. Together with the additional data acquired, the error bars are much smaller than in the previous submission, as seen below with these updated figures.
I would suggest using the figures from Figure S4 instead in Figure 3. They are more convincing that the quadratic behaviour is real as they more precisely deal with the question of outliers.
We believe that consistent behavior between the two different species observed, as well as corroboration from our enzymatic assays shows that extreme temperatures are indeed a biological phenomenon, rather than statistical outliers. For this reason we feel it important to show non-linearity over the entire organism's viable temperature range that we sampled, rather than arbitrarily choosing a more linear part of temperature range and for this subset testing for non-linearity.

Minor issues
In the last sentence of the abstract, the conclusion given is too strong. While the modelling and experiments support that individual steps, rather than system complexity, drive the Arrhenius-like behaviour, this is far from definitive. This statement needs to be more suitably expressed to acknowledge uncertainty in the results.
We have updated the last sentence of the abstract as follows to address the reviewer's concern:
Original: "Thus, we find that complex embryonic development can be well approximated by the simple Arrhenius Law and propose that the observed departure from this law results primarily from non-idealized individual steps rather than the complexity of the system."
Updated (now on line 28): "Thus, we find that complex embryonic development can be well approximated by the simple Arrhenius Equation regardless of non-uniform developmental scaling, and propose that the observed departure from this law likely results more from non-idealized individual steps rather than from the complexity of the system."
The paper is littered with grammatical errors. In the revision, these should be addressed. For example, the first paragraph of the Introduction has multiple tense confusions, making reading more challenging than it should be.
We apologize for the grammatical errors and have done our best to ensure the paper is consistent in regards to tense as well as cleaning up other grammatical issues.
The legends are overly wordy. For example, in Figure 4D, there is a description of why Standard error is shown as blue error bars (n = 2-4)."
The references are littered with errors.
Thank you for pointing this out. Unfortunately, we only noticed after our submission that the Zotero links in the manuscript were broken. We have fixed these problems.
Reviewer #3: One of the striking features about embryonic development is its robustness against genetic and environmental insults. One such insult, of special relevance to species that develop outside their
While the data for the fly is quite similar to that obtained by Kuntz and Eisen (PLoS Genetics 10:e1004293, 2014), the data for frogs is novel. Further, the BIC-based analysis to determine whether alternative models to Arrhenius can better explain the data, and the simulations meant to prove how multiple rate-limiting steps conspire to dictate the overall temperature scaling of development are novel. These analyses constitute an exciting opportunity to dig deeper in the nature of the fundamental biochemical reactions dictating animal development.
Major Comments: 1) The bulk of this paper focuses on re-doing the experiments done in the Kuntz and Eisen paper using only Drosophila melanogaster. They subdivided the developmental stages quite differently and, with these different divisions, found conflicting results with the Eisen paper.
Perhaps the difference just comes down to the fact that the authors are looking at much smaller time windows, as they suggest. Did the authors attempt to compare the Kuntz and Eisen data more directly? We believe those data are publicly available. Additionally, though the authors claim that they looked at a broader range of temperatures, this doesn't seem to be the case.
Kuntz and Eisen measured development times between 15-30 ºC (and actually began with a wider range of temperatures at the outset), whereas the authors of this paper look at 12-27 ºC.
Despite extensive searching, unfortunately we were unable to locate the bulk of the Kuntz and Eisen datasets in spreadsheet or video form. Because of this, we were therefore unable to make direct comparisons because of the differing scoring criteria used between our paper and theirs.
In regards to the temperature range, we apologize for the confusion. We collected data for fly embryos from 10-33.1 °C, which covers a wider range than Kuntz and Eisen. However, we analyzed this data in two manners, one which used this temperature range in its entirety to
3) We found the argument connected to the in vitro enzymatic assay to be somewhat uncompelling. Though it's true that many enzymes do not have an Arrhenius-like relationship to temperature across their full range of activity, and they specifically show that to be the case for
We have added line numbers to the updated manuscript.
2) There are various typos and grammatical errors throughout the manuscript.
Thanks for pointing this out. We have done our best to eliminate typos and grammatical errors.
Our apologies, our reference link broke before submission without us noticing. We have fixed this for the resubmission.
4) Is the "integrated frequency factor" defined anywhere in the text?
Thank you for pointing this out. We never explicitly define what the integrated frequency factor is. We have referenced it as an "integrated" frequency factor (A) based on the number of duplicate reactions present in a collapsed reaction network. We have revised our introduction of this term to ensure this is more explicit in the revision. A revised version can be seen here at line 86 in our main manuscript: "In this case, coupled chemical reactions would collapse into a common Arrhenius equation with one master activation energy and integrated frequency factor, which combines each reaction's individual frequency factor into one."
5) Figure 1A: Is it "yolk" instead of "yoke"? "Yoke" is mentioned in the movie as well.
Thank you for pointing this out. We have corrected this mistake throughout the manuscript.
As the reviewer suspects, we are showing the interval (start and end points, for example C-D). We reviewed our material to make sure captions/legends are consistent with the actual figures as they were often not clear. We revised the figure legend of 2C specifically to make this point more clear, adding the following to line 676: "The x-axis is labeled with the developmental interval, marked by start and endpoint."
8) Figure 4C: Some graphical way to show the difference in curvatures to make it clear that there is a modest divergence would be helpful.
This is a valid concern that we should have realized earlier. The deviation between 4C and 4D is barely noticeable, and the simulated data points further obscure the modest divergence from linearity. To better represent our points, we have combined Figures 4C and D into one figure with two y-axes to compare the two curves to an overlapping linear fit, and have removed the superfluous data points.
The original figures and figure legends were as follows:
9) The methods section needs to be significantly overhauled:
9.1) The math is very hard to follow with variables left undefined in the text before they appear in the equations.
Thank you very much for bringing this up. The math in the supplemental material (appendix) has been reformatted and variables are now properly explained in the text when they are first used. Additionally, we have made efforts to incorporate the math into the text so that the logical flow can be more easily followed. An example of such reformatting from the appendix on line 123 is as follows: "First, we show that a relaxation time scale formulation can adequately yield the known τ=1/k, where τ is the reaction network's time constant and k is the rate constant, assuming a simple transition from some stage A →B. Here, A_0 is the initial amount of A and B(t) is the amount of B at time t, and we define a function R(t), which defines the fraction of the total mass that hasn't been converted to the final product."
9.2) The computational approach is not described in sufficient detail.
We have explained our computational approach in more detail, including what these methods are and how they give results. Additionally, our code can be found at the following link:
Optimized values for E_a and A were then substituted into equation (3) to predict rates at similar temperature points as investigated in our time-lapse experiments for the associated reaction network size as seen in Fig. 4C.
For the random simulated network rand() was used to choose random E_a and k within reasonable bounds (Lepock). Reasonable k were determined as the inverse of embryonic states between 1 second and 3 days. Rand() results were then fed into equation (3) to predict the overall network as displayed in Fig. 4B"
9.3) Figure S2A: What's PMG?
Thanks for pointing this out. PMG is supposed to mean posterior midgut. We updated the figure legend to better spell this out, seen on line 731: "Seven additional developmental events (scores) in fly that were later cut. PMG indicates Posterior Midgut."
9.4) Figure S5A: The labels on the plot are pretty obscure.
Thanks for pointing this out. We have updated the labels on the plots to make them easier to understand. The original S5A is as follows:
Here is the updated Figure S5A
9.5) The description of temperature control and monitoring throughout the experiments was also insufficient in our opinion. Why did the authors abandon the microfluidic setup used in the same research groups for previous papers? Further, given that the room temperature was controlled, did the authors also monitor the sample temperature? How is the temperature of acclimated water monitored and maintained?
The microfluidics device is not compatible with frog embryos. As for fly embryos we chose the most stable and consistent method for recording our long developmental videos. For this reason we chose a highly controlled ambient temperature control method to ensure consistent image quality at our high magnification that allowed us to run several experiments simultaneously.
We have updated the description of temperature control and monitoring throughout the experiment in the updated Materials and Methods, as seen at line 429.
"A temperature recorder (Elitech RC-5) was placed near the embryo on the microscope stage to record temperature over the experiment's duration."
We have compared our temperature recording method (data logger by the embryo cage) to a record of the solution temperature (using an aquatic thermometer). For some temperatures, the results are shown below. Each experiment was performed after allowing the controlled temperature chamber to equilibrate for several hours. There is little to no difference between the two temperature recordings. For the analysis throughout the paper we used the measured ambient temperature of the stage next to the embryo for both frog and fly.
Legend: Table showing a control test comparing the used ambient temperature recordings against actual temperature of water. Temperatures are shown in °C.

22nd Feb 2021 1st Revision -Editorial Decision
Thank you for sending us your revised manuscript. We have now heard back from the three reviewers who were asked to evaluate your study. You will see from the comments below that the reviewers think that while the majority of the concerns have been addressed, several important issues remain. In principle, our editorial policy only allows a single round of major revision. However, we think it is important to address Reviewer #3's concerns with regard to 1) a direct comparison to existing related data and 2) temperature controls, and to discuss Reviewer #1's concerns about the non-physiological temperatures. Therefore, we would ask you to address these points together with all other comments from the three reviewers in an exceptional second round of revision. On a more editorial level, please do the following.
1. They tested GAPDH at 2 substrate concentrations to determine if the thermodynamic behavior is concentration dependent. They said they did titrations to ensure substrate was saturating. However nowhere in the manuscript did they present the data. Nor did they do so for the new enzyme they assayed, beta-galactosidase. This data should be added in supplement plus a description as to why it was done and what it means. I will not be the only reader who wonders about the simplicity of the results and they need to make an effort to show they did it;
2. They have STILL not fixed the Chong citation
3. They uploaded the raw imaging data to two separate databases. One, cellimagelibrary.org has the files that can be run but no metadata is available for each movie. Thus I didnt know which temperature or replicate each specimen was being imaged. Deposited data MUST be properly annotated. The data on googledriv.com can only be downloaded to a local computer. The data is huge! I did not download it and so I cannot see whether those were properly annotated. But the data should be accessible and viewable on the cloud and not necessarily on a local computer
4. I did not make a note of the breadth of typos and various errors in the original manuscript but I certainly noticed them. The other reviewers made note of it though, and although the authors say they fixed the errors, if they fixed those, then they made perhaps even more. The manuscript is rife with errors. Sloppy is the word. For example, they sometimes write galactosidase (correct) and sometimes galactidase (not).
5. And by sloppy I am not simply saying they are sloppy with spelling English words. Appendix S4 in the rebuttal they write 0.0028um GAPDH. In the methods they write they used 0.0028mM, which is 2.8 uM. The whole seems like it was hastily put together without checking and rechecking for errors, which is an essential part of putting a scientific manuscript together. I worry that if a similar cavalier attitude was taken in doing the experiments and their analysis, then it too is filled with errors.
Finally, the authors really do not address the non-physiological temperatures at which the non-linear data reside with respect to the Arrhenius law. For the fly, frog and enzyme experiments, the outliers from linearity reside in the extreme temperatures above and below the "sustainable living" range for each animal, i.e. 30, 37 and 8 C are not sustainable for Drosophila. For the enzymes, taken from bacteria or animals that live at a constant 37C, the outliers sit at temperatures of 50, 45, 41C. In the discussion they need to put their claims of regions where development does not fit Arrhenius in this context. Acute temperature stress response utilizing specialized mechanisms may modulate the cell and molecular scale events occurring during development. I know they have a sentence buried amongst some nonsense about rate limiting biochemistry. But effort needs to be made to note the non-physiological range of these behaviors and what that implies. As they stated at one point in the rebuttal, they are more agnostic. That model might be the more correct one than the one they push, and so they should be as agnostic as possible since they have no experiments to support or refute that hypothesis.
In conclusion, the authors could have responded to my comments and those of my colleagues more to heart and really modified the manuscript to thoroughly incorporate the responses into the meat of the manuscript and thereby improve it. The present response is inadequate.
I have one specific comment that requires further clarification: In Appendix Figure S1B, the EA for C-D (I think), suddenly drops to around ~55kJ/mol. Yet, this represents a very small time period of the total development of the Drosophila embryo. I'm still not convinced this observation of decreased EA has any specific physiological relevance. I don't expect a detailed analysis, but it would be good to comment on this more explicitly. Currently, the authors only highlight the statistical significance (Fig. 2C). What (if any) biological relevance there is to this observation remains under-developed.
Another point is that the references still appear to have errors. Of course, this can be picked up at the proofing stage, but care should be taken that they are all correct.
Reviewer #3: The authors have taken care of the vast majority of our concerns. Two minor issues remain, however:
1) Why is it so hard to find the Kuntz and Eisen data to perform a direct comparison? A quick search on the Internet gave:
i) https://figshare.com/articles/dataset/Raw_data_for_Kuntz_and_Eisen_2015_Oxygen_changes_drive_non_uniform_scaling_in_Drosophila_melanogaster_embryogenesis_
ii) https://github.com/sgkuntz
iii) https://datadryad.org/stash/dataset/doi:10.5061/dryad.s0p50
Isn't the data there? Did the authors try to contact Mike Eisen?
2) We thank the authors for providing information about their temperature control setup, which was completely missing from the first version of the manuscript. How did they ensure that there is no temperature gradient within the sample holder? Do they have a validation of the temperature the flies feel based on, for example, the timing of the early nuclear divisions?
While we are aware that MSB typically only allows for one round of revisions, we hope that the Editor will agree to let the authors make these minor revisions. We will be happy to look at a next version of the manuscript responding to the remaining issues.
The authors have attempted to address my comments and those of the other reviewers. For some comments, they have adequately addressed the issue, such as increasing the number of Xenopus replicates to obtain stronger statistics.
Other responses are not sufficient.
1, They tested GAPDH at 2 substrate concentrations to determine if the thermodynamic behavior is concentration dependent. They said they did titrations to ensure substrate was saturating. However nowhere in the manuscript did they present the data. Nor did they do so for the new enzyme they assayed, beta-galactosidase. This data should be added in supplement plus a description as to why it was done and what it means. I will not be the only reader who wonders about the simplicity of the results and they need to make an effort to show they did it;
We are sorry that in our previous submission the supplementary figures demonstrating that GAPDH and beta-galactosidase were in the saturated regime were not highlighted clearly enough (Appendix Figure S4 in the last submission).

31st May 2021 2nd Authors' Response to Reviewers
We have modified our text to more clearly reference these additional measurements. The modified text reads as follows:
Old: "Interestingly, we find that GAPDH shows clearly non-linear behavior in the Arrhenius plot from 10 to 45 °C (Fig 4D, Appendix Figure S4A, Appendix Table S5)."
New, line 284: "Interestingly, we find that GAPDH shows clearly non-linear behavior in the Arrhenius plot from 10 to 45 °C (Fig 4D). When halving the substrate concentrations used we find similar kinetics, suggesting the enzyme is in the saturated regime (Appendix Figure S6A, Dataset EV6)."
Additionally, we have modified our text referring to β-galactosidase:
Old: "Additionally we have assayed another common enzyme, Beta-galactosidase, monitoring conversion of ortho-Nitrophenyl-Beta-galactoside at 420 nm (Fig. EV5, Appendix Figure S4B, Appendix Table S6, 7), where we find similar results to our GAPDH assay"
New, line 287: "Additionally, we have assayed another common enzyme, β-galactosidase, monitoring the conversion of ortho-Nitrophenyl-β-galactoside at 420 nm (Fig. EV5, Dataset EV7, 8). Also here, we find that the enzyme shows strong nonlinearity in the Arrhenius plot. We performed these experiments with saturating substrate concentrations (Appendix Figure S6B)."

They have STILL not fixed the Chong citation
This is embarrassing and we are very sorry for this repeated omission. We originally thought the reviewer was pointing out that there was a software error when linking citations. However, we realized that our citation format was off. We have updated the formatting for the Chong et al citations from:
"Chong, J., Amourda, C., and Saunders, T.E. Temporal development of Drosophila embryos is highly robust across a wide temperature range. 11."
To, line 845:
"Chong J, Amourda C & Saunders TE (2018) Temporal development of Drosophila embryos is highly robust across a wide temperature range. J R Soc Interface 15: 20180304"
3. They uploaded the raw imaging data to two separate databases. One, cellimagelibrary.org has the files that can be run but no metadata is available for each movie. Thus I didnt know which temperature or replicate each specimen was being imaged. Deposited data MUST be properly annotated. The data on googledriv.com can only be downloaded to a local computer. The data is huge! I did not download it and so I cannot see whether those were properly annotated. But the data should be accessible and viewable on the cloud and not necessarily on a local computer
During the last submission, we were still working with staff from the Cell Image Library to match the metadata with the supplied movies. Because the timing of the update was not under our control, we supplied the movies with matching metadata as a temporary solution via Google Drive. Since then the Cell Image Library data has been updated. Please refer to our updated data accessibility links for our fly and frog movies:

Data Availability:
Fly developmental time-lapses: Cell Image Library server (http://cellimagelibrary.org/groups/53322)
Frog developmental time-lapses: Cell Image Library server (http://cellimagelibrary.org/groups/54564)
Modeling and analysis scripts: (https://github.com/wuhrlab/ArrheniusAndAnimalDevelopment)
We have updated our frog videos to make clearer which embryo refers to which data in our datasets. To do so, we start each video with a second-long "key frame" that labels each embryo with a letter to reference them to specific embryos in Datasets EV2 and EV4. Additionally, we have updated all of our datasets with legends detailing their contents as well as where to find any relevant raw source data. An example for frog from Dataset EV4 is as follows:
"Xenopus laevis developmental time data used for analyzing the Arrhenius equation in Frog. Individual sheets contain the temperature of the experiment (the row headed with 'temp', and sheet name) as well as score abbreviations and names (rows 1-2 respectively). Absolute time, time since T0 (accumulated), and time since last score (per-stage) are recorded at each score for each embryo replicate collected at the reported temperature. Each embryo's data is associated with a movie located here (http://cellimagelibrary.org/groups/54564). Movies are labeled in the following format: Xenopus_experimentalTemperatureC_batchNumber. The experimentalTemperatureC correlates to a sheet name. The 'batchNumber' correlates to the run, where if multiple experiments were conducted at the same temperature then sheets are named as 'temperatureRNumber', where RNumber matches batchNumber. Specific embryo identifiers are located in column 'A' and correlate to the beginning frame of each relevant video. Each movie is given a CIL# which is recorded under "temp" on each sheet."
As for our fly datasets, we have updated them similarly. An example from Dataset EV3 is as follows:
"Drosophila melanogaster developmental time data used for analyzing the Arrhenius equation in Fly. Individual sheets contain the temperature of the experiment (the row headed with 'temp', and sheet name) as well as score abbreviations and names (rows 1-2 respectively). Absolute time, time since T0 (accumulated), and time since last score (per-stage) are recorded at each score for each embryo replicate collected at the reported temperature. Each embryo's data is associated with a movie located here (http://cellimagelibrary.org/groups/53322). Movies are labeled in the following format: Drosophila_experimentalTemperatureC_embryoID. The experimentalTemperatureC correlates to a sheet name. The 'embryoID' then correlates to the letters in column 'A' under 'Embryo Name'. Each movie is given a CIL# which is recorded under each associated embryo name in column A."
Unfortunately, the Cell Image Library website does not offer high-quality previews for movies. To view our data in the original quality one will need to download it.
4. I did not make a note of the breadth of typos and various errors in the original manuscript but I certainly noticed them. The other reviewers made note of it though, and although the authors say they fixed the errors, if they fixed those, then they made perhaps even more. The manuscript is rife with errors. Sloppy is the word. For example, they sometimes write galactosidase (correct) and sometimes galactidase (not).
Thanks for pointing this out. Based on the reviewers' comments, we asked a professional science writer to edit the manuscript before this resubmission.
5. And by sloppy I am not simply saying they are sloppy with spelling English words. Appendix S4 in the rebuttal they write 0.0028um GAPDH. In the methods they write they used 0.0028mM, which is 2.8 uM. The whole seems like it was hastily put together without checking and rechecking for errors, which is an essential part of putting a scientific manuscript together. I worry that if a similar cavalier attitude was taken in doing the experiments and their analysis, then it too is filled with errors.
We apologize for the many typos. The reviewer is correct that we were indeed pressed for time to meet the journal's resubmission deadline and did not spend as much time as we should have polishing the manuscript. We have hopefully done a better job this time and additionally asked a professional science editor to rectify our mistakes. We are particularly sorry for accidentally using mM where we should have used uM. The correction can be seen on line 597 of the main text: "0.0024 uM"
Finally, the authors really do not address the non-physiological temperatures at which the nonlinear data reside with respect to the Arrhenius law. For the fly, frog and enzyme experiments, the outliers from linearity reside in the extreme temperatures above and below the "sustainable living" range for each animal, i.e. 30, 37 and 8 C are not sustainable for Drosophila. For the enzymes, taken from bacteria or animals that live at a constant 37C, the outliers sit at temperatures of 50, 45, 41C. In the discussion they need to put their claims of regions where development does not fit Arrhenius in this context. Acute temperature stress response utilizing specialized mechanisms may modulate the cell and molecular scale events occurring during development. I know they have a sentence buried amongst some nonsense about rate limiting biochemistry. But effort needs to be made to note the non-physiological range of these behaviors and what that implies. As they stated at one point in the rebuttal, they are more agnostic. That model might be the more correct one than the one they push, and so they should be as agnostic as possible since they have no experiments to support or refute that hypothesis.
We apologize for the confusing terminology used in our paper and the less than optimal clarity of data labeling in the Arrhenius plot. In the revised version we have indicated the core temperature range (14.3 °C to 27 °C in Drosophila, and 12.2 °C to 25.7 °C in frog) as the approximate linear range we use for a linear fit to deduce the apparent activation energies. In the previous submission, all data within this regime were labeled in blue in our Arrhenius plots and data-points outside this regime in red. However, the range at which embryos survive to the last stage scored in our assays ("First Breath" in fly and "Late Neurulation" in frog) is different. This viable temperature range is wider than the core temperature range (14.3 °C to 30.1 °C in fly and 12.2 °C to 28.5 °C in frog) and shows clear non-linearity. We added a separate Appendix Figure redoing the analysis for non-linearity for only this viable temperature range (Appendix Figure S4). To make this clearer to the readers, we updated all Arrhenius plots in the paper so that blue data-points indicate the viable temperature range, while the red data-points indicate temperatures at which the embryos are viable for some stages of development but not for others. We also want to point out that the embryos' ability to withstand very low or high temperatures, even for only parts of developmental progression, is very likely to be physiologically relevant for ectothermic species like frogs and flies, as this could increase their chances of surviving temporary temperature fluctuations.
Old: "Both the frog and fly data exhibit wide core temperature regions that we approximate with a linear fit, between 14.3 and 27 °C in flies and 12.2 and 25.7 °C in frogs (Fig. 2A, B, Fig. EV3). However, for each organism clear deviations are observed as temperatures near the limits of the viable range and outside the core temperatures (Fig. EV3C, D)."
New, Line 140: "Both the frog and fly data exhibit wide core temperature regions that we approximate with a linear fit, between 14.3 and 27 °C in flies and 12.2 and 25.7 °C in frogs (Fig. 2A, B, Fig. EV2). However, for each organism we observe clear deviations from linearity, particularly outside of these temperature ranges (Fig. EV2)"
Additionally, we have added Appendix Figure S4 to specifically show the non-linearity of the 'viable temperature range'. We still observe strong quadratic behavior, which shows that this behavior is not limited to 'non-physiological' or 'non-viable' temperatures, but rather is a characteristic throughout. We understand that confusion may arise due to our claims of linearity earlier in the paper. However, these claims were made because the data appeared strikingly linear in the Arrhenius plot, especially compared to the most extreme, non-viable temperatures.
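For readers unfamiliar with how an apparent activation energy is deduced from a linear fit over the core temperature range, a minimal sketch follows. It assumes rates follow k = A exp(-Ea/(RT)), so that ln k is linear in 1/T with slope -Ea/R. The function name `apparent_activation_energy` and the synthetic values in the usage lines are hypothetical; only the fly core range of 14.3 to 27 °C is taken from the text.

```python
import numpy as np

R = 8.314e-3  # gas constant in kJ/(mol K)

def apparent_activation_energy(temps_c, rates, core_range_c):
    """Fit ln(rate) against 1/T inside the core range; return (Ea in kJ/mol, prefactor A)."""
    temps_c = np.asarray(temps_c, dtype=float)
    rates = np.asarray(rates, dtype=float)
    lo, hi = core_range_c
    core = (temps_c >= lo) & (temps_c <= hi)       # keep only core-range measurements
    inv_t = 1.0 / (temps_c[core] + 273.15)         # 1/T in 1/K
    slope, intercept = np.polyfit(inv_t, np.log(rates[core]), 1)
    return -slope * R, np.exp(intercept)

# Synthetic, noise-free rates generated with illustrative values (Ea = 75 kJ/mol, A = 1e10).
temps_c = np.linspace(10.0, 33.0, 24)
rates = 1e10 * np.exp(-75.0 / (R * (temps_c + 273.15)))
ea, a = apparent_activation_energy(temps_c, rates, core_range_c=(14.3, 27.0))
```

Restricting the fit to the core range means that points at the temperature extremes, where the data bend away from the line, do not bias the estimated slope.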
We re-wrote the associated paragraph in the main text on line 212 to better explain our position: "These findings raise the question if the temperature region is also non-linear for the temperature range over which the embryos can develop to the last scored developmental event (14.3 to 30.1 °C in fly and 12.2 to 28.5 °C in frog). We performed BIC analysis for all developmental intervals in fly embryos and find this "viable regime" is clearly quadratic over most intervals (Appendix Figure S4B, C). Although less conclusive, we find similar results when reanalyzing our frog data (Appendix Figure S4D, E). We always observe deviation to be downward concave, i.e. the rates at very low and very high temperatures are lower than predicted by the Arrhenius equation. Thus, while the Arrhenius equation is a good approximation for the temperature dependence of early fly and frog development, at temperature extremes, we see clear deviation. This observation supports our initial intuitions that Arrhenius cannot perfectly describe a complex system; although why it deviates and how it is still a fairly decent approximation remains to be answered."

The new Appendix Figure S4 is as follows:

Appendix Figure S4: ANCOVA to compare activation energies and BIC to compare quadratic versus linear fit over the viable temperature range. A) p-values for fly developmental stages using ANCOVA, calculated to determine the probability of observing a difference between activation energies for every combination of fly developmental intervals shown in Fig. EV2C. Blue marks p-values above 5E-2, purple marks ≤5E-2, pink marks ≤1E-2, and red marks ≤1E-3. B) Linear (black blue) and quadratic (dashed red) fits calculated for an example fly developmental interval from 14th Cleavage to Beginning of Germband Retraction over the viable temperature range where embryos survive until First Breath (blue data, excluding red data). BIC was used to test model preference for a quadratic fit, reported in the top right as the log ratio likelihood for quadratic over linear for this temperature range. C) Shown is a heatmap reporting the natural log ratio likelihood of quadratic over linear fit preference for all developmental intervals in fly development over the viable temperature range. Intervals are marked with their beginning event on the X-axis, and their ending event on the Y-axis. Red represents a preference for quadratic while blue represents a preference for a linear fit. D) As (A) but for calculating frog p-values between slopes of developmental intervals shown in Fig. EV2D. E) As (B) but for frog 3rd to 10th cleavage over the viable temperature range where embryos survive until End of Neurulation. F) As (C) but for all possible frog intervals.
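The linear-versus-quadratic comparison in the caption above can be illustrated with a small sketch. It rests on assumptions not stated in the source: the BIC is computed for Gaussian least-squares residuals as n·ln(RSS/n) + k·ln(n), and the "log ratio likelihood" is approximated as half the BIC difference. The data are synthetic, with a deterministic noise pattern chosen only so that the example is reproducible, and all names are hypothetical.

```python
import math

def polyfit_rss(x, y, degree):
    """Least-squares polynomial fit via the normal equations;
    returns the residual sum of squares (RSS)."""
    mx = sum(x) / len(x)
    sx = max(abs(xi - mx) for xi in x)
    z = [(xi - mx) / sx for xi in x]  # centre and scale for stability
    m = degree + 1
    a = [[sum(zi ** (i + j) for zi in z) for j in range(m)] for i in range(m)]
    b = [sum((zi ** i) * yi for zi, yi in zip(z, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    coef = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, m))
        coef[i] = (b[i] - tail) / a[i][i]
    fitted = [sum(c * zi ** k for k, c in enumerate(coef)) for zi in z]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))

def bic(rss, n, k):
    """BIC for a least-squares fit with k coefficients (Gaussian errors assumed)."""
    return n * math.log(rss / n) + k * math.log(n)

def log_preference_quadratic(x, y):
    """Approximate natural-log evidence ratio, quadratic over linear.
    Positive values favour the quadratic fit (assumed convention)."""
    n = len(x)
    bic_lin = bic(polyfit_rss(x, y, 1), n, 2)
    bic_quad = bic(polyfit_rss(x, y, 2), n, 3)
    return 0.5 * (bic_lin - bic_quad)

# Synthetic Arrhenius-plot data: x = 1000/T (1/K), y = ln(rate)
x = [3.30 + 0.02 * i for i in range(8)]
pattern = [-7, 5, 7, 3, -3, -7, -5, 7]  # deterministic stand-in for noise
y_straight = [5.0 - 8.4 * (xi - 3.37) + 0.002 * p for xi, p in zip(x, pattern)]
# Downward-concave curvature added on top of the straight line
y_curved = [yi - 40.0 * (xi - 3.37) ** 2 for xi, yi in zip(x, y_straight)]

print(log_preference_quadratic(x, y_straight))  # negative: linear preferred
print(log_preference_quadratic(x, y_curved))    # positive: quadratic preferred
```

The extra coefficient of the quadratic model is penalised by ln(n), so curvature is only reported when it reduces the residuals by more than that penalty.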
Lastly, we have extended the discussion based on the reviewer's comment and thank him for the sentence: "Acute temperature stress response utilizing specialized mechanisms may modulate the cell and molecular scale events occurring during development.", which we have adapted to our manuscript.

Old: "However, there remain several additional possibilities contributing to this highly complex behavior. For example embryos might activate entirely different pathways at certain temperatures e.g. via cold or heat stress. Additionally, single multi-step processes (e.g. transcription) might exhibit different rate-limiting steps at different temperatures due to the different underlying activation energies (Roe et al., 1985)."

New, Line 334: "However, several other factors may also contribute to this behavior. Although we have observed nonlinearity over temperature ranges where morphology is normal and viability is high, it is possible that additional processes come into play at extreme temperatures. Acute temperature stress responses utilizing specialized mechanisms may modulate the cell and molecular scale events occurring during development. Embryos might activate entirely different pathways at more extreme, near non-viable, temperatures e.g. via cold or heat stress."

In conclusion, the authors could have taken my comments and those of my colleagues more to heart and really modified the manuscript to thoroughly incorporate the responses into the meat of the manuscript and thereby improve it. The present response is inadequate.
We are very sorry that we seem to have left this impression. We greatly appreciate the reviewers' constructive criticism and have tried our best to address their comments to improve the manuscript. We believe a tracked-changes comparison of the manuscripts from the different submissions reveals that we have drastically updated the manuscript and added substantial amounts of additional data and analysis. We believe both rounds of reviews have improved the manuscript and we hope that the additional improvements and clarifications put forward in this round are able to address the remaining concerns.

I have one specific comment that requires further clarification: In Appendix Figure S1B, the EA for C-D (I think) suddenly drops to ~55 kJ/mol. Yet, this represents a very small time period of the total development of the Drosophila embryo. I'm still not convinced this observation of decreased EA has any specific physiological relevance. I don't expect a detailed analysis, but it would be good to comment on this more explicitly; currently, the authors only highlight the statistical significance (Fig. 2C). What (if any) biological relevance there is to this observation remains under-developed.

Thanks for pointing this out. We would like to point out that the analysis of our frog data reveals several stages with much longer time intervals to be statistically significantly different (Appendix Figure S3 & S4). Further, when we reanalyze the data from Kuntz & Eisen (Fig. EV3), activation energies of larger regimes also appear to be significantly different. The physiological relevance is probably minimal if all developmental events with different activation energies were indeed entirely sequential. For technical reasons, in this study we can only score sequential events. However, we are very puzzled by how embryos could deal with differential scaling with temperature for processes that run in parallel.
Interestingly, it has previously been demonstrated that even different stages of the same cell cycle show different activation energies (PMID: 33278404). These events are temporally closely coupled, making it likely that they occur at least somewhat in parallel. To discuss this, we have updated the discussion section of the manuscript as follows:

Line 341: "One major question this study raises is how complex embryonic development can result in a canonically developed embryo if the different reactions required for faithful development proceed at different relative speeds at different temperatures. In our assays, we are only able to follow temporally sequential reactions, and one can argue that increasing or decreasing time spent at a particular event should not influence the success of development. However, development must be much more complex and hundreds or thousands of reactions and processes must occur in parallel, e.g. in different cell types developing at the same stage. Therefore, how can frog and fly embryos be viable over a ~15 °C temperature range wherein different developmental intervals' varying temperature sensitivity could possibly throw development out of balance? We envision two major possible developmental strategies to overcome this problem. Either all rate-limiting steps occurring in parallel at a given embryonic stage have evolved similar activation energies, or the embryos have developed checkpoints that assure a resynchronization of converging developmental processes over wide temperature ranges."

Furthermore, we have revised our original Appendix Figure S1 (now S3) for clarity and flow, as the developmental interval of interest was not legible (the significant score in D-E, not C-D). Additionally, we have made the length of the interval as a portion of development clearer. The old and revised figures can be found as follows:

Old:

We therefore requested the data from the authors. They kindly provided tabulated data for their developmental scorings for all investigated temperatures, which we reformatted and provide as Dataset EV5. Using these data, we investigated 1) whether their data support different activation energies for different developmental intervals, and 2) whether the data support a quadratic over a linear fit in the Arrhenius plot. These results are shown in Figure EV3, found below. While the developmental scores used by Kuntz & Eisen were different from our own, the reanalysis further supports our main conclusions. The Kuntz & Eisen data still show significantly different activation energies, and a quadratic fit in the Arrhenius plot is statistically preferred over a linear fit:

[...] data. We fit data for temperatures below 28.75 °C (dashed red line, i.e. core temperature range) via linear regression (solid blue line). Shown is the apparent activation energy plus/minus the confidence interval. Additionally, we fit the entire temperature range with quadratic (dashed blue line) and linear fits. BIC was also calculated and is shown here as the natural log ratio likelihood for quadratic over linear fit and displayed in black. C) Apparent activation energies over the core temperature regime are shown. Error bars show the 68% confidence intervals. Black braces point out example developmental intervals that have significantly different apparent activation energies. '**' p-value < 0.01, '***' p-value < 0.001. D) Shown are p-values between all developmental intervals, stage 1 and 2. Blue marks p-values above 5E-2, purple marks ≤5E-2, pink marks ≤1E-2, and red marks ≤1E-3.

E) Shown are natural log ratio of likelihoods for quadratic over linear fits for all possible developmental intervals, marked by their starting and ending scores, using Kuntz & Eisen's data over all temperatures. Blue signifies a preference for linearity; red signifies a preference for quadratic behavior.
We have modified the manuscript to cite the added analysis at line 152: "In this respect our results differ from the uniform scaling proposed for fly development in a previous study (Kuntz & Eisen, 2014). However, when we reanalyzed the data that the authors kindly provided, we find that apparent activation energies between developmental intervals vary significantly (p-value = 1 × 10⁻³) (Fig. EV3A-D, Dataset EV5)." and at line 317: "One striking finding of our study is that different developmental processes within the same embryo clearly scale differently with varying temperature i.e. the apparent activation energies for different developmental intervals can vary significantly. We reaffirmed this observation upon reanalyzing Kuntz & Eisen's 2014 data (Figure EV3). Different temperature scaling has also previously been observed in component processes in presumed simpler processes such as cell cycle progression during the cleavage division in fly embryos (Falahati et al, 2020)."

Different Ea values imply different developmental scaling, rather than the uniform developmental scaling posited by Kuntz & Eisen. To resolve this discrepancy we revisited figure 3C in Kuntz & Eisen. At first glance the figure does suggest approximately even scaling across all temperatures for different stages of development. However, when we replotted the data in a similar fashion, fitting the data with linear fits and slightly stretching the figure (so as to better see individual developmental events), we begin to see a few events which clearly do not share the same vertical lines as the 0 and 1 normalized reference events. The remade figure can be seen as follows, compared against the original Eisen figure 3C:

[...] intuitively suggests that different developmental periods of fly development scale differently with temperature. This intuition is supported by the F-test analysis in the Arrhenius plots. We find some significantly different slopes, i.e. activation energies (Fig. EV3 B, C, D).
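As a hedged illustration of how two Arrhenius slopes can be compared, the sketch below computes an extra-sum-of-squares F statistic: a model with separate intercepts but one shared slope is compared against a model with separate slopes, which corresponds to the interaction term of an ANCOVA. Whether this matches the exact procedure used in the manuscript is an assumption. The data are synthetic, the slopes (about 70 and 55 kJ/mol) are hypothetical, and the function names are invented for the example.

```python
def centered_sums(x, y):
    """Return Sxx, Sxy, Syy about the group means."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    return sxx, sxy, syy

def slope_difference_f(x1, y1, x2, y2):
    """F statistic for 'the two groups have different slopes'.
    Numerator has 1 degree of freedom; denominator has n1 + n2 - 4."""
    sxx1, sxy1, syy1 = centered_sums(x1, y1)
    sxx2, sxy2, syy2 = centered_sums(x2, y2)
    # Separate intercepts and separate slopes (4 parameters)
    rss_separate = (syy1 - sxy1 ** 2 / sxx1) + (syy2 - sxy2 ** 2 / sxx2)
    # Separate intercepts, one shared slope (3 parameters)
    rss_common = (syy1 + syy2) - (sxy1 + sxy2) ** 2 / (sxx1 + sxx2)
    dof = len(x1) + len(x2) - 4
    f_stat = (rss_common - rss_separate) / (rss_separate / dof)
    return f_stat, dof

# Synthetic Arrhenius-plot data: x = 1000/T (1/K), y = ln(rate).
# With this x scaling, slope = -Ea / R with Ea in kJ/mol and R in J/(mol*K).
x = [3.30 + 0.02 * i for i in range(8)]
pattern = [-7, 5, 7, 3, -3, -7, -5, 7]  # deterministic stand-in for noise

def make_interval(intercept, slope):
    return [intercept + slope * (xi - 3.37) + 0.002 * p
            for xi, p in zip(x, pattern)]

y_a = make_interval(5.0, -8.4)  # roughly 70 kJ/mol (hypothetical)
y_b = make_interval(4.0, -6.6)  # roughly 55 kJ/mol (hypothetical)
y_c = make_interval(3.0, -8.4)  # same slope as y_a, different intercept

f_diff, dof = slope_difference_f(x, y_a, x, y_b)
f_same, _ = slope_difference_f(x, y_a, x, y_c)
print(f_diff, f_same, dof)
# A p-value would follow from the F distribution with (1, dof) degrees of
# freedom, for example scipy.stats.f.sf(f_diff, 1, dof) if SciPy is available.
```

A large F statistic indicates that allowing separate slopes reduces the residuals far more than expected by chance, which is how intervals with different apparent activation energies would be flagged.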
2) We thank the authors for providing information about their temperature control setup, which was completely missing from the first version of the manuscript. How did they ensure that there is no temperature gradient within the sample holder? Do they have a validation of the temperature the flies feel based on, for example, the timing of the early nuclear divisions?
Thanks for pointing out that we needed to provide more detail on our temperature measurements and control. To this end we added additional information to our Materials and Methods for both fly and frog experiments. For fly, found on line 377 in the main text: "To record and validate temperatures for the fly embryo data collections, temperatures were taken next to each microscope's sample holder (~1 inch from the embryo) using either an Elitech RC-5 (standard error +/-0.5 °C), Dickson TH300 (standard error +/-1.0 °C), or Fluke 54 II B (standard error +/-0.3 °C) thermometer. We worked with two microscopes in the room. When comparing the temperatures between microscopes they never differed by more than a degree, suggesting the temperature in the room was very homogeneous." When comparing the temperatures at the two microscopes, they never differed by more than a degree at a minimum separation of 3 feet, making the maximum possible gradient between the embryo and the adjacent temperature recorder likely ~1/36th of a degree.
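The 1/36 figure follows from a simple proportionality, written out here under the assumption (not stated explicitly in the response) that the temperature varies linearly across the room, so that the difference scales with distance:

```latex
\[
\Delta T_{\text{embryo}}
  \;\lesssim\;
  \frac{1\,^{\circ}\mathrm{C}}{3\ \mathrm{ft} \times 12\ \mathrm{in/ft}}
  \times 1\ \mathrm{in}
  \;=\; \frac{1}{36}\,^{\circ}\mathrm{C}
  \;\approx\; 0.028\,^{\circ}\mathrm{C}
\]
```

Here 1 °C is the largest difference observed between the two microscopes, 3 ft (36 in) is their minimum separation, and 1 in is the approximate distance between thermometer and embryo. The bound is well below the stated thermometer errors.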
For our frog acquisitions we added a detailed description of our image capturing and temperature control to the Materials and Methods, on line 499 of our main text.
"To validate the temperature experienced by our frog embryos, we used an aquatic thermometer (QTI, DTU6024C-004-C, tolerance provided by the manufacturer +/-0.1 °C) that measured the temperature of the 0.1 MMR in which the embryos were raised. Additionally, we recorded the temperature of the surrounding air in the aforementioned temperature controlled chamber with an Elitech RC-5 temperature recorder (+/-0.5 °C). We observed that these readings agreed with each other within the standard errors of the thermometers. Each experiment was performed after allowing the controlled temperature chamber to equilibrate for several hours. For the analysis throughout the paper we used the measured ambient temperature at the microscope stage, directly adjacent to the frog embryos." Representative temperature recording comparisons for evaluating the frog set-up are shown in the table below.
1.a. How was the sample size chosen to ensure adequate power to detect a pre-specified effect size?
1.b. For animal studies, include a statement about sample size estimate even if no statistical methods were used.
2. Describe inclusion/exclusion criteria if samples or animals were excluded from the analysis. Were the criteria preestablished?
3. Were any steps taken to minimize the effects of subjective bias when allocating animals/samples to treatment (e.g. randomization procedure)? If yes, please describe.
For animal studies, include a statement about randomization even if no randomization was used.
4.a. Were any steps taken to minimize the effects of subjective bias during group allocation or/and when assessing results (e.g. blinding of the investigator)? If yes please describe. Do the data meet the assumptions of the tests (e.g., normal distribution)? Describe any methods used to assess it.
Is there an estimate of variation within each group of data?

Reporting Checklist For Life Sciences Articles (Rev. June 2017)
This checklist is used to ensure good reporting standards and to improve the reproducibility of published results. These guidelines are consistent with the Principles and Guidelines for Reporting Preclinical Research issued by the NIH in 2014. Please follow the journal's authorship guidelines in preparing your manuscript.

B-Statistics and general methods
the assay(s) and method(s) used to carry out the reported observations and measurements
an explicit mention of the biological and chemical entity(ies) that are being measured.
an explicit mention of the biological and chemical entity(ies) that are altered/varied/perturbed in a controlled manner.
a statement of how many times the experiment shown was independently replicated in the laboratory.
Any descriptions too long for the figure legend should be included in the methods section and/or with the source data.
In the pink boxes below, please ensure that the answers to the following questions are reported in the manuscript itself. Every question should be answered. If the question is not relevant to your research, please write NA (non applicable). We encourage you to include a specific subsection in the methods section for statistics, reagents, animal models and human subjects.
definitions of statistical methods and measures: a description of the sample collection allowing the reader to understand whether the samples represent technical or biological replicates (including how many animals, litters, cultures, etc.).
The data shown in figures should satisfy the following conditions: Source Data should be included to report the data underlying graphs. Please follow the guidelines set out in the author ship guidelines on Data Presentation.
a specification of the experimental system investigated (e.g. cell line, species name).
For the conditions where we claim significantly different activation energies we have a power above 80% with our sample sizes of 40
graphs include clearly labeled error bars for independent experiments and sample sizes. Unless justified, error bars should not be shown for technical replicates.
if n < 5, the individual data points from each experiment should be plotted and any statistical test employed should be justified
the exact sample size (n) for each experimental group/condition, given as a number, not a range;
Each figure caption should contain the following information, for each panel where they are relevant:

Data
the data were obtained and processed according to the field's best practice and are presented to reflect the results of the experiments in an accurate and unbiased manner. figure panels include only data points, measurements or observations that can be compared to each other in a scientifically meaningful way.

N/A
The analysis used throughout our paper accounts for any differences in variation between compared groups.

N/A
Xenopus laevis and Drosophila melanogaster embryo age is indicated in each experiment. We used wildtype Xenopus, whose parents were obtained from Nasco. Drosophila melanogaster embryos were klarsicht mutants.
The Xenopus experiments were performed under IACUC protocol 2070-19, approved by Princeton University Institutional Animal Care and Use Committee. We have added a "Data Availability" section in our materials and methods.
We have deposited the videos of embryonic development to the ASCB imaging server and the analysis code to GitHub, and made scored timelapses for enzymes and embryos available as supplemental tables. For additional information please see Materials and Methods.