Microbial lag calculator: A shiny‐based application and an R package for calculating the duration of microbial lag phase

The duration of lag phase can be used as an organismal fitness marker; however, it is often underreported because its estimation may be challenging and depends on the method and parameters used. Moreover, there are no publicly available tools to calculate lag duration by different methods. We developed a shiny‐based web application (https://microbialgrowth.shinyapps.io/lag_calulator/) where the lag duration can be calculated from user‐specified growth curve data using various explicitly specified methods, parameters and data preprocessing techniques. Additionally, we release an R package ‘miLAG’ that can be further customised and developed. We also briefly describe the assumptions, advantages and disadvantages of the most popular lag calculation methods and propose a decision tree for choosing the method best suited to one's data. Finally, we show some working examples of how to calculate lag duration using our shiny server.

density measurements (spectrophotometry) taken in time intervals.
However, one of the limitations of spectrophotometry is that optical density is not a perfect measure of cell count (discussed by Rolfe et al. (2012) and Swinnen et al. (2004)).
In order to know exactly when cells started duplicating (i.e. the end of the lag phase), we would need to continuously monitor the number of cells. However, spectrophotometry measurements are not only taken at intervals but may also be inaccurate if the cells change their mass or shape, or if the initial inoculum size is below the detection level (Rolfe et al., 2012). In such cases, some assumptions are needed to estimate the lag phase duration. If one assumes that there is no population growth during the lag phase and that cells then start dividing synchronously at a constant growth rate, the end of the lag phase can be calculated as the intersection between the line tangent to the growth curve at the point of maximum growth rate and the y = log(N0) line, where N0 is the inoculation density (hereinafter 'tangent method' (Bertrand, 2019); e.g. Cerulus et al., 2018; Jomdecha & Prateepasen, 2011; Valík et al., 2021). This is in fact the most frequently used method of calculating the lag duration.
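The tangent construction described above can be written out explicitly. The following is a sketch in illustrative notation: the symbols t_m (the time of maximal growth rate) and \mu_{\max} (the slope of the log-scale growth curve at t_m) are introduced here as assumptions and are not taken from the original text.

```latex
\begin{align}
  y(t) &= \log N(t_m) + \mu_{\max}\,(t - t_m)
       && \text{(tangent line on the log scale)} \\
  \log N_0 &= \log N(t_m) + \mu_{\max}\,(\mathrm{lag} - t_m)
       && \text{(intersection with } y = \log N_0\text{)} \\
  \mathrm{lag} &= t_m - \frac{\log N(t_m) - \log N_0}{\mu_{\max}}
\end{align}
```

Under these assumptions, an error in N0 or in the estimated slope shifts the lag estimate directly, which is why both quantities need to be determined carefully.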
However, there are also other methods, for example, defining the end of lag as the point of the growth curve where the second derivative of the population size in time is maximal (hereinafter 'max growth acceleration', e.g. Buchanan & Cygnarowicz, 1990; Liu et al., 2021), determining when the population size or biomass has increased from the initial value by some predefined threshold (minimal detectable increase, hereinafter 'biomass increase', e.g. Opalek et al., 2022) or fitting experimental data to a mathematical model (hereinafter 'parameter fitting to a model', e.g. Reding-Roman et al., 2017). The most popular methods of calculating the lag duration, the assumptions underlying each method and the possible challenges related to each are summarised in Table 1.
Mathematical models can be used to overcome some methodological limitations. Various models have been proposed to account for the lag phase (Swinnen et al., 2004) and there are tools that use some of these models to estimate the lag duration from the experimental data (e.g. R package nlsMicrobio; Baty & Delignette-Muller, 2013). However, these tools focus on fitting the entire growth curve to a model rather than on the lag itself, and they require a good knowledge of the models and R programming skills. This may make them difficult to use and make it hard to decide whether a method is appropriate for one's data. Finally, as discussed in Baty and Delignette-Muller (2004), the data quality impacts the lag duration estimation to a greater extent than the choice of a model. Although Baty and Delignette-Muller (2004) investigated an insufficient number of data points as a potential problem in lag duration calculation, experimental biologists may face other data quality problems, such as noisiness or a growth curve shape that deviates from mathematical models (Heinz et al., 2019).
Interestingly, technicalities related to dealing with such 'non-ideal' data tend to be omitted in methodologies described by both empirical and theoretical studies.

| Experimental growth curves
The empirical growth curves used as example data within the shiny-based application were obtained in Opalek et al. (2022). See Data S1 for details.

| Lag duration calculation methods
The mathematical formulation of each of the methods is presented in Data S1.

| THE LAG PHASE DURATION CALCULATOR
We created an R package 'miLAG' (https://cran.r-project.org/web/packages/miLAG/index.html) as well as a shiny-based application (freely available at https://microbialgrowth.shinyapps.io/lag_calulator/), which allow the lag duration to be calculated by different methods and the results to be compared. The tool is designed to automate the lag duration calculation process and requires neither programming nor mathematical modelling skills. Moreover, it allows parameter adjustments and data preprocessing. The application code as well as the R package are deposited on GitHub so that they can be further developed and customised (https://github.com/bognabognabogna/microbial_lag_calulator).

| How to choose the lag calculation method best suited to one's data set?
The most frequently used methods to calculate lags which have been implemented in our R package and the shiny application are described in Table 1.
See Data S1 for the formulation of each model.

TABLE 1 The summary of the lag phase length estimation methods.

Method: biomass increase
Lag definition: The first time point where the population size increased by a certain value from the beginning of the growth.
Assumptions: The population does not increase its size during the lag and then starts growing with any growth rate, which may be variable or hard to measure. The increase by the threshold value is the minimal increase possible to detect with high confidence.
Challenges: The choice of the threshold value is arbitrary (not 'biologically relevant').
Examples of experimental studies using the method: Opalek et al. (2022)

Method: parameter fitting to a model (Baranyi model)
Lag definition: ln(1 + 1/q0)/r, where r and q0 are explicit model parameters (r is the maximal growth rate and q0 represents the theoretical physiological state of the inoculum). r and q0 are estimated from the growth curve data using parameter fitting procedures such as nonlinear least squares estimation.
Assumptions: The population size grows according to the logistic model adjusted by the function q(t)/(1 + q(t)), where q(t) = q0 e^(vt) describes the concentration of some critical substance. The adjustment function slows down the initial growth but does not pause it during the lag time, as assumed by other methods.
Challenges: The lag duration fitted to the data depends on many technical parameters of the fitting algorithms, or it may not be found if the fitting algorithm fails to converge. Finally, the lag as understood by Baranyi does not mean the time when cells do not divide, but the time when they divide more slowly while adjusting to the new media.

As discussed by Baty and Delignette-Muller (2004), the frequency of measurements can strongly influence the lag phase duration estimates. We recommend taking measurements at intervals of at most 0.5 h, and more frequently if one expects atypically shaped growth curves. We also highlight the importance of correctly determining N0 (the initial number of live cells capable of proliferating) and of that number being above the detection limit. If N0 is below the detection level, then we are likely to overestimate the lag duration, because the first signals of growth will also be below the detection level. To overcome this problem, one can assume a certain growth curve shape below the detection limit, as done in Pierantoni et al. (2019), and apply model fitting to estimate the lag duration. Additionally, if N0 cannot be measured with high confidence (e.g. because some dead or senescent cells are part of the inoculum), one can use the biomass increase method to estimate population lag duration.
Although the biomass increase method is simplistic, we suggest using it if the growth curve deviates greatly from the model shape or when the growth curve cannot be corrected for blanks or dead cells (i.e. if a fraction of the population size accounts for dead or senescent cells, as e.g. in Opalek et al. (2022)). In this case, none of the other methods will work correctly, because their assumptions are violated. Note that an important drawback of this method is that the chosen threshold value is arbitrary and may have no clear biological interpretation.
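The biomass increase rule can be sketched in a few lines. This is a minimal illustration and not the code of the miLAG package: the function name `lag_biomass_increase`, the use of the first measurement as the initial value and the example threshold are all assumptions made here.

```python
def lag_biomass_increase(times, biomass, threshold):
    """Return the first time at which biomass exceeds its initial value by `threshold`.

    The initial value is assumed to be the first measurement.
    Returns None if the threshold is never reached.
    """
    if not times or len(times) != len(biomass):
        raise ValueError("times and biomass must be non-empty and of equal length")
    initial = biomass[0]
    for t, b in zip(times, biomass):
        if b - initial >= threshold:
            return t
    return None


# Hypothetical optical density readings taken every 0.5 h.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
od = [0.050, 0.051, 0.049, 0.052, 0.060, 0.075, 0.100]
print(lag_biomass_increase(times, od, threshold=0.02))  # 2.5
```

Because only differences from the initial value are used, a constant offset (such as an uncorrected blank) does not change the result, whereas the choice of `threshold` does.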
The max growth acceleration method may be a good choice for growth curves with non-standard shapes. However, the calculation of the second derivative is very noise-sensitive; therefore, we recommend smoothing the growth curve before applying the max growth acceleration method.
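The combination of smoothing and a second-derivative search can be sketched as follows. This is an illustrative sketch only: the centred moving average, the finite-difference approximation, the assumption of evenly spaced time points and the names `moving_average` and `lag_max_acceleration` are choices made here, not a description of the smoothing used by the web tool.

```python
import math


def moving_average(values, window=3):
    """Centred moving average; the window shrinks near both ends of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def lag_max_acceleration(times, biomass, window=3):
    """Time at which the discrete second derivative of the smoothed curve is largest.

    Assumes evenly spaced time points; window=1 disables smoothing.
    """
    if len(times) < 3 or len(times) != len(biomass):
        raise ValueError("need at least three matching time/biomass points")
    smooth = moving_average(biomass, window)
    dt = times[1] - times[0]
    best_index, best_acc = 1, float("-inf")
    for i in range(1, len(smooth) - 1):
        # Central finite difference for the second derivative at point i.
        acc = (smooth[i + 1] - 2.0 * smooth[i] + smooth[i - 1]) / dt ** 2
        if acc > best_acc:
            best_index, best_acc = i, acc
    return times[best_index]


# Hypothetical sigmoid-shaped curve sampled every 0.5 h.
times = [0.5 * i for i in range(25)]
od = [0.05 + 1.0 / (1.0 + math.exp(-(t - 6.0))) for t in times]
print(lag_max_acceleration(times, od, window=3))
```

A wider window suppresses more noise but also flattens the curvature that the method is looking for, so the window size is itself a parameter worth reporting.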
The most popular tangent method works reasonably well even if the assumption of the constant growth rate (see Table 1) is violated. The challenging step may be the choice of how to draw the tangent line. If it is drawn to one point only (the point where the growth rate is maximal), there is a risk that its slope will be under- or overestimated if the data are noisy, that is, if there are some random fluctuations from one measurement to another and an outlying point is chosen. This problem can be mitigated by fitting a regression line to points in the exponential phase.
However, it may not be apparent in which time range the population grows exponentially. In fact, in order to know where the exponential phase starts, one needs to know where the lag phase finishes, which brings us back to the original problem. Thus, the selection of data points in the exponential phase often requires some manual inspection or additional assumptions. Within our web tool, N points are chosen around the point with the maximal growth rate, where N is a user-specified parameter. Note that the tangent method requires the initial number of cells capable of proliferating (N0) to be determined with high confidence (Heinz et al., 2019; Opalek et al., 2022).
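The idea of replacing a single-point tangent with a regression line through several points around the maximal growth rate can be sketched as below. This is a hedged illustration rather than the web tool's implementation: the way the window is centred, the default of using the first measurement as N0 and the names `fit_line` and `lag_tangent` are assumptions made here.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def lag_tangent(times, biomass, n_points=3, n0=None):
    """Tangent-method lag on the log scale, using a regression line through
    `n_points` consecutive points around the steepest interval.

    Biomass values must be positive; n0 defaults to the first measurement.
    """
    if n_points < 2 or len(times) < n_points or len(times) != len(biomass):
        raise ValueError("need n_points >= 2 and at least n_points measurements")
    logs = [math.log(b) for b in biomass]
    log_n0 = math.log(n0) if n0 is not None else logs[0]
    # Growth rate on the log scale between consecutive measurements.
    rates = [(logs[i + 1] - logs[i]) / (times[i + 1] - times[i])
             for i in range(len(times) - 1)]
    steepest = max(range(len(rates)), key=lambda i: rates[i])
    # Window of n_points points placed around the steepest interval.
    lo = max(0, steepest - (n_points - 1) // 2)
    hi = min(len(times), lo + n_points)
    lo = hi - n_points
    slope, intercept = fit_line(times[lo:hi], logs[lo:hi])
    if slope <= 0:
        raise ValueError("no growth detected in the selected window")
    # Intersection of the fitted line with the horizontal line y = log(N0).
    return (log_n0 - intercept) / slope


# Hypothetical curve: flat for 2 h, then doubling every hour.
times = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cells = [1.0, 1.0, 1.0, 2.0, 4.0, 8.0, 16.0]
print(lag_tangent(times, cells, n_points=2))
```

With `n_points=2` the line passes through the two points of the steepest interval, which corresponds to the single-point tangent; larger values trade sensitivity to outliers for the risk of including points outside the exponential phase.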

| The parameter fitting to the logistic/Baranyi model
The performance of this method depends on the shape of the growth curve, and it should be used for curves that can be described with a logistic or Baranyi model. If the growth curve deviates strongly from the standard shape, the fitting may not converge to any solution.
This problem can be fixed by finding a more suitable optimisation algorithm, initial parameter values or data preprocessing. These options are available within our web tool.
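To make the idea of parameter fitting concrete, the sketch below fits a lag and a growth rate by brute-force least squares. Several things here are assumptions for illustration only: the model form (no growth until the lag, logistic growth afterwards), the decision to fix N0 and the carrying capacity from the data, the grid search itself (standing in for the nonlinear least squares routines normally used) and the names `logistic_with_lag` and `fit_lag_grid`.

```python
import math


def logistic_with_lag(t, n0, k, r, lag):
    """Population size at time t: constant at n0 until `lag`, logistic growth afterwards."""
    if t <= lag:
        return n0
    return k / (1.0 + (k - n0) / n0 * math.exp(-r * (t - lag)))


def fit_lag_grid(times, biomass, lags, rates):
    """Return the (lag, r) pair from the candidate grids with the smallest
    sum of squared errors.

    n0 is taken as the first measurement and k as the largest measurement.
    """
    n0, k = biomass[0], max(biomass)
    best = None
    for lag in lags:
        for r in rates:
            sse = sum((logistic_with_lag(t, n0, k, r, lag) - b) ** 2
                      for t, b in zip(times, biomass))
            if best is None or sse < best[0]:
                best = (sse, lag, r)
    return best[1], best[2]


# Noise-free synthetic data generated with lag = 2.0 h and r = 1.0 per hour.
times = [0.5 * i for i in range(49)]
data = [logistic_with_lag(t, 0.01, 1.0, 1.0, 2.0) for t in times]
candidate_lags = [0.25 * i for i in range(21)]      # 0 to 5 h
candidate_rates = [0.1 * j for j in range(2, 21)]   # 0.2 to 2.0 per hour
print(fit_lag_grid(times, data, candidate_lags, candidate_rates))
```

A grid search cannot fail to converge, but its answer is only as fine as the grid; iterative optimisers are more precise and are the ones that depend on the initial parameter values mentioned above.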
Finally, based on the points discussed above, we propose a decision tree to facilitate the choice of lag calculation method (Figure 1a). We believe that it will be useful both for planning and adjusting future experiments and for analysing already existing data.
Altogether, we suggest first trying to estimate lag duration by parameter fitting to the logistic model. This method is the most robust: it captures the whole growth dynamics and thereby mitigates technical limitations (such as a device's detection limits).
The biomass increase method is the least dependent on any assumptions and the only one that is not affected by the blank correction or the existence of dead cells in the culture. Therefore, we recommend it if the other methods cannot be applied. Additionally, we encourage using multiple methods and investigating possible inconsistencies between the results they give.

| Working example
A working example of the usage of our web application is shown within the app, using a set of growth curves generated by growing laboratory Saccharomyces cerevisiae for 24 h with optical density measurements (λ = 600 nm) taken every 0.5 h. Data can be uploaded to the application in csv or txt formats, and column and decimal separators need to be specified. If the data are uploaded correctly, they will be displayed; otherwise, an appropriate error message will be shown.
Moreover, if multiple growth curves are provided, the user may choose to either treat them as technical replicates and calculate the lag duration based on the averaged growth curve, or to treat them as independent curves and calculate the lag duration for each of the curves separately. In the 'Pre-process the growth curve' tab, the data can be smoothed or cut at some point to reduce noisiness. Cutting may be especially helpful when using the model-fitting methods (to avoid fitting to the stationary phase instead of the lag and exponential phases), and smoothing is recommended when using the max growth acceleration method. Next, the lag phase can be estimated with one of the methods: (1) biomass increase, (2) max growth acceleration, (3) tangent or (4) parameter fitting to the model. We encourage users to use our decision tree to choose the method best suited to their dataset, as well as to visualise and compare results obtained by several methods using our web app. Each of the lag estimation methods can be customised by varying parameter values. For example, if one decides to calculate the lag phase by the tangent method, there is a choice whether to draw a tangent line to a single point or to N points around the point with the maximal growth rate. A more detailed manual can be found here: https://github.com/bognabognabogna/microbial_lag_calculator.
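The data handling described above can be sketched with the standard library. The column names 'time', 'biomass' and 'curve_id' are those expected by the application; everything else is an assumption for illustration, including the function names `read_growth_curves` and `average_replicates`, the default curve label and the requirement that replicates share time points.

```python
import csv
import io
from collections import defaultdict


def read_growth_curves(text, delimiter=",", decimal="."):
    """Parse a table with 'time', 'biomass' and optional 'curve_id' columns.

    Returns {curve_id: [(time, biomass), ...]}; rows without a curve_id
    are assigned to a single curve labelled 'curve_1'.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    curves = defaultdict(list)
    for row in reader:
        time = float(row["time"].replace(decimal, "."))
        biomass = float(row["biomass"].replace(decimal, "."))
        curves[row.get("curve_id") or "curve_1"].append((time, biomass))
    return dict(curves)


def average_replicates(curves):
    """Average biomass across curves at each time point (technical replicates)."""
    by_time = defaultdict(list)
    for points in curves.values():
        for time, biomass in points:
            by_time[time].append(biomass)
    return [(time, sum(values) / len(values))
            for time, values in sorted(by_time.items())]


# Hypothetical upload using ';' as column separator and ',' as decimal separator.
table = (
    "time;biomass;curve_id\n"
    "0;0,25;A\n"
    "0,5;0,5;A\n"
    "0;0,75;B\n"
    "0,5;1,5;B\n"
)
curves = read_growth_curves(table, delimiter=";", decimal=",")
print(average_replicates(curves))  # [(0.0, 0.5), (0.5, 1.0)]
```

Treating the curves as independent instead simply means applying a lag calculation to each entry of `curves` separately rather than to the averaged series.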

| CONCLUSION
This manuscript presents two practical solutions to facilitate lag phase duration estimation. First, we designed a decision tree which aims to facilitate selection of the most appropriate method for one's data set. Second, we developed an R package and a web application, a user-friendly platform that enables estimation of the lag phase duration using the methods discussed in this study. They are meant to enable comparison of the lag phase duration estimated by different algorithms (i.e. combinations of methods, parameters and data preprocessing techniques) and to increase the transparency of the selected methods and parameters, which currently tend to be underreported. We perceive it as a starting point for further improvements made by the scientific community, so that any new potential challenges can be solved in a reproducible way.

FIGURE 1 (a) Decision tree to facilitate the choice of appropriate lag calculation method. (b) Screenshot of the web server MICROBIAL LAG PHASE DURATION CALCULATOR, where lag phase duration can be calculated for a user-specified set of growth curves, and by any of the methods described in Table 1.

TABLE 1 (continued)

Method: biomass increase
Examples of experimental studies using the method: Opalek et al. (2022)

Method: max growth acceleration
Lag definition: The time point at which the second derivative of population size in time is maximal.
Assumptions: The population does not increase its size during the lag (or it increases very slowly) and then starts growing with a decreasing growth rate.
Challenges: Noisy data (with random fluctuations from one measurement to another) may affect the detection of such a point.
Examples of experimental studies using the method: Liu et al. (2021)

Method: tangent method
Lag definition: The time point at which the initial population size line intersects with the line tangent to the growth curve at the maximal growth rate (on a log scale).
Assumptions: The population does not increase its size during the lag (or it increases very slowly) and then starts growing exponentially with a constant growth rate.
Challenges: If the growth rate varies, it may be challenging to find the 'real' maximal growth rate. Moreover, the initial population size needs to be determined with high confidence.
Examples of experimental studies using the method: Valík et al. (2021)

AUTHOR CONTRIBUTIONS
Monika Opalek, Bogna J. Smug and Dominika Wloch-Salamon were involved in conceptualization. Bogna J. Smug was involved in application development. Bogna J. Smug and Maks Necki were involved in package development. Monika Opalek and Bogna J. Smug were involved in writing-original draft preparation; Dominika Wloch-Salamon and Bogna J. Smug were involved in writing-reviewing.
The input data table needs to contain the following columns: 'time' and 'biomass' (i.e. population size measured by either CFU/mL or OD). Additionally, a third column, 'curve_id', is needed if the table contains multiple growth curves. Such a data table can be manually inserted or uploaded to the application.