Building a minimal and generalizable model of transcription factor–based biosensors: Showcasing flavonoids

Abstract Progress in synthetic biology tools has transformed the way we engineer living cells. Applications of circuit design have reached a new level, offering solutions for metabolic engineering challenges that include developing screening approaches for libraries of pathway variants. The use of transcription‐factor‐based biosensors for screening has shown promising results, but the quantitative relationship between the sensors and the sensed molecules still needs more rational understanding. Herein, we have successfully developed a novel biosensor to detect pinocembrin based on a transcriptional regulator. The FdeR transcription factor (TF), known to respond to naringenin, was combined with a fluorescent reporter protein. By varying the copy number of its plasmid and the concentration of the biosensor TF through a combinatorial library, different responses have been recorded and modeled. The fitted model provides a tool to understand the impact of these parameters on the biosensor behavior in terms of dose–response and time curves and offers guidelines to build constructs oriented to increased sensitivity and/or ability of linear detection at higher titers. Our model, the first to explicitly take into account the impact of plasmid copy number on biosensor sensitivity using a Hill‐based formalism, is able to explain uncharacterized systems without extensive knowledge of the properties of the TF. Moreover, it can be used to model the response of the biosensor to different compounds (here naringenin and pinocembrin) with minimal parameter refitting.

conversion (Xu et al., 2014). To overcome the limited number of naturally occurring metabolite-responsive TFs available, progress has been made through their heterologous use, which includes transplantation of prokaryotic transcriptional activators into the eukaryotic chassis (Skjoedt et al., 2016). Additionally, it was recently shown that it is possible to expand the detection abilities by adding one or more enzymatic steps to transform a nondetectable compound into a detectable one (Delépine, Libis, Carbonell, & Faulon, 2016). This latest tool considerably expands the scope of chemicals that can be sensed via transcriptional regulators. One of the interesting metabolic pathways implemented with relative success is the flavonoid pathway (Fehér et al., 2014). The industrial demand for some flavonoids is increasing, and among the most promising chemicals is (2S)-pinocembrin, which is a plant secondary metabolite and the main starting point for the synthesis of other flavonoid molecules. This compound has a broad range of interesting properties, such as antioxidant (Rasul et al., 2013), antibacterial (Weston, Mitchell, & Allen, 1999), and antifungal (Peng et al., 2012) activities, inhibition of atherosclerosis (Yang et al., 2013), and neuroprotection in neurodegenerative diseases (Liu et al., 2012; Liu, Gao, Yang, & Du, 2008). To produce pinocembrin from glucose, four heterologous genes have to be implemented in Escherichia coli. First, phenylalanine ammonia lyase converts phenylalanine into cinnamic acid, which is then converted by coumarate-CoA ligase into cinnamoyl-CoA. Then, chalcone synthase condenses cinnamoyl-CoA and three molecules of malonyl-CoA to produce pinocembrin chalcone, which is then converted into pinocembrin by chalcone isomerase (Figure 1).
As of today, pinocembrin is produced at a low titer from glucose (Wu, Du, Zhou, & Chen, 2013; only 40 mg/L), and work still needs to be carried out to increase productivity, most likely through the building of combinatorial libraries with various enzyme sequences and regulatory elements (promoters, ribosome binding sites [RBSs]). Such libraries could be quickly screened with a pinocembrin biosensor, where the level of the reporter gene (i.e., fluorescence) is proportional to the pinocembrin titer.
Chemical structure similarity considerations of detectable flavonoids led us to choose as our candidate the FdeR TF, a transcriptional activator from Herbaspirillum seropedicae SmR1 shown to respond to naringenin (Marin et al., 2013; Siedler, Stahlhut, Malla, Maury, & Neves, 2014). Here, we have focused on developing and modeling the FdeR TF to shed light on the way we could design TF-based biosensors to overcome issues of measurable quantification of metabolite production and to monitor an adequate sensing response. We have built different constructs varying most notably in plasmid copy number, changing both the concentration of the TF and the number of binding sites for the activated complex, and modeled the impact of this varying number on the sensitivity of the response.
We provide a modeling strategy based on Hill functions to understand the impact of plasmid copy number and compound binding affinity to FdeR on our biosensor behavior, for both the dose-response and time curves, for a TF that has not been well characterized before.

| Plasmids and strains
All plasmids and strains used in this study are listed in the Supporting Information

| Pinocembrin sensor library construction
Sixteen pinocembrin biosensors were constructed by varying the plasmid copy number and the RBS strength.
First, primers P1 and P2 were used to amplify the plasmid backbones with different copy numbers from pACYCDuet-1, pCDFDuet-1, pETDuet-1, and pRSFDuet-1 (Supporting Information Table III). Second, the red fluorescent protein (RFP) under the control of the pinocembrin-responsive promoter was amplified from the plasmid pV20 (Supporting Information Table I) using the primers P3 and P4. Third, the FdeR TF with its constitutive promoter J23100 was also amplified from the plasmid pV20 with the four primer pairs P5/P9, P6/P9, P7/P9, and P8/P9 to generate the FdeR fragment with RBS sequence 1, 2, 3, and 4, respectively. Finally, the 16 possible combinations were assembled in one step by Gibson assembly and confirmed by colony PCR and sequencing (Figure 2).

TRABELSI ET AL.

FIGURE 1 Pinocembrin biosynthesis pathway. PAL, TAL, 4CL, CHS, and CHI refer to phenylalanine ammonia lyase, tyrosine ammonia lyase, coumarate-CoA ligase, chalcone synthase, and chalcone isomerase, respectively

[…] key intermediate and then promoting its synthesis or its downstream […]

| Biosensor dose-response characterization
For each biosensor strain, an isolated colony of BL21(DE3) harboring the appropriate plasmid was inoculated in 2 ml Luria broth (LB) medium containing the appropriate antibiotics and grown overnight at 37°C. The culture was then diluted 1:100 in fresh LB containing the appropriate antibiotics as well as different concentrations of pinocembrin, naringenin, or cinnamic acid (previously dissolved in ethanol) ranging from 1 to 500 µM. All the sensor cells were then grown for 24 hr with agitation at 37°C in a BioTek microplate reader.
Absorbance at 600 nm and fluorescence (Exc: 580 nm/Em: 610 nm) were measured. All experiments were repeated at least three times.

| Parameter fitting
All parameters that could be found in the literature are highlighted in Table 1.
The other parameters (n, K_m, and K_dsingle) were fitted using the nls function (nonlinear least squares, from Package stats version 3.2.3) with weighted least squares and the port algorithm (Dennis, Gay, & Welsch, 1981), which allows for boundaries on the search space (see Table 1 and Supporting Information Table V).
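To make the idea of a bounded, weighted least-squares fit concrete, here is a minimal sketch in Python. It does not reproduce the R `nls`/port procedure used in the study: an exhaustive grid search inside the box bounds is swapped in for the port algorithm, the Hill form is the textbook one, and the data, bounds, and parameter values are synthetic and purely illustrative.

```python
import itertools

def hill(i, kd, n, ratio=1.0):
    """Textbook Hill activation curve (assumed form, for illustration)."""
    return ratio * i**n / (kd**n + i**n)

def weighted_sse(params, doses, observed, weights):
    """Weighted sum of squared residuals for the parameter pair (kd, n)."""
    kd, n = params
    return sum(w * (y - hill(x, kd, n)) ** 2
               for x, y, w in zip(doses, observed, weights))

def grid_fit(doses, observed, weights, bounds, steps=41):
    """Exhaustive search on a regular grid restricted to the box `bounds`.

    This stands in for a bounded optimizer; it is not the port algorithm.
    """
    axes = [[lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
            for lo, hi in bounds]
    return min(itertools.product(*axes),
               key=lambda p: weighted_sse(p, doses, observed, weights))

# Synthetic, noise-free data generated with kd = 50 µM and n = 2 (hypothetical values).
doses = [1, 5, 10, 50, 100, 300, 500]
observed = [hill(x, 50.0, 2.0) for x in doses]
weights = [1.0] * len(doses)  # in a real fit, e.g. 1 / variance of the replicates

kd_hat, n_hat = grid_fit(doses, observed, weights,
                         bounds=[(10.0, 90.0), (1.0, 3.0)])
print(kd_hat, n_hat)
```

The bounds play the same role as the search-space boundaries mentioned above: candidate parameters outside the box are never evaluated.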

| Sensitivity, fold change, and cooperativity of the different biosensors
To characterize the different biosensor dose-response curves, they were fitted to the following standard Hill function (Weiss, 1997):

$$f(I) = \mathrm{ratio} \cdot \frac{I^{n}}{K_d^{n} + I^{n}}$$

where I is the concentration of the considered inducer (in µM); K_d is the concentration that allows for half-maximum induction (in µM as well), also termed IC50; n is the Hill coefficient that characterizes the cooperativity of the induction system; and ratio is the dynamic range (in arbitrary units).
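The roles of the three parameters can be checked numerically. The sketch below assumes the textbook Hill form ratio · I^n / (K_d^n + I^n); the function names and numerical values are hypothetical. It shows that the response at I = K_d is half the dynamic range, and that the Hill coefficient sets how many fold of inducer separate 10% from 90% induction (81^(1/n)).

```python
def hill_response(inducer, k_d, n, ratio):
    """Hill activation: ratio * I^n / (K_d^n + I^n) (assumed textbook form)."""
    return ratio * inducer**n / (k_d**n + inducer**n)

def dose_for_fraction(fraction, k_d, n):
    """Inducer concentration giving `fraction` (0 < fraction < 1) of the maximal response.

    Obtained by inverting the Hill curve: I = K_d * (f / (1 - f))^(1/n).
    """
    return k_d * (fraction / (1.0 - fraction)) ** (1.0 / n)

# Hypothetical parameters: K_d = 50 µM, n = 2, dynamic range = 10 a.u.
half = hill_response(50.0, k_d=50.0, n=2.0, ratio=10.0)
span = dose_for_fraction(0.9, 50.0, 2.0) / dose_for_fraction(0.1, 50.0, 2.0)
print(half)  # half of the dynamic range
print(span)  # fold range of inducer between 10% and 90% induction
```

A larger n narrows the quasi-linear detection window, which is why cooperativity matters when a sensor is meant to discriminate titers over a wide range.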

| Choice of the TF
Recently, Raman and colleagues were able to convert the intracellular presence of some flavonoids into a fitness advantage for the cell by combining the TtgR-responsive domain (a regulatory gene of the multidrug efflux pump operon, ttgABC) with a TolC membrane protein (an E. coli outer membrane protein) necessary for survival under selective conditions. The strategy was successful in the screening of targeted genome-wide mutagenesis for naringenin high-producing strains (Raman, Rogers, Taylor, & Church, 2014). It is very useful in evolution experiments looking to enrich the culture with evolved variants and counter-select the false positives but is not a first-choice strategy when planning to screen […]. We have therefore performed a chemical structure similarity search in this family of chemicals. We have shown using the Tanimoto score that naringenin is the closest detectable compound to pinocembrin (see Section 2). We then decided to use FdeR as a potential candidate to develop a pinocembrin biosensor (Table 2).
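For readers unfamiliar with the Tanimoto score, the following sketch computes it on sets of structural features: the number of shared features divided by the number of features present in either molecule. The feature sets are hand-written toy descriptors, not real chemical fingerprints, and the resulting values are not those of the study; an actual search would use a cheminformatics toolkit.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) coefficient between two feature sets: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # convention for two empty fingerprints
    return len(a & b) / len(a | b)

# Toy, hypothetical descriptors (illustration only).
pinocembrin = {"flavanone_core", "5-OH", "7-OH"}
naringenin = {"flavanone_core", "5-OH", "7-OH", "4'-OH"}
cinnamic_acid = {"phenyl", "propenoic_acid"}

candidates = {"naringenin": naringenin, "cinnamic acid": cinnamic_acid}
scores = {name: tanimoto(pinocembrin, feats) for name, feats in candidates.items()}
closest = max(scores, key=scores.get)
print(scores, closest)
```

Ranking known detectable compounds by this score against the target molecule is the selection step described above.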

| Biosensor characterization
To benchmark our design, E. coli cells harboring the different constructs were grown for 24 hr in the absence and presence of increasing concentrations of pinocembrin or naringenin ranging from 1 to 500 µM, and red fluorescence was monitored in parallel with cell growth (Figure 3a). As expected, the different biosensor constructs were active in E. coli in the presence of naringenin. More interestingly, the different constructs were able to detect pinocembrin, and most of them showed a high RFP expression level, exceeding in all cases the level of expression in the presence of naringenin.
Moreover, FdeR appears to be more sensitive to pinocembrin than to naringenin, as is evident from the steeper slope in Figure 3a. The results show that the minimal concentration of pinocembrin required to activate the TF ranges between 1 and 5 µM. The fold change also reaches 60-fold, for instance in construct 156. In some cases, we observed a decrease in fluorescence above 300 µM, which is probably due to the toxicity of the compound.

Table 1 (recovered fragment). Ratio between the binding constants of the inducer and the transcription factor: fitted on naringenin data. n_tf: 2 (dimensionless); the transcription factor forms dimers. Naringenin dose-response reference.
This toxicity could also explain the difficulty in reaching high titer of pinocembrin in metabolic engineering experiments, where, as mentioned previously, the record is around 40 mg/L (Wu et al., 2013).
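To relate the reported titer to the concentrations tested with the biosensor, the record of 40 mg/L can be converted to micromolar. The sketch below assumes a molar mass of about 256.25 g/mol for pinocembrin (C15H12O4), a value that is not given in the text.

```python
PINOCEMBRIN_MOLAR_MASS = 256.25  # g/mol, C15H12O4 (assumed value, not from the text)

def mg_per_l_to_micromolar(titer_mg_per_l, molar_mass_g_per_mol):
    """Convert a mass concentration (mg/L) to a molar concentration (µM).

    mg/L divided by g/mol gives mmol/L; multiplying by 1000 gives µmol/L.
    """
    return titer_mg_per_l / molar_mass_g_per_mol * 1000.0

record_um = mg_per_l_to_micromolar(40.0, PINOCEMBRIN_MOLAR_MASS)
print(round(record_um, 1))
```

Under this assumption the record titer corresponds to roughly 156 µM, which lies inside the 1 to 500 µM range tested and below the concentration at which the fluorescence decrease was observed.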
To validate this biosensor as a potential candidate for screening purposes, we tried to evaluate the specificity of FdeR. The sensor detects pinocembrin, but what about its biosynthesis intermediates?
For example, when two parameters are used to model a forward and a backward reaction that is actually at equilibrium on the time scale considered, an infinite number of parameter pairs, whose ratio is the equilibrium constant of the reaction, will fit the data.
The Hill class of models does not necessitate a priori knowledge of the exact interactions between the species involved, although knowledge of the broad behavior of the interactions is necessary. This model has been, for example, extended to take into account resource competition, to model both the binding with the inducer and complex binding to the promoter in the Lux system (Zucca et al., 2012), or to describe any switch-like behavior.
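This identifiability problem can be illustrated with a single binding site at equilibrium. In the sketch below (function name and rate values are hypothetical), two very different pairs of forward and backward rate constants with the same ratio give the same equilibrium occupancy at every ligand concentration, so equilibrium data alone cannot distinguish them.

```python
def fraction_bound(ligand, k_on, k_off):
    """Equilibrium occupancy of a single site.

    Only the ratio K_D = k_off / k_on enters the result.
    """
    k_d = k_off / k_on
    return ligand / (ligand + k_d)

ligand_um = [1.0, 10.0, 100.0]
# Two hypothetical rate pairs sharing the same ratio k_off / k_on = 50 µM.
slow = [fraction_bound(x, k_on=0.01, k_off=0.5) for x in ligand_um]
fast = [fraction_bound(x, k_on=1.0, k_off=50.0) for x in ligand_um]
print(slow)
print(fast)
```

Fitting the ratio directly, rather than the two rates, removes the redundant degree of freedom.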
Therefore, we decided to extend the Hill model to account for a key tunable parameter in synthetic biology: plasmid copy number.
Our aim was to have a model with as few free parameters as possible that could account for this effect.

| Effects of plasmid copy number that we intend to model
As can be seen in Figure 3a (or the Supporting Information Figure 1), increasing the copy number leads to increased production, as expected, except for construct 457 (very high copy number), showing a decline in production after 100 µM concentration of pinocembrin or naringenin.
The constructs behave similarly for both compounds, although the biosensor is slightly more effective for pinocembrin detection than for naringenin detection, which is somewhat unexpected given that naringenin is its natural reported activator. Another interesting aspect is the effect of copy number on IC50 (the concentration at which the biosensor reaches half-maximum induction; it corresponds to K_d in the standard Hill function). We can see that effect both in the figure, where the induction starts at smaller concentrations of the inducer, and in biosensor characterization (Supporting Information Table VI), where IC50 diminishes with copy number of the construct. We therefore decided to take that effect into account in our modeling effort.
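A quick, model-free way to read IC50 off a measured dose-response series is to interpolate the half-maximum crossing on a logarithmic dose axis. The sketch below is one such estimate, not the fitting procedure of the study; it assumes positive doses sorted in ascending order, and the readings are made up.

```python
import math

def ic50_by_interpolation(doses, responses):
    """Estimate the half-maximum dose by linear interpolation on a log10 dose axis.

    Assumes doses are positive and sorted in ascending order.
    """
    base, top = min(responses), max(responses)
    half = base + 0.5 * (top - base)
    points = list(zip(doses, responses))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= half <= y1 and y1 > y0:  # first rising segment that brackets half-maximum
            t = (half - y0) / (y1 - y0)
            log_x = math.log10(x0) + t * (math.log10(x1) - math.log10(x0))
            return 10 ** log_x
    raise ValueError("response never crosses half-maximum")

# Hypothetical readings (µM, arbitrary fluorescence units).
doses = [1, 5, 10, 50, 100, 300, 500]
responses = [1.0, 2.0, 4.0, 20.0, 35.0, 40.0, 40.0]
estimate = ic50_by_interpolation(doses, responses)
print(round(estimate, 1))
```

Comparing such estimates across constructs is enough to see whether IC50 shifts with copy number before committing to a parametric model.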

| Derivation of the dose-response model: Accounting for copy number
We aim to show here a dose-response model that can account for the effect of copy number on both pinocembrin- and naringenin-responding constructs. We need to take into account two effects of plasmid copy number.
(1) The number of binding sites for the TF-inducer complex increases proportionally to the plasmid copy number, meaning that intuitively, to reach half-maximum saturation, there needs to be that many more TF-inducer complexes.
(2) The TF is produced constitutively from the biosensor plasmid, so TF number scales with plasmid copy number.
We consider that all following processes are at equilibrium since chemical binding is a fast process compared with transcription and translation, and we are considering dose-response curves for the time being.
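The interplay of the two copy-number effects can be explored with a toy calculation. The sketch below is not the model of this paper: it assumes a mass-action TF-inducer complex of order n_tf that ignores depletion of free TF and inducer, and a Hill activation whose half-saturation constant is the single-plasmid constant multiplied by the copy number. All names and values are hypothetical.

```python
def complex_conc(inducer, copy_number, tf_per_copy=1.0, k_m=1.0, n_tf=2, tf_scales=True):
    """Mass-action TF-inducer complex (toy assumption, no depletion of free species)."""
    tf = tf_per_copy * (copy_number if tf_scales else 1.0)
    return (tf * inducer) ** n_tf / k_m

def response(inducer, copy_number, k_d_single=100.0, n=1.0, **kw):
    """Hill activation whose half-saturation constant grows with the number of binding sites."""
    c = complex_conc(inducer, copy_number, **kw)
    k_d = k_d_single * copy_number
    return c**n / (k_d**n + c**n)

def ic50(copy_number, lo=1e-6, hi=1e6, **kw):
    """Bisection on a log scale for the inducer level giving a response of 0.5."""
    for _ in range(200):
        mid = (lo * hi) ** 0.5
        if response(mid, copy_number, **kw) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo * hi) ** 0.5

for copies in (10, 20, 40, 100):  # illustrative copy numbers
    print(copies, ic50(copies), ic50(copies, tf_scales=False))
```

In this toy setting, IC50 falls with copy number when the TF is expressed from the same plasmid, and rises with copy number when the TF level is held fixed, because only the number of binding sites then increases.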

| Formation of the TF-inducer complex
We consider that the TF forms n_tf multimers to derive our equations. According to the literature, FdeR forms dimers (Siedler et al., 2014), which means n_tf = 2 will be used when simulating the data. Since the exact binding configuration with the inducers (naringenin and pinocembrin) is not known, we will start by considering the following equilibrium (Equation [3]). Other neglected cooperativity effects will be accounted for in the Hill cooperativity constant (Equations 4-6). Ignoring the order of binding, which is not important for the final equilibrium but only for the kinetics (not considered here, given the time scales of the considered processes), we have Equation (4), where […] there might be cooperativity and therefore more than one binding site per plasmid). We propose the following modification to Equation (5), which accounts for the fact that to reach half-maximum saturation of a higher number of binding sites, the number of binding complexes also needs to be that much higher: […]

We chose to represent both the best fit (Figure 4a) and 100 simulations (Figure 4b), where parameters were randomly sampled from the estimated distribution of parameters (see Section 2 for more details). We can see when looking at the random parameters that there is some leeway in the estimation, allowing for a rather wide dose-response curve. However, the expected behavior is maintained, even accounting for uncertainty in the estimation of the parameters. We chose to use the same cooperativity constant n, as well as the same K_dsingle constant, which would be the IC50 for a single plasmid, hence its name. However, as mentioned in the data analysis section, the dynamic range does not scale proportionally with the plasmid copy number. For this reason, ratios varying from 0.14 to 1.76 were obtained and used in this study, instead of a single parameter for this effect.
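The idea behind the 100 simulations can be sketched as follows: draw parameter sets from their estimated distribution, simulate one dose-response curve per draw, and look at the resulting envelope. The sketch assumes independent normal distributions and a plain Hill curve; the means and standard deviations are hypothetical, and a real analysis would use the fitted model and the estimated covariance of the parameters.

```python
import random

def hill(i, k_d, n):
    """Normalized Hill activation curve (assumed form)."""
    return i**n / (k_d**n + i**n)

def sample_curves(doses, mean, sd, n_draws=100, seed=0):
    """Simulate one curve per parameter draw (independent normals, clipped to stay positive)."""
    rng = random.Random(seed)
    curves = []
    for _ in range(n_draws):
        k_d = max(1e-9, rng.gauss(mean["k_d"], sd["k_d"]))
        n = max(1e-9, rng.gauss(mean["n"], sd["n"]))
        curves.append([hill(x, k_d, n) for x in doses])
    return curves

doses = [1, 5, 10, 50, 100, 300, 500]
curves = sample_curves(doses,
                       mean={"k_d": 50.0, "n": 2.0},  # hypothetical estimates
                       sd={"k_d": 10.0, "n": 0.2})    # hypothetical standard errors
# Lowest and highest simulated response at each dose.
envelope = [(min(c[j] for c in curves), max(c[j] for c in curves))
            for j in range(len(doses))]
print(envelope)
```

A wide envelope with an unchanged overall shape corresponds to the situation described above: uncertain parameter values, but a preserved qualitative behavior.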
This is due to a host of factors: higher plasmid copy number diverts more resources from the cell, the replication machinery is not the same for the different plasmids, which have different replication origins, and the cells do not divert resources to plasmids proportionally to their copies. Moreover, an interesting feature of the data is that production from the very high copy number construct (457) is initially higher than with the high copy number (357) until concentrations cross a threshold. We can imagine that the demand on the cell from our constructs becomes too high in the 457 construct, and the cell activates a "stress response." This is observed when using both compounds for induction.

| Model fitting of naringenin
We were interested in determining whether the model could reproduce the features observed in the naringenin data: a globally lower fold change of induction than for pinocembrin, but the same overall behavior on sensitivity. Our aim was to account for the compound change using only our K_m parameter, which represents the ratio between the dissociation constants of […] The global behavior of the biosensor is respected for all sensors, meaning that the same model does apply to these data. The shift in dose response can be explained by the K_m parameter, which shifts the curve toward less sensitivity by doubling IC50. This is confirmed by the data for the 357 and 457 constructs, which are the constructs with the least variability in IC50 estimation.
We can also observe that the dynamic range is slightly lower, meaning that our modification of the ratio parameter by the same correcting factor is justified (for naringenin concentrations up to 100 µM). This could be explained by an effect that is not taken into account in our model, such as higher load, or some different toxicity between pinocembrin and naringenin. K_m for naringenin is greater than one, which means that the dissociation constant of the FdeR dimer with naringenin is higher than the one with pinocembrin. In other words, at the same TF and inducer concentration, there is more TF bound to pinocembrin than there would be to naringenin. This is surprising given that FdeR was identified in the fde operon from Herbaspirillum seropedicae, which was identified for its involvement in naringenin degradation. This means that we expected it to have evolved for naringenin detection, but that it detects pinocembrin at least as well.
All this indicates that our model, although very simple and based on broad knowledge of the sensor rather than precise chemical constant values, manages to successfully capture our system's behavior.

This time-course modeling partially allowed us to understand the impact of initial dilution on the biosensor's behavior and emphasized the need to wait for it to reach steady state for it to be fully functional and to discriminate between different inducer concentrations. The shortcomings of this time-course modeling confirm that although it is interesting to see the delay in response of the biosensor signal, modeling the dose-response curve is more important to show characteristics of the biosensor, such as changes to the dose-response curve when used for screening pinocembrin-producing strains.

| Leveraging our model for biosensor design improvement
Having constructed a satisfying dose-response model, it becomes interesting to use it to make predictions for future improvements of our design. We therefore considered three parameters that synthetic biologists can tune and studied their effect on half-maximum induction (IC50), used as a proxy for sensitivity. A higher IC50 means shifting the sensitivity of the biosensor toward higher concentrations and therefore can be used to screen higher producing strains. A lower IC50 means shifting it toward lower concentrations and more sensitivity to trace amounts of pinocembrin. The three parameters whose effects we decided to study are the following: plasmid copy number, DNA and TF binding strength, and TF and inducer binding strength. Plasmid copy number can easily be tuned by choosing the replication origin of the plasmid, DNA-TF affinity can be modified either by random mutagenesis of the promoter or by protein engineering (and measured through gel retardation assays), and TF-inducer affinity can be tuned by protein engineering. In Figure 6, we represent fold change compared with current fitted constants for TF and inducer binding strength. Because the DNA-TF dissociation constant is captured by our Hill equation, it is proportional to our K_dsingle constant, so the binding strength is proportional to the inverse of K_dsingle, and we also represent fold changes around this constant. The copy number, on the other hand, is represented as the desired value for copy number, as that can be achieved by choosing an appropriate replication origin. We can see in Figure 6a that changing the binding constant between TF and DNA or between TF and the inducer has similar consequences: increasing it leads to lower IC50 or higher sensitivity, whereas decreasing it leads to higher IC50, allowing one to detect higher titers of pinocembrin.
This suggests that random mutagenesis at the promoter might be a better first approach to tune the biosensor's behavior to an experimentalist's needs, since it is easier than engineering the binding strength of the TF and its inducer, and both have similar consequences. Figure 6b, on the contrary, shows the impact of changing plasmid copy number or binding affinity of the TF for the inducer. As seen in our experimental data, increasing the copy number (which leads to higher expression) also increases sensitivity, allowing for better detection of the inducer but at lower concentrations. Reducing the copy number enables detection at higher titers, but reduces the fold change of the biosensor. On the contrary, […]

| DISCUSSION
The use of TF-based biosensors is expanding in many fields, ranging from environmental and biomedical to industrial biotechnology applications, and more specifically as a fast and reliable screening tool to address the high-throughput limits of the other approaches (Dietrich, McKee, & Keasling, 2010; Eggeling, Bott, & Marienhagen, 2015). Some successful attempts have been reported describing strategies leading to fine-tuned response dynamics and dynamic ranges by engineering tunable biosensors (Chen, Xia, Lee, & Qian, 2017; Rogers et al., 2015).
TFs have a ligand-binding domain that is likely to be promiscuous. In this study, we showcased the potential of chemical structure similarity scoring to select TF starting candidates to develop or engineer biosensors for small molecules. We have constructed a biosensor to detect pinocembrin with a fold change of around 60. FdeR appears unexpectedly to be more sensitive to pinocembrin than to naringenin, its natural effector, and has the required specificity to discriminate against the intermediates in the pinocembrin biosynthetic pathway.
Indeed, the first report of this TF in Marin et al. (2013) identifies FdeR as the TF responsible for the regulation of a naringenin degradation operon. However, our experiments show that FdeR senses pinocembrin at least as well, suggesting that this operon could also be involved in pinocembrin degradation. Two possible degradation pathways were identified by Marin et al. (2013) based on in silico analysis of the enzymes found in the operon. One started by opening the C-ring of naringenin, whereas the other opened the A-ring. In both cases, since pinocembrin differs from naringenin by a group on the B-ring, it could also be degraded by these pathways. A recent study performed by Zhang et al. (2017) was also successful in generating a new biosensor for specific lactam compounds using a chemoinformatics approach inspired by small-molecule drug discovery. Methodologies based on the structural analysis of compounds could offer an alternative to some heavy strategies based on the design of new TFs for nonnatural ligands (Looger, Dwyer, Smith, & Hellinga, 2003; Mandell & Kortemme, 2009; Schallmey, Frunzke, Eggeling, & Marienhagen, 2014) or on random mutagenesis (Tang, Fazelinia, & Cirino, 2008; Tang et al., 2013; Tang & Cirino, 2011).
To extend our knowledge of the rules governing the sensitivity, specificity, and dose responses of biosensors, we have also built different sensor constructs varying the copy number and the RBS to scan different response patterns that could serve as a template for modeling and to help extract rational understanding of the biosensor behavior.
Although simple, the model developed in this paper allows us to explain the behavior of our biosensor to both naringenin and pinocembrin with a single parameter that accounts for the binding variability between these two compounds and the TF. It also accounts for variations of copy number on the sensitivity of the biosensor starting from a simple idea: if there are more binding sites, there is a need for proportionally more activators to reach half-maximum saturation. This is a simple but useful addition to the synthetic biology modeler's toolbox when working on poorly characterized systems where more robust modeling approaches, such as mechanistic or statistical modeling, are not possible to use.

FIGURE 6 Effect on biosensor sensitivity of varying copy numbers, DNA, and transcription factor (TF) binding affinities or transcription factor and inducer binding affinities. Half-maximum induction (IC50), used as a proxy for sensitivity, is represented in colors ranging from white (low IC50, high sensitivity) to dark blue (high IC50, low sensitivity) on a log scale. Binding constants are represented as fold-change compared with current fitted constants. (a) Comparison of the effect of changing TF and DNA binding constants and TF and inducer binding constant. (b) Comparison of the effect of changing plasmid copy number and TF and inducer binding constant
Our model allows us not only to describe trends but also to obtain quantitatively correct values.
An interesting effect we managed to capture is the effect of copy number on IC50. This effect was already observed in a previous work by Zucca et al. (2012), although the authors did not investigate the link between copy number and IC50. They have an IC50 that increases with copy number (although the relationship is not linear), but the way they model their binding renders a numerical comparison impossible.
In the present paper, we have two different effects when increasing copy number: we increase the number of binding sites (increasing IC50), but we also increase the number of available TFs, allowing for more binding even with less inducer, thereby reducing the IC50. According to our model, if the TF concentration were not increasing, we would also see a reduced sensitivity, as found in Zucca et al. (2012), which confirms our biosensor design idea.
As we have seen, the time evolution model is not fully satisfying.
A few strategies could help bring it closer to the data, but they all present the disadvantage of adding new free parameters: adding a lag time for protein production, as the introduced dilution does not seem to be enough, and adding some toxicity or load effect when copy number, TFs, and inducers are in too great numbers. These were not implemented as our aim was to present a model with a minimal set of parameters that explained the data well enough.
Another interesting feature of our model is to suggest further modifications of our design depending on the desired application: increasing its sensitivity, its dynamic range, or being able to sense higher titers of pinocembrin, by capturing the effects of changing copy number, DNA-TF binding affinity, or TF-inducer binding affinity.
In conclusion, we have presented a simple model with a minimal number of parameters that allows us to capture the effects of both copy number and inducer variations on our biosensors' behaviors, most notably on sensitivity. These effects have not been addressed as such before, and especially never with such a simple formalism. This model, based on a simple Hill equation, has the advantage of being very versatile and easy to use on previously uncharacterized systems.
The development of the pinocembrin biosensor, its modeling, and the understanding of its behavior open doors to generating more transcription-factor-based biosensors to meet the increasing demands of screening and dynamically regulating metabolic pathways in industrial strains.