Combinatorial modulation of initial codons for improved zeaxanthin synthetic pathway efficiency in Escherichia coli

Abstract A balanced and optimized metabolic pathway is the basis for efficient production of a target metabolite. Traditional strategies mostly involve the manipulation of promoters or ribosome-binding sites, which can encompass long sequences and can be complex to operate. In this work, we found that by changing only the three nucleotides of the initiation codons, expression libraries of the reporter proteins RFP, GFP, and lacZ with a large dynamic range and evenly distributed expression levels could be established in Escherichia coli (E. coli). Thus, a novel strategy that uses combinatorial modulation of initial codons (CMIC) was developed for metabolic pathway optimization and applied to the three genes crtZ, crtY, and crtI of the zeaxanthin synthesis pathway in E. coli. The initial codons of these genes were changed to random nucleotides NNN, and the gene cassettes were assembled into vectors via an optimized strategy based on type II restriction enzymes. With minimal labor time, a combinatorial library was obtained containing strains with various zeaxanthin production levels, including a strain with a titer of 6.33 mg/L and a specific production value of 1.24 mg/g DCW, a nearly 10-fold improvement over the starting strain. The results demonstrated that CMIC is a feasible technique for conveniently optimizing metabolic pathways. To the best of our knowledge, this is the first metabolic engineering strategy that relies on manipulating the initiation codons for pathway optimization in E. coli.

The initiation codon contains only three nucleotides, yet it significantly affects gene expression strength at the translational level (Looman et al., 1987). ATG is the most common initiation codon, but GTG, and more rarely TTG, are also employed by some genes (Aiba et al., 1984; Danchin, Guiso, Roy, & Ullmann, 1984). GTG was found to have a lower translation initiation efficiency than ATG, and ATG has sometimes been used to replace GTG to increase target gene expression (Reddy, Peterkofsky, & McKenney, 1985), which suggested that the various codons among the exhaustive 64 combinations in an NNN library might lead to different initiation efficiencies. Thus, it might be feasible to gradually modulate gene expression by changing the initiation codons. In this work, we found that the expression of the reporter proteins RFP, GFP, and lacZ could be modulated by changing only the three nucleotides of their initiation codons. As intended, the expression libraries with genes initiated by random NNN codons showed a large dynamic range and mostly evenly distributed expression levels. Because only three or fewer nucleotides of the initiation codon need to be manipulated, methods using our approach might be much simpler than current strategies. Thus, a novel strategy of combinatorial modulation of initial codons (CMIC) was developed for metabolic pathway optimization in this work, which offers great flexibility at minimal cost in experimental materials and time.
In this work, the zeaxanthin synthesis pathway containing three gene products was optimized using CMIC to illustrate the application of this novel technique in E. coli.

| Strains, media, and culture conditions
The strains and plasmids used in this study are listed in Table A1.
Plasmids were extracted using the Bacterial Genomic DNA Miniprep Kit (Axygen Biosciences). Polymerase chain reaction (PCR) products were digested with DpnI for 0.5 hr at 37°C and then purified using a SanPrep Gel Extraction Kit (Sangon Biotech). Plasmids and PCR products were sequenced using Sanger sequencing (GenScript Co., Ltd).

| Construction of the reporter expression libraries pNNNrfp, pNNNgfp, and pNNNlacZ
The primers pBBR1-rfp-F and pBBR1-rfp-R were used to amplify the backbone of pNNNrfp from plasmid pBBR1-rfp, and the rfp gene was cloned into the pNNNrfp plasmid with kanamycin-resistance cassette and pBBR1 replication origin, driven by the constitutive promoter BBa_J23100 (Table A2). The initiation codon library NNN was embedded into the forward primer pBBR1-rfp-F. The resulting PCR product was digested with DpnI to eliminate the PCR template and self-ligated using Golden Gate DNA assembly (Hillson, Rosengarten, & Keasling, 2012).
The GFP expression library pNNNgfp contained quite different components from those used to construct pNNNrfp. To construct pNNNgfp, the backbone fragment containing a pMB1 origin of replication and an apramycin-resistance cassette was amplified from plasmid p034apr using the primer pair pMB1_apr_F and pMB1_apr_R; the constitutive promoter P46 (Table A2) was amplified from the strain M1-46 using the primers p46-up and GFP_RBS-down, the latter containing the randomized initiation codon NNN; and the gfp gene was cloned from plasmid pQE60-gfp.
To construct the lacZ library pNNNlacZ, the backbone fragment comprising a pMB1 origin of replication and an apramycin-resistance cassette was amplified from plasmid p034apr using the primer pair pMB1_apr_F and pMB1_apr_R; the constitutive P46 promoter (Table A2) was amplified from strain M1-46 using the primers p46-up and lacZ_RBS-down, the latter containing the randomized initiation codon NNN; and the lacZ gene was cloned from E. coli MG1655 using the primers LacZ_F and LacZ_R. The resulting plasmid libraries pNNNrfp, pNNNgfp, and pNNNlacZ were transferred into E. coli DH5α (CWBIO) and selected overnight on LB plates with the corresponding antibiotics. The resulting colonies were used for expression analysis.
All primers used in library construction are listed in Table A3 and the sequencing primers in Table A4.
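As a minimal sketch of what a randomized NNN primer encodes, the Python snippet below expands a degenerate sequence into its concrete variants. The RBS core (AGGAG) and spacer (ATATACAT) are the ones reported for pNNNrfp in the Results; concatenating them into one string and the function name `expand_degenerate` are illustrative assumptions, and the real primer sequences are those in Table A3.

```python
from itertools import product

# IUPAC code: N stands for any of the four nucleotides.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}

def expand_degenerate(seq):
    """Return every concrete sequence encoded by a sequence that may contain N."""
    choices = [IUPAC[base] for base in seq.upper()]
    return ["".join(combo) for combo in product(*choices)]

# RBS core and spacer reported for pNNNrfp, followed by the randomized codon.
rbs_core, spacer = "AGGAG", "ATATACAT"
variants = expand_degenerate(rbs_core + spacer + "NNN")
start_codons = sorted({v[-3:] for v in variants})

print(len(variants))           # 64 possible initiation codons
print("ATG" in start_codons)   # True: the canonical codon is one of them
```

Each of the 64 variants differs only in the last three bases, which is why a single degenerate primer is enough to build the whole library.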

| Construction of pCrtZYIlib libraries for combinatorial modulation of initial codons
To construct the combinatorial modulated plasmid library, primers crt-F and crt-R were used to amplify the backbone of the pCrtZYIlib from the plasmid pYL-crtZYI with a pSC101 replication origin and a chloramphenicol-resistance cassette; promoter 36 was amplified from the strain M1-36 using primers P36-F and P36-R; and the crtZ gene was amplified from the plasmid pYL-crtZYI using primers crtZ-F and crtZ-R, with the randomized initiation codon NNN embedded in the primer crtZ-F. The crtY gene was amplified from the plasmid pYL-crtZYI using the NNN-containing primers crtY-F and crtY-R.
The crtI gene was amplified from the plasmid pYL-crtZYI using primers crtI-F and crtI-R with the same strategy. All the DNA fragments were digested using DpnI at 37°C for 0.5 hr and ligated using the Golden Gate method (Hillson et al., 2012). All primers used in library construction are listed in Table A3 and the sequencing primers in Table A4.

| Zeaxanthin production levels of different clones from the CMIC library
All CMIC library colonies were scraped from the plates and pooled for plasmid DNA extraction. The resulting plasmid library was transferred into the chassis strain PHY01 and grown overnight on LB/chloramphenicol plates. The resulting single colonies were picked from the plates and used to inoculate 15 mm × 100 mm tubes containing 3 ml of LB with 34 mg/L chloramphenicol and grown at 37°C and 250 rpm overnight. Aliquots comprising 100 μl of the resulting seed cultures were used to inoculate 100-ml flasks containing 10 ml LB + 2% (v/v) glycerol carotenoid fermentation medium, and grown aerobically at 30°C and 250 rpm for 48 hr. The resulting fermentation cultures were collected for measurement of carotenoid production and biomass (OD 600 nm).

| RFP and GFP fluorescence measurement
The RFP- and GFP-expressing colonies were picked and transferred into 15 mm × 100 mm tubes containing 3 ml LB with 50 mg/L kanamycin and 50 mg/L apramycin, respectively, and grown at 37°C and 250 rpm overnight. The cultures were then inoculated into 15 mm × 100 mm tubes containing 3 ml LB with the same antibiotics and grown at 37°C and 250 rpm for 20 hr. Subsequently, 50 μl samples of each culture were transferred into individual wells of a 96-well plate and diluted fourfold with LB. The blank control was 200 µl of pure LB. The optical density at 600 nm (OD 600 nm) was measured to determine the biomass concentration using an SP-723 spectrophotometer (Spectrum SHANGHAI). Fluorescence was measured at a gain of 60 with an Infinite M200 Pro ELISA spectrometer (Tecan), using excitation and emission wavelengths of 585 and 620 nm, respectively, for RFP and 488 and 520 nm, respectively, for GFP.
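The readings described above are typically combined into a specific fluorescence (fluorescence per unit of biomass). The sketch below shows one way to do this in Python; subtracting the LB blank from both readings is an assumption made here for illustration, and the function name and the numbers are hypothetical rather than taken from this study.

```python
def specific_fluorescence(fluor, od600, blank_fluor=0.0, blank_od600=0.0):
    """Blank-corrected fluorescence divided by blank-corrected OD600."""
    net_od = od600 - blank_od600
    if net_od <= 0:
        raise ValueError("blank-corrected OD600 must be positive")
    return (fluor - blank_fluor) / net_od

# Hypothetical readings (arbitrary fluorescence units), not data from this study.
value = specific_fluorescence(fluor=5200.0, od600=0.85,
                              blank_fluor=200.0, blank_od600=0.05)
print(round(value))
```

Normalizing to OD600 makes clones with different growth comparable, which matters when colonies are picked from a library with widely varying expression.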

| Measurement of lacZ expression
A quantitative estimate of lacZ expression was obtained by measuring the β-galactosidase activity using ortho-nitrophenyl-β-D-galactopyranoside (ONPG; Sigma) as a colorimetric substrate.

| Measurement of carotenoid production of clones from the CMIC library
An aliquot comprising 1 ml of each culture was harvested by centrifugation at 12,000 g for 5 min, suspended in 1 ml acetone, incubated at 55°C for 15 min in the dark, and centrifuged at 12,000 g for 10 min. The acetone supernatants containing the carotenoids were transferred into fresh tubes for HPLC analysis. HPLC was conducted on an Agilent Technologies Series 1200 system equipped with a VWD detector set at 476 nm and a Symmetry C18 column (250 mm × 4.6 mm, 5 μm, Waters). A mixed gradient elution at a flow rate of 0.8 ml/min at 30°C, with mobile phase C (methanol, acetonitrile, and dichloromethane at 21:21:8, by volume) and mobile phase D (10% methanol [v/v]), was employed to separate the analytes as described previously (Li et al., 2017). The dry cell weight (DCW) was calculated from the optical density at 600 nm using the empirical formula 1 OD 600 = 0.323 g DCW/L. The results are shown as the means ± SD of three repeated experiments.
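The conversion from OD 600 to dry cell weight and then to a specific production value can be sketched as follows. The factor 0.323 g DCW/L per OD unit is the empirical formula given above; the function names and the OD 600 reading of 15.8 are hypothetical and chosen only for illustration.

```python
DCW_PER_OD600 = 0.323  # g DCW/L per OD600 unit (empirical factor from the text)

def dry_cell_weight(od600):
    """Dry cell weight in g/L estimated from OD600."""
    return DCW_PER_OD600 * od600

def specific_production(titer_mg_per_l, od600):
    """Specific production value in mg product per g DCW."""
    return titer_mg_per_l / dry_cell_weight(od600)

# 6.33 mg/L is the best titer reported in this work; the OD600 is hypothetical.
print(round(specific_production(6.33, 15.8), 2))  # 1.24
```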

| Total RNA extraction and qRT-PCR analysis
In order to investigate the relationship between non-ATG initial codons and the transcriptional expression levels of the key carotenoid synthetic pathway genes, two representative strains PHY01(pCrtZYI7) and PHY01(pCrtZYI9) and the control strain […] Table A5, and the 16S rRNA gene was used as the endogenous reference gene. The relative gene transcript level was calculated using the comparative critical threshold cycle method (2^(−ΔΔCt)). The data are presented as the mean ± SD (standard deviation) of triplicate experiments.
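For readers unfamiliar with the comparative threshold cycle method, the calculation can be sketched in Python as below. The reference gene corresponds to the 16S rRNA gene used here; the function name and all Ct values are hypothetical and serve only to show the arithmetic.

```python
def relative_expression(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target gene by the 2^(-ddCt) method."""
    delta_ct_test = ct_target_test - ct_ref_test    # normalize test sample
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl    # normalize control sample
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return 2.0 ** (-delta_delta_ct)

# Hypothetical Ct values: the target crosses the threshold one cycle earlier
# in the test strain, with the same reference-gene Ct in both strains.
print(relative_expression(22.0, 10.0, 23.0, 10.0))  # 2.0
```

A value close to 1 for crtZ, crtY, and crtI would correspond to unchanged transcript levels between a modulated strain and the control.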

| Protein extraction and sample preparation
To collect total proteins for mass spectrometry analysis, cell proteins were extracted as follows: (a) 150-ml fermentation cultures of E. coli PHY01(pCrtZYIATG), PHY01(pCrtZYI7), and PHY01(pCrtZYI9) were prepared, and the cells were harvested by centrifugation at 3,500 g for 10 min; (b) the cell pellet was resuspended in 15 ml PBS buffer (pH 7.2), and this step was repeated three times; (c) the supernatant was discarded and the pellet was collected for the next step; (d) the collected pellet was resuspended in 10 ml of protein lysis buffer (8 M urea, 1% DTT) and mixed well; (e) the suspension was disrupted with an ultrasonic homogenizer (Scientz-IID) for 10 min in an ice bath; (f) the disrupted suspension was centrifuged at 8,000 g for 15 min at 18°C; and (g) the supernatant was collected into a 2-ml centrifuge tube, this step was repeated once, and the samples were stored at −80°C for subsequent analysis or protein mass spectrometry.

| Statistical analysis and analytical techniques
The significance of differences between the mean values of control and test samples was assessed using Student's t test in the open-source software suite "R" (http://cran.r-project.org/). Differences with p < .05 were regarded as obvious, p < .01 as significant, and p < .001 as very significant. SDS-PAGE was run using commercially purchased SurePAGE™ gels (GenScript). Protein mass spectrometry was performed using an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (LC-MS) (Thermo Fisher) following previously described methods (Espadas, Borras, Chiva, & Sabido, 2017; Li, Zhou, Xiao, Li, & Tian, 2018).
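The analysis in this work was done in R. Purely as an illustration of the test and of the three significance tiers defined above, here is a dependency-free Python sketch. It assumes a two-sample, equal-variance (pooled) Student's t test with a two-sided p value; the function names and the triplicate values are hypothetical.

```python
import math
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df

def two_sided_p(t, df, steps=20000):
    """Two-sided p value via Simpson integration of the t density on [0, |t|]."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)

def significance_label(p):
    """Map a p value to the tiers used in this article."""
    if p < 0.001:
        return "very significant"
    if p < 0.01:
        return "significant"
    if p < 0.05:
        return "obvious"
    return "not significant"

# Hypothetical triplicate titers (mg/L) for a control and a test strain.
control, test = [0.62, 0.66, 0.68], [6.1, 6.3, 6.6]
t_stat, dof = student_t(control, test)
print(dof, significance_label(two_sided_p(t_stat, dof)))
```

With three replicates per group the test has only four degrees of freedom, so the effect must be large relative to the replicate scatter to reach the stricter tiers.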

| The expression intensity of reporter protein expression libraries with randomized NNN initiation codons
To determine whether the expression of genes could be gradually modulated by changing their initiation codons and study the relationship between expression levels and initiation codons, reporter libraries individually expressing RFP, GFP, and lacZ with randomized NNN initiation codons were constructed in E. coli. The RBS core region of the pNNNrfp was AGGAG and the spacer sequence between the RBS and the initiation codon was ATATACAT (Figure 1a), which was reported to be essential for translation initiation (Chen, Bjerknes, Kumar, & Jay, 1994). Colonies with visually apparent diversity of expression levels were selected semi-randomly from the pNNNrfp library on LB plates and subjected to growth and fluorescence measurement. The RFP expression levels were determined by calculating the specific fluorescence per OD 600 nm .
The specific fluorescence of selected strains from the pNNNrfp library is shown in Figure 1b. While ATG still gave the strongest expression, the canonical initial codons GTG and TTG had expression strengths of 5% and 13% of that of ATG, respectively, which was comparable to previous reports (Beard & Spindler, 1996; Rhee, Yang, Lee, & Park, 2004; Stenström, Holmgren, & Isaksson, 2001). It was interesting that some of the non-natural codons had relatively high expression levels: CGC, TGG, AAA, and ACT had 26%-33% of the efficiency of ATG; GGC, ATT, and CAG initiated translation with an efficiency of 7.2%-21.6%; TTT, GTT, ACG, and TAA showed 0.1%-1.5% relative efficiency; and TAC and CAA had nondetectable fluorescence intensity. These results suggested that the randomized NNN initiation codon library had mostly evenly distributed expression levels. Moreover, even without counting the strains with nondetectable fluorescence, the library had a large dynamic range of around 3,000-fold. A photograph of the pNNNrfp library colonies on an LB plate is shown in Figure A1a.

[Figure panel residue: β-galactosidase activity (Miller units) by initiation codon (ATG, TTG, AAA, CAA, CAT, CTA, ACC, TCT, CTT, TGT, CTG, GTG, CAG, CGA, CCA, TGG, CGG, GAT, TGA, CCG, AGG, GTA, TCA, GCG)]
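The relative efficiencies and the dynamic range quoted for the reporter libraries follow from simple ratios of specific fluorescence. The Python sketch below illustrates the calculation; the function names and every number in the `library` dictionary are hypothetical and are not the measured values behind Figure 1.

```python
def relative_to_atg(specific_fluor):
    """Express each codon's specific fluorescence as a percentage of ATG."""
    atg = specific_fluor["ATG"]
    return {codon: 100.0 * value / atg for codon, value in specific_fluor.items()}

def dynamic_range(specific_fluor):
    """Ratio of the strongest to the weakest detectable (non-zero) signal."""
    detectable = [v for v in specific_fluor.values() if v > 0]
    return max(detectable) / min(detectable)

# Hypothetical specific fluorescence (a.u. per OD600); 0.0 means nondetectable.
library = {"ATG": 30000.0, "TTG": 3900.0, "GTG": 1500.0, "TTT": 30.0, "TAC": 0.0}

for codon, percent in relative_to_atg(library).items():
    print(codon, round(percent, 1))
print(dynamic_range(library))
```

Excluding nondetectable clones from the dynamic range mirrors the way the range is described in the text, since a zero signal would otherwise make the ratio undefined.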
resistance markers, and constitutive promoter were used for investigating the initiation codons in different genetic contexts. In addition, we checked the sequences of the genes used in this research to see whether there were in-frame ATG, GTG, or TTG codons within the UTR regions of the three reporter genes and the crt genes. No ATG, GTG, or TTG codons were found.
Additionally, there are no internal natural initiation codons that could shift the translation start site. The RBS core region of pNNNgfp was AGGA, and the spacer sequence was AACAGCT (Figure 1c).

| Development of a combinatorial modulation of initial codons technique for metabolic pathway optimization
Since random initiation codons could be employed to generate gene expression libraries, we used the CMIC technique as a simple and feasible method to modulate and optimize the expression of multiple genes simultaneously (Figure 2). Variably regulated genes were obtained by PCR amplification with extended primers, in which the initiation codon nucleotides NNN were embedded at the 5′ ends.
Specifically designed linkers for type II restriction enzymes were also embedded into the primers to ensure the assembly pattern and efficiency. Using the Golden Gate assembly method (Hillson et al., 2012), DNA cassettes containing the pathway genes were assembled into the vector backbone to form an expression plasmid. The frequency of the four bases in the initial codons of the pCrtZYIlib libraries was then obtained by sequencing the pooled library with the standard Sanger sequencing method. As illustrated in Figure A3 (a, b, and c), all four bases were almost evenly represented at the initial codon positions, which suggested good coverage of the initial codon libraries.
Since each gene had a random initiation codon, a combinatorial plasmid library with variably regulated pathway genes was created, which was subsequently introduced into dedicated hosts to be screened and selected for strains carrying optimized pathways. The vector backbone was universal for all reactions, providing a stable plasmid backbone. By incorporating fixed linkers and regulatory elements into the primers for gene amplification, this method varies only the actual PCR primer sequences of pathway genes (Figure 2).
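To give a sense of the size of such a combinatorial library, the sketch below counts the possible codon combinations and estimates how many clones would need to be sampled to see a given combination at least once. It assumes that all combinations are equally frequent, which is an idealization (Figure A3 only shows that the four bases are roughly evenly represented), and the function names are hypothetical. The article does not state a screening depth, so the numbers are illustrative only.

```python
import math

CODONS_PER_GENE = 4 ** 3  # 64 possible NNN codons

def library_size(n_genes):
    """Number of initiation codon combinations for n independently randomized genes."""
    return CODONS_PER_GENE ** n_genes

def clones_for_coverage(n_variants, probability):
    """Clones to sample so that one given variant appears at least once with the
    stated probability, assuming all variants are equally frequent."""
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - 1.0 / n_variants))

size = library_size(3)  # crtZ, crtY, and crtI
print(size)             # 262144
print(clones_for_coverage(size, 0.95))
```

Because the theoretical space is large, screening a modest number of colonies samples only part of it, which is consistent with selecting clones by phenotype rather than aiming for exhaustive coverage.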

| Application of the CMIC technique to improve the efficiency of the zeaxanthin synthesis pathway
The experimental results of the reporter protein (RFP, GFP, lacZ) expression libraries with the randomized NNN initiation codons indicated that the noncanonical start codons did not produce the same relative expression levels with the three reporter genes in E. coli. Therefore, the reporter expression strength had no predictive value for the expression of the crtZ, crtY, and crtI genes in the zeaxanthin pathway. Consequently, we adopted a strategy of creating a de novo codon library for each crt gene in E. coli.
This synthesis pathway containing three gene products was optimized using CMIC to demonstrate a practical application of this novel technique (Figure 3b). The chassis strain PHY01 (Table A1) producing the precursor of the zeaxanthin synthesis pathway, phytoene, was constructed previously using classic metabolic engineering strategies (Lu et al., 2012;Sun et al., 2014;Zhao et al., 2013).
Using the CMIC strategy, primers were designed to amplify crtZ, crtY, and crtI from the plasmid pYL-crtZYI (Table A1) […] Compared with the control strain containing the pCrtZYIATG plasmid with the original crt genes, PHY01(pCrtZYI4) to PHY01(pCrtZYI9) had 2.8- to 9.5-fold increased zeaxanthin production (Table 1). The best strain, PHY01(pCrtZYI9), produced 6.33 mg/L zeaxanthin with a specific production value of 1.24 mg/g DCW (Figure 4c,d), representing 9.7-fold and 9.5-fold increases, respectively, over the control strain (p < .001).
It was perhaps surprising that none of the crt genes in the best strain PHY01(pCrtZYI9) had natural codons, indicating that the artificial codons regulated the zeaxanthin pathway more efficiently and with better balance than the original all-ATG initiated pathway.
The CMIC technique was therefore demonstrated to offer a feasible strategy for convenient metabolic pathway optimization.
An analysis of the concentrations of synthetic intermediates revealed that the low zeaxanthin-producing strains PHY01(pCrtZYI1), PHY01(pCrtZYI2), and PHY01(pCrtZYI3) had high lycopene accumulation and no β-carotene, suggesting that these strains had very unbalanced pathways so that the carbon flux was stopped at the first synthesis step. Conversely, most strains with improved zeaxanthin production had very low or no lycopene accumulation, but all accumulated some β-carotene, indicating that it was beneficial to move the carbon flux to the second step of the synthesis pathway, which provided the direct substrate for zeaxanthin production.

| CMIC technique modulated zeaxanthin synthesis pathway genes at the translational level but not at the transcriptional level
To determine whether the non-ATG initial codons influenced the transcription level or the translation level of these key genes, three experiments were performed: real-time qPCR (RT-qPCR) analysis, SDS-PAGE of total proteins, and protein mass spectrometry of total proteins.
In order to investigate the relationship between non-ATG initial codons and the transcriptional expression levels of the key carotenoid synthetic pathway genes, two representative strains PHY01(pCrtZYI7) and PHY01(pCrtZYI9) and the control strain PHY01(pCrtZYIATG) were chosen to analyze the strength of the gene expression through real-time qPCR (RT-qPCR). As indicated in the figures (Figures A5, A6, and A7), although with different initial codons, the transcription levels of the genes crtI, crtY, and crtZ were constant, which suggested that the non-ATG codons did not affect the transcription levels of associated genes in E. coli.
In the SDS-PAGE experiment, as indicated in Figure A8 […] demonstrated that the emPAI values of the crtZ protein in strains PHY01(pCrtZYI7) and PHY01(pCrtZYI9) were obviously higher than that of PHY01(pCrtZYIATG). The emPAI values of the crtY protein were nearly the same in the three strains, but the emPAI value of the crtI protein in PHY01(pCrtZYIATG) was significantly higher than those of PHY01(pCrtZYI7) and PHY01(pCrtZYI9).
Combined with the RT-qPCR data, these results indicated that the non-ATG initial codons affected gene expression at the translational level but not at the transcriptional level in E. coli.
To understand how the different enzyme levels affect zeaxanthin production, protein mass spectrometry experiments were performed for the control strain PHY01(pCrtZYIATG), which had the original ATG initial codons for the crtZYI genes, and the two hyperproducing strains PHY01(pCrtZYI7) and PHY01(pCrtZYI9) with modulated initial codons. In the protein mass spectrometry results (Tables A7, A8, and A9), the quantity of detected proteins is represented by the emPAI value. Tables A7, A8, and A9 show that the emPAI values of crtZ from PHY01(pCrtZYI7) and PHY01(pCrtZYI9) exhibited 5.6- and 7.6-fold increases relative to the control strain PHY01(pCrtZYIATG), respectively, while the crtY emPAI values remained relatively steady across the three strains. To our surprise, the emPAI values of the first enzyme in the zeaxanthin pathway, crtI, dropped significantly compared with the control strain.

[Figure panel residue: zeaxanthin specific production value (mg/g) for PHY01(pCrtZYIATG) and PHY01(pCrtZYI1) to PHY01(pCrtZYI9); asterisks mark significance levels]
Previous reports demonstrated that the reaction catalyzed by the crtZ enzyme is the rate-limiting step and is essential for complete conversion of β-carotene to zeaxanthin in the zeaxanthin biosynthesis pathway (Nishizaki, Tsuge, Itaya, Doi, & Yanagawa, 2007; Pollmann, Breitenbach, & Sandmann, 2017). Thus, the fact that the high-producing zeaxanthin strains PHY01(pCrtZYI7) and PHY01(pCrtZYI9) exhibited significantly higher crtZ (β-carotene hydroxylase) enzyme levels was consistent with the previous report (Ruther, Misawa, Böger, & Sandmann, 1997). However, the lower crtI enzyme levels detected in both zeaxanthin hyperproducing strains PHY01(pCrtZYI7) and PHY01(pCrtZYI9) have not been reported in related work, and we do not yet have a plausible explanation for them. This unexpected result is worthy of investigation in future work. In addition, there is no previous report on modulating the expression of crtZ, crtY, and crtI simultaneously to regulate the production of zeaxanthin.
Our findings here might give some clues for further optimizing the zeaxanthin synthetic pathway.
Although conventional promoter engineering is a common transcriptional regulation strategy, it has several disadvantages: (a) promoters are long and have high sequence similarity, which might result in homologous recombination (Borodina & Nielsen, 2014); (b) inducible promoters require large amounts of expensive inducers; and (c) because promoter sequences are long, promoter-based strategies are complicated and tedious to operate. RBS-based engineering strategies still have some drawbacks as well. The initial codons provide an extra layer of expression modulation in addition to promoters and RBSs, which might be used to further improve metabolic pathways already optimized by promoters and RBSs. In our experiments, the improvement resulting from initial codon modulation was not marginal: applying the CMIC strategy in E. coli resulted in a nearly 10-fold increase in zeaxanthin production.

| CONCLUSIONS
This study shows that changing only the three nucleotides of the initiation codons can be used to generate expression libraries with a large dynamic range and evenly distributed expression levels in E. coli. Based on these findings, the novel CMIC strategy was developed for metabolic pathway optimization and applied to the zeaxanthin synthesis pathway in E. coli. A combinatorial library was obtained containing strains with various zeaxanthin production levels, including a strain with a nearly 10-fold improvement over the starting strain. Therefore, CMIC was demonstrated to be a feasible technique for conveniently optimizing metabolic pathways. To the best of our knowledge, this is the first metabolic engineering strategy that manipulates the initiation codons for pathway optimization in E. coli.

TABLE 1 Carotenoid production of selected strains from PHY01(pCrtZYIlib) with their corresponding initial codons of crtZ, crtY, and crtI
[Table body not recovered; column headings: Strains; Carotenoid production with different initiation codons by the multigenes of crtZYI]
a Titer = mg/L, spv = specific production value = mg/g DCW.
The central principles and mechanisms of gene expression are highly conserved across organisms, and E. coli has served as a model organism in which many principles and mechanisms of classical genetics were revealed. Thus, we expect that modulation with CMIC should be functional, at least to some extent, in other organisms. We plan to study this strategy in a model eukaryote, Saccharomyces cerevisiae, to determine whether such a modulation technique can be applied to eukaryotic systems, and we hope to present that work in the near future.

ACKNOWLEDGMENTS
We are really grateful to Prof. Li Zhu

CONFLICT OF INTERESTS
None declared.

AUTHOR CONTRIBUTIONS
Changhao Bi conceptualized the study. Investigations, methodology, formal analysis, data curation, and project administration were

ETHICS STATEMENT
None required.

DATA AVAILABILITY STATEMENT
All data associated with the article have been included in this manuscript.

[Figure caption fragment] … and (c) the frequency of each non-natural start codon in the expression libraries of pNNNCrtI. The green peak represents the base A, the blue peak represents the base C, the red peak represents the base T, and the black peak represents the base G.