Evaluation of an E. coli Cell Extract Prepared by Lysozyme‐Assisted Sonication via Gene Expression, Phage Assembly and Proteomics

Abstract Over the past decades, starting from crude cell extracts, a variety of successful preparation protocols and optimized reaction conditions have been established for the production of cell‐free gene expression systems. One of the crucial steps during the preparation of cell extract‐based expression systems is the cell lysis procedure itself, which largely determines the quality of the active components of the extract. Here we evaluate the utility of an E. coli cell extract, which was prepared using a combination of lysozyme incubation and a gentle sonication step. As a quality measure, we demonstrate the cell‐free expression of YFP at concentrations up to 0.6 mg/mL. In addition, we produced and assembled T7 bacteriophages up to a titer of 10^8 PFU/mL. State‐of‐the‐art quantitative proteomics was used to compare the produced extracts with each other and with a commercial extract. The differences in protein composition between lysozyme‐assisted sonication (LAS) extracts were surprisingly small, but we observed an increase in the release of DNA‐binding proteins with increasing numbers of sonication cycles. Proteins taking part in carbohydrate metabolism, glycolysis, and amino acid and nucleotide related pathways were found to be more abundant in the LAS extract, while proteins related to RNA modification and processing, DNA modification and replication, transcription regulation, initiation, termination and the TCA cycle were found to be enriched in the commercial extract.

aliquoted into the desired aliquot size (usually 30 µl), flash frozen in liquid nitrogen and stored at -80 °C.
Bicinchoninic acid (BCA) assay. Each cell extract batch was subjected to a BCA assay (Pierce BCA Protein Assay Kit, Reducing Agent Compatible, Thermo Fisher Scientific) according to the manufacturer's protocol to determine the total protein content of the prepared cell extracts. The results displayed in Figure 2A and C show the mean protein contents of three biological replicates; Figure S4 shows the protein content of each individual extract.
Whereas Takahashi et al. recommended a protein content of 10 mg/mL in the final cell-free protein expression reaction, [3] we used the same dilution factor for all tests independently of the protein content. Our samples contained 33. Fluorescence measurements were performed every 3-6 min using the corresponding filter sets for RFP, YFP, GFP or CFP. Time traces were background corrected with blank values and molar concentrations were calculated using calibration curves for each of the four fluorescent proteins. To compare the fluorescence time traces, their maximum slopes (maximum protein expression rate) and end levels were determined. The mean end levels (mean of 3 biological replicates) are shown for the YFP reporter in Figure 2 as bar graphs for the individual lysis settings. The data for RFP, GFP and CFP and the mean maximum protein expression rates (mean of 3 biological replicates) are shown in Figure S6.
Phage assembly. Phage assembly was performed according to the protocol of Rustad et al. [4] with the following adjustments: Phage DNA was mixed with cell extract, an energy solution and an amino acid solution as described in Sun et al. [1] using the same TXTL buffer composition as described in the TXTL test section. [5] For 6 reactions (of 13 µL volume each), 2.5 µL PEG 8000 (36% w/v), 4 µL dNTPs (25 mM), 0.8 µL ATP (500 mM), 37.5 µL TXTL buffer, 2 µL GamS (150 µM), 28.5 µL cell extract and 1.6 µL DNA (10 nM) were mixed with nuclease-free water to a final volume of 80 µL. All constituents were mixed (except DNA) on ice and incubated for 5 min, followed by the addition of DNA. This 13-µL assembly mix was incubated for 4 h at 29 °C to express the bacteriophages.
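The pipetting scheme above can be checked with a short calculation. The following Python sketch is only an illustration, not part of the original protocol: it takes the listed volumes and stock concentrations, derives the volume of nuclease-free water needed to reach 80 µL, and computes the final concentration contributed by each listed stock. The function and variable names are hypothetical, and any ATP or other components already contained in the TXTL buffer are not accounted for.

```python
# Volumes (in µL) of the phage assembly master mix as listed in the text.
MASTER_MIX_UL = {
    "PEG 8000 (36% w/v)": 2.5,
    "dNTPs (25 mM)": 4.0,
    "ATP (500 mM)": 0.8,
    "TXTL buffer": 37.5,
    "GamS (150 uM)": 2.0,
    "cell extract": 28.5,
    "DNA (10 nM)": 1.6,
}
FINAL_VOLUME_UL = 80.0      # final master mix volume from the text
N_REACTIONS = 6             # number of reactions from the text
REACTION_VOLUME_UL = 13.0   # volume per reaction from the text


def water_volume_ul(components, final_volume_ul):
    """Volume of nuclease-free water needed to reach the final volume."""
    return final_volume_ul - sum(components.values())


def final_concentration(stock, volume_ul, final_volume_ul=FINAL_VOLUME_UL):
    """Final concentration in the units of the stock (simple dilution)."""
    return stock * volume_ul / final_volume_ul


water_ul = water_volume_ul(MASTER_MIX_UL, FINAL_VOLUME_UL)
excess_ul = FINAL_VOLUME_UL - N_REACTIONS * REACTION_VOLUME_UL

print(f"water to add: {water_ul:.1f} uL")
print(f"excess after {N_REACTIONS} reactions: {excess_ul:.1f} uL")
print(f"PEG 8000: {final_concentration(36.0, 2.5):.3f} % w/v")
print(f"dNTPs:    {final_concentration(25.0, 4.0):.2f} mM")
print(f"ATP:      {final_concentration(500.0, 0.8):.2f} mM (from the added stock only)")
print(f"GamS:     {final_concentration(150.0, 2.0):.2f} uM")
print(f"DNA:      {final_concentration(10.0, 1.6):.2f} nM")
```

Under these assumptions the mix needs 3.1 µL of water, and the 80 µL master mix leaves about 2 µL of excess after six 13-µL reactions.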
Plaque assay. The plaque assay was performed with the top-agar method, using 0.5% agarose in NZCYM (Carl Roth), a standard medium for E. coli cultures and bacteriophages. [6] The agar was melted and stored in a water bath at 48 °C before use. Separately, 10^2- to 10^8-fold phage dilutions were prepared in phage buffer (1x PBS, 1 mM MgCl2, 1 mM MgSO4). 100 µL of each dilution was mixed with an equal volume of an overnight culture of the corresponding host bacterium. This mixture was added to the 0.5% agarose NZCYM medium aliquots and poured onto a 1% NZCYM agar plate. After solidification at room temperature, the plates were incubated at 37 °C until plaques became visible.
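Plaque counts from such a dilution series are commonly converted into a titer by multiplying the count by the dilution factor and dividing by the plated phage volume. The Python sketch below illustrates this standard conversion for the 100 µL of diluted phage plated here. The function name and the plaque count in the example are hypothetical and are not data from this work.

```python
def pfu_per_ml(plaque_count, fold_dilution, plated_volume_ul=100.0):
    """Titer of the undiluted sample in PFU/mL.

    plaque_count     -- plaques counted on one plate
    fold_dilution    -- dilution factor of the plated sample (e.g. 1e6)
    plated_volume_ul -- volume of diluted phage mixed with the host (µL)
    """
    plated_volume_ml = plated_volume_ul / 1000.0
    return plaque_count * fold_dilution / plated_volume_ml


# Hypothetical example: 25 plaques on the plate of the 10^6-fold dilution.
example_titer = pfu_per_ml(25, 1e6)
print(f"titer: {example_titer:.2e} PFU/mL")
```

In practice, one would usually pick the dilution whose plate shows a countable number of well-separated plaques and average over replicates.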
Sample preparation for mass spectrometry. All cell extracts were dried to completeness using a centrifugal evaporator (Centrivap Cold Trap -50, Labconco, US). The resulting pellets were dissolved in lysis buffer (8 M urea, 5 mM EDTA, 100 mM NH4HCO3) to a final concentration of 1 mg/mL. Next, 45 µg of each sample was reduced with 10 mM dithiothreitol (DTT) (30 min at 30 °C) and alkylated with 55 mM 2-chloroacetamide (CAA) (30 min in the dark at 25 °C). The samples were diluted 1:4 in 50 mM NH4HCO3 and double-digested with trypsin (1 h and 13 h at 30 °C, Trypsin Gold, Mass Spectrometry Grade, Promega), which was added twice at a ratio of trypsin:protein = 1:100 (by mass). The reaction was stopped with 1% formic acid (FA) and the resulting peptides were purified. For this, in-house-built C18 tips (5 disks of Sep-Pak Vac C18 material, Waters, US) were equilibrated with 250 µl 100% acetonitrile (ACN), 250 µl elution solution (40% ACN, 0.1% FA) and 250 µl washing solution (2% ACN, 0.1% FA) at 1500 g. The samples were loaded onto the tips (centrifugation for 2 min at 500 g) and washed three times with washing solution for 2 min at 1500 g. Finally, the peptides were eluted with 100 µl elution solution for 2 min at 500 g. The samples were dried to completeness and resuspended in 45 µl washing solution right before the MS measurement.
Proteomics data acquisition. Generated peptides were analyzed on a Dionex Ultimate 3000 RSLCnano system coupled to an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (Thermo Fisher Scientific, Bremen, GER). For each analysis an injection amount of ≈ 0.1 µg of peptides was delivered to a trap column (ReproSil-pur C18-AQ, 5 µm, Dr. Maisch, 20 mm x 75 µm, self-packed) at a flow rate of 5 µL/min in 100% solvent A (0.1% formic acid in HPLC grade water). After 10 min of loading, peptides were transferred to an analytical column (ReproSil Gold C18-AQ, 3 µm, Dr. Maisch, 400 mm x 75 µm, self-packed) and separated using a 50 min gradient from 4% to 32% of solvent B (0.1% formic acid in acetonitrile and 5% (v/v) DMSO) at a flow rate of 300 nL/min. Both nanoLC solvents contained 5% (v/v) DMSO. The Fusion Lumos Tribrid mass spectrometer was operated in data-dependent acquisition and positive ionization mode. MS1 spectra (360-1300 m/z) were recorded at a resolution of 60,000 using an automatic gain control (AGC) target value of 4 × 10^5 and a maximum injection time (maxIT) of 50 ms. After peptide fragmentation using higher energy collision induced dissociation (HCD), MS2 spectra of up to 20 precursor peptides were acquired at a resolution of 15,000 with an AGC target value of 5 × 10^4 and a maxIT of 22 ms. The precursor isolation window width was set to 1.3 m/z and the normalized collision energy to 30%. Dynamic exclusion was enabled with 20 s exclusion time (mass tolerance +/-10 ppm). MS/MS spectra of species that were singly charged, unassigned or with charge states > 6+ were excluded.
Proteomics data analysis. Peptide identification and quantification was performed using the software MaxQuant (version 1.6.3.4) with its built-in search engine Andromeda. [7] MS2 spectra were searched against the E. coli (strain B / BL21-DE3) reference proteome from Uniprot (UP000002032, 4156 protein entries), supplemented with common contaminants (built-in option in MaxQuant). Trypsin/P was specified as proteolytic enzyme.
Precursor tolerance was set to 4.5 ppm, and fragment ion tolerance to 20 ppm. Results were adjusted to 1% false discovery rate (FDR) on peptide spectrum match (PSM) level and protein level employing a target-decoy approach using reversed protein sequences. The minimal peptide length was defined as 7 amino acids, the "match-between-run" function was disabled. For full proteome analyses carbamidomethylated cysteine was set as fixed modification and oxidation of methionine and N-terminal protein acetylation as variable modifications.
Proteins were quantified using Label Free Quantification (maxLFQ [8]). The maxLFQ intensity was log transformed before downstream analysis. T-tests were used in the differential analysis (using R version 3.6.3). The false discovery rates (FDRs) were calculated from the p-values using the Benjamini-Hochberg method. [9] Proteins with FDR < 0.05 and unique proteins (proteins which were present in all three replicates of one sample, but not present in one of the three replicates of the other sample) were selected as significantly differentially expressed proteins and passed to the DAVID functional annotation [10] for enrichment analysis. To simplify the presentation, proteins in GO terms with FDR below 0.05 were roughly classified according to UniProt database [11] keywords related to protein expression and energy regeneration, and their influence on gene expression was interpreted accordingly.
Figure S4: (A) For the shaking flask replicates lysozyme incubation had a higher impact on cell lysis than sonication cycles. The protein content was generally higher in samples without sonication cycles (S0/L0.5 and S0/L1) than in samples not incubated with lysozyme (S5/L0 and S15/L0). Samples which were treated with a combination of lysozyme incubation and sonication cycles showed a high protein content. (B) In contrast to the shaking flask replicates, the bioreactor replicates showed a higher deviation in protein content among the biological replicates than among the different lysis settings.
Not normalized fluorescence data for the YFP plasmid and the GFP control plasmid. The levels are very different, but these levels depend not just on protein concentration but also on quantum yield, brightness and other parameters. (C) p70-GFP control plasmid in commercial cell extract. We purified the p70-GFP control plasmid using our standard technique and performed a TXTL test. The commercial plasmid has a high signal whereas our self-purified one has almost no signal. The commercial cell extract seems to be sensitive to residual chemicals in our plasmid.
Figure S8: Plaque assays for the shaking flask and bioreactor replicates. For all self-made cell extracts except for the S12/L0.8 samples comparable phage titers were measured. For the commercial kit one order of magnitude fewer phages could be assembled, and the bead beating batch performed much worse: only 500 phages could be counted in one replicate and no plaques could be counted for the second replicate.
For the shaking flask samples just 3 proteins were found in GO terms with GO FDR < 0.05. These proteins can be assigned to anaerobic growth conditions. In summary, shaking flask extracts and bioreactor extracts that were treated with the same lysis conditions show no differences in their proteome except for the abundance of proteins related to anaerobic growth. This is expected because, in contrast to bioreactor cultivation, we did not provide additional oxygen during shaking flask cultivation.
Figure S10: Comparison of the proteomes of extracts prepared from bioreactor and shaking flask cultures and a commercial kit. (A) Volcano plot of shaking flask samples S0/L1 against S15/L1. The t-test of S0/L1 against S15/L1 extracts gives 1 and 6 proteins with FDR < 0.05 (i.e., -log10(FDR) > 1.3). In addition, 3 (S0/L1) and 47 (S15/L1) unique proteins could be found and were also subjected to an enrichment analysis. (B) Enrichment analysis derived from the comparison of S0/L1 against S5/L1 and from the comparison shown in (A). No GO terms with FDR < 0.05 could be found for the S0/L1 extracts in both comparisons, but in total 7 (S0/L1 against S5/L1) and 21 (S0/L1 against S15/L1) proteins were found in GO terms with FDR < 0.05 for the S5/L1 and S15/L1 samples, respectively. For both cases these could be assigned to keywords related to DNA replication, relaxation, as no fold change can be calculated).
27% of the proteins are less than 1.5 fold enriched, 67% less than 2 fold and 90% less than 3 fold.
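The FDR values used for the volcano plots and for protein selection were obtained from t-test p-values with the Benjamini-Hochberg method. As an illustration only, the Python sketch below shows the standard Benjamini-Hochberg step-up adjustment and the corresponding -log10(FDR) threshold of about 1.3 for FDR < 0.05. The original analysis was carried out in R; the function name and the p-values in the example are hypothetical.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDRs) in input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down and enforce monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four proteins (not data from this work).
example_p = [0.01, 0.04, 0.03, 0.005]
example_fdr = benjamini_hochberg(example_p)

threshold = -math.log10(0.05)  # about 1.3, as used in the volcano plots
for p, fdr in zip(example_p, example_fdr):
    significant = -math.log10(fdr) > threshold
    print(f"p = {p:.3f}  FDR = {fdr:.3f}  significant: {significant}")
```

The same adjustment is available in R as `p.adjust(p, method = "BH")`, which may be closer to what was used here.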

Proteomic differences between our shaking flask replicates SF1, SF2 and SF3 with lysis setting S5/L0.5 and the commercial extract
The proteomics analysis revealed the greatest differences between our shaking flask batches S5/L0.5 and the commercial kit. The differences can be a result of differences in the culture conditions, the lysis procedure and other cell extract processing steps. Even though we had no detailed information about the preparation procedure of the commercial kit, we compared the two extracts, as the GFP expression yield was comparable. In total, we found 172 proteins (commercial extract, of which 15 were unique proteins) and 31 proteins (shaking flask batches, none of which were unique) in GO terms with FDR < 0.05. Of these proteins, 90% were less than 3-fold enriched (67% less than 2-fold and 27% less than 1.5-fold, see Figure S10F). The potential role of the individual proteins is discussed in the following.
Energy metabolism. Compared to the commercial extract, 9 out of 10 enzymes of the glycolysis pathway were enriched in our home-made batches, including glucokinase, which converts cytosolic glucose into glucose-6-phosphate (triosephosphate isomerase was the only enzyme that was not present at a higher abundance). Further, we found enzymes such as adenylate kinase or guanylate kinase enriched in our self-made batches, which potentially play a role in ATP and GTP regeneration in vitro.
On the other hand, we detected a higher abundance of the subunits IIB and IIC of E. coli's major transmembrane carbohydrate transport system (the phosphotransferase system, PTS) in the commercial extract. This indicates that the lysis conditions used for the commercial extract might cause a higher fragmentation of the membrane and thus a more efficient release of transmembrane proteins. In contrast to glycolysis, enzymes of the TCA cycle were enriched in the commercial extract. In shaking flask batches, we also detected an increased amount of dehydrogenases encoded by the glpABC operon, which belongs to the glycerol kinase pathway and is responsible for anaerobic energy generation. No differences were detected in the pentose phosphate pathway.
Transcription/translation. Together with core RNA polymerase, sigma factor RpoD (σ70) dominates transcription in exponentially growing cells. We found RpoD to be more abundant in the commercial extract. Furthermore, during envelope stress several sigma factors are upregulated in E. coli, including sigma factor RpoE (σ24), which was also found to be more abundant in the commercial extract. Other stress factors such as the translational regulator CsrA (envelope/periplasmic stress) and the nitrogen-limitation factor RpoN (σ54) were enriched in the commercial cell-free system, as well as the ribosomal protein S22, which is associated with stationary bacterial growth. Apart from these stress indicators, a variety of other transcription regulators such as LacI, MarR, GntR, DeoR or LysR were enriched compared to our home-made shaking flask batches.
In addition, translational capacity also appeared to be higher in the commercial system. In particular, we found 20 out of the 22 ribosomal proteins of the 30S subunit at higher abundance in the commercial extract, including S22 (see above) and the essential ribosomal protein S12, which takes part in both tRNA and ribosomal subunit interactions. In the case of the 50S subunit, we found 24 out of 33 ribosomal proteins to be more abundant in the commercial extract, including the small ribosomal protein L34 (5.3 kDa). Other translation-related proteins also showed higher abundance in the commercial extract, such as initiation (IF-2, IF-3) and elongation (EF-4, EF-Tu, SelB) factors, but also the ribosomal silencing factor (RsfS), which inhibits ribosome association and prevents translation.
Degradation of nucleic acids and proteins. We also found notable differences between the cell extracts in degradation pathways. In the commercial extract ribonucleases 2 and E, which are mainly involved in mRNA degradation, were enriched (p-value 0.05). Other ribonucleases participating in RNA maturation and processing (RNase 3, G, PH and R) were also more abundant in the commercial extract. On the other hand, endoribonuclease L-PSP, which also acts on mRNA, was more abundant in the self-made batches. We also investigated the presence of proteases in the cell extracts. We found both subunits HslV and HslU (annotation at the transcript level) of the proteasome-like degradation complex HslVU (ClpQY), which unfolds proteins under ATP consumption, enriched in the commercial extract.
Biosynthesis. Interestingly, many proteins involved in amino acid biosynthesis were more abundant in our self-made CF system compared to the commercial extract, suggesting their potential use for amino acid production inside the extract starting from inexpensive precursors. Finally, the chaperone cofactor GroES was more abundant in the commercial batch, but not its chaperone complex GroEL, even though it is encoded by the same operon.
[1]. Anyway, the composition of the growth medium and the buffers needed for cell washing and cell extract dialysis are listed below.
A detailed protocol for the TXTL buffer preparation is given in the protocol of Sun et al. [1].
[12] The codon usage tables of Escherichia coli B and of Shigella flex. 2a were used for the calculation. The second one is more closely related to MRE600, which is the origin strain of the purified tRNA used for the TXTL buffer. The translation rate was predicted using an RBS calculator. [13]
Protein: mScarlet I [14], mVenus [15], GFP mut3 [16], mTurquoise 2 [17], p70-GFP (deGFP3) [18].
Table S4: T-test for TXTL data. We aimed to test our hypothesis that our data show an optimum in the number of sonication cycles. The data shown in Figure 2 B and D were split into two sets with fixed lysozyme concentration and different sonication cycles: For shaking flask cell extracts these were 0, 5, and 15 sonication cycles in combination with a lysozyme concentration of 0.5 or 1 mg/ml, respectively (data set 1 and 2; samples S5/L0 and S15/L0 were excluded from the t-test). For the bioreactor extracts these were 4, 8, 12, 16, and 20 sonication cycles in combination with a lysozyme concentration of 0.5 or 0.8 mg/ml, respectively (data set 4 and 5). As we observe the same trend in the data independent of the lysozyme concentration, we also introduced combined data sets (data set 3 and 6). A parabola y = a·x^2 + b·x + c, with x being the number of sonication cycles and y being the TXTL end level of the YFP reporter, was fitted to the data sets. The fit parameter a was tested against the null hypothesis 'Fit parameter a is zero' and the corresponding p-value was calculated. We could reject the null hypothesis for all data sets (p-value < 0.05) except for data set 4, which has a p-value of 0.06. The observed optima in the TXTL data are therefore statistically significant for all data sets except data set 4.
Figure S11: Plasmid maps for TXTL tests. An mScarlet, mVenus, E0040 GFP or mTurquoise reporter was cloned into a pSB1A3 backbone containing a J23106 promoter, a B0034 ribosome binding site (RBS) and a B0015 terminator.
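The test described for Table S4 (fit a parabola to TXTL end level versus number of sonication cycles, then test whether the quadratic coefficient a differs from zero) can be sketched as follows. This is a minimal illustration assuming an ordinary least-squares fit and a two-sided t-test on a with n - 3 degrees of freedom. The software and exact procedure used for Table S4 are not stated here, and the data values below are made up for the example.

```python
import math


def invert_matrix(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def t_two_sided_p(t_value, dof, steps=2000):
    """Two-sided p-value of Student's t via Simpson integration of the pdf."""
    upper = abs(t_value)
    if upper == 0.0:
        return 1.0
    norm = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))

    def pdf(x):
        return norm * (1.0 + x * x / dof) ** (-(dof + 1) / 2)

    h = upper / steps
    total = pdf(0.0) + pdf(upper)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * pdf(k * h)
    central_mass = 2.0 * total * h / 3.0  # probability of |T| <= |t|
    return min(1.0, max(0.0, 1.0 - central_mass))


def fit_parabola(xs, ys):
    """Least-squares fit of y = a*x^2 + b*x + c; returns (coeffs, t_a, p_a)."""
    rows = [[x * x, x, 1.0] for x in xs]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(3)]
    cov_unscaled = invert_matrix(xtx)
    coeffs = [sum(cov_unscaled[i][j] * xty[j] for j in range(3)) for i in range(3)]
    residuals = [y - sum(c * v for c, v in zip(coeffs, r)) for r, y in zip(rows, ys)]
    dof = len(xs) - 3
    sigma2 = sum(e * e for e in residuals) / dof
    se_a = math.sqrt(sigma2 * cov_unscaled[0][0])
    t_a = coeffs[0] / se_a
    return coeffs, t_a, t_two_sided_p(t_a, dof)


# Made-up end levels for 0, 5 and 15 cycles, three replicates each.
cycles = [0, 0, 0, 5, 5, 5, 15, 15, 15]
end_levels = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9, 1.1, 0.8, 1.0]
coeffs, t_a, p_a = fit_parabola(cycles, end_levels)
print(f"a = {coeffs[0]:.4f}, t = {t_a:.2f}, p = {p_a:.2g}")
```

A negative a with a small p-value indicates a parabola that opens downward, which is consistent with an optimum at an intermediate number of sonication cycles.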