Teaching molecular biology to undergraduate biology students: An illustration of protein expression and purification*


*This work was supported by the Brazilian research funding institution Fundação de Amparo à Pesquisa do Estado de São Paulo (Centro de Biotecnologia Molecular Estrutural-Centro de Pesquisa Inovação e Difusão-Proc. 98/14138-2). César Adolfo Sommer is a fellowship recipient of Coordenação de Aperfeiçoamento de Pessoal de Nível Superior. The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked “advertisement” in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.


Practical classes on protein expression and purification were given to undergraduate biology students enrolled in the elective course “Introduction to Genetic Engineering.” The heterologous expression of the green fluorescent protein (GFP) of Aequorea victoria is an attractive system for teaching because the protein can be seen easily during the experiments. The students were given basic information about the molecular features of GFP and its applications in molecular biology, the available heterologous expression systems, and the theoretical and experimental details of GFP expression in Escherichia coli and its purification. Competent E. coli BL21 cells were transformed with the pET28a expression vector containing the GFP gene fused to a histidine (His) tag. During induction of a transformed clone with isopropylthiogalactoside, a time course of GFP expression was analyzed by SDS-PAGE, and expression was also visualized as the increasing green fluorescence of the bacterial culture. After cell disruption, protein purification was illustrated by affinity chromatography of the His-tagged protein on a nickel column. Fractions eluted with increasing concentrations of imidazole were analyzed visually and by SDS-PAGE, demonstrating how imidazole, by competing with histidine residues for the immobilized metal, releases first the weakly bound nonspecific proteins and then the His-tagged protein. The results and the experimental factors involved in protein expression, solubilization, and folding were discussed after the laboratory experiments. These practical classes allowed several current approaches in molecular biology to be demonstrated rapidly and helped reinforce some of the topics taught during the course.

The elective course “Introduction to Genetic Engineering” was given to third- and fourth-year biology students at the Federal University of São Carlos (São Carlos-SP, Brazil). Laboratory experiments were carried out at the end of the course to illustrate protein expression in Escherichia coli cells and purification by affinity chromatography. The practical classes lasted 3 days, during which the eight students were divided into two groups for the laboratory experiments. Students were asked to answer a questionnaire assessing the overall quality of the course and their attendance and participation in class.

Protein expression followed by protein purification is a widely used approach in molecular biology for the large-scale preparation of pure proteins for many purposes, such as crystallization, kinetic studies, or the production of antibodies. To illustrate these tools, our didactic approach used the green fluorescent protein (GFP),¹ a naturally occurring fluorescent protein found in the jellyfish Aequorea victoria. GFP has already been used as an educational tool for undergraduate and graduate students, particularly to demonstrate protein purification [1]. The experimental approach reported here focuses not only on protein purification but also on protein expression from the lac promoter at basal and induced levels. To this end, we had previously cloned the GFP gene into pET28a, a vector in which expression follows a two-step mechanism under isopropylthiogalactoside (IPTG) induction. This expression mechanism is often used to produce proteins in E. coli because it gives selective, high-level expression of the cloned genes [2]. Our students were provided with information about the theoretical and practical aspects of the experiments, as described below.

GFP is a 238-amino acid protein containing the tripeptide serine-65, tyrosine-66, and glycine-67, which forms a fluorescent chromophore after cyclization and oxidation reactions. Cells expressing GFP emit green fluorescence when irradiated with UV or blue light, a property that makes this protein an excellent tool in molecular biology. Its applications include use as a reporter gene and as a fusion tag for determining the subcellular localization of a given protein in mammalian cells [3–6].

E. coli is the most commonly used heterologous expression system, mainly because of its high expression levels, the relative simplicity of DNA manipulation, and the short time needed to obtain the recombinant protein. The major drawbacks of a bacterial system are that the recombinant protein is not secreted and that post-translational modifications are lacking. In addition, many eukaryotic proteins form inclusion bodies when expressed in bacteria, mostly as a result of improper folding.

Expression from the pET28a vector involves two levels of amplification and yields larger amounts of the desired protein than simpler systems. It requires E. coli host cells engineered to carry the gene encoding T7 RNA polymerase under the control of the lac promoter. These cells are transformed with a plasmid that carries a copy of the T7 late promoter and, adjacent to it, the DNA encoding the protein to be expressed. When IPTG, a lactose analog, is added to the culture medium of transformed E. coli cells, it inactivates the lac repressor, and the T7 RNA polymerase gene is transcribed from the lac promoter. The polymerase recognizes the T7 late promoter on the plasmid and transcribes the gene of interest at a high rate [2]. Although expression from the lac promoter is stimulated by IPTG, there is a low basal level of expression in the absence of the inducer.
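The two-step amplification can be made concrete with a toy kinetic model. This is a minimal sketch, not something used in the classes: every rate constant below is hypothetical, in arbitrary units, and chosen only to show how lac-driven synthesis of T7 RNA polymerase is amplified at the level of the target protein, and why a small basal (leaky) level persists without IPTG.

```python
"""Toy two-step model of pET expression (hypothetical parameters, arbitrary units)."""

K_LAC_BASAL = 0.01    # leaky T7 RNA polymerase synthesis from the lac promoter (no IPTG)
K_LAC_INDUCED = 1.0   # T7 RNA polymerase synthesis once IPTG has inactivated the lac repressor
K_T7 = 5.0            # target (GFP) synthesis per unit of T7 RNA polymerase
K_LOSS = 0.1          # first-order loss (turnover plus dilution by growth), same for both species


def simulate_induction(iptg, hours, dt=0.001):
    """Integrate the model with Euler steps; return (T7 RNA polymerase, target) levels."""
    k_lac = K_LAC_INDUCED if iptg else K_LAC_BASAL
    rnap = target = 0.0
    for _ in range(round(hours / dt)):
        d_rnap = k_lac - K_LOSS * rnap             # step 1: lac promoter -> T7 RNA polymerase
        d_target = K_T7 * rnap - K_LOSS * target   # step 2: T7 promoter -> target protein
        rnap += d_rnap * dt
        target += d_target * dt
    return rnap, target


for h in (1, 2, 3):
    _, induced = simulate_induction(True, h)
    _, basal = simulate_induction(False, h)
    print(f"{h} h: induced {induced:.3f}, uninduced {basal:.5f}")
```

Because the model is linear, the induced/uninduced ratio simply equals K_LAC_INDUCED / K_LAC_BASAL; the point is qualitative. Early after induction the target accumulates faster than linearly (two stacked synthesis steps), while the uninduced culture still makes a small amount, as seen with the fluorescent uninduced colonies.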

Cells may be disrupted by several methods, such as sonication, osmotic shock, lysozyme treatment, or a French press. After the lysate is centrifuged, SDS-PAGE analysis of the supernatant and the pellet shows whether the expressed protein is in the soluble or the insoluble form. Soluble proteins are especially desirable for crystallization and can be purified away from the other E. coli proteins by affinity chromatography.

The pET28a vector carries an N-terminal His6 tag followed by a thrombin cleavage site. After expression, recombinant His-tagged proteins are purified through the interaction of their histidine residues with nickel or cobalt ions immobilized on a matrix. Imidazole, the side-chain group of histidine, is used for elution because it also binds to the metal ions and thus competes with the recombinant protein, disrupting its interaction with the matrix. The thrombin recognition site in the recombinant protein allows the His6 tag to be removed after purification.



The following equipment was required for the laboratory work: a water bath (42 °C and 100 °C), an incubator (37 °C), a biological safety cabinet for sterile work, an incubator shaker for bacterial growth, a sonicator, a visible spectrophotometer, a microcentrifuge, a centrifuge, a vertical minigel electrophoresis apparatus, and an electrophoresis power supply.

E. coli Transformation—

The pET28a vector carrying an open reading frame coding for GFP was constructed before the beginning of the course. To construct this plasmid, the pET28a vector (Novagen, Madison, WI) was digested with NdeI, filled in with the Klenow fragment, digested with NotI, and then ligated to the GFP insert, which was obtained by digesting the pEGFP-N1 vector (Clontech, Palo Alto, CA) with SmaI and NotI. The Klenow fill-in converts the NdeI end into a blunt end that can be ligated to the blunt SmaI end of the insert, while the NotI ends on both fragments are cohesive.
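The blunt-end junction produced by this strategy can be checked with a short sketch. It is not the authors' procedure: the two sequence fragments are hypothetical, illustrative stretches (one modeled on a His-tag/thrombin-site coding region ending in an NdeI site, one on a short stretch between a SmaI site and a GFP start codon), not verified pET28a or pEGFP-N1 sequences, and the NotI end is not modeled. The helpers `left_end` and `right_end` are names introduced here; only the enzymes' recognition sites (NdeI CA^TATG, SmaI CCC^GGG) are taken as given.

```python
# Recognition site plus cut positions counted on the top strand from the site start:
# (site, top-strand cut, bottom-strand cut). NdeI (CA^TATG) leaves a 2-nt 5' overhang;
# SmaI (CCC^GGG) cuts both strands at the same point, leaving a blunt end.
NDEI = ("CATATG", 2, 4)
SMAI = ("CCCGGG", 3, 3)


def left_end(seq, enzyme, klenow_fill_in=False):
    """Top strand of the fragment upstream of the first site.

    With a 5' overhang the bottom strand extends past the top-strand cut; Klenow
    fill-in extends the recessed top strand up to the bottom-strand cut (blunt end).
    """
    site, top_cut, bottom_cut = enzyme
    i = seq.find(site)
    if i < 0:
        raise ValueError(f"{site} not found")
    return seq[: i + (bottom_cut if klenow_fill_in else top_cut)]


def right_end(seq, enzyme):
    """Top strand of the fragment downstream of the first site."""
    site, top_cut, _ = enzyme
    i = seq.find(site)
    if i < 0:
        raise ValueError(f"{site} not found")
    return seq[i + top_cut:]


# Hypothetical, illustrative sequences (top strand, 5' to 3'); the vector ATG is at index 0.
vector = "ATGGGCAGCAGCCATCATCATCATCATCACAGCAGCGGCCTGGTGCCGCGCGGCAGCCATATGGCTAGC"
insert = "CCCGGGATCCACCGGTCGCCACCATGGTGAGCAAG"

vec_end = left_end(vector, NDEI, klenow_fill_in=True)  # ends in ...CATA
ins_start = right_end(insert, SMAI)                    # begins GGG...
fusion = vec_end + ins_start                           # blunt-end ligation
junction = fusion[len(vec_end) - 4 : len(vec_end) + 3]
gfp_start = len(vec_end) + ins_start.find("ATG")       # first ATG in the insert
print(junction)                                        # CATAGGG
print("in frame" if gfp_start % 3 == 0 else "frameshift")  # in frame
```

In this model, dropping the fill-in (`klenow_fill_in=False`) shortens the vector end by two bases and the same frame check reports a frameshift; in practice an unfilled NdeI overhang would also not ligate efficiently to a blunt SmaI end.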

Competent E. coli BL21 cells, previously prepared by calcium chloride treatment, were transformed with pET28a-GFP. The recombinant plasmid (50 ng) was added to the competent cells (100 μl), and the mixture was kept on ice for 10 min. After a 90-s heat shock at 42 °C, the tubes were returned to the ice bath for 2 min. Luria-Bertani (LB) medium (800 μl; 1% tryptone, 0.5% yeast extract, and 1% NaCl) was added, and the mixture was incubated at 37 °C for 45–60 min. The suspension was plated onto LB agar containing kanamycin (25 μg/ml), and the plates were incubated overnight at 37 °C.
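Students can turn the colony count from this protocol into a transformation efficiency. This is a minimal sketch: the text does not report colony numbers or the volume plated, so 150 colonies and 100 μl plated are hypothetical values, and the small volume of the DNA solution is ignored.

```python
def transformation_efficiency(colonies, dna_ng, plated_ul, total_ul):
    """Transformants per microgram of plasmid DNA.

    Only the fraction of the transformation mix that was plated carries DNA onto the plate.
    """
    dna_plated_ug = (dna_ng / 1000.0) * (plated_ul / total_ul)
    return colonies / dna_plated_ug


# 100 ul competent cells + 800 ul LB from the protocol; colonies and plated volume are hypothetical.
TOTAL_UL = 100 + 800
eff = transformation_efficiency(colonies=150, dna_ng=50, plated_ul=100, total_ul=TOTAL_UL)
print(f"{eff:.1e} transformants per ug")  # 2.7e+04
```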

GFP Expression and Solubility Tests—

On the day before the experiment, a single pET28a-GFP-transformed colony was inoculated into 5 ml of LB medium containing kanamycin. After overnight growth at 37 °C and 200 rpm, 1 ml of the culture was diluted into 250 ml of LB medium containing kanamycin. A negative control was also prepared (250 μl of culture in 10 ml of medium). Bacteria were grown at 37 °C and 200 rpm until the OD600 reached ∼0.6. At that point, protein expression was induced by adding IPTG to a final concentration of 1 mM (except in the control). For the time course of protein expression, 500-μl aliquots of culture were taken 1, 2, and 3 h after induction. Cells were harvested by centrifugation (14,000 rpm for 1 min) and resuspended in 60 μl of SDS gel-loading buffer (50 mM Tris-Cl, pH 6.8; 2% SDS; 10% glycerol; 100 mM 2-mercaptoethanol; 0.1% bromphenol blue) for SDS-PAGE analysis. In parallel, the aliquots of induced culture taken 1, 2, and 3 h after induction were examined on a UV transilluminator to observe the increase in fluorescence relative to the control (uninduced culture).
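The amount of inducer to add follows from a simple mass balance. In this sketch the 1 M IPTG stock is an assumption (the text gives only the 1 mM final concentration); the function solves stock × v = target × (culture + v), so the added stock volume is included in the final volume.

```python
def stock_volume_to_add(target_mM, culture_ml, stock_mM):
    """Volume of stock (ml) to add to `culture_ml` so the final mix reaches target_mM.

    Solves stock_mM * v = target_mM * (culture_ml + v) for v.
    """
    if stock_mM <= target_mM:
        raise ValueError("stock must be more concentrated than the target")
    return target_mM * culture_ml / (stock_mM - target_mM)


# Hypothetical 1 M (1000 mM) IPTG stock added to the 250-ml culture.
v = stock_volume_to_add(target_mM=1.0, culture_ml=250.0, stock_mM=1000.0)
print(f"add {v * 1000:.0f} ul of 1 M IPTG")  # add 250 ul of 1 M IPTG
final_mM = 1000.0 * v / (250.0 + v)           # check: concentration after mixing
```

With a 1000-fold concentrated stock, the familiar shortcut C1V1 = C2V2 gives essentially the same 250 μl; the exact form matters only when the stock is not much more concentrated than the target.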

The induced culture was harvested by centrifugation at 7,000 rpm and 4 °C for 5 min, and the pellet was resuspended in 6 ml of lysis buffer (100 mM NaCl; 10 mM Tris-HCl, pH 8; 50 mM NaH2PO4). Cells were then sonicated on ice (eight 1-min pulses), and the cell debris was pelleted at 14,000 rpm and 4 °C for 10 min. Aliquots (15 μl) of the recovered supernatant and of the pellet resuspended in 6 ml of lysis buffer were analyzed by SDS-PAGE to detect GFP in the soluble and/or insoluble fractions.
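Speeds in this protocol are given in rpm, which only translate into a centrifugal force once the rotor radius is known. A minimal sketch of the standard conversion, RCF = ω²r/g with ω = 2π·rpm/60; the rotor radii below are hypothetical, since the article does not name the rotors.

```python
import math

G = 9.80665  # standard gravity, m/s^2


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g) = omega^2 * r / g."""
    omega = 2 * math.pi * rpm / 60.0  # angular velocity, rad/s
    return omega ** 2 * (radius_cm / 100.0) / G


def rpm_from_rcf(rcf, radius_cm):
    """Inverse of rcf_from_rpm for the same rotor radius."""
    omega = math.sqrt(rcf * G / (radius_cm / 100.0))
    return omega * 60.0 / (2 * math.pi)


# Hypothetical radii: the article gives speeds in rpm but not the rotors used.
for label, rpm, radius_cm in [("microcentrifuge", 14_000, 7.3), ("larger rotor", 7_000, 10.0)]:
    print(f"{label}: {rpm} rpm at r = {radius_cm} cm ~ {rcf_from_rpm(rpm, radius_cm):,.0f} x g")
```

The familiar shortcut RCF ≈ 1.118 × 10⁻⁵ × r(cm) × rpm² is the same formula with the constants folded together.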

Purification of His6-tagged GFP

His6-tagged GFP was purified from the supernatant of the sonicated cells by affinity chromatography, using either a nickel-nitrilotriacetic acid (Ni-NTA; Qiagen, Valencia, CA) minicolumn or a Talon (Clontech) column, which contains cobalt as the metal ion. The columns were washed with two column volumes (10 ml) of Milli-Q (Millipore Corp., Billerica, MA) water and equilibrated with two column volumes of lysis buffer. The soluble fraction of the cell lysate was then loaded, and the columns were washed with two column volumes of lysis buffer. Proteins were eluted stepwise, with one column volume of lysis buffer per step containing increasing concentrations of imidazole (25, 50, 75, 100, and 250 mM). Purified GFP was quantified as described elsewhere [7]. Aliquots of all recovered fractions were analyzed by SDS-PAGE and examined on a UV transilluminator. The resins were washed with three column volumes of water and stored at 4 °C in 20% ethanol.
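Preparing the step-elution buffers is a routine dilution exercise. This sketch assumes a hypothetical 2 M imidazole stock made up in lysis buffer (so NaCl, Tris, and phosphate stay constant across steps) and infers a 5-ml column volume from "two volumes (10 ml)"; neither the stock nor the exact column volume is stated in the text.

```python
COLUMN_VOLUME_ML = 5.0        # inferred from "two volumes (10 ml)"
STOCK_IMIDAZOLE_MM = 2000.0   # hypothetical 2 M imidazole stock made up in lysis buffer
STEPS_MM = [25, 50, 75, 100, 250]


def step_buffer(target_mM, final_ml=COLUMN_VOLUME_ML, stock_mM=STOCK_IMIDAZOLE_MM):
    """Return (ml of imidazole stock, ml of plain lysis buffer) for one elution step."""
    stock_ml = target_mM * final_ml / stock_mM  # C1 * V1 = C2 * V2
    return stock_ml, final_ml - stock_ml


recipes = {c: step_buffer(c) for c in STEPS_MM}
for c, (stock_ml, buffer_ml) in recipes.items():
    print(f"{c:>3} mM: {stock_ml:.4f} ml stock + {buffer_ml:.4f} ml lysis buffer")
```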

SDS-PAGE and Gel Staining—

SDS-PAGE was performed essentially as described by Laemmli [8] using a vertical minigel electrophoresis apparatus. Samples were electrophoresed at 200 V and 20 mA in running buffer (25 mM Tris-HCl, pH 8.3; 192 mM glycine; 0.1% SDS). Gels were stained with Coomassie Brilliant Blue R-250 and destained in 7% acetic acid. The presence and intensity of the ∼29-kDa GFP band were compared with the fluorescence of the different eluted fractions examined on a UV transilluminator.
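The apparent mass of a band such as the ∼29-kDa GFP fusion is read from the gel by interpolating against the protein standards, using the roughly linear relation between log(MW) and relative mobility (Rf, band distance divided by dye-front distance). A minimal sketch: the marker masses are those of a typical low-molecular-weight set, but the Rf values and the band Rf are hypothetical, not measurements from Fig. 1.

```python
import math

# Hypothetical relative mobilities for a typical low-molecular-weight marker set; not measured data.
STANDARDS = [(0.10, 97.0), (0.22, 66.0), (0.36, 45.0), (0.52, 30.0), (0.68, 20.1), (0.80, 14.4)]


def fit_log_mw(standards):
    """Least-squares fit of log10(MW) = intercept + slope * Rf; returns (slope, intercept)."""
    xs = [rf for rf, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw(rf, slope, intercept):
    """Interpolate the apparent molecular mass (kDa) of a band from its Rf."""
    return 10 ** (intercept + slope * rf)


slope, intercept = fit_log_mw(STANDARDS)
band_rf = 0.53  # hypothetical Rf of the induced band
print(f"apparent mass ~ {estimate_mw(band_rf, slope, intercept):.1f} kDa")
```

The log-linear relation holds only over the separating range of the gel, so estimates outside the span of the standards should not be trusted.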


Bacterial colonies transformed with pET28a-GFP displayed green fluorescence 2 days after plating on LB agar. Examining the plates under UV light thus provided a simple demonstration of the low basal level of expression from the lac promoter in the absence of IPTG.

Fig. 1 shows the SDS-PAGE analysis of recombinant GFP induction in E. coli cells. A protein band of approximately 29 kDa increased gradually in intensity under IPTG induction but was not detected in the uninduced culture (without IPTG). This is the size expected for GFP (∼26.3 kDa) plus the N-terminal fusion peptide containing the His6 tag and the thrombin site (∼3 kDa). The increase in GFP expression was also evident from the increasing fluorescence of the induced culture, compared with the uninduced culture, under UV light.

The recombinant protein was detected in both soluble and insoluble fractions of the induced cells (Fig. 1). Its presence in the insoluble fraction may have resulted from inefficient cellular disruption during sonication.

Fig. 2 shows the analysis of the fractions recovered during affinity purification. Many nonspecific proteins that did not bind the resin were released in the flow-through and the wash buffer. Proteins with low affinity for nickel or cobalt ions were eluted at low imidazole concentrations (e.g. 25 mM). Recombinant GFP began to elute at 50 mM (Fig. 2A) or 75 mM (Fig. 2B) imidazole, and at these concentrations some contaminating proteins were also eluted with the protein of interest. At higher imidazole concentrations, the recombinant protein was eluted almost exclusively. Recovery was high at 100 mM (Fig. 2A) or 250 mM (Fig. 2B) imidazole, reflecting the different affinities of the recombinant GFP for the nickel (Qiagen) and cobalt (Clontech) matrices. At least 12 mg of GFP was purified per 250 ml of bacterial culture (i.e. 48 mg per liter). Step elution with increasing imidazole concentrations allowed a more selective recovery of the His-tagged protein, because low-affinity proteins were eluted before it. As shown in Fig. 2C, fractions containing purified GFP could be identified directly by the natural fluorescence of the protein.
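The yield figures can be reworked as a quick exercise. This sketch uses only the numbers in the text (12 mg from 250 ml) and the ∼29-kDa apparent mass of the fusion; because SDS-PAGE masses are approximate, the molar amount is a rough estimate.

```python
GFP_MG = 12.0        # purified His6-tagged GFP reported in the text
CULTURE_ML = 250.0   # culture volume it came from
FUSION_KDA = 29.0    # apparent mass of the fusion band (1 kDa = 1 mg per umol)

mg_per_liter = GFP_MG * 1000.0 / CULTURE_ML  # scale 250 ml up to 1 liter
nmol = GFP_MG / FUSION_KDA * 1000.0          # mg / (mg/umol) = umol, then x 1000 for nmol
print(f"{mg_per_liter:.0f} mg per liter, ~{nmol:.0f} nmol of fusion protein")
```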

Other Topics Discussed

After the classes, the following topics were discussed in detail with the students.

Protein Folding and Chemical Modification—

The nascent polypeptide chain synthesized on the ribosome undergoes folding, and all molecules of a given protein adopt a single conformation, called the native state, out of the many possible conformations. Most proteins within cells are in their soluble native conformation, which is required for proper biological activity.

Some cellular mechanisms can prevent misfolded proteins from forming. Chaperones are a family of proteins found in all organisms, from bacteria to humans. They occur in every cellular compartment, bind different types of proteins, and probably take part in a general protein-folding mechanism. Molecular chaperones, which bind and stabilize unfolded or partially folded proteins, can prevent the degradation of these proteins, whereas chaperonins directly facilitate folding [9].

A common problem of heterologous expression in E. coli is inappropriate protein folding, which leads to the formation of insoluble aggregates called inclusion bodies. These aggregates may form through interactions between hydrophobic regions of different unfolded molecules or through the formation of disulfide bonds between distinct molecules.

The intracellular environment of bacterial cells can hinder the correct folding of certain eukaryotic proteins. For example, the bacterial cytoplasm is a reducing environment, unlike the endoplasmic reticulum, where the disulfide bonds of secreted mammalian proteins are formed; this difference can affect disulfide bond formation and, consequently, protein folding and solubility.

Ideally, the purified protein should be as close as possible to its native conformation. However, some post-translational modifications that are important for protein function and solubility in eukaryotic cells (e.g. glycosylation) do not occur in E. coli. Nonetheless, glycosylation rarely plays a key role in in vitro studies of recombinant proteins. Furthermore, the native structure may not be essential if the protein is to be used as an antigen for antibody production [10].

Other Factors Involved in Protein Expression—

Several factors that prevent or impair protein expression in E. coli, and some strategies to overcome them, were discussed in class.

Some proteins, particularly those smaller than 10 kDa, are unstable in E. coli and may be rapidly degraded by proteases [11]. Fusion with a larger protein such as glutathione S-transferase may overcome this problem. Many insoluble proteins also become soluble in E. coli when expressed as glutathione S-transferase fusions, and these can be purified easily by affinity chromatography on a column containing immobilized glutathione. Alternatively, insoluble proteins can be solubilized with denaturing agents such as urea or the detergent SDS and then refolded into the native form by gradual removal of these agents.

Slowing the rate of expression can also improve protein solubility, and this can be achieved by lowering the concentration of the inducing agent (e.g. IPTG) or lowering the induction temperature. These approaches can also be adopted when the heterologous protein expressed in E. coli is toxic to the cell.

The presence of many rare codons in a gene expressed in E. coli can cause poor translation because of ribosomal pausing [12]. Because transcription and translation are closely coupled in E. coli, premature termination of transcription can follow. Replacing such codons in vitro, by mutagenesis, with the codons most commonly used in E. coli may overcome this problem. Translation can also be affected by secondary structures in the messenger RNA that prevent ribosomes from binding to the ribosome binding site.
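Before deciding whether codon replacement is worthwhile, a coding sequence can be scanned for rare codons and, in particular, for consecutive runs of them. This is a minimal sketch: the rare-codon set is an illustrative subset often cited for E. coli (an assumption here; real analyses use a full codon-usage table), and the example open reading frame is hypothetical.

```python
# Illustrative subset of codons often described as rare in E. coli (assumed for this sketch).
RARE_CODONS = {"AGA", "AGG", "AUA", "CUA", "CCC", "GGA"}


def rare_codon_positions(coding_seq):
    """Return (codon number, codon) for each rare codon in the reading frame starting at 0."""
    seq = coding_seq.upper().replace("T", "U")  # accept DNA or RNA input
    return [
        (i // 3 + 1, seq[i:i + 3])
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 3] in RARE_CODONS
    ]


def longest_rare_run(hits):
    """Length of the longest stretch of consecutive rare codons."""
    best = run = 0
    previous = None
    for number, _ in hits:
        run = run + 1 if previous is not None and number == previous + 1 else 1
        best = max(best, run)
        previous = number
    return best


example = "ATGAGGAGGCTGAAACCCTAA"  # hypothetical ORF with a tandem AGG-AGG
hits = rare_codon_positions(example)
print(hits)                    # [(2, 'AGG'), (3, 'AGG'), (6, 'CCC')]
print(longest_rare_run(hits))  # 2
```

Tandem rare codons are flagged separately because clustered rare codons are generally more disruptive to translation than isolated ones.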


The pET28a-GFP system is a useful didactic tool for demonstrating protein expression in E. coli and protein purification. The natural fluorescence of GFP allows it to be detected visually during expression in bacteria and during purification by affinity chromatography, in parallel with conventional electrophoresis. The pET28a expression vector illustrates both the two-step mechanism of protein induction by IPTG in E. coli and the role of the fused histidine residues in purifying the desired protein.

An analysis of the questionnaire filled out by the undergraduate students led us to conclude that the objective of the course was successfully achieved. We believe this course has encouraged several of them to join a research group, since six of the eight students are currently working in molecular biology laboratories.

Figure 1.

Time course of recombinant GFP expression in E. coli BL21 analyzed by SDS-PAGE. Lane M, protein standards; lane 0, uninduced culture; lanes 1–3, culture induced with 1 mM IPTG for 1, 2, and 3 h, respectively; lanes SF (soluble fraction) and IF (insoluble fraction), E. coli cells induced with IPTG for 3 h. The gel was stained with Coomassie Blue. The arrow indicates the recombinant GFP (∼29 kDa).

Figure 2.

A and B, purification of recombinant GFP by affinity chromatography using a column from Qiagen (A) or Clontech (B). Lane M, protein standards; lane FT, flow-through; lane W, buffer wash; remaining lanes, lysis buffer with increasing concentrations of imidazole (indicated in the figure). The gel was stained with Coomassie Blue. The arrow indicates the recombinant GFP (∼29 kDa). C, visualization of GFP under natural light during purification. The figure shows a nickel column before (left) and after (middle) elution, and the fractions eluted with 25, 50, 100, and 250 mM imidazole, containing highly purified GFP (right).


We thank the students who took part in this course as subjects of this research.


¹The abbreviations used are: GFP, green fluorescent protein; IPTG, isopropylthiogalactoside; His, histidine; LB, Luria-Bertani.