• Open Access

Structural genomics and drug discovery


*Correspondence to: Kenneth LUNDSTROM, PhD
Flamel Technologies, 33 Avenue du Dr. Georges Lévy
69693 Vénissieux, France.
Tel.: +33 472 78 34 41
Fax: +33 472 78 34 24
E-mail: lundstrom@flamel.com


  • Background to drug discovery
  • Structure determination and drug discovery
  • Expression systems
    - Cell-free translation
    - Bacterial expression
      - Expression in E. coli
      - Other prokaryotic systems
    - Expression in yeast cells
    - Expression in insect cells
      - Baculovirus
      - Other insect vectors
    - Expression in mammalian cells
      - Transient and stable expression
      - Viral vectors
  • Protein purification
  • Structure determination
    - X-ray crystallography
    - Nuclear magnetic resonance
    - Electron microscopy
    - Structural genomics programs
  • Conclusions and future prospects


Structure determination has already proven useful for lead optimization and direct drug design. The number of high-resolution structures available in public databases today exceeds 30,000, and this resource will aid structure-based drug design. Structural genomics approaches covering whole genomes, topologically similar proteins or gene families are valuable assets for further progress in the development of new drugs. However, membrane proteins, which represent 70% of current drug targets, are poorly characterized structurally. The problems relate to difficulties in obtaining large amounts of recombinant membrane proteins as well as in their purification and structure determination. By studying a large number of target proteins in parallel, structural genomics has proven successful in developing new methods in areas ranging from expression to structure determination.

Background to drug discovery

Drug discovery has relied to a large extent on medicinal chemistry, which has evolved significantly during the past two decades [1]. Although screening of large numbers of compounds and very large combinatorial libraries has generated a number of efficient drugs, increasing interest has been dedicated to structure-based drug design. The advantage of this approach is that drug molecules can be ‘tailor-made’ to interact with the drug target, which should improve both the efficacy and the specificity of the drug. Improved efficacy might allow the administration of lower drug doses and/or less frequent dosing. Drugs with enhanced target specificity should interact less with non-target molecules and hence show substantially reduced side effects. Furthermore, the drug development process might be shortened.

To apply structure-based drug design, it is necessary to have access to structural information on the drug target. More than 30,000 high-resolution structures have been deposited in public databases; the majority of these, however, are of soluble proteins. In contrast, some 70% of current drugs are targeted against membrane proteins, for which only just over 100 structures are available [2]. This discrepancy relates mainly to the topology of membrane proteins. A large number of them, such as G protein-coupled receptors (GPCRs) and ion channels, have a topology that includes multiple transmembrane domains [3]. For this reason, it has been much more difficult to express recombinant membrane proteins in sufficient quantity and quality than their soluble counterparts. Moreover, as membrane proteins are embedded in membranes, detergents are required for purification, which affects both the yield and the stability of the protein [4]. Furthermore, structure determination by X-ray crystallography has been negatively affected by low yields and the presence of detergents [5]. Membrane proteins also characteristically possess flexible regions and short hydrophilic loops, which significantly reduce the potential crystal contacts and thereby crystallization success. The application of nuclear magnetic resonance (NMR) for structure determination is also more complicated for membrane proteins. Tellingly, only a single high-resolution structure has been solved for the family of GPCRs, which consists of some 800 members [6]. Moreover, this structure, of bovine rhodopsin, was obtained on material isolated from native tissue, where the receptor occurs at high abundance.

The structural genomics approach allows the study of a large number of gene products in parallel. By targeting not only whole genomes but also specifically certain types or families of proteins, the understanding of the requirements for obtaining high levels of expression and the conditions for purification and structure determination can significantly improve the success rate for membrane proteins.

In this review, a general overview of structural genomics is presented, but the focus is on membrane proteins. The status of structure-based drug discovery is briefly described, and much attention is given to methodological developments in structural biology. An overview of currently applied expression systems is given, followed by a description of various purification and structure determination procedures. Finally, examples of structural genomics consortia are discussed.

Structure determination and drug discovery

Structural biology has facilitated drug discovery for some time already. The application of structure-based drug design has become relatively common in lead optimization [7]. Structural knowledge of proteins and their ligands has aided in improving drug potency and selectivity [8]. This approach has resulted in faster definition of drug-binding properties and has made it easier to identify ‘hit’ compounds through screening programs [9]. Both X-ray crystallography and NMR have allowed high throughput approaches for structure-based lead discovery [10]. Rapid structure resolution of protein-ligand complexes has been obtained by applying automated procedures such as AutoSolve® [11]. The technology revealed that, in cocktails of 100 molecules, variations in electron density could be used to distinguish the different shapes of the molecules [12]. When cocktails of smaller fragments are applied at very high concentrations, candidate fragments can be ranked automatically and up to 1000 compounds can be screened within 2–3 days [13]. In this context, fragments of successful drug-like molecules can be characterized in high throughput format, taking into account molecular weight, the presence of hydrogen bond donors and acceptors, and solubility. The technology has been expanded to virtual screening approaches in which large libraries of candidate fragments are systematically docked into pre-defined binding sites of target proteins using three-dimensional (3D) computer models [11]. Furthermore, an NMR-based screening approach was taken in structure-based drug design to develop small molecule drug candidates that inhibit the aberrant over-expression of c-myc in different tumours by targeting the far upstream element (FUSE) binding protein (FBP) [14].
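The property-based characterization of fragments mentioned above can be pictured as a simple filter over a fragment library. The following is a minimal sketch only: the `Fragment` record, the function names and all cut-off values are assumptions made for illustration (the weight and hydrogen bond limits follow the commonly cited ‘rule of three’ for fragment libraries, and the solubility limit is arbitrary); none of them is taken from the screening programs cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """Hypothetical record holding the properties named in the text."""
    name: str
    mol_weight: float       # molecular weight in Da
    h_bond_donors: int
    h_bond_acceptors: int
    solubility_mm: float    # aqueous solubility in mM


def passes_filter(frag: Fragment,
                  max_weight: float = 300.0,        # assumed cut-off
                  max_donors: int = 3,              # assumed cut-off
                  max_acceptors: int = 3,           # assumed cut-off
                  min_solubility_mm: float = 1.0    # arbitrary, for illustration
                  ) -> bool:
    """Return True if the fragment meets every cut-off."""
    return (frag.mol_weight <= max_weight
            and frag.h_bond_donors <= max_donors
            and frag.h_bond_acceptors <= max_acceptors
            and frag.solubility_mm >= min_solubility_mm)


def select_fragments(library: list[Fragment]) -> list[Fragment]:
    """Keep the fragments that pass the filter, lightest first."""
    return sorted((f for f in library if passes_filter(f)),
                  key=lambda f: f.mol_weight)
```

In practice the cut-offs would be tuned to the target and to the concentrations used in the cocktail experiments; the defaults here only show the shape of such a filter.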

There are a number of drugs for which direct application of structure-based design has been achieved. So far, more than 40 compounds based on structural data have entered clinical trials and at least 7 drugs have reached the market [15]. In this context, the high-resolution structure of the human immunodeficiency virus (HIV) proteinase was the basis for the AIDS drugs Agenerase® and Viracept® [16]. Another structure-based drug is the flu drug Relenza®, which is based on the structure of the influenza virus neuraminidase [17]. Furthermore, the design of the protein kinase drug Gleevec® profited from the 3D structure of the kinase domain of c-Abl, especially in addressing resistance issues [18]. New structure-based drugs will certainly appear as novel high-resolution structures of relevant drug targets become available. For instance, the crystal structure of AKT kinase has triggered structure-based drug discovery on kinase inhibitors [19].

Perhaps the most unusual example of structure-based drug discovery comes from the program on novel phosphodiesterase (PDE) inhibitors to treat hypertension and other cardiovascular indications [8]. Based on the structural information, novel PDE inhibitors were designed, of which the compound UK 92480 (sildenafil) showed a 100-fold increase in PDE5 inhibition compared with zaprinast [20]. Despite the expected pharmacological profile of sildenafil, its clinical performance in treating coronary heart disease was disappointing. However, as sildenafil efficiently inhibited PDE5 and potentiated nitric oxide activity, the drug could be used for the treatment of erectile dysfunction. Although originally aimed at another indication, structure-based drug design thus assisted in generating the globally well-known blockbuster drug Viagra®.

Expression systems

Because only a minority of proteins are available in sufficient abundance to allow direct isolation and purification from native tissues, and because ethical issues to a large extent prevent such isolation, structural biology relies strongly on heterologous expression systems for the production of recombinant proteins. Since the advent of genetic engineering technologies, a variety of expression systems have been evaluated. Among these are heterologous expression in bacterial, yeast, insect and mammalian cells using different types of expression vectors. More recently, cell-free translation systems have become available for general use. Today, the two most frequently used systems are based on expression in Escherichia coli and in Baculovirus-infected insect cells. Generally, either of these two systems is applicable for expression of almost any soluble protein in quantities acceptable for structural studies. However, the situation is quite different for membrane proteins. The yields are substantially lower and the stability of the produced proteins is low, which has forced major development activities for expression vectors and systems (Table 1). The main expression systems are briefly described below. Expression levels are commonly described for GPCRs based on binding activity (Bmax, pmol receptor per milligram protein) or receptor yields (milligrams per liter). For an easier comparison, a Bmax value of 10 pmol/mg corresponds to approximately 0.5 mg/l, 50 pmol/mg to 2 mg/l and 150 pmol/mg to 10 mg/l.
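For comparing published values, the approximate correspondence between Bmax and volumetric yield quoted above can be written as a small lookup. The sketch below linearly interpolates between the three anchor points given in the text. The interpolation, the function name and the refusal to extrapolate are assumptions made for illustration; the true relationship depends on the molecular weight of the receptor and on the amount of protein recovered per liter of culture.

```python
from bisect import bisect_right

# Anchor points quoted in the text: (Bmax in pmol/mg, yield in mg/l).
ANCHORS = [(10.0, 0.5), (50.0, 2.0), (150.0, 10.0)]


def bmax_to_mg_per_l(bmax_pmol_per_mg: float) -> float:
    """Rough yield estimate (mg/l) by linear interpolation between anchors.

    Hypothetical helper: values outside the quoted 10-150 pmol/mg range
    raise ValueError instead of being extrapolated.
    """
    xs = [x for x, _ in ANCHORS]
    if not xs[0] <= bmax_pmol_per_mg <= xs[-1]:
        raise ValueError("Bmax outside the 10-150 pmol/mg range quoted in the text")
    # Index of the segment whose left anchor is <= the query value;
    # the min() keeps the upper end point inside the last segment.
    i = min(bisect_right(xs, bmax_pmol_per_mg) - 1, len(ANCHORS) - 2)
    (x0, y0), (x1, y1) = ANCHORS[i], ANCHORS[i + 1]
    return y0 + (y1 - y0) * (bmax_pmol_per_mg - x0) / (x1 - x0)
```

Such an estimate is only a convenience for placing values from different studies on a common scale and should not replace a measured yield.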

Table 1. Expression vectors and systems and their applications

| System | Targets | Expression levels* | Applications/comments |
|---|---|---|---|
| Cell-free systems | S, MP, GPCR | relatively high | parallel expression, also PCR products |
| E. coli extracts | | mg/ml (S) | isotope labelling for NMR analysis |
| Wheat germ extracts | | | scale-up costly, fusion proteins helpful |
| E. coli | S, MP, GPCR | relatively high in PM; high in IB | structural biology: solubilization (PM); refolding from IBs required |
| H. salinarum | MP, GPCR | high for Bop | fusion proteins often helpful |
| B. subtilis | secreted | high | high protease activity |
| L. lactis | transporters | relatively high | good for mitochondrial transporters |
| Yeast | S, MP, GPCR | relatively high | structural biology |
| S. cerevisiae | S, MP, GPCR | 1–10 pmol/mg | structure of SERCA 1a |
| S. pombe | S, MP, GPCR | 1–20 pmol/mg | improved expression from S. cerevisiae |
| Pichia pastoris | S, MP, GPCR | >100 pmol/mg; 5 mg/l | structure of SoPIP2, Shaker K+ channel |
| Insect cells | S, MP, GPCR | high to very high | structural biology, large-scale |
| Baculovirus | | 200 pmol/mg; 5 mg/l | structures on soluble proteins |
| Drosophila | | 10 pmol/mg | stable expression |
| Mammalian cells | S, MP, GPCR | | |
| Transient | | medium, 20 pmol/mg; <1 mg/l | scale-up more expensive, complicated |
| Stable | | low to high | varies, inducible systems improvement |
| | | 200 pmol/mg | codon-optimized, mutant HEK293 cells |
| Viral | S, MP, GPCR | | |
| Adenovirus | | medium, 10 pmol/mg | broad host range |
| Lentivirus | | high | commercial system |
| SFV | | 287 pmol/mg; 10 mg/l | easy scale-up |
| Vaccinia virus | | high to very high | T7 based vectors |

Bop = bacterio-opsin protein; IB = inclusion bodies; MP = membrane protein; PM = plasma membrane; S = soluble protein.
*Estimations of expression levels of GPCRs are presented as binding activity (Bmax) in pmol receptor per milligram protein or as receptor yields in milligrams per liter. Approximately, 10–20 pmol/mg is equivalent to 0.5–1 mg/l, 40–70 pmol/mg corresponds to 2–5 mg/l and 100–200 pmol/mg represents 7–15 mg/l.

Cell-free translation

The recent development of cell-free translation systems has made them attractive and competitive [21]. Currently, commercial systems based on E. coli and wheat germ extracts exist. Cell-free translation systems are suited to rapid high throughput evaluation of expression, especially since PCR fragments can be used directly as templates, which makes cloning procedures unnecessary. The advantages of cell-free translation are the controlled expression in defined minimal medium and the simple amino acid-selective or uniform stable isotope labelling for direct sample analysis by NMR [22].

Cell-free translation has supported structural biology well for soluble proteins. Recently, a modified E. coli S30 extract has improved the yields for membrane proteins and the bacterial multi-drug transporters TehA and YfiK were expressed at levels of 2.7 mg/ml [23]. Expression of three GPCRs (human β2 adrenergic, human muscarinic acetylcholine M2 and rat neurotensin receptor) resulted in functional binding activity albeit only after applying fusion protein partners [24]. Recently, the first demonstration of cell-free translation–based high-resolution structure determination came from solving the X-ray structure of the EmrE multidrug transporter complex [25].

Bacterial expression

Expression in E. coli

By far the most commonly used expression system is based on E. coli [26]. The fast and simple expression procedure, cheap and safe large-scale production and versatility for different types of proteins are factors that have made E. coli the system of choice. However, although a number of high-resolution structures have been solved for soluble proteins recombinantly expressed in bacteria, membrane proteins have been less successful. Because of their transmembrane topology, membrane proteins have generally proven highly toxic to the bacterial host cells, restricting their growth and thereby reducing the recombinant protein yields. Despite these shortcomings, a number of bacterial membrane proteins have successfully been expressed in E. coli and subjected to purification and structural analysis [27]. Successful expression of eukaryotic membrane proteins has required modifications (deletions, mutagenesis) of the gene sequence as well as the application of appropriate fusion partners and purification tags. In this context, the fusion of the maltose-binding protein (MBP) to the N-terminus of the rat neurotensin receptor generated milligram quantities of purified functionally active receptor [28]. In a similar way, the adenosine A2a receptor with a truncated C-terminus yielded 10–20 nmol/l of receptor when expressed as a fusion protein with MBP [29]. Over-expression of GPCRs in bacterial membranes has recently been reviewed [30].

A completely different approach has been to over-express recombinant proteins in bacterial inclusion bodies as aggregates. The yields have been substantially higher by this procedure, but re-folding is necessary to obtain functionally active recombinant protein. Unfortunately, the refolding process has been difficult and inefficient [31]. However, recent technology development has allowed refolding of glucagon-like peptide 1 (GLP-1) [32], leukotriene B4 [33] and serotonin 5-HT4 [34] receptors.

Other prokaryotic systems

In addition to E. coli, other bacterial expression systems have been evaluated for production of recombinant proteins. For instance, the Gram-positive bacterium Lactococcus lactis has been evaluated for the expression of both prokaryotic and eukaryotic proteins [35]. Specific vectors using the nisin NisA promoter and the NisR and NisK regulatory sequences, as well as transformation methods, have been established for L. lactis [36]. A number of membrane proteins have been tested for expression. The prokaryotic ABC transporters and Major Facilitator Superfamily (MFS) efflux pumps [35] as well as the yeast mitochondrial carrier proteins CTP1 and AAC3 were well expressed [36]. However, expression of the human KDEL receptor resulted in yields of less than 0.1% of the total membrane protein [36]. In a recent study, 11 yeast mitochondrial transporter proteins were expressed in L. lactis at levels compatible with structural biology [37]. Expression levels could be enhanced 10-fold by replacing the N-terminus of the transporter with a signal sequence from L. lactis. Bacillus subtilis, another Gram-positive organism, was already considered a good host for recombinant protein expression in the 1980s, when it generated large quantities of recombinant interferon [38]. However, high endogenous protease activity and a restriction mainly to secreted proteins have limited the use of the system, although recombinant human cystatins have recently been efficiently produced in B. subtilis [39].

Halobacterium salinarum became an interesting alternative as a prokaryotic expression host because of its accumulation of bacterio-opsin protein (Bop), which together with the chromophore retinal gives the bacteria their characteristic purple colour [40]. Various proteins such as the E. coli aspartate transcarbamylase (AT), the yeast α mating factor receptor and two human GPCRs (muscarinic M1 and serotonin 5-HT2 receptors) have been expressed from H. salinarum vectors [41]. It was essential to use Bop as fusion partner. The Bop-AT fusion yielded 7 mg/l protein, whereas the GPCRs were expressed at extremely low levels. A fusion construct of Bop and the human adrenergic α2B receptor resulted in functional binding activity, although with binding values 10 times lower than those obtained in yeast or mammalian cells [42].

Expression in yeast cells

Yeast expression systems are characterized by their ease of use, their suitability for large-scale production and their eukaryotic post-translational machinery. Various types of recombinant proteins have successfully been expressed in yeast cells [43]. Baker's yeast Saccharomyces cerevisiae has been used for heterologous expression of hepatitis B surface antigen (HBsAg) [44], α1-antitrypsin [45], human interferon [46] and β-endorphin [47]. S. cerevisiae has also been frequently used for expression of membrane proteins. For instance, the yeast α-factor Ste2p receptor [48] and the human dopamine D1A receptor [49] were expressed at high levels. Moreover, large-scale production of the human β2 adrenergic receptor generated yields of 20–30 mg of functional receptor [50]. Recently, the rabbit sarcoplasmic-endoplasmic reticulum Ca2+-ATPase isoform 1a (SERCA 1a) was purified from yeast cells by metal affinity chromatography and HPLC filtration and the structure solved at 3.3 Å resolution [51]. In addition to S. cerevisiae, the fission yeast Schizosaccharomyces pombe has also been applied to heterologous gene expression [52]. Comparison of GPCR expression demonstrated that the human dopamine D2 receptor was expressed at fivefold higher levels in S. pombe than in S. cerevisiae [53]. In contrast, the rat dopamine D2 and human NK1 receptors were expressed at lower levels in S. pombe [54].

Currently, the most frequently used yeast for recombinant protein expression is Pichia pastoris [55]. This methylotrophic yeast expression system is based on chromosomal integration of the gene of interest and the use of strong inducible promoters such as the alcohol oxidase (AOX) promoter [56]. One of the advantages of P. pastoris is the high biomass obtained in large-scale cultures, which allows production of g/L quantities of recombinant proteins [57]. Today, more than 200 recombinant proteins have been expressed in P. pastoris, including a number of membrane proteins, particularly GPCRs [58, 59]. In a structural genomics-type study, 100 GPCRs were expressed in P. pastoris, and for a large number of them levels compatible with structural biology (1–10 mg/L) were obtained [60]. The highest expression level, 180 pmol/mg receptor, was measured for the human adenosine A2a receptor. Expression from P. pastoris vectors followed by purification made it possible to obtain a high-resolution structure of the spinach aquaporin SoPIP2 channel [61]. Likewise, single particle images and two-dimensional (2D) crystals for cryo-EM could be obtained for the rat neuronal voltage-sensitive K+ channel over-expressed in P. pastoris [62]. Finally, the mammalian Shaker voltage-dependent K+ channel was expressed in P. pastoris, which allowed efficient purification and high-resolution structure determination [63].

Expression in insect cells

Baculovirus


The second most commonly used expression system after E. coli is based on Baculovirus vectors for infection of insect cell lines [64]. High expression levels of topologically different recombinant proteins have been obtained in insect cell lines from Spodoptera frugiperda (Sf9 and Sf21), Mamestra brassicae and Trichoplusia ni (High Five). Eukaryotic and especially mammalian membrane proteins have been expressed relatively well in insect cells because post-translational processing mechanisms in insect cells are similar to those in mammalian cells. In this context, expression of rhodopsin resulted in 80% functional receptor and yields of up to 6 mg/l [65]. GPCRs have been popular targets for baculovirus expression, and in many cases expression levels of 40–60 pmol/mg have been achieved [66]. Expression levels between 1 pmol/mg and 250 pmol/mg were obtained in a study of 16 GPCRs in three insect cell lines [67].

Other insect vectors

In addition to Baculovirus-based systems, expression of recombinant proteins has been conducted mainly in stable insect cell lines, typically Drosophila Schneider cells [68]. A soluble deglycosylated form of the human interleukin 5 (IL-5) alpha subunit was expressed in Drosophila cells in active form, purified and a 2.6 Å resolution crystal structure was solved [69]. Schneider cells have also been used as hosts for GPCR expression. For instance, the human mu opioid receptor (hMOR) showed a similar pharmacological profile as in mammalian cells and the functional coupling to G proteins was demonstrated by cAMP stimulation and GTPγS binding assays [70]. Engineering of an N-terminal EGFP tag to the hMOR for localization studies suggested that a large number of receptors were retained in intracellular compartments and not present on the plasma membrane.

Expression in mammalian cells

Transient and stable expression

Mammalian cell lines provide the most native environment for expression of recombinant mammalian proteins. However, immortalized cell lines are substantially altered and differ significantly from primary cells. Furthermore, specific proteins may require accessory proteins for transport, folding and proper function. Recently, it was demonstrated that certain transmembrane proteins (RTP1, RTP2 and REEP1) [71] and the guanine nucleotide exchange factor Ric-8B [72] can facilitate the transport and enhance the expression of olfactory receptors. Transient expression has been conducted in various cell lines (BHK-21, CHO-K1, COS-7 and HEK293) [73]. Alternatively, stable cell lines have been generated [74]. Generally, the expression levels have been higher in transient expression, and recently the codon-optimized hamster β2 adrenergic receptor (β2-AR) was expressed in COS-1 cells at 18 pmol/mg [75]. Typically, GPCRs are expressed in the pmol range in stable cell lines [76]. However, the development of a tetracycline-inducible system resulted in up to 6 mg/l of rhodopsin production [77]. When a stable inducible HEK293-β2-AR cell line was established, 220 pmol/mg receptor was obtained [75]. This expression level resulted in up to 50 μg of β2-AR per 15-cm cell culture plate.
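To relate a binding-site density to a mass of receptor, the Bmax value can be multiplied by the molecular weight of the receptor. The sketch below shows this arithmetic. The molecular weight of about 46 kDa used for the β2-AR polypeptide is an assumption that is not given in the text, and the function names are illustrative only.

```python
def receptor_ug_per_mg(bmax_pmol_per_mg: float, mol_weight_da: float) -> float:
    """Mass of receptor (ug) per mg of protein.

    pmol multiplied by g/mol gives pg, and 1 ug = 1e6 pg.
    """
    return bmax_pmol_per_mg * mol_weight_da / 1e6


def protein_mg_for_target(target_ug: float,
                          bmax_pmol_per_mg: float,
                          mol_weight_da: float) -> float:
    """Amount of protein (mg) that contains `target_ug` of receptor."""
    return target_ug / receptor_ug_per_mg(bmax_pmol_per_mg, mol_weight_da)


# Assumed molecular weight of the beta2-AR polypeptide (not from the text).
BETA2_AR_DA = 46_000.0

density = receptor_ug_per_mg(220.0, BETA2_AR_DA)            # 10.12 ug per mg
per_plate = protein_mg_for_target(50.0, 220.0, BETA2_AR_DA)  # about 4.9 mg
```

Under this assumption, 220 pmol/mg corresponds to roughly 10 μg of receptor per milligram of protein, so 50 μg of receptor would be contained in about 5 mg of protein.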

Viral vectors

Because of their broad host range and generally high expression levels, viral vectors have presented attractive alternatives for heterologous gene expression. However, the high transduction rate has also raised some concerns related to biosafety, which has required engineering of mutant replication-deficient vectors. A number of viral vectors such as adenoviruses, alphaviruses, lentiviruses and vaccinia viruses have been applied for recombinant protein expression [78]. In this context, adenovirus-based expression of the non-structural NS1 glycoprotein of tick-borne encephalitis virus (TBEV) resulted in yields of up to 25% of total protein [79]. A number of GPCRs have also been expressed from adenovirus vectors at relatively high levels [80]. Owing to the broad host range of adenoviruses, expression studies could be conducted in various cell lines; for instance, the β2-AR was expressed in rabbit myocytes [81]. Vaccinia virus vectors have also been frequently used for heterologous gene expression, especially in the form of replication-deficient vectors [82]. More than 100 proteins have been expressed from vaccinia virus vectors [83]. Among GPCRs, neuropeptide Y [84] and dopamine D2 and D4 receptors [85] have been expressed at densities of 5–10 million receptors per cell. Lentivirus vectors, characterized by their long-term expression pattern, have found more applications for recombinant protein expression with the commercialization of complete lentivirus expression systems [86]. Concerning membrane proteins, the human retinal pigment epithelium (RPE) retinal GPCR was expressed in COS-7 cells and in the retinal pigment epithelial cell line ARPE-19 [87]. Interestingly, the expression levels were 100 times higher in the ARPE-19 cells and long-term expression was detected for up to 6 months.
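Receptor densities given per cell, such as the 5–10 million receptors per cell quoted above, can be converted to molar amounts with the Avogadro constant. The following sketch performs only that conversion; the function name and the choice of one million cells as the reference quantity are illustrative assumptions. Expressing the result in pmol/mg would additionally require the protein content per cell, which is not given in the text.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def pmol_per_million_cells(receptors_per_cell: float) -> float:
    """Receptor amount in pmol contained in one million cells."""
    molecules = receptors_per_cell * 1e6   # receptors in 1e6 cells
    moles = molecules / AVOGADRO
    return moles * 1e12                    # mol -> pmol


low = pmol_per_million_cells(5e6)    # about 8.3 pmol
high = pmol_per_million_cells(10e6)  # about 16.6 pmol
```

On this basis, 5–10 million receptors per cell correspond to roughly 8–17 pmol of receptor per million cells.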

The viral system probably most frequently used for recombinant protein expression and particularly membrane proteins is based on the Semliki Forest virus (SFV), a single-stranded enveloped RNA virus [88]. The easy and rapid production of recombinant SFV stocks has made it feasible to express a large number of proteins in parallel in various mammalian cell lines [89]. In this context, 103 GPCRs were evaluated for expression levels in three mammalian cell lines in a structural genomics program described in more detail below [90]. As large-scale production in mammalian suspension cultures has been established for SFV, ligand-gated ion channels [91] and GPCRs [92] have been expressed at levels of 5–10 mg/L, purified and subjected to structural biology.

Protein purification

Structural characterization of proteins generally requires access to highly homogeneous and pure protein preparations, although solution NMR approaches have allowed analysis of labelled samples in the presence of a relatively high background of non-labelled proteins. Genetic engineering has strongly facilitated purification, as various affinity tags can be introduced at the N- or C-terminus or even within the coding sequence of the gene of interest. The most commonly used tag is multi-histidine (either hexa- or deca-histidine), which binds to Ni2+ and therefore allows purification based on immobilized metal affinity chromatography (IMAC) [93]. Other common purification tags are streptavidin (Strep), biotin, FLAG and hemagglutinin tags. When available, antigen-based affinity chromatography can also be applied. Other means of purification include ammonium sulphate precipitation and sucrose gradients, although these methods require large quantities of material and might therefore not be suitable for recombinant proteins expressed at low levels or for membrane proteins with low recovery yields. In addition, gel filtration, size exclusion chromatography, hydrophobic interaction and reverse-flow chromatography are methods to be considered for protein purification.

Naturally, membrane proteins require special conditions for purification. Because of their transmembrane topology, proteins and lipids have to be separated by the addition of detergents [94]. Solubilization by detergents is a complicated process, and a vast number of detergents have been tested. In general, detergents are highly target-specific, which means that each target has to be screened for appropriate detergents [95]. Commonly used detergents are CHAPS (3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate), Triton X-100, n-Octylglucoside, n-Nonylglucoside, n-Dodecylmaltoside and FOS-Choline, and cocktails thereof [95].

Structure determination

X-ray crystallography

The highest structure resolution, below 2 Å, can be achieved by X-ray crystallography only. Although a large number of X-ray structures are available today, crystallography still faces some serious challenges. These relate mainly to the efficiency of purification and the stability of the purified protein. Membrane proteins present additional obstacles, as the purification is often inefficient and the presence of detergents may interfere with the crystallization process. However, major development has taken place with the introduction of automation and miniaturization [96]. The reduction of volumes to nanoliter scale has significantly reduced the quantities of material required and, together with high throughput crystallization in 96-well microplates and higher-density formats, has permitted screening of numerous crystallization parameters and conditions in parallel [97]. In this context, variables such as pH, ionic strength, temperature and concentration of salts and detergents can be screened, and up to 100,000 crystallization trials can be conducted per day. The increasing number of parallel experiments also requires improved data collection and handling capacity. A drawback of miniaturized crystal screening might be an increase in the production of smaller crystals, which can be addressed by improved micro-diffractometer technologies.
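The parallel screening of crystallization variables described above amounts to laying out a grid of conditions over multi-well plates. The sketch below is a simplified illustration under stated assumptions: a full-factorial design, a 96-well plate with rows A–H and columns 1–12, and hypothetical function and parameter names. It does not describe any particular crystallization robot or commercial screen.

```python
from itertools import product
from math import ceil

ROWS = "ABCDEFGH"   # assumed 96-well layout: 8 rows x 12 columns
COLUMNS = 12
WELLS_PER_PLATE = len(ROWS) * COLUMNS


def build_screen(ph_values, salt_mm, detergent_pct, temperatures_c):
    """Full-factorial list of conditions, one dict per trial."""
    keys = ("pH", "salt_mM", "detergent_pct", "temperature_C")
    return [dict(zip(keys, combo))
            for combo in product(ph_values, salt_mm, detergent_pct, temperatures_c)]


def well_position(index: int):
    """Map a 0-based trial index to (plate number, well label)."""
    plate, within = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(within, COLUMNS)
    return plate + 1, f"{ROWS[row]}{col + 1}"


def plates_needed(n_trials: int) -> int:
    """Number of 96-well plates required for n_trials conditions."""
    return ceil(n_trials / WELLS_PER_PLATE)


screen = build_screen([5.0, 6.0, 7.0], [100, 200], [0.1, 0.5], [4, 20])
layout = [(well_position(i), cond) for i, cond in enumerate(screen)]
```

With this layout, the 100,000 trials per day mentioned in the text would occupy `plates_needed(100_000)`, that is 1042, 96-well plates, which illustrates why higher-density formats and automated data handling become necessary.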

Nuclear magnetic resonance

NMR is complementary to X-ray crystallography for structure determination [98]. In particular, cell-free translation systems for isotope labelling have been heavily tuned towards NMR applications. NMR has been routinely used for the identification and evaluation of chemical leads [99], and recent technology development for probes, software and NMR instrumentation has made it possible to obtain high-resolution structures and to apply NMR technology to larger proteins. Improved technology has also allowed NMR to be applied iteratively to ligand-protein complexes [100]. Recent development of solid state and solution NMR technologies has further expanded the application range in structural biology, especially for membrane proteins [101].

Electron microscopy

In addition to X-ray crystallography and NMR, electron microscopy (EM) and atomic force microscopy (AFM) can also be used to obtain information at atomic resolution levels for protein structures [102]. Cryoelectron microscopy has successfully been applied for reconstituted membrane proteins in 2D crystals. In this context, a 3.5 Å resolution was achieved for bacteriorhodopsin [103] and aquaporin AQP1 [104]. Despite this rather low resolution it was possible to define the atomic structure, which was subsequently confirmed by X-ray crystallography. Structural characterization by AFM of polypeptide loops on native and reconstituted membranes in aqueous solutions demonstrated that rhodopsin in the disc membranes of vertebrate photoreceptor rod outer segments occurred as dimers or higher oligomeric forms [105].

Structural genomics programs

Continuous technology development in the areas of molecular biology, protein expression and purification, and structure determination requires increased expertise and resources in various areas. To facilitate substantial efforts in applying several expression systems in parallel for numerous targets, it has been advantageous to form large national and international networks (Table 2). This development strongly encouraged structural biology on a large scale and formed the basis for structural genomics. Many of the networks have selected their targets from a specific organism (whole genomes), from topologically similar types of proteins or from protein families. Understandably, quite a few of the networks have focused on the so-called low-hanging fruit, that is, soluble proteins that are relatively easy to express, purify and crystallize. In target selection, a strong emphasis has been placed on disease- and drug-related proteins. For instance, structural genomics on Mycobacterium tuberculosis [106] and Helicobacter pylori [107] aims at developing improved structure-based drugs against these microbes. Another approach has been to study genomes from thermophilic organisms such as Thermotoga maritima [108] and Thermus thermophilus [109], which present the advantage of possessing highly stable and temperature-resistant proteins with good crystallization properties. Moreover, a structural genomics network has been initiated on Caenorhabditis elegans, a worm that has served as a model organism in neurobiology and developmental biology [110]. The EU-funded SPINE (Structural Proteomics in Europe) consortium consists of 20 European partners and has set the goal of determining 500 structures of soluble proteins [111].

Table 2. Overview of selected structural genomics networks
Bop = bacteriopsin protein; IB = inclusion bodies; MP = membrane protein; PM = plasma membrane; S = soluble protein. *Estimations of expression levels of GPCRs are presented as binding activity (Bmax) in pmol receptor per milligram protein or as receptor yields in milligrams per liter. Approximately, 10–20 pmol/mg is equivalent to 0.5–1 mg/l, 40–70 pmol/mg corresponds to 2–5 mg/l and 100–200 pmol/mg represents 7–15 mg/l.

Network | URL | Targets and expression systems
--- | --- | ---
Berkeley Structural Genomics Center (BSGC) | http://www.strgen.org | studies on Mycoplasma genitalium and Mycoplasma pneumoniae genomes; expression in E. coli
Center for Eukaryotic Structural Genomics (CESG) | http://www.uwstructuralgenomics.org | studies on Arabidopsis thaliana transcriptome; expression in E. coli
European Membrane Proteins (E-MeP) | http://www.e-mep.org | studies on 100 prokaryotic, 200 eukaryotic MPs; expression in E. coli, Lactococcus lactis, P. pastoris, Saccharomyces cerevisiae, baculovirus, SFV
Joint Center for Structural Genomics (JCSG) | http://www.jcsg.org | studies on Thermotoga maritima proteome; human GPCRs expression in E. coli, baculovirus, adenovirus, SFV
Membrane Protein Network (MePNet) | http://www.mepnet.org | studies on >100 GPCRs; expression in E. coli, P. pastoris and SFV/mammalian cells
Membrane Protein Platform (MPP) | http://www.swegene.org | bacterial and yeast membrane proteins; human GPCRs; expression in E. coli, S. cerevisiae, P. pastoris; lipid cubic phase crystallography
Midwest Center for Structural Genomics (MCSG) | http://www.mcsg.anl.gov | targets from all three kingdoms of life; expression in E. coli
Northeast Structural Genomics Consortium (NESG) | http://www.nigms.nih.gov/Initiatives/PSI/Centers/NECSG.htm | small proteins: S. cerevisiae, C. elegans and D. melanogaster; expression in E. coli, yeast and insect cells
New York Structural Genomics Research Consortium (NYSGXRC) | http://www.nysgrc.org | bacterial, yeast and C. elegans proteins; expression in E. coli and yeast
Paris-Sud Yeast Structural Genomics (YSG) | http://www.genomics.eu.org | 250 non-membrane yeast proteins; expression in E. coli
Protein Structure Factory (PSF) | http://www.proteinstrukturfabrik.de | medically and biotechnologically valid proteins; expression in E. coli, S. cerevisiae and P. pastoris
Protein Wide Analysis of Membrane Proteins (ProAMP) | http://www.pst-ag.com | Salmonella typhimurium and Helicobacter pylori MPs; expression in E. coli
RIKEN Structural Genomics Initiative (RSGI) | http://www.rsgi.riken.go.jp/rsgi_e/index.html | mouse, Arabidopsis thaliana and Thermus thermophilus proteins; expression in E. coli and cell-free systems
Southeast Collaboratory for Structural Genomics (SECSG) | http://www.secsg.org | Pyrococcus furiosus, C. elegans and human proteins; expression in E. coli, baculovirus and lentivirus
Structural Proteomics in Europe (SPINE) | http://www.spineurope.org | proteins and protein complexes with direct relevance to human health and diseases; expression in E. coli, baculovirus, transient mammalian cells
Structural Genomics Consortium (SGC) | http://www.sgc.ox.ac.uk | targets related to human health: diabetes, cancer, infectious diseases (malaria); expression in E. coli, baculovirus
Structure 2 Function Project (S2FP) | http://www.s2f.carb.nist.gov | structural genomics initiative on Haemophilus influenzae proteins; expression in E. coli
Swiss National Center of Competence in Research (NCCR) | http://www.structuralbiology.ethz.ch | bacterial membrane proteins, transporters and GPCRs; expression in E. coli and baculovirus
TB Structural Genomics Consortium (TBSGC) | http://www.mbi-doe.ucla.edu/TB | structural genomics initiative on Mycobacterium tuberculosis proteins; expression in E. coli

A number of networks also work on membrane proteins. American and Japanese networks have included membrane proteins, especially GPCRs, in their programs (Table 2). However, at least two networks are completely dedicated to structural genomics on membrane proteins. The EU-funded network E-MeP studies 100 prokaryotic and 200 eukaryotic membrane proteins in a consortium consisting of 18 European research groups. Among the eukaryotic targets, 100 are GPCRs and the rest are non-GPCR proteins such as ion channels, transporters, efflux pumps and other integral membrane proteins. Within E-MeP, initial expression studies are carried out in E. coli, L. lactis, S. cerevisiae, P. pastoris, baculovirus-infected insect cells and SFV-infected mammalian cells. A limited number of targets are also expressed in a cell-free translation system. The privately funded Membrane Protein Network (MePNet) concentrates exclusively on GPCRs. Expression systems based on E. coli, P. pastoris and SFV have been used to over-express more than 100 GPCRs [60, 90, 112]. More than 60 GPCRs were expressed at levels compatible with structural biology (1–10 mg/L) in one or several expression systems. Selected GPCRs refolded from E. coli inclusion bodies have been subjected to crystallization attempts. Likewise, GPCRs purified from membranes of yeast and mammalian cells have been introduced into screens to find optimal conditions for crystallization.
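The triage step implied by such parallel screens, keeping every target that reaches a structural biology compatible yield in at least one expression system, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the target names and yield values are invented placeholders rather than MePNet or E-MeP data, the function names are hypothetical, and the only figure taken from the text is the 1 mg/L lower bound of the quoted 1–10 mg/L range.

```python
from typing import Dict, List

# Lower bound of the "structural biology compatible" range quoted in the text (1-10 mg/L).
MIN_YIELD_MG_PER_L = 1.0


def compatible_systems(yields: Dict[str, float],
                       threshold: float = MIN_YIELD_MG_PER_L) -> List[str]:
    """Return the expression systems whose yield meets the threshold, best first."""
    hits = [(system, y) for system, y in yields.items() if y >= threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return [system for system, _ in hits]


def triage(screen: Dict[str, Dict[str, float]],
           threshold: float = MIN_YIELD_MG_PER_L) -> Dict[str, List[str]]:
    """Keep only targets that pass in at least one system, with their passing systems."""
    selected = {}
    for target, yields in screen.items():
        systems = compatible_systems(yields, threshold)
        if systems:
            selected[target] = systems
    return selected


# Hypothetical yields in mg/L; placeholder values for illustration, not measured data.
example_screen = {
    "GPCR_A": {"E. coli": 0.2, "P. pastoris": 3.5, "SFV": 1.2},
    "GPCR_B": {"E. coli": 0.1, "P. pastoris": 0.4, "SFV": 0.3},
    "GPCR_C": {"E. coli": 6.0, "P. pastoris": 0.0, "SFV": 0.8},
}

selected = triage(example_screen)
for target, systems in selected.items():
    print(target, "->", ", ".join(systems))
```

With these placeholder values, GPCR_A and GPCR_C would move on to purification and crystallization screens, whereas GPCR_B would need further optimization or a different expression host. Ranking the passing systems by yield is a design choice made for the sketch; in practice, protein quality and functional activity would weigh as heavily as quantity.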

Conclusions and future prospects

The drop in success rate in drug development programs has generated considerable concern worldwide. It is not only of economic interest but also, naturally, of medical importance that the pharmaceutical industry should be able to deliver more efficient and safer drugs. Structure-based drug discovery and design presents an interesting approach to optimizing drug efficacy and selectivity and thereby reducing serious side effects. This approach has already proven its feasibility, as demonstrated for flu [17] and HIV [16] drugs. Structure determination has also become a routine tool in lead discovery and optimization [7]. Current initiatives in structural genomics will certainly have a major impact on the number of new drug target structures that will become available for modelling and drug design purposes. The established networks in the field, which generally have a strong orientation towards technology development, should also be able to improve the success rate for structure determination of membrane proteins. As membrane proteins represent more than 70% of current drug targets, it is very likely that structural biology will strongly influence the future of drug discovery.