- Background to drug discovery
- Structure determination and drug discovery
- Expression systems
  - Cell-free translation
  - Bacterial expression
    - Expression in E. coli
    - Other prokaryotic systems
  - Expression in yeast cells
  - Expression in insect cells
    - Baculovirus
    - Other insect vectors
  - Expression in mammalian cells
    - Transient and stable expression
    - Viral vectors
- Protein purification
- Structure determination
  - X-ray crystallography
  - Nuclear magnetic resonance
  - Electron microscopy
  - Structural genomics programs
- Conclusions and future prospects
Structure determination has already proven useful for lead optimization and direct drug design. The more than 30,000 high-resolution structures now available in public databases should greatly aid structure-based drug design. Structural genomics approaches covering whole genomes, topologically similar proteins or gene families are valuable assets for the further development of new drugs. However, membrane proteins, which represent 70% of current drug targets, are poorly characterized structurally. The problems relate to difficulties in obtaining large amounts of recombinant membrane protein, as well as to their purification and structure determination. By studying large numbers of target proteins in parallel, structural genomics has proven successful in developing new methods in areas ranging from expression to structure determination.
Background to drug discovery
Drug discovery has relied to a large extent on medicinal chemistry, which has evolved significantly during the past two decades. Although screening of large numbers of compounds and very large combinatorial libraries has generated a number of effective drugs, increasing interest has been devoted to structure-based drug design. The advantage of this approach is that drug molecules can be ‘tailor-made’ to interact with the drug target, which should improve both drug efficacy and specificity. Improved efficacy might allow the administration of lower doses and/or less frequent dosing. Drugs with enhanced target specificity should interact less with non-target molecules and hence substantially reduce side effects. Furthermore, the drug development process might be shortened.
To apply structure-based drug design, structural information on the drug target is required. More than 30,000 high-resolution structures have been deposited in public databases; however, the majority of these are of soluble proteins. In contrast, some 70% of current drugs target membrane proteins, for which only slightly more than 100 structures are available. This discrepancy relates mainly to the topology of membrane proteins: many of them, such as G protein-coupled receptors (GPCRs) and ion channels, contain multiple transmembrane-spanning domains. For this reason, it has been much more difficult to express membrane proteins recombinantly in sufficient quantity and quality than their soluble counterparts. Moreover, because membrane proteins are embedded in lipid bilayers, detergents are required for their purification, which affects both the yield and the stability of the protein. Structure determination by X-ray crystallography has also been hampered by low yields and the presence of detergents. Membrane proteins also characteristically possess flexible regions and short hydrophilic loops, which significantly reduce the number of potential crystal contacts and thereby the chance of successful crystallization. The application of nuclear magnetic resonance (NMR) to structure determination is likewise more complicated for membrane proteins. For example, only a single high-resolution structure has been solved for the GPCR family, which comprises some 800 members. Moreover, this structure, of bovine rhodopsin, was obtained from material isolated from native tissue, where the receptor occurs at high abundance.
The structural genomics approach allows a large number of gene products to be studied in parallel. By targeting not only whole genomes but also specific protein types or families, structural genomics can improve our understanding of the requirements for high-level expression, purification and structure determination, and thereby significantly increase the success rate for membrane proteins.
In this review, although a general overview of structural genomics is presented, the focus is on membrane proteins. The current status of structure-based drug discovery is briefly described. Much attention is given to methodological development for structural biology. An overview of currently applied expression systems is given, and various purification and structure determination procedures are also described. Finally, examples of structural genomics consortia are discussed.
Structure determination and drug discovery
Structural biology has facilitated drug discovery for some time. Structure-based drug design has become relatively common in lead optimization. Structural knowledge of proteins and their ligands has helped improve drug potency and selectivity. This approach has allowed faster definition of drug-binding properties and has made it easier to identify ‘hit’ compounds in screening programs. Both X-ray crystallography and NMR have enabled high-throughput approaches to structure-based lead discovery. Structures of protein–ligand complexes have been solved rapidly with automated procedures such as AutoSolve®, which showed that, for cocktails of 100 molecules, variations in electron density could be used to distinguish the different shapes of the compounds. When cocktails of smaller fragments are applied at very high concentrations, candidate fragments can be ranked automatically and up to 1000 compounds can be screened within 2–3 days. In this context, fragments of successful drug-like molecules can be characterized in a high-throughput format that takes into account molecular weight, the number of hydrogen-bond donors and acceptors, and solubility. The technology has been extended to virtual screening by systematically docking large libraries of candidate fragments into pre-defined binding sites of three-dimensional (3D) computer models of target proteins. Furthermore, an NMR-based screening approach was used to design small-molecule drug candidates that inhibit the aberrant over-expression of c-myc in various tumours by targeting the far upstream element (FUSE) binding protein (FBP).
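The property-based triage of fragments described above can be illustrated with a small filter. This is a minimal sketch, not the method used by any of the cited programs: the molecular weight, donor and acceptor limits are assumed to follow the widely used fragment "rule of three", the solubility cut-off is a hypothetical value chosen to reflect high-concentration soaking, and the `Fragment` records and their descriptor values are invented for illustration. In practice the descriptors would be computed with a cheminformatics toolkit.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    name: str
    mol_weight: float     # Da
    h_donors: int         # hydrogen-bond donors
    h_acceptors: int      # hydrogen-bond acceptors
    solubility_mm: float  # aqueous solubility in mM (hypothetical descriptor)


# Assumed thresholds: MW/donor/acceptor limits follow the "rule of three";
# the solubility cut-off is a hypothetical value, not taken from the review.
MAX_MW = 300.0
MAX_DONORS = 3
MAX_ACCEPTORS = 3
MIN_SOLUBILITY_MM = 1.0


def passes_filter(f: Fragment) -> bool:
    """Return True if the fragment meets every property threshold."""
    return (
        f.mol_weight <= MAX_MW
        and f.h_donors <= MAX_DONORS
        and f.h_acceptors <= MAX_ACCEPTORS
        and f.solubility_mm >= MIN_SOLUBILITY_MM
    )


def filter_fragments(fragments: List[Fragment]) -> List[Fragment]:
    """Keep fragments that pass the filter, sorted by molecular weight."""
    return sorted((f for f in fragments if passes_filter(f)), key=lambda f: f.mol_weight)


library = [
    Fragment("frag_A", 180.2, 1, 2, 5.0),
    Fragment("frag_B", 342.4, 2, 4, 2.0),   # too heavy, too many acceptors
    Fragment("frag_C", 151.1, 2, 2, 0.3),   # poorly soluble
    Fragment("frag_D", 126.1, 1, 1, 10.0),
]

print([f.name for f in filter_fragments(library)])  # ['frag_D', 'frag_A']
```

The filter only narrows the library before soaking or docking; ranking of actual hits would come from the crystallographic or docking results themselves.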
Structure-based design has been applied directly to a number of drugs. So far, more than 40 compounds based on structural data have entered clinical trials, and at least 7 drugs have reached the market. For instance, the high-resolution structure of the human immunodeficiency virus (HIV) proteinase was the basis for the AIDS drugs Agenerase® and Viracept®. Another structure-based drug is the influenza drug Relenza®, which is based on the structure of the influenza virus neuraminidase. Furthermore, the design of the protein kinase inhibitor Gleevec® benefited from the 3D structure of the kinase domain of c-Abl, especially in addressing resistance issues. New structure-based drugs will certainly appear as high-resolution structures of further relevant drug targets become available. For instance, the crystal structure of AKT kinase has triggered structure-based discovery of kinase inhibitors.
Perhaps the most peculiar example of structure-based drug discovery comes from a program on novel phosphodiesterase (PDE) inhibitors to treat hypertension and other cardiovascular indications. Based on the structural information, novel PDE inhibitors were designed, of which the compound UK 92480 (sildenafil) showed a 100-fold increase in PDE5 inhibition compared with zaprinast. Despite its promising pharmacological profile, sildenafil performed disappointingly in the clinic for coronary heart disease. However, because sildenafil efficiently inhibited PDE5 and potentiated nitric oxide activity, it could be used to treat erectile dysfunction. Thus, although the program originally targeted another indication, structure-based drug design assisted in generating the globally known blockbuster drug Viagra®.
Expression systems
Because only a minority of proteins are abundant enough to allow direct isolation and purification from native tissues, and ethical issues in any case largely prevent such an approach, structural biology relies strongly on heterologous expression systems for the production of recombinant proteins. Since the advent of genetic engineering technologies, a number of expression systems have been evaluated, including heterologous expression in bacterial, yeast, insect and mammalian cells with different types of expression vectors. More recently, cell-free translation systems have become available for general use. Today, the two most frequently used systems are based on expression in Escherichia coli and in Baculovirus-infected insect cells. Generally, either of these two systems can express almost any soluble protein in quantities acceptable for structural studies. For membrane proteins, however, the situation is quite different. Yields are substantially lower and the produced proteins are less stable, which has driven major development of expression vectors and systems (Table 1). The main expression systems are briefly described below. For GPCRs, expression levels are commonly reported either as binding activity (Bmax, pmol receptor per milligram protein) or as receptor yield (milligrams per liter). For easier comparison, a Bmax value of 10 pmol/mg corresponds to approximately 0.5 mg/l, 50 pmol/mg to 2 mg/l and 150 pmol/mg to 10 mg/l.
| System | Proteins | Expression level | Comments |
|---|---|---|---|
| Cell-free systems | S, MP, GPCR | relatively high | parallel expression, also from PCR products |
| E. coli extracts | | mg/ml (S) | isotope labelling for NMR analysis |
| Wheat germ extracts | | | scale-up costly, fusion proteins helpful |
| E. coli | S, MP, GPCR | relatively high in PM, high in IB | structural biology: solubilization (PM), refolding from IBs required |
| H. salinarum | MP, GPCR | high for Bop | fusion proteins often helpful |
| B. subtilis | secreted | high | high protease activity |
| L. lactis | transporters | relatively high | good for mitochondrial transporters |
| Yeast | S, MP, GPCR | relatively high | structural biology |
| S. cerevisiae | S, MP, GPCR | 1–10 pmol/mg | structure of SERCA 1a |
| S. pombe | S, MP, GPCR | 1–20 pmol/mg | improved expression compared with S. cerevisiae |
| Pichia pastoris | S, MP, GPCR | >100 pmol/mg; 5 mg/l | structures of SoPIP2 and Shaker K+ channel |
| Insect cells | S, MP, GPCR | high to very high | structural biology, large-scale |
| Baculovirus | | 200 pmol/mg; 5 mg/l | structures of soluble proteins |
| Drosophila | | 10 pmol/mg | stable expression |
| Mammalian cells | S, MP, GPCR | | |
| Transient | | medium, 20 pmol/mg; <1 mg/l | scale-up more expensive, complicated |
| Stable | | low to high | varies; inducible systems an improvement |
| | | 200 pmol/mg | codon-optimized, mutant HEK293 cells |
| Viral | S, MP, GPCR | | |
| Adenovirus | | medium, 10 pmol/mg | broad host range |
| SFV | | 287 pmol/mg; 10 mg/l | easy scale-up |
| Vaccinia virus | | high to very high | T7-based vectors |

S, soluble proteins; MP, membrane proteins; IB, inclusion bodies; Bop, bacterio-opsin; SFV, Semliki Forest virus.
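The Bmax-to-yield correspondences quoted before Table 1 depend on two quantities the text does not state: the molecular mass of the receptor and the amount of total protein recovered per liter of culture. The sketch below makes both explicit as hypothetical parameters (a 50 kDa receptor and 1 g of total membrane protein per liter). With these assumed values it reproduces the 10 pmol/mg ≈ 0.5 mg/l example and gives about 2.5 and 7.5 mg/l for 50 and 150 pmol/mg, compared with the 2 and 10 mg/l quoted, so the correspondences are best read as rough rules of thumb.

```python
def bmax_to_mg_per_liter(bmax_pmol_per_mg: float,
                         receptor_mass_da: float = 50_000.0,
                         total_protein_mg_per_liter: float = 1_000.0) -> float:
    """Convert a binding density (pmol receptor per mg protein) into mg receptor per liter.

    Both defaults are hypothetical: a 50 kDa receptor and 1 g of total
    (membrane) protein recovered per liter of culture.
    """
    # pmol/mg x g/mol = 1e-12 g per mg = 1e-9 mg receptor per mg total protein
    mg_receptor_per_mg_protein = bmax_pmol_per_mg * receptor_mass_da * 1e-9
    return mg_receptor_per_mg_protein * total_protein_mg_per_liter


for bmax in (10, 50, 150):
    print(f"{bmax:>4} pmol/mg -> {bmax_to_mg_per_liter(bmax):.1f} mg/l")
# 10 -> 0.5, 50 -> 2.5, 150 -> 7.5 mg/l under the assumed defaults
```

Changing either assumption scales the result linearly; for example, a 100 kDa receptor doubles the yield in mg/l for the same Bmax.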
Cell-free translation
Recent developments have made cell-free translation systems attractive and competitive. Commercial systems based on E. coli and wheat germ extracts are currently available. Cell-free translation is well suited to rapid high-throughput evaluation of expression, especially since PCR fragments can be used directly, which avoids cloning procedures. Its advantages include controlled expression in a defined minimal medium and simple amino-acid-selective and uniform stable isotope labelling for direct sample analysis by NMR.
Cell-free translation has served structural biology well for soluble proteins. A modified E. coli S30 extract has recently improved yields of membrane proteins, and the bacterial multidrug transporters TehA and YfiK were expressed at levels of 2.7 mg/ml. Three GPCRs (the human β2 adrenergic, human muscarinic acetylcholine M2 and rat neurotensin receptors) were expressed with functional binding activity, albeit only when fused to protein partners. The first high-resolution structure determined from cell-free-translated material was the X-ray structure of the EmrE multidrug transporter complex.
Bacterial expression
Expression in E. coli
E. coli is by far the most commonly used expression system. Fast and simple expression, cheap and safe large-scale production and versatility for different types of proteins have made E. coli the system of choice. However, although a number of high-resolution structures have been solved for soluble proteins recombinantly expressed in bacteria, membrane proteins have been less successful. Because of their transmembrane topology, membrane proteins have generally proven highly toxic to the bacterial host cells, restricting cell growth and thereby reducing recombinant protein yields. Despite these shortcomings, a number of bacterial membrane proteins have been successfully expressed in E. coli and subjected to purification and structural analysis. Successful expression of eukaryotic membrane proteins has required modifications of the gene sequence (deletions, mutagenesis) as well as appropriate fusion partners and purification tags. For example, fusion of the maltose-binding protein (MBP) to the N-terminus of the rat neurotensin receptor generated milligram quantities of purified, functionally active receptor. Similarly, a C-terminally truncated adenosine A2a receptor expressed as an MBP fusion protein yielded 10–20 nmol/l of receptor. Over-expression of GPCRs in bacterial membranes has recently been reviewed.
A completely different approach has been to over-express recombinant proteins as aggregates in bacterial inclusion bodies. Yields are substantially higher with this procedure, but refolding is necessary to obtain functionally active protein, and the refolding process has been difficult and inefficient. However, recent technological developments have allowed refolding of glucagon-like peptide 1 (GLP-1), leukotriene B4 and serotonin 5-HT4 receptors.
Other prokaryotic systems
In addition to E. coli, other bacterial expression systems have been evaluated for the production of recombinant proteins. For instance, the Gram-positive bacterium Lactococcus lactis has been validated for the expression of both prokaryotic and eukaryotic proteins. Specific vectors using the nisin-inducible NisA promoter and the NisR and NisK regulatory elements, as well as transformation methods, have been established for L. lactis. A number of membrane proteins have been expressed: prokaryotic ABC transporters and Major Facilitator Superfamily (MFS) efflux pumps, as well as the yeast mitochondrial carrier proteins CTP1 and AAC3, were well expressed. However, expression of the human KDEL receptor resulted in yields of less than 0.1% of total membrane protein. In a recent study, 11 yeast mitochondrial transporter proteins were expressed in L. lactis at levels compatible with structural biology, and expression could be enhanced 10-fold by replacing the N-terminus of the transporter with an L. lactis signal sequence. Bacillus subtilis, another Gram-positive organism, was considered a good host for recombinant protein expression as early as the 1980s, generating large quantities of recombinant interferon. However, high endogenous protease activity and restriction mainly to secreted proteins have limited the use of this system, although human cystatins have recently been produced efficiently in B. subtilis.
Halobacterium salinarum became an interesting alternative prokaryotic expression host because of its accumulation of bacterio-opsin (Bop), which together with the chromophore retinal gives the bacteria their characteristic purple colour. Various proteins, including E. coli aspartate transcarbamylase (AT) and the membrane proteins yeast α mating factor receptor and two human GPCRs (muscarinic M1 and serotonin 5-HT2 receptors), have been expressed from H. salinarum vectors. The use of Bop as a fusion partner was essential. The Bop-AT fusion yielded 7 mg/l protein, whereas the GPCRs were expressed at extremely low levels. A fusion construct of Bop and the human adrenergic α2B receptor showed functional binding activity, although binding values were 10 times lower than those obtained in yeast or mammalian cells.
Expression in yeast cells
Yeast expression systems are characterized by ease of use, suitability for large-scale production and a eukaryotic post-translational machinery. Various types of recombinant proteins have been successfully expressed in yeast cells. Baker's yeast, Saccharomyces cerevisiae, has been used for heterologous expression of hepatitis B surface antigen (HBsAg), α1-antitrypsin, human interferon and β-endorphin. S. cerevisiae has also frequently been used to express membrane proteins; for instance, the yeast α-factor receptor Ste2p and the human dopamine D1A receptor were expressed at high levels. Moreover, large-scale production of the human β2 adrenergic receptor generated 20–30 mg of functional receptor. Recently, the rabbit sarcoplasmic-endoplasmic reticulum Ca2+-ATPase isoform 1a (SERCA 1a) was purified from yeast cells by metal affinity chromatography and HPLC filtration, and its structure was solved at 3.3 Å resolution. In addition to S. cerevisiae, the fission yeast Schizosaccharomyces pombe has been applied to heterologous gene expression. A comparison of GPCR expression showed that the human dopamine D2 receptor was expressed at fivefold higher levels in S. pombe than in S. cerevisiae, whereas the rat dopamine D2 and human NK1 receptors were expressed at lower levels in S. pombe.
Currently, the most frequently used yeast for recombinant protein expression is Pichia pastoris. This methylotrophic yeast expression system is based on chromosomal integration of the gene of interest and on strong inducible promoters such as the alcohol oxidase (AOX) promoter. One advantage of P. pastoris is the high biomass obtained in large-scale cultures, allowing production of gram-per-liter quantities of recombinant protein. More than 200 recombinant proteins have been expressed in P. pastoris to date, including a number of membrane proteins, particularly GPCRs [58, 59]. In a structural genomics-type study, 100 GPCRs were expressed in P. pastoris, a large number of them at levels compatible with structural biology (1–10 mg/L). The highest expression level, 180 pmol/mg receptor, was measured for the human adenosine A2a receptor. Expression from P. pastoris vectors and subsequent purification yielded a high-resolution structure of the spinach aquaporin SoPIP2 channel. Likewise, single-particle images and cryo-EM of two-dimensional (2D) crystals were obtained for the rat neuronal voltage-sensitive K+ channel over-expressed in P. pastoris. Finally, the mammalian Shaker voltage-dependent K+ channel was expressed in P. pastoris, which allowed efficient purification and high-resolution structure determination.
Expression in insect cells
Baculovirus
The second most commonly used expression system after E. coli is based on Baculovirus vectors for infection of insect cell lines. High expression levels of topologically different recombinant proteins have been obtained in insect cell lines from Spodoptera frugiperda (Sf9 and Sf21), Mamestra brassicae and Trichoplusia ni (High Five). Eukaryotic, and especially mammalian, membrane proteins have been expressed relatively favourably in insect cells because of their similar post-translational processing mechanisms. For example, expression of rhodopsin resulted in 80% functional receptor and yields of up to 6 mg/l. GPCRs have been popular targets for baculovirus expression, in many cases reaching expression levels of 40–60 pmol/mg. In a study of 16 GPCRs in three insect cell lines, expression levels between 1 and 250 pmol/mg were obtained.
Other insect vectors
In addition to Baculovirus-based systems, recombinant proteins have mainly been expressed in stable insect cell lines, typically Drosophila Schneider cells. A soluble deglycosylated form of the human interleukin 5 (IL-5) alpha subunit was expressed in active form in Drosophila cells and purified, and a 2.6 Å resolution crystal structure was solved. Schneider cells have also been used as hosts for GPCR expression. For instance, the human mu opioid receptor (hMOR) showed a pharmacological profile similar to that in mammalian cells, and functional coupling to G proteins was demonstrated by cAMP stimulation and GTPγS binding assays. Engineering an N-terminal EGFP tag onto hMOR for localization studies suggested that a large proportion of the receptors was retained in intracellular compartments rather than present on the plasma membrane.
Expression in mammalian cells
Transient and stable expression
Mammalian cell lines provide the most native environment for the expression of recombinant mammalian proteins. However, immortalized cell lines are substantially altered and differ significantly from primary cells. Furthermore, specific proteins may require accessory proteins for transport, folding and proper function. For example, certain transmembrane proteins (RTP1, RTP2 and REEP1) and the guanine nucleotide exchange factor Ric-8B were recently shown to facilitate the transport and enhance the expression of olfactory receptors. Transient expression has been conducted in various cell lines (BHK-21, CHO-K1, COS-7 and HEK293); alternatively, stable cell lines have been generated. Expression levels have generally been higher with transient expression, and a codon-optimized hamster β2 adrenergic receptor (β2-AR) was recently expressed in COS-1 cells at 18 pmol/mg. In stable cell lines, GPCRs are typically expressed in the pmol/mg range. However, a tetracycline-inducible system yielded up to 6 mg/l of rhodopsin, and a stable inducible HEK293 β2-AR cell line produced 220 pmol/mg receptor, corresponding to up to 50 μg of β2-AR per 15-cm cell culture plate.
Viral vectors
Because of their broad host range and generally high expression levels, viral vectors are attractive alternatives for heterologous gene expression. However, their high transduction efficiency has also raised biosafety concerns, which has required the engineering of replication-deficient vectors. A number of viral vectors, such as adenoviruses, alphaviruses, lentiviruses and vaccinia viruses, have been applied to recombinant protein expression. For instance, adenovirus-based expression of the non-structural NS1 glycoprotein of tick-borne encephalitis virus (TBEV) resulted in yields of up to 25% of total protein. A number of GPCRs have also been expressed from adenovirus vectors at relatively high levels. Owing to the broad host range of adenoviruses, expression studies could be conducted in various cell types; for instance, the β2-AR was expressed in rabbit myocytes. Vaccinia virus vectors, especially replication-deficient ones, have also been frequently used for heterologous gene expression, and more than 100 proteins have been expressed from them. Among GPCRs, the neuropeptide Y and dopamine D2 and D4 receptors have been expressed at densities of 5–10 million receptors per cell. Lentivirus vectors, known for long-term expression, have found more applications in recombinant protein expression since complete lentivirus expression systems were commercialized. Among membrane proteins, the human retinal pigment epithelium (RPE) retinal GPCR was expressed in COS-7 cells and in the retinal pigment epithelial cell line ARPE-19. Interestingly, expression levels were 100 times higher in ARPE-19 cells, and long-term expression was detected for up to 6 months.
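Receptor densities for the vaccinia-expressed GPCRs are given per cell, whereas Table 1 uses pmol/mg. Converting one into the other requires the amount of protein per cell, which the text does not give. The sketch below assumes a hypothetical 0.2 ng of protein per cell; because Bmax is usually measured on membrane preparations rather than whole-cell protein, the result is only an order-of-magnitude comparison.

```python
AVOGADRO = 6.02214076e23  # molecules per mol (exact SI value)


def receptors_per_cell_to_pmol_per_mg(receptors_per_cell: float,
                                      protein_mg_per_cell: float = 2e-7) -> float:
    """Express a receptor density per cell as pmol receptor per mg cellular protein.

    protein_mg_per_cell is a hypothetical default (0.2 ng protein per cell);
    real values differ between cell lines and between whole-cell and membrane preparations.
    """
    pmol_per_cell = receptors_per_cell / AVOGADRO * 1e12  # mol -> pmol
    return pmol_per_cell / protein_mg_per_cell


for n in (5e6, 10e6):
    print(f"{n:.0e} receptors/cell -> ~{receptors_per_cell_to_pmol_per_mg(n):.0f} pmol/mg")
# roughly 42 and 83 pmol/mg under the assumed protein content
```

Under this assumption, 5–10 million receptors per cell falls in the same tens-of-pmol/mg range as several entries in Table 1.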
The viral system probably most frequently used for recombinant protein expression, particularly of membrane proteins, is based on Semliki Forest virus (SFV), an enveloped single-stranded RNA virus. The easy and rapid production of recombinant SFV stocks has made it feasible to express large numbers of proteins in parallel in various mammalian cell lines. In this context, 103 GPCRs were evaluated for expression in three mammalian cell lines in a structural genomics program described in more detail below. Because large-scale production in mammalian suspension cultures has been established for SFV, ligand-gated ion channels and GPCRs have been expressed at levels of 5–10 mg/L, purified and subjected to structural biology.
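Parallel screens such as the evaluation of 103 GPCRs in three cell lines produce a target-by-host matrix of expression levels, from which the best host for each target is chosen. The sketch below shows one simple way to triage such a matrix. The receptor names and Bmax values are hypothetical, the cell-line labels are borrowed from the lines mentioned earlier purely as placeholders, and the scale-up cut-off of 20 pmol/mg is an assumed value, not one from the review.

```python
from typing import Dict, List, Tuple

# Hypothetical screening results: Bmax (pmol/mg) per receptor and host cell line.
screen: Dict[str, Dict[str, float]] = {
    "receptor_1": {"BHK-21": 35.0, "CHO-K1": 12.0, "HEK293": 48.0},
    "receptor_2": {"BHK-21": 4.0, "CHO-K1": 2.5, "HEK293": 3.1},
    "receptor_3": {"BHK-21": 110.0, "CHO-K1": 60.0, "HEK293": 95.0},
}

# Assumed cut-off for "worth scaling up"; not a value from the review.
SCALE_UP_THRESHOLD = 20.0


def best_host(levels: Dict[str, float]) -> Tuple[str, float]:
    """Return the (host, Bmax) pair with the highest expression level."""
    host = max(levels, key=levels.get)
    return host, levels[host]


def triage(results: Dict[str, Dict[str, float]]) -> Tuple[Dict[str, Tuple[str, float]], List[str]]:
    """Split targets into those selected for scale-up (with their best host) and those dropped."""
    selected: Dict[str, Tuple[str, float]] = {}
    dropped: List[str] = []
    for target, levels in sorted(results.items()):
        host, bmax = best_host(levels)
        if bmax >= SCALE_UP_THRESHOLD:
            selected[target] = (host, bmax)
        else:
            dropped.append(target)
    return selected, dropped


selected, dropped = triage(screen)
print(selected)  # receptor_1 -> HEK293, receptor_3 -> BHK-21
print(dropped)   # ['receptor_2']
```

A real program would likely also weigh functional activity, yield in mg/l and purification behaviour rather than a single Bmax value.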
Protein purification
Structural characterization of proteins generally requires highly homogeneous and pure protein preparations, although solution NMR approaches have allowed analysis of labelled samples against a relatively high background of non-labelled proteins. Genetic engineering has greatly facilitated purification, as various affinity tags can be introduced at the N- or C-terminus, or even within the coding sequence, of the gene of interest. The most commonly used tag is a stretch of histidines (hexa- or deca-histidine), which binds Ni2+ and therefore allows purification by immobilized metal affinity chromatography (IMAC). Other common purification tags are the Strep-tag (which binds streptavidin), biotin, FLAG and hemagglutinin tags. When a suitable antibody is available, immunoaffinity chromatography can also be applied. Other purification methods include ammonium sulphate precipitation and sucrose gradients, although these require large quantities of material and might therefore be unsuitable for recombinant proteins expressed at low levels or for membrane proteins with low recovery yields. In addition, gel filtration (size exclusion chromatography), hydrophobic interaction chromatography and reversed-phase chromatography should be considered for protein purification.
Membrane proteins naturally require special purification conditions. Because of their transmembrane topology, detergents must be added to separate the proteins from the lipids. Solubilization by detergents is a complicated process, and a vast number of detergents have been tested. The choice of detergent is generally highly target-specific, which means that each target has to be screened for appropriate detergents. Commonly used detergents include CHAPS (3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate), Triton X-100, n-octylglucoside, n-nonylglucoside, n-dodecylmaltoside and FOS-Choline, as well as mixtures thereof.
Structure determination
X-ray crystallography
Structural resolutions below 2 Å can be achieved only by X-ray crystallography. Although a large number of X-ray structures are available today, crystallography still faces serious challenges, which relate mainly to the efficiency of purification and the stability of the purified protein. Membrane proteins present additional obstacles, as their purification is often inefficient and the presence of detergents may interfere with crystallization. However, major progress has come from automation and miniaturization. Reducing volumes to the nanoliter scale has significantly reduced the amount of material required and, together with high-throughput crystallization in 96-well and higher-density microplates, has made it possible to screen numerous crystallization parameters and conditions in parallel. Variables such as pH, ionic strength, temperature and the concentrations of salts and detergents can be screened, and up to 100,000 crystallization trials can be conducted per day. The increasing number of parallel experiments also requires improved data collection and handling capacity. A drawback of miniaturized crystal screening might be an increase in the production of smaller crystals, which can be addressed by improved micro-diffractometer technologies.
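Screening several variables in parallel amounts to laying out a factorial grid of conditions across a microplate. The sketch below maps a hypothetical full-factorial screen (4 pH values × 3 NaCl concentrations × 8 detergent concentrations = 96 conditions) onto the wells of a standard 96-well plate (rows A–H, columns 1–12). All concentration values are invented for illustration; temperature, also mentioned above, could be varied by incubating replicate plates at different temperatures.

```python
from itertools import product
from string import ascii_uppercase

# Hypothetical screen: 4 pH values x 3 salt concentrations x 8 detergent concentrations = 96.
PH_VALUES = [5.5, 6.5, 7.5, 8.5]
NACL_MM = [50, 150, 300]
DETERGENT_PCT = [0.01, 0.02, 0.03, 0.05, 0.07, 0.1, 0.15, 0.2]  # % (w/v)

ROWS, COLS = 8, 12  # standard 96-well layout: A-H x 1-12


def well_names(rows: int = ROWS, cols: int = COLS):
    """Yield well labels row by row: A1, A2, ..., H12."""
    for r in ascii_uppercase[:rows]:
        for c in range(1, cols + 1):
            yield f"{r}{c}"


def plate_layout():
    """Assign each (pH, NaCl mM, detergent %) combination to one well."""
    conditions = list(product(PH_VALUES, NACL_MM, DETERGENT_PCT))
    wells = list(well_names())
    if len(conditions) != len(wells):
        raise ValueError(f"{len(conditions)} conditions do not fill {len(wells)} wells")
    return dict(zip(wells, conditions))


layout = plate_layout()
print(layout["A1"], layout["H12"])  # (5.5, 50, 0.01) (8.5, 300, 0.2)
```

Larger screens would simply span several plates or a higher-density format, which is where nanoliter dispensing and automated imaging become essential.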
Nuclear magnetic resonance
Complementary to X-ray crystallography, NMR can be used for structure determination. In particular, cell-free translation systems for isotope labelling have been extensively optimized for NMR applications. NMR has been routinely used to identify and evaluate chemical leads, and recent developments in probes, software and the NMR instruments themselves have made it possible to obtain high-resolution structures and to apply NMR to larger proteins. Improved technology has also allowed NMR to be applied iteratively to ligand–protein complexes. Recent developments in solid-state and solution NMR have further expanded its range of applications in structural biology, especially for membrane proteins.
Electron microscopy
In addition to X-ray crystallography and NMR, electron microscopy (EM) and atomic force microscopy (AFM) can also provide structural information on proteins approaching atomic resolution. Cryo-electron microscopy has been successfully applied to membrane proteins reconstituted in 2D crystals; for example, 3.5 Å resolution was achieved for bacteriorhodopsin and aquaporin AQP1. Despite this relatively low resolution, it was possible to define the atomic structures, which were subsequently confirmed by X-ray crystallography. AFM characterization of polypeptide loops in native and reconstituted membranes in aqueous solution demonstrated that rhodopsin in the disc membranes of vertebrate photoreceptor rod outer segments occurs as dimers or higher oligomers.
Structural genomics programs
Continuous technology development in molecular biology, protein expression and purification, and structure determination requires increasing expertise and resources in many areas. To support the substantial effort of applying several expression systems in parallel to numerous targets, it has been advantageous to form large national and international networks (Table 2). This development strongly encouraged large-scale structural biology and formed the basis for structural genomics. Many of the networks have selected their targets from a specific organism (whole genomes), from topologically similar types of proteins or from protein families. Understandably, quite a few networks have focused on the so-called low-hanging fruit: soluble proteins that are relatively easy to express, purify and crystallize. Target selection has placed strong emphasis on disease- and drug-related proteins. For instance, structural genomics on Mycobacterium tuberculosis and Helicobacter pylori aims at developing improved structure-based drugs against these microbes. Another approach has been to study the genomes of thermophilic organisms such as Thermotoga maritima and Thermus thermophilus, whose proteins are highly stable and temperature-resistant and have good crystallization properties. Moreover, a structural genomics network has been initiated on Caenorhabditis elegans, a nematode that has served as a model organism in neurobiology and developmental biology. The EU-funded SPINE (Structural Proteomics in Europe) consortium consists of 20 European partners and has set the goal of determining 500 structures of soluble proteins.
|Berkeley Structural Genomics Center (BSGC) http://www.strgen.org||studies on Mycoplasma genitalium and Mycoplasma pneumoniae genomes; expression in E. coli|
|Center for Eukaryotic Structural Genomics (CESG) http://www.uwstructuralgenomics.org||studies on Arabidopisis thaliana transcriptome; expression in E. coli|
|European Membrane Proteins (E-MeP) http://www.e-mep.org||studies on 100 prokaryotic, 200 eukaryotic MPs; expression in E. coli, Lactococcus lactis, P. pastoris, Saccharomyces cerevisiae, baculovirus, SFV|
|Joint Center for Structural Genomics (JCSG) http://www.jcsg.org||studies on Thermotoga maritima proteome and human GPCRs; expression in E. coli, baculovirus, adenovirus and SFV|
|Membrane Protein Network (MePNet) http://www.mepnet.org||studies on >100 GPCRs; expression in E. coli, P. pastoris and SFV/mammalian cells|
|Membrane Protein Platform (MPP) http://www.swegene.org||bacterial and yeast membrane proteins; human GPCRs; expression in E. coli, S. cerevisiae, P. pastoris; lipid cubic phase crystallography|
|Midwest Center for Structural Genomics (MCSG) http://www.mcsg.anl.gov||targets from all three kingdoms of life; expression in E. coli|
|Northeast Structural Genomics Consortium (NESG) http://www.nigms.nih.gov/Initiatives/PSI/Centers/NECSG.htm||small proteins from S. cerevisiae, C. elegans and D. melanogaster; expression in E. coli, yeast and insect cells|
|New York Structural Genomics Research Consortium (NYSGXRC) http://www.nysgrc.org||bacterial, yeast and C. elegans proteins; expression in E. coli and yeast|
|Paris-Sud Yeast Structural Genomics (YSG) http://www.genomics.eu.org||250 non-membrane yeast proteins; expression in E. coli|
|Protein Structure Factory (PSF) http://www.proteinstrukturfabrik.de||medically and biotechnologically relevant proteins; expression in E. coli, S. cerevisiae and P. pastoris|
|Protein Wide Analysis of Membrane Proteins (ProAMP) http://www.pst-ag.com||Salmonella typhimurium and Helicobacter pylori MPs; expression in E. coli|
|RIKEN Structural Genomics Initiative (RSGI) http://www.rsgi.riken.go.jp/rsgi_e/index.html||mouse, Arabidopsis thaliana and Thermus thermophilus proteins; expression in E. coli and cell-free systems|
|Southeast Collaboratory for Structural Genomics (SECSG) http://www.secsg.org||Pyrococcus furiosus, C. elegans and human proteins; expression in E. coli, baculovirus and lentivirus|
|Structural Proteomics in Europe (SPINE) http://www.spineurope.org||proteins and protein complexes with direct relevance to human health and diseases; expression in E. coli, baculovirus, transient mammalian cells|
|Structural Genomics Consortium (SGC) http://www.sgc.ox.ac.uk||targets related to human health: diabetes, cancer, infectious diseases (malaria); expression in E. coli, baculovirus|
|Structure 2 Function Project (S2FP) http://www.s2f.carb.nist.gov||structural genomics initiative on Haemophilus influenzae proteins; expression in E. coli|
|Swiss National Center of Competence in Research (NCCR) http://www.structuralbiology.ethz.ch||bacterial membrane proteins, transporters and GPCRs; expression in E. coli and baculovirus|
|TB Structural Genomics Consortium (TBSGC) http://www.mbi-doe.ucla.edu/TB||structural genomics initiative on Mycobacterium tuberculosis proteins; expression in E. coli|
A number of networks also work on membrane proteins. American and Japanese networks have included membrane proteins, especially GPCRs, in their programs (Table 2). However, at least two networks are dedicated entirely to structural genomics on membrane proteins. The EU-funded network E-MeP studies 100 prokaryotic and 200 eukaryotic membrane proteins in a consortium of 18 European research groups. Of the eukaryotic targets, 100 are GPCRs and the remainder are non-GPCR proteins such as ion channels, transporters, efflux pumps and other integral membrane proteins. Within E-MeP, initial expression studies are carried out in E. coli, L. lactis, S. cerevisiae, P. pastoris, baculovirus-infected insect cells and SFV-infected mammalian cells. A limited number of targets are also expressed in a cell-free translation system. The privately funded Membrane Protein Network (MePNet) concentrates exclusively on GPCRs. Expression systems based on E. coli, P. pastoris and SFV have been used to overexpress more than 100 GPCRs [60, 90, 112]. More than 60 of these GPCRs were expressed at levels compatible with structural biology (1–10 mg/L) in one or more expression systems. Selected GPCRs refolded from E. coli inclusion bodies have been subjected to crystallization trials. Likewise, GPCRs purified from yeast and mammalian cell membranes have been entered into screens to find optimal crystallization conditions.
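The bookkeeping behind a statement such as "expressed at structural biology compatible levels in one or several expression systems" can be illustrated with a short sketch. It is not the networks' actual pipeline: the target names, yields and the `summarize` helper are hypothetical, and using the lower bound of the quoted 1–10 mg/L range as a pass/fail cutoff is an assumption made purely for illustration.

```python
# Hypothetical sketch of tallying a parallel expression screen.
# All target names and yields below are invented for illustration only.

# Assumed cutoff: lower bound of the 1-10 mg/L range quoted in the text.
MIN_YIELD_MG_PER_L = 1.0


def compatible_systems(yields: dict[str, float],
                       threshold: float = MIN_YIELD_MG_PER_L) -> list[str]:
    """Return the expression systems whose yield meets the threshold, sorted by name."""
    return sorted(system for system, y in yields.items() if y >= threshold)


def summarize(screen: dict[str, dict[str, float]],
              threshold: float = MIN_YIELD_MG_PER_L) -> dict[str, list[str]]:
    """Map each target to the systems that gave structural-biology-compatible yields."""
    return {target: compatible_systems(y, threshold) for target, y in screen.items()}


# Hypothetical yields in mg per litre of culture, one entry per expression system tried.
screen = {
    "GPCR_A": {"E. coli": 0.2, "P. pastoris": 2.5, "SFV": 4.0},
    "GPCR_B": {"E. coli": 0.0, "P. pastoris": 0.4, "SFV": 0.8},
    "GPCR_C": {"E. coli": 1.5, "P. pastoris": 0.1, "SFV": 0.3},
}

summary = summarize(screen)
# A target counts as a "hit" if at least one system reached the cutoff.
hits = [target for target, systems in summary.items() if systems]

print(f"{len(hits)} of {len(screen)} targets reached >= {MIN_YIELD_MG_PER_L} mg/L "
      "in at least one system")
for target, systems in summary.items():
    print(target, systems or "none")
```

Keeping the per-system result for every target, rather than only a single best yield, mirrors the parallel strategy described above: a target that fails in E. coli may still succeed in P. pastoris or SFV-infected mammalian cells.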
Conclusions and future prospects
The declining success rate of drug development programs has generated considerable concern worldwide. For economic as well as medical reasons, the pharmaceutical industry needs to deliver more efficient and safer drugs. Structure-based drug discovery and design offers a promising approach to optimizing drug efficacy and selectivity and thereby reducing serious side effects. This approach has already proven feasible, as demonstrated for influenza and HIV drugs. Structure determination has also become a routine tool in lead discovery and optimization. Current initiatives in structural genomics will certainly have a major impact on the number of new drug target structures available for modelling and drug design. The established networks in the field, which generally have a strong orientation towards technology development, should also be able to improve the success rate of structure determination for membrane proteins. As membrane proteins represent more than 70% of current drug targets, structural biology is very likely to strongly influence the future of drug discovery.