Structural biology of hepatitis C virus



Hepatitis C virus (HCV) causes acute and chronic liver disease in humans, including chronic hepatitis, cirrhosis, and hepatocellular carcinoma. Studies of this virus have been hampered by the lack of a productive cell culture system; most information thus has been obtained from analysis of the HCV genome, heterologous expression systems, in vitro and in vivo models, and structural analyses. Structural analyses of HCV components provide an essential framework for understanding the molecular mechanisms of HCV polyprotein processing, RNA replication, and virion assembly and may contribute to a better understanding of the pathogenesis of hepatitis C. Moreover, these analyses should allow the identification of novel targets for antiviral intervention and the development of new strategies to prevent and combat viral hepatitis. This article reviews the current knowledge of HCV structural biology. (HEPATOLOGY 2004;39:5–19.)

Hepatitis C virus (HCV) is a small, enveloped RNA virus belonging to the Flaviviridae family, genus Hepacivirus, that causes acute and chronic liver disease in humans, including chronic hepatitis, cirrhosis, and hepatocellular carcinoma. The HCV genome is a single-stranded RNA molecule of positive polarity. It contains a single open reading frame (ORF) encoding a polyprotein of about 3,000 amino acids (Fig. 1). The ORF is flanked by 5′ and 3′ untranslated regions (UTR) of 341 and approximately 230 nucleotides in length, respectively. Both the 5′ and 3′ UTRs bear highly conserved RNA structures essential for polyprotein translation and genome replication. The 5′ UTR contains an internal ribosome entry site (IRES) that binds the 40S ribosomal subunit and initiates polyprotein translation in a cap-independent manner. The polyprotein precursor is cotranslationally and posttranslationally processed by both cellular and viral proteases at the level of the endoplasmic reticulum (ER) membrane to yield 10 mature proteins.1 The structural proteins include the core (C), which forms the viral nucleocapsid, and the envelope glycoproteins E1 and E2. They are released by host-cell signal peptidases. The structural proteins are separated from the nonstructural proteins by the short membrane peptide p7, thought to be a viroporin. The nonstructural (NS) proteins NS2 to NS5B are involved in polyprotein processing and viral replication. Proteolytic processing of the NS part of the polyprotein is complex and requires two distinct proteinases: the NS2-NS3 zinc-dependent metalloproteinase, and the NS3 serine proteinase located in the N-terminal region of NS3. The NS2-NS3 proteinase appears to be dedicated solely to cleavage at the NS2/NS3 site, which occurs rapidly and by a conformation-dependent, autocatalytic mechanism.2 The remaining NS proteins are released by the NS3 proteinase associated with its cofactor, NS4A. The C-terminal region of the NS3 protein includes RNA helicase and NTPase activities. NS4B is an integral membrane protein of unknown function. NS5A is a polyphosphorylated protein of unknown function, and NS5B is the RNA-dependent RNA polymerase (RdRp). The existence of one or more previously unknown HCV proteins potentially synthesized by ribosomal frameshifts has been suggested recently.3–6

Figure 1.

HCV genome organization (top) and polyprotein processing (bottom). The 5′ untranslated region (UTR) consists of four highly structured domains and contains the internal ribosome entry site (IRES). The 3′ UTR consists of stable stem-loop structures and an internal poly (U)/polypyrimidine tract. The central 9.6-kb ORF codes for a polyprotein of slightly more than 3000 aa depending on the HCV genotype. S and NS correspond to regions coding for structural and nonstructural proteins, respectively. The polyprotein processing and the location of the 10 HCV proteins relative to the ER membrane are schematically represented. Scissors indicate ER signal peptidase cleavage sites; cyclic arrow, autocatalytic cleavage of the NS2-NS3 junction; black arrows, NS3/NS4A proteinase complex cleavage sites; intramembrane arrow, cleavage by the signal peptide peptidase. The transmembrane domains of E1 and E2 are shown after signal-peptidase cleavage and reorientation of the respective C-terminus hydrophobic stretches (dotted rectangles). Green spots denote glycosylation sites of the E1 and E2 envelope proteins.


Abbreviations: HCV, hepatitis C virus; ORF, open reading frame; UTR, untranslated region; IRES, internal ribosome entry site; ER, endoplasmic reticulum; NS, nonstructural; 3D, three-dimensional; cEM, cryo-electron microscopy; NMR, nuclear magnetic resonance; RdRp, RNA-dependent RNA polymerase; PDB, Protein Data Bank; HVR1, hypervariable region 1; TBE, tick-borne encephalitis virus; eIF3, eukaryotic translation initiation factor 3; ARFP, alternative reading frame protein; TM, transmembrane; TMD, transmembrane domain.

Protein Folding and 3D Structure Determination

Protein folding and stabilization are ensured by specific spatial interactions between constituent atoms of amino acids, including electrostatic interactions, van der Waals forces, hydrogen bonds, and hydrophobic interactions. These result in the formation of local secondary structures, including α-helices and β-sheets connected by turns and loops, that are closely packed together to yield a highly compact and roughly spherical structure. The protein core is not accessible to solvent molecules and contains mainly nonpolar amino acids, whereas soluble proteins bear polar and charged amino acid side chains at their surface.

Atomic-level information on protein three-dimensional (3D) structure throws light on the overall shape and topology, together with the location and conformation of active sites. This is essential to understanding the mechanisms underlying viral protein function and to identifying molecular targets for specific inhibitors (i.e., drug design). However, the properties of viral proteins can be modulated by interactions with host factors and other viral components. Three main approaches are used to determine the 3D structure of individual proteins, genomic regions, and their complexes, including large assemblies such as intact virions, namely cryo-electron microscopy (cEM), nuclear magnetic resonance (NMR), and X-ray crystallography.

Cryo-electron microscopy, in conjunction with powerful image-processing methods, yields medium-resolution structures of large assemblies that cannot be crystallized. It visualizes hydrated samples frozen in their native environment. Several projections of a given macromolecular assembly are obtained on each micrograph, and the 3D structure can be deduced by combining them. Recent cEM studies of viral particles have yielded images with a resolution of approximately 6 Å, in which the individual α-helices could be assigned.7 When studying membrane proteins, it is sometimes possible to obtain two-dimensional crystals of a given macromolecule, from which the structure can be studied by electron diffraction up to a resolution approaching 3 Å.8 Although cEM requires highly specialized facilities, its use is likely to expand significantly in coming years, with improvements in resolution and in the number of structures that can be deciphered.

NMR can be used to study the atomic structure of relatively small molecules (less than 25–30 kDa) and also dispenses with the need for crystallization.9 The protein or nucleic acid solution is placed in a strong permanent magnetic field, and the magnetic properties of 1H, 13C, and 15N isotopes are measured. The signals (resonances) emitted by individual atoms and interactions between pairs of atoms are captured in multidimensional NMR spectra and are ascribed to each atom in each amino acid composing the protein. Distances between hydrogen atoms (<5 Å) can be deduced from magnetic interactions between pairs of atoms that produce crosspeaks at the resonances of the two atoms (nuclear Overhauser effect). Large collections of known interatom distances are used to impose structural constraints when modeling 3D protein structure. Liquid and solid-state NMR are particularly useful for structural analysis of membrane protein domains10 that are very difficult to crystallize.
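The conversion of crosspeak intensities into distance constraints can be illustrated with a short sketch. It assumes the standard isolated spin-pair approximation, in which the nuclear Overhauser effect intensity falls off as the inverse sixth power of the interproton distance, and it calibrates against a reference proton pair of known separation. The function name, the reference values, and the example intensities are hypothetical and are not taken from any study cited here.

```python
def noe_distance(intensity, ref_intensity, ref_distance):
    """Estimate an interproton distance (in angstroms) from an NOE crosspeak.

    Assumes intensity is proportional to r**-6 (isolated spin-pair
    approximation), calibrated on a reference pair of known distance.
    """
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical calibration: a proton pair fixed at 2.0 A gives intensity 64.0.
REF_INTENSITY = 64.0
REF_DISTANCE = 2.0

# Hypothetical crosspeak intensities measured in the same spectrum.
crosspeaks = {"pair_a": 64.0, "pair_b": 8.0, "pair_c": 1.0}

restraints = {
    name: noe_distance(value, REF_INTENSITY, REF_DISTANCE)
    for name, value in crosspeaks.items()
}

# Only distances below about 5 A produce usable crosspeaks (see text).
usable = {name: d for name, d in restraints.items() if d < 5.0}
```

Because of the sixth-power dependence, a 64-fold drop in intensity corresponds to only a twofold increase in distance, which is why such constraints are confined to short ranges.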

X-ray crystallography can identify very detailed atomic structures, sometimes at very high resolution (often better than 1.6 Å), provided that high-quality crystals of the macromolecule or macromolecular assembly can be obtained. A macromolecule crystal consists of an ordered 3D array of approximately 10⁵ rows of identical molecules in each dimension. The X-rays are scattered by electrons in the crystal, producing a characteristic diffraction pattern. The 3D structure of the crystallized object is deduced by indirect calculation of the phase of each reflection, followed by Fourier transformation of the pattern, which yields the electron density map of the macromolecule. An atomic model is then built into this electron density map using graphics software.
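The role of the phases in this calculation can be illustrated with a one-dimensional toy model. The sketch below is not a crystallographic program: the "crystal" is a hypothetical unit cell of length 1 containing two point scatterers, and the function names are invented for illustration. It computes structure factors, rebuilds the density by Fourier synthesis, and then repeats the synthesis with the phases discarded, which is the information a diffraction pattern alone provides.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1D unit cell of length 1."""
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(factors, x):
    """Fourier synthesis: rho(x) = sum_h F(h) * exp(-2*pi*i*h*x)."""
    total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
    return total.real


# Hypothetical toy crystal: (fractional position, scattering weight).
atoms = [(0.20, 6.0), (0.65, 8.0)]
F = structure_factors(atoms, h_max=15)

grid = [i / 200 for i in range(200)]

# With amplitudes AND phases, the strongest peak sits on the heavier scatterer.
rho = [density(F, x) for x in grid]
peak = grid[max(range(len(grid)), key=lambda i: rho[i])]

# A diffraction experiment records amplitudes only; drop the phases.
F_no_phase = {h: abs(value) for h, value in F.items()}
rho_no_phase = [density(F_no_phase, x) for x in grid]
peak_no_phase = grid[max(range(len(grid)), key=lambda i: rho_no_phase[i])]
```

In this toy case the phased map peaks at the heavier scatterer, whereas the amplitude-only map peaks at the origin and no longer locates either scatterer, which is why the phase of each reflection must be recovered indirectly before a density map can be interpreted.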

Combined use of X-ray crystallography and NMR can offer a detailed 3D structure of small proteins together with dynamic features for efficient inhibitor design. Major advances in structure–function relationships also have been made by combining low-resolution electron microscopy studies of large complexes with X-ray crystallography or NMR of individual components. Past applications include the organization of intact enveloped viral particles.11

HCV Variability and Sequence Analyses

The HCV genome shows remarkable sequence variation because of the lack of proofreading activity of the NS5B RdRp. More than 90 genotypes distributed into six main types (1, 2, …) and subtypes (a, b, c, …) have been identified around the world on the basis of nucleotide sequence identity.12, 13 In addition to genotypes, HCV exists within its hosts as a pool of genetically distinct but closely related variants referred to as “quasispecies.”14, 15 The relationship between HCV sequence variability and liver disease status or resistance to current antivirals is unclear and likely is multifactorial.16–18 Intensive sequencing of HCV genomes is currently being conducted, and nearly 22,300 HCV sequences, including 181 full-length genomes, have been deposited so far in generic data banks (GenBank, European Molecular Biology Laboratory (EMBL), and DDBJ). To support sequence classification, annotation, and analysis of such a large collection, a number of sequence databases are dedicated specifically to HCV, such as the hepatitis virus database and HCVDB. HCVDB offers numerous bioinformatics tools for nucleic acid and amino acid sequence analysis, for prediction of secondary protein structure, and for initial analysis of overall protein structure. Hyperlinks lead to generic and specialized databases, such as the Protein Data Bank (PDB), which contains resolved 3D structures of HCV genomic regions and proteins that can be accessed by their PDB entry codes. In particular, the impact (if any) of amino acid variations on protein structure–function relationships can be predicted.

In the absence of a known 3D structure, protein sequence alignment and comparison reveal the conserved or variable nature of individual amino acids or amino acid stretches, along with the physicochemical properties of amino acids (e.g., their charge or hydropathic character) at specific positions.19–22 Conservation of certain amino acids or of their hydropathic characteristics among different HCV genotypes, strains, and/or quasispecies variants indicates that they are needed to maintain a vital protein function, or that they confer a significant survival advantage. Potentially important functional sites can be mapped in this way. In general, the hydrophobicity of residues located within the hydrophobic protein core is highly conserved because it is needed for proper protein folding and stability. This results in a global conservation of protein 3D structure, even in sequence-variable regions such as the hypervariable region 1 (HVR1) of the HCV envelope glycoprotein E2.19 The hydrophilic amino acids generally located at the protein surface are more variable, offering a high potential to modulate interactions with host components.
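The distinction between residue identity and hydropathic character can be made concrete with a small per-column profile of an alignment. The sketch below is only an illustration: the five-residue alignment is invented and does not represent HCV sequences, the function name is hypothetical, and the Kyte-Doolittle hydropathy scale is used as one common choice, not necessarily the scale used in the analyses cited above.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_profile(alignment):
    """Summarize each alignment column: identity and hydropathy statistics."""
    profiles = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]  # ignore gap characters
        if not residues:
            profiles.append(None)  # column made only of gaps
            continue
        consensus, count = Counter(residues).most_common(1)[0]
        scores = [KD[r] for r in residues]
        profiles.append({
            "consensus": consensus,
            "identity": count / len(residues),
            "mean_hydropathy": sum(scores) / len(scores),
            "hydropathy_spread": max(scores) - min(scores),
        })
    return profiles


# Invented toy alignment (not real HCV sequences).
toy_alignment = ["GTLKS", "GSIRA", "GTVKT", "GAIRS"]
profiles = column_profile(toy_alignment)
```

In this toy example the first column is strictly conserved, whereas the third column varies in sequence (L, I, V) but remains uniformly hydrophobic and the fourth (K, R) remains uniformly basic, which is the kind of pattern the text describes for sequence-variable regions.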

Structure of HCV Virions

HCV circulates in various forms in the serum of an infected host, including (i) virions bound to very–low-density lipoproteins and low-density lipoproteins, which appear to represent the infectious fraction; (ii) virions bound to immunoglobulins; and (iii) free virions.23, 24 In addition, viral particles exhibiting physicochemical, morphologic, and antigenic properties of nonenveloped HCV nucleocapsids recently have been detected in plasma.25 HCV virions have not yet been visualized conclusively by means of electron microscopy, and information on their 3D structure therefore is lacking. The main obstacles are the limited amount of virus that can be recovered from serum and the lack of an efficient cell culture system. HCV particles are believed to have a diameter of 55 to 65 nm.26, 27 By analogy with the known 3D structures of closely related flaviviruses (dengue virus28 and tick-borne encephalitis (TBE) virus29) and alphaviruses (e.g., the Semliki forest virus11), HCV is thought to adopt a classical icosahedral scaffold in which its two envelope glycoproteins, E1 and E2, are anchored to the host cell-derived double-layer lipid envelope. Underneath the membrane is the nucleocapsid, which likely is composed of multiple copies of the core protein, forming an internal icosahedral viral coat that encapsidates the genomic RNA. HCV core particles isolated from detergent-treated sera and observed by immunoelectron microscopy and optical rotation techniques revealed a capsid-like structure exhibiting a sixfold axis, in agreement with such an icosahedral arrangement.30

Structure of 5′ and 3′ UTR

The 5′ UTR is 341 nucleotides long and contains four highly structured domains numbered I to IV (Fig. 1). Domain III contains a pseudoknot, and the ORF translation initiation codon is located in domain IV. Domains II, III, and IV, together with the first 24 to 40 nucleotides of the core-encoding region, constitute the IRES. IRES-mediated entry of the ribosome on viral RNA takes place a few nucleotides upstream of the AUG translation initiator codon. The 40S ribosomal subunit is positioned in contact with the AUG codon by means of IRES domains II and III. Binding of eIF3 to both the IRES and the 40S ribosomal subunit creates a complex that brings the AUG initiation codon into contact with the Met-tRNA anticodon. The free IRES recently has been visualized by electron microscopy.31 Moreover, cEM, combined with single-particle image processing, has yielded a 20-Å resolution 3D model of the mammalian 40S ribosomal subunit complexed with either the complete IRES or a fragment lacking domain II.32 These reconstructions show that IRES binding induces a significant conformational change in the 40S ribosomal subunit that requires domain II, indicating that the HCV IRES is a dynamic manipulator of the host translational machinery. NMR has also yielded atomic structures of stem-loops IIIb,33 IIId, and IIIe,34 and X-ray crystallography has been used to solve the structure of the IIIabc four-way junction35 that is important for binding the 40S subunit, along with eukaryotic translation initiation factor 3 (eIF3). This X-ray structure revealed how junction nucleotides interact with an adjacent helix to position regions directly involved in eIF3 recognition. A distorted stack of helices has been identified that probably serves as an important recognition surface for the translational machinery. These results have provided important insights into RNA structures that make direct contact with the ribosome and are useful for further design of inhibitors.

The 3′ UTR consists of a variable 40-nt sequence and an internal poly (U)/polypyrimidine tract, followed by a 98-nt sequence that is highly conserved among HCV genotypes and contains stable stem-loop structures.36, 37 The 3′ UTR is thought to play an important role in initiating viral genome replication (reviewed in ref. 38).

Structural Biology of HCV Proteins

Core Protein

During translation of the HCV polyprotein, the nascent polypeptide is targeted to the host ER membrane for translocation of the E1 ectodomain into the ER lumen, a process mediated by an internal signal sequence located between the core and E1 sequences (Fig. 1). Cleavage of the signal sequence by the host signal peptidase yields the immature form of the core protein (191 aa), which contains the E1 signal sequence at its C-terminus. This signal peptide is processed further by a host signal peptide peptidase, thereby yielding the mature core protein, the C-terminus of which lies at or close to amino acid 179.39 Most of the C protein is found in the cytoplasm, where it is bound to ER membranes or located at the surface of lipid droplets.40 A small proportion of the core protein also may be found in the nucleus.

The first 120 N-terminal amino acids of the core protein constitute a mainly hydrophilic domain that contains three highly basic clusters bearing predicted nuclear localization signals. The N-terminal region of this fragment contains immunodominant antigenic sites, and NMR 3D structural analysis has revealed a helix–loop–helix motif at amino acids 17–37, carrying at least one conformational epitope (see PDB entry 1CWX and ref. 41). The region between amino acids 82 and 102 contains a tryptophan-rich sequence believed to be involved in homotypic core interactions. The 120-aa N-terminal region of the core protein shares the characteristics of native unfolded proteins, which are believed to undergo “induced folding” on binding to their natural ligands.42 This conformational plasticity allows the core protein to interact with many different cellular partners and may account for the wide range of functions attributed to this protein.

The C-terminus of the mature core protein (segment 121–179 or so) is mainly hydrophobic. This segment is predicted to fold into α-helices and is responsible for core association with lipid droplets in mammalian cells40 and with ER membranes. The presence of conserved glycine residues on predicted α-helices suggests the presence of an oligomerization motif.

Little is known about the mechanisms of HCV nucleocapsid assembly. In vitro nucleocapsid reconstitution experiments using 1–124 or 1–179 core segments and structured RNA molecules so far have yielded irregular particles larger than those isolated from infected subjects.43 A core protein interaction with cell membranes or lipids and/or envelope glycoproteins thus may be required for correct viral particle morphogenesis. In addition to its role in nucleocapsid formation, HCV core protein may modulate gene transcription, cell proliferation, cell death, and cell signaling; may interfere with lipid metabolism; and may suppress host immune responses (reviewed in refs. 44–46). However, most of these findings are controversial, and detailed structure–function analyses are required to elucidate core protein multifunctionality.

Alternative “Frameshifted” Core Proteins

Evidence of potentially overlapping ORFs was reported recently,3 and an additional HCV protein, designated F (for “frameshift”) or ARFP (for “alternative reading frame protein”), has been shown to be produced as a result of a −2/+1 ribosomal frameshift in the N-terminal core-encoding region of the polyprotein.4, 5 Expression of the F/ARFP protein of HCV genotype 1a in vitro or in mammalian cells yields a 17 kDa protein. Radiosequencing of in vitro-labeled F/ARFP protein indicated that the frameshift junction likely occurs at codons 9 to 11 of the core protein sequence.4 However, multiple frameshifting events recently have been reported in this region, and a 1.5 kDa protein also could be produced by −1/+2 frameshifting.6 In addition, the frameshift position seems to be genotype dependent, as a +1 frameshift at codon 42 was recently reported for genotype 1b.47 Detection of anti-F/ARFP antibodies in the serum of HCV-infected patients suggests that F/ARFP protein is expressed during HCV infection. The production of alternative, frameshifted forms of HCV core protein in the liver of infected patients remains to be demonstrated, however. It will be interesting to determine the structure and function of these proteins in the replicative cycle and/or pathogenicity of HCV.
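The effect of such a frameshift on the translated product can be sketched as follows. The nucleotide sequence is an invented toy example, not the HCV core-encoding region, and the function names are hypothetical; only the standard genetic code is assumed. The sketch also shows why −2 and +1 shifts are grouped together in the text: both place the ribosome in the same downstream reading frame.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, start=0):
    """Translate from `start` until a stop codon or the end of the sequence."""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def translate_with_frameshift(seq, codons_before_shift, shift):
    """Translate in frame 0, then slip by `shift` nucleotides and continue."""
    pos = 3 * codons_before_shift
    head = translate(seq[:pos])
    tail = translate(seq, start=pos + shift)
    return head + tail


# Invented toy sequence (not an HCV sequence).
toy = "ATGGCTAAAGGTCACTTGTAA"

in_frame = translate(toy)                             # frame 0 product
plus_one = translate_with_frameshift(toy, 3, +1)      # skip one nucleotide
minus_two = translate_with_frameshift(toy, 3, -2)     # re-read two nucleotides
```

After the slip, both shifted products share the same downstream residues and differ from the in-frame product, which is the property that makes the position of the frameshift junction identifiable by sequencing the protein.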

E1 and E2 Envelope Glycoproteins and p7

E1 and E2 Envelope Glycoproteins.

The two envelope glycoproteins, E1 and E2, are thought to play pivotal roles at different steps of the HCV replicative cycle. There is now strong evidence that they are essential for host-cell entry, by binding to receptor(s) and inducing fusion with a host-cell membrane.48 As in other Flaviviridae,28, 29 they are thought to play a major role in viral particle assembly. These distinct functions imply that the envelope proteins adopt markedly different conformations and that the latter must be controlled tightly to occur at the appropriate phases of the replicative cycle.

E1 and E2 are type-I transmembrane (TM) glycoproteins, with N-terminal ectodomains of 160 and 334 amino acids or so, respectively, and a short C-terminal transmembrane domain (TMD) of approximately 30 amino acids. These proteins assemble as noncovalent heterodimers,49 and their TMDs play a major role in the biogenesis of the E1-E2 heterodimer.50 Both E1 and E2 TMDs are composed of two short stretches of hydrophobic amino acids separated by a short polar segment containing fully conserved charged residues.51 The second hydrophobic stretch acts as an internal signal peptide for the downstream protein.52 Before signal-sequence cleavage, the E1 and E2 TMDs adopt a hairpin structure (Fig. 1). However, dynamic changes in these domains have been observed.52 Indeed, after cleavage by a host signal peptidase, the signal-like sequence is reoriented toward the cytosol, leading to a single TM passage (Fig. 1 and Fig. 2, top and middle). The final TMD is predicted to form a single α-helix, on the basis of NMR structural analysis of the N-terminal segment of the E1 TMD.53 The E1 and E2 TMDs contribute to important protein functions, including membrane anchoring,54 ER retention,55, 56 and E1-E2 heterodimer formation,53 the latter being thought to be the prebudding E1-E2 form of the virus.

Figure 2.

Behavior of the transmembrane domains of HCV envelope glycoproteins E1 and E2 during the early steps of their biogenesis. Top, The N-terminus of E1 is translocated into the lumen of the ER up to the N-terminus of the E1 transmembrane domain, which acts as a stop transfer signal. The C-terminal half of the transmembrane domain of E1 acts as the signal sequence of E2 and has its C-terminus oriented toward the lumen of the ER to allow the translocation of E2. At this step, and before signal-like sequence cleavage, the transmembrane domain of E1 forms a hairpin structure. Middle, After cleavage by the signal peptidase, the signal-like sequence is reoriented toward the cytosol, yielding a single transmembrane passage. Similarly, the transmembrane domain of E2 transiently adopts a hairpin structure to allow the translocation of p7, and its C-terminal half is reoriented toward the cytosol after signal peptidase cleavage (not shown). Adapted from ref. 52. Bottom, Summary of HVR1 structural properties and predicted amino acid functions. The consensus hydropathic pattern was established from 1382 HVR1 sequences of various genotypes; o, n, i, and v refer to hydrophobic, neutral, hydrophilic, and variable positions, respectively; G is a fully conserved Gly residue. The black boxes indicate the main anchoring regions, and the gray box indicates a putative turn. The + signs indicate positions often occupied by basic residues. Adapted from Penin et al.19 with permission.

The ectodomains of E1 and E2 contain numerous proline and highly conserved cysteine residues that may form 4 and 9 intramolecular disulfide bonds, respectively. E1 and E2 are highly glycosylated, containing up to 5 and 11 glycosylation sites, respectively. HCV envelope glycoprotein maturation and folding is thus a highly complex process involving the ER chaperone machinery, including calnexin,57–59 and depending on glycosylation and, potentially, on core protein coexpression.59 The folding of HCV envelope glycoproteins is relatively slow, and this is likely responsible for the heterogeneous disulfide-linked aggregates that can be observed in in vitro expression systems.60

Like other viral envelope proteins involved in host cell entry, HCV envelope proteins are thought to induce fusion between the viral envelope and a host cell membrane. X-ray crystallography of the E envelope glycoprotein of TBE virus revealed a structure for this fusion protein distinct from that of class I fusion proteins (e.g., hemagglutinin of influenza virus), which possess a “fusion peptide” at or near the amino terminus and a so-called “hairpin” arrangement of coiled α-helices. In contrast, E of TBE is a three-domain (I–III) protein containing mostly antiparallel β-sheets that possesses an internal fusion peptide located at the tip of domain II, stabilized by dicysteine linkages.61 The scaffold of this class II fusion protein is remarkably similar to that observed for the Semliki forest virus, an alphavirus,11 suggesting that all genera of the Flaviviridae family could have class II fusion proteins. However, the small size of HCV E1 and E2 ectodomains relative to E of flaviviruses suggests differently truncated class II fusion protein structures. Using the 3D structure of the E glycoprotein of TBE as a template,61 structural models of the HCV E1 and E2 ectodomains have been proposed.62, 63 However, because of the poor amino acid sequence homologies between these proteins, these theoretical models must be considered only as working hypotheses.

Hypervariable regions have been identified in the E2 envelope glycoprotein sequence.14, 46 These amino acid stretches differ by up to 80% among HCV genotypes, and even among subtypes of the same genotype. The first 27 amino acids of the E2 ectodomain form HVR1, the only neutralization epitope so far identified in HCV.64 HVR1 variability apparently is driven by antibody selection of immune-escape variants. An HCV clone lacking HVR1 was found to be infectious but strongly attenuated in chimpanzees,65 supporting a functional role of this domain, likely in host cell entry. Despite the sequence variability of HVR1, the physicochemical properties of the residues at each position and the overall conformation of HVR1 are highly conserved among the various genotypes (Fig. 2, bottom).19 Because HVR1 is a globally basic region, with positively charged residues located at specific sequence positions, it could interact with negatively charged molecules at the cell surface. Such an interaction could play a role in host cell recognition and attachment, as well as in the cellular compartmentalization of the virus. Another hypervariable region, HVR2, has been described in the E2 glycoprotein of HCV genotype 1 strains.46 HVR2 is a stretch of 7 amino acids (positions 91–97) showing up to 100% sequence diversity. Together with HVR1, HVR2 recently was reported to modulate E2 receptor binding, possibly through intramolecular interactions.66, 67


p7.

The p7 polypeptide is a small, intrinsic membrane protein 63 amino acids long. Although most cleavage events in the HCV polyprotein precursor proceed to completion during or immediately after translation, cleavage by the ER signal peptidase is delayed at sites E2/p7 and p7/NS2,1 suggesting that p7 is released in a regulated process. The p7 polypeptide has a double membrane-spanning topology, with its N- and C-terminal ends facing the ER lumen.21 Both TM passages have been predicted to form α-helices, and the C-terminal TM passage has been shown to function as an internal signal-like peptide. Amino acid variability analysis and helix projections have revealed that each TM passage of p7 has strictly conserved helix faces, suggesting their involvement in specific helix–helix interactions.21 Very recent data indicate that p7 can mediate membrane ion permeability and can form hexamers.68, 69 These structural and membrane-permeability properties suggest that p7 belongs to the viroporin family and could have an important role in viral particle release and maturation.70 This is supported by functional data obtained with pestiviruses (other members of the Flaviviridae family), indicating that p7 is essential for the production of progeny virus.71 Although p7 seems to be located mainly in ER membranes, it can be exported to the plasma membrane and may have functional roles in the secretory pathway.21 These properties make p7 an attractive candidate target for new antiviral drugs.
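How a helix projection reveals a conserved face can be sketched numerically. The example assumes an ideal α-helix with 3.6 residues per turn (100° per residue) and summarizes per-residue conservation scores as a vector sum on the helical wheel, by analogy with the hydrophobic-moment calculation. The function name and the two score patterns are hypothetical and do not correspond to the p7 sequence or to the method of the study cited above.

```python
import math


def face_moment(scores, degrees_per_residue=100.0):
    """Vector sum of per-residue scores on a helical wheel.

    Returns (magnitude, direction): magnitude is normalized to 0..1, where
    values near 1 mean the scored residues cluster on one helix face;
    direction is the angle of that face in degrees.
    """
    x = y = 0.0
    for index, score in enumerate(scores):
        angle = math.radians(index * degrees_per_residue)
        x += score * math.cos(angle)
        y += score * math.sin(angle)
    total = sum(scores)
    magnitude = math.hypot(x, y) / total if total else 0.0
    direction = math.degrees(math.atan2(y, x)) % 360.0
    return magnitude, direction


HELIX_LENGTH = 19

# Hypothetical pattern 1: conserved positions spaced i, i+3, i+4, i+7, ...
clustered = [0.0] * HELIX_LENGTH
for i in (0, 3, 4, 7, 11, 14, 15, 18):
    clustered[i] = 1.0

# Hypothetical pattern 2: every second position conserved.
scattered = [0.0] * HELIX_LENGTH
for i in range(0, HELIX_LENGTH, 2):
    scattered[i] = 1.0

clustered_magnitude, clustered_direction = face_moment(clustered)
scattered_magnitude, scattered_direction = face_moment(scattered)
```

With these toy inputs, the i, i+3, i+4, i+7 spacing gives a normalized moment of about 0.8, meaning the conserved residues lie within a narrow arc of the wheel, whereas the alternating pattern gives about 0.1, meaning they are spread around the helix.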

Nonstructural Proteins and the Replication Complex


NS2.

NS2 is a nonglycosylated integral membrane protein that does not seem to be essential for the formation of the replication complex.72, 73 The only known function of NS2 is its participation in proteolytic cleavage at the NS2-NS3 junction of the polyprotein. Its membrane topology is unclear, but the presence of two internal signal-like sequences points to the existence of four TM segments.74 Most of the NS2 sequence is required for the zinc-dependent NS2-NS3 proteinase function, which is responsible for the autocatalytic cleavage that separates NS2 from the downstream portion of the precursor polyprotein, a process that occurs rapidly after translation and involves a conformation-dependent mechanism.75 No significant amino acid sequence similarities between NS2-NS3 and any other proteases have been found. However, NS2 is a metal-dependent proteinase, because its activity can be inhibited by EDTA and is stimulated by ZnCl2.2, 76 Site-directed mutagenesis has shown that amino acids His 143 and Cys 184 are essential for NS2 catalytic activity.77 Because the N-terminal domain of NS3, which is an integral part of this NS2-NS3 proteinase, is known to bind a structural Zn ion (Fig. 3), it is possible that Zn2+ may be required only for proper conformation of this domain and may not actually be involved in the catalytic process. This would imply that the two essential amino acids mentioned above may represent the catalytic dyad of a novel type of thiol protease.78

Figure 3.

The serine proteinase domain of protein NS3 (NS3-SPD). Crystal structure of a covalent complex with a peptide-like inhibitor (FKI).84 (A) Ribbon diagram of NS3-SPD (PDB accession code 1DY9). The two characteristic β-barrels of the chymotrypsinlike fold are indicated in two shades of blue (darker, N-terminal; lighter, C-terminal). The β-strand contributed by NS4A, which completes the N-terminal β-barrel, is shown in gold. The side-chain atoms of the catalytic-triad amino acids are represented as colored spheres of the corresponding van der Waals radius (cyan, orange, and green for His 57, Asp 81, and Ser 139, respectively). The inhibitor, which is covalently linked to Ser 139, is shown as a silver ball-and-stick structure. The structural Zn2+ ion (labeled) is indicated as a pink sphere at the bottom of the structure. The amino- and carboxy-termini of the two polypeptide chains are shown with the same color code as the ribbons. (B) Structural representation colored according to electrostatic potential at the surface. Blue indicates positively charged regions, red indicates negatively charged areas, and white indicates neutral regions. The area corresponding to the surface of the amino acids of the catalytic triad is labeled. Note the long surface groove in which the target polypeptide has to be accommodated for proteolysis to occur. The inhibitor is represented as in (A). This view is roughly at right angles to the view in (A) (looking from the top).


NS3.

The 3D structures of full-length NS3, the NS3 serine proteinase domain (free and complexed with NS4A and/or with inhibitors), and the NS3 helicase domain (free and complexed with single-stranded DNA) have recently been determined.

NS3 Serine Proteinase Domain.

The NS3 serine proteinase comprises the 189 N-terminal amino acids of protein NS3. As already mentioned, it is part of the NS2-NS3 proteinase responsible for autocatalytic cleavage at the NS2-NS3 site. The NS3 serine proteinase domain associates with the NS4A cofactor (54 amino acids), stabilizing the proteinase and activating it to cleave sites 4A/4B, 4B/5A, and 5A/5B. The 3D structure of NS3 serine proteinase, both free79 and complexed with NS4A,80, 81 shows that the central part of NS4A is mandatory for proper NS3 folding, through the formation of a β-strand (represented in gold, Fig. 3A) inserted into the N-terminal β-barrel of NS3. The hydrophobic N-terminal part of NS4A appears to form a TM segment required for NS3 targeting and anchoring to the ER membrane.82 The structure of the NS3 serine proteinase domain includes a chymotrypsinlike fold, with two six-stranded β-barrel subdomains of identical topology (Fig. 3A) that are believed to have arisen from an ancient gene-duplication event.83 The catalytic triad is formed by residues from the same loops of the two β-barrels: His 57 and Asp 81 in the N-terminal β-barrel (dark blue in Fig. 3A) and Ser 139 in the C-terminal β-barrel (light blue). Figure 3A shows the structure of a peptide-like inhibitor bound covalently to Ser 139.84 As shown in Fig. 3B, there is a fairly long surface groove containing the catalytic triad, into which the target polypeptide binds. The serine proteinase activity of NS3 is an attractive target for new drugs that could block viral replication efficiently.

NS3 Helicase Domain.

The NS3 helicase–NTPase domain consists of the 442 C-terminal amino acids of NS3. The crystal structure has been determined for this domain, both free85, 86 and complexed with DNA,87 and is representative of superfamily-2 helicases.88 It contains two structurally related subdomains folded with β–α–β topology, and a third C-terminal subdomain containing seven α-helices and three β-strands. As indicated in Fig. 4, the first β–α–β subdomain carries the NTPase activity and the second the RNA-binding function. The three subdomains are separated by deep clefts (in particular, see the large groove in Fig. 4A). The conserved helicase motifs all lie in regions that face the clefts. The NS3 helicase–NTPase domain probably has multiple functions, including RNA-stimulated NTPase activity, RNA binding, and unwinding of RNA regions with extensive secondary structure by coupling unwinding to NTP hydrolysis. This enzyme acts as an ATP-driven motor and is thought to switch between alternative conformations during active unwinding of double-stranded RNA, as suggested by intersubdomain flexibility at the level of the clefts. 3D structural analysis of the helicase domain complexed with a single-stranded DNA oligonucleotide87 revealed that the oligonucleotide bound within the region of the groove indicated in Fig. 4A. However, the biological significance of this finding with respect to productive binding and unwinding of double-stranded RNA is unclear. Despite abundant structural data obtained by X-ray crystallography, the mechanism of action of the helicase domain and its precise role during replication are not fully understood. Nevertheless, this activity remains an important potential target for new drugs, which could be combined with other drugs targeting the viral proteases and the RNA-dependent RNA polymerase.

Figure 4.

Crystal structure of the helicase domain of NS3 (PDB accession code 8OHM). Surface representation is colored according to electrostatic potential (as in Fig. 3B). The individual subdomains are labeled, as well as the (blue) patch formed by the conserved helicase RRGRTGRGRRG motif. Note the important clefts between subdomains, which suggest possible intersubdomain functional hinges on NTP hydrolysis necessary for double-stranded RNA unwinding. (A) Front view of the enzyme. The negatively charged (red) large groove between the RNA binding domain and the C-terminal domain is indicated. (B) Top view, showing a flat enzyme.

Full-Length NS3.

The 3D structure of full-length NS389 shows that its C-terminus lies within the active site of the serine proteinase domain, where it is expected to be located during the cis cleavage that separates NS3 from NS4A (Fig. 5). The C-terminal α-helix of the helicase domain is shorter in the full-length NS3 version, giving rise to a C-terminal β-strand that fits into the groove containing the NS3 proteinase catalytic triad (Figs. 3 and 5). In contrast to the proteinase domain, the catalytic activity of which is similar to that of the full-length NS3 protein,90, 91 the helicase activity is strongly enhanced in the full-length protein.90, 92 The structure of full-length NS389 shows that the two β–α–β subdomains of the NS3 helicase are folded more rigidly than in the isolated helicase domain. This may contribute to more effective use of the energy arising from ATP hydrolysis. In addition, the NS3 serine proteinase domain contributes to the helicase RNA substrate binding sites.90, 93, 94
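The domain sizes quoted in the preceding sections can be combined into a simple residue map of NS3. The following Python sketch is only an illustration: it assumes that the 189-residue proteinase domain and the 442-residue helicase–NTPase domain are contiguous and non-overlapping, and that residues are numbered from the NS3 N-terminus. The function name and dictionary keys are hypothetical.

```python
# Residue counts quoted in the text. Contiguity of the two domains and
# numbering from the NS3 N-terminus are assumptions made for illustration.
PROTEINASE_LEN = 189  # N-terminal serine proteinase domain
HELICASE_LEN = 442    # C-terminal helicase-NTPase domain

# Catalytic triad of the serine proteinase, as given in the text.
CATALYTIC_TRIAD = {"His": 57, "Asp": 81, "Ser": 139}


def ns3_domain_map(proteinase_len=PROTEINASE_LEN, helicase_len=HELICASE_LEN):
    """Return 1-based inclusive residue ranges for each NS3 region."""
    total = proteinase_len + helicase_len
    return {
        "serine_proteinase": (1, proteinase_len),
        "helicase_ntpase": (proteinase_len + 1, total),
        "full_length": (1, total),
    }


domains = ns3_domain_map()
for name, (start, end) in domains.items():
    print(f"{name}: {start}-{end} ({end - start + 1} residues)")

# Every triad residue should fall inside the proteinase range.
prot_start, prot_end = domains["serine_proteinase"]
triad_in_proteinase = all(
    prot_start <= pos <= prot_end for pos in CATALYTIC_TRIAD.values()
)
print("catalytic triad within proteinase domain:", triad_in_proteinase)
```

Under these assumptions the two domains add up to a full-length protein of 631 residues, and the three catalytic-triad residues all fall within the N-terminal proteinase range.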

Figure 5.

Structure of the full-length NS3 protein (PDB accession code 1CU1). The orientation is the same as in Fig. 4B. The helicase domain is shown in a surface representation, colored according to electrostatic potential as in Figs. 3B and 4. The proteinase domain is shown as a yellow ribbon. The amino acids of the catalytic triad are colored as in Fig. 3A. The C-terminus of the helicase domain (indicated in white) lies in the active site of the proteinase domain. Note that the two β–α–β subdomains (NTPase and RNA binding) are closer to each other than in the isolated helicase domain (Fig. 4B).

In addition to its role as a multifunctional enzyme, NS3 interacts directly with NS5B,95 and also with NS4B and NS5A via NS4A96 within the replication complex. Numerous NS3 interactions with cellular components also have been reported, including protein kinases A and C, p53, and histones H2B and H4, but their significance is unclear.97


NS4B is an integral membrane protein that cotranslationally associates with the ER membrane, presumably via an internal signal-like sequence.98 Both the N-terminus and the C-terminus of NS4B are believed to be oriented toward the cytosol, and prediction algorithms suggest the presence of four to six TM segments.98, 99 Cytosolic processing and orientation of the bulk of the protein has been confirmed experimentally, but it has been shown recently that the N-terminal tail of NS4B could be translocated into the lumen, most likely by a posttranslational process, yielding a fifth transmembrane segment.100 It was recently found that expression of NS4B induces the formation of a seemingly ER-derived membranous web that harbors all HCV structural and nonstructural proteins,101 as well as replicating viral RNA.102 Thus, one function of NS4B may be to induce a specific membrane alteration that serves as a scaffold for the formation of the HCV replication complex (see below).


NS5A is a membrane-associated phosphoprotein of unknown structure and function. It is found in a basally phosphorylated form of 56 kDa and in a hyperphosphorylated form of 58 kDa. The NS5A protein of HCV and the related bovine viral diarrhea virus, as well as the NS5 protein of yellow fever virus, are phosphorylated by unidentified serine and threonine kinases, suggesting that these proteins share a common function related to their phosphorylation status.103

It was recently shown that an amino-terminal amphipathic α-helix mediates NS5A membrane association.22 Biophysical properties of a synthetic peptide representing the membrane anchor of NS5A, together with the nature and location of its amino acid residues, point to a membrane association mechanism involving the hydrophobic side of this α-helix, yielding a topology parallel to the lipid bilayer in the cytoplasmic leaflet of the ER membrane. The 3D structure of the N-terminal α-helical domain of NS5A was recently determined by NMR.104 Structure-function analyses have demonstrated that this helix displays fully conserved polar residues at the membrane surface, which define a unique platform probably involved in specific protein–protein interactions essential for the formation of a functional HCV replication complex. Overall, NS5A appears to have a modular domain organization. The roughly 180 amino acids immediately following the membrane-anchoring α-helix form a domain that, according to secondary structure predictions, is likely to achieve a stable fold on its own. The remainder of the sequence, from amino acid 200 to the C-terminus, has all the characteristics of natively unfolded proteins, which are thought to undergo “induced folding” by binding to their natural ligands.42 These natively unfolded proteins are involved in numerous intracellular protein–protein interactions.

The function of NS5A in the HCV replication cycle is unknown. Adaptive mutations have been found to cluster in the central region of NS5A in the replicon system,73, 105 suggesting that NS5A is involved, either directly or by interaction with cellular proteins and pathways, in the viral replication process. This, together with the modulation of NS5A hyperphosphorylation by the nonstructural proteins 3, 4A, and 4B,106, 107 supports the view that NS5A is an essential component of the HCV replication complex. However, adaptive mutations found to confer a significant replication advantage in the replicon system failed to initiate productive infection after inoculation of in vitro-transcribed RNA into the liver of chimpanzees.108

NS5A has attracted considerable interest because of its potential role in modulating the response to interferon alpha therapy.109, 110 Japanese studies suggested the existence of a discrete region of NS5A, tentatively named the interferon sensitivity determining region,111 and NS5A has been reported to interfere with the activity of the double-stranded-RNA-activated protein kinase in vitro.112 However, these findings are controversial, and the role of NS5A in viral resistance to interferon alpha remains elusive.113 Many other potential functions have recently been attributed to NS5A, including transcriptional activation and involvement in the regulation of cell growth and cellular signaling pathways (reviewed in refs. 97,114). However, the role of NS5A in the pathogenesis of hepatitis C remains to be established. Determination of the 3D structure of NS5A is key to further progress in this direction.


HCV replication proceeds via the synthesis of a complementary minus-strand RNA using the genome as template, and subsequent synthesis of genomic plus-strand RNA from this minus-strand RNA template. The key enzyme in both steps is the NS5B RdRp. This viral enzyme has been extensively characterized at the biochemical115–117 and structural levels118–121 and has emerged as a major target for antiviral intervention.

The HCV RdRp belongs to a class of membrane proteins termed tail-anchored proteins.20 Characteristic features of these proteins include: (i) posttranslational membrane targeting via a hydrophobic C-terminal insertion sequence (which, in NS5B, maps to the 21 C-terminal amino acid residues), (ii) integral membrane association, and (iii) cytosolic orientation of the functional protein domain.122, 123 Two independent experimental strategies recently demonstrated that the HCV RdRp insertion sequence crosses the phospholipid bilayer of the ER (or of an ER-derived modified compartment) as a TM segment.124 Previous structural predictions indicated that this segment adopts an α-helical fold in the hydrophobic membrane context.20

The crystal structure of NS5B118–120 reveals a catalytic domain followed by a C-terminal extension that connects to the TM region via the active-site groove. The catalytic domain, formed by the 530 N-terminal amino acids (of the 591 amino acids in the full-length protein), exhibits the classical “fingers,” “palm,” and “thumb” subdomains (illustrated in Fig. 6A) of all single-chain polymerases, as initially defined in the Klenow fragment of Escherichia coli DNA polymerase I.125 A special feature of NS5B is that the fingers subdomain contains an extension that interacts with the thumb subdomain and restricts the mobility of one subdomain with respect to the other. This results in a fully enclosed active site into which NTP molecules can readily bind with no further rearrangement of the domains, in contrast to other polymerases. The catalytic aspartic acids (Asp 220 and Asp 318) are located in the palm subdomain and chelate two catalytic metal ions that are responsible for the polymerization reaction. The RNA template binds in a groove that leads directly to the active site, and the NTPs access this site through an NTP tunnel, as indicated in Fig. 6B. Initiation of RNA synthesis by NS5B has been found to proceed through a de novo mechanism in the absence of primer. Recently, structural elucidation of an initiation complex for the polymerase of bacteriophage ϕ6,126 an unrelated double-stranded RNA phage, revealed that this polymerase is folded in a very similar way to NS5B, and superimposition of their structures visualized the RNA path along the template groove (Fig. 6B). Superimposition with the electron density map of NS5B complexed with oligonucleotides also identified NTP binding sites for de novo initiation, both at the catalytic site and at a “priming” site.121 The similarity of the two structures strongly suggests that the two enzymes initiate de novo RNA synthesis through a similar mechanism.
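The residue counts given for NS5B (591 amino acids in total, a 530-residue catalytic domain, and a 21-residue C-terminal insertion sequence) imply the length of the C-terminal extension that links the catalytic domain to the TM segment. The Python sketch below makes this arithmetic explicit. It is an illustration only: it assumes the three regions are contiguous and non-overlapping and that the 21-residue insertion sequence coincides with the TM segment. The function name and dictionary keys are hypothetical.

```python
# Residue counts quoted in the text. The assumption that the three regions
# tile the sequence without gaps or overlap is made only for illustration.
FULL_LENGTH = 591    # full-length NS5B
CATALYTIC_LEN = 530  # N-terminal catalytic domain (fingers, palm, thumb)
TM_LEN = 21          # C-terminal insertion sequence, taken here as the TM segment

CATALYTIC_ASPARTATES = (220, 318)  # palm subdomain residues named in the text


def ns5b_layout(full_length=FULL_LENGTH, catalytic_len=CATALYTIC_LEN, tm_len=TM_LEN):
    """Return 1-based inclusive residue ranges for the three NS5B regions."""
    tm_start = full_length - tm_len + 1
    if tm_start <= catalytic_len + 1:
        raise ValueError("no residues left for the C-terminal extension")
    return {
        "catalytic_domain": (1, catalytic_len),
        "c_terminal_extension": (catalytic_len + 1, tm_start - 1),
        "tm_segment": (tm_start, full_length),
    }


layout = ns5b_layout()
lengths = {name: end - start + 1 for name, (start, end) in layout.items()}
for name, (start, end) in layout.items():
    print(f"{name}: {start}-{end} ({lengths[name]} residues)")

# The catalytic aspartates should lie inside the catalytic domain.
cat_start, cat_end = layout["catalytic_domain"]
aspartates_in_domain = all(cat_start <= pos <= cat_end for pos in CATALYTIC_ASPARTATES)
print("catalytic aspartates within catalytic domain:", aspartates_in_domain)
```

Under these assumptions the extension spans residues 531 to 570, that is, 40 residues between the catalytic domain and the membrane anchor.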

Figure 6.

Crystal structure of the catalytic domain of NS5B, the HCV RNA-dependent RNA polymerase (PDB accession code 1GX6). (A) Ribbon diagram of the NS5B-Δ55 protein complexed with UTP and Mn2+. Alpha helices are colored blue, beta strands are red, and connecting loops are silver. The bound nucleotide and the side chains of the catalytic aspartic acids (Asp 220 and Asp 318) in the center of the structure are represented as ball-and-stick and are colored black and magenta, respectively. Mn2+ ions are shown as orange spheres. Also labeled is the triphosphate (tP) moiety of a nucleotide bound to the “priming” site. (B) Superimposition with the bacteriophage ϕ6 polymerase, showing the path of the template strand and the NTP tunnel. The thumb domain is omitted from this image for clarity. After superimposition of the structures of NS5B complexed with UTP (A) on the initiation complex of ϕ6 polymerase, the protein atoms of the latter were removed, leaving only its template nucleic acid (in red) and NTPs (in black and green), displayed with the atoms of NS5B. The coordinates of the two nucleotides in the initiation complex closely match the triphosphate moiety of the nucleotides in the NS5B/NTP binary complexes, suggesting that the two enzymes are in the de novo initiation mode conformation.

It is likely that the NS5B structural models so far reported correspond to a closed conformation of the enzyme that is adopted for the initiation step. The enzyme is thought to adopt a different conformation when proceeding to the elongation phase of replication. Current structural models contain no obvious exit path for double-stranded RNA. It is not known whether the newly synthesized strand emerges base-paired to the template (as double-stranded RNA) or whether it is forced to unwind after a few nucleotides to leave the active site region. The structure of NS5B complexed with two different nonnucleoside inhibitors was reported recently.127 Interestingly, these drugs were found to bind at a surface site in the thumb approximately 30 Å from the active site. This binding site is very close to the allosteric GTP site previously identified at the fingers–thumb interface.121 Identification of these structures suggests that nonnucleoside inhibitors may act by blocking the enzyme in the initiation mode through inhibition of a conformational change needed to proceed with elongation. As with poliovirus RdRp,128, 129 oligomerization of HCV NS5B may be important for cooperative RNA synthesis activity.127, 130 The significance of these observations is currently unclear.

The HCV RNA Replication Complex

Formation of a membrane-associated replication complex composed of viral proteins, replicating RNA, and altered host-cell membranes is a hallmark of all plus-strand RNA viruses investigated thus far.131, 132 This strategy may offer multiple advantages, including: (i) compartmentalization and local concentration of viral products, (ii) physical support and organization of the RNA replication complex, (iii) tethering of viral RNA during unwinding, (iv) supply of lipid constituents important for replication, and (v) protection of viral RNA from double-stranded RNA-mediated host defenses and RNA interference. For HCV, physical interactions among nonstructural proteins, for example, between NS5A and NS5B,133 have been described. In addition, the determinants of membrane association of HCV nonstructural proteins have been mapped.20, 22, 82 A candidate HCV replication complex, termed the membranous web, was recently identified in tetracycline-regulated cell lines inducibly expressing the entire HCV polyprotein.101 The membranous web harbored all the viral proteins and was very similar to the spongelike inclusions previously revealed by electron microscopy in liver cells of HCV-infected chimpanzees. It was recently shown that the membranous web is indeed the viral replication complex in Huh-7 cells harboring autonomously replicating HCV RNAs (Fig. 7).102 The protein–protein interactions and cellular pathways involved in the formation of this replication complex are under active investigation.

Figure 7.

HCV RNA replication complex. (A) Low-power overview of a Huh-7 cell harboring a subgenomic HCV replicon. A distinct membrane alteration, termed the membranous web (arrows), is found in the juxtanuclear region. Note the circumscribed nature of this specific membrane alteration and the otherwise unaltered cellular organelles. Bar, 1 μm. (B) Higher magnification of a membranous web (arrows) composed of small vesicles embedded in a membrane matrix. Note the close association of the membranous web with the rough endoplasmic reticulum. Bar, 500 nm. The membranous web harbors all HCV nonstructural proteins and nascent viral RNA in Huh-7 cells harboring subgenomic replicons, and therefore represents the HCV RNA replication complex. N, nucleus; ER, endoplasmic reticulum; M, mitochondria. Adapted from Gosert et al.,102 with permission.


Tremendous progress has been made in the characterization of HCV component structures, despite the lack of a productive cell culture system. These insights have helped in understanding the mechanisms of HCV replication and the pathogenesis of HCV-related liver disease. Most importantly, this detailed knowledge of the 3D structure of key viral components holds the promise that efficient and specific inhibitors of HCV replication could be developed within the foreseeable future.


The authors thank Sophana Ung and Rainer Gosert for help in preparing the figures. The authors are grateful to all their co-workers who contributed to the studies cited here.