Carbohydrates can exist as simple sugars and as complex conjugates known as glycans. Glycans mediate a wide variety of events in cell–cell and cell–matrix interactions that are crucial to the development and function of complex multicellular organisms. Glycomic technologies for exploring the structure of complex sugar molecules have emerged in the past two decades, opening up a new frontier which has been called ‘glycobiology’ (1). This review provides an introduction to the structural properties of the linear chain glycans called glycosaminoglycans (GAGs) and their interactions with proteins.
Glycosaminoglycans (GAGs) are important complex carbohydrates that participate in many biological processes through the regulation of their various protein partners. Biochemical, structural biology and molecular modelling approaches have assisted in understanding the molecular basis of such interactions, creating an opportunity to capitalize on the large structural diversity of GAGs in the discovery of new drugs. The complexity of GAG–protein interactions is in part due to the conformational flexibility and underlying sulphation patterns of GAGs, the role of metal ions and the effect of pH on the affinity of binding. Current understanding of the structure of GAGs and their interactions with proteins is here reviewed: the basic structures and functions of GAGs and their proteoglycans, their clinical significance, the three-dimensional features of GAGs, their interactions with proteins and the molecular modelling of heparin binding sites and GAG–protein interactions. This review focuses on some key aspects of GAG structure–function relationships using classical examples that illustrate the specificity of GAG–protein interactions, such as growth factors, anti-thrombin, cytokines and cell adhesion molecules. New approaches to the development of GAG mimetics as possible new glycotherapeutics are also briefly covered.
Basic Features and Functions of GAGs
Glycosaminoglycans are large complex carbohydrate molecules that interact with a wide range of proteins involved in physiological and pathological processes (2,3). Glycosaminoglycans are sometimes known as mucopolysaccharides because of their viscous, lubricating properties, as found in mucous secretions. These molecules are present on all animal cell surfaces and in the extracellular matrix (ECM), and some are known to bind and regulate a number of distinct proteins, including chemokines, cytokines, growth factors, morphogens, enzymes and adhesion molecules (2,4). The key properties of GAGs are summarized in Table 1.
|Physico-chemical properties of GAGs: negatively charged, viscous, lubricating, unbranched polysaccharides, contain repeating disaccharide units, bind large amounts of water, low compressibility.|
|Classification of GAGs: chondroitin sulphate, keratan sulphate, dermatan sulphate, hyaluronan, heparin and heparan sulphate.|
|Function of GAGs: cell adhesion, cell growth and differentiation, cell signalling, anticoagulation.|
Glycosaminoglycans in aqueous solution are surrounded by a shell of water molecules, which makes them occupy an enormous hydrodynamic volume in solution (5). When a solution of GAGs is compressed, the water is squeezed out and the GAGs are forced to occupy a smaller volume. When the compression is removed, GAGs regain their original hydrated volume because of the repulsion arising from their negative charges (5).
Classification of GAGs
Glycosaminoglycans are linear, negatively charged and, in most cases, sulphated polysaccharides that have molecular weights of roughly 10–100 kDa. There are two main types of GAGs. Non-sulphated GAGs include hyaluronic acid (HA), whereas sulphated GAGs include chondroitin sulphate (CS), dermatan sulphate (DS), keratan sulphate (KS), heparin and heparan sulphate (HS). Glycosaminoglycan chains are composed of repeating disaccharide units, which form the disaccharide repeating regions (Table 2). The repeating units are composed of a uronic acid (d-glucuronic acid or l-iduronic acid) and an amino sugar (d-galactosamine or d-glucosamine). Hence, GAGs differ according to the type of hexosamine, hexose or hexuronic acid unit that they contain, as well as the geometry (α or β) of the glycosidic linkage between these units. Chondroitin sulphate and DS, which contain galactosamine, are called galactosaminoglycans, whereas heparin and HS, which contain glucosamine, are called glucosaminoglycans. The amino sugar may be sulphated on carbons 4 or 6 or on the non-acetylated nitrogen, and other positions on the sugar backbone can also be sulphated. As a result, a simple octasaccharide can have over 1 000 000 different sulphation sequences (6). At physiological pH, all carboxylic acid and sulphate groups are deprotonated, giving GAGs very high negative charge densities (heparin has the highest negative charge density of any known biomolecule) (7).
Nomenclature of GAG fragments
The names of the monosaccharides present in GAGs are frequently abbreviated. The most common are three-letter abbreviations for simple monosaccharides (e.g. Gal for galactose, Glc for glucose, Xyl for xylose, Man for mannose). Most of the monosaccharides are assumed to be in the d-configuration, except for iduronic acid (IdoA), which is in the l-configuration. All monosaccharides are assumed to be in the pyranose (p) form (six-membered ring), while all glycosidic linkages are assumed to originate from the anomeric C1 hydroxyl group. These monosaccharides are further classified based on the substitution position and the substituent (such as O-sulphate). If sulphate is attached to the C2 carbon of iduronic acid, it is referred to as IdoA2S. Similarly, glucosamine sulphated on the nitrogen at position 2 and on the oxygen at position 6 can be written as GlcNS6S. The glycosidic linkage between the monosaccharides is in either α or β configuration, involving the anomeric hydroxyl of one monosaccharide and any available hydroxyl group in a second monosaccharide. For example, α(1→4) refers to the α linkage in a disaccharide between the anomeric carbon of the first monosaccharide and the hydroxyl at position 4 of the other monosaccharide. The name begins at the non-reducing end (which is usually highly sulphated) and proceeds towards the reducing end (the sugar with the free anomeric carbon that can be oxidized). The GAG nomenclature system (8) is based on Roman numerals and acronyms. The simple heparin pentasaccharide with Δ4,5-unsaturated uronic acid at the non-reducing end can be written in this system as ΔUA2S(1→4)GlcNS6S(1→4)IdoA2S(1→4)GlcNS6S(1→4)IdoA2S.
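The abbreviated names described above are regular enough to be split mechanically. The following is a minimal sketch in Python, assuming names written exactly in the Residue(1→4)Residue form used here; the helper names `parse_gag_sequence` and `count_sulphates` are hypothetical and do not belong to any published tool.

```python
import re

# Illustrative helpers only (hypothetical names, not a published tool).
# A linkage is written as "(1→4)", optionally preceded by α or β.
LINKAGE = re.compile(r"[αβ]?\((\d)→(\d)\)")
# Assumption: a sulphate is written as "S" preceded by its position,
# e.g. "NS" (N-sulphate), "2S" or "6S" (O-sulphates).
SULPHATE = re.compile(r"(?:N|\d)S")

PENTASACCHARIDE = "ΔUA2S(1→4)GlcNS6S(1→4)IdoA2S(1→4)GlcNS6S(1→4)IdoA2S"


def parse_gag_sequence(name):
    """Split a GAG name into residues and (from, to) linkage positions.

    Residues are returned in the written order, i.e. starting at the
    non-reducing end.
    """
    # re.split keeps the two captured digits, so the list alternates
    # residue, digit, digit, residue, digit, digit, ...
    parts = LINKAGE.split(name)
    residues = parts[0::3]
    linkages = [(int(a), int(b)) for a, b in zip(parts[1::3], parts[2::3])]
    return residues, linkages


def count_sulphates(residue):
    """Count sulphate groups encoded in one residue abbreviation."""
    return len(SULPHATE.findall(residue))


if __name__ == "__main__":
    residues, linkages = parse_gag_sequence(PENTASACCHARIDE)
    for residue in residues:
        print(residue, count_sulphates(residue))
    print("linkages:", linkages)
```

Applied to the pentasaccharide above, the sketch yields five residues joined by four 1→4 linkages. It does not validate residue names, so an unrecognized abbreviation is passed through unchanged.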
Recently, a disaccharide structural code, a new shorthand nomenclature, has been introduced for designating the disaccharide subunit structure of all GAGs (9). Disaccharide structural code assigns each GAG disaccharide a four-character code. The first character assigns the stereochemistry of the uronic acid: U, G, I or D for an undesignated uronic acid, glucuronic acid, iduronic acid or Δ4,5-unsaturated uronic acid, respectively, or g for galactose. The second character is used to define the location of sulphate groups: 0, 2, 3 or 6, representing no sulphate, sulphate at C2 or C3 of uronic acid, or sulphate at C6 of galactose, respectively. The third character designates the type of hexosamine (upper case for glucosamine and lower case for galactosamine) and the N substituent (H, A, S and R for free amine, acetate, sulphate or some other substituent, respectively). The fourth character in the code identifies the pattern of sulphation on the hexosamine (0 for no sulphate, 3 for sulphate at C3, 6 for sulphate at C6, 9 for disulphated glucosamine at C3 and C6, and 10 for disulphated galactosamine at C4 and C6). The pentasaccharide described above can be described as D2-S6I2-S6I2, with all of the isomeric information specified in the code.
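The character-by-character rules above amount to a small look-up scheme. The sketch below, in Python, decodes a single disaccharide code using only the values listed in this section; the function name `decode_dsc` and the returned dictionary layout are assumptions made for illustration, and codes using characters not listed here are rejected rather than guessed.

```python
# Look-up tables transcribed from the rules described in the text.
URONIC = {
    "U": "undesignated uronic acid",
    "G": "glucuronic acid",
    "I": "iduronic acid",
    "D": "Δ4,5-unsaturated uronic acid",
    "g": "galactose",
}
FIRST_SUGAR_SULPHATE = {
    "0": "no sulphate",
    "2": "sulphate at C2",
    "3": "sulphate at C3",
    "6": "sulphate at C6",
}
N_SUBSTITUENT = {
    "H": "free amine",
    "A": "acetate",
    "S": "sulphate",
    "R": "other substituent",
}
HEXOSAMINE_SULPHATE = {
    "0": "no sulphate",
    "3": "sulphate at C3",
    "6": "sulphate at C6",
    "9": "sulphate at C3 and C6",
    "10": "sulphate at C4 and C6",
}


def decode_dsc(code):
    """Decode one disaccharide structural code such as 'D2S6'.

    Hypothetical helper: the last field may be two characters long ('10'),
    so everything after the third character is treated as the fourth field.
    """
    if len(code) < 4:
        raise ValueError(f"code too short: {code!r}")
    first, second, third, fourth = code[0], code[1], code[2], code[3:]
    try:
        return {
            "first sugar": URONIC[first],
            "first sugar sulphation": FIRST_SUGAR_SULPHATE[second],
            # Upper case denotes glucosamine, lower case galactosamine.
            "hexosamine": "glucosamine" if third.isupper() else "galactosamine",
            "N substituent": N_SUBSTITUENT[third.upper()],
            "hexosamine sulphation": HEXOSAMINE_SULPHATE[fourth],
        }
    except KeyError as exc:
        raise ValueError(f"unrecognized character {exc} in {code!r}") from None


if __name__ == "__main__":
    for key, value in decode_dsc("D2S6").items():
        print(f"{key}: {value}")
```

For example, `decode_dsc("D2S6")` describes a Δ4,5-unsaturated uronic acid sulphated at C2, linked to an N-sulphated glucosamine sulphated at C6, which corresponds to the ΔUA2S–GlcNS6S unit at the non-reducing end of the pentasaccharide.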
A linear, canonical description of carbohydrates, LInear Notation for Unique description of Carbohydrate Sequences (LINUCS) (10), has been implemented to describe carbohydrate structures according to the IUPAC nomenclature (8), which includes GAGs. LinearCode (11) is another approach that allows the compact and unambiguous description of complex structures. It uses a simple one- to two-letter representation of monosaccharide units and linkages and a description of branches using a look-up table.
Clinical significance of GAGs
The simplest GAG, the non-sulphated HA, has many important functional roles (12), including signalling activity during embryonic morphogenesis (13), pulmonary and vascular diseases (14) and wound healing (15). Hyaluronic acid also acts in the lubrication of synovial joints and in joint movement, and its function has been described as space filler, wetting agent, flow barrier within the synovium and protector of cartilage surfaces (16). The influence of HA on cancer progression has been well described recently (14,17). The main receptor for HA is CD44 (18), which is expressed on the surface of virtually all stem cells including cancer stem cells. CD44/HA interactions can mediate leukocyte rolling and extravasation in some tissues and changes in CD44 expression contribute to tumour growth (19). Many cells also express the receptor for hyaluronan-mediated motility (RHAMM), which is a major HA-binding membrane protein (20); the RHAMM pathway induces focal adhesions and signals cytoskeletal changes, elevating the cell motility seen in tumour progression, invasion and metastasis.
Anti-coagulation was the first described function for sulphated GAGs (21). Heparin was first discovered in 1917 because of its capacity to prolong the process of blood clotting, an effect due to its potentiating interaction with the natural inhibitor of thrombin, antithrombin III (AT-III). Only about one-third of all heparin chains possess the structures required for AT binding (21). Heparin is mainly used in pharmaceutical products as an anti-coagulant for the treatment of thrombosis, thrombophlebitis and embolism. Pharmaceutical heparin is usually derived from bovine lung or porcine intestinal mucosa (22). Its name is derived from the fact that it was originally isolated from canine liver cells (from the Greek hepar for liver) (23,24). Heparin is structurally heterogeneous and varies in molecular weight because of variations in chain length.
Glycosaminoglycans play a major role in cell signalling and development, angiogenesis (25), axonal growth (26), tumour progression (27,28), metastasis (27,29) and anti-coagulation (30,31). Uncontrolled progenitor cell proliferation leads to malignant tissue transformation and cancer (32,33). Glycosaminoglycans and proteoglycans (PGs, see Proteoglycans) are believed to play a very important role in cell proliferation because they act as co-receptors for growth factors of the fibroblast growth factor (FGF) family. Indeed, members of the FGF family need to interact with both a heparin/HS chain and their high affinity receptor to realize their full signalling potential (25,34). Overexpression of these growth factors may contribute to tumour progression.
Sulphated GAGs are a common constituent in many different types of amyloid, playing an important role in the pathology of amyloid diseases such as amyloid A-amyloidosis, Alzheimer’s disease, type-2 diabetes, Parkinson’s disease and prion diseases (35). These diseases are characterized by the deposition in tissues of fibrillar aggregates of polypeptides. Heparan sulphate is known to bind amyloidogenic peptides in vitro and in vivo, promoting fibril formation and enhancing the disease condition. Heparan sulphate may sometimes be present within the amyloid β-containing deposits in Alzheimer’s diseased brains (36).
Diseases such as rheumatoid arthritis, inflammatory bowel disease and microbial infections are associated with inflammatory responses. Many proteins play a role in the inflammation cascade that leads to the activation of leukocytes and endothelial cells, and ultimately to the extravasation of leukocytes and leukocyte migration into the inflamed or diseased tissue. Glycosaminoglycans such as heparin have important roles in these processes, as adhesion ligands in leukocyte extravasation and carriers/presenters of chemokines and growth factors (37).
Glycosaminoglycans are also known to promote microbial pathogenesis and invasion (38,39) by interacting with several microbial pathogens on cell surfaces and in the ECM. Many pathogenic micro-organisms such as bacteria (e.g. Helicobacter pylori, Bordetella pertussis, Mycobacterium tuberculosis and Chlamydia trachomatis), viruses (e.g. herpes simplex) and protozoa (e.g. Plasmodium and Leishmania) express proteins capable of binding to HS, DS and CS on cell surfaces, and these interactions appear to mediate infection (39). Dengue and foot-and-mouth viruses interact with cell surface heparan sulphate proteoglycans (HSPGs) and promote the concentration of virus particles at the cell surface after subsequent binding to integrin receptors. Heparin is known to exert its anti-HIV-1 activity by binding to the viral surface glycoprotein, gp120 (40,41), thus blocking HIV-1 entry into cells. Similarly, heparin/HS acts as the primary receptor for herpes simplex virus (HSV) (42,43). Heparan sulphate present on cell surfaces interacts with glycoproteins gB, gC and gD (44,45), which are known to be present on the viral envelope and enhance infection (46).
Proteoglycans
In nature, all GAG chains with the exception of HA are covalently linked to a core protein (Figure 1) to give a PG. The linkage of GAGs to the protein core involves a specific trisaccharide composed of two galactose (Gal) residues and a xylose (Xyl) residue (GAG-GalGalXyl-O-CH2-protein). These saccharide residues are coupled to the protein core through an O-glycosidic bond to a serine residue. Some forms of KS are linked to the protein core through an N-asparaginyl bond.
Virtually all mammalian cells produce PGs and either secrete them into the ECM, insert them into the plasma membrane, or store them in secretory granules. Examples of large PGs are aggrecan, the major PG in cartilage, and versican, which is present in many adult tissues including blood vessels and skin. A variety of core proteins have been shown to carry HS chains in the ECM and at the cell surface. Some membrane PGs such as syndecan-1 are hybrid structures known to contain both HS and CS (47).
Proteoglycans exhibit tremendous structural variation due to a number of factors, such as the differential expression of genes encoding core proteins, different exon usage of these genes and variations in the length and types of GAG chains (48). Different numbers of GAG chains with different saccharide sequences can be attached to the various serine residues present in the core protein. There are two major types of HSPGs (49): the syndecans and the glypicans. The core protein of each family differs: the syndecans have an integral membrane protein as their core protein, whereas the glypicans have a glycosylphosphatidylinositol-anchored protein. An HSPG with a core protein of a completely different structure is formed in the ECM (50). Thus, HSPGs are formed both on the cell surface and in the ECM. Proteoglycans are known to have affinity for a variety of ligands, including growth factors, cell adhesion molecules, matrix components, enzymes and enzyme inhibitors (4).
Sulphated GAGs: Heparin/Heparan Sulphate
The highly sulphated analogues heparin and HS have been studied extensively (7) due to their well understood functions in anti-coagulation. Heparin is known to be highly evolutionarily conserved with similar structures found in a broad range of vertebrate and invertebrate organisms (51), such as turkey (52), whale (53), camel (54), mouse (55), human (56), lobster (57), shrimp (58), mussel (59), marine clam species (60) and crab (61). The difference between HS and heparin is quantitative and not qualitative (62), as can be seen in Table 3. Heparan sulphate contains a higher level of acetylated glucosamine and is less sulphated than heparin (63). Heparin is synthesized by and stored exclusively in mast cells, whereas HS is expressed on cell surfaces and in the ECM as part of a PG (48).
Heparin consists of repeating units of 1→4 linked pyranosyluronic acid and 2-amino-2-deoxyglucopyranose (glucosamine) residues. The uronic acid residues typically consist of 90% l-idopyranosyluronic acid (l-iduronic acid) and 10% d-glucopyranosyluronic acid (d-glucuronic acid). The amino group of the glucosamine residue may be substituted with an acetyl or sulphate group, or remain unsubstituted. The 3- and 6-positions of the glucosamine residues can either be substituted with an O-sulphate group or remain unsubstituted. The uronic acid, which can either be l-iduronic or d-glucuronic acid, may also contain a 2-O-sulphate group. Heparan sulphate is structurally related to heparin but is much less substituted with sulphate groups. Like heparin, HS is a repeating linear copolymer of a uronic acid 1→4 linked to glucosamine. d-glucuronic acid predominates in HS, but HS can also contain substantial amounts of l-iduronic acid. Heparan sulphate generally contains about one sulphate group per disaccharide, but its sulphate content varies (48). On the cell surface, the O-sulphonate and N-sulphonate groups are deprotonated in HS and attract positively charged counter ions to form a salt under physiological conditions.
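The substitution options listed above can be enumerated to give a feel for the size of the sequence space. The following Python sketch simply multiplies out the choices named in this paragraph (two uronic acids, optional 2-O-sulphate, three states of the amino group, optional 3-O- and 6-O-sulphates); the variable names and residue labels are illustrative assumptions, and the result is a theoretical upper bound because not every combination is necessarily produced biosynthetically.

```python
from itertools import product

# Choices taken from the paragraph above; labels are illustrative only.
URONIC_ACIDS = ["IdoA", "GlcA"]        # l-iduronic or d-glucuronic acid
URONIC_2O = ["", "2S"]                 # 2-O-sulphate absent or present
AMINO_GROUP = ["NAc", "NS", "NH2"]     # acetylated, sulphated or free amine
GLCN_3O = ["", "3S"]                   # 3-O-sulphate absent or present
GLCN_6O = ["", "6S"]                   # 6-O-sulphate absent or present

# Every theoretical disaccharide: one uronic acid plus one glucosamine.
disaccharides = [
    f"{ua}{s2}-Glc{n}{s3}{s6}"
    for ua, s2, n, s3, s6 in product(
        URONIC_ACIDS, URONIC_2O, AMINO_GROUP, GLCN_3O, GLCN_6O
    )
]

# An octasaccharide contains four disaccharide units, each chosen freely.
UNITS_PER_OCTASACCHARIDE = 4
octasaccharide_sequences = len(disaccharides) ** UNITS_PER_OCTASACCHARIDE

if __name__ == "__main__":
    print("theoretical disaccharides:", len(disaccharides))
    print("theoretical octasaccharide sequences:", octasaccharide_sequences)
```

Under these assumptions there are 48 theoretical disaccharides and 48^4 = 5 308 416 octasaccharide sequences, which is in line with the statement that a simple octasaccharide can have over 1 000 000 different sulphation sequences.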
Heparan sulphate chains often also contain domains of extended sequences having low sulphation compared with heparin, as illustrated in Figure 2. The non-sulphated regions that have a GlcA-GlcNAc (acetylated glucosamine) sequence are the most common in the HS chain, with IdoA-containing sulphated regions (called S-domains) usually of about 5–10 disaccharides (64). There are also relatively minor proportions of mixed sequences, which contain both GlcNSO3 and GlcNAc (called NA-domain). A substantial proportion of the HS chain may consist of alternating GlcA–GlcNAc residues with no sulphate substitution.
The sulphated–acetylated–sulphated domain of HS has been subsequently found to be recognized by a number of chemokines, such as interleukin-8 (IL-8) (65), platelet factor 4 (PF4) (66) and macrophage inflammatory protein 1 alpha (MIP-1α) (67). The IL-8 dimer consists of two α-helical monomers lying on top of two β-sheets forming basic clusters on one face of the dimer. The two S-domains, each consisting of five to six saccharides in HS, accelerate the rate of dimer formation in IL-8 (65,68). The flexibility in the N-acetyl–rich ‘spacer’ or NA-domain (six to seven saccharides) in HS allows more conformational freedom for the simultaneous interactions of two S-domains and brings the monomers of IL-8 in close proximity to form a dimer in an anti-parallel arrangement (Figure 3). On the other hand, interferon-gamma (IFN-γ) does not significantly interact with isolated S-domains (69), in contrast to many other heparin binding proteins. Similarly, basic residues are clustered on both faces of the tetramer of PF4, requiring 21 saccharides in HS to form a more extended binding site on the charged surface of PF4. Heparin is assumed to be an analogue of the S-domains of HS, consisting mainly of sequences of sulphated disaccharides with IdoA2S (iduronic acid sulphated at C-2) and GlcNS6S (2,6-disulphoglucosamine).
Heparin and HS can often be structurally distinguished through their sensitivity towards microbial GAG-degrading enzymes, the heparin lyases. Three major polysaccharide lyases, heparin lyases I, II and III, isolated from Flavobacterium heparinum, are capable of cleaving linkages present in heparin and HS chains (70) by a β-elimination mechanism, and each has a distinct substrate specificity (71,72). These three enzymes share very little homology at the DNA, protein or structural level, which imparts specificity towards the substrates. Heparin lyase I acts primarily on heparin, whereas heparin lyase III exhibits strong specificity for HS. Heparin lyase II is believed to act on heparin as well as on HS through two distinct active sites. The degrading enzymes work on the non-reducing ends, leaving the region at the reducing end of HS usually unmodified (73). In contrast to degradation by the bacterial heparinases, HS degradation by the mammalian endoglycosidase heparanase has also been described in human placenta and rat liver hepatocytes. Heparanase cleaves the glycosidic bond through a hydrolytic mechanism, yielding HS fragments of appreciable size (10–20 sugar units), suggesting that the enzyme recognizes a particular HS structure (74).
Heparan sulphate proteoglycans are the major component of ECM in mammals (75). The structural heterogeneity of HS with respect to the size of the polysaccharide chain, the ratio of IdoA to GlcA units, and the amount and distribution of sulphate groups along the carbohydrate backbone is the result of variations in the biosynthesis of HSPGs. The fine structure of the chains depends on the regulated expression and action of multiple biosynthetic enzymes, such as glycosyltransferases, sulphotransferases and an epimerase, which are arrayed in the lumen of the Golgi apparatus. The reactions catalysed by these enzymes do not go to completion, yielding individual chains whose sequences are likely to be distinct from all other chains (76).
Conformation of Heparin
Heparin is a linear, unbranched polysaccharide that tends to have an extended conformation in solution because of its highly hydrophilic nature arising from its extensive degree of sulphation. Analysis of the conformations of individual sugars within heparin (Figure 4) indicates that unsubstituted IdoA residues exist predominantly in the 4C1 or 1C4 chair form, whereas IdoA residues bearing a sulphate group at position 2 (IdoA2S) exist in equilibrium between a number of different conformations, the most important being the chair (1C4) and skew-boat (2S0) forms (77). Solution NMR studies suggest that IdoA2S prefers the 2S0 and 1C4 conformations, whereas glucosamine sulphated at the N and O positions (GlcNS6S) prefers the 4C1 conformation (78,79). It seems that glucosamine and its derivatives are stable in the 4C1 chair conformation irrespective of substitution (78,80,81).
Heparin oligosaccharides sometimes contain a non-reducing terminal 4-deoxy-l-threo-2-sulphohex-4-enopyranosyluronic acid (unsaturated Δ4-uronic acid, ΔUA2S) residue arising from heparin lyase cleavage of an HS chain. Based on the conformation of the 4,5-double bond, ΔUA2S can exist in either the 2H1 or 1H2 conformations (Figure 4) and the equilibrium between these two conformations is controlled by their substitution pattern. The solution structures of heparin-derived oligosaccharides determined by NMR spectroscopy suggest that the terminal ΔUA2S residue exists predominantly in the 1H2 form, with a minor contribution from the 2H1 conformation (82).
The solution structure of a heparin dodecasaccharide composed of six GlcNS6S–IdoA2S repeat units has been determined using a combination of NMR spectroscopy and molecular modelling techniques (79). The two resulting structures (Figure 5) have been deposited in the Protein Data Bank (PDB) under code 1HPN. One structure has all IdoA2S residues in the 2S0 conformation (Figure 5A) and the other has all IdoA2S residues in the 1C4 conformation (Figure 5B). The three-dimensional structure of heparin is thus complicated by the fact that iduronic acid may be present in either of two low energy conformations when internally positioned within an oligosaccharide. This conformational equilibrium can be influenced by the sulphation state of adjacent glucosamine sugars (83). The 2S0 form appears to be slightly favoured in terms of conformational stability, as it tends to minimize the unfavourable 1,3 diaxial non-bonded interactions that are expected in the 1C4 form, where four of the substituents are axially oriented and only the carboxylate group is equatorial (84). Whilst the spatial orientation of the 2-O-sulphate group in the IdoA2S residues is altered during 1C4–2S0 interconversion, no significant conformational change can be seen in the backbone of the polysaccharide chain in the NMR structures. In these NMR structures heparin adopts a helical conformation, the rotation of which places clusters of sulphate groups at regular intervals of about 17 Å on either side of the helical axis.
The iduronate ring can adopt either the 2S0 or 1C4 forms in the protein-bound state, which enables it to make specific electrostatic interactions with the electropositive surface regions of a protein. Nonetheless, the helical parameters of heparin oligosaccharides are conserved in spite of the conformational flexibility of the l-iduronate residues. NMR studies of a series of modified heparins with systematically altered substitution patterns indicate that all derivatives in the unbound form, regardless of the sulphation pattern, exhibit similar glycosidic bond ψ and ϕ conformations (85). The conformations of these glycosidic linkages are also observed in the X-ray structures of heparin fragments in complex with proteins such as acidic FGFs (86) (PDB codes 1AXM, 2AXM, 1E0O, 1FQ9) and many other heparin structures bound to proteins, such as AT (PDB codes 1AZX, 1E03), bFGF (PDB code 1BFC), annexin V (PDB code 1G5N) and foot and mouth disease virus (PDB code 1QQP).
Depending on the local sequence, the conformation of heparin/HS oligosaccharides may be affected by the degree of flexibility in the disaccharide subunits and the surrounding water and cations (87). A recent theoretical study (88) determined the stable conformations of 1-OMe IdoA2SNa2 (2H1 and 1H2 conformations), 1-OMe GlcNS6SNa2, 1,4-DiOMe GlcNa, 1,4-DiOMe GlcNS3S6SNa3, 1,4-DiOMe IdoA2SNa2 (4C1, 1C4 and 2S0 conformations) and 1,4-DiOMe GlcNS6SNa2 monomers and their ionized forms both in the gas phase and in the presence of solvent and cations. In the gas phase, the 2H1 conformation of the uronate residue is more stable than the 1H2 form observed in the presence of water. The most stable structure was observed to be the 1,4-DiOMe GlcN-S6SNa2 monomer in the skew-boat 2S0 conformation in water. The 1C4 conformation is the most stable form in the presence of anions. In general, the results indicate that the relative stability of cation-heparin ionic interactions is considerably diminished in aqueous solution. Another study using similar saccharides has revealed that only two negatively charged oxygen atoms in the SO3− group are involved in co-ordination of the sodium cation (89).
Various studies of heparin conformations have revealed similar, well-defined molecular structures in terms of overall chain conformation, both in the solid state and in solution, as a result of the flexibility of the pyranose ring of iduronic acid, which results in either the 1C4 or 2S0 conformations (90). However, variations in the primary sequence of GAGs and the degree of sulphation can result in different binding modes with proteins that can affect their activity.
Conformation of Heparin Fragments Bound to Proteins
Iduronate may exist in skew-boat (2S0), chair (1C4) and intermediate ring (a mixture of 1C4 and 2S0) conformations in heparin–protein complex crystal structures depending on either receptor specificity or predominance in solution. The central iduronate in the crystal structure of a heparin pentasaccharide with the foot and mouth disease virus (PDB code 1QQP) has an intermediate conformation, whereas the outer iduronates are in the 1C4 and 2,5B conformations (38). The third iduronate ring in the crystal structure of the complex of a heparin hexasaccharide with bFGF (PDB code 1BFC) adopts a 1C4 chair conformation and the other at the fifth position adopts a 2S0 skew-boat conformation (91). However, in the crystal structure of a heparin tetrasaccharide with annexin V the IdoA2S residue that interacts with the protein adopts a 2S0 skew-boat conformation, whereas the non-interacting IdoA2S residue is in the 1C4 conformation (92). These structures suggest that when heparin binds to a protein, a change in the conformation of the IdoA2S residue may be induced, resulting in a better fit and enhanced binding, whilst the conformation of the less flexible GlcNS6S residue remains unaltered (i.e. 4C1 conformation).
Studies of the conformation and dynamics of a heparin pentasaccharide (denoted as AGA*IA) have investigated its high affinity interactions with AT, both in the crystal (93) and solution states (94). The protein-bound pentasaccharide has a conformation roughly similar to one of the conformations predicted by molecular mechanics calculations for the pentasaccharide, wherein the iduronate residue adopts a conformation between the 2S0 skew-boat and 2,5B conformations (95). NMR studies of the complex of a heparin tetrasaccharide with AT reveal a distinct change in conformation of the glycosidic linkage upon binding and a stabilization of the 1C4 chair conformation of the iduronate residue (96), which is not observed in the solid state structure. A conformational change in the geometry around the glycosidic linkage between the non-reducing end glucosamine and the adjacent glucuronic acid residue is also observed upon binding of heparin to AT, as compared with the solution state (97,98). Similarly, NMR and simulation studies of heparin octasaccharides containing the AT binding pentasaccharide sequence (AGA*IA) indicated that the non-sulphated IdoA residue preceding AGA*IA can exist in the 1C4 conformation when bound to AT and in the 2S0 conformation in the absence of AT (99). These altered conformations of IdoA do not affect the binding affinity for AT (99).
The conformation of heparin oligosaccharides bound to growth factors in solution has been studied using NMR. Studies of heparin tetrasaccharides in the presence of fibroblast growth factors aFGF and bFGF indicate that FGF binding stabilizes the 1C4 conformation of the IdoA2S residue directly involved in binding (98), whereas the solid state structure of the complex of a heparin hexasaccharide with bFGF revealed that one of the IdoA2S residues adopts the 1C4 chair, and the other the 2S0 skew-boat (91). NMR studies also confirmed the crucial role of the 6-O-sulphate group on at least one GlcNS6S residue required for the formation of the complex with aFGF, but not with bFGF (98). This also indicates that GAG-binding site specificity varies among family members. These examples suggest that the complexation of GAGs with proteins induces a change in conformation of IdoA2S, resulting in a better binding mode and high affinity, whilst the conformation of GlcNS6S remains unaltered.
Interactions of Heparin/Heparan Sulphate with Proteins
Numerous studies have identified common structural features in the heparin/HS binding sites of proteins. Different structural (NMR spectroscopy and X-ray crystallography) and molecular modelling approaches have been used to elucidate the three-dimensional features and structure-activity relationships of GAG–protein interactions (100). A list of the different proteins that have been crystallized in complex with heparin oligosaccharides and their characteristics, such as the optimal length required for binding and their binding affinities, can be found in Table 4. Crystal structures of some of these proteins, such as IL-8, PF-4 and NCAM (Neural Cell Adhesion Molecule), are not available in complex with GAGs. A full list of GAG-binding proteins is available in the GAGPROT database.
|Name of protein||Type of protein||PDB code||Size of oligosaccharide||Kd||References|
|NCAM||Adhesion protein||–||5-mer||52 nM||195,196|
|Fibronectin||Adhesion protein||1FNH||8- to 14-mer||μM||122,197|
|IL-8||Chemokine||–||18- to 20-mer||6 μM||47|
|RANTES||Chemokine||1U4L, 1U4M||16- to 18-mer||32 nM||198|
|Annexin V||Extracellular protein||1G5N||8-mer||20 nM||199|
|Annexin A2||Extracellular protein||2HYU, 2HYV||4- to 5-mer||30 nM||121|
|Amyloid P (AP)||Glycoprotein||–||4-mer||μM||200|
|Basic fibroblast growth factor (bFGF)||Growth factor||1BFB, 1BFC||4- to 6-mer||nM||72|
|Acidic fibroblast growth factor (aFGF)||Growth factor||1AXM, 2AXM||4- to 6-mer||nM||68|
|Heparin binding growth associated molecule (HB-GAM)||Growth factor||–||16- to 18-mer||10 nM||201|
|aFGF/ecto-domain of FGF receptor 2 (FGFR2)||Growth factor/receptor||1E0O||12-mer||nM||108|
|bFGF/ecto-domain of FGF receptor 1 (FGFR1)||Growth factor/receptor||1FQ9||12-mer||nM||202|
|Secretory leukocyte protease inhibitor (SLPI)||Protease||–||12- to 14-mer||6 nM||204|
|AT-III||Serpin||1AZX, 1E03, 1NQ9||5-mer (synthetic pentasaccharide)||20 nM||74|
|AT-III/factor Xa||Serpin/protease||2GD4||5-mer (Fondaparinux)||100–200 nM||177|
|Cardiotoxin A3, A5, M4 and M1||Toxin||1XT3||5- to 7-mer||μM||205|
|HIV-1-gp120||Viral pathogen||–||10-mer||0.3 μM||28,206,207|
|Dengue viral envelope protein||Viral pathogen||–||10-mer||15 nM||208|
The nature of GAG–protein interactions
Structure and sequence-based statistical analyses indicate that Asn, Asp, Glu, Gln, Arg, His and Trp are more likely to make up the binding sites for non-sulphated carbohydrates than other amino acids (101–103). The aromatic residue Trp has a significantly higher mean solvent accessibility in carbohydrate binding locations, whereas aliphatic residues Ala, Gly, Ile and Leu, hydrophobic residues which are usually buried inside proteins, do not appear to participate in sugar binding. The aromatic ring in Trp can pack against the hydrophobic face of a sugar molecule.
Strong ionic interactions are expected between GAGs and proteins. Clusters of positively charged basic amino acids on proteins form ion pairs with spatially defined negatively charged sulphate or carboxylate groups on heparin chains. Glycosaminoglycans interact with residues that are prominently exposed on the surface of proteins. The main contribution to binding affinity comes from ionic interactions between the highly acidic sulphate groups and the basic side chains of arginine, lysine and, to a lesser extent, histidine (104). The relative strength of heparin binding by basic amino acid residues has been compared and arginine has been shown to bind 2.5 times more tightly than lysine. The guanidino group in arginine forms more stable hydrogen bonds as well as stronger electrostatic interactions with sulphate groups. The ratio of these two residues is said to define, in part, the affinity of a binding site in a protein for GAGs (105).
The interactions of GAGs with proteins also involve a variety of other types of interactions, including van der Waals (VDW) forces, hydrogen bonds and hydrophobic interactions with the carbohydrate backbone. It has also been observed that heparin-binding domains contain amino acids such as asparagine and glutamine, which are capable of hydrogen bonding. The affinity of heparin-binding proteins for heparin/HS is also enhanced by the presence of polar residues with smaller side chains, such as serine and glycine. These residues provide minimal steric constraints and good flexibility for the interaction with GAGs (106). Ionic interactions contributed 30% to the free energy of binding of heparin to bFGF, and non-ionic forces such as hydrogen bonding and hydrophobic interactions also contributed to the affinity of low molecular weight heparin (LMWH) for bFGF (107). Studies of the interaction of heparin with the brain natriuretic peptide (BNP) revealed that only a small portion of the free energy of binding arises from ionic interactions (6%), whereas the major contribution arises from hydrogen bonding (94%) between polar amino acids on BNP and heparin (108). Hydrophobic interactions can also play an important role in heparin-protein interactions. NMR studies reveal that a tyrosine residue in a synthetic AT peptide makes specific hydrophobic interactions with the N-acetyl group of a GAG pentasaccharide from porcine mucosal heparin (108).
Structural studies of the complex between a heparin pentasaccharide and AT have shown that basic amino acids participate in five to six ionic interactions, contributing 40% of the binding energy, whereas non-ionic interactions are responsible for the remaining 60% of the binding energy (109). Two aromatic residues, Phe 121 and Phe 122, which lie near basic amino acids in the heparin-binding domain, make direct contact with the pentasaccharide. Phe 121 and Phe 122 were mutated to Ala and Leu, respectively, resulting in decreased affinity of heparin for AT. These residues thus appear to play a critical role in heparin binding and AT activation through hydrophobic and VDW interactions (109). The positively charged basic amino acid residues Arg 47, Lys 114, Lys 125, and Arg 129 have been identified as the most important in the heparin binding site of AT using chemically modified, naturally occurring mutant and recombinant ATs (110–113). These basic residues participate in ionic interactions with the negatively charged groups of heparin, as observed in the crystal structure of an AT-pentasaccharide complex (93). Residues Arg 129, Lys 114, and Arg 47 are critical for the heparin-induced conformational change of AT, contributing to the resulting high affinity of interaction (113).
Consensus sequences in GAG binding proteins
The X-ray crystal structures of many GAG-binding proteins have helped to determine the existence of amino acid consensus sequences for GAG binding with common features such as the arrangement of basic amino acids. For example, the heparin-binding sequence WQPPRARI and the sequence WSPW have been identified as the GAG binding motifs at the C-terminal region of fibronectin and thrombospondin, respectively (114,115). Cardin and Weintraub (116) analysed the structures of 21 heparin-binding proteins and proposed that typical heparin-binding sites have the sequence XBBXBX or XBBBXXBX, where B is a lysine or arginine (with a very rare occurrence of His) and X is a hydropathic residue. The ‘X’ in the consensus sequences was defined as a hydropathic residue based on the frequency of occurrence of residues at specific positions from known heparin binding proteins. The residues Asn, Ser, Ala, Gly, Ile, Leu and Tyr were more common at positions ‘X’. Residues such as Cys, Glu, Asp, Met, Phe and Trp exhibited a very low occurrence at positions ‘X’ in either the α-helical or β-sheet domains of heparin binding proteins. Table 5 lists the heparin-binding proteins that contain the Cardin–Weintraub consensus sequence.
|Protein C inhibitor||GLSEKTLRKWLKMFKK|
|Insulin-like growth factor-binding protein-3||DKKGFYKKKQCRPSKG|
Depending on the secondary structure of the protein, very few residues in these consensus sequences may actually participate in GAG binding. Glycosaminoglycan-binding sites are often found along one exposed face of a protein and sometimes wrap around multiple faces in the case of β-sheets.
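The consensus templates described above lend themselves to a simple sequence scan. The following is a minimal sketch, not the procedure used by Cardin and Weintraub: it assumes that B is Lys or Arg only and that X is any non-basic residue, which is a simplification of the "hydropathic residue" definition. The function names are hypothetical and introduced only for illustration.

```python
import re

# B = basic residue (Lys or Arg); X = any non-basic residue.
# Treating X as "any non-basic residue" is a simplifying assumption of this sketch.
BASIC = "KR"


def build_pattern(template, basic=BASIC):
    """Turn a template such as 'XBBXBX' into a compiled regular expression."""
    parts = {"B": f"[{basic}]", "X": f"[^{basic}]"}
    body = "".join(parts[c] for c in template)
    # A lookahead lets overlapping occurrences be reported as well.
    return re.compile(f"(?=({body}))")


def find_motifs(sequence, templates=("XBBXBX", "XBBBXXBX"), basic=BASIC):
    """Return (start_index, template, matched_segment) for every hit."""
    hits = []
    for template in templates:
        pattern = build_pattern(template, basic)
        for match in pattern.finditer(sequence.upper()):
            hits.append((match.start(), template, match.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    examples = {
        "Protein C inhibitor": "GLSEKTLRKWLKMFKK",
        "Insulin-like growth factor-binding protein-3": "DKKGFYKKKQCRPSKG",
    }
    for name, seq in examples.items():
        print(name, find_motifs(seq))
```

Under these strict assumptions the insulin-like growth factor-binding protein-3 segment contains one XBBBXXBX match (YKKKQCRP), whereas the protein C inhibitor segment matches neither template exactly. A strict pattern match is therefore best treated as a first filter rather than as a definition of a heparin-binding site.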
The basic amino acids of the sequence XBBBXXBX, when they belong to an α-helix, are usually displayed on one side, forming an amphipathic helical arrangement (Figure 6A). Therefore, in order to interact with a linear GAG chain, it would be predicted that the positively charged amino acid residues in α-helical proteins would have to line up along the same side of the protein segment. Comparative analysis of heparin binding sequences has shown that basic amino acids are generally located about 20 Å apart (Figure 6B) in an amphipathic helical structure, and the same spatial arrangement is preserved in a β-strand structure (117). For example, the sulphates that mimic HS in the Artemin crystal structure (118) were found to be separated by approximately 8–9 Å and arranged at the vertices of an approximate equilateral triangle in the prehelix (with a positively charged heparin consensus sequence XBBXBX) and amino-terminal regions.
In β-strands, the positively charged residues in a GAG-binding protein are arranged differently from those in α-helical structures. The basic amino acids in the sequence XBBXBX line up on one face of a β-strand, whereas the hydropathic residues point back into the protein core. Examples of β-sheet heparin binding proteins are the cobra cardiotoxins, which contain nine discontinuous basic residues following the pattern -B-X(2n−1)-B-, where X is any residue and B is a basic residue, so that consecutive basic residues are separated by an odd number of other residues (119).
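The odd-spacing rule seen in the cardiotoxins follows from simple geometry. In an idealized β-strand, consecutive side chains point to alternate faces, so two residues separated by an odd number of intervening residues share a face. The sketch below assumes this ideal alternation and ignores strand twist and bulges; the helper names and the test sequence are hypothetical.

```python
def same_face(i, j):
    """True if residues i and j point to the same face of an ideal beta-strand.

    Side chains alternate faces, so an even index difference (an odd number
    of intervening residues) places both residues on the same face.
    """
    return (j - i) % 2 == 0


def basic_positions(sequence, basic="KR"):
    """Zero-based indices of basic residues in the sequence."""
    return [k for k, aa in enumerate(sequence.upper()) if aa in basic]


def faces(sequence, basic="KR"):
    """Split the basic residue positions between the two faces of the strand."""
    positions = basic_positions(sequence, basic)
    even_face = [p for p in positions if p % 2 == 0]
    odd_face = [p for p in positions if p % 2 == 1]
    return even_face, odd_face


if __name__ == "__main__":
    strand = "KAKAAAK"  # hypothetical strand: spacings of 1 and 3 residues
    print(faces(strand))
```

In the hypothetical strand, all three lysines fall on the same face because each pair is separated by an odd number of residues.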
A third consensus sequence was similarly proposed in the heparin binding protein von Willebrand factor: XBBBXXBBBXXBBX, where ‘B’ represents a cationic residue (120). The consensus sequence TXXBXXTBXXXTBB, as shown in Figure 6C, was also observed in aFGF, bFGF and transforming growth factor β-1 (TGFβ-1), where T defines a turn, B a basic amino acid (arginine or lysine) and X a hydropathic residue. The spatial distance between each of the three turns present in the consensus of these crystal structures was 12–18 Å (105).
It should be noted that the spacing of clusters of basic residues can also provide structural clues about heparin-binding sites that may be important for interactions with GAGs and which can facilitate the design of peptides that bind heparin efficiently (105).
Glycosaminoglycan-binding sites are often not conserved between proteins, as observed in the case of chemokines, which have high structural similarity but do not share common GAG binding regions (121). Platelet factor 4 (PF4) and IL-8 are members of the α-chemokine family that have very similar monomeric three-dimensional structures, with anti-parallel β-strands and an α-helix in the C-terminus. PF4 has a heparin/HS binding consensus sequence ‘KKIIKK’, where K is lysine and I is an isoleucine protruding from the α-helix. The GAG consensus sequence in the equivalent α-helical domain of IL-8 is ‘KENWVQRVVEKFLKR’, which is responsible for heparin/HS binding. The heparin/HS binding proteins of the β-chemokine subfamily [e.g. MIP-1α, regulated on activation, normal, T-cell expressed, and secreted (RANTES)] use a different structural motif, ‘KRNR’. Members of both chemokine α and β families have additional residues and hence lack conservation in their GAG-binding regions, allowing specificity and selectivity of HS binding across chemokines.
Structural properties of GAG-binding proteins
The X-ray crystal structures of heparin–protein complexes have provided information on the structural features required for heparin binding, such as the protein fold, the periodicity of clusters of basic residues, the periodicity of sulphate clusters on the GAG chains and the sulphation level required for interactions with the binding site (105). Heparin binding sites can be formed by basic amino acids that are distant in sequence but are brought spatially close together in the final fold of the protein. The end-to-end lengths of these extended clusters are comparable to the minimum GAG chain lengths that are required for binding (typically 6–12 monosaccharide units, approximately 25–50 Å long). The binding of GAG fragments to chemokines has a strong length dependence but it is clearly not the only determinant of selectivity (122).
The periodicity of sulphate group clusters in an oligosaccharide chain can play a key role in determining the structure of the GAG binding site on the surface of either α-helical or β-sheet proteins. The regular periodicity of sulphate group clusters along one side of an oligosaccharide chain was consistent with the ability of heparin to induce an α-helical structure in polylysine peptides, allowing electrostatic interactions every three peptide turns between a HS cluster and a zeta-amino group of a polylysine peptide (123). A heparin octasaccharide was the minimal fragment size required for such interactions to occur with the polylysine peptide. A similar phenomenon has been detected for several lysine-rich regions in the Tau protein (124), wherein a heparin oligosaccharide wraps tightly around the outer surface of the (double) pleated sheets, inducing secondary structural changes and thereby neutralizing the inhibitory charge repulsions that would occur in a parallel stacking of the repeat regions formed by a polylysine stretch.
The relative proportion of N- and O-linked sulphate groups and N-linked acetyl groups in heparin/HS can affect their interaction with proteins. In the case of RANTES, O-sulphation appears to be more important than N-sulphation (122). Macrophage inflammatory protein-1 alpha, monocyte chemoattractant protein-1 (MCP-1) and IL-8 showed a preference for both N- and O-sulphation. The binding of GAG fragments to chemokines requires both N- and O-sulphation (122). In addition, binding studies involving chemically modified heparins or HS preparations have shown that 2-O- and N-sulphate groups are important for interactions with bFGF (Figure 7), whereas a 6-O-sulphate group is not required for binding. Glycosaminoglycan fragments require 2-O-, 6-O- and N-sulphate groups for optimal interaction with the HIV-Tat protein (125).
Heparin–AT interactions: a case study of GAG–protein binding
The anti-coagulant activity of heparin arises primarily through its activation of the AT-mediated inhibition of blood coagulation factors such as thrombin and factor Xa, as depicted in Figure 8. The interaction of AT and its target coagulation factors with heparin involves a number of affinity states that culminate in a high affinity interaction. Heparin activates AT by two different mechanisms, namely conformational activation and bridging (126). First, the interaction between GAG and AT is mediated by a well-defined unique pentasaccharide sequence within heparin. This interaction generates a conformational change in the structure of AT, which enables additional interactions between AT and heparin, resulting in stronger binding. The conformational change also expels the protease reactive centre loop (RCL) of AT. This conformational activation mechanism promotes binding of the RCL to the active site of factor Xa. After complex formation, the AT-III interaction reverts to low-affinity binding followed by RCL cleavage, resulting in the release of heparin from the covalent AT-III–factor Xa complex. These conformational changes in the RCL do not affect the binding of AT to thrombin, owing to differences in the active site of this proteinase. Instead, full length heparin promotes the interaction of AT with thrombin by means of a bridging mechanism of activation, in which a positively charged surface on thrombin binds non-specifically to the extended heparin polysaccharide (Figure 8).
Several theories have been proposed in relation to the length dependence of the interaction of heparin with AT and serine proteases. Heparin chains of at least 16 saccharides in length are required to accelerate the reaction of AT with thrombin, even though only the pentasaccharide sequence is necessary to bind to AT (127). By contrast, heparin chains as small as the AT binding pentasaccharide are able to accelerate the inactivation of the other target coagulation enzymes, such as factor Xa.
The extracellular domains of fibroblast growth factors aFGF (FGF-1) and bFGF (FGF-2) have been extensively studied to determine the thermodynamics and kinetics of their interactions with heparin. These growth factors exert their biological effects by binding to different, specific cell surface FGF receptors (FGFRs). In the high-resolution X-ray crystal structure of a 2:2:2 dimeric ternary complex of bFGF, FGFR-1 and a heparin decasaccharide, heparin makes numerous contacts with both bFGF and FGFR-1, stabilizing the FGF-FGFR interaction (128). Heparin also makes contacts with the FGFR-1 of the adjacent FGF-FGFR complex, thus seemingly promoting FGFR dimerization (Figure 9). The 6-O-sulphate groups of heparin play a major role in promoting these interactions (129).
The crystal structure of a 2:2:1 complex of aFGF, FGFR-2, and a heparin decasaccharide has also been determined at a resolution of 2.8 Å (130). The complex is assembled around a central asymmetric heparin molecule linking two aFGF ligands into a dimer that bridges the interaction between the two receptor chains (Figure 10). The heparin fragment makes contact with both aFGF molecules but only with one receptor chain. It is clear that different members of the FGF family and their respective receptors may interact differently with heparin/HS as a result of the heterogeneity in the structure of HSPGs and FGFRs on cell surfaces in different tissues. It has been reported that aFGF may recognize several conformations of the iduronic residues of a heparin hexasaccharide. It is believed that the hexasaccharide undergoes local 1C4–2S0 equilibrium conformational changes as a result of ionic interactions with flexible Arg and Lys side chains present in the protein (131).
Role of pH in GAG Binding
Certain heparin/HS–protein interactions are regulated by pH. Alteration of the pH can have profound effects on the ability of some proteins to bind heparin or HS. This is the case of the synthetic beta-amyloid peptide (Aβ) (132), selenoprotein P (133), the granulocyte macrophage colony stimulating factor (GM-CSF) (134), the mouse mast cell protease 7 (135), the stromal cell-derived factor-1 (SDF-1) (136) and the platelet endothelial cell adhesion molecule 1 (PECAM-1) (137,138). The modifying effect of pH arises particularly when the GAG binding site in a protein contains histidines, because the side chains of these amino acids have a pKa of approximately 6. Hence, if the pH falls closer to 6 an increasingly larger proportion of histidines will become protonated and hence positively charged, thus favouring the formation of electrostatic interactions with the negatively charged sulphate groups of GAGs (Figure 11).
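The pH dependence described above can be made concrete with the Henderson–Hasselbalch relationship. The sketch below assumes an isolated histidine side chain with a pKa of 6.0; in a folded protein the effective pKa of each histidine can shift with its local environment, so the numbers are indicative only.

```python
def fraction_protonated(ph, pka=6.0):
    """Fraction of a titratable group in its protonated (here, cationic) form.

    Derived from Henderson-Hasselbalch: pH = pKa + log10([base]/[acid]).
    The default pKa of 6.0 is an assumed value for a histidine side chain.
    """
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


if __name__ == "__main__":
    for ph in (5.0, 5.5, 6.0, 6.5, 7.0, 7.4):
        print(f"pH {ph:.1f}: {fraction_protonated(ph):.2f} protonated")
```

With these assumptions, half of the histidines carry a positive charge at pH 6, compared with only a few per cent at pH 7.4. This is consistent with stronger electrostatic attraction to sulphate groups under mildly acidic conditions.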
A further example is that of mouse mast cell protease 6 (MCP-6). Molecular modelling of MCP-6 identified four conserved, pH dependent and surface exposed histidine residues (His 35, His 106, His 108 and His 238) that mediate the interaction of the protein with heparin in a pH dependent fashion (139). The electropositive nature of the surface of the protease, as shown in Figure 11, is due to the presence of protonated histidines that can make favourable interactions with GAGs, as compared with the accessible surface in the presence of deprotonated histidines. Histidine proline-rich glycoprotein (HPRG) is another example wherein binding to heparin is minimal at neutral pH but increases rapidly to a maximum at pH 6.5 (140). At an intermediate pH, both the protonation of histidines and the binding of zinc promote the interaction of HPRG with heparin. It is probable that there is a pH range where all histidines will be protonated, whereas most, if not all, of the glutamic and aspartic acid residues will still be negatively charged. This is likely to be the most favourable situation for heparin binding.
Effect of Metal Ions on GAG Binding
Sulphated GAG chains also bind strongly to divalent metal ions present in proteins or in solution (87). Molecular modelling and NMR studies have indicated the binding preferences of heparin for Ca2+ in solution (141,142), which are similar to the co-ordination observed in heparin-metal-protein complexes such as those of annexin proteins (92,143). The carboxylate groups of the iduronate residue and the N-sulphate and 6-O-sulphate of GlcNS6S are essential for Ca2+ binding, in contrast to the 2-O-sulphate of IdoA2S. Combined modelling and NMR studies have indicated that the heparin-Ca2+ binding site has a preference for the 1C4 iduronate form.
Divalent cation binding may be expected to influence the specificity and affinity of GAG–protein interactions. The binding of heparin/HS to proteins is enhanced in the presence of divalent cations such as zinc, as is the case for endostatin (144). Crystallographic studies of the complexes of heparin-derived oligosaccharides with human annexin A2 suggest that this protein exhibits significant Ca2+-dependent heparin-binding properties (Figure 12) at pH 7.4, either as a monomeric protein or as a component of an A2t heterotetramer (143). In the complex of a heparin oligosaccharide with annexin V, the calcium cation does not interact directly with the heparin fragment but induces the conformation of protein loops necessary for binding (92). Glycosaminoglycans also bind to prion proteins (PrP) at pH values above the pKa of histidine and in a metal ion-dependent fashion (145). Prion protein–GAG complexes are stabilized by Cu2+ or Zn2+, and PrP–GAG interactions are mediated largely by protonated and Cu(II)-bound His side-chains present at the N-terminal domain of PrP. Divalent cations were not found to be a prerequisite for the interaction of GAGs with lipoproteins but were found to stabilize their resulting complexes. It has been observed that Mn2+ is better than Mg2+ or Ca2+ at promoting stronger binding between the acidic groups of heparin and the phospholipid portion of low density lipoproteins (146).
It is known that Zn2+ binds selectively to heparin rather than to other GAGs (147), which suggests that binding of divalent cations to GAG chains is not always a simple electrostatic interaction between the negatively charged groups on the carbohydrate and the positively charged metal ion. NMR studies have revealed that iduronic acid is the main binding site in heparin for divalent cations. Spectral data also revealed that Zn2+ binding controls the ring conformation of iduronate in heparin and HS, with the 1C4 ring conformation being stabilized over the 2S0 conformation (148,149).
Molecular Modelling of GAGs and GAG–Protein Interactions
In view of the limited structural knowledge available on GAG–protein interactions and the phenomenal structural diversity of heparin and HS, molecular modelling approaches have assisted the understanding of GAG binding affinity and specificity. Glycosaminoglycans are challenging from a molecular modelling perspective because of their high negative charge density and their conformational flexibility. Protein side chains also have a high degree of conformational flexibility. Hence, if all possible conformations of the sulphate and hydroxyl groups on a GAG oligosaccharide and all rotamers of charged side chains in a protein are to be taken into account, an accurate prediction of GAG–protein binding becomes an extremely challenging task.
Several molecular modelling techniques have been described in the literature for the successful prediction of sulphated GAG binding sites on the surface of proteins and for the prediction of their relative affinities. These methods include energy mapping of ligand probes on the surface of proteins, molecular docking and scoring, and molecular dynamics (MD) simulations (150). The prediction of binding sites for GAGs can also be validated upon survey of bound sulphates originating from the crystallization buffer found in the crystal structures of proteins (137,151).
Prediction of GAG binding sites on protein surfaces using GRID
The prediction of the location of GAG binding sites on the surface of proteins has been attempted by searching for the most positively charged patches of amino acids. The GRID algorithm (152), using atom probes to represent polar or charged groups on saccharide molecules, has been used successfully to map the most energetically favourable positions where sulphate groups may bind to the surface of proteins. Such studies have been performed with a number of proteins such as aFGF, bFGF, AT and IL-8 (153). The mapping of sulphate interaction energies can be first computed using GRID and then followed by ligand–protein docking to predict the most favourable anchoring position for a charged sulphate group on the surface of proteins. Different binding modes were proposed in this way for the interactions of HS with the chemokines RANTES, MIP-1α, and chemokine domain of fractalkine, showing that the different types of interactions that may arise on the surface of proteins are determined by their three-dimensional structures (151). However, this study did not consider an analysis of the optimum GAG sequence required for binding (i.e. effect of N and O-sulphation).
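The idea of mapping favourable positions for a charged probe can be illustrated with a deliberately simplified model. The sketch below is not the GRID algorithm, whose energy function includes further terms. It evaluates only a Coulomb energy for a negatively charged point probe on a cubic grid around a few point charges, with a uniform dielectric constant and a crude distance cutoff standing in for steric exclusion. All coordinates, charges and parameter values are hypothetical.

```python
import itertools
import math

COULOMB = 332.06  # kcal*Angstrom/(mol*e^2), electrostatic conversion constant


def probe_energy(point, charges, probe_charge=-1.0, dielectric=80.0, min_dist=3.0):
    """Coulomb energy of a point probe; None if it clashes with any charge site."""
    energy = 0.0
    for position, q in charges:
        r = math.dist(point, position)
        if r < min_dist:
            return None  # crude stand-in for steric exclusion
        energy += COULOMB * q * probe_charge / (dielectric * r)
    return energy


def map_grid(charges, lo=-10.0, hi=10.0, step=1.0):
    """Score every allowed grid point and return (energy, point) sorted by energy."""
    n = int(round((hi - lo) / step)) + 1
    axis = [lo + k * step for k in range(n)]
    scored = []
    for point in itertools.product(axis, repeat=3):
        e = probe_energy(point, charges)
        if e is not None:
            scored.append((e, point))
    return sorted(scored)


if __name__ == "__main__":
    # Hypothetical site: two basic side chains (+1) and one acidic side chain (-1).
    site = [((0.0, 0.0, 0.0), 1.0), ((6.0, 0.0, 0.0), 1.0), ((-8.0, 0.0, 0.0), -1.0)]
    best_energy, best_point = map_grid(site)[0]
    print(best_point, round(best_energy, 2))
```

In this toy system the most favourable probe position lies midway between the two positive charges, which mirrors the way sulphate groups are expected to sit between clustered basic side chains. A realistic calculation would also need van der Waals and hydrogen-bond terms and a treatment of solvent.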
The purpose of docking GAG fragments to the surface of a protein is to identify the likely position of its heparin-binding site(s), predict the binding mode of GAG fragments and obtain an estimate of the free energy of binding (and dissociation constant). Most docking studies reported so far for heparin-binding proteins have focussed on predicting the amino acids that make up the heparin binding sites, but there have been few reports on the calculation of binding affinities (137,154).
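The free energy of binding and the dissociation constant mentioned above are related by ΔG° = RT ln Kd, taking a 1 M standard state. The short sketch below converts between the two. The temperature of 298.15 K is an assumption, and the Kd values are taken from Table 4 purely as worked examples.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def delta_g_from_kd(kd_molar, temperature=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant in M."""
    return R_KCAL * temperature * math.log(kd_molar)


def kd_from_delta_g(delta_g, temperature=298.15):
    """Dissociation constant (M) from a standard binding free energy in kcal/mol."""
    return math.exp(delta_g / (R_KCAL * temperature))


if __name__ == "__main__":
    examples = {"AT-III": 20e-9, "HIV-1-gp120": 0.3e-6, "IL-8": 6e-6}
    for name, kd in examples.items():
        print(f"{name}: Kd = {kd:.1e} M, dG = {delta_g_from_kd(kd):.1f} kcal/mol")
```

A Kd of 20 nM corresponds to roughly −10.5 kcal/mol at 25 °C, and each tenfold change in Kd shifts ΔG° by about 1.4 kcal/mol. This gives a sense of the accuracy that a scoring function needs in order to rank GAG ligands reliably.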
Simulated annealing and genetic algorithms have been extensively used to dock GAGs to their putative proteins or receptors (137,144,154–158). It is clear that the prediction of binding energies of heparin and related sulphated GAGs to their biological targets requires a large number of docking evaluations in order to achieve energy convergence and sufficient conformational sampling.
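To illustrate why many evaluations are needed, the following is a minimal sketch of simulated annealing on a toy energy surface. It is not taken from any of the cited docking programs: the pose is reduced to a short list of numbers, the energy function is a hypothetical rugged surface, and the cooling schedule and step size are arbitrary choices.

```python
import math
import random


def anneal(energy, x0, steps=5000, t_start=5.0, t_end=0.01, step_size=0.5, seed=0):
    """Minimize energy(x) by simulated annealing; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x, e = list(x0), energy(x0)
    best_x, best_e = list(x), e
    for k in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        trial = [xi + rng.uniform(-step_size, step_size) for xi in x]
        e_trial = energy(trial)
        # Metropolis criterion: always accept downhill moves, sometimes uphill ones.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
    return best_x, best_e


def toy_energy(pose):
    """Hypothetical rugged surface standing in for a docking score."""
    return sum(xi * xi + 2.0 * math.sin(5.0 * xi) for xi in pose)


if __name__ == "__main__":
    start = [3.0, -2.0]
    pose, score = anneal(toy_energy, start)
    print(pose, score)
```

Even this two-parameter surface has many local minima. A flexible GAG oligosaccharide adds torsional, ring-conformation and rigid-body degrees of freedom, so the number of evaluations required for convergence grows rapidly with fragment length.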
Several ligand–protein docking studies have been reported for the prediction of heparin-binding sites on AT, aFGF and bFGF, which have been contrasted with the X-ray crystal structures of the complexes of these proteins with heparin oligosaccharide fragments (153). After correctly predicting the location of heparin binding sites in these proteins, docking simulations were also used to predict which are the key residues that make up the heparin binding site of IL-8 (153). Molecular modelling studies have also been carried out to predict the binding of a heparin hexasaccharide to the multi-component complex between bFGF and FGFR1, with findings consistent with experimental data on the binding mechanism of bFGF to its receptor, the dimerization of the receptor, and site-specific mutagenesis and biochemical cross-linking data (159). Molecular docking has also been used to predict that long heparin fragments such as a dodecasaccharide or a tetradecasaccharide are required for binding to the dimer of chemokine SDF-1α (160). In another study, different protein models for the dimer of MIP-1α were built on the basis of the crystal structures of PF4 and IL-8 (67). Docking simulations using heparin penta- and endecasaccharides predicted the interaction of these GAGs with the S-domains (requiring fragments 12–14 saccharides long) and the electropositive surface on the opposite face of the MIP-1α dimer (67).
A study of the interaction between a heparin pentasaccharide and AT has been carried out, despite the difficulty posed by the known conformational change that occurs in the protein upon ligand binding. Homology modelling of the protein structure and manual docking of the pentasaccharide were used to determine the basic amino acids involved in the recognition of the sulphate and carboxylate groups of the oligosaccharide. These predictions were then confirmed by automated docking simulations (161). The crystal structure of the complex between AT and the pentasaccharide revealed the existence of contacts between heparin and arginine and lysine residues on three different helices in the protein (93). The crystal structures of ternary complexes of AT, thrombin and heparin and AT and factor Xa and heparin, provided further insights into the large conformational changes that occur in AT upon activation.
Docking simulations have been used to predict the binding mode of a heparin oligosaccharide onto the surface of endostatin (144), as well as to determine the binding mode of a hexasaccharide to aFGF (162). In the case of aFGF, most of the low energy docking poses of a hexasaccharide were oriented towards Lys127 and Lys142 on the surface of the protein.
Docking methods have also been used to screen a combinatorial virtual library of heparin/HS hexasaccharides against the crystal structure of AT, identifying high specificity heparin/HS sequences (163). A good correlation was observed between the GOLD docking scores and the experimental binding affinity. Different methods have been used to dock heparin fragments onto activated protein C (APC), supported by experimental data (164). Short heparin oligosaccharides were determined to bind to various loops in APC, impairing the interaction of APC with factor Va during APC-catalysed cleavage.
Current docking methods aimed at predicting high affinity GAG sequences generally fail to take into account any conformational changes that may occur in the protein receptor. In addition, GAG oligosaccharides have many rotatable bonds (large degrees of freedom), posing a significant challenge for the search of the correct binding mode, particularly for molecules larger than a pentasaccharide. An additional problem arises due to the presence of distant, discontinuous heparin binding sites. Because most docking methods perform coarse docking, two model oligosaccharide fragments are needed for improved accuracy: one in which all IdoA2S residues are in the 1C4 ring conformation and another one in which they adopt the 2S0 ring conformation. Some of these limitations are being overcome by recent docking and scoring methods, such as those implemented in Glide (Schrödinger LLC) and AutoDock 4.0, which take into account partial flexibility of the protein and allow full treatment of ligand flexibility. Monte Carlo Multiple Minima (MCMM) calculations have also been used to try to overcome the problem of ligand flexibility (165) in the search for binding modes of cyclitols (GAG-like sulphated molecules) in complex with aFGF (166).
Recent docking simulations of GAG fragments to the homology model of PECAM-1 considered the sugars in different 1C4, 4C1 and 2S0 conformations (137,138). Glycosaminoglycan binding sites were initially predicted on the basis of a survey of sulphate groups in known crystal structures of proteins followed by docking of heparin fragments of various size and ring conformations (Figure 13). AutoDock scores gave a good correlation with experimental data suggesting the existence of high and low affinity GAG binding sites in PECAM-1. Docking simulations also predicted the effect of pH on binding of GAGs due to the presence of a key protonated histidine residue in Ig-domains 2 and 3. When these calculations were repeated at neutral pH, the free energy of binding increased to approximately 3 kcal/mole because of the loss of ionic interactions. The free energy of binding was determined to decrease with the increase in the size of the heparin fragments, with the optimum size being a pentasaccharide for an interaction with the ‘closed’ conformation of the receptor (137,138).
Energy scoring of GAG–protein interactions
The accurate computational prediction of the affinity of binding of GAG–protein complexes is still in its infancy. This is mostly due to the poorly defined contribution of water (solvation/desolvation) to the binding interaction and current limitations in the force fields (such as in the parameterization of charges and bonded terms and the neglect of polarization effects) and scoring functions used to represent GAG structure, dynamics and interactions.
Specific scoring functions have been developed for ranking the binding modes of non-GAG carbohydrates to proteins (167,168), which can take into account CH···π interactions, hydrogen bonds and electrostatic interactions. These functions can also be used for the sugar backbone present in sulphated GAGs but have not been optimized to deal with anionic substituents, such as sulphates and carboxylic acid groups. The GScore and EScore functions implemented in Glide (Schrödinger) have been used successfully for the screening of interactions of GAGs and their mimetics with FGFs, with good correlation with experimental binding affinities (169,170). Both scoring functions performed equally well in predicting binding modes and the orientation of sulphate groups; however, EScore was better at predicting binding affinities.
The biomolecular ligand energy evaluation protocol (BLEEP) method has been used successfully to identify low-energy binding modes of heparin fragments (171). Various conformations of heparin were generated and the structure of human bFGF was kept rigid, all in the presence of explicit water molecules. The method correctly assigned the lowest energy to the binding mode observed in the crystal structure, indicating that its potential of mean force score (PMFscore) function is able to measure correctly the interaction energies of GAGs.
The linear interaction energy (LIE) method (172) and MCMM conformational search have been used to predict the binding affinities of tetracyclitols to aFGF and bFGF (166), but met with little success. This is due to the large magnitude of the electrostatic contributions to the free energy, resulting in unreasonable LIE coefficients.
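The form of the LIE estimate helps to explain this difficulty. In the standard LIE expression, the binding free energy is a weighted sum of the differences in average ligand–surroundings van der Waals and electrostatic energies between the bound and free states. The sketch below is illustrative only: the coefficient defaults (alpha, beta, gamma) are placeholder values of the kind commonly quoted for LIE, and the energy samples are invented numbers, not data from the cited study.

```python
from statistics import mean


def lie_binding_energy(vdw_bound, vdw_free, elec_bound, elec_free,
                       alpha=0.18, beta=0.5, gamma=0.0):
    """Linear interaction energy estimate (same units as the inputs).

    dG = alpha * (<V_vdw>_bound - <V_vdw>_free)
       + beta  * (<V_elec>_bound - <V_elec>_free) + gamma

    alpha, beta and gamma are placeholder coefficients (assumptions);
    in practice they are fitted or taken from the literature.
    """
    d_vdw = mean(vdw_bound) - mean(vdw_free)
    d_elec = mean(elec_bound) - mean(elec_free)
    return alpha * d_vdw + beta * d_elec + gamma


if __name__ == "__main__":
    # Invented ligand-surroundings energies (kcal/mol) from two MD "frames".
    dg = lie_binding_energy(
        vdw_bound=[-40.0, -42.0], vdw_free=[-20.0, -22.0],
        elec_bound=[-300.0, -310.0], elec_free=[-280.0, -290.0],
    )
    print(f"LIE estimate: {dg:.1f} kcal/mol")
```

For a highly charged ligand, the electrostatic averages are large in both states, so the estimate depends on a small difference between two large numbers, and fitting beta to experimental affinities can yield physically unreasonable values.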
Molecular dynamics simulations
Molecular dynamics simulations have been reported for complexes of oligosaccharides with proteins such as galectin-1 (152) and endo-1,4-β-xylanase II (XynII) (173), but very few MD simulations have been performed for sulphated GAGs such as heparin and HS. A 5 ns MD simulation has been reported for the complex of a heparin disaccharide with IL-8, in which the binding affinities of Ala mutants of basic residues (histidines, lysines and arginines) were predicted relative to wild-type IL-8, and the structural stability of the monomeric and dimeric forms of IL-8 was also determined (68).
Molecular dynamics simulations have also been performed for the complex of a heparin pentasaccharide with AT in order to characterize the energetic contribution of important amino acids required for the interaction with GAG fragments and the ability of GAG fragments to induce the observed conformational change in AT (174). These simulations revealed that there is no specific conformational requirement for IdoA2S, as either the skew-boat or chair conformation is appropriate for binding with AT with a similar enthalpy of interaction.
A number of MD simulations have been carried out for heparin fragments in aqueous solution. Simulations of a system comprising a heparin decasaccharide in water with explicit sodium ions determined the conformation of heparin in solution under physiological conditions, which was found to be in agreement with NMR data (175). These simulations investigated the conformational changes in iduronic acid and the conformational flexibility of the glycosidic linkage. Larger variability in the conformation of heparin with respect to NMR-determined structures was observed, although this may have been due to the choice of partial atomic charges. Molecular dynamics simulations suggested that chair forms predominate at the monosaccharide level of IdoA2S (78) and that the skew-boat may contribute approximately 40–60% of the total IdoA2S conformational preference in the entire polysaccharide chain, depending on the heparin sequence (78). More recently, MD simulations of IdoA2S-containing oligosaccharides have been reported that calculate the forces responsible for the conformational preference of the IdoA2S residue in aqueous solution. The results indicated that stabilization due to intramolecular hydrogen bonds around IdoA2S is highly correlated with the expected conformational equilibrium for this residue in solution (i.e. 2S0 conformation) (176).
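A population of 40–60% for the skew-boat form implies that the competing ring conformations are close in free energy. The following sketch converts a population into a free energy difference using a simple two-state Boltzmann model. Treating the equilibrium as two states (skew-boat versus all chair forms lumped together) and using a temperature of 298.15 K are simplifying assumptions, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def two_state_free_energy(p_skew_boat, temperature_k=298.15):
    """Free energy of the skew-boat state relative to the chair state.

    Two-state assumption: dG = -RT * ln(p / (1 - p)), in kcal/mol.
    A positive value means the skew-boat is less populated.
    """
    if not 0.0 < p_skew_boat < 1.0:
        raise ValueError("population must lie strictly between 0 and 1")
    ratio = p_skew_boat / (1.0 - p_skew_boat)
    return -R_KCAL * temperature_k * math.log(ratio)


if __name__ == "__main__":
    for p in (0.4, 0.5, 0.6):
        dg = two_state_free_energy(p)
        print(f"skew-boat population {p:.0%}: dG = {dg:+.2f} kcal/mol")
```

Under these assumptions, populations between 40% and 60% correspond to free energy differences within about 0.25 kcal/mol of zero, which is smaller than the typical uncertainty of a force field. This is one reason why the predicted equilibrium can be sensitive to the choice of partial atomic charges.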
Other simulations have investigated the binding of divalent metal ions such as Ca2+ and the effect of the sulphation pattern of heparin in aqueous solution (141,142). These simulations revealed that IdoA2S residues adopt the 1C4 conformation when co-ordinating the metal. The co-ordination shell of Ca2+ was made up of the N-sulphate of GlcNS6S and the carboxylate of IdoA2S, whereas the sulphate at position 2 of IdoA2S is not essential for binding to the metal.
The force constants used in molecular mechanics force fields to characterize equilibrium bond lengths and angles, as well as partial charges and VDW interaction parameters can significantly affect the accuracy of simulations of ligand–protein interactions. A variety of force fields have been designed for the simulation of carbohydrates, including optimized potential for liquid simulations (177), GROningen MOlecular Simulation (GROMOS package) (178), carbohydrate solution force field (179), CHARMM (180), CHARMM CHEAT95 (181), Glycam/AMBER (182), MM2 (183) and MM3 (184), PEF95SAC (185) and PIM (set of carbohydrate parameters) (186). These force fields do not always contain parameters for sulphated carbohydrates such as GAGs, but various approaches can be followed to develop specific parameters for GAGs using the MM2 (187,188), CHARMm and AMBER force fields (189). Parameters (i.e. non-bonded) for sulphates not available from the work of Huige and Altona (189) can be approximated from those for phosphates available from AMBER or CHARMm or Glycam06 (190) or from Lan Jin’s thesis (191).
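To show how these parameters enter an energy calculation, the sketch below evaluates three generic terms found in most molecular mechanics force fields: a harmonic bond term, a Coulomb term between partial charges and a Lennard-Jones term for VDW interactions. It does not reproduce any specific force field listed above. The functional forms are the common textbook ones, the Coulomb constant is an approximate conversion factor, and all charges, distances and parameters in the example are invented for illustration.

```python
COULOMB_KCAL = 332.06  # approximate constant, kcal*Angstrom/(mol*e^2)


def harmonic_bond(r, r0, k):
    """Bond stretch energy k*(r - r0)**2 (convention without the 1/2 factor)."""
    return k * (r - r0) ** 2


def coulomb(qi, qj, r):
    """Electrostatic energy between two partial charges (e) at distance r (Angstrom)."""
    return COULOMB_KCAL * qi * qj / r


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones energy with well depth epsilon and diameter sigma."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


if __name__ == "__main__":
    # Hypothetical sulphate oxygen interacting with a hypothetical donor hydrogen
    # at 2.0 Angstrom, using two alternative charge assignments for the oxygen.
    for q_oxygen in (-0.6, -0.8):
        e = coulomb(q_oxygen, 0.4, 2.0)
        print(f"q(O) = {q_oxygen:+.1f} e: Coulomb energy = {e:.1f} kcal/mol")
```

In this invented example, changing the oxygen partial charge from −0.6 to −0.8 e changes the pair energy by roughly a third, which illustrates why sulphate parameters borrowed from phosphates should be validated before use.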
Therapeutic Potential of GAG Molecules and GAG Mimetics
Carbohydrates such as GAGs derive their biological activity through binding to their protein receptors. These carbohydrate–protein interactions may be mimicked by designing small molecule drugs with appropriate binding affinity and selectivity. However, a significant problem is that the binding affinities of many carbohydrate–protein interactions are in the milli- to micromolar range, whereas small molecule drugs tend to require nanomolar binding affinities. Consequently, synthetic compounds that have been specifically designed to mimic the structure and interactions of carbohydrates, such as GAG mimetics, need to bind to their receptors with higher affinity than naturally occurring GAG oligosaccharides. Potential strategies based on heparin/HS–protein interactions have recently been described to assist GAG-based drug discovery (192). Glycosaminoglycan-based drugs can act in several ways: by activating (agonists) or inactivating (antagonists) protein-based receptors, by competing with endogenous GAGs and/or by inhibiting GAG biosynthesis.
The molecular diversity of heparin/HS interactions has been exploited for the development and clinical progression of GAG mimetics (193). Discrete GAG sequences can bind specifically and make unique interactions with a large number of proteins, such as chemokines (194), growth factors (195), proteases [e.g. AT (93)] and adhesion molecules (64). Nevertheless, the design of GAG mimetics requires an understanding of the mechanism and specificity of a given GAG–protein interaction. One mechanism of interaction of cell surface GAGs involves binding to their receptor through the formation of a complex involving the GAG, ligand and receptor molecules. In this case, the GAG and receptor binding sites are spatially distinct and uncompetitive for binding, resulting in the formation of ternary complexes, as in the case of IFN-γ, PF4, IL-3, G-CSF and GM-CSF (196). The other mechanism of interaction of soluble GAGs involves the binding competition with ligands, such as the chemokine IL-8 (197) and the helical cytokine IL-5, to their high affinity receptor (196). Cell surface GAGs help present chemokines to their GPCRs (G-Protein Coupled Receptor) by increasing the local concentration of protein (122).
Very few GAG fragments have been developed for therapeutic use (Figure 14), mostly because the synthesis of such fragments is difficult. The synthetic challenges associated with the complex structures of these oligosaccharides arise from the low availability of l-idose and l-iduronic acid from commercial or natural sources and the lack of efficient synthetic routes to make sufficient amounts of these monosaccharides. Other difficulties involve the development of a suitable protecting-group strategy to allow the implementation of a high degree of functionalization of heparin/HS fragments and the stereo-selective and efficient formation of inter-glycosidic bonds in the carbohydrate backbone (198).
The most recognized pharmaceutical application of GAGs is in anti-coagulation. Many pharmaceutical companies, such as Organon and Sanofi-Aventis, have been working on the development of commercial GAG-based drugs that can bind AT and thereby cause anti-coagulation efficiently without the need for frequent administration compared to full length heparin. For example, the synthetic pentasaccharide Arixtra® (Fondaparinux or SR90107/Org31540) (199) binds to AT and has better efficacy at low doses (with a half life of 17 h).
The crystal structure of the complex of Arixtra® with AT in the absence of coagulation factors has been determined (200). The crystal structure of this synthetic pentasaccharide complexed to thrombospondin-1 (TSPN-1) has also been resolved (201). The structural requirements for the binding of heparin to AT, as shown in Figure 14, were determined on the basis of the crystal structure and the structure-activity relationships for a series of pentasaccharides, using various combinations of sulphate and carboxylate groups (127,202). The 3-O-sulphate group at position F of Fondaparinux was shown to exhibit the strongest binding to AT by interacting with positively charged amino acids, whereas the lack of the 3-O-sulphate group at position F results in a decreased binding affinity to AT of nearly 20,000-fold.
Several other clinical candidates have been developed, such as Idraparinux (SANORG 34006), which also selectively inhibits coagulation factor Xa and binds to AT (203). Idraparinux is currently in phase III clinical trials for the treatment of venous thromboembolism. The synthesis of Idraparinux is much easier than that of Fondaparinux or heparin. It has a higher affinity (Kd of 1 nM) and better efficacy than Fondaparinux (Kd of 25 nM). The hydroxyl groups in Idraparinux are methylated and the N-sulphate groups in Fondaparinux are replaced by O-sulphates in Idraparinux. Idraparinux has an increased half life (120 h) in the bloodstream. The higher activity of Idraparinux appears to be due to the presence of its methyl ethers and of IdoA in the favourable 2S0 conformation (202). The crystal structure of the pentasaccharide analogue of Idraparinux complexed with activated AT has been reported (93), providing convincing evidence that the IdoA adopts the 2S0 conformation.
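The difference in affinity between the two pentasaccharides can be expressed as a binding free energy using the standard relation ΔG = RT ln(Kd), with Kd referred to a 1 M standard state. The sketch below applies this relation to the two dissociation constants quoted above. The temperature of 298.15 K is an assumption, as the measurement conditions are not stated here, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    dG = RT * ln(Kd / 1 M); more negative values mean tighter binding.
    """
    return R_KCAL * temperature_k * math.log(kd_molar)


if __name__ == "__main__":
    dg_idraparinux = binding_free_energy(1e-9)    # Kd of 1 nM
    dg_fondaparinux = binding_free_energy(25e-9)  # Kd of 25 nM
    ddg = dg_fondaparinux - dg_idraparinux
    print(f"Idraparinux:  {dg_idraparinux:.2f} kcal/mol")
    print(f"Fondaparinux: {dg_fondaparinux:.2f} kcal/mol")
    print(f"Difference:   {ddg:.2f} kcal/mol")
```

Under these assumptions, the 25-fold difference in Kd corresponds to a difference in binding free energy of a little under 2 kcal/mol, a magnitude comparable to that of a small number of additional favourable contacts.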
Heparin binds to both AT and thrombin simultaneously to form a ternary complex, and it also binds to and inhibits factor Xa. Significantly longer heparin oligosaccharides are required to inhibit thrombin activity compared to the specific pentasaccharide that is required to bind to AT and inhibit factor Xa. The synthetic hexadecasaccharide SR123781 has tailor-made factor Xa and thrombin inhibitory activities combined with more selectivity in their mode of action. The likely molecular interactions of this hexadecasaccharide have been determined from the X-ray crystal structures of ternary complexes of AT/thrombin/heparin (204). This oligosaccharide consists of an AT-binding domain (ABD) (S12–S16) at the reducing end of the non-sulphated linker, a non-sulphated linker region (S6–S11), and a thrombin-binding domain (TBD) (S1–S5) at the non-reducing end of the linker (Figure 15). It also contains methoxy groups MeO and 2-O-sulpho substituted glucose units in the AT binding domain instead of the N-sulpho substituted glucosamine (S12–S16) that occurs in the natural heparin pentasaccharide. The highly sulphated glucose units allow non-specific binding to thrombin, which is dependent primarily on the overall charge density of the GAG fragment rather than on the precise sequence of the variously substituted sugar residues. The non-sulphated linker region (S6–S11) does not interact with any protein residues, but rather it facilitates the formation of the ternary complex, giving rise to increased anti-thrombin activity but with minimal interaction with PF4. It was reasoned that an oligosaccharide in which charged ABD and TBD are separated by a non-sulphated neutral linker could deliver an AT-mediated factor Xa inhibitor and block the active site of clot-bound thrombin, hence not giving undesired interactions with PF4.
PI-88 (Progen, Toowong, Queensland, Australia) (Figure 16) has progressed to clinical trials to treat inflammatory diseases, thrombosis, virus infections and cancer (205). PI-88 acts as a substrate analogue to inhibit heparanase to prevent HS degradation. This enzyme is the therapeutic target in disease states such as tumour cell invasion, metastasis, and angiogenesis. PI-88 is a phosphomannopentose sulphate (6-O-PO3H2-α-d-Man-(1→3)-α-d-Man-(1→3)-α-d-Man-(1→3)-α-d-Man-(1→2)-d-Man), wherein the chain length, sugar composition and glycosidic linkages α(1→3) and α(1→2) play important roles in its anti-coagulation activity compared with the anti-coagulant activity of sulphated glucose-containing oligosaccharides with β(1→4) and β(1→3) linkages (206). A number of sulphated pseudo sugar molecules such as cyclitols have been identified as selective inhibitors of certain protein–heparin interactions (207). In general, so called GAG mimetics consisting of different sugar backbones (such as mannose, triose, Xyl or inositols) with different degrees of sulphation and coupled by linkers of variable chain length, flexibility, orientation, and hydrophobicity can help to probe differences in the heparin/HS binding specificity and selectivity of proteins.
The chemical synthesis of carbohydrate-based oligosaccharides is plagued with problems of poor yield, multitude of products and long reaction times. Microwave-based synthesis of variably functionalized, per-sulphated organic molecules has been reported to result in high yields and purity to facilitate the rapid screening of these molecules as GAG mimetics (208). An alternative approach (i.e. peptide synthesis), in which homo-oligomers of tyrosine instead of carbohydrates are used as the backbone, has recently been reported to overcome the limitation of oligosaccharide synthesis (209). These oligo (tyrosine sulphate) molecules bind more strongly to AT than a heparin-derived hexasaccharide, with the binding strength being dependent on chain length (209).
Anti-coagulants comprise direct and indirect inhibitors of enzymes involved in the coagulation pathways, primarily thrombin and factor Xa (210). Direct inhibitors (including thrombin inhibitors, e.g. Lepirudin, Argatroban, Bivalirudin and factor Xa inhibitors, e.g. Rivaroxaban) (211) interact with either the active site or an exosite of the pro-coagulant enzyme, blocking its proteinase activity. Indirect inhibitors (e.g. LMWH) enhance the proteinase inhibitory activity of the natural anti-coagulants, AT and heparin co-factor II. Significant efforts have been made to design GAG mimetics as inhibitors acting through direct and indirect mechanisms (210,211). A carboxylic acid-based polymer, poly-acrylic acid (PAA), demonstrated a surprisingly high acceleration in thrombin inhibition through a ‘bridging’ mechanism of activation (212). Further exploration of this scaffold led to the design of 4-hydroxycinnamic acid-based dehydrogenation polymer (DHP) oligomers with dual inhibitory action against thrombin and factor Xa (213). Although high activation is achieved with these carboxylate-only molecules, structure–activity relationship (SAR) studies show the need to retain critical sulphate groups and a more rigid backbone when designing GAG mimetics (or activators) in order to retain interactions in the ABD.
Molecular modelling has been used to design a small non-sugar based AT activator, epicatechin sulphate (ECS), from a pharmacophore deduced from the DEF portion of the natural heparin pentasaccharide DEFGH (214). Epicatechin sulphate was rationally designed using the hydropathic interaction technique (HINT) (215,216) to target the binding and activation of AT for enhanced inhibition of factor Xa in a pH- and salt-dependent manner. However, +/−catechin sulphate, a chiral stereoisomer of ECS, did not bind in the pentasaccharide binding site of AT, resulting in only weak activation (217). In fact, catechin sulphate was found to bind in the extended heparin binding site, which is adjacent to the binding domain for the reference trisaccharide DEF (217). Similarly, HINT analysis was performed for the PAA–AT complex, indicating that residues Arg 13, Arg 47, Lys 125, Arg 129, Arg 132, Lys 133 and Lys 136 are required for the interaction with PAA (212), as opposed to Lys 114, which is a key residue for the interaction with the critical 3-sulphonate group of residue F in the heparin pentasaccharide (113).
Several molecules capable of blocking heparin binding to growth factors such as VEGF (Vascular Endothelial Growth Factor) and bFGF have been identified (218). Most of these molecules have a linear extended structure containing heparin mimetic functional groups, such as carbohydrates, sulfonates, carboxylates and hydroxyl groups (218). This is important as negative charges on functional groups in GAG molecules, such as N-sulphates and O-sulphates, have been found to play an important role in the interactions with growth factors. Multiple sulphated peptides have also been reported to bind to a set of heparin-binding peptides and VEGF165 (219).
The entry of viruses into cells can be effectively blocked by either the removal of HS chains from the cell surface through enzymatic treatment or the presence of soluble forms of HS or HS-like molecules which competitively bind to the virus. Lignin sulphate, a sulphated form of lignin, was identified as an inhibitor of HSV-1 entry into cells (220). Recently, Surfen (bis-2-methyl-4-amino-quinolyl-6-carbamide) has been identified as a small molecule antagonist of HS, as cell attachment and infection by HSV was diminished in the presence of this molecule (221). Furthermore, Surfen is also known to inhibit FGF signalling by blocking the formation of the FGF/HSPG complex (221).
A variety of different approaches such as solution-phase and solid-phase chemistry have been used for the polymer-supported synthesis of GAG and non-GAG derivatives in order to develop a large variety of oligosaccharides (198,207). Heparin microarray technology has provided the tools for rapid screening of specific GAG sequences (natural or synthetic oligosaccharides) that interact with proteins such as chemokines (222) and signalling proteins involved in inflammation, viral infection and immune system regulation (223–225). The introduction of non-anionic structural motifs into heparin/HS (226,227) may also provide a route for the development of novel, potent drug-like GAG mimetic molecules that can treat various diseases.
The use of biochemical, structural and molecular modelling methods to study the structure and function of GAGs has allowed significant progress in the description of the conformational properties of these complex molecules, the elucidation of the structural determinants of their interactions with proteins and the development of GAG mimetic molecules of therapeutic value.
The number of X-ray determinations of crystal structures of diverse GAG–protein complexes has increased in recent years. This has made it possible to obtain a broader characterization of the properties of heparin binding regions in proteins, the preferred conformations and sulphation patterns that heparin adopts when it interacts with proteins, the role of metals in mediating these interactions and the modifying effect of pH on the affinity of binding. The successful design and development of GAG-mimetic drug molecules depends on understanding how these factors shape the relationship between GAG structure and the nature of the interactions of GAGs with different proteins.
Molecular modelling techniques have enabled substantial progress to be made on the prediction of heparin binding sites in proteins, the prediction of biologically active GAG conformations, the dissection of the different intermolecular forces that dictate GAG–protein binding affinity and the rationalization of the specificity and selectivity of GAG interactions with proteins. These methods, in combination with new synthetic approaches and the use of fast screening tools, are enabling the successful development of new potent drug molecules.
A number of challenges lie ahead in the investigation of the interactions of GAGs with proteins from a structural and functional point of view. For example, a number of proteins of immunological importance (such as cytokines and chemokines) have the ability to oligomerize and so do their receptors (228–231). Heparin is known to interact with many of these proteins but the precise mechanism of interaction is not known. Structural biology determinations and molecular modelling approaches will be required to determine whether heparin and receptor binding sites occupy different positions on the surfaces of these proteins and to elucidate the possible role of heparin in mediating the oligomerization of these immunological molecules and/or their interactions with their receptors.
The realistic description of the conformational flexibility of proteins and GAGs and its effect on heparin binding, and the incorporation of the effect of the aqueous solvent in mediating heparin–protein interactions are two important areas where more molecular modelling studies are clearly needed. From a methodological point of view, one of the main challenges facing molecular modelling and computational chemistry is the accurate representation of the polarization effects in GAG molecules because of their high charge densities, which are likely to affect the computational prediction of their affinity of binding to proteins.
NG gratefully acknowledges the award of an Endeavour International Postgraduate Research Scholarship from the Australian Government. The authors acknowledge Dr Rupesh Khunt at Vienna University of Technology for his help with the molecular drawings.