Recombinant production of proteoglycans and their bioactive domains



J. M. Whitelock, Graduate School of Biomedical Engineering, The University of New South Wales, Sydney, NSW 2052, Australia

Fax: +61 2 9663 2108

Tel: +61 2 9385 3948



Proteoglycans are ubiquitous dynamic molecules that are made up of a protein core to which specific linear glycosylation structures, known as glycosaminoglycans, have been covalently coupled. They have roles in many biological and pathological processes, which have been shown to be dependent on events involving the protein component and/or the glycosaminoglycan chains. This review focuses on the literature describing the recombinant expression and production of proteoglycans known to be present in the extracellular, cell surface and intracellular environments with an emphasis on how the structure of the molecule relates to its biological function and how this relationship has been explored using recombinant DNA technology for clinical applications.


Abbreviations
CHO, Chinese hamster ovary
COS-7, African monkey kidney
CS, chondroitin sulfate
CSPG4, chondroitin sulfate proteoglycan 4
DS, dermatan sulfate
ECM, extracellular matrix
FGF, fibroblast growth factor
G1/G2/G3, globular 1/2/3
HC, heavy chain
HEK-293, human embryonic kidney cell line
HeLa, human cervical cancer
HS, heparan sulfate
HT-1080, human fibrosarcoma
KS, keratan sulfate
NG2, nerve/glial antigen 2
Sf9, Spodoptera frugiperda
TGF, transforming growth factor

This review discusses the expression and characterization of recombinant proteoglycans (PGs), how they have been used to advance our understanding of the relationship between PG structure and function, and how they might be useful in clinical applications in future.

PGs are multifunctional macromolecules that occur intracellularly, at the cell surface or in the extracellular matrix (ECM), and are involved in biological and pathological processes. Glycosaminoglycan (GAG) side chains dominate the functions of many PGs, whereas other PGs are known to occur in both glycosylated and unglycosylated forms, suggesting that the protein core is also important for many biological functions. The roles of PGs in biology and pathology are diverse and complex, and cover such areas as: the physical effects imparted on cartilage tissue via the binding of water molecules to the GAG chains decorating aggrecan; and the finely orchestrated and dynamic events seen in embryonic development and organogenesis, where mitogen/morphogen gradients are established and continually modified via an equilibrium between cell-surface syndecans and glypicans and extracellular heparan sulfate (HS) PGs such as perlecan, agrin and type XVIII collagen.

PGs are composed of protein cores that are post-translationally modified with one or more GAG chains. GAGs are long, linear carbohydrate chains made up of repeating disaccharide units, each consisting of an N-acetylated hexosamine linked to a hexuronic acid (or, in KS, to galactose), and each GAG has been named on the basis of the tissue from which it was first isolated: for example, chondroitin sulfate (CS) from cartilage, dermatan sulfate (DS) from skin, keratan sulfate (KS) from cornea, and HS and heparin from liver. Expression of the protein core is driven by a well-understood process, and the advent of recombinant DNA technology in the late 1980s and 1990s meant that large quantities of known proteins could be produced for use in the laboratory and, eventually, the clinic. This began with Escherichia coli-based expression systems, which had limited success in producing high-molecular-mass proteins, proteins whose extensive folding and tertiary structure depend on disulfide bridge formation, and heavily glycosylated proteins. Most of these factors apply to the PG family of molecules, particularly the large protein cores of the extracellular PGs. Efforts to overcome the lack of glycosylation in E. coli have focused on glyco-engineering approaches in which the intracellular machinery for N-linked glycosylation from other bacterial strains has been introduced into E. coli [1, 2]. These hybrid E. coli strains are still unable to O-glycosylate proteins and are therefore not suitable for the expression of PGs. Yeast expression systems using Saccharomyces cerevisiae or Pichia pastoris [3] and viral expression systems using Spodoptera frugiperda (Sf9) insect cells [4] as hosts have been used with some success for the expression and glycosylation of PGs; viral infection of mammalian cells has also had some success in the production of biglycan and decorin [5].
Most of the focus in this area has been on mammalian expression systems that use the Chinese hamster ovary (CHO) cell line [6] or the human embryonic kidney cell lines HEK-293/293T. HEK-293 expression systems have the advantage of decorating the protein core with GAG structures similar to those of the native molecules. For example, recombinant endocan was similar in disaccharide composition to endocan produced by human umbilical vein endothelial cells under native conditions; however, the recombinant GAG chains were significantly larger, and therefore longer [7]. A significant challenge that remains for the production of recombinant PGs and their domains in these mammalian expression systems is the production of GAG chains with lengths and sulfation patterns similar to those produced by the cell of interest. The HS attached to recombinant domain I of perlecan expressed in HEK-293 cells was approximately half the length of that attached to endothelial-cell-derived full-length perlecan, and the domain was produced as part of a heterogeneous group of molecules decorated with HS, CS and KS, in contrast to the endothelial cell form, which was decorated with HS only [8]. Interestingly, full-length perlecan produced by HEK-293 cells was also decorated with HS, CS and KS [9], indicating that each cell type may have a different set of post-translational modification enzymes that produce cell-type-specific GAG chains.

The synthesis of GAG chains takes place in the Golgi and is initiated by xylosylation of certain serine residues on the protein core that lie within consensus sequences. The linkage region is then built sequentially, residue by residue, resulting in a linkage tetrasaccharide with the sequence GlcAβ1–3Galβ1–3Galβ1–4Xylβ1–O-Ser that is identical for all CS/DS and HS/heparin chains. The decision to generate either CS/DS or HS/heparin is defined by the next residue added to this structure: the addition of N-acetyl-D-glucosamine (GlcNAc) dictates the generation of HS/heparin, whereas the addition of 2-(acetylamino)-2-deoxy-D-galactose (GalNAc) generates CS/DS. Although the ribosome-based mechanisms of protein core synthesis are relatively well understood, many aspects of the Golgi-based glycosylation processes remain elusive, including what determines which serine residues receive the first xylose of the linkage tetrasaccharide, and thus become decorated with GAGs, and what dictates whether a HS/heparin chain or a CS/DS chain is produced. It has been suggested that CS may be the default GAG, but more recently it has been proposed that the type of GAG decoration depends on residence time in the Golgi, where the various chain-modification enzymes are embedded and localized in the Golgi membranes [10]. The other factor involved in the generation of GAG chains is the vast complexity provided by the addition of sulfate groups at various points along the polymerized chain. HS/heparin chains are more highly sulfated and more complex than CS/DS chains, with heparin containing relatively more sulfate than HS, and DS being more extensively modified than CS. HS/heparin chains are also modified by N-deacetylation of some GlcNAc residues prior to N-sulfation, and by epimerization of some glucuronic acid (GlcA) residues to iduronic acid (IdoA).
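
The sequential assembly and fifth-residue commitment described above can be sketched as a small classifier. This is a toy model under stated assumptions: residues are plain string labels listed from the serine outwards, and the helper name `chain_type` is hypothetical. It encodes only the rule stated in the text (GlcNAc leads to HS/heparin, GalNAc to CS/DS), not the underlying enzymology.

```python
# Toy model (hypothetical): sugar residues are string labels, listed in the
# order they are added, starting from the xylose attached to the serine.
LINKAGE_ORDER = ("Xyl", "Gal", "Gal", "GlcA")  # shared by CS/DS and HS/heparin

# The residue added after the linkage tetrasaccharide commits the chain type.
COMMITMENT = {"GlcNAc": "HS/heparin", "GalNAc": "CS/DS"}


def chain_type(residues):
    """Classify a nascent GAG chain from its serine-proximal residues."""
    if tuple(residues[:4]) != LINKAGE_ORDER:
        raise ValueError("chain does not start with the linkage tetrasaccharide")
    if len(residues) == 4:
        return "uncommitted"  # linkage region complete, fifth residue not yet added
    fifth = residues[4]
    if fifth not in COMMITMENT:
        raise ValueError(f"unexpected fifth residue: {fifth!r}")
    return COMMITMENT[fifth]


print(chain_type(["Xyl", "Gal", "Gal", "GlcA", "GlcNAc"]))  # HS/heparin
print(chain_type(["Xyl", "Gal", "Gal", "GlcA", "GalNAc"]))  # CS/DS
print(chain_type(["Xyl", "Gal", "Gal", "GlcA"]))            # uncommitted
```

The sketch deliberately says nothing about what selects a serine for xylosylation or what drives the commitment step, both of which the text notes remain unresolved.
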
DS shares some of these modifications with HS in that it may also contain iduronic acid as well as 2-sulfated iduronic acid. Five different types of HS-modifying enzyme exist in humans, some of which have up to six or seven isoforms that are expressed at different levels in different tissues, producing a modified HS/heparin chain in a precisely controlled and well-orchestrated fashion. Sulfation is initiated by the removal of the acetyl group from GlcNAc residues and the addition of sulfate at various points along the chain; the regions around these initial N-sulfated residues then undergo further sulfation, generating sulfated domains (S-domains). The S-domains are interspersed with relatively longer regions of unsulfated, N-acetylated disaccharides, which usually account for 70% of the residues in an HS chain but significantly fewer in heparin chains, which tend to be almost completely sulfated [11].
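
The alternating S-domain/N-acetylated domain organization described above can be illustrated by collapsing a chain into runs. This is a minimal sketch assuming a hypothetical shorthand in which each disaccharide is a string label ("NA" for unsulfated N-acetylated units; "NS", "NS2S", "NS6S" for sulfated units); the function names are illustrative, and the example chain is invented to match the ~70% figure quoted for HS.

```python
from itertools import groupby


def domain_runs(disaccharides):
    """Collapse a chain into runs of 'NA' (N-acetylated) and 'S' (sulfated) domains.

    Any label other than "NA" is treated as belonging to a sulfated region.
    """
    def kind(d):
        return "NA" if d == "NA" else "S"

    return [(k, len(list(group))) for k, group in groupby(disaccharides, key=kind)]


def na_fraction(disaccharides):
    """Fraction of disaccharides that are unsulfated and N-acetylated."""
    if not disaccharides:
        raise ValueError("empty chain")
    return sum(d == "NA" for d in disaccharides) / len(disaccharides)


# Hypothetical 20-disaccharide chain: two NA regions separated by short S-domains.
chain = ["NA"] * 7 + ["NS", "NS2S", "NS6S"] + ["NA"] * 7 + ["NS2S", "NS", "NS"]
print(domain_runs(chain))  # [('NA', 7), ('S', 3), ('NA', 7), ('S', 3)]
print(na_fraction(chain))  # 0.7
```

A heparin-like chain in this shorthand would consist almost entirely of sulfated labels, giving an `na_fraction` close to zero.
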

When genes for PGs are transfected into mammalian cells, the amounts of transcript and protein core increase and, following Michaelis–Menten kinetics, the amount of PG would be expected to increase only until the rate-limiting biosynthetic enzyme reaches its maximum velocity. This results in an increase in the amount of undecorated protein core, as well as in PGs with GAG chains that may be shorter or lack certain modifications. This increases the heterogeneity of the recombinant PGs produced and may make it necessary to also transfect the cells with the genes encoding the biosynthetic enzymes [12]. When chondroitin sulfate proteoglycan 4 (CSPG4) was transfected into HEK-293 cells, two PG populations were isolated, one of which was present substantially as the protein core [13]. Given that HS/heparin chains contain the greatest degree of complexity, it will be interesting to investigate whether cells transfected with the serglycin gene produce PGs decorated with GAG chains that are similar in structure and function to those present in the granules of neutrophils, platelets and mast cells. Recently, there has been some success in bioengineering heparin in the laboratory using microfluidic-based bioreactors [14]. In future, it will be challenging but exciting to see the development of laboratory bioreactors designed to take enzyme kinetics into account so that a synthetic PG can be manufactured.
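
The saturation argument above can be made concrete with the Michaelis–Menten rate law, v = Vmax[S]/(Km + [S]). The sketch below is a toy model, not a description of any measured system: it assumes (hypothetically) that each core protein is processed with a pseudo-first-order rate constant v/[S] = Vmax/(Km + [S]) during a fixed Golgi residence time. The function names, parameter values and units are all illustrative.

```python
import math


def mm_velocity(s, vmax, km):
    """Michaelis–Menten velocity: v = Vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def fraction_decorated(s, vmax, km, residence_time):
    """Toy estimate of the fraction of core proteins decorated before leaving the Golgi.

    Assumes (hypothetically) that each core molecule is processed with the
    pseudo-first-order rate constant v/[S] = Vmax / (Km + [S]) for a fixed
    residence time; real GAG assembly involves many enzymes and steps.
    """
    if s <= 0:
        raise ValueError("core-protein level must be positive")
    k = mm_velocity(s, vmax, km) / s
    return 1.0 - math.exp(-k * residence_time)


# Arbitrary units: raising core-protein levels saturates the enzyme,
# so a smaller fraction of the core population gets decorated.
for core_level in (0.1, 1.0, 10.0, 100.0):
    v = mm_velocity(core_level, vmax=1.0, km=1.0)
    f = fraction_decorated(core_level, vmax=1.0, km=1.0, residence_time=3.0)
    print(f"[S]={core_level:>6}: v={v:.3f}, decorated={f:.2f}")
```

As the core-protein level rises, v approaches Vmax while the decorated fraction falls, which is the qualitative pattern behind the growing pool of undecorated core protein.
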

Extracellular matrix proteoglycans

Recombinant DNA technology has enabled the expression and biochemical characterization of several ECM PGs, including the small leucine-rich PGs, decorin and biglycan, and other CS PGs including aggrecan, versican and bikunin. In addition, the HS PGs including perlecan, endocan, types XVIII and XV collagen are described below.


Decorin and biglycan are members of the small leucine-rich PG family and are structurally similar, with leucine-rich repeats flanked by cysteine-rich domains [15]. Small leucine-rich PGs are usually decorated with CS, DS or KS, and decorin displays tissue-specific GAGs, with CS in bone and DS in cartilage and skin [15]. Decorin, also known as DS PG II, has a core protein of 40 kDa, which is decorated with one CS or DS chain ranging in size from ~ 40 to 80 kDa [16, 17]. Decorin is found in the ECM of many tissues including skin, tendon, ligament and articular cartilage, has been shown to bind to a variety of molecules including fibronectin, transforming growth factor (TGF)-β and collagen types I, II [18] and VI [19], and regulates collagen fibrillogenesis.

Human decorin has been expressed as a CS PG in a variety of cell types including CHO cells [17], human fibrosarcoma (HT-1080) cells [20], HEK-293 cells [18, 21], human gingival fibroblasts [22] and human adenocarcinoma (A549) cells [23]. Human decorin expressed in CHO cells [17] was found to contain different linkage hexasaccharide structures, including ΔHexAα1–3GalNAc(4S)β1–4IdoAα1–3Galβ1–3Galβ1–4Xyl-ol, indicating that the GAG chain carried some DS disaccharides, as well as ΔHexAα1–3GalNAc(4S)β1–4GlcAβ1–3Galβ1–3Galβ1–4Xyl-ol and ΔHexAα1–3GalNAcβ1–4GlcAβ1–3Galβ1–3Galβ1–4Xyl-ol [24]. Human skin fibroblast decorin expressed in HT-1080 cells was produced as a mixed population, 25% of which was not decorated with GAGs, with the remainder decorated with ~ 20–30 kDa CS [20]. The estimated yield was 30 mg from 10⁹ cells in 24 h. The PG form consisted of two distinct populations, one with the larger CS chain containing predominantly 4-sulfated CS disaccharides, and the other containing ~ 55% 4-sulfated CS disaccharides, 15% 6-sulfated CS disaccharides and the remainder unsulfated CS disaccharides [20]. Decorin has also been expressed in other cell types, including rat glioma cells [25], human cervical cancer (HeLa) cells [26], WiDr colon carcinoma [27] and A431 squamous carcinoma cells [27], CNS-1 glioma cells [28], rat mesangial cells [29] and HEK-293 cells [30], but in these cases its GAG decoration was not characterized. Mouse decorin has been expressed as a CS PG in HeLa cells [31] and found to contain 4-sulfated CS disaccharides [32]. Bovine decorin has also been expressed in E. coli [33] and human decorin in Spodoptera frugiperda 21 (Sf21) cells [34].
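The HT-1080 yield and disaccharide figures above can be turned into a specific productivity and a complete composition with simple arithmetic. This sketch uses only the reported numbers; the unsulfated share is inferred by subtraction, assuming no other disaccharide types were present, and the 25% undecorated figure is a population fraction rather than a mass fraction, so it is not used to split the yield.

```python
# Figures reported above for decorin expressed in HT-1080 cells [20].
yield_mg = 30.0      # total decorin recovered
cell_count = 1e9     # producing cells
hours = 24.0         # collection period

PG_PER_MG = 1e9      # 1 mg = 10^9 pg
days = hours / 24.0
pg_per_cell_per_day = yield_mg * PG_PER_MG / cell_count / days
print(f"specific productivity: {pg_per_cell_per_day:.0f} pg/cell/day")  # 30

# Disaccharide composition of the second PG population (percentages).
four_sulfated = 55.0
six_sulfated = 15.0
unsulfated = 100.0 - four_sulfated - six_sulfated  # inferred by subtraction
print(f"unsulfated CS disaccharides: {unsulfated:.0f}%")  # 30
```

Expressing yields per cell per day makes figures obtained with different cell numbers and collection times directly comparable.
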

Analysis of recombinantly expressed decorin determined that it must be decorated with GAGs and/or N-linked oligosaccharides to be secreted [35]. Recombinantly expressed decorin has also been used to explore how it interacts with ECM proteins. It has been shown that decorin protects collagen type I and II fibrils against collagenase degradation [36], whereas the interaction between decorin and collagen type I has been shown to involve leucine-rich repeats 5–6, including Arg207 and Asp210 [37]. The N-terminus of decorin has been shown to bind to fibrinogen in the presence of Zn2+ [38]. Decorin expressed in human gingival fibroblasts was found to decrease the protein levels of MMP-1 and -3, suggesting that decorin can modulate tissue remodelling by balancing ECM synthesis and degradation [22]. Decorin also binds C1q, the first subcomponent of the C1 complex of the classical pathway of complement activation, via both the collagen-like and globular domains of C1q, but it does not activate complement [39].

Recombinantly expressed decorin has been used to understand how it affects cell behaviour. Full-length decorin decorated with GAGs has been shown to control cell proliferation by reducing the saturation density and changing the morphology of cells [18]. Decorin has been shown to inhibit the proliferation of tumour cells in vitro [27, 40], including human prostate cancer cell lines [30] and breast cancer cells [41]. Decorin has also been shown to reduce primary tumour growth by 50% in mice with orthotopic mammary carcinoma xenografts [41]. The addition of decorin to tumour cells suppressed vascular endothelial growth factor mRNA and protein, indicating that decorin might affect tumour growth by suppressing angiogenic stimuli [42]; it can also reduce the motility and invasion of the LM8 murine osteosarcoma cell line [43]. Decorin was also found to suppress epidermal growth factor receptor and androgen receptor expression and phosphorylation, which inhibited phosphatidylinositol 3-kinase and Akt phosphorylation and led to apoptosis in LNCaP human prostate cancer cells [30], suggesting that decorin may be a promising therapy for prostate cancer. Decorin-expressing tumours show more T-cell infiltration but less activation of microglial cells, because decorin suppresses the synthesis of TGF-β, which regulates microglial cell growth and activity [25].

Decorin has been explored for the treatment of glomerulonephritis and in the prevention of glomerulosclerosis [44]. Intravenously injected recombinant decorin was found to be rapidly removed from the circulation and accumulate in the liver before being cleared [45]. Ex vivo decorin gene transfer to rat mesangial cells decreased the levels of TGF-β1 and matrix expansion, which may be useful in the treatment of kidney disease [29].

Decorin has also been explored for the repair of central nervous system and muscle injuries, and as an anti-fibrotic agent. The interaction between decorin and TGF-β has been explored by treating penetrating incisional wounds of rat brains with recombinant human decorin, which was found to attenuate all aspects of central nervous system scarring, including matrix deposition and inflammation [46]. Muscle injuries are slow to repair and can result in incomplete functional recovery. Administration of recombinant human decorin has been shown to improve recovery strength, enhance muscle regeneration and inhibit fibrosis in a murine muscle model [47]. Decorin has also been shown to reduce neointimal thickening in balloon-injured rat carotid arteries by increasing the synthesis of collagen type I for enhanced contraction [48]. Gene delivery of decorin in spontaneously hypertensive rats was found to inhibit hypertension-induced cardiac fibrosis, remodelling and hypertrophy [49]. Streptococcus anginosus group pathogens have been shown to bind to decorin via its GAG chain, which has implications for dental abscesses [32].


Biglycan consists of a protein core of 42 kDa and contains two CS/DS attachment sites located at the N-terminus; however, it is predominantly decorated with only one GAG chain, which is tissue specific [50-52]. Recombinant biglycan has been predominantly expressed in mammalian cell culture systems including HEK-293, CHO and HeLa cells. Recombinant biglycan produced by stably transfected HEK-293 and CHO cells was found to be a mixed population decorated with one or two GAG chains that may be CS or HS [53]. By contrast, biglycan transiently expressed in HT-1080 and rat osteosarcoma (UMR106) cells was found to be a mixed population either decorated with one or two CS chains or not decorated with a GAG chain [50], whereas biglycan expressed in 293-EBNA cells (HEK-293 cells expressing Epstein–Barr virus nuclear antigen 1) was produced in both a PG and a protein form, although the GAGs were not characterized [54]. Recombinant biglycan produced by the HT-1080 cells appeared to be decorated with more than CS, because chondroitinase ABC did not completely remove all of its GAG chains [50]. Biglycan expressed in HeLa cells was reported to be decorated with CS [55], whereas mouse biglycan expressed in HeLa cells was decorated with CS types A and C [32].

Mutations in the protein core of biglycan at the GAG-attachment sites, serines 42 and 47, have been explored to understand how GAG chain attachment is regulated, because PG forms of biglycan exist with either one or two GAG chains [53]. In all the mutants explored, the GAG-attachment site at serine 42 was almost always decorated, irrespective of the presence or absence of the serine 47 site, whereas decoration of serine 47 was dependent on the presence of serine 42 [53]; that is, the more C-terminal site is usually decorated only once the more N-terminal site has been decorated. The expression and use of recombinant biglycan have led to the discovery of the role of biglycan in organizing collagen type VI into a hexagonal-like network that is dependent on the presence of GAG chains [56]. Recombinant biglycan has also been shown to bind to collagen type I [57], collagen type V, C1q [50] and mannose-binding lectin, leading to inhibition of the lectin pathway [39].
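The ordered dependence between the two biglycan sites described above can be illustrated with a two-step probability model. This is a hypothetical sketch, not a fit to the data in [53]: it assumes Ser47 is decorated only if Ser42 already is, and the probabilities used are arbitrary illustrative values. The function name `chain_number_distribution` is invented for this example.

```python
def chain_number_distribution(p_ser42, p_ser47_given_ser42):
    """Toy model of ordered GAG attachment at biglycan Ser42 and Ser47.

    Simplifying assumption: Ser47 is decorated only if Ser42 already is,
    and each probability is a hypothetical, illustrative value.
    Returns the expected fraction of molecules carrying 0, 1 or 2 chains.
    """
    for p in (p_ser42, p_ser47_given_ser42):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return {
        0: 1.0 - p_ser42,
        1: p_ser42 * (1.0 - p_ser47_given_ser42),
        2: p_ser42 * p_ser47_given_ser42,
    }


dist = chain_number_distribution(0.9, 0.3)
for n_chains, fraction in dist.items():
    print(f"{n_chains} chain(s): {fraction:.2f}")
```

Under this assumption, every single-chain molecule carries its GAG at Ser42, so a mixed population of one- and two-chain forms arises without any molecules decorated only at Ser47.
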

The use of recombinant biglycan has also elucidated that biglycan is important for the maintenance of muscle cell integrity and plays a direct role in regulating the expression and sarcolemmal localization of the intracellular signalling proteins dystrobrevin-1 and -2, α- and β1-syntrophin and neuronal nitric oxide synthase [54]. In addition, biglycan increases utrophin expression in cultured myotubes and improves muscle function [58]. Biglycan has been shown to bind to Streptococcus anginosus bacteria via its CS chain [32].

Decorin/biglycan chimeras have also been produced to analyse the interaction between decorin and type I collagen [55], which was found to result from multiple binding sites acting in concert. In addition, biglycan–decorin chimeras have been shown to bind α- and γ-sarcoglycan [59].


Aggrecan contains a large protein core of 220 kDa that is decorated with both CS and KS. The protein core of aggrecan is composed of six domains: globular 1 (G1), interglobular (IG), globular 2 (G2), KS, CS and globular 3 (G3). Many different recombinant fragments of aggrecan have been produced, although few have been characterized with respect to GAG decoration (Table 1). Full-length bovine aggrecan has been expressed in African monkey kidney (COS-7) cells and was found to be decorated with CS [60]. One study produced 14 different recombinant constructs of aggrecan in COS-7 cells, including each of the six domains individually and combinations thereof [61]. The recombinant proteins containing the CS, KS or IG domains were found to be decorated with GAGs, as deduced from polydisperse relative molecular mass bands in western blots; however, the GAG structures were not determined [61]. When the IG domain was expressed alone it was decorated with GAGs, whereas the IG–G2 construct carried a reduced amount of GAGs. Similarly, the free KS domain was less extensively decorated with GAGs than the G1–KS construct [61]. Recombinant human G1–G2 has been expressed in primary bovine keratocytes using a vaccinia virus expression system and found to be decorated with N-linked KS [62]. Link protein has been expressed as a fusion protein with the chicken aggrecan CS and G3 domains in COS-7 cells, and the fusion was reported to be decorated with CS [63]. A fusion protein containing the CS and G3 domains of rat aggrecan has been expressed in COS-7 cells and found to contain a protein core of 70 kDa decorated with CS [64].

Table 1. Recombinantly expressed forms of aggrecan. nd, not determined; ns, not specified; no GAG, not decorated with GAG.

Other reported recombinant fragments of aggrecan were either not decorated with GAGs or were not investigated for GAG decoration. The G1 domain of avian aggrecan has been expressed in COS-7 cells, NIH3T3 cells and chicken chondrocytes [65], the Spodoptera frugiperda Sf21 insect cell line [66] and High Five insect cells [67]. Transgenic proteins of the G1 domain of aggrecan have also been expressed in CHO cells [68], whereas three subdomains of the G1 domain, A, B and B′, have been expressed in HEK-293 cells individually, and as fusions of two of these subdomains with the G2 domain [69]. In addition, a fusion of the G1, IG and G2 domains has been expressed in HEK-293 cells [69]. Fusion proteins containing G1, CS and G3 derived from avian aggrecan have been expressed in CHO cells, but were not characterized with respect to GAG decoration [70]. The human aggrecan IG domain has been expressed in E. coli [71] and COS cells, but it was not decorated with KS [72], in contrast to when the IG domain was expressed in COS-7 cells [61]. Recombinant human aggrecan G1–G2 domains have been expressed in the Spodoptera frugiperda Sf21 insect cell line, but this construct was not decorated with KS [73]. The G3 domain of human aggrecan has been expressed in E. coli [74, 75]; a fusion protein containing 20% of the CS domain as well as the G3 domain has also been expressed in E. coli [75].

Recombinantly expressed fragments of aggrecan have been used to understand the roles of its different domains. Expression of the G1 and G2 domains revealed that these domains are poorly secreted and accumulate in the endoplasmic reticulum [70, 76], whereas the CS and G3 domains are required for efficient secretion of the protein core [77, 78], which is mediated through cysteine-dependent disulfide bonds in the G3 domain [78]. In addition, the G3 domain was found to enhance GAG attachment [78]. Transgenic G1 domain proteins with two alanines juxtaposed at the C-terminus showed altered intracellular localization, were not modified in the Golgi and were not efficiently secreted [61, 68], whereas addition of the G3 domain enabled efficient secretion of the protein [68].

Other studies have used recombinantly expressed aggrecan to understand cleavage specificity as well as interactions with the ECM and cells. Studies using incremental deletion and site-directed mutagenesis of the G3 domain expressed in COS-7 cells revealed that the carboxyl tail of the G3 domain is subject to cleavage, but is protected from cleavage when linked to a GAG [79]. A recombinant aggrecan IG domain was used to demonstrate that aggrecanase can cleave this domain independently of the G1 and G2 domains and KS [72]. Cleavage of G1–G2 by aggrecanase was reduced once the KS chains were removed [62], whereas cleavage of recombinant full-length aggrecan by ADAMTS-4 was reduced once the CS chains were removed [80]. Cleavage of aggrecan by MMP-13 was found to occur first in the CS domain, followed by the IG domain [60]. Recombinantly expressed G1–G2 was found to bind to both hyaluronan and link protein [73], with the majority of binding occurring in the B and B′ subdomains of aggrecan G1 [69]. The G3 domain, and particularly the C-type lectin subdomain, was found to bind to known lectin ligands including tenascin-R, tenascin-C and fibulin-2 [81]. The G1 domain of aggrecan has been shown to reduce chondrocyte attachment [65].

A perlecan/aggrecan chimera has been expressed in COS-7 cells to investigate the involvement of non-GAG-bearing domains in directing GAG attachment to GAG-bearing domains [64]. The chimeras produced were perlecan domain I with the G3 domain of aggrecan (I/G3) as well as the CS domain of aggrecan with perlecan domains II and III (CS/II/III). Both chimeras were produced as two populations, one decorated with GAGs and the other not, and the GAGs were determined to be attached to the CS region. The CS/II/III construct was almost entirely decorated with CS, whereas the I/G3 construct was decorated with both CS and HS.


Neurocan is a CS PG related to the aggrecan family and is expressed in neural tissue. Recombinant fragments expressed in HEK-293 cells were shown to interact with the neural cell-adhesion molecule N-CAM [82], tenascin-C [83] and HS PGs, suggesting that neurocan may be involved in guiding and modulating the growth of neurons [84].


Versican, also known as CSPG2, is a large CS PG that can form large aggregates through interactions with hyaluronan (HA) and can also bind to other ECM proteins, chemokines and cell-surface molecules. Versican is a multifunctional molecule with roles in cell adhesion, matrix assembly, cell migration and proliferation [85]. The chicken homologue of human versican is called PG-M. The versican gene is composed of 15 exons, with exons 3–6 encoding the G1 domain, exons 7 and 8 encoding the GAG-attachment regions (GAGα and GAGβ) and exons 9–14 encoding the G3 domain. Alternative splicing of exons 7 and 8 results in the expression of four isoforms, denoted V0, V1, V2 and V3 (Fig. 1). The V3 isoform contains neither exon 7 nor exon 8 and is hence not a PG.
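The splicing logic above reduces to a lookup on which GAG-attachment exons are retained. This is a minimal sketch; the text itself only states that V3 lacks both exons, and the assignment of V0 (both exons), V1 (exon 8, GAGβ) and V2 (exon 7, GAGα) follows the standard versican nomenclature. The function and constant names are hypothetical.

```python
# Exon 7 encodes the GAG-alpha region and exon 8 the GAG-beta region.
GAG_EXONS = frozenset({7, 8})

ISOFORM_BY_GAG_EXONS = {
    frozenset({7, 8}): "V0",
    frozenset({8}): "V1",
    frozenset({7}): "V2",
    frozenset(): "V3",
}


def classify_versican(exons_included):
    """Return (isoform, is_proteoglycan) for a collection of included exon numbers."""
    gag_exons = frozenset(exons_included) & GAG_EXONS
    isoform = ISOFORM_BY_GAG_EXONS[gag_exons]
    # Without exon 7 or 8 there are no GAG-attachment regions, so no PG.
    return isoform, bool(gag_exons)


all_exons = range(1, 16)  # 15 exons
print(classify_versican(all_exons))                                # ('V0', True)
print(classify_versican(e for e in all_exons if e not in (7, 8)))  # ('V3', False)
```

Only exons 7 and 8 influence the result; all other exons in the input are ignored by the intersection with `GAG_EXONS`.
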

Figure 1.

Splice variants of versican [197]. UTR, untranslated region; SP, signal peptide; HABR, hyaluronan-binding region; GAGα, glycosaminoglycan α-binding region; GAGβ, glycosaminoglycan β-binding region; EGF, epidermal growth factor repeat; CR, complement regulatory protein repeat. The G1 domain is the HABR while the G3 domain contains exons 9–14.

Many different fragments of versican have been produced recombinantly, although few are decorated with GAGs (Table 2). Full-length versican has been expressed in CHO cells [86] and found to be decorated with CS. A fusion protein of the G1 domain, 15% of the GAG-binding region and the G3 domain (also called mini-versican), derived from the chicken versican gene, has been transiently expressed in COS-7 cells and stably expressed in NIH3T3 cells, and was found to be a 150 kDa protein core decorated with 40–100 kDa CS [87-89]. Mini-versican was later expressed in chicken chondrocytes [90] and U87 astrocytoma cells [91]. Domains 1 and 3 together with the GAG domain and link protein have been expressed in COS-7 and NIH3T3 cells [89]; the link protein was used to increase secretion of the recombinant product. A fusion protein containing the GAG domain and domain 3 exhibited a range of relative molecular masses by western blotting and was assumed to be decorated with GAGs; however, their presence was not confirmed [89].

Table 2. Recombinantly expressed forms of versican. nd, not determined; ns, not specified; no GAG, not decorated with GAG.

Many other recombinant forms of versican have been reported in the literature; however, they are not decorated with GAGs, either because they do not contain exons 7 and 8 or because they are expressed in systems that do not decorate the protein core with GAGs, such as E. coli. The G1 domain, also known as the HA-binding region, has been expressed in NIH3T3 murine fibroblasts [86], Drosophila [92], High Five insect cells with a yield of 1–2 mg·L−1 [67] and bacteria [93]. The α- and β-GAG domains (exons 7 and 8) of versican have been expressed in E. coli; however, these constructs were not decorated with GAGs [94]. The G3 domain has been expressed in E. coli [88, 93, 95], whereas the lectin region (exons 11–13) of the G3 domain has been expressed in CHO [96] and HEK-293 cells [97].

Recombinant expression of versican has led to discoveries about its role in cancer through cell adhesion and migration studies. Recombinant full-length versican in the presence of HA has been shown to promote pericellular matrix formation in ovarian cancer cells, which promotes their motility and invasion in vitro, whereas treatment of these cells with recombinantly expressed versican induced cell invasion through an ECM barrier [98]. Mini-versican has been shown to enhance murine fibroblast proliferation, which was attributed to the G3 domain [88], whereas the G1 domain has been implicated in decreasing cell–substratum adhesion [89]. This form of versican has also been shown to inhibit mesenchymal chondrogenesis, which was also attributed to the G3 domain [87], and to promote astrocytoma cell migration via the G1 domain [99]. Expression of this form of versican in chicken chondrocytes was found to promote proliferation, with the G1 domain destabilizing adhesion and the two EGF-like repeats stimulating proliferation [90]. Overexpression of the G1 and G1/G3 domains in ovo resulted in enhanced cartilage deposition [100]. Domain I of human versican was found to induce spondylitis in the lumbar spine and sacroiliitis in BALB/c mice [67].

Recombinant expression of versican has also enabled an understanding of the roles of its different domains. The presence of the G1 domain in constructs was found to inhibit expression, secretion and GAG decoration, whereas the presence of the G3 domain enhanced product secretion [87-90]. The G1 domain was found to form HA-dependent ternary complexes with link protein [101], and the HA-binding region was found to bind HA [86]. The C-type lectin domain was found to self-aggregate into dimer and multimer complexes [102], to bind with high affinity to fibulin-1 [97, 103] and to possess carbohydrate-binding activity [96]. The C-terminal region of versican has been reported to bind heparin and HS in a Ca2+-dependent manner [95]. P-selectin glycoprotein ligand-1 has been shown to bind to the versican G3 domain and to mediate cell aggregation [91].


Bikunin, also known as urinary trypsin inhibitor, urinastatin and ulinastatin, is a CS PG. Bikunin is mostly found covalently linked via its CS chain to one or two of three heavy chain proteins, designated HC1, HC2 and HC3 [104]; these complexes are named inter-α-trypsin inhibitor (bikunin, HC1 and HC2) and pre-α-inhibitor (bikunin and HC3) [105]. Bikunin is most widely known for its serine protease inhibitory activity [106], although it has also been investigated for its therapeutic activity in acute pancreatitis [107], and it plays a role in the organization of hyaluronan in the ECM. Bikunin has a 20 kDa protein core and is predominantly synthesized by hepatocytes, where it is decorated with a CS chain as well as heavy chain proteins. It is transcribed from the α1-microglobulin/bikunin precursor gene as a precursor in which the α1-microglobulin protein lies N-terminal to the bikunin protein core; the precursor is cleaved in the Golgi prior to decoration of the bikunin protein core with CS at serine 10 and subsequent attachment of the HC proteins [108]. Inter-α-trypsin inhibitor and pre-α-inhibitor are secreted by cells and have been termed PGs; however, bikunin is the only PG in these complexes because it is the only protein decorated with an O-linked CS chain on a serine residue [109]. The HC proteins are attached via an ester bond between the CS chain of bikunin and the α-carboxyl group of the C-terminal amino acid of the heavy chains [110]. The cosynthesis of α1-microglobulin and bikunin is intriguing because the two proteins are not known to have a functional connection. Bikunin is most commonly isolated from urine, as this form is not substituted with heavy chain proteins and is naturally present in high abundance. Recombinant human bikunin has been transiently expressed in COS-7 cells [111] as well as stably expressed in E. coli [112] and P. pastoris [113, 114], whereas rat α1-microglobulin/bikunin has been expressed in Sf9 insect cells [115] and COS-1 cells [116].
The PG form of bikunin has been recombinantly expressed in COS-7 cells, decorated with ~30–50 kDa CS [111]. This form of bikunin was secreted by the cells either as the precursor with α1-microglobulin, as bikunin alone or decorated with HC proteins. The PG form of bikunin has also been expressed in COS-1 cells, both as the precursor with α1-microglobulin decorated with ~23 kDa CS and as a form not decorated with CS [116]. Similar results were found in the insect expression system, where bikunin was secreted either as the precursor with α1-microglobulin or in its mature form at high concentrations [115]; however, it was not decorated with CS. Bikunin expressed in P. pastoris yielded 55 mg·L⁻¹ of active protein with two protein cores of 24 and 21 kDa [114], whereas a fusion construct of bikunin with domain II of human serum albumin has been reported to produce a similar yield in P. pastoris [113].

Bikunin has also been shown to inhibit tumour invasion in vitro, an effect that depends on its anti-plasmin activity and on urokinase-type plasminogen activator receptors [117]. A fusion protein of bikunin domain II and urokinase-type plasminogen activator was expressed in E. coli and found to be more effective at inhibiting cell-surface plasmin and in vitro cancer cell invasion than either bikunin or urokinase-type plasminogen activator alone [117, 118]. Domain II was chosen because residues 78–136 have been reported to be responsible for the antimetastatic activity of bikunin [117]; note, however, that the CS attachment site lies in domain I, which is absent from this construct.

Perlecan

Perlecan, the major HS PG of basement membranes and other connective tissues, is a modular molecule with five structural domains. The molecular mass of the protein core is ~470 kDa, whereas that of the intact PG, including the covalently attached GAG chains, is in the range 600–700 kDa. The complete PG has been expressed in both HEK-293 and JAR choriocarcinoma cells and shown to interact in an HS-dependent fashion with activin A, a member of the TGF-β superfamily [119]. Of the five domains, the N-terminal domain I is the smallest, with a molecular mass of ~20 kDa; it contains a sperm protein, enterokinase and agrin module similar to that seen in mucins, and three well-characterized GAG-attachment sites within Ser–Gly–Asp sequences at Ser65, Ser71 and Ser76. Amino acid residues upstream of this region, as well as domains towards the C-terminus, have been shown to be important in determining which type of GAG ultimately decorates each of these serine residues [120]. Domain I was first expressed in CHO cells [121], then in insect cells [122], and has subsequently been expressed in COS-7 [120] and HEK-293 cells [123-127]. In all of the mammalian systems, recombinant domain I was expressed as a PG decorated mostly with HS, whereas expression in conjunction with the C-terminal domains of perlecan, or as a fusion protein with enhanced green fluorescent protein, resulted in decoration with HS only [120, 124]. The HS attached to domain I was shown to be important for the binding of growth factors such as fibroblast growth factor (FGF)-2 [8, 125, 128], vascular endothelial growth factor-165 [126] and BMP-2 [129], as well as the basement membrane protein laminin [130]. Domain II has homology to the low-density lipoprotein receptor and was expressed as a complete and correctly folded domain in HEK-293 cells [131].
The positions of the cysteine residues are strictly conserved, suggesting that they are important for the tertiary structure of this domain. Domain III shares homology with structures present in the laminin α and γ chains, also possesses EGF-like motifs, and has been expressed in E. coli in its entirety [132] and as three separate but overlapping recombinant fragments in HEK-293 cells [133]. The highest-affinity binding of the platelet-derived growth factor BB isoform has been mapped to the laminin-type IV and laminin-type EGF-like repeat 3 regions of this domain [134]. Despite the presence of an RGD motif in one of the HEK-293-expressed fragments, none of them interacted with either α5β1 or αvβ3 integrins, and no cell binding to any of these fragments could be demonstrated. This contrasts with the same domain expressed in HT-1080 cells, to which cells were shown to bind in an RGD-dependent fashion [135]. A recombinant fragment from this domain was shown to bind to WARP, an ECM molecule expressed by chondrocytes that belongs to the von Willebrand factor A domain superfamily [136]. Domain IV is the largest of the domains and consists of multiple immunoglobulin-like (Ig) modules similar to those found in the neural cell-adhesion molecule. The number of Ig modules varies between species: the human perlecan sequence possesses 21, whereas the mouse sequence contains 14 [137]. This domain has also been expressed as overlapping fragments, which have been shown to bind to the basement membrane nidogens, fibronectin and fibulin-2 [138, 139]. Domain V is the most C-terminal domain of the protein core and is made up of three regions with homology to the globular domains of the laminin α chain, interspersed with four epidermal growth factor-like repeats. The mouse domain V sequence was expressed in HEK-293 cells and was decorated with either HS or CS [140-142].
The human domain V sequence (Leu3611 to Ser4391) was expressed in HEK-293 cells and also shown to be decorated with either HS or CS [143], whereas the recombinant product known as endorepellin (Glu3687 to Ser4391) was expressed without GAG chains when transfected into the same HEK-293 cells [144]. This suggests either that the missing 76 amino acids may be important for correct glycosylation at this site or that the glycosylation of recombinant PGs is variable and dependent on the expression system. Interestingly, when the domain V sequence of the Drosophila melanogaster homologue of perlecan, unc-52, was expressed in HEK-293 cells, it was also produced as a protein without GAG decoration [145]. Whether these recombinant products carry GAG chains matters for interpreting the role of the perlecan molecule, because the interactions of recombinant domain V with nidogen, fibulin [140], laminin [146] and PRELP [147] (an ECM protein that binds to collagen and is involved in matrix assembly) are all dependent on the presence of the GAG chains. By contrast, interactions with either α-dystroglycan [148] or the α2β1 integrin [149] have both been shown to be protein core dependent. Moreover, removal of the GAG chains from recombinant domain V has been shown to enhance cell binding, suggesting that the presence of the carbohydrate inhibits the interaction with the integrin [143].
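The size arithmetic behind two of the statements above (the mass contributed by GAG chains on intact perlecan, and the 76 residues separating the human domain V construct from endorepellin) can be checked with a few lines of Python, together with a naive search for Ser–Gly–Asp attachment motifs. This is an illustrative sketch only: it assumes that apparent molecular masses are simply additive (only approximately true for heavily glycosylated PGs), that residue ranges are inclusive at both ends, and that a plain string search is an acceptable first pass for candidate sites. The function names and the toy sequence are hypothetical; the toy sequence is not the real perlecan domain I sequence.

```python
"""Size arithmetic for the perlecan figures quoted in the text.

Assumptions: apparent masses (kDa) are additive, residue ranges are
inclusive at both ends, and the toy sequence below is hypothetical.
"""
import re


def gag_mass_range(core_kda, pg_min_kda, pg_max_kda):
    """Mass attributable to GAG chains: intact PG mass minus core mass."""
    return pg_min_kda - core_kda, pg_max_kda - core_kda


def span_length(first, last):
    """Number of residues in the inclusive range first..last."""
    return last - first + 1


def find_sgd_sites(seq):
    """1-based positions of the Ser in each Ser-Gly-Asp (SGD) tripeptide."""
    return [match.start() + 1 for match in re.finditer("SGD", seq)]


# Perlecan: ~470 kDa protein core, 600-700 kDa intact PG.
print(gag_mass_range(470, 600, 700))       # (130, 230)

# Human domain V (Leu3611-Ser4391) vs endorepellin (Glu3687-Ser4391).
domain_v = span_length(3611, 4391)
endorepellin = span_length(3687, 4391)
print(domain_v, endorepellin, domain_v - endorepellin)   # 781 705 76

# Hypothetical toy sequence; NOT the real perlecan domain I sequence.
print(find_sgd_sites("MASGDAASGDLLPSGDE"))  # [3, 8, 14]
```

A motif search of this kind only flags candidate serines; as described above, residues outside the motif and more distal domains influence whether, and with which type of GAG, each site is actually substituted.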

Endocan

The serum-borne PG endocan, which is highly expressed in the vasculature around tumours, has been expressed in HEK-293 cells [150] and shown to carry a longer CS/DS chain than endocan purified from endothelial cells. Despite this difference in length, the two forms had a similar disaccharide composition, suggesting that their microstructures, which are responsible for binding growth factors and cytokines, are very similar [7].

Type XVIII collagen

The C-terminal fragment of this HS PG is known as endostatin and has been expressed in COS-7 cells. This region of type XVIII collagen, which does not contain GAG-attachment sites, has been expressed as a 20 kDa protein. Although its mechanism of action is independent of HS/heparin [151], endostatin expressed by HEK-293 cells was shown to bind to heparin [152]. Endostatin was first described in 1997 as an inhibitor of angiogenesis and was proposed as a potential anti-cancer therapeutic [153]; however, despite 15 years of intensive investigation in many cell and animal models, it has not been shown conclusively to provide long-term clinical benefit. The mechanism of action of endostatin on cancer cells has been extensively studied but remains elusive. Although other groups have successfully expressed fragments of type XVIII collagen and investigated their proteolysis by MMPs [154], full-length type XVIII collagen has not yet been expressed as an HS PG, and this remains a challenge for the field.

Type XV collagen

Like type XVIII collagen, type XV collagen contains an anti-angiogenic fragment, called restin, that cross-reacts with an antibody to endostatin and has been expressed recombinantly [155]. This recombinant fragment has been shown to inhibit tumour growth when injected into tumour-bearing animals [156]. Type XV collagen is a CS PG present in the basement membrane [157, 158] and in the ECM in close association with collagens, where it may be involved in the fibrotic response [159]. It has been expressed in insect cells and shown to regulate the adhesion and migration of both fibroblasts and fibrosarcoma cells, and to interact with fibronectin, laminin and, to a lesser extent, vitronectin [160].

Cell-surface proteoglycans

Cell-surface PGs provide a mechanism for cells to interact with a wide variety of ECM components, including through the formation of receptor–ligand complexes. Cell-surface PGs can themselves be shed to form soluble PGs. Although HS is found on a variety of cell-surface proteins, it is found consistently on two major families of membrane-bound PGs: the syndecans and the glypicans. The cell-surface CS PGs CD44 and CSPG4 are also described below.

Syndecans

The syndecans are a family of four transmembrane HS PGs that are important for cell motility and that bind growth factors, such as FGF-2, and ECM molecules. Different splice forms of syndecan-1 transfected into Madin–Darby canine kidney cells were mostly produced as HS PGs; however, in some cases a mixed population of HS- and CS-decorated PGs was produced [161]. Syndecans-1 and -4 have been studied extensively using recombinant expression systems to investigate their binding to laminin [162], as well as their roles in the migration, proliferation and differentiation of cells. For example, mouse fibroblasts expressing recombinant syndecan-1 were shown to have activated intracellular adhesion pathways via the αvβ3 integrin [163]. The ectodomain containing the HS chains can be shed by proteinases and upon growth factor activation [164], and this shedding has been shown to promote cell migration [165], with ramifications for both wound healing and cancer progression. Full-length and truncated forms of syndecan-1 were transfected into mesenchymal-derived tumour cells and shown to modulate their epithelial to mesenchymal transition [166]. The ectodomain of syndecan-4 expressed in E. coli, which does not attach GAG chains, promoted cell attachment but failed to assemble focal adhesions [167], suggesting that the HS chains might play a role in focal adhesion assembly. The ectodomain of syndecan-1 was expressed in CHO cells and shown to be decorated with HS containing significant amounts of the disaccharides 2-O-sulfated iduronic acid–N-sulfated, 6-sulfated glucosamine and 2-O-sulfated iduronic acid–N-sulfated glucosamine; this recombinant ectodomain bound FGF-2, collagen type I, fibronectin and laminin [168]. A recombinant fusion of syndecan-4 and FGF-1 was engineered and isolated from the medium of transfected CHO cells. This fusion PG was decorated with both HS and CS chains and was active on endothelial cells in angiogenesis assays [169].

Glypicans

There are six members of the glypican family of cell-surface HS PGs, and these, like the syndecans, have been shown to interact with growth factors and ECM molecules via their HS chains. However, unlike the syndecans, the glypicans do not span the cell membrane but are instead anchored in it via a glycosylphosphatidylinositol (GPI) moiety. Human myelogenous leukemia (K562) cells transfected with glypican-1 and FGF receptor-1 (IIIc splice variant) had an increased signalling capacity in the presence of FGF-2, supporting the hypothesis that glypican-1 is involved in growth factor signalling [170]. Glypican-4 was expressed in neural precursor cells and shown to interact with FGF-2 via its HS chains [171]. Interestingly, the ARH 77 B-lymphoid cell line adhered to collagen type I when transfected with either syndecan-2 or -4, but not when transfected with glypican-1 [172]. Furthermore, HEK-293 cells expressing glypican-1 did not bind to the LG4 domain of the laminin α4 chain, suggesting that glypican-1 might have more defined binding interactions with growth factors such as FGF-2 and vascular endothelial growth factor-165 [173] than with the ECM [171]. However, Slit-2, an extracellular protein involved in embryonic development, has been shown to interact specifically with the HS attached to recombinant glypican-1 [174]. Soluble recombinant glypican-1 lacking the GPI anchor was shown to inhibit the growth of hepatocellular carcinoma cells in culture, potentially by interfering with growth factor activation [175].

The glypican-1 ectodomain expressed in HEK-293 cells was produced both as an HS PG and as a nondecorated protein core, which was also N-glycosylated and S-nitrosylated; this modulated the production of nitric oxide that could oxidize unsubstituted glucosamine residues present in the HS chains [176]. Other experiments using fusion constructs containing the various extracellular domains of glypican-1 showed that the recombinant protein was decorated with HS when the globular domain was present and with CS when this domain was absent [177].

CD44

Some splice variants of CD44 contain exon V3 and have been shown to be decorated with HS that can bind FGF-2 and heparin-binding epidermal growth factor (HB-EGF) [178]. When the CD44 V3–10, V3 and V8–10 forms were expressed in conjunction with exons V4–7, they carried less highly sulfated HS, indicating that splicing events may have distal effects on the sulfation of the HS, which in turn may affect how CD44 interacts with growth factors [179].

Chondroitin sulfate proteoglycan 4 (CSPG4)

CSPG4, also known as high molecular mass melanoma-associated antigen (HMW-MAA), melanoma-associated CS PG or gp240, is a large integral membrane CS PG present on the surface of immature progenitor cells in several developing tissues [180]. It is also expressed on some types of malignant cells, including melanomas, glioblastomas and chondrosarcomas [181-184]. The presence of CSPG4 in these cell types suggests a role for this molecule in cell proliferation and/or differentiation. The rat homologue of CSPG4, nerve/glial antigen 2 (NG2), has been shown to have a large extracellular region divided into three domains, as well as a transmembrane domain and a short cytoplasmic domain. Each of the extracellular domains of human CSPG4 has been expressed in CHO cells, as have fusions of domains 1 and 2 and of domains 1 and 3; each of these constructs also contained the transmembrane and cytoplasmic domains. Rat NG2 was also expressed in CHO cells, although it was not characterized with respect to GAG decoration [44]. Most recombinant expression has been performed with rat NG2 cDNA. Rat neural cells (B28 cell line) and human glioma cells (U251MG cell line) have been transfected to express full-length NG2 [180]. Recombinant domains of NG2 have also been expressed in HEK-293 cells, including the entire extracellular domain (domains 1–3), domains 1 and 2, domain 2 alone and domain 3 alone [185]. NG2 domain 2 was expressed at a yield of 0.3 μg·mL⁻¹, whereas the entire extracellular domain was expressed at 8 μg·mL⁻¹, suggesting that the constructs may differ in stability. Recombinant NG2 was produced as two major populations, one without GAG decoration and the other decorated with between 50 and 300 kDa of CS on a 300 kDa protein core, suggesting that the protein core was decorated with CS in a similar manner to the endogenously expressed PG [13].
This same phenomenon has been reported for NG2 expressed in HEK-293 and COS-7 cells, as either the full protein core or regions that included extracellular domain 2 [185, 186]. In addition, the PG form of NG2 has been reported both with the expected full 300 kDa protein core and as two shorter forms of 275 and 290 kDa. The smaller forms were found to be produced by proteolytic cleavage of the extracellular domain near the transmembrane domain, both by trypsin and as a result of increased protein kinase C activity, which was thought to increase the abundance of proteases such as metalloproteinases [13].
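The NG2 figures quoted above can be turned into a few quick comparisons: the fold difference in yield between the domain 2 and full extracellular-domain constructs, the apparent size range of the CS-decorated PG, and the mass apparently removed by proteolysis to give the shorter core forms. This is a hedged sketch: it assumes that CS and protein-core masses add linearly and that differences between apparent sizes reflect removed polypeptide, both of which are approximations for glycosylated proteins; the helper names are hypothetical.

```python
"""Quick comparisons of the recombinant NG2 figures quoted in the text.

Assumptions (hypothetical helpers, approximate arithmetic): apparent masses
in kDa are additive, and yields are in micrograms per mL of medium.
"""


def fold_difference(higher, lower):
    """Ratio of two yields expressed in the same units."""
    return higher / lower


def pg_mass_range(core_kda, cs_min_kda, cs_max_kda):
    """Apparent mass range of a PG whose core carries cs_min..cs_max kDa of CS."""
    return core_kda + cs_min_kda, core_kda + cs_max_kda


def truncation_losses(full_kda, shorter_kda):
    """Mass apparently removed to give each shorter protein-core form."""
    return [full_kda - form for form in shorter_kda]


# Yield: entire extracellular domain (8 ug/mL) vs domain 2 alone (0.3 ug/mL).
print(round(fold_difference(8, 0.3), 1))   # 26.7

# CS-decorated population: 300 kDa core plus 50-300 kDa of CS.
print(pg_mass_range(300, 50, 300))         # (350, 600)

# Shorter core forms of 275 and 290 kDa relative to the 300 kDa core.
print(truncation_losses(300, [275, 290]))  # [25, 10]
```

A roughly 27-fold difference in yield is consistent with, but does not by itself demonstrate, a difference in construct stability; expression level, folding and secretion efficiency could all contribute.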

The recombinant forms of NG2 have been used to explore the functions of endogenous NG2 in collagen binding, growth factor binding and cellular responses. Deletion mutants showed that, although there are 13 potential GAG-attachment sites on the NG2 protein core, it is decorated with only one CS chain, at amino acid 999 [187]. Recombinant full-length NG2 has been reported to bind and anchor collagen type VI at the cell surface [180] and to modulate cell morphology [188]. Deletion mutants of full-length NG2 demonstrated that the interaction between NG2 and collagen type VI involves extracellular domain 2 [187] and that it may involve the CS chain [185, 189]. Interestingly, the CS that decorates domain 2 of NG2 contributes to binding of collagen types V and VI when NG2 is bound to a substrate and collagen is in solution; however, when collagen is bound and NG2 is in solution, the CS plays no role in the binding [185]. This suggests that the conformation of NG2 is important for its interactions with collagen. The recombinant extracellular domains of NG2 also supported FGF-2 binding in a CS-dependent fashion [190]. Recombinant NG2 domain 2 inhibited neurite growth and induced growth cone collapse to a similar extent as recombinant NG2 domains 1 or 3; however, when domain 2 was treated with chondroitinase ABC to remove the CS, it was no longer inhibitory, indicating that its inhibitory activity depended on the CS chain [186].

Intracellular proteoglycans

Serglycin is a heparin/HS/CS PG that is produced by many different cell types, including endothelial cells [191], neutrophils and mast cells [192], and has been shown to be important for binding and packaging proteases into intracytoplasmic secretory granules. It is also present in the α-granules of platelets, where it binds and packages platelet factor 4, and serglycin in mast cells is thought to be the major source of commercial heparin, which is manufactured from porcine intestinal mucosa or bovine lungs. Serglycin has been expressed in pancreatic cancer cells (AR4-2J), where the presence of GAG chains was shown to be important for correct trafficking of serglycin to the granules [193]. The sulfation of the GAG chains on serglycin has also been shown to be important for apical–basolateral sorting in transfected Madin–Darby canine kidney cells, with the more highly sulfated forms being trafficked to the basolateral side for secretion [194]. Interestingly, serglycin has also been shown to be highly expressed by nasopharyngeal carcinomas, where its GAG-decorated form may be involved in promoting metastasis [195].


There are still many challenges awaiting the bioengineer or molecular biologist who designs and expresses PGs in cells. Many of the extracellular PGs are large and contain many domains, many of the cell-surface PGs contain stretches of hydrophobic residues, and the mucin-like nature of some intracellular PGs means that repetitive serine–glycine sequences need to be expressed and glycosylated. The other significant challenge is that the biological functions of PGs depend on the structure of their GAG chains, including chain length, sulfation level and iduronic acid content. We are only beginning to understand how the glycosylation of PGs enhances and modulates the functions attributable to the protein core, and many of these effects can only be studied using recombinant forms of the PGs. These attributes present significant and ongoing challenges, but it is important that future research efforts address them so that recombinant PGs mimic natural PG structures. Such recombinant forms will serve as research reagents for testing many current hypotheses in the laboratory, and will guide the development of new classes of therapeutics for many clinical applications in the future.