*Corresponding author: S. E. Humphries, Tel: +44-020-7679-6962. Fax: +44-020-7679-6212. E-mail: firstname.lastname@example.org
Familial hypercholesterolemia (FH) (OMIM 143890) is most commonly caused by variations in the LDLR gene, which encodes the receptor for low density lipoprotein (LDL) cholesterol particles. We have updated the University College London (UCL) LDLR FH database (http://www.ucl.ac.uk/ldlr) by adding variants reported in the literature since 2001, converting existing entries to standard nomenclature, and transferring the database to the Leiden Open Source Variation Database (LOVD) platform. As of July 2007 the database listed 1066 unique LDLR gene events. Sixty-five percent (n = 689) of the variants are DNA substitutions, 24% (n = 260) small DNA rearrangements (<100bp) and 11% (n = 117) large DNA rearrangements (>100bp), proportions which are similar to those reported in the 2001 database (n = 683, 62%, 24% and 14% respectively). The DNA substitutions and small rearrangements occur along the length of the gene, with 24 in the promoter region, 86 in intronic sequences and 839 in the exons (93 nonsense variants, 499 missense variants and 247 small rearrangements). These occur in all exons, with the highest proportion (20%) in exon 4 (186/949); this exon is the largest and codes for the critical ligand binding region, where any missense variant is likely to be pathogenic. Using the PolyPhen and SIFT prediction programmes, 87% of the missense variants are predicted to have a deleterious effect on LDLR activity, and it is probable that at least 48% of the remainder are also pathogenic, but their role in FH causation requires confirmation by in vitro or family studies.
Familial hypercholesterolemia (FH) is an autosomal dominant condition, caused predominantly by variants in the low density lipoprotein receptor (LDLR) gene (Goldstein & Brown, 1989), and affects around 1 in 500 individuals worldwide. Pathogenic changes in the LDLR peptide result in impaired uptake and/or processing of low density lipoprotein (LDL) particles, which in turn results in accelerated atherosclerosis and increased risk of coronary heart disease. The LDLR gene, which is located at 19p13.2, is composed of 18 exons spanning 45kb; the transcript is 5.3kb long and encodes a peptide of 860 residues. Functional domains of the peptide correspond with the exons as follows: signal sequence – exon 1, ligand binding domain – exons 2–6, epidermal growth factor precursor like domain – exons 7–14, O-linked carbohydrate domain – exon 15, transmembrane domain – exon 16 and 41bp of exon 17 and cytoplasmic domain – remainder of exon 17 and exon 18 (Yamamoto et al. 1984; Sudhof et al. 1985). LDLR is expressed ubiquitously under the control of sterol regulated negative feedback, mediated by three 16bp imperfect repeats (sterol regulatory elements) and a TATA like sequence in the promoter (Sudhof et al. 1987).
LDLR variants have been reported along the length of the gene in FH patients from around the world (Heath et al. 2001; Villeger et al. 2002). As FH-associated LDLR variants continue to be reported in large numbers, there is clearly a need for a readily accessible and easy to use locus-specific database. The database outlined in this paper provides such a resource. The University College London (UCL) LDLR FH database, which was established in 1996 and last updated in 2001 (Heath et al. 2001), has been accessed more than 31,000 times. Our aim was to check all existing entries and to convert them to standard nomenclature (den Dunnen & Antonarakis, 2000; Human Genome Variation Society (HGVS) (http://www.hgvs.org)), add variants reported in the literature since 2001 and transfer the database to the Leiden Open Source Variation Database (LOVD) platform (Fokkema et al. 2005). In addition, wherever possible, predictions of the likely pathogenicity of reported variants have been added, using two publicly available computer programmes, namely PolyPhen (Ramensky et al. 2002) and SIFT (Ng & Henikoff, 2003), and by inspection of cross-species conservation.
Materials and Methods
The updated database may be accessed via http://www.ucl.ac.uk/ldlr. The original UCL LDLR FH database remains unchanged and can be accessed via a link on the homepage of the new database.
All existing and new entries to the LDLR FH database were adapted to adhere to the recommendations of the Human Genome Variation Society (HGVS) (http://www.hgvs.org) (den Dunnen & Antonarakis 2000). In some instances the numbering of an entry may have been altered; for example, where a deletion has occurred in a run of identical bases, the recommendation is that the most 3′ base is described as deleted, and this also applies to amino acid repeats. Furthermore, many variants previously reported as insertions are now reported as duplications, where the inserted sequence matches that adjacent to the insertion site. All amino acid variants now follow the standard nomenclature with the initiating methionine given as number one, rather than the numbering used historically for LDLR, in which the first residue (alanine) of the mature peptide was denoted as number one. Hence 21 has been added to all original amino acid numbers, although the original numbering is retained in the ‘Original amino acid numbering’ field of the database. Some variants in the 5′ untranslated region of the LDLR gene have in the past been numbered from the start of transcription (Hobbs' numbering; Hobbs et al. 1992); all such variants are now numbered from the C (−1) immediately preceding the A of the initiating methionine (1). Hence, Hobbs' numbering −1 is −94 according to the standard nomenclature.
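The two renumbering rules described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the procedure used to build the database: the function names, the single-letter variant pattern and the 0-based sequence indexing are assumptions made for the example, and the toy sequence is not LDLR sequence.

```python
import re

# Offset between historical mature-peptide numbering and numbering from the
# initiating methionine, as stated in the text (21 residues).
MATURE_OFFSET = 21


def renumber_protein_variant(old_name: str) -> str:
    """Convert a simple substitution such as 'p.T705I' to methionine-based numbering.

    Only single-letter substitutions of the form p.<ref><position><alt> are
    handled; this restriction is an assumption made for the sketch.
    """
    match = re.fullmatch(r"p\.([A-Z])(\d+)([A-Z])", old_name)
    if match is None:
        raise ValueError(f"unsupported variant name: {old_name!r}")
    ref, position, alt = match.groups()
    return f"p.{ref}{int(position) + MATURE_OFFSET}{alt}"


def shift_deletion_3prime(seq: str, start: int, length: int) -> int:
    """Return the most 3' start index (0-based) of an equivalent deletion.

    Deleting seq[start:start + length] gives the same product as deleting one
    base further 3' whenever the first deleted base equals the base that
    immediately follows the deleted block.
    """
    while start + length < len(seq) and seq[start] == seq[start + length]:
        start += 1
    return start


print(renumber_protein_variant("p.T705I"))    # p.T726I
print(shift_deletion_3prime("GCAAAT", 2, 1))  # 4 (the last A of the run)
```

The first call reproduces the p.T705I to p.T726I conversion quoted later in the text; the second shows a single-base deletion in a run of three A residues being assigned to the most 3′ A.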
The validity of each entry in the original database was reviewed in the published literature, and any entry that could not be verified in this way was omitted from the new database. Reports of LDLR gene variants published since 2001 were found by putting ‘Familial hypercholesterolemia’ and ‘Gene’ into a PUBMED (http://www.pubmed.gov) search.
Leiden Open Source Variation Database (LOVD) Platform
The LOVD platform (Fokkema et al. 2005) provides a gene specific homepage from which the following may be accessed: genomic reference sequence, complete allelic variant table, polymorphism table and summary tables. The database may be searched by type of variant or using either the simple search (by any or all of: exon, type of variant, disease) or the advanced search (by any or all of: exon, type of variant, sequence variation description, protein change description, disease, reference). In addition, links to other resources can be accessed from the homepage. LOVD was installed on the UCL web server according to the download instructions provided. The database was set up for the human LDLR gene by addition of the relevant reference sequence and links to the appropriate entries on: Entrez-Gene: 3949, OMIM gene: 60694, OMIM disease: 143890 (FH), HGMD: LDLR and GDB: 119362. A link was also added to allow continued access to the original UCL LDLR FH database. A publicly available hit counter was added to the homepage by editing the ‘index.php’ script and adding a table to the database to store the count.
The following custom columns were added to the complete allelic variant table: Original aa No, Allele Name, Functional Domain, Product Activity, Predicted Effect, PolyPhen Prediction, SIFT Prediction, Refined SIFT Prediction, Species Conservation, Ethnic Origin and Country Origin, whilst the existing columns RNA, Frequency, Disease, DNA/RNA and Technique were hidden. The table legends were altered accordingly by editing the ‘legends.php’ script.
Analysis of Variants
In an attempt to assess the effect of missense amino acid substitutions on the mature LDLR peptide, three publicly-available computer programmes were employed.
PolyPhen predicts whether an amino acid substitution would be probably damaging, possibly damaging or benign using (i) sequence based characterisation, e.g.: signal sequence, disulphide bond, binding site, (ii) homologous sequence analysis, against a family of homologous proteins, giving a PSIC (Position-Specific Independent Counts) score for wild type and variant, (iii) mapping to known 3D structure, (iv) structural parameters: secondary structure & solvent accessible surface area and (v) contacts with critical sites: e.g.: ligands and subunits of the protein molecule. Each missense amino acid substitution was analysed, using P01130 http://www.ebi.uniprot.org/entry/P01130 as the identifier for human LDLR peptide and following the instructions at: http://tux.embl-heidelberg.de/ramensky/index.shtml
SIFT predicts whether an amino acid substitution in a given peptide would be Tolerated or Not Tolerated by performing homologous sequence analysis against a family of proteins. Human LDLR peptide sequence (P01130) in FASTA format was used as the query for this analysis by following the instructions at: http://blocks.fhcrc.org/sift/SIFT.html.
Refined SIFT analysis was performed by comparing amino acid substitutions in human LDLR (P01130), against LDLR amino acid sequences from thirteen species: chimp (ENSPTRT00000019326), macaque (Q6S4M2), dog (ENSCAFT00000027791), pig (Q28832), cow (ENSBTAT00000016342), mouse (Q8VCT0), rat (P35952), hamster (P35950), rabbit (P20063), chick (Q7T2X3), frog (Q99087), zebra fish (Q7ZZT0) & shark (P79708). These were entered in FASTA format onto the form at: http://blocks.fhcrc.org/sift/SIFT_related_seqs_submit.html.
Human LDLR amino acid sequence and thirteen other LDLR amino acid sequences (chimp (ENSPTRT00000019326), macaque (Q6S4M2), dog (ENSCAFT00000027791), pig (Q28832), cow (ENSBTAT00000016342), mouse (Q8VCT0), rat (P35952), hamster (P35950), rabbit (P20063), chick (Q7T2X3), frog (Q99087), zebra fish (Q7ZZT0) & shark (P79708)) were aligned using the CLUSTAL W program (http://www.ebi.ac.uk/clustalw/).
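Cross-species conservation at a given residue can be summarised from one column of such an alignment. The sketch below assumes that conservation is expressed as the fraction of all 14 aligned sequences (human included) that share the human residue, which is consistent with the fractions out of 14 quoted in Table 1; the function name and the example column are hypothetical and are not taken from the actual alignment.

```python
def conservation(column: str, human_index: int = 0) -> float:
    """Fraction of sequences in an alignment column that share the human residue.

    `column` holds one character per aligned sequence; by assumption the human
    sequence is at `human_index` and gaps are written as '-'.
    """
    if not column:
        raise ValueError("empty alignment column")
    human_residue = column[human_index]
    if human_residue == "-":
        raise ValueError("human sequence has a gap at this position")
    return column.count(human_residue) / len(column)


# Made-up column: human first, followed by 13 other species.
toy_column = "A" * 11 + "TSG"
print(f"{conservation(toy_column):.0%}")  # 79%
```

With 11 of 14 sequences sharing the human residue the value is 79%, the same form in which conservation is reported for individual variants.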
Compared with the 2001 UCL LDLR database, which listed 683 variants (Heath et al. 2001), a total of 1066 individual validated LDLR variants from FH patients are now listed on the UCL LDLR FH database (the LOVD platform identifies 1035 unique events, because interpretation of the accepted nomenclature reported a number of large rearrangements as repeated events, although the sizes of individual rearrangements differ). Sixty five percent (n = 689) of the variants were DNA substitutions, 24% (n = 260) small DNA rearrangements (<100bp) and 11% (n = 117) large DNA rearrangements (>100bp). These proportions were similar to those reported in the original UCL LDLR database (62%, 24% and 14% respectively) (Heath et al. 2001).
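The proportions quoted above follow directly from the three counts. The short sketch below simply recomputes them as a consistency check; the dictionary keys and the function name are illustrative.

```python
# Counts of validated LDLR variants by type, as reported in the text.
counts = {
    "DNA substitutions": 689,
    "small DNA rearrangements (<100bp)": 260,
    "large DNA rearrangements (>100bp)": 117,
}


def percentages(variant_counts: dict) -> dict:
    """Return each category as a whole-number percentage of the total."""
    total = sum(variant_counts.values())
    return {name: round(100 * n / total) for name, n in variant_counts.items()}


print(sum(counts.values()))  # 1066
for name, pct in percentages(counts).items():
    print(f"{name}: {pct}%")
```

The total is 1066 and the rounded percentages are 65%, 24% and 11%, in agreement with the figures given.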
DNA Substitutions and Small DNA Rearrangements
The 949 DNA substitutions and small DNA rearrangement variants are distributed along the length of the LDLR gene (Figure 1a). Examination of the 208 variants reported from patients in the UK revealed that their distribution broadly followed that of variants worldwide (Figure 1a) (a full table of UK reported variants may be viewed in the supplementary material). The largest number of variants was reported in exon 4 (186/949 = 20%); however, the number of variants per nucleotide revealed a more even distribution along the length of the LDLR coding sequence (Fig. 1b). Notwithstanding this, the majority of variants occur in the ligand binding domain (exons 2 – 6) and the epidermal growth factor precursor-like domain (exons 7 – 14) (Fig. 1c); furthermore, it is apparent that there are more variants per base pair in the ligand binding and EGF-like domains than elsewhere in the gene (Fig. 1c).
Large DNA Rearrangements
One hundred and seventeen large DNA rearrangements are now listed on the UCL LOVD LDLR FH database (Fig. 2). The 100 deletions (85%) and 17 duplications (15%) were distributed along the length of the LDLR gene (Fig. 3a). The majority of breakpoints occurred in introns 1, 6, 8 and 15 (Fig. 3b, black bars); however, when the number of breakpoints per base pair was analysed, most were found in introns 5 and 8 (Fig. 3b, white bars). It was interesting to note that the number of breakpoints correlated well with the number of Alu repeat sequences per base pair of each intron (Fig. 3b, hatched bars), although breakpoints in introns 16 and 17 were under-represented. A major limitation for the analysis and comparison of large DNA rearrangements is the lack of detailed breakpoint information. Hence, a number of rearrangements that have been reported as independently occurring events by different groups may prove to be duplicate reports of a single event; for example, the 4 to 4.7kb deletion of exons 13 and 14, which has been reported by six different groups in European populations.
Predicted Effects of LDLR Variants
Exonic Small DNA Rearrangements
Two hundred and forty seven exonic small DNA rearrangements have been reported, of which 204 (83%) resulted in a frame shift and were therefore presumed to have a deleterious effect on LDLR activity. The remaining 43 (17%) in-frame rearrangements could alter secondary and tertiary peptide structures, and therefore may also be pathogenic, affecting LDLR activity to varying degrees.
Exonic DNA Substitutions
SIFT, refined SIFT and CLUSTAL W cross-species LDLR amino acid alignments can all be viewed in the supplementary material. Nonsense mutations accounted for 93 of the 592 (16%) exonic DNA substitutions and as such were predicted to be damaging to normal LDLR activity. The predicted effect of the remaining 499 missense substitutions was examined using PolyPhen, SIFT and Refined SIFT analyses. Four hundred and forty three (89%) of these DNA substitutions were predicted to have an adverse effect on LDLR activity by at least one of the programs used; the remaining 56 (11%) variants were predicted to be ‘apparently non-pathogenic’ by both of these computer programs. As an example, the variation p.T726I (previously p.T705I) which is commonly reported as being found in FH patients and assumed to be FH-causing (Hobbs et al. 1992, Lombardi et al. 1995) is now known to be non-FH causing, as it also occurs in normocholesterolaemic subjects (Lombardi et al. 1997, Heath et al. 2000). This variant was designated ‘benign’ by PolyPhen and ‘tolerated’ by both SIFT analyses.
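The rule used to combine the predictions can be written out explicitly. The sketch below is one reading of the text: a missense variant is treated as ‘apparently non-pathogenic’ only when PolyPhen reports it as benign and both SIFT analyses report it as tolerated, so that ‘possibly damaging’ counts as an adverse prediction (an assumption). The function name and the lower-case label strings are illustrative and do not reproduce the programs' exact output format.

```python
def apparently_non_pathogenic(polyphen: str, sift: str, refined_sift: str) -> bool:
    """True only if no analysis predicts an adverse effect.

    Assumed labels: PolyPhen gives 'probably damaging', 'possibly damaging' or
    'benign'; SIFT and refined SIFT give 'tolerated' or 'not tolerated'.
    """
    polyphen_benign = polyphen.strip().lower() == "benign"
    sift_tolerated = all(
        call.strip().lower() == "tolerated" for call in (sift, refined_sift)
    )
    return polyphen_benign and sift_tolerated


# p.T726I: 'benign' by PolyPhen and 'tolerated' by both SIFT analyses.
print(apparently_non_pathogenic("benign", "tolerated", "tolerated"))  # True

# A variant flagged by refined SIFT alone is still counted as predicted adverse.
print(apparently_non_pathogenic("benign", "tolerated", "not tolerated"))  # False
```

Under this rule the 499 missense substitutions divide into the 443 flagged by at least one analysis and the 56 flagged by none.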
The ‘apparently non-pathogenic’ variants were further classified into ‘probably pathogenic’, ‘possibly pathogenic’ and ‘probably not pathogenic’. A variant was classified as ‘probably pathogenic’ if evidence for any of the following was found in the literature: co-segregation with FH, absence of the sequence variant from normal healthy control subjects, high conservation of the residue involved across species, or other strong evidence of pathogenicity. ‘Possibly pathogenic’ variants were those where evidence in the literature was less convincing and the residues were less well conserved. ‘Probably not pathogenic’ variants were those with evidence in the literature for any of the following: mild phenotype, only found in the presence of another frankly pathogenic LDLR variant, occurrence in normal controls, or a residue weakly conserved across species together with a conservative amino acid substitution. It should be remembered that all LDLR variants reported here were originally described in FH patients. Using this approach 27 (48%) of the ‘apparently non-pathogenic’ variants were classified as ‘probably pathogenic’, 13 (23%) as ‘possibly pathogenic’ and 16 (29%) as ‘probably not pathogenic’ (Table 1). ‘Probably pathogenic’ variants include p.A151T (previously p.A130T), which is a non-conservative substitution (non-polar to polar) and shows 79% cross species sequence conservation, and p.P341R (previously p.P320R), which is also a non-conservative substitution (non-polar to basic) and shows 86% sequence conservation. Although the variant p.V797L (previously p.V776L) is a conservative substitution and only shows 36% sequence conservation, it was also designated as ‘probably pathogenic’ as the DNA change (c.2389G>T) destroys the splice junction at the 3′ end of exon 16 (Lombardi et al. 2000).
Table 1. Classification of variants reported to be non-pathogenic by the PolyPhen & SIFT computer programmes into ‘probably pathogenic’ (evidence for at least one of the following: co-segregation with FH, no reports of variant in normal controls, residue highly conserved across species, or other strong evidence of pathogenicity), ‘possibly pathogenic’ (variants were less well conserved and other evidence in the literature was less convincing) and ‘probably not pathogenic’ (evidence in the literature for any of the following: mild phenotype, only found in the presence of another frankly pathogenic LDLR variant, occurrence in normal controls, or if the residue was weakly conserved across species and the amino acid substitution was conservative). All LDLR variants reported here were originally described in FH patients.
New AA No
Old AA No
*Although both of these variants have been reported in the presence of the splicing variant c.313+1G>C, they have been classified as ‘Probably pathogenic’ as the amino acid residues are highly conserved and the substitutions are non-conservative.
Also carries p.G382V (predicted pathogenic), unknown whether or not on same allele (personal communication)(Fouchier et al. 2005)
Triple mutant with p.Q92E (predicted benign) & c.313 + 1G>C (predicted pathogenic)(Mozas et al. 2004)*
Described as clinically heterozygous. Found as compound heterozygote: [p.Q92E; c.313 + 1G>C]+[c.1061-8T>C; p.T726I]. Double mutant with c.313 + 1G>C. Triple mutant with p.N80K & c.313 + 1G>C (Mozas et al. 2004)*
8 Dutch patients identified, considered to be damaging (J. Defesche, personal communication)
One Dutch family with 16 carriers, considered to be damaging (J. Defesche, personal communication).
Residue conserved in primates; 9/11 remaining LDLR sequences have L at this position. Found in 2 unrelated German FH patients, co-segregation with FH not demonstrated, but no other LDLR variant found and this change was not present in 100 normal chromosomes (Nauck et al. 2001).
Residue conserved in 10/14 LDLR sequences. Non-conservative substitution (basic to non-polar) at buried site.
5–15% LDLR activity when heterozygous with Q33HfsX173 (predicted to be damaging)(Hobbs et al. 1992)
Two Dutch families with 52 carriers (J. Defesche, personal communication).
20 Dutch patients identified (J. Defesche, personal communication).
Almost invariably found with c.2393del9, which could cause peptide misfolding (Ebhardt et al. 1999); the effect of both variants is ambiguous and they may act together. Most common Dutch mutation (3260 patients) (J. Defesche, personal communication).
Always found on same allele as c.769C>T, p.R257W (predicted to be damaging) in this study (Fouchier et al. 2005).
13 Dutch patients identified (J. Defesche, personal communication).
No effect on function (Jensen et al. 1994). In two members of one family this was found on same allele as c.2282C>T, p.T761M and as a compound heterozygote with c.1123_1124insC, p.Y375SfsX6 & c.1120_1121GG>TC, p.G374S
Found in 1/18 non-FH Finnish subjects with moderate hypercholesterolaemia (proband LDL-C 6.70mmol/l). Not found in 123 healthy subjects or 145 FH patients. No segregation with high LDL-C in proband's family (Vuorio et al. 1997).
Found in two unrelated Bulgarian patients with possible FH (LDL 6.71mmol/l and 7.05mmol/l but no xanthoma). No family study but absent from 120 chromosomes from healthy Bulgarian subjects (Mihaylov et al. 2004).
Not thought to be disease causing as inherited from non-FH father. This patient also carries c.1436T>C, p.L479P(predicted to be damaging) (Naoumova et al. 2004).
Found to be allelic with c.1268T>C, p.I423T. Transfection into hamster ovary cells (CHO1d1A7) showed that p.E277K does not affect LDLR activity whereas c.1268T>C, p.I423T does (Ekström et al. 2000).
4/14 have E at this position. Reported as heterozygous with p.G478R (predicted to be pathogenic)(Hobbs et al. 1992).
9/14 have S at this position. In two members of one family this was found on same allele as p.Y375SfsX6, and as a compound heterozygote with p.A50S & p.T761M (Brusgaard et al. 2006).
4/14 are T at this position.
Rare; only one patient identified in NL. Residue conserved in 6/14 LDLR peptide sequences, 3/14 are T at this position. Conservative substitution (both non-polar).
6/14 have V at this position. Found in two normal controls as well as 3 FH patients, conclude that this is a non-disease causing variant (Chang et al. 2003).
Variant p.E277K (previously p.E256K) is an example of a variant which was designated as ‘probably not pathogenic’: although it is a non-conservative substitution (acidic to basic) and shows 71% sequence conservation, Ekström et al. (2000) demonstrated that it has no effect on LDLR activity in transfection studies; furthermore, it has mostly been reported in the presence of other FH causing LDLR variants (Pereira et al. 1995; Ekström et al. 2000; Cenarro et al. 1998; Sözen et al. 2005). Another example of a ‘probably not pathogenic’ variant is p.S397T (previously p.S376T); this is a conservative substitution (both polar) and 29% of other LDLR peptides also have threonine at this position.
DNA Substitutions and Small DNA Rearrangements in the Promoter Region
Twenty four variants have been reported in the promoter region of the LDLR gene (19 DNA substitutions, 5 small DNA rearrangements). As shown in Figure 4, 71% of these fall within either one of the sterol regulatory elements (SRE1 (−130 to −144), SRE2 (−145 to −161) or SRE3 (−180 to −195)) or in the cis-acting elements (FP1 (−219 to −238) or FP2 (−269 to −280)) (Sudhof et al. 1987; Mehta et al. 1996). In vitro expression reports were available for 8 variants (Fig. 4b); of these, 7 resulted in reduced expression from the LDLR promoter, whilst the variant c.−217C >T (2bp away from 3′ end of FP1), elevated expression to 160% of normal (Scholtz et al. 1999). Variants that fall outside the regulatory elements also altered expression (Fig. 4b, c.−217C >T & c.−120C >T); however, c.−268G >T which lies in FP2 has been reported as a non-FH variant present at polymorphic frequency in African subjects (Scholtz et al. 1999). The finding that variants in the LDLR promoter region can reduce, elevate or have little effect on transcription, irrespective of whether or not the variant lies within one of the regulatory elements, emphasises the importance of in vitro expression studies for these variants.
DNA Substitutions and Small DNA Rearrangements in Intronic Sequences
Eighty six variants have been reported in LDLR intronic sequences (78 DNA substitutions, 8 small DNA rearrangements), of which at least 67% (n = 58) were predicted to affect normal splicing as they disrupted the consensus splice sequences (Wu & Krainer, 1999; Thanaraj & Clark, 2001). In vitro studies are required to confirm this, and to determine what effect the variants that do not lie within the consensus splice sites have on normal LDLR splicing.
Large DNA rearrangements
The effect of large deletions and duplications on LDLR activity is expected to be profound; clearly variants with 5′ deletions that remove the promoter and/or exon 1 would produce no peptide (previously denoted as Class 1 or null mutations (Hobbs et al. 1990)). Deletions and duplications involving other parts of the gene may have less complete but nonetheless damaging effects, such as impaired transport of the peptide to the cell surface or failure of the mature peptide to anchor in the cytoplasm. As with other types of variant, a clearer picture of the effect of large DNA rearrangements would be given by in vitro expression studies, although such analysis would be limited technically by the sizes of the rearrangements involved.
FH associated variants continue to be reported along the length of the LDLR gene by groups from around the world, with the majority falling within the ligand binding (40%) and EGF precursor-like (47%) domains. LDLR variants arising from DNA substitutions and small rearrangements were predicted to adversely affect normal LDLR activity in at least 86% of cases. Computer programs were clearly valuable in predicting the effect that a gene variant has on biological function; however, it became apparent that their predictions should not be taken as absolute proof by themselves, as a number of variants which were reported as ‘benign’ (PolyPhen) and ‘tolerated’ (SIFT) are probably pathogenic on a more detailed consideration of all available data (Table 1). A further limitation, which is especially relevant to LDLR, is that neither of these programs predicts the effect that two allelic variants together will have on biological function, nor can they predict the effect of amino acid insertions, duplications or deletions (caused by in-frame DNA insertions, duplications or deletions). While a more accurate prediction of biological function may be gained by in vitro studies, these are costly and time consuming. Co-segregation of an LDLR variant with FH may be thought of as indicative that it is FH causing; however, unless the family is large the co-segregation may have occurred by chance. The issue is further complicated, as raised LDL levels commonly occur in subjects in the general population (because of polygenic or dietary influences), so some relatives may not carry the family pathogenic variant and yet still have elevated LDL levels. By contrast, not all family members carrying genuinely pathogenic variants will have elevated LDL levels, as co-inheritance of other LDL-lowering genes (e.g. variants of PCSK9 (Cohen et al. 2006)) or dietary influences may mask the effects. Finally, a non-pathogenic LDLR variant could be seen to co-segregate with FH if it was allelic with a pathogenic variant that was yet to be identified in the family, as the two would be in allelic association.
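The dependence of chance co-segregation on family size can be illustrated with a simple calculation. The sketch below rests on strong simplifying assumptions that are not taken from the database: the variant is unrelated to the disease, each informative meiosis transmits it with probability one half independently of the others, and there are no phenocopies or non-penetrant carriers (both of which, as noted above, occur in practice). The function name is hypothetical.

```python
def chance_cosegregation(informative_meioses: int) -> float:
    """Probability that a variant unrelated to FH is passed on at every one of
    n informative meioses purely by chance, i.e. (1/2) ** n."""
    if informative_meioses < 0:
        raise ValueError("number of meioses must be non-negative")
    return 0.5 ** informative_meioses


for n in (2, 5, 10):
    print(n, chance_cosegregation(n))
```

Under these assumptions two informative meioses leave a one in four probability of apparent co-segregation by chance, whereas ten reduce it to about one in a thousand, which is why small families provide only weak evidence.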
The prevalence of FH-causing LDLR variants in exon 4 could be explained by the preponderance of mutable CpG sequences in this exon (Hobbs et al. 1990; Day et al. 1997). Furthermore, amino-acid changes in this key part of the ligand-binding domain could be expected to have a particularly detrimental effect on LDLR function, which would consequently make it more likely for subjects carrying such variants to present in lipid clinics. However, exon 4 is the largest exon and when the number of variants per base pair is taken into account, the distribution of variants along the gene becomes more even, which in turn suggests that amino acid changes anywhere in the gene could potentially cause FH. Interestingly, the distribution of the 56 variants predicted by PolyPhen and SIFT to be non-pathogenic was also spread throughout all exons (Table 1), suggesting that any region may also contain potentially benign amino acid changes.
The PolyPhen and SIFT computer prediction programs were convenient to use and gave some indication of the likely effect that an amino acid substitution would have on the mature LDLR peptide. The PolyPhen program relied on a combined strategy, analysing sequence characterisation (e.g. disulphide bonds, binding sites), homologous sequence comparison, 3D structure mapping, structural parameters (e.g. secondary structure) and contacts with critical sites (e.g. ligands or peptide subunits). The SIFT program compared all proteins known to show homology with LDLR, including less relevant proteins such as vitellogenins and very low density lipoprotein receptors. Removing these from the comparison in the ‘refined SIFT’ analysis as used in this study identified additional variants which were predicted to be pathogenic. Although helpful, it should be remembered that these computer prediction programs give only an indication of the effect that a variant may have on the biological activity of the mature LDLR peptide.
It is relevant to note also that DNA substitutions which result in conservative amino acid changes, or even which result in no amino acid change at all, can be pathogenic, for example the ‘benign’ variant p.V797L (previously p.V776L) at the 3′ end of exon 16, which causes FH because it disrupts the correct splicing of the LDLR transcript. There have also been reports of synonymous codon changes that impair protein function, for example in the Multidrug Resistance 1 gene, where a change in codon usage resulted in a low abundance tRNA being required (Kimchi-Sarfaty et al. 2007). The requirement for low abundance tRNA slowed translation and consequently altered the timing of co-translational folding, thereby altering biological function of the mature peptide. Whether such a mechanism could act in the case of the LDLR protein is unknown. However, it would further our knowledge if such variants were reported in the literature and listed on the database. Taking into account the limitations of computer based prediction programs, the gold standard for determining the effect that a variant has on protein function is still in vitro expression, using cDNA derived from site-directed mutagenesis as a template. Unfortunately such assays have only been performed on a very small number of variants as they are costly and time consuming.
To date only 86 FH associated variants have been reported in LDLR intronic sequences, although it is probable that this is an under-representation of the true number, as historically such sequences have not been widely studied. Since publication of the human genome sequence (Lander et al. 2001; Venter et al. 2001) it has been possible to design primers that allow intronic sequences to be examined in mutational analysis; however, this is still not straightforward, as the choice of primer sequences is restricted by the high concentration (85% of non-exon flanking sequences, Amsellem et al. 2002) of Alu sequences in LDLR introns. Of the intronic LDLR variants so far reported, many have been found in FH patients in whom previous studies had failed to find any variants (Amsellem et al. 2002, Graham et al. 2005). Although family association of such variants with FH and their absence in normal chromosomes provides strong evidence for their role in FH, in vitro splicing assays (Whittall et al. 2007) or quantitative RT-PCR (Graham et al. 2005) can give an indication of how a variant might actually affect normal splicing and expression of the LDLR transcript.
The role of Alu repeats in the genesis of large DNA rearrangements of the LDLR gene has long been recognised (Hobbs et al. 1990, 1992). Publication of the human genome DNA sequence has revealed that there are 98 Alu repeats within the LDLR gene (95 in intronic sequences and 3 in the 3′ untranslated region, Amsellem et al. 2002). As might be expected, there appears to be a correlation between the number of breakpoints and the number of Alu repeat sequences in each intron, and indeed there are no breakpoints reported in intron 9 and only one in intron 13, neither of which has any Alu repeats. The large DNA rearrangements reported here make up a smaller proportion of the database than in 2001 (11% in 2007, 14% in 2001), and it is probable that this is due to under-reporting, as mutational analysis has focused in recent years on identifying exonic variants. Furthermore, with Alu repeats accounting for 65% of LDLR intronic sequences (Lander et al. 2001; Venter et al. 2001; Amsellem et al. 2002), and the availability of new assays for screening for LDLR rearrangements (Holla et al. 2005), it is likely that many more large DNA rearrangements will be reported in FH patients in the future.
The distribution of variants identified in patients studied in the UK mirrors that found worldwide, and this observation was not surprising as the UK population is heterogeneous, with many UK residents being of different geographical and ethnic origins. There is a slightly higher preponderance of exon 3 variants reported in UK patients than worldwide, but this may be due to chance.
The LOVD Open source database was chosen as a convenient ‘off the shelf’ package on which to load the updated UCL LDLR FH variation database. Housing the database on the UCL web server allowed us to adapt it more closely to our specific needs. The value of adding a hit counter to the home page was emphasised by the observation that the database was accessed over 2000 times in the first week after release to the public. In the future the UCL LDLR FH database will be transferred to the latest version of the LOVD platform which is currently in the development stages (http://www.lovd.nl/2.0/index.php). This will allow greater flexibility for data storage and retrieval, and will be generally more user friendly. We welcome the submission of new LDLR variants to the database and would be grateful if any errors or omissions could be reported to us via the links on the home page (http://www.ucl.ac.uk/ldlr).
SEH and RW acknowledge British Heart Foundation support (RG 2005/014) and SL was supported by a grant from the Department of Health to the London IDEAS Genetics Knowledge Park. CH is supported by the Pinto Foundation. We thank A.H Foster who gave his time and expertise to this project with no financial gain. We also thank the UCL IS Department, Dr Ivo Fokkema and Dr Johan den Dunnen for their help with the LOVD platform, Dr Joep Defesche for allowing access to unpublished variant data, Prof Sue Povey for assistance with the database and anyone who reports database errors.