Classification of missense substitutions in the BRCA genes: A database dedicated to Ex-UVs


  • Communicated by Rolf H. Sijmons


Unclassified sequence variants (UVs) arising from clinical mutation screening of cancer susceptibility genes present a frustrating issue to clinical genetics services and the patients that they serve. We created an open-access database holding missense substitutions from the breast and ovarian cancer susceptibility genes BRCA1 and BRCA2. The main inclusion criterion is that each variant should have been assessed in a published work that used the Bayesian integrated evaluation of unclassified BRCA gene variants. Transfer of data on these substitutions from the original publications to our database afforded an opportunity to analyze the missense substitutions under a single model and to remove inconsistencies that arose during the evolution of the integrated evaluation over the last decade. This analysis also afforded the opportunity to reclassify these missense substitutions according to the recently published IARC 5-Class system. From an initial set of 248 missense substitutions, 31 were set aside due to nonnegligible probability to interfere with splicing. Of the remaining substitutions, 28 fell into one of the two pathogenic classes (IARC Class 4 or 5), 174 fell into one of the two nonpathogenic classes (IARC Class 1 or 2), and 15 remain in IARC Class 3, “Uncertain.” The database is available online. Hum Mutat 33:22–28, 2012. © 2011 Wiley Periodicals, Inc.


The breast and ovarian cancer susceptibility genes BRCA1 and BRCA2 (MIM#s 113705 and 600185, respectively) are the most routinely tested genes for cancer predisposition. Under the criteria used in the United States to select patients for testing, between 10% and 15% of subjects who undergo full-sequence BRCA testing are found to carry a clearly pathogenic sequence variant. This pathogenic variant can be either a nonsense variant, a small insertion or deletion variant that creates a frameshift, a larger gene rearrangement, a variant that creates a severe splicing aberration, or a known pathogenic missense substitution. However, more than 5% of patients are found to carry an unclassified variant (UV)—usually either a missense substitution or a variant that falls in the splice junction consensus regions but outside of the canonical GT–AG dinucleotides. Because of ongoing efforts to classify UVs, most former-UVs with frequencies of above 0.1% have now been classified; consequently, the remaining UVs are individually rare. Here, we refer to former-UVs that are now classified as Ex-UVs.

The main method for classifying BRCA gene UVs is called the “integrated evaluation” or the “multifactorial method” [Easton et al., 2007; Goldgar et al., 2004, 2008; Spurdle, 2010]. The integrated evaluation applies a Bayesian statistical inference to each individual UV analyzed. The evaluation begins with a prior probability in favor of pathogenicity (prior probability) based on sequence analysis [Easton et al., 2007; Tavtigian et al., 2008]. Observational data summarized as odds ratios in favor of pathogenicity (likelihood ratios, LRs) are used to update the prior, resulting in a posterior probability in favor of pathogenicity (posterior probability). The posterior probability is converted to a qualitative classification via a 5-Class lookup table [Plon et al., 2008]. Once this conversion has been made, the variant is considered “classified.” Nonetheless, additional data may result in reclassification; this is especially true for Ex-UVs with an initial classification result of “Class 3, Uncertain,” a possibility that highlights the distinct meanings of “Unclassified” (i.e., the variant has not yet been subjected to an integrated evaluation) and “Uncertain” (i.e., the result from an integrated evaluation was a posterior probability that falls between 0.05 and 0.95) in this field.
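The update step described above can be sketched in a few lines of Python. This is a minimal illustration of Bayes' rule in odds form, assuming the LRs are independent of one another; the function name and the example prior and LR values are hypothetical and are not taken from any variant in the database.

```python
from math import prod


def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability of pathogenicity with a set of LRs.

    Assumes the LRs are mutually independent, so they can be multiplied.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    # Product of an empty list is 1, i.e. no observational data leaves the prior unchanged.
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical variant: prior 0.02, segregation LR 12.0, co-occurrence LR 1.5.
example_posterior = posterior_probability(0.02, [12.0, 1.5])
```

Working in odds rather than probabilities keeps the update a plain multiplication, which is why a prior of exactly 0 or 1 cannot be used.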

Individual elements included in the integrated evaluation have been reviewed recently [Goldgar et al., 2008; Spurdle, 2010], and are summarized briefly below.

Prior Probability

At the protein level, missense substitutions are initially assessed by their position in the protein. If a substitution falls in one of the domains of BRCA1 or BRCA2 known to harbor missense substitutions that are pathogenic because of missense dysfunction (for BRCA1, the RING and BRCT domains; for BRCA2 the DNA-binding domain and perhaps the PALB2 interaction domain), then an in silico assessment of the substitution is performed using a missense substitution analysis program, such as Align-GVGD [Tavtigian et al., 2008], that has been calibrated so that its output can be interpreted as either a probability in favor of pathogenicity or an LR. At the mRNA-processing level, splice site fitness programs such as MaxEntScan [Yeo and Burge, 2004] could in principle be used to evaluate the probability for a sequence variant to damage a wild-type splice site and/or to create a de novo splice site. However, the splice site program calibration is in progress and has not yet been entirely incorporated into the analysis model.

Observational Data

Currently, four types of observational data are included in the integrated evaluation.

Cosegregation of UVs with cancer phenotype in pedigrees

The cosegregation LR can be calculated when substantial genotyping data from a pedigree(s) in which a UV of interest is present are available. The underlying algorithm is derived from linkage analysis and is evaluated under the hypothesis that the studied variant has the same penetrance as an “average” protein truncating BRCA mutation, compared with the hypothesis that the variant segregates independent of disease in the pedigree(s) under study [Thompson et al., 2003].

Personal and family history

This LR is a comparison of personal and family history between individuals carrying a given UV, those with a demonstrated pathogenic BRCA mutation, and tested individuals from the same population who were found to have wild-type BRCA sequences (aside from known neutral variants) [Easton et al., 2007]. Derivation of this LR depended on a large database of mutation data provided by Myriad Genetic Laboratories Inc. The LR will have to be recalibrated in order to be applied to family history data from sources that use patient selection criteria that are different from those used in the United States, or more generally, from sources from which the frequency of pathogenic mutations is not approximately equal to that present in the Myriad dataset that was used for the LR derivation.

Co-occurrence with known pathogenic mutations

The basis of this LR is that homozygotes and compound heterozygotes for pathogenic mutations in BRCA1 and BRCA2 are rarer than would be expected from their independent frequencies [Abkevich et al., 2004; Judkins et al., 2005]. Reasons behind this phenomenon include that BRCA1-null genotypes are often embryonic lethal; similarly, BRCA2-null genotypes are often embryonic lethal or else can cause Fanconi anemia [Evers and Jonkers, 2006; Howlett 2002; Wagner, 2004]. The equation for calculating the co-occurrence LR is structured as a binomial LR based on the probability that an individual in the test population who carries an unclassified neutral variant also carries (in trans) a clearly pathogenic mutation, and an assumed probability that a phenotypically normal individual in the test population who carries an unclassified but actually pathogenic variant also carries (in trans) a deleterious mutation [Goldgar et al., 2004].
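As an illustration of the binomial structure described above, the following Python sketch computes an LR from the number of carriers of a UV and the number of those carriers who also carry a clearly pathogenic mutation in trans. The two probabilities are inputs here; the values used in the example are hypothetical placeholders, not the values used by Goldgar et al. [2004].

```python
def cooccurrence_lr(n_carriers, n_in_trans, p_if_neutral, p_if_pathogenic):
    """Binomial LR (pathogenic versus neutral) for in-trans co-occurrence counts.

    p_if_neutral:    probability that a carrier of a neutral variant also
                     carries a pathogenic mutation in trans.
    p_if_pathogenic: assumed probability of the same event when the variant
                     is itself pathogenic.
    The binomial coefficient is identical under both hypotheses and cancels.
    """
    if not 0 <= n_in_trans <= n_carriers:
        raise ValueError("n_in_trans must be between 0 and n_carriers")
    others = n_carriers - n_in_trans
    like_pathogenic = p_if_pathogenic ** n_in_trans * (1 - p_if_pathogenic) ** others
    like_neutral = p_if_neutral ** n_in_trans * (1 - p_if_neutral) ** others
    return like_pathogenic / like_neutral


# Hypothetical inputs: 20 carriers, none or two of them with an in-trans mutation.
lr_no_cooccurrence = cooccurrence_lr(20, 0, 0.1, 0.001)
lr_two_cooccurrences = cooccurrence_lr(20, 2, 0.1, 0.001)
```

With these placeholder inputs, the absence of co-occurrence gives modest evidence for pathogenicity, whereas even a small number of in-trans observations gives strong evidence against it.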

Tumor immunohistochemistry and histological grade

The basis of this LR is that some physical tumor characteristics are more common in tumors from carriers of pathogenic BRCA1 or BRCA2 mutations than among tumors from noncarriers [Hofstra et al., 2008]. For example, breast tumors from BRCA1 carriers are notably more likely to be negative for the estrogen receptor, progesterone receptor, and HER2/neu than tumors from noncarriers; tumors from BRCA2 carriers are slightly more likely to be positive for tubule formation than are tumors from noncarriers. By measuring the frequencies of the possible combinations of such characteristics in the tumors of carriers and noncarriers, Chenevix-Trench et al. [2006] and Spurdle et al. [2008] were able to work out empirical LRs for various combinations of tumor characteristics. It is important to note that the individual tumor characteristics such as estrogen receptor status and progesterone receptor status are probably not conditionally independent; consequently, it is inappropriate to measure LRs for each individual characteristic and then use the product of those LRs in an integrated evaluation of a particular UV [Goldgar et al., 2008].

Operationally, a problem faced by testing laboratory staff, clinical geneticists, genetic counselors, and potentially patients is that it is very difficult to go through the literature, identify papers that have used bona fide implementations of the integrated evaluation to assess UVs, and then determine whether a sequence variant reported as a UV—or as one of the clinically used synonyms such as variant of uncertain significance (VUS)—has actually been classified and is therefore actually an Ex-UV. Focusing on missense substitutions, we have gone through the literature, identified papers that have used bona fide implementations of the integrated evaluation, and cross-referenced the UVs actually assessed. Within the limitations of our current analytic ability, we have extracted appropriate prior probabilities for these missense substitutions, extracted observational LR data for these substitutions, combined them to calculate posterior probabilities, used the posterior probabilities to determine qualitative classifications, and recorded the results in a curated Leiden Open Variant Database (LOVD) [Fokkema et al., 2005].


To identify published studies that used bona fide implementations of the integrated evaluation, we used the ISI Web of Knowledge tool to find every paper that cited the original description of this method [Goldgar et al., 2004]. Our search, conducted in December 2009, found 90 such papers. We filtered these with the query (BRCA1 or BRCA2 or BRCA) and (UV* or VU* or UCV* or unclassified or uncertain* or unknown* or substitution* or variant* or missense*). We then manually screened abstracts from the remaining 72 papers, resulting in a final set of 15 papers.
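The keyword filter above can be approximated outside the ISI interface with regular expressions. The sketch below is an assumption about how the query behaves: it treats `*` as "any word ending", ignores case, and is applied to a single text string (which record fields were actually searched is not stated here).

```python
import re

# First clause of the query: (BRCA1 or BRCA2 or BRCA)
GENE_TERMS = re.compile(r"\b(?:brca1|brca2|brca)\b", re.IGNORECASE)

# Second clause: each trailing * becomes \w* (zero or more word characters).
VARIANT_TERMS = re.compile(
    r"\b(?:uv\w*|vu\w*|ucv\w*|unclassified|uncertain\w*|unknown\w*"
    r"|substitution\w*|variant\w*|missense\w*)\b",
    re.IGNORECASE,
)


def passes_keyword_filter(text):
    """Return True when the text satisfies both clauses of the query."""
    return bool(GENE_TERMS.search(text)) and bool(VARIANT_TERMS.search(text))
```

A filter of this kind only narrows the candidate list; the manual abstract screening described above remains the decisive step.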

Missense Priors

To automate the process of determining prior probabilities for single nucleotide substitutions to the BRCA gene coding sequences, we wrote a program that performs the following actions:

  • (a) Models every possible single nucleotide substitution from 25 bp before to 25 bp after each coding exon.
  • (b) Gives each substitution HGVS nucleotide nomenclature, HGVS amino acid nomenclature, and Breast Cancer Information Core (BIC) nucleotide nomenclature names.
  • (c) Scores each missense substitution with Align-GVGD using curated BRCA1 and BRCA2 protein multiple sequence alignments in which the most diverged sequence is from the sea urchin Strongylocentrotus purpuratus.
  • (d) Applies the Align-GVGD-specific and position-specific probabilities in favor of pathogenicity defined in Table 3 of Tavtigian et al. [2008]. The alignment through sea urchin is used for steps (c) and (d) because, of the depths of alignment tested, this depth of alignment gave the clearest resolution in risk prediction between the grades of missense substitutions defined by Align-GVGD [Table 2 and Fig. 5 of Tavtigian et al., 2008].
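The enumeration in step (a) can be illustrated with a short Python sketch. It is not the program described above: it covers only the coding portion of the task (no intronic flanks, no HGVS or BIC naming, no Align-GVGD scoring), and the function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by the bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def coding_substitutions(cds):
    """Yield (position, ref, alt, ref_aa, alt_aa, effect) for every SNV in a CDS.

    The position is 1-based within the coding sequence.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    for i, ref in enumerate(cds):
        frame = i % 3
        codon = cds[i - frame:i - frame + 3]
        for alt in "ACGT":
            if alt == ref:
                continue
            mutant = codon[:frame] + alt + codon[frame + 1:]
            ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
            if alt_aa == ref_aa:
                effect = "synonymous"
            elif alt_aa == "*":
                effect = "nonsense"
            elif ref_aa == "*":
                effect = "stop_lost"
            else:
                effect = "missense"
            yield (i + 1, ref, alt, ref_aa, alt_aa, effect)


# Toy two-codon sequence (Met-Ala): 6 positions x 3 alternative bases.
toy_results = list(coding_substitutions("ATGGCT"))
```

Only the substitutions labeled missense would go on to steps (c) and (d).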

Splicing Priors and Exclusion

Since work on splicing prior probabilities is in progress, we chose to limit this analysis to missense substitutions and to exclude substitutions with nonnegligible probability to cause aberrant splicing (either damaging a wild-type splice site or creating an exonic de novo splice site). First, we set aside substitutions falling in the first three bases or the last three bases of each exon because these often disrupt either the canonical splice donor or the canonical splice acceptor. Second, we scored with MaxEntScan [Yeo and Burge, 2004] all of the de novo splice donors and de novo splice acceptors recorded in the databases of aberrant 3′- and 5′-splice sites (DBASS3 and DBASS5) as of December 15, 2010 [Buratti et al., 2011], as well as the corresponding canonical splice sites that had been skipped in the corresponding genes. DBASS3 covers 3′-splice sites (i.e., splice acceptors), and DBASS5 covers 5′-splice sites (i.e., splice donors). Third, we scored with MaxEntScan all possible single nucleotide substitutions to the open reading frames of BRCA1 and BRCA2. We then used the relationship between the scores of spliceogenic sequence variants recorded in DBASS3 and DBASS5 and all possible substitutions in BRCA1 and BRCA2 to determine thresholds below which we could consider the probability to create either a de novo donor or a de novo acceptor to be negligible. Note that MaxEntScan was used for this analysis for three reasons: (1) Houdayer et al. [2008] found MaxEntScan to be the single most accurate program for analysis of splice donor sequence variants, (2) we are unaware of evidence that any program is significantly more accurate than MaxEntScan for analysis of splice acceptor sequence variants, and (3) we were able to use the program to analyze all possible single nucleotide substitutions across the coding exons and proximal splice junction regions of BRCA1 and BRCA2.

Observational Data

For each sequence variant analyzed in the papers that we found to contain methodologically valid integrated evaluations, we combined the prior probabilities described above with the published odds in favor of causality determined from segregation, personal and family history, co-occurrence, and/or tumor immunohistochemistry to arrive at a posterior probability. For sequence variants assessed in more than one paper, we (1) retained the observational data from the chronologically first result that yielded the most extreme classification, and (2) checked for examples of conflicting results. We would interpret a variant as having a conflicting result if it received a posterior probability of less than 5% on the basis of one paper's analysis and a posterior probability of more than 95% on the basis of another paper's analysis (there were none). Data type point estimates were retained as follows:

  • (a) Segregation LRs were retained without modification.
  • (b) Calculated personal and family history LRs were retained without modification. If personal and family history data were described but an LR not calculated, we used Table 2 from Easton et al. [2007] to calculate an LR. When data from one individual (or family) fit multiple categories, we used the result from the single most severe category. To avoid issues of nonindependence, we did not multiply results from multiple proband categories or multiple family history categories. When appropriate, we did multiply results from the proband category with results from the family history category.
  • (c) Co-occurrence LRs were retained without modification.
  • (d) Immunohistochemistry odds ratios from breast tumors were recalculated from estrogen receptor status, cytokeratin profile (if available), and tumor grade per the analysis underlying Spurdle et al. [2008]. Immunohistochemistry odds ratios from ovarian tumors were calculated according to Spearman et al. [2008].

For each sequence variant included, the paper that presented these data was recorded as the primary reference, the individual LRs were included in the database, and the LRs were multiplied together to obtain the product of LRs. For each individual sequence variant included, the reference that provided the prior probability was recorded as the secondary reference, the prior was recorded, and the posterior probability was calculated from the prior and the product of LRs according to Bayes' rule. The posterior probability point estimate was then interpreted into a qualitative classification according to the International Agency for Research on Cancer (IARC) 5-category classifier [Plon et al., 2008].
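The final step, from posterior probability to qualitative class, amounts to a threshold lookup. The Python sketch below uses the cut points quoted in this paper (>0.99 for Class 5, 0.95 and 0.05 as the limits of Class 3, 0.001 as the upper limit of Class 1); how a value that falls exactly on a boundary is assigned is an assumption made here for illustration, and the function name is hypothetical.

```python
def iarc_class(posterior):
    """Map a posterior probability of pathogenicity to an IARC class (1 to 5).

    Boundary handling (>= versus >) is an assumption of this sketch.
    """
    if not 0.0 <= posterior <= 1.0:
        raise ValueError("posterior must lie between 0 and 1")
    if posterior > 0.99:
        return 5  # definitely pathogenic
    if posterior >= 0.95:
        return 4  # likely pathogenic
    if posterior >= 0.05:
        return 3  # uncertain
    if posterior >= 0.001:
        return 2  # likely not pathogenic
    return 1      # not pathogenic
```

For example, the posterior probabilities of 0.104 and 0.230 reported in this paper for BRCA2 p.S869L and p.A1170V both map to Class 3 under this lookup.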

LOVD Programming

Our BRCA genes Ex-UV database was built on LOVD v.2.0 Build 22, with several modifications. One shortcoming to manual data entry is the potential for data entry errors. The LOVD system authorizes external modules to work within its pages (such as the embedded modules showmaxdbid, Mutalyzer, and reCaptcha), allowing us to program internal checks to limit such errors. We implemented an LOVD module that, given a BRCA gene nucleotide substitution in HGVS nomenclature, auto-fills the correct exon number, HGVS amino acid substitution designation, and BIC nomenclature nucleotide substitution. The same module calculates posterior probabilities from prior probabilities and LRs, and then picks IARC class based on the posterior probability. This module is coded in a combination of PHP and JavaScript: the PHP part is used to fetch the information to auto-fill and to display data to the user in a pop-up, and JavaScript is used to allow the pop-up and the LOVD submission page to communicate with each other. With this combination, one click closes the pop-up and sends the data directly into the relevant fields of the submission page. As the submitter is not required to use these modules, we also wrote a script that finds all entries where the missense, splice, and combined priors are not complete or are incompatible, and that checks that the posterior probability and IARC class are compatible. This feature is only usable by the database administrator and provides a regular check on database integrity. The PHP code for the checker is similar to that of the auto-fill module, the main difference being that this is applied to batches of variants within the database.

We have also modified the LOVD variant display page to change what would by default have been text-only “Primary Reference,” “Secondary Reference,” and “IARC class” data columns to hyperlinks. To do this, we modified the source code of the variants list display page to recognize each paper and each “IARC class” text string within the database, and then changed these into HTML hyperlinks. At first glance, the more direct approach would have been to directly save these HTML hyperlinks in the underlying MySQL database fields, allowing the LOVD variant list display page to simply fetch the MySQL database and display the data. However, security features of the LOVD system consider this sort of modification forbidden cross-site scripting. Nonetheless, using hyperlinks both decreases the opportunity for data entry errors and assists users in locating the primary literature associated with the data populating our resource.
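The design choice described above (store plain text, build the hyperlink only when the page is rendered) can be illustrated in Python, although the actual modification was made in the LOVD PHP source. In this sketch the function name, the lookup table, and the URL are hypothetical placeholders.

```python
from html import escape

# Hypothetical mapping from a stored reference string to its target URL.
REFERENCE_URLS = {
    "Easton et al., 2007": "https://example.org/easton2007",
}


def render_reference(cell_text, reference_urls):
    """Return safe HTML for a stored plain-text reference cell.

    Known references become hyperlinks; anything else is shown as escaped
    text, so markup typed into a database field is never executed.
    """
    label = escape(cell_text)
    url = reference_urls.get(cell_text)
    if url is None:
        return label
    return '<a href="{}">{}</a>'.format(escape(url, quote=True), label)
```

Because the link is generated from a curated lookup rather than from user-entered HTML, the database fields stay plain text and remain compatible with the cross-site scripting protections.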

To simplify initial population of the BRCA Ex-UV database, we have limited our initial analysis to the set of single nucleotide missense substitutions to the open reading frames of BRCA1 and BRCA2 for which we can provide both sequence analysis based prior probabilities and observational data summarized as LRs in favor of pathogenicity. There are two main routes by which an open reading frame single nucleotide missense substitution can be pathogenic: the underlying nucleotide substitution can interfere with splicing or the missense substitution can interfere with a key element of protein function. Accordingly, a sequence analysis based prior probability should explicitly incorporate two elements: (1) missense effect on the protein, and (2) nucleotide substitution effect on the splicing. The prior probabilities for missense effect on the protein are based on a combination of position in the proteins and, for substitutions that fall in one of the key domains of the proteins where there is clear evidence that missense loss of function can be pathogenic, an in silico analysis of cross-species sequence conservation and relative missense substitution severity [Easton et al., 2007; Tavtigian et al., 2008]. For substitutions falling outside of the key domains and not near a splice junction, the measured prior probability was 0.00 (95% CI 0.00–0.04) [Easton et al., 2007]. Because a prior of 0.00 cannot be used in a Bayesian calculation, we had used 0.01 for this prior. In the interest of making the priors slightly more conservative, we reassign this prior to the midpoint of the confidence interval, 0.02. For substitutions falling within the key functional domains but at evolutionary variable positions and assigned the relatively benign score of C0 by our missense analysis program Align-GVGD, the measured prior probability was 0.00 (95% CI 0.00–0.06) [Tavtigian et al., 2008]. For the same reason as above, we reassign the prior for this set of substitutions to the midpoint of their confidence interval, 0.03.

We then populated the resulting LOVD with the data defined under the subsections Missense Priors, Splicing Priors and Exclusion, and Observational Data above. The database is available online.

Results and Discussion

The LOVD system provides a proven, flexible environment for the creation of locus-specific databases. We designed the BRCA Ex-UV database so that, in the views that list sequence variants, the leftmost seven columns provide the minimal information that a clinician needs in order to know the status of a missense substitution included in the database: the substitution's name in HGVS and BIC nomenclature, its posterior probability, its classification in the 5-Class IARC system, and two literature references on which that classification is based. The “IARC class” link leads to a screenshot of the classification table and summary clinical recommendations from Plon et al. [2008]. The literature reference links lead to the paper in which the prior probability in favor of pathogenicity is derived and the paper that includes the observational data that contribute to classification.

Scrolling further to the right reveals columns that detail the prior probabilities based on expected missense substitution effect and expected effect on splicing, LRs in favor of pathogenicity for the observational data included in the classification model, a column that will eventually capture functional assay data, and the product of these LRs. By organizing the data in this way, we strive to make the database an accessible and efficient resource for a wide-ranging audience of clinicians, researchers, and laboratory staff.

Most of the empirical prior probability measurements that have been made on BRCA1 and BRCA2 sequence variants actually measure the combination of missense and splice effects. Since the vast majority of substitutions are assigned a prior of 0.02 or 0.03 (previously 0.01) [Easton et al., 2007; Tavtigian et al., 2008], it follows that the splice effects prior for the vast majority of substitutions is ≤0.03. Therefore, we sought to make a sequence analysis based definition of this subset.

Using MaxEntScan, we scored all of the sequence variants in the DBASS3 and DBASS5 databases that create de novo splice junctions within an exon. In doing so, we recorded the score of the normal sequence at which the de novo splice happens (after mutation), the score of the mutant sequence, and the score of the canonical splice junction that is replaced by the de novo splice event. A table of these scores is available from SVT on request. We then used MaxEntScan to score all possible single nucleotide substitutions to the open reading frames of BRCA1 and BRCA2, both from the point of view of de novo donor and de novo acceptor. The graph in Figure 1 displays the distribution of MaxEntScan donor scores for the de novo donors in the DBASS5 database. The graph also displays the de novo donor scores for all possible single nucleotide substitutions to the segments of BRCA1 and BRCA2 that are both extremely unlikely to harbor a variant that damages a wild-type splice junction (i.e., sequence variants falling within 3 bp of a canonical splice junction are excluded) and unlikely to harbor a pathogenic missense substitution (i.e., excluding substitutions falling within the BRCA1 RING or BRCT domains, or the BRCA2 DNA binding domain or PALB2 interaction domain). Concatenated, these gene segments include 79.5% of all possible single nucleotide substitutions to the open reading frames of BRCA1 and BRCA2, and the empirically measured prior probability for substitutions falling in these regions (as an undifferentiated group) is 0.00 (95% CI 0.00–0.04) [Easton et al., 2007]. This graph (Fig. 1) is also annotated with the position of the 10th percentile MaxEntScan score for sequence variants recorded in DBASS5 that create de novo splice donors. In other words, 10% of DBASS5 sequence variants reported to create de novo donors have MaxEntScan scores at or below the 10th percentile demarcation. From this annotated graph, it is clear that the distribution of MaxEntScan scores for variants recorded in the DBASS5 database that create de novo donors is well resolved from the distribution of MaxEntScan scores for BRCA gene substitutions, and that the vast majority of BRCA gene substitutions have MaxEntScan scores that are not indicative of de novo donor creation.

Figure 1.

For splice donors, the program MaxEntScan [Yeo and Burge, 2004] scores sequences of length nine nucleotides for their fitness as a splice donor under the assumption of a splice junction following the third nucleotide of the given sequence. Every possible single nucleotide substitution to the coding exons of BRCA1 and BRCA2 was scored nine times, that is, with the substitution placed from the first to the ninth nucleotide of the program's scoring window.

(1) For each possible substitution, the given MaxEntScan splice donor score is the highest (most fit as a splice donor) of those nine scores.

(2) “Density: BRCA gene variants” is approximately the fraction of MaxEntScan splice donor scores for BRCA variants that fall in an x-axis interval of one MaxEntScan unit; the integral of this curve is exactly 1.00.

(3) “Density: DBASS5 de novo donors” is approximately the fraction of MaxEntScan splice donor scores for de novo donors from the DBASS5 database that fall in an x-axis interval of one MaxEntScan unit; the integral of this curve is exactly 1.00.

(4) The 10th percentile is the MaxEntScan score below which fall only 10% of de novo donor mutations recorded in the DBASS5 database. Note that it is a coincidence that the BRCA gene variants curve and DBASS5 curve cross at the 10th percentile demarcation.
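The nine-placement scoring scheme in this legend can be sketched as follows. The scoring function is passed in as a parameter; the toy scorer used in the example simply counts matches to a donor consensus string and is only a stand-in for MaxEntScan, not a reimplementation of it. All names are hypothetical.

```python
def best_donor_score(sequence, index, alt, score_fn, window=9):
    """Highest score over every window placement that covers the substitution.

    index is the 0-based position of the substituted base in `sequence`.
    Returns None when no full-length window fits around the position.
    """
    mutant = sequence[:index] + alt + sequence[index + 1:]
    scores = []
    for offset in range(window):          # substitution at window position 1..9
        start = index - offset
        if start < 0 or start + window > len(mutant):
            continue                      # window would run off the sequence
        scores.append(score_fn(mutant[start:start + window]))
    return max(scores) if scores else None


# Stand-in scorer: number of matches to a 9-mer donor consensus
# (3 exonic bases followed by 6 intronic bases).
DONOR_CONSENSUS = "CAGGTAAGT"


def toy_score(nine_mer):
    return sum(a == b for a, b in zip(nine_mer, DONOR_CONSENSUS))


# Toy sequence in which an A>T change at index 12 completes the consensus.
toy_sequence = "AAAA" + "CAGGTAAGA" + "AAAA"
toy_best = best_donor_score(toy_sequence, 12, "T", toy_score)
```

Taking the maximum over the nine placements asks whether the substitution could create a donor anywhere in its neighborhood, regardless of where in the window the changed base falls.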

For the dataset as a whole, the ratio of DBASS5 de novo donors to possible BRCA gene substitutions is (62/37,782) = 0.0016. For the <10th percentile subset, this ratio is (7/36,836) = 0.00019—almost an order of magnitude lower than for the entire group. The midpoint of the 95% CI of the probability in favor of pathogenicity for single nucleotide substitutions in the dataset as a whole is 0.02. Given these data, the probability that a single nucleotide substitution drawn at random from the ≤10th percentile subset will give rise to a pathogenic de novo donor must be <<0.02 and consequently negligible from the point of view of determining an operationally useful prior probability. Therefore, to avoid having to determine splicing priors for potentially more severe categories at this time, we restrict initial population of the Ex-UV database to exonic substitutions that fall more than 3 bp from the end of an exon and that reside within the MaxEntScan <10th percentile categories for both de novo donors and de novo acceptors. Thus defined, this dataset still contains >95% of all possible single nucleotide substitutions to the open reading frames of BRCA1 and BRCA2 but nonetheless excludes the vast majority of substitutions that either damage a wild-type splice junction or create a de novo splice junction.
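The inclusion rule stated above can be written as a small predicate. In this sketch the two cutoffs stand for the 10th percentile MaxEntScan scores of the DBASS5 de novo donors and DBASS3 de novo acceptors; their numeric values are not given here, so the caller supplies them, and treating a score exactly at the cutoff as acceptable is an assumption. The function and parameter names are hypothetical.

```python
def eligible_for_database(exon_length, position_in_exon,
                          donor_score, acceptor_score,
                          donor_cutoff, acceptor_cutoff, edge=3):
    """Return True when an exonic substitution passes both splicing filters.

    position_in_exon is 1-based. A substitution is set aside when it lies in
    the first or last `edge` bases of the exon, or when either de novo
    splice score exceeds its 10th percentile cutoff.
    """
    near_start = position_in_exon <= edge
    near_end = position_in_exon > exon_length - edge
    if near_start or near_end:
        return False
    return donor_score <= donor_cutoff and acceptor_score <= acceptor_cutoff
```

The ratios quoted above follow directly from the counts: 62/37,782 ≈ 0.0016 for the whole dataset and 7/36,836 ≈ 0.00019 for the subset at or below the 10th percentile.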

Populating the Database

From 12 papers [Chenevix-Trench et al., 2006; Easton et al., 2007; Farrugia et al., 2008; Goldgar et al., 2004; Lovelock et al., 2006; Osorio et al., 2007; Spearman et al., 2008; Spurdle et al., 2008; Sweet et al., 2010; Tavtigian et al., 2006; Tischkowitz et al., 2008; Wu et al., 2005], we found 248 missense substitutions that had been subjected to an integrated evaluation. Two older articles were also used to include data relevant to the assessment of missense substitutions that were widely considered classified before the integrated evaluation was developed [Miki et al., 1994; Wooster et al., 1995]. Of these 248 missense substitutions, 10 were set aside because they fall in the first three or last three base pairs of an exon and consequently have nonnegligible prior probabilities to interfere with a canonical splice junction. A further 21 were set aside because their MaxEntScan scores were indicative of nonnegligible probability to create a de novo splice junction (Table 1). The remaining 217 substitutions were then included in the LOVD; 112 from BRCA1 (Table 2) and 105 from BRCA2 (Table 3).

Table 1. Number and Status of Excluded Missense Substitutions

Previous classification | Exon ends: first three base pairs | Exon ends: last three base pairs | Potential de novo splice: donor | Potential de novo splice: acceptor

Table 2. Number and Fate of BRCA1 Missense Substitutions in Existing Papers by Classification

Previous classification | Current IARC Class 1 | Current IARC Class 2 | Current IARC Class 3 | Current IARC Class 4 | Current IARC Class 5
Total (112) | 82 | 6 | 4 | 2 | 18

Table 3. Number and Fate of BRCA2 Missense Substitutions in Existing Papers by Classification

Previous classification | Current IARC Class 1 | Current IARC Class 2 | Current IARC Class 3 | Current IARC Class 4 | Current IARC Class 5
Total (105) | 67 | 19 | 11 | 2 | 6

Reassuringly, in moving from the older 3-category classification system to the 5-category IARC classification system [Plon et al., 2008], no sequence variants previously classified as “neutral” were reclassified as Class 4 or 5 (likely pathogenic or definitely pathogenic, respectively). Similarly, no sequence variants previously classified as “pathogenic” were reclassified as Class 2 or 1 (likely not pathogenic or not pathogenic, respectively).

In the next several paragraphs, we describe the fates of the 217 BRCA gene missense substitutions initially included in the LOVD. The section is organized based on the qualitative classification of each variant in the primary reference, that is, the section headed “previously pathogenic” summarizes the status in the LOVD of the substitutions that were classified as pathogenic in the primary reference. Attention is focused on those substitutions for which there was a discernable change in classification, that is, from “neutral” to Class 2 or Class 3 or from “uncertain” to a more clinically informative Class.

Previously Pathogenic

Of the 13 nonspliceogenic missense substitutions previously classified as pathogenic, all 13 had posterior probabilities of >0.99 and consequently fell into IARC Class 5 (Tables 2 and 3).

Previously Neutral

Of the 164 nonspliceogenic missense substitutions previously classified as neutral, 147 had posterior probabilities of <0.001 and consequently fell into IARC Class 1. An additional 15 had posterior probabilities between 0.001 and 0.05; these fell into IARC Class 2.

Two BRCA2 missense substitutions, p.S869L (c.2606C>T) and p.A1170V (c.3509C>T), moved from a published classification of neutral [Chenevix-Trench et al., 2006; Spearman et al., 2008] to IARC Class 3 “uncertain.”

One reason for these changes in previously neutral classifications is that the classification systems employed by Chenevix-Trench et al. [2006] and Spearman et al. [2008] included loss of heterozygosity at the susceptibility gene locus. These two studies considered loss of the wild-type allele evidence in favor of pathogenicity, lack of loss of heterozygosity (LOH) modest evidence against pathogenicity, and LOH of the mutant allele stronger evidence against pathogenicity. However, recent evidence throwing doubt on the evidence for LOH as a good predictor of missense substitution pathogenicity in BRCA1 or BRCA2 has led to exclusion of this type of data from the classification model employed here [Hofstra et al., 2008]. In Chenevix-Trench et al. [2006], LOH data provided odds of pathogenicity of 0.02 for p.S869L. Exclusion of these data is the major reason why the posterior probability for this substitution now rests at 0.104, slightly above the threshold for Class 2. In Spearman et al. [2008], LOH data provided odds of pathogenicity of 0.067 for p.A1170V. However, there was a second issue in this analysis. Spearman et al. reported p.A1170V from two apparently unrelated subjects, one breast cancer case and one ovarian cancer case. In combining the resulting data, they used the former sequence analysis based prior probability for this substitution, 0.01, twice. For any given sequence variant, the prior can only be used once in the integrated evaluation. Excluding both LOH data and one instance of the prior removes a factor of 0.0067 in favor of pathogenicity; the posterior probability for this substitution now rests at 0.230.

Previously Uncertain

Of the 40 nonspliceogenic missense substitutions previously classified as uncertain (including “suspect neutral” and “suspect pathogenic” or similarly designated categories), 12 moved to one of the more informative neutral classes, Class 1 or Class 2, and 15 moved to one of the more informative pathogenic classes, Class 4 or Class 5. Thus, only 13 of the nonspliceogenic sequence variants previously assigned “uncertain” remain in IARC Class 3 “uncertain,” whereas 27 are now reclassified into one of the two neutral or one of the two pathogenic categories. Below, we outline the informative reclassifications of these 27 substitutions by primary reference, highlighting the reasons behind their reclassification using the integrated evaluation model.

The largest group was a set of 11 missense substitutions analyzed by Easton et al. [2007] that had genetic data of between 23:1 and 350:1 in favor of pathogenicity. In combination with their sequence analysis based prior probabilities, three of these were reclassified as Class 4 (BRCA1 p.M18T and p.M1689R plus BRCA2 p.D3095E) and eight were reclassified as Class 5 (BRCA1 p.T1685A, p.T1685I, p.S1715R, p.G1738R, p.L1764P, p.I1766S plus BRCA2 p.W2626C and p.T2722R).

A group of four BRCA2 missense substitutions assessed by Spearman et al. [2008] are now classified as Class 2. Two of these, p.K1434I and p.A2351G, were originally placed in Spearman et al.'s “neutral suspected” category; for these, our analysis simply confirms Spearman et al.'s analysis. Of the remaining two, p.D1352Y moved into Class 2 because our model excluded LOH data that had provided some evidence in favor of pathogenicity, and p.R2418G moved into Class 2 because our “likely not pathogenic” category extends up to a posterior probability of 0.05, whereas the suspect neutral category used by Spearman et al. was more restrictive.

Four substitutions classified as uncertain by Spurdle et al. [2008] also move to Class 2. For three of these, BRCA1 p.V920A, BRCA2 p.K607T, and BRCA2 p.S1760A, the published posterior probabilities were within the likely not pathogenic range, but the category did not formally exist when the paper was published. For the fourth, BRCA2 p.Q2858R, the updated prior probability model provides a lower prior probability than that used in Spurdle et al., resulting in a posterior probability in the Class 2 range. The updated prior probability model also moved two BRCA2 missense substitutions assessed by Chenevix-Trench et al. [2006], p.R2318Q and p.A2770T, from uncertain to Class 2. Additionally, one substitution analyzed by Tavtigian et al. [2006], BRCA1 p.V772A, moved from uncertain to Class 1 for the same reason.

Three BRCA1 RING domain missense substitutions analyzed by Sweet et al. [2010], p.L22S, p.C44Y, and p.C44S, move from their “likely deleterious” category to Class 5. In each case, the main reason for the upgraded classification is that their paper reported—but did not formally use—summary family history data. We made a conservative interpretation of the given family history data through Table 2 of Easton et al. [2007]; these additional data provided odds of between 2:1 and 8:1 in favor of pathogenicity and pushed the substitutions into Class 5.

Finally, two BRCA2 missense substitutions originally classified as uncertain by Farrugia et al. [2008] were reclassified: p.N319T moved to Class 1 and p.L2647P moved to Class 4. The underlying basis for reclassification was that Farrugia et al. did not employ a sequence analysis based prior probability in their classification model. For p.N319T, the prior is 0.02 and for p.L2647P the prior is 0.81 [Tavtigian et al., 2008]; calculation of posterior probabilities from these priors and the genetic data provided by Farrugia et al. results in the new classifications.
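To make the role of the prior concrete, the sketch below combines a prior probability with a likelihood ratio (LR) in the usual odds form of Bayes' rule and then asks how strong the genetic evidence must be to reach a given class. The priors 0.02 and 0.81 are those quoted above. The genetic LRs of Farrugia et al. are not reproduced here, so the example only computes the LR thresholds implied by the priors, assuming a Class 1 boundary of 0.001 and a Class 4 boundary of 0.95. Function and variable names are hypothetical.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds in favor back to a probability."""
    return odds / (1.0 + odds)


def posterior(prior: float, lr: float) -> float:
    """Posterior probability from a prior probability and a combined LR."""
    return odds_to_prob(prob_to_odds(prior) * lr)


def lr_needed(prior: float, target_posterior: float) -> float:
    """Combined LR at which the posterior equals target_posterior."""
    return prob_to_odds(target_posterior) / prob_to_odds(prior)


# Combined LR below which a prior of 0.02 yields a Class 1 posterior (< 0.001).
lr_class1 = lr_needed(0.02, 0.001)
# Combined LR at or above which a prior of 0.81 yields a Class 4 posterior (>= 0.95).
lr_class4 = lr_needed(0.81, 0.95)

if __name__ == "__main__":
    print(f"prior 0.02: Class 1 requires LR below about {lr_class1:.3f}")
    print(f"prior 0.81: Class 4 requires LR of at least about {lr_class4:.2f}")
    # With no genetic data (LR = 1) the posterior simply equals the prior.
    print(f"prior 0.81, LR 1: posterior {posterior(0.81, 1.0):.2f}")
```

Under these assumptions, a variant with a prior of 0.81 needs only modest genetic evidence (an LR of roughly 4.5:1) to reach Class 4, whereas a variant with a prior of 0.02 needs an LR of roughly 1:20 against pathogenicity to reach Class 1.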

Currently, it is not possible to directly measure the performance of the integrated evaluation. In time, some reclassifications will certainly occur. Once this happens, it will become possible to estimate the frequency of reclassification from one class to a contradictory class, and it will also become possible to assess the weaknesses of individual elements of the integrated evaluation. Nonetheless, we note that even the maximum possible change in the prior (from 0.02 to 0.81, or vice versa) cannot result in a sequence variant moving from Class 1 or 2 to Class 4 or 5 (or vice versa). Moreover, there are no examples of a Class 1 or Class 2 variant where removing the “most indicative of neutral” individual LR measurement results in a switch to Class 4 or Class 5 (obviously, some variants would move to Class 3, uncertain). Similarly, there are no examples of a Class 4 or Class 5 variant where removing the “most indicative of pathogenic” individual LR measurement results in a switch to Class 1 or Class 2.
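The statement about the maximum possible change in the prior can be checked with a short calculation. The sketch below takes the most extreme case, a posterior sitting at the upper edge of Class 2 (0.05), and swaps a prior of 0.02 for a prior of 0.81 while holding the genetic evidence fixed; it then does the reverse for a posterior at the lower edge of Class 4 (0.95). The 0.95 boundary is assumed from the published IARC 5-Class system, and all names are hypothetical.

```python
def prob_to_odds(p: float) -> float:
    """Convert a probability to odds in favor."""
    return p / (1.0 - p)


def odds_to_prob(odds: float) -> float:
    """Convert odds in favor back to a probability."""
    return odds / (1.0 + odds)


PRIOR_LOW, PRIOR_HIGH = 0.02, 0.81  # extreme priors quoted in the text
CLASS2_UPPER = 0.05                 # upper bound of Class 2 (from the text)
CLASS4_LOWER = 0.95                 # lower bound of Class 4 (assumed)

# Multiplicative change in posterior odds when the prior goes from low to high.
swing = prob_to_odds(PRIOR_HIGH) / prob_to_odds(PRIOR_LOW)  # about 208.9

# Worst case upward: start at the top of Class 2 and raise the prior.
worst_up = odds_to_prob(prob_to_odds(CLASS2_UPPER) * swing)    # about 0.917

# Worst case downward: start at the bottom of Class 4 and lower the prior.
worst_down = odds_to_prob(prob_to_odds(CLASS4_LOWER) / swing)  # about 0.083

if __name__ == "__main__":
    print(f"odds swing from prior change: {swing:.1f}")
    print(f"Class 2 edge after raising prior: {worst_up:.3f} (below 0.95)")
    print(f"Class 4 edge after lowering prior: {worst_down:.3f} (above 0.05)")
```

In both directions the variant lands in Class 3 rather than in a contradictory class, in line with the claim in the text.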

Moving forward, we intend to make, at minimum, annual updates to this database with results from bona fide integrated evaluations that have been published since the previous update. Beyond updates, we see three obvious directions for the evolution of this database. First, once sequence analysis based splicing prior probabilities have been worked out, it will become possible to include sequence variants that have nonnegligible probability to either damage a canonical splice junction or create a de novo junction. Second, a number of BRCA1 and BRCA2 functional assays have been developed [Couch et al., 2008], but none have been calibrated so that their output can be interpreted as an odds ratio in favor of pathogenicity; once this milestone has been passed, identification of pathogenic missense substitutions should become more efficient. Finally, the BIC database of variants in BRCA1 and BRCA2 [Szabo et al., 2000] is meant to stand as a compilation of all sequence variants observed in these two genes and does record their classification when available. Accordingly, it will be appropriate to link from the BIC database to our Ex-UV database; as URLs for individual variants in our database can be generated algorithmically from the HGVS name of the variant, this will be quite easy to do.

The BRCA gene Ex-UV LOVD described here houses two of the largest single-gene collections of securely classified missense substitutions. As such, the dataset may contribute to future refinements of in silico missense substitution analysis algorithms. More importantly, it stands as a resource to which clinical cancer geneticists and genetic counselors can refer to see whether specific BRCA1 or BRCA2 missense substitutions observed in their patients have been classified.


We would like to thank all of the members of the Breast Cancer Information Core (BIC) who are not listed as authors for their continuing encouragement. We would like to thank Amanda Spurdle and Frans Hogervorst for help with summarizing potentially spliceogenic sequence variants. The content of this article does not necessarily reflect the views or policies of the NCI, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.