The concept of personalized or individualized medicine is based on an ability to predict phenotype from genotype. With the increasing availability of clinical exome and genome sequencing, clinicians are being presented with opportunities to practice predictive genomic medicine. A key component of this ability to predict is a database of the pathogenicity (or absence thereof) of genetic variants in patients. Here, I am restricting my comments and analysis to point mutations for disorders typically inherited in an autosomal dominant pattern, although many of these considerations could also apply to copy number variants (CNVs). Geneticists rely heavily on prior reports of causal mutations at two levels. First, mutations published as being pathogenic should be a good data source when those specific mutations are detected in subsequent patients. Although modifiers, genetic background, and environment can be significant factors, in general, a mutation that is pathogenic in one patient should be pathogenic in another. Second, a mutation database allows researchers to build algorithms that accurately predict the pathogenicity of novel mutations, which can only be done if the database of causative mutations is robust.
It is now widely recognized that current databases include a substantial number of variants that are designated as pathogenic but are in fact benign. These errors of causality attribution are a significant problem: they will lead to erroneous predictions of disease risk for patients undergoing clinical sequencing and pose problems for those designing pathogenicity prediction algorithms. These challenges are not novel; the issue has existed throughout the past several decades of genetic testing by the sequencing of individual genes or gene panels. However, the advent of exome and genome sequencing has greatly amplified the problem: the coding regions of a genome typically contain 30,000 variants, and a substantial fraction of the many variants designated as pathogenic may be erroneous.
Most databases draw their data from published reports of mutations determined to be pathogenic by the authors of those reports. The data that authors use to determine pathogenicity vary widely. These data include mutation type, cosegregation, amino acid alteration predictions (e.g., SIFT, PolyPhen), co-occurrence with other mutations, functional assays, etc. One of the most important types of segregation data is the occurrence of a heterozygous de novo variant in a patient affected with a disorder that is inherited in an autosomal dominant pattern. Because the baseline mutation rate of the genome is low (estimated to be 1 × 10⁻⁸ per base pair per meiosis per haploid genome) [Roach et al., 2010], one would expect to encounter no more than about one de novo mutation per individual in the approximately 30 Mb of protein-coding regions of the genome (2 × 10⁻⁸ × 3 × 10⁷ = 0.6, where the factor of 2 accounts for the two haploid genomes that an individual inherits). Although it cannot be assumed that all de novo variants are pathogenic, the occurrence of such an event can be highly predictive of pathogenicity when the variant is predicted to cause a functional change (i.e., is coding and is not a synonymous variant) and occurs in a gene that is known to be mutated in other patients with the phenotype.
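The arithmetic behind this expectation can be sketched in a few lines of Python. The mutation rate and coding-region size are the values quoted above; the function names are illustrative, and treating the number of de novo coding variants as Poisson distributed is an added simplifying assumption, not something stated in the cited work.

```python
import math

# Values quoted in the text.
MUTATION_RATE = 1e-8      # per base pair per meiosis per haploid genome
HAPLOID_GENOMES = 2       # one maternal and one paternal copy per individual
CODING_BASES = 3e7        # approximately 30 Mb of protein-coding sequence


def expected_de_novo_coding_variants(rate=MUTATION_RATE,
                                     copies=HAPLOID_GENOMES,
                                     bases=CODING_BASES):
    """Expected number of de novo coding variants in one individual."""
    return rate * copies * bases


def prob_at_least_one(expected):
    """P(N >= 1) if N is Poisson with the given mean (illustrative assumption)."""
    return 1.0 - math.exp(-expected)


if __name__ == "__main__":
    lam = expected_de_novo_coding_variants()
    print(f"expected de novo coding variants per individual: {lam:.2f}")
    print(f"P(at least one), Poisson assumption: {prob_at_least_one(lam):.2f}")
```

Under these assumptions the expectation is 0.6 variants per individual, so a confirmed de novo coding variant in a relevant gene is an uncommon event, which is what makes it informative.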
However, the predictive utility of the de novo status of a sequence variant depends entirely on whether it is truly de novo. The typical way to establish de novo status is to genotype the parents of the proband for the mutation to determine whether either parent harbors it. If neither parent harbors the variant, it is declared to be de novo. However, there is an alternative explanation, which is that one of the parents is not the biologic parent of the proband. The frequency of nonbiologic relationships among trios (child, mother, and father) presumed to be biologically related has been measured in a number of studies, with widely varying results [for review, see Bellis et al., 2005]. Most studies focused on misattributed paternity because misattributed maternity is much rarer. The Bellis et al. [2005] review of these studies concluded that, among trios, the best available estimate of the overall rate of misattributed paternity is 3.7%, with a range from 0.8% to 30%. Therefore, the alternative explanation is that the variant was inherited from an unrecognized biologic parent. As the phenotypic data on that parent will not typically be available, the presence of the variant in the proband is much more challenging to interpret. I suggest that this alternative explanation of misattributed parentage should be excluded experimentally in order to claim that the variant is de novo. We require data for all other conclusions, so we should require them for this conclusion.
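To give a sense of scale for these rates, the following sketch computes how many trios in a cohort would be expected to involve misattributed paternity. The rates are those quoted from Bellis et al. [2005]; the cohort size of 100 trios is hypothetical, and the calculation assumes that trios are independent and share the same rate. It describes trios in general and does not estimate the fraction of apparently de novo variants that are in fact inherited.

```python
# Rates quoted in the text (Bellis et al., 2005).
BEST_ESTIMATE = 0.037
LOW_ESTIMATE = 0.008
HIGH_ESTIMATE = 0.30


def expected_misattributed(n_trios, rate):
    """Expected number of trios with misattributed paternity."""
    return n_trios * rate


def prob_any_misattributed(n_trios, rate):
    """Probability that at least one of n independent trios is affected."""
    return 1.0 - (1.0 - rate) ** n_trios


if __name__ == "__main__":
    n = 100  # hypothetical cohort size, chosen only for illustration
    for label, rate in [("low", LOW_ESTIMATE),
                        ("best", BEST_ESTIMATE),
                        ("high", HIGH_ESTIMATE)]:
        print(label,
              round(expected_misattributed(n, rate), 1),
              round(prob_any_misattributed(n, rate), 3))
```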
To survey current practice in the field regarding this issue, I searched for papers published in Human Mutation using the search string: “novo[Text Word] and Human Mutation[Journal].” This yielded 150 papers, and I reviewed the 10 most recently published of those papers that included primary data (i.e., were not review articles). Three of the 10 papers included a description of biologic parentage testing in the methods, results, or figure legend [de Pontual et al., 2011; Le Guen et al., 2011; Neveling et al., 2012]. Six of the 10 papers did not mention parentage testing but concluded that the mutation was de novo, apparently after testing the parents only for the candidate pathogenic variant found in the child [Callewaert et al., 2011; De Marco et al., 2012; Gazda et al., 2012; Isidor et al., 2011; Lettice et al., 2011; Rotthier et al., 2011]. In one paper, parentage was not biologically confirmed, but the authors concluded that the mutation was “apparently de novo” [Lamb et al., 2012]. I conclude from this modest analysis that there is no standard in our field on this issue, but that a substantial number of researchers recognize its importance and have instituted parentage testing to support a claim of de novo status of a variant. I suggest that we should join these investigators and meet the standard that they are practicing.
To address this issue, I suggest that mutation reporting standards for research publications should require that parentage be confirmed through biologic parentage testing in all cases where a sequence variant is claimed to be de novo, and that the parentage testing be noted in the manuscript, in either the Methods or the Results section. There are a number of ways this can be accomplished. In most cases, paternity can be readily confirmed using five to 10 short tandem repeat polymorphism (STRP) markers. If extensive SNP data already exist for the trio, paternity can be readily affirmed by examination of 50–100 SNPs. In some cases, neither of these approaches may be possible, owing to technical limitations or the unavailability of parental specimens. In other cases, researchers may be constrained from performing such testing due to regulatory concerns, an issue that is beyond the scope of this commentary but one that warrants a fuller discussion and analysis. Irrespective of the reasons for which parentage testing is not practicable, if it is not done the variant should be annotated as “apparently de novo,” as was done by Lamb et al. [2012]. This would increase the transparency of our data by allowing readers and database curators to distinguish confirmed from unconfirmed de novo variants. The unconfirmed, apparently de novo variant reports are of value, but they are clearly of lesser value. This effort may not directly improve the quality of the data in our field but may instead lead to more reports describing such variants as apparently de novo. This is still important because it would increase the transparency of the data by distinguishing what we can determine to be true from what we assume to be true, and by signaling to the reader that the variant has not been confirmed to be a true de novo event.
The availability and transparency of these data should ultimately improve the accuracy and robustness of mutation annotations and subsequently improve our ability to predict phenotype from genotype, and to practice predictive genomic medicine.
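As an illustration of the SNP-based check mentioned above, the following minimal sketch counts Mendelian inconsistencies across a panel of biallelic SNPs genotyped in a trio. It is not a validated parentage test: the genotype encoding (count of alternate alleles), the function names, and the 2% tolerance for genotyping error are all assumptions made for this example, and only SNPs at which the trio members differ are informative.

```python
from typing import Iterable, Tuple

# Genotype at a biallelic SNP, encoded as the number of alternate alleles.
Genotype = int                                # 0, 1, or 2
Trio = Tuple[Genotype, Genotype, Genotype]    # (child, mother, father)

# Alleles (0 = reference, 1 = alternate) that each genotype can transmit.
_GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}


def is_mendelian_consistent(child: Genotype, mother: Genotype,
                            father: Genotype) -> bool:
    """True if the child could inherit one allele from each stated parent."""
    return any(m + f == child
               for m in _GAMETES[mother]
               for f in _GAMETES[father])


def count_inconsistencies(trios: Iterable[Trio]) -> int:
    """Number of SNPs at which the trio violates Mendelian inheritance."""
    return sum(not is_mendelian_consistent(c, m, f) for c, m, f in trios)


def parentage_supported(trios: Iterable[Trio],
                        max_inconsistent_fraction: float = 0.02) -> bool:
    """Hypothetical decision rule: tolerate a small fraction of
    inconsistencies to allow for genotyping error (threshold is assumed)."""
    trios = list(trios)
    if not trios:
        raise ValueError("no SNP genotypes supplied")
    return count_inconsistencies(trios) / len(trios) <= max_inconsistent_fraction


if __name__ == "__main__":
    # Made-up genotypes for illustration: (child, mother, father) per SNP.
    consistent_panel = [(1, 0, 1), (0, 0, 0), (2, 1, 1), (1, 2, 0)] * 15
    discordant_panel = consistent_panel[:50] + [(2, 0, 1), (0, 2, 1)] * 5
    print(parentage_supported(consistent_panel))   # no inconsistent SNPs
    print(parentage_supported(discordant_panel))   # 10 of 60 SNPs inconsistent
```

A result that supports parentage would justify reporting a variant as de novo; a result that does not, or the absence of any such test, would correspond to the “apparently de novo” annotation.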