Human Congenital Diseases with Mixed Modes of Inheritance Have a Shortage of Recessive Disease. A Demographic Scenario?

Authors


N. Avrion Mitchison, Institute of Ophthalmology, University College London (UCL), 11–43 Bath St, London EC1V 9EL, UK. Tel.: 020 7608 4056/020 7359 5344; E-mail: n.mitchison@ucl.ac.uk

Summary

An archive of congenital human diseases is presented, aiming to contain all those where recessive (biallelic) inheritance can be compared with X-linked and/or dominant (monoallelic) inheritance. A significant deficit of recessive inheritance is evident, both in disease inheritance and in contribution to inheritance per known disease gene. The deficit contrasts with expectation derived from the cell biology of mutation, and from the importance of recessive mutation in evolution and its preponderance in N-ethyl-N-nitrosourea (ENU) mutagenesis. The deficit fits well with the standard model of demographic change since the Neolithic era, and may also reflect natural selection acting on heterozygotes.

Introduction

The modes of inheritance of congenital disease are worth examining from an evolutionary perspective, because the influence of natural selection on the recessive (AR) mode of inheritance is so different from that on the X-linked (XL) and dominant (AD) modes. For congenital disease, the recessive mutations are likely to circulate for longer periods in the human population, with profound effects evident in the higher single nucleotide polymorphism (SNP) ratio (Dn/Ds, the ratio of nonsynonymous to synonymous nucleotide substitution) of recessive disease genes (Furney et al., 2006, Blekhman et al., 2008). The disease frequencies would also be expected to differ, with AR disease inheritance being more deeply influenced by demographic change (Williamson et al., 2005) and by natural selection operating on heterozygotes. The problem with testing these predictions is that the population frequency of the different modes of inheritance is seldom known accurately. Accordingly, we take here the narrower approach of comparing the relative frequency of the modes within those diseases where the AR form of a disease can be compared with its AD and/or XL forms. Such data are regularly kept for each disease because of their importance for diagnosis and for assessing risk to family members.

We have assembled an archive of diseases that have the necessary mix of inheritance modes, together with lead references, in a form that allows addition of new data and corrections. The archive is based on a search through Online Mendelian Inheritance in Man (OMIM), screened for a suitable assortment of inheritance modes, and is presented in supporting information. Various smaller data sets based on OMIM have been assembled for other purposes such as calculation of the SNP ratio or the mutation rate. To avoid ascertainment bias, we included all the available data for diseases with mixed inheritance mode. As these data vary in quality, we identified a subset of diseases where the mode of inheritance has been analysed more thoroughly. These selected data were then analysed and compared with the total data set. The two data sets, selected and total, turn out to provide concordant evidence of a shortage of recessive disease. The shortage observed applies to inheritance of the disease, as well as to inheritance per gene (i.e. per known recessive gene compared with per known dominant or XL gene). Discarding other explanations such as a lower rate of recessive mutation, we propose that passage of the European population through a demographic bottleneck best explains this shortage.

The present archive has features in common with the Orphanet collection of prevalence data for rare diseases (Orphanet Report Series, 2010). We cite below the valuable data from this collection, although these do not readily allow the frequency of the modes of inheritance to be compared within a disease.

We hope that our archive will attract critical scrutiny, made easier by the alphabetic arrangement. The corresponding author would be grateful for corrections and pointers to additional good quality data.

Materials and Methods

The archive presented here in supporting information lists data for 95 diseases assembled as follows. The OMIM database (assembled by the National Center for Biotechnology Information, Bethesda, MD) currently lists 2867 diseases with phenotype description and molecular basis known, of which 692 (24%) include “recessive AND (dominant OR XL).” We searched this list for entries for the archive, excluding the following: (i) redundant entries, where a single disease has several entries (e.g. dominant and recessive forms entered separately), as is usual in OMIM; (ii) entries with only a single gene, which has both dominant and recessive mutations; (iii) association studies (excluded because they do not identify modes of inheritance); (iv) entries listed only because they mention dominant and/or recessive forms of XL disease; (v) nondisease entries such as pigmentation or frizzled hair. We counted only genes with a known chromosomal location, which are usually listed under the OMIM entry title. These criteria excluded 597 OMIM entries, leaving 95 entered into the present archive. As mentioned below, we have not yet sorted out the contribution from isolated populations, where the frequency of recessive disease alleles is expected to be relatively high. Our analysis omits consideration of disease alleles identified solely by genome sequencing (Vissers et al., 2010).

The archive is arranged in columns, with the diseases listed in alphabetic order with their OMIM numbers and a lead citation. The next three columns show the % inheritance of the disease found in the three modes of inheritance (XL, AR and AD). The last three columns show the number of genes identified as responsible for the disease in each mode. Thus, for example, retinitis pigmentosa is present in 10% of patients as XL, 36% as recessive and 54% as dominant disease. Of the known causal genes, 2 are XL, 26 are recessive and 12 are dominant. The median population frequency of the diseases in the archive is 0.8 × 10⁻⁴.

Data for the proportions of inheritance and the number of genes (and in some cases their date of discovery) were acquired from the lead citations listed in the archive and from other published data. To avoid selection bias, no disease was excluded for lack of information. We evaluated the data as shown in the first column of the archive, where an asterisk denotes the 16 diseases that have the most substantial data. This category comprises four each of immunological and musculoskeletal diseases, two each of ophthalmological, haematological and neurological diseases, and one each of sensory and renal disease. The data come from national (OMIM numbers 233690, 105650, 253600, 310200, 173900) and international (118220, 217000, 123100, 306700, 124900, 162400, 252700) surveys, and from single large (>60 patients) clinical series (120970, 308230, 268000, 102700). Categorizing the data in this way allows the analysis of the diseases with the best data to be compared with the full data set. Fortunately, the two data sets yield similar findings, as presented below.

Results

First, to gain an overview of how the acquisition of data is progressing, we traced the accumulation of known mutations in six representative diseases, as shown in Figure 1. The curves show how accumulation has progressed over the last two decades, and suggest that it has slowed for XL mutations and is beginning to slow for AR and AD mutations. The earlier accumulation of XL data is as expected from their easier ascertainment. Thus, the distribution of the modes of inheritance is now clear enough to merit attention, although it will no doubt require revision in the future.

Figure 1.

Progressive accumulation of mutations in six representative diseases.

In the archived data, we first compared the 22 diseases where AR > AD inheritance with the 57 where AR < AD. The difference is highly significant (p = 0.0004, Fisher's exact contingency test). However, the corresponding comparison of AR with XL inheritance does not show a significant difference.
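As a rough cross-check on the size of this imbalance, the 22 versus 57 split can be compared with an even split using an exact two-sided binomial (sign) test. This is a minimal sketch under stated assumptions: it is not the Fisher's exact test reported above, whose contingency table is not given here, so the p-value it prints need not match the published figure. The function name `two_sided_binomial_p` is hypothetical.

```python
from math import comb

def two_sided_binomial_p(k: int, n: int) -> float:
    """Exact two-sided sign-test p-value for k successes in n trials at p = 0.5."""
    # The null distribution is symmetric at p = 0.5, so double the smaller tail.
    lo = min(k, n - k)
    tail = sum(comb(n, i) for i in range(lo + 1)) / 2 ** n
    return min(1.0, 2 * tail)

ar_more = 22  # diseases with AR > AD (from the text)
ad_more = 57  # diseases with AR < AD (from the text)
p_value = two_sided_binomial_p(ar_more, ar_more + ad_more)
print(f"sign-test p = {p_value:.2g}")
```

Diseases where AR and AD inheritance are equal, or where one mode is absent from the comparison, are not counted in this sketch.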

We next compared the contributions to disease frequency from AD, AR and XL inheritance, and also the contributions to disease frequency per known gene of each type, with results shown in Figure 2. The left-hand chart refers to the total archived data (95 diseases) and the right-hand chart to the more substantial data, graded * (16 diseases). In both groups, the % disease inheritance and the % inheritance per known gene are significantly lower for AR than for either XL or AD (eight comparisons, in each case with p < 0.05). The % inheritance per recessive gene has been multiplied by two, to allow for the two recessive genes lost per disease case. This correction was applied in the comparisons, as shown in Figure 2. The % inheritance per recessive gene is particularly low for the better data set (right-hand panel). This reflects the fact that this set has, on average, twice as many recessive genes per disease. Evidently, diseases with large numbers of mutations attract more systematic work, as might be expected. Comparing the two data sets (total versus more substantial) reduces the likelihood of undetected ascertainment bias. The analysis thus yields a clear shortage of recessive inheritance in congenital disease, in a collection that contains most of the data at present available.
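The per-gene measure can be illustrated with the retinitis pigmentosa figures quoted in Materials and Methods (10% XL from 2 genes, 36% AR from 26 genes, 54% AD from 12 genes). The sketch below assumes that "% inheritance per known gene" is simply the percentage divided by the gene count, doubled for AR as described above; the dictionary layout and the function name `percent_per_gene` are hypothetical and are not taken from the archive.

```python
# Retinitis pigmentosa row as quoted in Materials and Methods:
# mode -> (% of patients, number of known causal genes)
rp = {"XL": (10.0, 2), "AR": (36.0, 26), "AD": (54.0, 12)}

def percent_per_gene(percent: float, n_genes: int, mode: str) -> float:
    """% inheritance per known gene, doubled for AR as described in the text."""
    per_gene = percent / n_genes
    return 2 * per_gene if mode == "AR" else per_gene

per_gene = {mode: percent_per_gene(pct, n, mode) for mode, (pct, n) in rp.items()}
for mode, value in per_gene.items():
    print(f"{mode}: {value:.2f}% per known gene")
```

For this single disease the doubled AR value (about 2.8%) remains below the AD (4.5%) and XL (5%) values, in line with the direction of the pooled result.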

Figure 2.

Distribution of inheritance between the three modes within the diseases surveyed: showing mean and standard error of mean. The left-hand panel refers to the total archived data (95 diseases), and the right to the 16 diseases with best data, graded * as shown in the archive.
White bars: % inheritance per disease. Black bars: % inheritance per known gene, doubled for AR genes to allow for two genes per disease case.

Discussion

In considering the shortage of recessive inheritance found here, we can first dispose of the misconception that recessive inheritance is less common simply because each new mutation lingers in the human population before it meets another. This could not explain the low equilibrium level of recessive disease. We do not here consider quantitative variation in disease impact, for simplicity and because the age of onset of congenital disease has little influence on the Dn/Ds ratio (Blekhman et al., 2008).

The recessive mutation rate in man is not intrinsically low (Lynch, 2010), and mutations are dominant only in the restricted circumstances of haploinsufficiency or gain-of-function. Hence, a higher frequency of recessive disease mutations might be expected. Early mutagenesis studies in mice, in fact, confirmed this expectation (Ehling et al., 1985), as have later large-scale studies of N-ethyl-N-nitrosourea (ENU) mutagenesis in mice (Cook et al., 2006, Nelms & Goodnow, 2001, Jamsai & O’Bryan, 2010). The ENU community worldwide is largely moving over to screening only for recessive mutations.

Against this background, the shortage of recessive inheritance found in the present study requires explanation. Demographic change since the ice age era provides a plausible explanation as follows. The two population coalescence models identify a bottleneck in the European population during the last ice age and before the start of agriculture, when expansion began (Schaffner et al., 2005, Williamson et al., 2005, Reich & Lander, 2001). Inbreeding depression mediated by purging of recessive disease mutations would have occurred during the bottleneck. Loss of deleterious recessive alleles in this way has been observed in many species, and mathematical models have been developed (Charlesworth & Willis, 2009). An early application was to a bottleneck in the Japanese population (Nei & Imaizumi, 1963).

We have applied this modelling as follows. The median disease frequency in our collection is 0.5 × 10⁻⁴, to which each AR disease allele contributes an average of 10.2%. The AR disease frequency per allele is then about 5 × 10⁻⁶, with a corresponding allele frequency in the population of √(5 × 10⁻⁶) ≈ 2.25 × 10⁻³. We assume that entry of the disease allele into the population by mutation (μ) and exit by disease are in balance at a rate of 5 × 10⁻⁶ per generation. This rate is in line with previous estimates of μ, such as the rate 3.2 × 10⁻⁶ arrived at after survey of several earlier estimates (Reich & Lander, 2001). We further assume that a recessive disease allele accumulates during recovery from purging in the bottleneck at a rate of μ per generation (neglecting temporarily the loss of disease alleles through disease that is discussed below). We make the arbitrary assumption that purging reduces the disease frequency by 50%, and thus allows recovery to start from a median disease frequency per allele of 2.5 × 10⁻⁶. The estimate of Williamson et al. allows 900 generations of recovery (Williamson et al., 2005). With our estimated value of μ, these assumptions would allow the disease allele frequency in the population to recover up to a present-day level of 4.5 × 10⁻⁶, a little less than the present equilibrium estimate of 5 × 10⁻⁶, and thus well able to account for the deficit in AR inheritance, as shown in Figure 2. The selected group of better documented diseases has the same median frequency of 0.5 × 10⁻⁴, but has more recessive genes for each disease, so that each gene contributes an average of only 4.6% of the disease. Recovery in the model is therefore less complete, in accordance with the larger deficit in AR inheritance evident in Figure 2.
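The first steps of this arithmetic can be reproduced as a short sketch, assuming Hardy-Weinberg proportions so that the allele frequency is the square root of the homozygote (disease) frequency. The function name `per_allele_quantities` is hypothetical; the inputs are the median frequency and the per-allele shares quoted above.

```python
from math import sqrt

def per_allele_quantities(median_disease_freq: float, ar_share: float):
    """Return (AR disease frequency per allele, Hardy-Weinberg allele frequency)."""
    disease_freq_per_allele = median_disease_freq * ar_share
    # Under Hardy-Weinberg proportions, homozygote frequency = q**2, so q = sqrt(freq).
    allele_freq = sqrt(disease_freq_per_allele)
    return disease_freq_per_allele, allele_freq

full = per_allele_quantities(0.5e-4, 0.102)      # total archive: 10.2% per AR allele
selected = per_allele_quantities(0.5e-4, 0.046)  # better-documented subset: 4.6%
print(f"total:    disease freq {full[0]:.2e}, allele freq {full[1]:.2e}")
print(f"selected: disease freq {selected[0]:.2e}, allele freq {selected[1]:.2e}")
```

Using the unrounded product (5.1 × 10⁻⁶) gives an allele frequency of about 2.26 × 10⁻³, and using the rounded 5 × 10⁻⁶ gives about 2.24 × 10⁻³; the quoted value lies between the two.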

So far, the calculation accounts for entry of AR disease genes during recovery from the bottleneck, but not for loss through disease (i.e. loss-through-homozygosis). We examined recovery in a spreadsheet-based model similar to that used previously (Testa & Bojarski, 2007). We increase the frequency of the AR disease allele by μ − 2μ² in each successive generation (μ for entry by mutation, −2μ² for loss-through-homozygosis at Hardy-Weinberg equilibrium), as shown in Figure 3. Evidently, this simple loss-through-homozygosis model works reasonably well for values of μ in the range of 0.00003–0.0003 (3 × 10⁻⁵ to 30 × 10⁻⁵). We then asked how this relates to the values of μ found (as prevalence) in the Orphanet survey (Orphanet Report Series, 2010), where we found 31 diseases from our archive listed with their prevalence, which we take as estimates of μ. Of these diseases, 14 had values of μ that fell within the range of 3–30 × 10⁻⁵, indicating that the model fits the available data reasonably well. However, the four diseases in this sample that had values of over 30 × 10⁻⁵ do call for some adjustment of the model.
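The qualitative behaviour of such a model can be sketched in a few lines. The recurrence used here, q' = q + μ − q², is a textbook mutation-selection form for a fully recessive allele whose homozygotes do not reproduce. It is an assumption made for illustration and is not necessarily the formula used in the authors' spreadsheet. The function name `recover`, the start from a fully purged allele (q = 0) and the middle rate of 1 × 10⁻⁴ are likewise assumptions; only the end points of the range and the 900 generations come from the text.

```python
def recover(mu: float, generations: int = 900, q0: float = 0.0) -> list:
    """Allele-frequency trajectory under q' = q + mu - q**2 (assumed recurrence)."""
    q = q0
    trajectory = [q]
    for _ in range(generations):
        # Gain mu by new mutation; lose q**2 through affected homozygotes.
        q = q + mu - q * q
        trajectory.append(q)
    return trajectory

for mu in (3e-5, 1e-4, 3e-4):  # illustrative rates spanning the quoted range
    final = recover(mu)[-1]
    linear = 900 * mu  # accumulation with no loss through homozygotes
    print(f"mu={mu:.0e}: with loss {final:.4f}, without loss {linear:.4f}")
```

Under this assumed recurrence the frequency rises towards the equilibrium √μ and stays below the linear accumulation 900μ, which is the contrast between solid and dashed lines described for Figure 3.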

Figure 3.

Recovery over 900 generations of a recessive disease gene that had been fully purged during a bottleneck. New mutations accumulate linearly (dashed lines) if no allowance is made for their loss through expression in homozygotes. The three panels show the effect of loss through expression in homozygotes for mutations occurring at the three rates (μ) (solid lines).

As is well known, natural selection may also operate on the heterozygote of disease alleles classified as recessive. The general pattern is of slightly deleterious carrier effects, but with beneficial effects also widely reported, for instance, among genes involved in resistance to infection (Dean et al., 2002). The panel of diseases in the present narrow archive would not be particularly useful for evaluating this form of natural selection, where a genome-wide approach would be more appropriate.

Finally, we need to consider alternative explanations for the shortage of recessives. One concerns the outcome of recent large population association studies, which seem to identify only near-additive effects, although this can partly be explained by incomplete linkage disequilibrium (LD) diminishing the dominance/recessive effect and by a bias of researchers towards looking only for additive effects. Recent population expansion and perhaps more outbreeding with more mobility may have increased our carrying capacity for recessive alleles. Furthermore, the more important bottleneck for non-Africans dates back to the time of emergence of our ancestors from Africa perhaps 60–100 kya. The more recent ice age may not have provided a bottleneck severe enough to eliminate the signals of earlier expansion in the interglacial period roughly 40–50 kya.

A second possible explanation for the shortage of recessive forms concerns their relative level of severity. High severity could reduce the population frequency by hindering spread of a mutant gene, as has been proposed for retinal degeneration where the most severe form, Leber congenital amaurosis (involving many genes), is also the commonest. We have not yet examined the impact of disease timing and severity on recessive gene frequency. Clearly, the diseases in the present archive would provide suitable material for evaluating this possibility, but this would be a large task requiring care to avoid data-selective bias.

Another factor tending to reduce the frequency of recessive forms is their loss in isolated populations. This is certainly the case, but again, we have not yet managed to sort out the effect. The effect of consanguinity is also likely to be important. It would be of interest, for example, to compare the recessive disease frequency in North versus South India.

In conclusion, a significant shortage of recessive inheritance has been identified, which we suggest may reflect, at least in part, demographic change in the human population.

Acknowledgements

We thank D. Isaacs and G. Clark for setting up the spreadsheet model used here, D. Balding (UCL) for discussion of recent large-scale association studies and early demography, and R. P. Erickson (University of Arizona) for mentioning the importance of disease severity.
