The frequency of misattributed paternity in Sweden is low and decreasing: A nationwide cohort study

The occurrence of misattributed paternity has consequences throughout society with implications ranging from inheritance and royal succession to transplantation. However, its frequency in Sweden is unknown.


Introduction
Misattributed paternity, where a man is incorrectly believed to be the genetic father of a child, has widespread implications ranging from sibling bone marrow transplantations to inheritance disputes, and royal succession orders [1]. Given its sensitive nature, it has been difficult to estimate its true frequency and no population-based estimates are available for the Swedish population [2].
Several older studies, presenting case reports or investigating the frequency of misattributed paternity, based their analyses on non-representative populations, including men seeking paternity testing, patients undergoing compatibility testing for organ transplantation, individuals seeking genetic counselling, and genealogically well-enumerated populations [3][4][5][6][7][8][9]. Thus, differences in estimates of the occurrence of misattributed paternity can likely be attributed not only to population variability but also to variations in pre-test probabilities. More recent studies, which estimate the misattributed paternity rate over several generations using more sensitive genomic methods, suggest that the rate is considerably lower than early estimates [10]. For example, in a study combining Y-chromosome genotypes with genealogical data in a Flemish population, the frequency of misattributed paternity was estimated at 1-2% per generation during the last 400 years [11]. Contemporary rates of around 1%, in concordance with the Flemish study, are available for the Dutch, German and Serbian populations [12][13][14]. In an expanded study, including data from both Belgium and the Netherlands, with information on socioeconomic as well as demographic factors, the resulting estimates ranged from 0.4 to 5.9% in married couples, with a peak incidence in low socioeconomic groups in the late 19th century [15]. Previous studies also suggest a constant rate over time [16].
Beyond its general scientific and societal relevance, the frequency of misattributed paternity has implications for research investigating familial risk, where it may bias risk estimates. To estimate the frequency of misattributed paternity, we therefore collected data on a large number of family units, complete with ABO blood group data for both parents and their offspring.

Data sources
We combined all available ABO blood group data from the transfusion database SCANDAT3-S (Scandinavian Donations and Transfusions) [17] with family structure data from the Swedish multigeneration register, which records family information for all individuals born from 1932 onwards and alive in 1954. Most individuals born from 1947 onwards were included, but information on individuals who immigrated was missing until 1997 [18]. Blood donors were removed from the database as they were deemed to be non-representative of the general population. The resulting database thus contained data on families where both parents and at least one offspring had undergone ABO blood group testing in preparation for surgery, prior to a transfusion, or for routine testing in antenatal care or at childbirth. Family units were constructed with information on the blood groups of the mother, father and offspring. Families with multiple offspring were therefore recorded as separate family units, one per child. Information on educational level was retrieved from Statistics Sweden, the governmental agency responsible for population statistics. Further details about the study design and some baseline characteristics of the cohorts are found in Fig. S1.

ABO blood group system
The ABO blood group system is an autosomal triallelic system located at a single locus, with alleles denoted as A, B and O. The blood group phenotypes and accompanying allele combinations are A (AA or AO), B (BB or BO), AB (AB) and O (OO), so that A and B are co-dominant and O is recessive. The allele frequencies are denoted as p, q and r for A, B and O, respectively. Given these inheritance rules, impossible inheritance patterns can be determined from the blood groups of the mother and the reported father. For all analyses, we estimated blood group allele frequencies using the blood group distribution from the SCANDAT3-S database, assuming Hardy-Weinberg equilibrium. To estimate the allele frequencies p, q and r in our population, we used the expectation-maximization algorithm with six iterations [19].
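As an illustration only, the estimation step can be sketched with the classical gene-counting form of the expectation-maximization algorithm under Hardy-Weinberg equilibrium. The function name `em_abo_allele_frequencies`, the uniform starting values and the example counts (scaled from the blood group distribution reported in the Results) are assumptions made for this sketch and do not reproduce the authors' implementation.

```python
def em_abo_allele_frequencies(n_a, n_b, n_ab, n_o, iterations=6):
    """Gene-counting EM for ABO allele frequencies (p, q, r) under HWE.

    Inputs are phenotype counts. Starting values of 1/3 are an assumption.
    """
    n = n_a + n_b + n_ab + n_o
    p = q = r = 1.0 / 3.0
    for _ in range(iterations):
        # E-step: split phenotype A into expected AA and AO carriers,
        # and phenotype B into expected BB and BO carriers.
        n_aa = n_a * (p * p) / (p * p + 2 * p * r)
        n_ao = n_a - n_aa
        n_bb = n_b * (q * q) / (q * q + 2 * q * r)
        n_bo = n_b - n_bb
        # M-step: count alleles among the 2n chromosomes.
        p = (2 * n_aa + n_ao + n_ab) / (2 * n)
        q = (2 * n_bb + n_bo + n_ab) / (2 * n)
        r = (n_ao + n_bo + 2 * n_o) / (2 * n)
    return p, q, r


# Hypothetical counts per 1000 individuals: 45% A, 12% B, 5% AB, 38% O.
p, q, r = em_abo_allele_frequencies(450, 120, 50, 380)
print(round(p, 3), round(q, 3), round(r, 3))
```

Under Hardy-Weinberg equilibrium the expected phenotype frequencies are p² + 2pr for A, q² + 2qr for B, 2pq for AB and r² for O, which is what the E-step uses to apportion the A and B phenotypes between homozygous and heterozygous genotypes.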

Statistical analyses
We used a two-pronged approach to estimate the frequency of misattributed paternity. First, we estimated the frequency of misattributed paternity from the occurrence of impossible blood group combinations (i.e. where the offspring had a blood group that would not be possible given the parents' blood group combination, such as when both parents had blood group O and the offspring had a non-O blood group). The analyses considered the population blood group allele frequencies in order to account for the possibility that the genetic father may have the same blood group as the attributed father. Observed impossible and possible blood group combinations are shown in Supporting Information Fig. S1. For every impossible blood group combination, we calculated an expected count derived from the mother's allele frequency and the possible allele frequency of the genetic father. The model was specified based on the 16 impossible mother-father-offspring combinations listed in Supporting Information Table S1. We assumed that the frequency of misattributed paternity is independent of the blood groups of the parents and that the frequency is sufficiently low for the counts to be approximately Poisson distributed. We fitted a Poisson regression model with a log link function, with the number of family units per mother, father and offspring blood group combination as the outcome. The logarithm of the product of the number of impossible parents and the expected genetic contribution of the known mother and the genetic father was used as an offset term (Supporting Information Table S1), so that the intercept yields the logarithm of the probability of misattributed paternity. The model was tested for equidispersion. We constructed 95% confidence intervals using the Wald method.
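The following sketch illustrates two ingredients of this first approach. It enumerates the phenotype trios that are incompatible with the recorded father while remaining compatible with the mother, which is one reading (an assumption on our part) of how the 16 impossible combinations arise. It also shows the closed-form estimate of an intercept-only Poisson model with an offset, together with a Wald interval on the log scale. All function names and the example numbers are hypothetical, and the exact offset used in the study is defined in Supporting Information Table S1, not here.

```python
import math
from itertools import product

GROUPS = ["A", "B", "AB", "O"]
# Genotypes compatible with each phenotype.
GENOTYPES = {
    "A": [("A", "A"), ("A", "O")],
    "B": [("B", "B"), ("B", "O")],
    "AB": [("A", "B")],
    "O": [("O", "O")],
}


def phenotype(allele_1, allele_2):
    """Phenotype of a genotype: O is recessive, A and B are co-dominant."""
    expressed = {allele_1, allele_2} - {"O"}
    return "".join(sorted(expressed)) if expressed else "O"


def child_phenotypes(mother, father):
    """All offspring phenotypes possible for a mother-father phenotype pair."""
    possible = set()
    for g_mother, g_father in product(GENOTYPES[mother], GENOTYPES[father]):
        for a_mother, a_father in product(g_mother, g_father):
            possible.add(phenotype(a_mother, a_father))
    return possible


def compatible_with_mother(mother, child):
    """True if some father phenotype could produce this child with this mother."""
    return any(child in child_phenotypes(mother, f) for f in GROUPS)


# Trios excluded by the recorded father but not by the mother (assumed definition).
impossible_trios = [
    (m, f, c)
    for m, f, c in product(GROUPS, repeat=3)
    if c not in child_phenotypes(m, f) and compatible_with_mother(m, c)
]
print(len(impossible_trios))  # 16


def intercept_only_poisson(counts, expected):
    """MLE and 95% Wald interval for exp(intercept) with offset log(expected)."""
    total_observed = sum(counts)
    rate = total_observed / sum(expected)
    se_log = 1.0 / math.sqrt(total_observed)  # standard error of the log rate
    z = 1.959963984540054
    return rate, rate * math.exp(-z * se_log), rate * math.exp(z * se_log)


# Hypothetical observed counts and offset-scale expected counts.
rate, lower, upper = intercept_only_poisson([10, 20], [1000.0, 2000.0])
```

With only an intercept, the fitted rate reduces to the ratio of total observed to total expected counts, which is why the intercept can be read directly as the log probability of misattributed paternity.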
Secondly, we also fitted a Bayesian model that was based on all mother-father-offspring combinations, both possible and impossible. The probability for each combination to have a non-genetic father was derived conditionally on the frequency of misattributed paternity, using the fact that the probability of a specific phenotype in the offspring is the sum of the conditional probabilities of that phenotype given the mother and a non-genetic father and given the mother and the genetic father, weighted by the probability of misattributed paternity. The calculations were carried out using a Markov chain Gibbs sampling scheme [20]. For each step in the chain, the misattributed paternity frequency from the previous step was used to compute the conditional probability for each combination of family blood types, allowing the number of genetic fathers to be sampled. When the number of families in a specific combination was greater than 25, a Poisson approximation to the binomial distribution was used to speed up the computations. The uncertainty of the blood group distribution, and hence of the allele frequency distributions, due to limited data for a specific cohort was included in the sampling for each iteration. However, this did not change the effect sizes or uncertainty intervals, owing to the large number of individuals in the cohorts. Once the number of genetic fathers out of the total number of families had been obtained in a specific step in the chain, this number was used to sample the misattributed paternity fraction from a beta distribution, with a Beta(2,2) prior to exclude the case of certainty concerning the parental status. The sampled fraction was in turn used to initiate the calculations in the next iteration of the Gibbs chain. After discarding 10^5 burn-in chain iterations, an additional 10^6 iterations were completed to obtain the estimates and accompanying uncertainty intervals.
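One way to write down the mixture and the two Gibbs updates described above is sketched here. The notation is introduced for illustration and is not taken from the original: π is the misattributed paternity fraction, m, f and c are the phenotypes of the mother, recorded father and child, n_{mfc} is the number of families in a cell, and K_{mfc} is the sampled number of non-genetic fathers in that cell (so n_{mfc} - K_{mfc} is the number of genetic fathers). The sketch assumes that a non-genetic father is drawn at random from the population, independently of the recorded father's blood group.

```latex
\begin{align}
P(c \mid m, f) &= (1-\pi)\, P_{\mathrm{gen}}(c \mid m, f) + \pi\, P_{\mathrm{non}}(c \mid m), \\
P_{\mathrm{non}}(c \mid m) &= \sum_{f'} P(f')\, P_{\mathrm{gen}}(c \mid m, f'), \\
w_{mfc} &= \frac{\pi\, P_{\mathrm{non}}(c \mid m)}
               {(1-\pi)\, P_{\mathrm{gen}}(c \mid m, f) + \pi\, P_{\mathrm{non}}(c \mid m)}, \\
K_{mfc} \mid \pi &\sim \mathrm{Binomial}\!\left(n_{mfc},\, w_{mfc}\right), \\
\pi \mid K &\sim \mathrm{Beta}\!\left(2 + K,\; 2 + N - K\right),
\qquad K = \sum_{m,f,c} K_{mfc}, \quad N = \sum_{m,f,c} n_{mfc}.
\end{align}
```

For an impossible combination, P_gen(c | m, f) = 0, so w_{mfc} = 1 and every family in that cell is assigned a non-genetic father. The possible combinations contribute the remaining non-genetic fathers whose blood group happens to be compatible with the child.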
Using the second model, we also performed analyses restricted to children born before in vitro fertilization was introduced in Sweden in 1983, as well as analyses stratified by decade of birth and parental education. As online-only supplements, we also provide estimated rates by geographical region of birth and by maternal and paternal age at birth (Supporting Information Figs S2 and S3).
Data processing and statistical analyses were conducted using SAS Statistical Analysis Software, version 9.4; R, version 3.6.0; or MATLAB 2019a, as appropriate.

Results
A total of 1.95 million family units were included in the analyses. The blood group distribution in the full study population was 45% A, 5% AB, 12% B and 38% O. The cohort mainly consisted of offspring born between 1950 and 1990. Of all parents, 89% were born in Sweden and of the offspring, 97% were born in Sweden.
The main analyses resulted in estimates of the frequency of misattributed paternity that were very similar between the Poisson model, 1.7% (95% confidence interval, 1.6-1.7%), and the Bayesian model, 1.7% (95% credibility interval, 1.6-1.7%) (Fig. 1a). In a sensitivity analysis, to address the issue of nonindependent samples, we restricted the cohort to the first child born in every family unit. The estimates were similar and again consistent between the Poisson and Bayesian models, 1.8% (95% confidence interval, 1.7-1.9%) and 1.8% (95% credibility interval, 1.7-1.8%), respectively. When the analyses using the Bayesian model were restricted to offspring born before in vitro fertilization was first introduced in Sweden in 1983, the estimate was slightly higher, 2.0% (95% credibility interval, 1.9-2.0%). In turn, this was consistent with a general decline in the occurrence of misattributed paternity over time, from 3.2% (95% credibility interval, 2.7-3.8%) for children born in the 1930s to 0.9% (95% credibility interval, 0.7-1.1%) for children born in the 2010s (Fig. 1b). The misattributed paternity frequency was highest among those with the lowest educational level for both the mother and father, 2.0% (95% credibility interval, 1.8-2.1%) and 1.7% (95% credibility interval, 1.5-1.8%), respectively, but with a trend towards higher estimates for mothers with postgraduate education, 1.1% (95% credibility interval, 0.6-1.8%) (Fig. 1c). For the analysis regarding the age of the father and mother, there was a U-shaped association, with higher estimated rates at younger and older ages (Supporting Information Fig. S2). For geographical regions, there were small or no associations between place of birth and estimated rates for population-dense or non-dense regions, with a few exceptions (Supporting Information Fig. S3).

Discussion
In this study, we present population-based estimates of the frequency of misattributed paternity in Sweden from the 1930s until now. Overall, we find misattributed paternity to be within the range reported for other countries, with the contemporary rate being approximately 1%, although it may be as high as 3% in persons born in the 1940s or earlier. In addition to its societal implications, this finding indicates that misattributed paternity is unlikely to have large effects on studies investigating familial disease risks, especially if these studies are conducted using relational data collected in recent decades.
The study has several important strengths. Most notably, the inclusion of a large portion of the Swedish population, namely those who had undergone blood grouping prior to routine health care procedures, likely renders the results generalizable to the Swedish population. In addition, the ABO blood group data in the SCANDAT3-S database are of high quality, as they are based on largely unprocessed blood group data from transfusion medicine clinics, where the health consequences of coding errors would be severe. Taken together, this implies that the results should be very robust. We excluded individuals who had only undergone blood group testing in preparation for donating blood in order to improve generalizability. Blood donors are over-represented in the SCANDAT3-S database and are selected because of good health and low-risk sexual behaviour, and may thus not be representative of the general population. Still, persons undergoing blood group testing in preparation for surgery, childbirth or other medical reasons may also not be entirely representative, on account of a higher disease burden and thus also a higher prevalence of common risk factors for poor health. Conceivably, this may have affected the misattributed paternity frequency if risk factors for poor health are associated with increased or decreased risk of misattributed paternity. Moreover, any other factor that could explain the impossible mating patterns in our data, such as coding errors in the contributing registers, chimerism and other rare reasons for familial blood group mismatch, would mean that the true frequency is lower than our estimates. However, such instances would be very rare. Instead, we believe that our results are limited mainly by the quality of the multi-generation register used to construct the input data for the model.
For example, a general improvement in register quality over time might at least partly explain the higher misattributed paternity frequencies in more distant decades. Other reasons for the higher estimates in earlier decades can only be speculated upon. One possible explanation is that fathers were more often absent due to labour in distant cities or at sea, which was more common previously. Another relates to societal factors, as attitudes towards divorce were previously less permissive and contraceptive methods were less accessible. In general, one could argue that results from analyses of more distant time periods may need to be interpreted with greater caution.
The assumption of no gene flow in Hardy-Weinberg equilibrium may be violated due to the extensive recent migration to Sweden, which could slightly alter the expected counts. We have provided the blood group distribution of offspring by decade of birth in Supporting Information Table S2 to demonstrate that differences in distributions between decades are small. Moreover, the database contains only a relatively small number of immigrant families, limiting the impact of the gene flow effect.
A further issue is the possibility that, for some of the offspring we identified as having blood groups incompatible with those of their parents, the recorded parents may have known that the child had a different father. Such cases would not be strict occurrences of misattributed paternity. However, the estimated frequencies are similar to previous estimates, and our findings probably reflect an upper bound of misattributed paternity in Sweden.
Hence, we argue that the true, contemporary frequency of misattributed paternity in the Swedish population is close to 1%, although it may be as high as 3% in persons born in the 1940s or earlier. As such, our findings show that misattributed paternity in this large population-based sample is in line with previous contemporary estimates from Europe, further contributing to our understanding of population genetics as well as societal structure.

Funding Information
The creation of the SCANDAT3-S database and the conduct of this study was made possible by a grant to Dr. Edgren from Swedish Research