Consequences of imprecision in fetal fraction estimation on performance of cell‐free DNA screening for Down syndrome

Abstract

Background: There is significant variability in reported fetal fraction (FF), a common cause of no‐calls in cell‐free (cf)DNA‐based non‐invasive prenatal screening. We examine the effect of imprecision in FF measurement on the performance of cfDNA screening for Down syndrome when low FF samples are classified as no‐calls.

Methods: A model for the reported FF was constructed from the FF measurement precision and the underlying true FF. The model was used to predict singleton Down syndrome detection rates (DRs) for various FF cut‐offs and underlying discriminatory powers of the test.

Results: Increasing the FF cut‐off led to a slightly increased apparent DR, when no‐calls are excluded, and an associated larger decrease in effective DR, when no‐calls are included. These effects were smaller for tests with higher discriminatory power and larger as maternal weight increased.

Conclusions: Most no‐calls due to a low reported FF have a true FF above the cut‐off. The discriminatory power of a test limits its effective DR, and FF precision determines the trade‐off between apparent and effective DR when low FF is used to discard samples. Tests with high discriminatory power do not benefit from current FF measurements.


| INTRODUCTION
Over the last decade a large number of women have had a cell-free (cf)DNA screening test for fetal aneuploidy. So far, most tests have been carried out in the private sector, but increasingly public health screening programs are being established, whereby women are offered a cfDNA test. In light of this, many professional societies and organizations have published recommendations or practical guidelines associated with the utility of cfDNA as a screening assay for fetal aneuploidies, as well as recognizing the importance of 'no-call' results that commonly are excluded from reported performance metrics. [1][2][3][4] These cfDNA tests rely on the presence of a cfDNA fraction in maternal plasma that originates from the placenta, acting as a proxy for the fetus. The amount of feto-placental cfDNA compared to the total cfDNA is commonly referred to as the fetal fraction (FF) and, in the first trimester, averages around 10%-11% with a wide range among individuals being tested. [5][6][7] All cfDNA tests rely on the ratio between a chromosome of interest and a reference chromosome or set of chromosomes.
This ratio can be normalized so that euploid samples have, on average, a ratio of 1. In Down syndrome pregnancies the average ratio would then be [3FF + 2(1 − FF)]/2, or 1 + FF/2, since the fetus has three copies of chromosome 21 while the mother has two, and both have two copies of the reference chromosomes. [8] In general, the ability of the test to distinguish aneuploid from euploid pregnancies ('discriminatory power') depends on the overlap of chromosomal ratios between affected and unaffected pregnancies; for the individual sample it also depends on the FF. This has led to the notion that FF should be routinely quantified and the cfDNA result reported only when FF is above a pre-set, or dynamic, limit, with the remainder classified as no-calls. [1]

Direct quantification of FF can sometimes be done by measuring either the Y-chromosome, when the fetus is known to be a singleton male, [9,10] or the affected chromosome, when it is known to be trisomic. [8] For routine testing, however, indirect quantification is needed. Several approaches have been described, including those based on fragment size distributions, [11] nucleosome profiles, [12] and single nucleotide polymorphisms (SNPs). [13,14] Comparison of direct and indirect quantification shows that FF measurements for individual samples are inaccurate [8,15] because of their considerable imprecision, with a standard deviation (SD) ranging from 1.3% to 3.4% in six published studies. [16] The consequences of this FF imprecision have generally not been reported, except in samples from non-pregnant women. [17,18] The imprecision of FF measurements, together with a lack of standardization, has led some to question the use of this metric in deciding whether or not to report a cfDNA result. [19,20]

In this paper, modeling is used to examine the effect of FF imprecision on the no-call rate and consequently on the Down syndrome detection rate (DR) among singleton pregnancies in a universal cfDNA screening program.
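The normalized-ratio relationship given in the Introduction, 1 + FF/2 for a trisomic fetus, can be checked with a few lines of Python. This is only an illustrative sketch: the function name `expected_ratio` and its parameters are assumptions introduced here, not part of any published pipeline, and FF is expressed as a proportion (0.10 for 10%).

```python
def expected_ratio(ff: float, fetal_copies: int = 3, maternal_copies: int = 2) -> float:
    """Expected normalized ratio for the chromosome of interest.

    ff is the fetal fraction as a proportion (0.10 means 10%). The result is
    scaled so that a euploid pregnancy (two fetal copies) gives a ratio of 1.
    Hypothetical helper, written only to illustrate the formula in the text.
    """
    if not 0.0 <= ff <= 1.0:
        raise ValueError("ff must be a proportion between 0 and 1")
    return (fetal_copies * ff + maternal_copies * (1.0 - ff)) / 2.0


if __name__ == "__main__":
    # The shift away from 1 is FF/2, so it shrinks as FF falls.
    for ff in (0.02, 0.04, 0.11):
        print(f"FF = {ff:.0%}: expected trisomy 21 ratio = {expected_ratio(ff):.3f}")
```

The excess over 1 is only half the fetal fraction, which is why the separation between affected and unaffected ratios narrows at low FF.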

| METHODS
A model was constructed for the joint frequency distribution of FF values in the absence of FF assay imprecision (truFF) and values estimated by a given assay method (estFF). The model comprises a log Gaussian distribution of truFF and a conditional Gaussian distribution of estFF for a given truFF. There are three model parameters (the mean and SD of log(truFF), and the SD of the FF assay), which are the same in both Down syndrome and unaffected pregnancies.
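The two-stage model described above can be written compactly as follows. The symbols $\mu$, $\sigma$, $s$ and $c$ are notation assumed here for illustration. Base-10 logarithms are also an assumption, inferred from the reported mean of log(truFF), and the assay error is assumed to be additive on the FF scale.

```latex
\begin{align}
\log_{10}(\mathrm{truFF}) &\sim \mathcal{N}(\mu,\ \sigma^{2}), \\
\mathrm{estFF} \mid \mathrm{truFF} &\sim \mathcal{N}(\mathrm{truFF},\ s^{2}), \\
P(\mathrm{estFF} < c) &= \int_{0}^{\infty} \Phi\!\left(\frac{c - t}{s}\right) f_{\mathrm{truFF}}(t)\, dt .
\end{align}
```

Here $f_{\mathrm{truFF}}$ is the log Gaussian density of truFF, $\Phi$ is the standard Gaussian distribution function, and the last line is the expected no-call rate for an FF cut-off $c$.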
The mean and SD of log(truFF) were derived from a published study of 10,698 singleton pregnancies tested prospectively at 10-14 weeks gestation, which included 10,472 unaffected, 160 Down syndrome, 50 Edwards syndrome and 16 Patau syndrome pregnancies. [5] The reported FF was shown to follow an approximately log Gaussian distribution with a heavier lower tail, which would be expected if FF assay imprecision is additive rather than proportional. Therefore, to minimize the influence of the FF assay, the mean of log(truFF) was estimated from the median (11.0%) and the SD from the inter-quartile range (8.3%-14.4%), [21] yielding a mean of −0.959 and an SD of 0.179 for log(truFF). There was no difference in the reported FF distribution between the Down syndrome and unaffected pregnancies according to statistical hypothesis testing (p = 0.97, two-tailed t-test) and a multi-variate regression analysis. [5]

The SD of the same FF assay was derived from a published study of FF measurement in 47,512 singleton pregnancies with male fetuses, tested after 10 weeks gestation. [14] A published digital analysis of a plot of the reported value (estFF) against the value derived from the Y-chromosome, taken to represent truFF, found that the assay imprecision followed a Gaussian distribution with an SD of 1.6%. [16]

The model was used to simulate estFF and truFF in one million data points. Goodness of fit was assessed using the reported FF values in the 10,472 unaffected pregnancies provided by Revello et al. [5] The model was used to estimate the no-call rate for different FF cut-offs as well as the proportion of truFF values below the cut-off.
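The simulation step described above can be sketched with the Python standard library. The parameter values are those quoted in the text. Everything else is an assumption made for illustration: the logarithm is taken as base 10 (inferred from the median of 11.0% and the mean of −0.959), the assay error is treated as additive in absolute FF units (0.016 for an SD of 1.6%), and the function names `simulate` and `no_call_summary` are hypothetical. A smaller sample than one million is used to keep the run short.

```python
import random

LOG10_MEAN = -0.959  # mean of log10(truFF), as quoted in the text
LOG10_SD = 0.179     # SD of log10(truFF), as quoted in the text
ASSAY_SD = 0.016     # assay SD of 1.6%, assumed additive on the FF scale


def simulate(n, seed=1):
    """Return n (truFF, estFF) pairs as proportions."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        tru = 10 ** rng.gauss(LOG10_MEAN, LOG10_SD)  # log Gaussian true FF
        est = tru + rng.gauss(0.0, ASSAY_SD)         # additive Gaussian assay error
        pairs.append((tru, est))
    return pairs


def no_call_summary(pairs, cutoff):
    """Summarize no-calls for one FF cut-off (a proportion, e.g. 0.04)."""
    n = len(pairs)
    no_calls = [(tru, est) for tru, est in pairs if est < cutoff]
    tru_below = sum(1 for tru, _ in pairs if tru < cutoff)
    erroneous = sum(1 for tru, _ in no_calls if tru >= cutoff)
    return {
        "no_call_rate": len(no_calls) / n,    # P(estFF < cutoff)
        "tru_below_rate": tru_below / n,      # P(truFF < cutoff)
        # Share of no-calls whose true FF was actually at or above the cut-off.
        "erroneous_share": erroneous / len(no_calls) if no_calls else float("nan"),
    }


if __name__ == "__main__":
    data = simulate(200_000)
    for cutoff in (0.01, 0.02, 0.03, 0.04, 0.05):
        s = no_call_summary(data, cutoff)
        print(f"cut-off {cutoff:.0%}: no-call {s['no_call_rate']:.2%}, "
              f"truFF below {s['tru_below_rate']:.2%}, "
              f"erroneous share {s['erroneous_share']:.0%}")
```

Because the sketch uses a fixed seed and a reduced sample size, its output should be read as indicative only, not as a reproduction of the published tables.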
Additionally, both the apparent (app)DR, excluding no-calls, and the effective (eff)DR, when affected no-calls are included as screen negative, were evaluated for Down syndrome pregnancies. This was done for an idealized cfDNA test assumed to have a hard limit of detection (hLoD), defined as a truFF value above which all Down syndrome results are true-positive and below which all are false-negative.
Tests with four hLoD values were used in the analysis (2%, 3%, 4% and 5%) and samples that are classified as no-call for technical reasons were not considered. The effDR and appDR of these tests were compared with the intrinsic (int)DR when no FF cut-off is applied.
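The three detection rates defined above can be computed directly from the model by numerical integration over truFF. The sketch below is a minimal illustration under stated assumptions: base-10 logarithms, an additive assay error, a simple midpoint rule, and a hypothetical function name `detection_rates`. Samples with truFF exactly at the hLoD are treated as detected, which is a convention chosen here.

```python
from statistics import NormalDist

LOG10_MEAN, LOG10_SD = -0.959, 0.179  # log10(truFF) parameters quoted in the text
ASSAY_SD = 0.016                      # assay SD of 1.6%, assumed additive


def detection_rates(hlod, cutoff, assay_sd=ASSAY_SD, steps=4000):
    """Return intDR, appDR and effDR for one hLoD and FF cut-off.

    hlod and cutoff are proportions (0.04 means 4%); cutoff=None means that
    no FF cut-off is applied, so every sample is reported.
    """
    tru_dist = NormalDist(LOG10_MEAN, LOG10_SD)
    noise = NormalDist(0.0, assay_sd)
    lo = LOG10_MEAN - 8 * LOG10_SD
    hi = LOG10_MEAN + 8 * LOG10_SD
    width = (hi - lo) / steps
    total = called = detected = detected_called = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width             # midpoint on the log10 scale
        tru = 10 ** x
        weight = tru_dist.pdf(x) * width       # probability mass of this slice
        # P(estFF >= cutoff | truFF): the chance that the sample is reported.
        p_called = 1.0 if cutoff is None else 1.0 - noise.cdf(cutoff - tru)
        hit = 1.0 if tru >= hlod else 0.0      # idealized hard limit of detection
        total += weight
        called += weight * p_called
        detected += weight * hit
        detected_called += weight * hit * p_called
    return {
        "intDR": detected / total,         # no cut-off applied
        "appDR": detected_called / called, # no-calls excluded
        "effDR": detected_called / total,  # no-calls counted as screen negative
    }


if __name__ == "__main__":
    for hlod in (0.02, 0.03, 0.04, 0.05):
        for cutoff in (None, 0.02, 0.03, 0.04, 0.05):
            r = detection_rates(hlod, cutoff)
            label = "none" if cutoff is None else f"{cutoff:.0%}"
            print(f"hLoD {hlod:.0%}, cut-off {label}: intDR {r['intDR']:.4f}, "
                  f"appDR {r['appDR']:.4f}, effDR {r['effDR']:.4f}")
```

Passing `cutoff=None` gives appDR = effDR = intDR by construction, which matches the definition of the intrinsic DR in the text.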
Fetal fraction is negatively correlated with maternal weight, [5,6] consequently reducing Down syndrome detection in heavier women.
This was investigated by updating the parameters for the truFF distribution using a subset of the 10,472 unaffected pregnancies from Revello et al. [5]

| RESULTS
Figure 1 shows an almost perfect agreement in estFF quantiles between the observed reported FF and the model-predicted FF, confirming the model assumptions and derived parameters.

Table 1 shows, for selected FF cut-off levels ranging from 1% to 5%, the model-predicted proportion of samples with truFF and with estFF below the cut-off. In each case the proportion for estFF is much higher than for truFF; thus the vast majority of samples classified as no-call for having estFF below a cut-off in this range would have a truFF above the cut-off. This proportion of erroneously classified samples, for different FF cut-offs, is also shown in Table 1.

Table 2 compares the model-predicted appDR and effDR for four cfDNA tests according to different FF cut-off levels. When no FF cut-off is applied, the appDR and effDR are equal and reflect the intDR. For all tests with an hLoD ≤ 4%, a minor gain in appDR is associated with a substantially larger loss in effDR, reflecting that the majority of samples converted to no-calls had a sufficiently high truFF to be called correctly. For a test with a high discriminatory power (hLoD = 2% or 3%), a 4% FF cut-off would give an increase in appDR of <0.1% and a decrease in effDR of 2.5%. In addition, even with an FF cut-off perfectly matched to the hLoD (both 4%), the appDR is less than 100%, reflecting the fraction of samples with truFF < 4% that are reported with an estFF ≥ 4% and therefore give false-negative results.

Figure 2 shows the effect of varying FF assay imprecision on a test with an hLoD of 4% and a matching 4% FF cut-off. While the appDR does not decrease much with increasing imprecision, the effDR falls substantially.

Table 3 shows the corresponding DRs for women with a maternal weight of 100 kg. Both the appDR and effDR are lower, and the separation between them is greater, than in Table 2. Figure 3 shows, for tests with an hLoD of 2% and 4% and an FF cut-off of 4%, the effect of maternal weight on the intDR, effDR and appDR.

| DISCUSSION
[...] South Asian ethnicity and assisted conception. [5] Both the maternal age and South Asian effects were very small; assisted reproduction accounts for less than 5% of births in most localities; and the effect of maternal weight on the Down syndrome detection rate was evaluated in this paper (Table 3 and Figure 3). Similarly, the model parameter relating to FF measurement imprecision represents a single FF assay. [14] However, this value, an SD of 1.6%, is consistent with the range of SDs, 1.3%-3.4%, reported or derived in a meta-analysis using various FF assays. [8,11,12,16,22,23] Moreover, the effect of this SD on the Down syndrome detection rate was evaluated (Figure 2), suggesting that the effects in general are similar or worse with other approaches.

[...] practice there is a continuous reduction in discriminatory power as FF approaches zero, rather than a sudden loss at a given true FF level. Nevertheless, the effects demonstrated using these simplified cfDNA tests would still apply to those used in practice. Similarly, the same effect of imprecision in the FF measurement is still valid when using a dynamic FF cut-off in combination with, for example, sequencing coverage, instead of using a fixed cut-off. [23]

The analysis assessed Down syndrome screening performance only by the detection rates and the FF-based no-call rate, not by the false-positive rate (FPR) or positive predictive value (PPV). That is because the FPR is generally independent of FF (except for methods using the reported FF as part of the risk assessment) and very low for cfDNA tests, resulting in a high PPV. Since no-calls based on technical reasons were not considered, the effective and intrinsic DRs should be considered upper estimates.
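The sensitivity of the detection rates to FF assay imprecision (Figure 2) can be explored with a short numerical sketch for a test whose hLoD and FF cut-off are both 4%. The SD values swept below are the single-assay value and the two ends of the published range quoted in the text. The base-10 log scale, the additive error model and the function name `matched_cutoff_rates` are assumptions made for illustration.

```python
from statistics import NormalDist

LOG10_MEAN, LOG10_SD = -0.959, 0.179  # log10(truFF) parameters quoted in the text


def matched_cutoff_rates(assay_sd, level=0.04, steps=4000):
    """Return (appDR, effDR) when the hLoD and the FF cut-off both equal level."""
    tru_dist = NormalDist(LOG10_MEAN, LOG10_SD)
    noise = NormalDist(0.0, assay_sd)
    lo = LOG10_MEAN - 8 * LOG10_SD
    width = 16 * LOG10_SD / steps
    total = called = detected_called = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * width               # midpoint rule on the log10 scale
        tru = 10 ** x
        weight = tru_dist.pdf(x) * width
        p_called = 1.0 - noise.cdf(level - tru)  # P(estFF >= cut-off | truFF)
        total += weight
        called += weight * p_called
        if tru >= level:                         # detectable by the idealized test
            detected_called += weight * p_called
    return detected_called / called, detected_called / total


if __name__ == "__main__":
    for sd in (0.013, 0.016, 0.034):
        app, eff = matched_cutoff_rates(sd)
        print(f"assay SD {sd:.1%}: appDR {app:.4f}, effDR {eff:.4f}")
```

Under these assumptions the effDR can only fall as the assay SD grows, because every sample with truFF above the cut-off becomes more likely to be reported below it.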
For the purposes of illustration, the effective Down syndrome detection rate was defined by categorizing no-call results from affected pregnancies as screen negative. In clinical practice a repeat cfDNA test on a second blood sample is often offered after a no-call result. However, not all women offered a retest submit one (50%-75%), and only about two-thirds of those receive a final result. [5,24,25] In one study only 9% of unaffected pregnancies with a no-call as a final result, that is, after being offered repeat testing, went on to have an invasive test. [5]

The current analysis also highlights the trade-off between an increasing appDR and a decreasing effDR. For high discriminatory power (hLoD ≤ 3%) the decrease in effDR is more than 100-fold greater than the very slight increase in appDR. This suggests that using low FF as a reason to classify results as no-calls provides limited benefit for these tests, and potentially even a disadvantage for the screening process as a whole, considering the proportion of no-calls that would be offered invasive testing. As has been clinically shown, [30] such tests can have a very high intrinsic Down syndrome detection rate, which would effectively be reduced by such classification. Moreover, not classifying results as no-calls based on FF reduces the need for repeat cfDNA testing as well as invasive procedures, with the associated anxiety and added clinical and financial burden.

ACKNOWLEDGMENTS
We thank K. Nicolaides for sharing published data. The paper was the result of fruitful discussions on the subject with several colleagues.

CONFLICT OF INTEREST STATEMENT
Howard S. Cuckle has received consulting fees and support for meeting attendance and/or traveling from PerkinElmer Inc.

ETHICS STATEMENT
This study analyzes the effects of using fetal fraction to discard samples from reporting. The only clinical data used in this paper is

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.