Droplet digital PCR is a cost‐effective method for analyzing long cell‐free DNA in maternal plasma: Application in preeclampsia

Long cell‐free DNA (cfDNA) can be found in the plasma of pregnant women and cancer patients. We investigated if droplet digital PCR (ddPCR) can analyze such molecules for diagnostic purposes using preeclampsia as a model.

• A reduction in the percentage of long cfDNA was found in plasma from preeclamptic patients by SMRT sequencing.

What does this study add?
• Using droplet digital PCR (ddPCR), the percentages of long cfDNA >533 bp and >1001 bp in the plasma of normal pregnancies were 8.7% and 4.6%, respectively.
• Size analysis of cfDNA in maternal plasma using the ddPCR assay targeting LINE-1 regions is a new type of biomarker for preeclampsia.
• ddPCR is a cost-effective method for unlocking the diagnostic information of long cfDNA in plasma.

| INTRODUCTION
Analysis of circulating cell-free DNA (cfDNA) in maternal plasma offers a noninvasive method for studying pregnancy-related conditions. Recently, there has been growing interest in the study of the molecular features of cfDNA, 1 including the sizes, 2 preferred ends, 3 end motifs, 4 and jagged ends. 5 These studies have provided new insights into the biology and clinical applications of cfDNA during pregnancy. As an example, by exploiting the size difference between maternal (modal size at 166 bp) and fetal (modal size at 143 bp) DNA molecules in maternal plasma, an approach was developed for detecting fetal aneuploidies. 6 Moreover, such a size-based approach could be combined with the well-established count-based approach to determine whether a fetus has inherited a subchromosomal copy number aberration from its mother. 7
Studies on the molecular features of cfDNA have mainly utilized next-generation sequencing technology (i.e., the Illumina sequencing-by-synthesis technology). However, this technology is not suitable for sequencing DNA fragments longer than 500 bp, as long DNA fragments are generally not suited for cluster generation by bridge amplification on the flow cell. 8,9 Recently, Yu et al. revealed a large proportion of analyzable long cfDNA molecules in maternal plasma 10 using a single-molecule sequencing technology (i.e., single molecule real-time [SMRT] sequencing by Pacific Biosciences [PacBio]). Using this technology, they further revealed a reduction in the proportion of such long cfDNA molecules in maternal plasma samples from pregnancies with preeclampsia, which enabled them to develop a method based on the size analysis of cfDNA for differentiating pregnancies with and without preeclampsia. Compared with SMRT sequencing, Illumina sequencing showed inferior performance for the cfDNA size-based detection of preeclampsia, which might be related to the incomplete size spectrum of cfDNA (i.e., typically ranging from 50 to 500 bp) generated with the Illumina platform. Hence, analyzing the size distribution of cfDNA with properly selected analytical tools might reveal previously unexplored biological properties of cfDNA that could be utilized for molecular diagnostics.
In this study, we aimed to investigate whether droplet digital PCR (ddPCR) can be used as a cost-effective approach to analyze the size alteration of cfDNA in maternal plasma samples. The first objective of this study was to use ddPCR as an independent technology to assess the percentage of long cfDNA in normal pregnancy. For this purpose, we aimed to design ddPCR assays targeting a single-copy gene and analyze the long cfDNA at sizes of approximately 500 bp and 1000 bp, in accordance with the main sizes analyzed in previous studies using SMRT sequencing and nanopore sequencing by Oxford Nanopore Technologies (ONT). 10,11 As a result, two ddPCR assays targeting a common house-keeping gene, the valosin-containing protein (VCP) gene, were designed to measure the percentages of plasma cfDNA >533 bp and >1001 bp, respectively. The second objective was to evaluate the feasibility of detecting preeclampsia using ddPCR-based size distribution analysis of plasma cfDNA. To determine the optimal parameters (i.e., the optimal size of the long amplicon and the minimum number of fragments required for ddPCR analysis) for the ddPCR assay to provide the best differentiating power between control and preeclamptic pregnancies, we conducted in silico dPCR simulations using cfDNA sequencing data generated using a PacBio sequencer. 10 As a result, a ddPCR assay targeting the long interspersed nuclear element-1 (LINE-1) repetitive sequences was designed to determine the percentage of long cfDNA >170 bp in a plasma sample for the detection of preeclampsia.

| Sample collection and processing
DNA was extracted using the QIAamp Circulating Nucleic Acid Kit (Qiagen). The extracted DNA was quantified using Qubit 3.0 (Invitrogen).

| In silico digital PCR analysis
In silico digital PCR analysis was performed to determine the optimal parameters for the design of an assay that could reflect the difference in the size distribution of cfDNA between normal pregnancies and those complicated with preeclampsia (Figure 1). The analysis was conducted using SMRT sequencing data of plasma cfDNA from 10 preeclamptic and 10 normal pregnancies from a previous study 10 (Figure 1A). We evaluated two parameters for optimizing the ddPCR assay: the size of the long amplicon and the minimum number of fragments required for the ddPCR analysis (Figure 1C). The percentage of long cfDNA fragments (L%) was calculated for each sample in the preeclamptic and control groups. The performance of the classifier based on the L% determined using different parameters (i.e., different long amplicon sizes or different numbers of HiFi reads) was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve (Figure 1D). The parameter value that provided the best performance was selected. The details are provided in the online Supporting Information S1, Materials and Methods.
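The evaluation loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it models the draw of a fixed number of fragments as independent Bernoulli trials around a hypothetical "true" long fraction per sample (the study subsampled actual HiFi reads), and it computes the AUC as the probability that a preeclamptic sample has a lower L% than a control sample, counting ties as one half. The function names and all input values are hypothetical.

```python
import random
import statistics


def auc_lower_is_case(case_values, control_values):
    """AUC for a classifier in which a LOWER value indicates a case.

    Equals P(case < control) + 0.5 * P(case == control) over all pairs.
    """
    score = 0.0
    for case in case_values:
        for control in control_values:
            if case < control:
                score += 1.0
            elif case == control:
                score += 0.5
    return score / (len(case_values) * len(control_values))


def simulate_auc(true_case_fracs, true_control_fracs, n_fragments,
                 n_repeats=20, seed=0):
    """Median and SD of the AUC when each sample's long fraction is
    estimated from only `n_fragments` fragments (assumed sampling model)."""
    rng = random.Random(seed)

    def observed_fraction(true_fraction):
        # Each sampled fragment is "long" with probability true_fraction.
        n_long = sum(1 for _ in range(n_fragments)
                     if rng.random() < true_fraction)
        return n_long / n_fragments

    aucs = []
    for _ in range(n_repeats):
        cases = [observed_fraction(p) for p in true_case_fracs]
        controls = [observed_fraction(p) for p in true_control_fracs]
        aucs.append(auc_lower_is_case(cases, controls))
    return statistics.median(aucs), statistics.pstdev(aucs)
```

Calling `simulate_auc` over a grid of `n_fragments` values gives a curve of median AUC and SD against fragment number, which is the kind of summary from which a plateau can be read off.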

| Size distributions of maternal plasma DNA molecules determined by different platforms
We analyzed plasma cfDNA samples from sixteen third-trimester pregnancies using the VCP 533 bp and VCP 1001 bp ddPCR assays. The median percentages of cfDNA molecules >533 bp and >1001 bp in plasma were 8.7% and 4.6%, respectively. Previous studies have evaluated the percentages of long cfDNA fragments >500 bp and >1000 bp in maternal plasma samples from third-trimester pregnancies using different sequencing platforms, including Illumina, PacBio, and ONT sequencing. 10,11 Table 3 shows that ddPCR detected higher percentages of long cfDNA molecules than the Illumina and ONT platforms but lower percentages than the PacBio platform.

| Size analysis of plasma DNA from preeclamptic patients measured by VCP assays
We then examined whether the percentage of long cfDNA molecules determined by the two VCP assays could be used to differentiate preeclamptic patients from healthy pregnant controls. Plasma cfDNA samples of ten preeclamptic and sixteen healthy control pregnancies were analyzed. Using the VCP 533 bp assay, the percentage of long cfDNA >533 bp was significantly lower in the preeclamptic group (median, 6.6%; range, 3.2%-10.4%) than in the control group (median, 8.7%; range, 5.4%-16.8%) (Mann-Whitney U test, p = 0.014) (Figure 3A). Using the VCP 1001 bp assay, the percentage of long cfDNA >1001 bp was lower in the preeclamptic group (median, 4.1%; range, 2.2%-6.2%) than in the control group (median, 4.6%; range, 3.5%-12.8%), but the difference was not statistically significant (Mann-Whitney U test, p = 0.121) (Figure 3B). Furthermore, the ranking of the datapoints corresponded well between the two VCP assays (Spearman's r = 0.87, p < 0.0001; Supporting Information, Figure S1), which indicated that combining the two assays would not further improve the differentiating power.

| In silico dPCR analysis using SMRT sequencing data to guide the design of the ddPCR assay
To design a ddPCR assay that better differentiates preeclamptic patients from healthy pregnant controls, we performed an in silico dPCR simulation using SMRT sequencing data of plasma cfDNA from preeclamptic and normal pregnancies from a previous publication 10 (Figure 1). The optimal size of the long amplicon was determined by conducting simulations with different sizes of the long amplicon ranging from 100 to 1000 bp. As shown in Figure 4A, a long amplicon size of 170 bp provided the highest AUC among the simulated sizes. To determine the minimum number of fragments required for ddPCR analysis, simulations were conducted using different numbers of HiFi reads ranging from 5 to 500,000. As shown in Figure 4B, with increasing numbers of fragments used, the median AUC increased while the SD decreased, reaching a plateau at 5000 fragments (median AUC ≥0.95 and SD <0.015). Hence, the results of the in silico dPCR analysis indicated that a minimum of 5000 fragments would be required to achieve robust discriminatory power.
Consequently, if a ddPCR assay targeted only a single genomic locus (e.g., a single-copy gene), it would require at least 5000 haploid genome equivalents of DNA for each sample, which can be challenging for some patients with low plasma cfDNA levels.
In view of this, we developed a ddPCR assay targeting the LINE-1 repetitive sequences (Figure 2). The LINE-1 assay targeted approximately 1600 regions that were repeated across the human genome. We first conducted an in silico dPCR analysis using the SMRT sequencing data to evaluate the performance of the LINE-1 assay. As shown in the simulation, the preeclamptic group had a significantly lower percentage of long DNA fragments than the control group (median: 64% vs. 75%, p = 0.0002) (Supporting Information, Figure S2A). The AUC of the ROC curve analysis was 0.95 (Supporting Information, Figure S2B).
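The input requirement implied by these figures can be worked through as simple arithmetic. The sketch below assumes about 3.3 pg of DNA per haploid genome equivalent (the value used in the Discussion) and the idealization that every target copy in the input yields one analyzable fragment; the function name is hypothetical.

```python
PG_PER_HAPLOID_GENOME = 3.3  # pg of DNA per haploid genome equivalent


def required_input_pg(target_fragments, copies_per_haploid_genome):
    """Mass of cfDNA (pg) needed to supply `target_fragments` target
    molecules, assuming every copy in the input is analyzable."""
    genome_equivalents = target_fragments / copies_per_haploid_genome
    return genome_equivalents * PG_PER_HAPLOID_GENOME


# Single-copy locus: 5000 genome equivalents, i.e. 16,500 pg (16.5 ng).
single_copy_pg = required_input_pg(5000, 1)

# Repeat family with ~1600 target copies per haploid genome:
# 3.125 genome equivalents, i.e. about 10 pg.
line1_pg = required_input_pg(5000, 1600)
```

Under these assumptions the repeat-targeting design lowers the required input by the copy number of the target, which is the rationale given for moving from a single-copy gene to LINE-1.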

| Size analysis of plasma cfDNA from preeclamptic patients measured by the LINE-1 assay
We examined the size distributions of cfDNA in maternal plasma samples from the ten preeclamptic and sixteen normal pregnancies using the LINE-1 ddPCR assay. The preeclamptic group had a significantly lower percentage of long cfDNA >170 bp (median, 28.9%; range, 24.6%-34.1%) than the control group (median, 35.1%; range, 30.3%-43.2%) (Mann-Whitney U test, p < 0.0001) (Figure 5A).
The AUC of the ROC analysis was 0.94 for differentiating the two groups (Figure 5B). These results were comparable to the predicted performance based on the in silico dPCR analysis. Using a cutoff of 30.2% for the percentage of long cfDNA >170 bp, the specificity and sensitivity for detecting preeclamptic patients were 100% and 80%, respectively. The percentage of long cfDNA >170 bp did not correlate significantly with gestational age in either group (Spearman correlation; p = 0.679 for preeclamptic subjects and p = 0.536 for control subjects; data not shown).
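How a single cutoff translates into sensitivity and specificity can be illustrated as follows. This is a sketch under assumptions: a sample is called positive when its percentage of long cfDNA falls strictly below the cutoff (the direction follows from the lower values in the preeclamptic group, but the handling of values equal to the cutoff is an assumption), and the example values are invented for illustration, not the study's measurements.

```python
def sensitivity_specificity(case_values, control_values, cutoff):
    """Sensitivity and specificity when values strictly below `cutoff`
    are called positive (assumed decision rule)."""
    true_positives = sum(1 for value in case_values if value < cutoff)
    true_negatives = sum(1 for value in control_values if value >= cutoff)
    return (true_positives / len(case_values),
            true_negatives / len(control_values))


# Invented percentages of long cfDNA (>170 bp), for illustration only.
example_cases = [24.6, 26.0, 27.5, 28.4, 28.8, 29.0, 29.5, 30.0, 31.2, 34.1]
example_controls = [30.3, 31.5, 33.0, 35.1, 36.4, 38.0, 40.2, 43.2]

sensitivity, specificity = sensitivity_specificity(
    example_cases, example_controls, cutoff=30.2)
```

With these invented values, eight of the ten cases fall below the cutoff and no control does, so the rule returns a sensitivity of 0.8 and a specificity of 1.0.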
We used ROC curve analysis to determine which marker would be the most useful for differentiating the preeclamptic and control subjects (Figure 5B).
In this study, we confirmed the presence of long cfDNA in maternal plasma using ddPCR and demonstrated the use of ddPCR for the size analysis of cfDNA in maternal plasma for the detection of preeclampsia. A recent study using PacBio SMRT sequencing revealed a large percentage of analyzable long cfDNA molecules in maternal plasma. 10 The percentages of long DNA molecules >500 bp and >1000 bp were determined to be 32.3% and 22.0%, respectively. A more recent study using nanopore sequencing by ONT also detected long cfDNA in maternal plasma, although in smaller percentages (5.4% for >500 bp and 0.79% for >1000 bp). 11 Considering these differences based on the analytical platform used, the present study used a third method, droplet digital PCR, to determine the percentage of long cfDNA in maternal plasma. By using ddPCR assays targeting a single-copy gene, the VCP gene, we found that the percentages of long cfDNA >533 bp and >1001 bp in plasma samples from third-trimester pregnancies were 8.7% and 4.6%, respectively. The discrepancy in the percentages of long cfDNA fragments across the different analytical platforms may be explained by the fact that single-molecule sequencing is a locus-independent method that analyzes cfDNA molecules from across the whole genome, whereas ddPCR is a locus-specific method that analyzes only cfDNA molecules from a targeted genomic locus. The latter would therefore be affected by the nonrandom fragmentation of cfDNA, 12,13 possibly giving different percentages of long cfDNA when ddPCR assays targeting different genomic regions are used.
Moreover, the two single-molecule sequencing platforms, particularly PacBio, have been reported to prefer sequencing long fragments, 11 whereas ddPCR measures the absolute number of long and short molecules at a targeted genomic locus. Nevertheless, using ddPCR, we confirmed the presence of long cfDNA in maternal plasma and provided a reference for further research on this topic. Previous studies by Chan et al. 14 and Fernando et al. 15 used quantitative PCR (qPCR) and ddPCR, respectively, to study the size distribution of cfDNA in plasma samples from healthy pregnant women. For both methods, separate PCR reactions were required for short and long amplicons, which reduced the accuracy of the analysis because of sampling variations between reactions. Perhaps this explains the unexpected finding by Fernando et al. that the percentage of cfDNA fragments >905 bp (i.e., 23%) was higher than the percentage of cfDNA fragments >490 bp (i.e., 14%) in maternal plasma. 15 In the present study, we performed multiplex ddPCR to simultaneously quantify short and long amplicons in a single PCR reaction, thereby eliminating sampling variations.
While the percentages of long cfDNA as determined by the VCP 533 bp assay and the VCP 1001 bp assay differed between pregnant women with and without preeclampsia, the power of differentiation was suboptimal for clinical use. To enhance the power of differentiation, we developed the LINE-1 ddPCR assay based on the results of the in silico dPCR simulation analyses using SMRT sequencing data.
The LINE-1 assay achieved a higher AUC of 0.94 for detecting preeclampsia compared to 0.79 for the VCP 533 bp assay and 0.69 for the VCP 1001 bp assay. The improvement can be attributed to two factors: (1) the use of 170 bp as a threshold for long cfDNA, and (2) the use of repetitive sequences rather than a single-copy gene. The threshold of 170 bp was also used in the previous study using the PacBio platform for the size analysis of cfDNA for detecting preeclampsia. 10 However, it should be noted that the percentages of long DNA fragments >170 bp determined by the in silico dPCR analysis using SMRT sequencing data (64% for the preeclamptic group and 75% for the control group) were higher than those determined by the actual experiment (28.9% and 35.1%, respectively). As a result, the cutoff value of the percentage of long fragments for differentiating between pregnancies with and without preeclampsia should still be derived experimentally.
We demonstrated that size distribution analysis of cfDNA using the LINE-1 ddPCR assay had a performance comparable to that previously reported using SMRT sequencing for detecting preeclampsia. Early treatment with low-dose aspirin has been shown to reduce the incidence of preterm preeclampsia in high-risk pregnancies. 16 Due to the small sample size of this study, further studies with larger sample sizes would be required to validate the findings. Furthermore, there may be some confounding factors (such as ethnicity, maternal age, other maternal health characteristics, and the stage and severity of preeclampsia) that could influence the percentage of long cfDNA in maternal circulation. The effects of these factors should be explored in future studies with large sample sizes.
In summary, we demonstrated that the size distribution analysis of maternal plasma cfDNA using the ddPCR assay targeting the LINE-1 region represents a potentially useful noninvasive approach for detecting preeclampsia. For the analysis, only a small amount of plasma cfDNA is required, namely 10 pg. In the future, this approach should be evaluated in larger cohorts with preeclamptic pregnancies of varying degrees of disease severity and at earlier stages of the disease.
Two ddPCR assays targeting the VCP gene were developed to profile nucleic acids longer than 1001 and 533 bp. For the VCP 1001 bp assay, which measured the percentage of DNA fragments longer than 1001 bp, two pairs of primers (VCP_0 Forward Primer/VCP_0 Reverse Primer, and VCP_1001 Forward Primer/VCP_1001 Reverse Primer) were used to amplify two regions (namely the 73-bp VCP 0 region and the 68-bp VCP 1001 region) that were separated by 1001 bp, and two probes (VCP_0 Probe and VCP_1001 Probe) were used to detect their respective amplicons. Similarly, for the VCP 533 bp assay, which measured the percentage of DNA fragments longer than 533 bp, two pairs of primers (VCP_0 Forward Primer/VCP_0 Reverse Primer, and VCP_533 Forward Primer/VCP_533 Reverse Primer) were used to amplify two regions (namely the 73-bp VCP 0 region and the 72-bp VCP 533 region) that were separated by 533 bp, and two probes (VCP_0 Probe and VCP_533 Probe) were used to detect their respective amplicons. The sequences of primers and probes are listed in Table 1. Droplet digital PCR reactions were performed using the QX ONE Droplet Digital PCR System (Bio-Rad). Details about the PCR protocol are provided in the online Supporting Information S1. Both assays used droplets containing positive signals for the 73-bp VCP 0 amplicon to determine the total number of DNA molecules in a sample. Droplets containing positive signals for both the VCP 0 and the VCP 1001 or VCP 533 amplicons were identified as containing long DNA molecules that were longer than 1001 or 533 bp. The percentage of long DNA molecules of >1001 bp or >533 bp was calculated by dividing the number of positive droplets containing long DNA molecules by the total number of droplets containing the VCP 0 amplicon. The calculation took into account the coincidental colocalization of two short amplicons that resulted in a false-positive result for a long molecule (see details in the online Supporting Information S1, Materials and Methods).
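The exact correction for coincidental colocalization is given only in the Supporting Information, so the following is a sketch of one standard way to make such a correction, not necessarily the authors' calculation. It assumes that molecules partition into droplets independently following Poisson statistics, and that three kinds of molecule exist: those carrying only the reference amplicon (VCP 0), those carrying only the distal amplicon (VCP 533 or VCP 1001), and long molecules carrying both. The function and argument names are hypothetical.

```python
import math


def long_fraction_percent(n_double_negative, n_reference_only,
                          n_distal_only, n_double_positive):
    """Percentage of reference-bearing molecules that also carry the
    distal amplicon, corrected for chance co-occupancy of droplets
    (assumes independent Poisson partitioning)."""
    total = (n_double_negative + n_reference_only
             + n_distal_only + n_double_positive)
    if n_double_negative <= 0:
        raise ValueError("Poisson correction needs double-negative droplets")

    # Mean copies per droplet, from the fraction of droplets lacking a signal.
    lam_reference = -math.log((n_double_negative + n_distal_only) / total)
    lam_distal = -math.log((n_double_negative + n_reference_only) / total)
    lam_any = -math.log(n_double_negative / total)
    if lam_reference <= 0:
        raise ValueError("no reference-positive droplets")

    # Linked (long) molecules are counted in both single-target rates,
    # but only once in the rate of "any molecule present".
    lam_linked = lam_reference + lam_distal - lam_any
    return 100.0 * lam_linked / lam_reference
```

The uncorrected ratio of double-positive droplets to reference-positive droplets overestimates the long fraction whenever short molecules of both kinds share a droplet by chance; the subtraction in `lam_linked` removes that contribution under the stated model.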

Figure 1B illustrates the design of a digital PCR assay for analyzing the size distribution of cfDNA. The assay consisted of one forward primer, F, and two reverse primers, R1 and R2. The primer pair F/R1 would produce a short amplicon, whereas F/R2 would produce a long amplicon for a targeted genomic locus. The size of the short amplicon was fixed at 70 bp, which corresponded to the shortest possible size of an amplicon [that is, the lengths of two primers (25 bp × 2) and a probe (20 bp)]. HiFi reads from SMRT sequencing that spanned the annealing sites for the F and R2 primers would be identified by the in silico dPCR as long DNA fragments, whereas HiFi reads that spanned the annealing sites of the F and R1 primers, but not the R2 primer, would be identified by the in silico dPCR as short DNA fragments. The percentage of long cfDNA fragments (denoted by L%) in a plasma sample was then calculated by dividing the number of dPCR long fragments by the sum of dPCR long and short fragments in the in silico dPCR analysis.
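The read classification described above can be sketched as follows. This is a minimal illustration, assuming each read is represented by half-open alignment coordinates `(start, end)` on the strand of the forward primer and that an amplicon is produced only when the read covers the whole amplicon from the start of F to the end of the reverse primer; the function name and the coordinate convention are hypothetical, not taken from the study's code.

```python
def in_silico_long_percent(reads, forward_start, short_size=70, long_size=170):
    """L% = long / (long + short) for one target locus.

    reads: iterable of (start, end) half-open alignment coordinates.
    forward_start: coordinate where the forward primer F begins.
    """
    short_end = forward_start + short_size  # end of the F/R1 amplicon
    long_end = forward_start + long_size    # end of the F/R2 amplicon
    n_long = 0
    n_short = 0
    for start, end in reads:
        if start > forward_start or end < short_end:
            continue  # F or R1 site missing: no amplicon, read not counted
        if end >= long_end:
            n_long += 1   # spans F and R2: long fragment
        else:
            n_short += 1  # spans F and R1 but not R2: short fragment
    if n_long + n_short == 0:
        raise ValueError("no read spans the short amplicon")
    return 100.0 * n_long / (n_long + n_short)


example_reads = [(0, 300), (90, 200), (100, 170),
                 (100, 270), (120, 400), (50, 260)]
example_l_percent = in_silico_long_percent(example_reads, forward_start=100)
```

In this toy input two reads span the long amplicon, three span only the short amplicon, and one misses the forward primer site and is ignored, giving an L% of 40.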

FIGURE 1
An illustration of digital PCR design guided by in silico digital PCR analysis using long DNA sequencing data. (A) The analysis was conducted using single molecule real-time (SMRT) sequencing data of plasma cfDNA from 10 preeclamptic and 10 normal pregnancies. (B) A droplet digital PCR assay for determining the size distribution of cfDNA used three primers (denoted by blue arrows): F, R1, and R2. The in silico dPCR would identify HiFi reads from SMRT sequencing spanning the annealing sites of the F and R2 primers as long DNA fragments (denoted by thick orange lines), and HiFi reads spanning the F and R1 primers but not the R2 primer as short DNA fragments (denoted by thick green lines). The percentage of long cfDNA fragments (L%) in plasma was then calculated. (C) We evaluated two parameters for optimizing the ddPCR assay: the size of the long amplicon and the minimum number of fragments required. (D) L% was calculated for each sample in the preeclamptic and control groups. The performance of the classifier using different parameter values was evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. [Colour figure can be viewed at wileyonlinelibrary.com]
GAI ET AL.

Figure 2 illustrates the design of a ddPCR assay based on the repetitive elements, LINE-1, for the size analysis of cfDNA that was used to differentiate preeclamptic patients from healthy pregnant subjects. This assay targeted approximately 1600 regions that were repeated across the human genome. The LINE-1 assay included three primers (LINE-1 Forward Primer, LINE-1 Reverse Primer 1, and LINE-1 Reverse Primer 2) and two probes (LINE-1_70 bp Probe and LINE-1_170 bp Probe). The LINE-1 Forward Primer/LINE-1 Reverse Primer 1 would produce a PCR product of 70 bp, and the LINE-1 Forward Primer/LINE-1 Reverse Primer 2 would produce a PCR product of 170 bp.

| Statistical analysis
Statistical analyses were conducted with R version 3.5.0 (https://www.R-project.org/). A p-value of <0.05 was considered statistically significant. The difference in the percentages of long cfDNA fragments between the preeclamptic and control groups was evaluated using the Mann-Whitney U test.

FIGURE 2
The design of the LINE-1 droplet digital PCR assay. The assay targeted approximately 1600 LINE-1 repeated sequences in the human genome. It used a common forward primer and two different reverse primers to produce a short amplicon of 70 bp and a long amplicon of 170 bp within the LINE-1 repetitive elements. In a ddPCR reaction, droplets containing a long cfDNA fragment spanning the Forward Primer and the Reverse Primer 2 will show dual positive signals, while droplets containing a short cfDNA fragment spanning only the Forward Primer and the Reverse Primer 1 will show only a single positive signal.

TABLE 2
The sequences for PCR primers and probes for the LINE-1 assay.
LINE-1 forward primer: 5'-CTCTGAGCTACGGGAGGACATT-3'
LINE-1 reverse primer 1: 5'-TTCTTCTAAATTTTTTTCAAAGTTTTCAAC-3'
LINE-1 reverse primer 2: 5'-TCCTGAGGCTTCTGCATTCTTC-3'
LINE-1_70 bp probe: 5'-Cy5-CTTTGCCTTTGGTTTG-MGB-3'
LINE-1_170 bp probe: 5'-FAM-AGCCTTGGTTTTCAG-MGB-3'

| RESULTS
The LINE-1 assay had the highest AUC (AUC = 0.94) when compared with the VCP 533 bp assay (AUC = 0.79) and the VCP 1001 bp assay (AUC = 0.69).

FIGURE 4
In silico digital PCR simulations were conducted using a range of (A) different sizes of the long amplicon and (B) different numbers of fragments. The performance of the cfDNA size-based classifier for differentiating preeclamptic patients and healthy pregnant subjects using different parameter values was evaluated on the basis of the area under the curve (AUC) of the receiver operating characteristic (ROC) curve.

FIGURE 5
(A) Box plots showing the percentages of long cfDNA fragments >170 bp determined by the LINE-1 droplet digital PCR assay in ten preeclamptic and sixteen control pregnancies. (B) Receiver operating characteristic (ROC) curves showing the performance of the size analysis of cfDNA using the LINE-1, the valosin-containing protein (VCP) 533 bp, and the VCP 1001 bp droplet digital PCR assays for differentiating preeclamptic patients and healthy pregnant controls.

Further studies are necessary to unravel the biological mechanisms contributing to the changes in the size distributions of cfDNA in preeclampsia. According to the results from in silico dPCR analysis, a minimum of 5000 fragments in the target region would be required for the plasma DNA size-based classifier for detecting preeclampsia. The LINE-1 assay targeted approximately 1600 repeated regions across the genome, resulting in a 1600-fold enrichment of target fragments compared to single-copy gene assays (e.g., the VCP 533 bp and the VCP 1001 bp assays). Thus, only 10 pg of cfDNA would be required (as one haploid genome equivalent corresponds to 3.3 pg of DNA) to obtain 5000 target fragments for the LINE-1 ddPCR assay. In other words, the use of repetitive sequences for the ddPCR-based size analysis of cfDNA allowed a more efficient use of cfDNA molecules in the plasma sample. Our study demonstrated that in silico dPCR simulation is an effective tool for guiding the design of ddPCR assays, thus reducing research costs and time. The experimental results of the LINE-1 ddPCR assay demonstrated comparable differentiating power to the in silico dPCR analysis (area under the ROC curve: 0.94 vs. 0.95).

For the LINE-1 ddPCR assay and the previously reported SMRT sequencing analysis, the areas under the ROC curve were 0.94 and 1, respectively. The ddPCR-based approach has several advantages over the single-molecule sequencing-based approach. First, the cost is much lower for ddPCR-based analysis, with one test costing approximately $60 (in USD) (including DNA extraction at $28 and ddPCR in duplicate reactions at $32). In comparison, a PacBio SMRT sequencing analysis to detect long cfDNA in preeclampsia would cost approximately $1928 (including DNA extraction at $28 and PacBio sequencing using one flow cell at around $1900). Second, the turnaround time is much shorter, as it takes only around 6 hours to complete the test. Finally, ddPCR requires only a small amount of plasma cfDNA (10 pg). Hence, the use of ddPCR made the detection of preeclampsia by the size analysis of cfDNA more clinically applicable. Further investigation is required to explore whether the reduction of long cfDNA in maternal plasma precedes the onset of clinical manifestations of preeclampsia. If so, the size analysis of cfDNA might enable early detection of at-risk pregnancies. A sensitive predictive biomarker for preeclampsia would be clinically valuable.