Whole‐Genome Promoter Profiling of Plasma DNA Exhibits Diagnostic Value for Placenta‐Origin Pregnancy Complications

Abstract Placenta‐origin pregnancy complications, including preeclampsia (PE), gestational diabetes mellitus (GDM), fetal growth restriction (FGR), and macrosomia (MA), are common in pregnancy and cause significant morbidity and mortality for both mother and fetus. Despite their frequency, there are no reliable methods for the early diagnosis of these complications. Since cell‐free DNA (cfDNA) is mainly derived from placental trophoblasts and maternal hematopoietic cells, it may carry gene expression information that can be used for disease prediction. Here, low‐coverage whole‐genome sequencing is performed on plasma DNA from 2,199 pregnancies, based on retrospective cohorts of 3,200 pregnant women. Read depth in the promoter regions is examined to define the read‐depth distribution patterns of promoters for pregnancy complications and controls. Using machine learning methods, classifiers for predicting pregnancy complications are developed. With these classifiers, complications are predicted with an accuracy of 80.3%, 78.9%, 72.1%, and 83.0% for MA, FGR, GDM, and PE, respectively. The findings suggest that promoter profiling of cfDNA may be used as a biomarker for predicting pregnancy complications at early gestational age.

Based on the number of gestational age-matched healthy controls available in our cohort, we selected four gestational age-matched healthy controls for each MA, FGR, and PE case. Because the number of available gestational age-matched healthy controls was limited relative to the 267 GDM cases, we selected three controls for each GDM case. The gestational ages of the four pregnancy complication groups and their corresponding controls were well matched in all four cohorts (Supplemental Table S1; p > 0.05, Mann-Whitney U test). According to the time of sample collection and the sample size, the samples collected from Nanfang Hospital were divided into two cohorts: a training cohort (70% of samples) and an internal cohort (30% of samples). The samples from The Third Affiliated Hospital of Sun Yat-sen University (SYSU) and Cangzhou People's Hospital were used as external cohort-1 and external cohort-2, respectively. Because some control samples were used for more than one pregnancy complication, the total number of controls is less than the sum of the control numbers for the individual complications.
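The control selection and cohort split described above can be illustrated with a minimal sketch. It is not the procedure used in the study: the greedy nearest-neighbour matching, the tolerance `max_diff_weeks`, the chronological split rule, and all sample identifiers are assumptions introduced only for illustration.

```python
from typing import Dict, List, Tuple


def match_controls(
    cases: Dict[str, float],
    controls: Dict[str, float],
    k: int,
    max_diff_weeks: float = 1.0,  # hypothetical tolerance, not from the study
) -> Dict[str, List[str]]:
    """Greedily assign up to k unused controls to each case by gestational age.

    `cases` and `controls` map sample IDs to gestational age in weeks.
    Each control is used at most once within one complication group.
    """
    available = dict(controls)
    matched: Dict[str, List[str]] = {}
    for case_id, case_ga in sorted(cases.items()):
        # Rank remaining controls by absolute gestational-age difference.
        ranked = sorted(available, key=lambda c: (abs(available[c] - case_ga), c))
        chosen = [c for c in ranked if abs(available[c] - case_ga) <= max_diff_weeks][:k]
        for c in chosen:
            del available[c]
        matched[case_id] = chosen
    return matched


def split_by_time(
    samples: List[Tuple[str, str]], train_fraction: float = 0.7
) -> Tuple[List[str], List[str]]:
    """Split (sample_id, ISO collection date) pairs chronologically.

    Assumption: earlier samples go to the training cohort, later ones to
    the internal cohort.
    """
    ordered = sorted(samples, key=lambda s: (s[1], s[0]))
    cut = round(len(ordered) * train_fraction)
    return [s[0] for s in ordered[:cut]], [s[0] for s in ordered[cut:]]
```

For example, `match_controls(cases, controls, k=4)` would correspond to the 1:4 design used for MA, FGR, and PE, and `k=3` to the 1:3 design used for GDM.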

Isolation of cfDNA and whole-genome sequencing
Maternal blood was collected using a cfDNA BCT tube (Streck, USA). Cell-free plasma DNA (cfDNA) was extracted from a plasma sample using the QIAamp DNA Blood Mini kit (Qiagen, Germany) according to the manufacturer's instructions. DNA concentration and integrity were measured using Qubit (ThermoFisher Scientific, USA) and Agilent Bioanalyzer 2100 (Agilent Technologies, USA). DNA was eluted in 50 µL AE buffer and stored at −20°C. For the Life platform, libraries were prepared using the Ion Torrent AmpliSeq 2.0 kit (ThermoFisher Scientific) according to the manufacturer's instructions. Samples were barcoded using the Ion Xpress Barcode Adapter and quantified by qPCR using the Ion Library TaqMan quantitation kit (ThermoFisher Scientific). The libraries were then sequenced on the Ion Proton System using a P1 chip. For the Illumina platform, the DNA libraries were prepared using the TruSeq DNA Sample Prep reagents (Illumina, USA). After quantification on the LabChip GX microfluidic platform (Perkin-Elmer), the libraries were sequenced using NextSeq. DNA sequencing was performed to an average coverage of 0.3×.
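As a rough guide to what an average coverage of 0.3× means in read counts, coverage can be estimated as the total number of sequenced bases divided by the genome size. The sketch below assumes a human genome size of about 3.1 × 10^9 bp; the read count and read length in the usage note are hypothetical and are not reported in the text.

```python
def mean_coverage(n_reads: int, mean_read_length: float,
                  genome_size: float = 3.1e9) -> float:
    """Estimate average genome coverage as sequenced bases / genome size.

    genome_size defaults to an assumed ~3.1 Gb human genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * mean_read_length / genome_size
```

Under these assumptions, `mean_coverage(12_400_000, 75)` gives approximately 0.3, that is, roughly 12 million 75-bp reads per sample.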

ID | Description | File
Supplemental Figure 1 | Promoter read depth patterns of highly and lowly expressed genes for pregnancy complications | In this file
Supplemental Figure 2 | Functional enrichment analysis of genes with differential promoter coverages | In this file
Supplemental Table S1 | Clinical characteristics of study pregnancies in four cohorts | In this file
Supplemental Table S2 | Clinical characteristics of healthy pregnancies | In this file
Supplemental Table S3 | The 500 highest- and lowest-expressed genes in tissues | In a separate Excel file
Supplemental Table S4 | Tissue-specific genes of placenta and whole blood cells | In a separate Excel file
Supplemental Table S5 | The unexpressed genes in all tissues | In a separate Excel file
Supplemental Table S6 | Gene transcripts with differential read coverages at the pTSS | In a separate Excel file
Supplemental Table S7 | Functional annotation of genes with differential promoter coverage, retrieved from the literature | In this file
Supplemental Table S8 | The performance of classifiers for predicting pregnancy complications | In this file
Supplemental Table S9 | The performance of the optimal gene combination in the training cohort for macrosomia | In this file
Supplemental Table S10 | The performance of the optimal gene combination in the training cohort for FGR | In this file
Supplemental Table S11 | The performance of the optimal gene combination in the training cohort for GDM | In this file
Supplemental Table S12 | The performance of the optimal gene combination in the training cohort for PE | In this file
Supplemental Table S13 | The logistic regression equations of classifiers | In this file
Supplemental Table S14 | The performance of the optimal classifiers based on promoter profiling of tissue-specific genes | In this file
Supplemental Table S15 | The AUC comparison between different classifier sets | In this file
Supplemental Table S16 | The performance of clinical features for predicting pregnancy complications | In this file
Supplemental Table S17 | Functional annotation of genes in classifiers, retrieved from the literature | In this file
Supplemental Table S18 | Performance evaluation of the classifiers | In this file

Supplemental Figures
Supplemental Figure 1. Promoter read depth patterns of highly and lowly expressed genes for pregnancy complications. Mean expression levels of the 500 most-expressed (Top500, red) and least-expressed (Bottom500, blue) genes in the placenta and their promoter read coverages of the cfDNA derived from pregnancies with PE (

Supplemental Table S1. Clinical characteristics of study pregnancies in four cohorts

In these equations, the gene coverage around the TSS, ranging from -1 kb to +1 kb, was substituted with the discretized value "one" when the level of the gene was larger than the corresponding cut-off (see Supplemental Tables S8-S11); otherwise, it was substituted with the discretized value "zero". If the result of Logit p was larger than the corresponding threshold, the subject was predicted to have the pregnancy complication (MA, FGR, GDM, or PE); otherwise, the subject was predicted as non-obstetrical syndrome. MA = macrosomia. FGR = fetal growth restriction. GDM = gestational diabetes mellitus. PE = preeclampsia.
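The decision rule described for the logistic regression equations (discretize each gene's promoter coverage against its cut-off, evaluate Logit p, then compare it with a threshold) can be sketched as follows. This is only an illustration of the rule as stated: the gene names, cut-offs, coefficients, intercept, and threshold in the usage note are hypothetical placeholders, not values from the supplemental tables.

```python
from typing import Dict


def discretize(coverage: Dict[str, float], cutoffs: Dict[str, float]) -> Dict[str, int]:
    """Map each gene's promoter coverage (-1 kb to +1 kb around the TSS)
    to 1 if it is larger than the gene's cut-off, otherwise 0."""
    return {gene: 1 if coverage[gene] > cutoffs[gene] else 0 for gene in cutoffs}


def logit_p(features: Dict[str, int], coefficients: Dict[str, float],
            intercept: float) -> float:
    """Evaluate the linear predictor: intercept + sum(coefficient * feature)."""
    return intercept + sum(coefficients[g] * features[g] for g in coefficients)


def predict_complication(coverage: Dict[str, float], cutoffs: Dict[str, float],
                         coefficients: Dict[str, float], intercept: float,
                         threshold: float) -> bool:
    """Return True when Logit p is larger than the classifier's threshold."""
    features = discretize(coverage, cutoffs)
    return logit_p(features, coefficients, intercept) > threshold
```

With placeholder inputs such as `coverage = {"GENE_A": 1.2, "GENE_B": 0.4}`, `cutoffs = {"GENE_A": 1.0, "GENE_B": 0.5}`, `coefficients = {"GENE_A": 2.0, "GENE_B": -1.0}`, `intercept = -0.5`, and `threshold = 0.0`, only GENE_A exceeds its cut-off, so Logit p = -0.5 + 2.0 = 1.5 and the subject would be predicted to have the complication.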
Supplemental Table S14. The performance of the optimal classifiers based on promoter profiling of tissue-specific genes
Whole blood-specific genes | Placenta-specific genes | Whole blood- and placenta-specific