Detection of bladder cancer using urinary cell-free DNA and cellular DNA

Background: The present study sought to identify a panel of DNA markers for the noninvasive diagnosis of bladder cancer using cell-free DNA (cfDNA) from urine supernatant or cellular DNA from urine sediments of hematuria patients. A panel of 48 bladder cancer-specific genes was selected, and a next-generation sequencing-based assay, the cfDNA barcode-enabled single-molecule test, was employed. Mutation profiles of blood, urine, and tumor samples from 16 bladder cancer patients were compared. Next, urinary cellular DNA and cfDNA were prospectively collected from 125 patients (92 bladder cancer cases and 33 controls) and analyzed using the 48-gene panel. Individual gene markers and marker combinations were validated against the pathology results, and the mean areas under the receiver operating characteristic (ROC) curves (AUCs) obtained with the various modeling approaches were calculated and compared.

Results: A pilot study of 16 bladder cancer patients demonstrated that gene mutations in urine supernatant and sediments had better concordance with cancer tissue than those in plasma. Logistic analyses suggested two powerful gene combinations for genetic diagnostic modeling: five genes for urine supernatant (TERT, FGFR3, TP53, PIK3CA, and KRAS) and seven genes for urine sediments (TERT, FGFR3, TP53, HRAS, PIK3CA, KRAS, and ERBB2). In the validation cohort, the five-gene and seven-gene panels yielded AUCs of 0.94 [95% confidence interval (CI) 0.91–0.97] and 0.91 (95% CI 0.86–0.96), respectively. Adding age and gender improved the diagnostic power of both models, with revised AUCs of 0.9656 (95% CI 0.9368–0.9944) for the urine supernatant five-gene model and 0.9587 (95% CI 0.9291–0.9883) for the urine sediment seven-gene model.

Conclusions: Urinary cfDNA has great diagnostic potential. A five-gene panel for urine supernatant and a seven-gene panel for urine sediments are promising options for identifying bladder cancer in hematuria patients.

sensitivity and/or specificity at this time. Hence, the development of a reliable noninvasive bladder cancer testing method as an alternative to cystoscopy remains of great value [6].
A number of analytes can be measured in urine, including cell-free DNA (cfDNA), cellular DNA, different RNA classes (e.g., microRNAs, long noncoding RNAs, and messenger RNAs), proteins, and exosomes [6][7][8][9][10][11][12][13]. Previous studies have demonstrated that DNA biomarkers detected in plasma or urine could be used to predict the risk of bladder cancer in patients with hematuria [14][15][16][17]. However, predictive accuracy is affected by genomic complexity, tumor grade, and insufficient genomic DNA. cfDNA is present extensively as degraded nucleic acid fragments in various body fluids [18][19][20], and the genetic alterations in urinary cfDNA reflect those found within tumor cells. Malignant and benign hematuria have been reported to be associated with different gene mutations [21][22][23]. Moreover, some studies have found that urinary cfDNA carries a higher tumor genome burden than cellular DNA, which may influence diagnostic efficiency. Therefore, comparing the diagnostic accuracy of candidate biomarkers in urinary cfDNA and cellular DNA in bladder cancer would be very useful [5,24,25].
In this study, a set of 48 bladder cancer-related genes was assessed using next-generation sequencing (NGS) to evaluate their potential to differentiate malignancy from benign hematuria. Mutations in the 48 genes of interest were analyzed in four types of samples: urine supernatant, urinary sediment, plasma, and tumor tissue. The aim of this study was to compare the diagnostic accuracy of urinary cellular DNA and cfDNA for bladder cancer diagnosis in hematuria patients.

Patients and clinical sample collection
Institutional review board approval was obtained prior to study initiation (NCT03066310), and all enrolled patients signed informed consent forms. Ninety-two patients with bladder cancer and 33 controls were enrolled. Urine, plasma, and tumor tissue samples were collected from 16 patients (14 men and two women; mean age 67 years, range 51–84 years) with confirmed bladder cancer at Xiangya Hospital, The Second Xiangya Hospital, Hunan Provincial Tumor Hospital, and Hunan Provincial People's Hospital between January 2017 and November 2017. Only urine samples were collected from the remaining 76 clinically diagnosed bladder cancer patients. Patient and bladder tumor characteristics are summarized in Table 1. The 33 controls were patients with nontumor bladder disease and hematuria (20 men and 13 women; mean age 54 years, range 21–81 years), from whom urine was the only specimen collected.

Methods
This study consisted of three phases: a pilot study, the main study, and the finalization of the diagnostic model, outlined in grey, blue, and red, respectively, in Fig. 1. In the pilot study, a panel of selected genes was used to compare candidate biological samples and select the optimal ones for further examination; four types of specimens (plasma, urine supernatant, urine sediment, and cancer tissue) from 16 cases with hematuria were tested. The main study (outlined in blue) was designed to assess the concordance of mutations between the biological specimens and the cancer tissues, to determine the number of genes required in the panel optimized for the diagnostic model, and to compare the diagnostic performance of urine supernatant and urine sediments. The third and final phase was designed to finalize the diagnostic model.

DNA extraction
Urine samples were collected before surgery or cystoscopy and stored at 4 °C. Depending on the volume collected, 10–50 mL of urine was centrifuged at 1600 × g for 10 min at 4 °C, and the resulting supernatant and sediment were aliquoted into new tubes. The sediment was stored at −80 °C until assay. A 2-mL aliquot of urinary supernatant underwent an additional high-speed centrifugation at 12,000 × g for 10 min to remove any remaining contaminating cells and was stored at −80 °C until assay. Plasma samples were stored and processed within 72 h of collection. cfDNA from both urine and plasma was extracted using the GenMag Circulating Nucleic Acid Kit according to the manufacturer's protocol. Tumor tissue samples were stored in 1-mL EP tubes at −80 °C for DNA isolation. Genomic DNA from urine sediment and tumor tissue was extracted using the DNeasy Blood & Tissue Kit (250; Qiagen, Hilden, Germany) according to the manufacturer's protocol. The cfDNA and genomic DNA were quantified using a Qubit 3.0 and stored at −20 °C. Genomic DNA extracted from urine sediment and tumor tissue was digested with NEBNext double-stranded DNA fragmentase (M0348) into 100- to 500-bp fragments, followed by 2× XP bead cleanup, and the purified DNA was quantified using a Qubit 3.0.

Selection of bladder cancer marker genes and primer design
The Cancer Genome Atlas (TCGA) (IntOGen, Goleta, CA, USA), COSMIC, My Cancer Genome, CIViC, and PubMed databases were screened to select bladder cancer-related genes. The 150 most frequent bladder cancer-related mutations from TCGA and COSMIC, as well as bladder cancer-related genes reported in My Cancer Genome, CIViC, and PubMed, were included in our study. In total, 48 genes were selected, and gene-specific primers (Additional file 1: Table S1) were designed for NGS.

DNA mutation screening by cfBEST
To detect low-abundance mutations in cfDNA, we developed a robust and versatile NGS-based cfDNA allelic molecule-counting system, the cfDNA barcode-enabled single-molecule test (cfBEST), whose accuracy was comparable to that of ddPCR in a previous study [26]. The procedure comprised three stages: prelibrary construction, sequencing library (seq-library) construction, and sequencing. During prelibrary construction, 10 ng of cfDNA (or fragmented genomic DNA) was incubated with End Repair & A-Tailing Enzyme Mix and Buffer at 37 °C for 20 min and then at 72 °C for 20 min. Adapters (Illumina, San Diego, CA, USA) harboring a barcode and flanking 30- to 40-bp sequences for subsequent priming were ligated to the A-tailed cfDNA (or fragmented genomic DNA) with DNA ligase at 20 °C for 15 min (adapter ratio 100:1), followed by 0.4× XP bead cleanup. Prelibraries were amplified for 10 cycles in a thermocycler with index primers and 2× KAPA HiFi Hot Start Ready Mix, followed by 1× XP bead cleanup.
For seq-library construction, three consecutive amplifications with sequence-overlapping nested primers were performed: PCR-1 enriched the target fragments using target primers-1 and P7, followed by 2× XP bead cleanup and capture with streptavidin-coated M-270 Dynabeads; PCR-2 repeated the enrichment of target fragments, followed by 2× XP bead cleanup; and PCR-3 used two universal primers containing the P5 and P7 sequences, respectively, followed by 2× XP bead cleanup.
For sequencing, the seq-libraries were quantified with the ABI StepOne™ real-time PCR system (Thermo Fisher Scientific, Waltham, MA, USA) and then sequenced on an Illumina NextSeq 500 (Illumina, San Diego, CA, USA). Paired-end reads of 2 × 75 bp were used to calculate mutation and allele ratios. cfBEST was first calibrated with a commercial cfDNA standard (Multiplex I cfDNA Reference Standard Set, Cat. No. HD780; Horizon, Cambridge, UK) to evaluate sensitivity and specificity.
Next-generation sequencing data were first used to trace the unique template molecules to be analyzed. The procedure comprised the following steps: (1) grouping all reads with identical sequences and recording their sequencing depth; (2) identifying correct barcodes and removing reads with wrong barcodes; (3) identifying primers and deleting reads without primer sequences; (4) aligning reads to the human reference genome and deleting reads with any of the following features: one-sided matching, two-sided matching with a mapping quality below 20, mapping outside the 200 nucleotides of the target region, or wrong nucleotides in primer regions; and (5) calling a unique read from each set of reads sharing the same barcode when the set contained four or more reads and its largest subset of identical reads was at least three times as large as the second-largest subset. The unique molecules were then trimmed of the two terminal nucleotides, which are sequenced with low quality, and of the barcode-introduced nucleotides. The remaining unique sequences were aligned to the human reference genome (GRCh37) with BWA (version 0.7.11-r1034) to identify genetic variants. The BAM files were sorted and indexed with SAMtools (1.2-66-g44e1a74) and then locally realigned with the Genome Analysis Toolkit (GATK, version 3.1-1-g07a4bf8). SNPs and indels were called with samtools mpileup and annotated with ANNOVAR. cfBEST was used for data analysis and variant calling. Reads sharing the same start and end positions and the same barcode were treated as one unique read, and unique reads supported by fewer than four reads were filtered out. The average number of unique reads per sample is shown in Additional file 1: Table S2 and Additional file 2: Figure S1 (boxplot).
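Step (5) and the unique-read definition above amount to a barcode-family consensus rule. The sketch below illustrates that rule under stated assumptions: reads arrive as hypothetical (barcode, start, end, sequence) tuples, a family is every read sharing barcode and start/end positions, and "at least three times" is read as a ≥3-fold ratio. It is an illustration, not the cfBEST implementation.

```python
from collections import Counter, defaultdict


def collapse_families(reads, min_family_size=4, min_ratio=3.0):
    """Collapse raw reads into unique molecules (one consensus sequence per family).

    reads: iterable of (barcode, start, end, sequence) tuples (hypothetical format).
    A family is every read sharing the same barcode and start/end positions.
    """
    families = defaultdict(list)
    for barcode, start, end, seq in reads:
        families[(barcode, start, end)].append(seq)

    unique_molecules = {}
    for key, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # fewer than four supporting reads: filtered out
        ranked = Counter(seqs).most_common(2)
        top_seq, top_count = ranked[0]
        second_count = ranked[1][1] if len(ranked) > 1 else 0
        # Keep the majority sequence only if it outnumbers the runner-up >= 3-fold.
        if top_count >= min_ratio * second_count:
            unique_molecules[key] = top_seq
    return unique_molecules
```

A family of five reads split 3:2 between two sequences is discarded as ambiguous, while a 5:1 family yields its majority sequence as one unique molecule.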
To increase the predictive accuracy of the mutation data, a mutation was reported only if it met two criteria: a mutation frequency of 0.005 (0.5%) or more and support from two or more unique reads [26].
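The two reporting criteria can be expressed as a simple filter. This is a sketch: the `VariantCall` record is hypothetical, and "mutation frequency" is assumed to mean the number of mutant unique reads divided by all unique reads covering the position.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    gene: str
    change: str
    mutant_molecules: int  # unique (consensus) reads carrying the variant
    total_molecules: int   # unique reads covering the variant position

    @property
    def allele_frequency(self) -> float:
        # Assumed definition of "mutation frequency": mutant / total unique reads.
        if self.total_molecules == 0:
            return 0.0
        return self.mutant_molecules / self.total_molecules


def is_reportable(call: VariantCall, min_freq: float = 0.005, min_unique: int = 2) -> bool:
    """Report a mutation only if frequency >= 0.005 and >= 2 unique reads support it."""
    return call.allele_frequency >= min_freq and call.mutant_molecules >= min_unique
```

For instance, 3 mutant molecules out of 400 (frequency 0.0075) pass, whereas a single mutant molecule fails regardless of its frequency.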

Testing and fitting the diagnostic model
Three steps were performed for the 125 pairs of urine supernatant and sediment samples examined, which were assigned to training and validation groups at a ratio of approximately 7:3. First, the target genes with mutations identified in the samples were ranked by their class-predictive importance using the random forest algorithm (R version 3.2.3; R Foundation for Statistical Computing, Vienna, Austria). Second, according to the number of target genes, a logistic model was fitted using the generalized linear model (glm) function (R version 3.2.3):

$$x = a_0 + a_1 \cdot \mathrm{gene}_1 + \cdots + a_n \cdot \mathrm{gene}_n$$

and the model value x was calculated. Third, x was substituted into the sigmoid function

$$f(x) = \frac{1}{1 + e^{-x}}$$

to obtain a fitted value f(x) for diagnostic purposes. Malignancy is suggested when f(x) > threshold, and a benign state is indicated when f(x) ≤ threshold.
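The model value and sigmoid step can be illustrated in a few lines of Python (the study itself used R). The coefficients, the 0/1 encoding of each gene's mutation status, and the default threshold of 0.5 are assumptions for illustration, not the study's fitted values.

```python
import math


def model_value(intercept, coefficients, genes):
    """x = a0 + a1*gene1 + ... + an*genen (genes encoded 1 = mutated, 0 = wild type)."""
    if len(coefficients) != len(genes):
        raise ValueError("one coefficient per gene is required")
    return intercept + sum(a * g for a, g in zip(coefficients, genes))


def sigmoid(x):
    """f(x) = 1 / (1 + e^-x), written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def classify(intercept, coefficients, genes, threshold=0.5):
    """Return ('malignant' | 'benign', fitted value f(x))."""
    fit = sigmoid(model_value(intercept, coefficients, genes))
    return ("malignant" if fit > threshold else "benign"), fit
```

With hypothetical coefficients for the five supernatant genes in the order TERT, FGFR3, TP53, PIK3CA, KRAS, `classify(-2.0, [2.0, 1.5, 1.0, 0.8, 0.6], [1, 1, 0, 0, 0])` gives x = 1.5, a fitted value of about 0.82, and therefore "malignant".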
Logistic models were built from the top n positive genes (n = 1, 2, …, 19) in the training group, and the models derived from the training group were applied to the validation group. Modeling for each value of n was repeated 100 times to ensure reproducibility. Once a diagnostic model had been obtained, it was applied to the real samples in the clinical study.
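A minimal sketch of the repeated training/validation procedure, assuming "repeated 100 times" means 100 random ~7:3 splits for each value of n. Several choices here are our own assumptions rather than the study's: the split is stratified by class, the logistic model is fitted by plain gradient descent with a small L2 penalty instead of R's glm, and validation performance is summarized as the rank-based (Mann-Whitney) AUC. Gene columns in X are assumed to be pre-ordered by random-forest importance.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function f(x) = 1 / (1 + e^-x).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def fit_logistic(X, y, lr=0.5, epochs=300, l2=0.01):
    """Fit a0..an by batch gradient descent on the log-loss (a stand-in for R's glm)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            gb += err
            for j in range(k):
                gw[j] += err * xi[j]
        b -= lr * gb / n
        w = [wj - lr * (g / n + l2 * wj) for wj, g in zip(w, gw)]
    return w, b


def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC: P(case score > control score), ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def mean_validation_auc(X, y, top_n, repeats=100, train_frac=0.7, seed=0):
    """Mean validation AUC over repeated stratified ~7:3 splits using the top_n genes."""
    rng = random.Random(seed)
    cases = [i for i, label in enumerate(y) if label == 1]
    controls = [i for i, label in enumerate(y) if label == 0]
    aucs = []
    for _ in range(repeats):
        rng.shuffle(cases)
        rng.shuffle(controls)
        kc, kn = round(train_frac * len(cases)), round(train_frac * len(controls))
        train = cases[:kc] + controls[:kn]
        valid = cases[kc:] + controls[kn:]
        w, b = fit_logistic([X[i][:top_n] for i in train], [y[i] for i in train])
        scores = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, X[i][:top_n]))) for i in valid]
        aucs.append(auc(scores, [y[i] for i in valid]))
    return sum(aucs) / len(aucs)
```

Calling `mean_validation_auc` for n = 1, 2, … and plotting the results gives the kind of curve in which a plateau suggests how many genes a panel needs.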

Statistics
A two-tailed t-test (R version 3.2.3) was employed for the analysis of age, number of mutations, and mutation frequencies, with a p-value of less than 0.05 considered to be statistically significant. The glm function (R version 3.2.3) was used for modeling, and the model had to show a power of more than 0.95.

Differences in the overlap of gene mutations observed in urine supernatant, urine sediments, and plasma when compared with bladder cancer tissues
The cumulative mutation rates observed in the four types of body fluid/tissue samples from the 16 patients with hematuria were compared. As shown in Fig. 2, the cumulative mutation rate of DNA isolated from plasma was the lowest, whereas those of urine supernatant and urine sediments were higher and closer to that of the cancer tissue. Similarly, mutations in the urine samples overlapped more with those in the cancer tissue than did mutations in plasma (Additional file 3: Figure S2): 24, 18, and 1 mutations identical to those in the cancer tissue were identified in DNA from urine supernatant, urine sediments, and plasma, respectively. In addition, average mutation depth did not differ significantly among the four sample types (Additional file 2: Figure S1). These data demonstrate that urine supernatant and sediments reflect the genetic changes in bladder cancer tissue better than plasma does and may therefore be better suited for diagnostic purposes.
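The overlap counts above are per-patient intersections with that patient's own tumor tissue, summed across patients. A sketch with a hypothetical data layout (one dict of mutation-identifier sets per patient):

```python
def shared_with_tissue(patients, specimens=("supernatant", "sediment", "plasma")):
    """Count, per specimen type, mutations also present in the same patient's tumor tissue.

    patients: list of dicts mapping specimen name -> set of mutation identifiers
    (e.g. "FGFR3 p.S249C"); this layout is hypothetical.
    """
    return {
        spec: sum(len(p.get(spec, set()) & p["tissue"]) for p in patients)
        for spec in specimens
    }
```

Matching within each patient, rather than pooling all mutations, avoids counting a tissue mutation from one patient as "recovered" by another patient's urine.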

Urine supernatant and urine sediments for genetic diagnostics of bladder cancer in an expanded cohort of hematuria samples
The pilot study compared different body fluids/tissues with regard to their diagnostic value in bladder cancer and demonstrated the superiority of urine supernatant and sediments. We then recruited an expanded cohort of 125 cases with hematuria (92 bladder cancer cases and 33 controls). In the pilot study of 16 cases, positive mutations (i.e., shared with the paired cancer tissue) were identified in 14 and 10 genes in urine supernatant and urine sediments, respectively; in the 125 cases, mutations were identified in 19 and 15 genes, respectively (Additional file 4: Figure S3).
The kappa values for agreement in gene-variant detection between supernatant and sediment samples are shown in Table 2. Only 9 of 92 cancer samples had a kappa value of 0, compared with 10 of 33 control samples, which indicated that the consistency of supernatant and sediment samples in the cancer group achieved better  Table 3, where the ACTB, CUL1, EGFR, and U2AF1 genes were only detected in a portion of urine supernatant samples. Among the 15 genes with mutations in both urine supernatant and sediments, the genes with relatively high mutation rates in cancer patients, such as TERT, FGFR3, TP53, PIK3CA, and KRAS, largely overlapped between the two sample types (Table 4). As illustrated in Fig. 3, a high proportion of mutations was shared between the urine supernatant and urine sediments of the cancer samples, whereas few mutations were identified in the controls. However, the PIK3CA p.H1047R and FGFR3 p.S249C mutations were found at high frequency in the urine sediments of the normal sample. As plotted in Fig. 4, mutations in the urine supernatant did not differ significantly from those in the paired urine sediments (p = 0.201, t-test).
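Table 2 reports a kappa value per sample. Assuming kappa is computed for each sample over the binary per-gene detection calls (1 = mutation detected) in supernatant versus sediment, Cohen's kappa can be sketched as below. Kappa is undefined when both vectors are constant and identical (e.g., no mutation in either specimen); returning NaN in that case is our own convention, and the study may have handled it differently.

```python
import math


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two equal-length 0/1 vectors of per-gene detection calls."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty vectors of equal length")
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    pa, pb = sum(calls_a) / n, sum(calls_b) / n
    # Agreement expected by chance from each specimen's marginal detection rate.
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        return math.nan  # both vectors constant and identical: kappa is undefined
    return (observed - expected) / (1 - expected)
```

For example, a sample with mutations detected in two of four genes in supernatant but only one of those (shared) in sediment gives kappa = 0.5.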

Diagnostic model based on urine supernatant and urine sediments
The genes with positive mutations (i.e., shared with the paired cancer tissue) were ranked by random forest analysis of the mutation data from urine supernatant and sediments. Logistic models were developed in the training group and then tested in the validation group; the detailed parameters used in these analyses are listed in Additional file 1: Table S3. These logistic analyses highlighted two powerful gene combinations for genetic diagnostic modeling: five genes for urine supernatant (TERT, FGFR3, TP53, PIK3CA, and KRAS) and seven genes for urine sediments (TERT, FGFR3, TP53, HRAS, PIK3CA, KRAS, and ERBB2). As shown in Fig. 5, all four diagnostic parameters, including the area under the receiver operating characteristic (ROC) curve (AUC), nearly reached a plateau with the five-gene combination for urine supernatant, whereas for urine sediments they reached a clear plateau only with the seven-gene combination.
After the two gene combinations had been identified, AUCs were calculated serially for individual genes and different gene combinations (Additional file 1: Table S4 and Fig. 6). Among the AUCs derived from the 125 urine samples, those of the five-gene panel from urine supernatant (AUC 0.94, 95% confidence interval [CI] 0.91–0.97; Fig. 6c) and the seven-gene panel from urine sediments (AUC 0.91, 95% CI 0.86–0.96; Fig. 6d) were higher than those of all other combinations.

Discussion
This study compares the diagnostic potential of urine supernatant, urine sediment, and plasma against tumor tissue samples obtained from the same subjects for identifying malignancy in patients presenting with hematuria. In total, 48 bladder cancer-related candidate genes were analyzed in these four types of specimens. Mutations in urine supernatant (cfDNA) and sediment (cellular DNA) were more abundant than those in plasma samples from the same patients. Bioinformatics analysis of the urinary DNA mutation data yielded diagnostic models consisting of five genes (TERT, FGFR3, TP53, PIK3CA, and KRAS) for urine supernatant and seven genes (TERT, FGFR3, TP53, HRAS, PIK3CA, KRAS, and ERBB2) for urine sediments, which successfully identified malignancy in patients with hematuria.
The number and frequency of mutations in different biological samples, such as plasma, urine supernatant, and urine sediment, may vary with the type of bladder cancer and the presence of metastasis. Although the number of cases was limited, the data showed that urine from patients with malignant bladder cancer had the highest total number of mutations, whereas normal urine samples had the lowest average total number of mutations. Although informative, the total number of mutations could not be used directly for diagnosis, because the totals overlapped between patients with malignant bladder cancer and controls.
In the pilot study of 16 cases, FGFR3 ranked highest in cumulative mutation frequency, and this was subsequently confirmed in the expanded cohort of 125 cases; hence, FGFR3 was included in both the five-gene and seven-gene diagnostic panels. In the pilot comparison of mutations across specimens, KDM6A showed the second-highest cumulative mutation frequency [22]. However, in the expanded cohort of 125 cases, cumulative KDM6A mutations were no longer elevated in bladder cancer samples, so this gene was not used to construct the diagnostic panel.
As TERT is a driver gene in bladder cancer, TERT mutations have been suggested to be useful in the genetic diagnosis of bladder cancer and in monitoring its recurrence [22,27,28]. TERT mutations are found in about 50% of bladder cancers (COSMIC database), and the mutation rate can be as high as 70% [28]. Beyond bladder cancer, mutations in the TERT promoter region have been used in screening for other cancer types, such as lung cancer [29]. As illustrated in Fig. 3, high rates of the TERT promoter mutation C228T were observed in the tested samples.
The present study narrowed the 48 candidate genes down to five or seven genes in diagnostic models for identifying malignancy in subjects with hematuria. However, the 12 false negatives identified in this analysis (Additional file 1: Table S5) could not be correctly diagnosed even with the entire 48-gene panel, so additional biomarkers are still required. Another possible explanation for the false negatives is variation in the genes involved in carcinogenesis among different subpopulations of the patients studied. Integrating more cancer-related biomarkers may increase the sensitivity and specificity of the genetic diagnosis of bladder cancer, and combining mutation analysis with methylation assays could substantially increase diagnostic power as well [30]. In a study with 31 cases, 24 reported an AUC of 0.96 (95% CI 0.92–0.99) with a sensitivity of 93% and specificity of 86% when the mutation analysis covered FGFR3, TERT, and HRAS and the methylation assay covered the OTX1, ONECUT2, and TWIST1 genes [14]. Additionally, urinary RNAs may also serve as cancer biomarkers [9,11,31]. These epigenetic and RNA biomarkers should be evaluated in future research with larger sample sizes.

[Figure caption: Averaged diagnostic parameters of the logistic models for the genetic diagnosis of bladder cancer. The X-axis represents the number of genes factored into the model; the Y-axis plots diagnostic potential, with 1.0 representing perfect discrimination of bladder cancer. For urine supernatant, all parameters were satisfactory with combinations of three, four, and five genes (TERT, FGFR3, TP53, PIK3CA, and KRAS); in contrast, models using urine sediments required seven (TERT, FGFR3, TP53, HRAS, PIK3CA, KRAS, and ERBB2) or more genes to attain satisfactory predictive potential.]
Clinical information, including subject age and gender, was integrated into the diagnostic models developed using urine supernatant and sediments to test whether these variables improve the diagnostic power of the genetic models. As shown in Additional file 5: Figure S4, the AUC of the five-gene urine supernatant model improved slightly from 0.94 (95% CI 0.91–0.97) to 0.97 (95% CI 0.94–0.99) when demographic information was added. With age and gender added, the urine sediment seven-gene model also improved, reaching an AUC of 0.96 (95% CI 0.93–0.99). Model performance did not differ significantly by tumor stage, gender, or age (Additional file 1: Tables S6–S8). Additionally, the five-gene urine supernatant model did not differ significantly from the seven-gene urine sediment model (Additional file 1: Table S9).