Integrated analysis of circulating tumour cells and circulating tumour DNA to detect minimal residual disease in hepatocellular carcinoma

DCP, des-gamma-carboxy prothrombin; MRI, magnetic resonance imaging; PET-CT, positron emission tomography.

Clinical and pathological characteristics of HCC patients and univariate analysis of recurrence (Table S1 and Figure S2A)

CTC enrichment and identification
CTC enrichment was performed using a CytoQuest CR microfluidic system (ABNOVA). Buffy coats were rinsed with RPMI medium and gently injected into an asialoglycoprotein receptor (ASGPR)-coated microfluidic slide (CAT: KA4573, ABNOVA) at room temperature. (1-4) After the slide was fixed and ventilated, the captured cells were immunostained with an antibody cocktail containing anti-pan-cytokeratins […]

Candidate somatic mutations were detected by comparing the sequencing data from tumor tissue samples using MuTect1 and Strelka. The criteria for defining mutations in tumors were an allele fraction ≥ 1% and total reads ≥ 4. All selected mutations were validated by manual inspection using the Integrative Genomics Viewer (IGV). (8)

ctDNA by PPWES and bioinformatics analysis
[…] on an Illumina NovaSeq 6000 with a median depth of 13417× after removing duplicate molecules. SNV calling and annotation of candidate variations were performed as previously described. (7)

The ctDNA/cfDNA ratio model
The ctDNA/cfDNA ratio model was implemented in the R statistical environment (version 3.6.3). The model estimates the ctDNA fraction from the allele frequency and sequencing depth of somatic mutations in tumor tissue and paired plasma samples, as described previously. (9) It rests on two assumptions. (a) Because the ctDNA fraction is low, some mutations present in the tumor tissue may not be detected in the corresponding plasma sample; these tumor-plasma mismatches are assumed to result from the low concentration of mutant molecules and/or randomness in sampling. (b) The counts of mutant allele reads and non-mutant (wild-type) allele reads are assumed to follow a binomial distribution.

To evaluate the biological noise of random mutations in patient plasma samples, we determined the fraction of 20 mutations that were not detected in matched tumor samples. To achieve 100% specificity, customised primers from more than five patients were pooled for cross-validation and were used to exclude non-specific noise. For example, suppose a tumor-specific T > C mutation of gene A was identified in the matched plasma sample, and the same variant was also identified in the plasma of another patient whose tumor did not carry it. If the count of this mutation in the matched plasma was at least 2-fold that in the other, non-tumor-specific plasma, the mutation was regarded as true. Patients were defined as ctDNA PPWES-positive if the estimated ctDNA fraction was > 0.
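The two assumptions and the 2-fold noise rule can be illustrated with a minimal Python sketch. It is not the published model from reference (9), which was run in R. The function names, the grid search, and in particular the simplification that the expected plasma allele frequency equals the ctDNA fraction multiplied by the tumor allele frequency are all assumptions made here for illustration.

```python
import math


def log_binom_pmf(k, n, p):
    """Log-probability of k mutant reads out of n total reads under Binomial(n, p)."""
    if p <= 0.0:
        return 0.0 if k == 0 else float("-inf")
    if p >= 1.0:
        return 0.0 if k == n else float("-inf")
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log(1.0 - p)


def estimate_ctdna_fraction(mutations, grid_size=10000):
    """Grid-search maximum-likelihood estimate of the ctDNA fraction f.

    mutations: list of (tumor_vaf, plasma_alt_reads, plasma_depth).
    ASSUMPTION (not from the source): expected plasma VAF = f * tumor_vaf.
    Mutations with zero plasma reads still contribute to the likelihood,
    which mirrors assumption (a): a miss can be due to sampling alone.
    """
    best_f, best_ll = 0.0, float("-inf")
    for step in range(grid_size + 1):
        f = step / grid_size
        ll = sum(
            log_binom_pmf(alt, depth, min(1.0, f * tumor_vaf))
            for tumor_vaf, alt, depth in mutations
        )
        if ll > best_ll:
            best_f, best_ll = f, ll
    return best_f


def passes_cross_patient_filter(matched_count, other_count, min_fold=2.0):
    """2-fold rule: keep a variant only if its count in the matched plasma is
    at least min_fold times its count in the non-tumor-specific plasma."""
    return matched_count >= min_fold * other_count


if __name__ == "__main__":
    # Hypothetical patient: two tumor mutations, one of them missed in plasma.
    patient = [(0.5, 10, 10000), (0.5, 0, 10000)]
    f = estimate_ctdna_fraction(patient)
    print("estimated ctDNA fraction:", f)
    print("PPWES-positive:", f > 0)
```

In this sketch a patient with no mutant reads at any tracked site gets an estimate of exactly 0 and is therefore called negative, consistent with the positivity rule stated above.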

ctDNA by UPTS and bioinformatics analysis
Whole genome libraries of ctDNA were enriched for the target regions using previously described mutation capsule technology. (7) We profiled the following target regions in ctDNA: (a) the coding regions of TP53, CTNNB1, AXIN1, and the TERT promoter region; and (b) HBV integrations. (10,11) The enriched and amplified libraries were sequenced on an Illumina HiSeq X Ten instrument with 150-bp paired-end sequencing to a median depth of 6140× for ctDNA after removing duplicate molecules. Sequencing reads were processed using an in-house program to extract tags and remove sequence adapters. (7) Residual adapters and low-quality reads were removed with Trimmomatic (v0.36). Clean reads were mapped to the human reference genome hg19 and the HBV genome with BWA (v0.7.15). (10) SNVs/indels in the targeted regions were called using samtools mpileup (v 0.1.1722). (12) A mutation in ctDNA was called when its allele fraction was ≥ 0.03%. HBV integrations were identified by CREST, requiring support from at least four soft-clipped reads. (13) To ensure accuracy, reads with the same tag, start coordinate, and end coordinate were grouped into unique identifier families (UID families). UID families containing at least two reads, in which at least 80% of reads were of the same type, were defined as effective unique identifier families (EUID families). Each mutation frequency was calculated by dividing the number of alternative EUID families by the sum of the alternative and reference EUID families. The mutations were manually reviewed using IGV. Candidate variations were annotated using the Ensembl Variant Effect Predictor (VEP). (13)
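The UID/EUID family rules above can be sketched in a few lines of Python. This is an illustrative reading of the text, not the in-house program the authors used. The read representation (a tuple of tag, start, end, and the allele observed at one site) and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def call_euid_families(reads, min_reads=2, min_agreement=0.8):
    """Group reads into UID families and keep the effective (EUID) ones.

    reads: iterable of (tag, start, end, allele), where allele is the base
    call type observed at the site of interest (e.g. "ref" or "alt").
    Returns {family_key: consensus_allele} for EUID families only.
    """
    families = defaultdict(list)
    for tag, start, end, allele in reads:
        # A UID family shares the same tag and the same start/end coordinates.
        families[(tag, start, end)].append(allele)

    consensus = {}
    for key, alleles in families.items():
        if len(alleles) < min_reads:
            continue  # singleton families are not effective
        allele, count = Counter(alleles).most_common(1)[0]
        if count / len(alleles) >= min_agreement:
            consensus[key] = allele  # at least 80% of reads agree
    return consensus


def mutation_frequency(consensus, alt="alt", ref="ref"):
    """Alternative EUID families / (alternative + reference EUID families)."""
    counts = Counter(consensus.values())
    total = counts[alt] + counts[ref]
    return counts[alt] / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical reads covering one site.
    reads = (
        [("tagA", 100, 250, "alt")] * 2                                   # EUID, alt
        + [("tagB", 100, 250, "ref")] * 5                                 # EUID, ref
        + [("tagC", 101, 251, "ref")] * 4 + [("tagC", 101, 251, "alt")]   # 80% ref
        + [("tagD", 90, 240, "alt")]                                      # singleton
        + [("tagE", 95, 245, "alt"), ("tagE", 95, 245, "ref")]            # 50/50 split
        + [("tagF", 98, 248, "ref")] * 2                                  # EUID, ref
    )
    euid = call_euid_families(reads)
    freq = mutation_frequency(euid)
    print(len(euid), "EUID families; mutation frequency:", freq)
    print("passes 0.03% threshold:", freq >= 0.0003)
```

Counting families rather than raw reads means that PCR duplicates of one original molecule contribute once, and a sequencing error confined to a minority of reads within a family does not create an alternative call.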

Statistical analysis
Statistical analysis was conducted using R software (version 3.6.1). The correlations between circulating tumor markers and clinical parameters were calculated and visualized using the "ggplot2", "Hmisc" and "corrplot" R packages. Heatmaps were drawn with "pheatmap". The "survival" and "survminer" packages were used to fit univariate and multivariate Cox regression models and to visualize the results.
Continuous parameters were dichotomized for the analysis of recurrence-free survival (RFS). RFS curves were generated by the Kaplan-Meier method and compared using the log-rank test. All statistical tests were two-tailed and were considered significant at P < 0.05.
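For readers who want to see the mechanics of the RFS comparison, the following is a small standard-library Python sketch of dichotomization, the Kaplan-Meier estimator, and a two-group log-rank test. It does not reproduce the R "survival"/"survminer" workflow used in the study. The median cutoff, the function names, and the toy data are assumptions for illustration only.

```python
import math
from statistics import median


def dichotomize(values, cutoff=None):
    """Split a continuous parameter into high (True) / low (False).
    ASSUMPTION: the median is used when no cutoff is given."""
    cutoff = median(values) if cutoff is None else cutoff
    return [v > cutoff for v in values]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; returns [(t, S(t))] at each distinct event time.
    events[i] is True for recurrence and False for censoring."""
    curve, surv = [], 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for x in times if x >= t)
        n_events = sum(1 for x, e in zip(times, events) if x == t and e)
        surv *= 1.0 - n_events / at_risk
        curve.append((t, surv))
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, two-sided P)."""
    obs_minus_exp, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups) if x == t and e and g)
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of the event count in group 1.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


if __name__ == "__main__":
    # Hypothetical marker values, RFS times (months), and recurrence flags.
    marker = [9.1, 8.7, 7.9, 8.2, 9.5, 1.2, 0.8, 1.9, 1.1, 0.5]
    months = [1, 2, 3, 4, 5, 10, 11, 12, 13, 14]
    recurred = [True] * 10
    high = dichotomize(marker)
    chi2, p = logrank_test(months, recurred, high)
    print("chi-square:", round(chi2, 2), "significant at P < 0.05:", p < 0.05)
```

The P value uses the identity that, for one degree of freedom, the upper tail of the chi-square distribution at x equals erfc(sqrt(x/2)), which avoids any dependency outside the standard library.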