Characterisation of dysplastic liver nodules using low-pass DNA sequencing and detection of chromosome arm-level abnormalities in blood-derived cell-free DNA

High-grade dysplasia carries significant risk of transformation to hepatocellular carcinoma (HCC). Despite this, at the current standard of care, all non-malignant hepatic nodules including high-grade dysplastic nodules are managed similarly. This is partly related to difficulties in distinguishing high-risk pathology in the liver. We aimed to identify chromosome arm-level somatic copy number alterations (SCNAs) that characterise the transition of liver nodules along the cirrhosis-dysplasia-carcinoma axis. We validated our findings on an independent cohort using blood-derived cell-free DNA. A repository of non-cancer DNA sequences obtained from patients with HCC (n = 389) was analysed to generate cut-off thresholds aiming to minimise false-positive SCNAs. Tissue samples representing stages from the multistep process of hepatocarcinogenesis (n = 184) were subjected to low-pass whole genome sequencing. Chromosome arm-level SCNAs were identified in liver cirrhosis, dysplastic nodules, and HCC to assess their discriminative capacity. Samples positive for 1q+ or 8q+ arm-level duplications were likely to be either HCC or high-grade dysplastic nodules as opposed to low-grade dysplastic nodules or cirrhotic tissue with an odds ratio (OR) of 35.5 (95% CI 11.5-110) and 16 (95% CI 6.4-40.2), respectively (p < 0.0001). In an independent cohort of patients recruited from Nottingham, UK, at least two out of four alterations (1q+, 4q−, 8p−, and 8q+) were detectable in blood-derived cell-free DNA of patients with HCC (n = 22) but none of the control patients with liver cirrhosis (n = 9). Arm-level SCNAs on 1q+ or 8q+ are associated with high-risk liver pathology. These can be detected using low-pass sequencing of cell-free DNA isolated from blood, which may be a future early cancer screening tool for patients with liver cirrhosis.
No other conflicts of interest were declared.


Introduction
Early events in hepatocarcinogenesis involve the progression of dysplastic nodules (DNs) to overt hepatocellular carcinoma (HCC). Such a contention has been supported by the frequent appearance of nodule-in-nodule lesions containing both dysplasia and cancer [1]. In comparison to regenerative nodules, DNs have an increased risk of malignant transformation [2]. Genomic features common to hepatic dysplasia and carcinoma demonstrate a step-wise increase in chromosome arm-level somatic copy number alterations (SCNAs) [3,4], chromosomal instability [3,5], and TERT promoter mutations [6,7]. Among dysplastic nodules, a histological distinction can be drawn between those that are 'low-grade' DNs and those that are 'high-grade'. The latter are assumed to be the immediate precursors of overt HCC. The rate of HCC development was found to be significantly higher in high-grade DNs than in low-grade DNs and regenerative nodules [8].
Accurate histological diagnosis of cancer or nodules that have a high likelihood of evolving into cancer is required. Such diagnosis is currently based on morphology, supplemented with immunohistochemistry where available, but the diagnosis is subjective and remains challenging in routine clinical practice [9]. The difficulty in histological diagnosis is focused on differentiating low-grade DNs from high-grade DNs and high-grade DNs from histologically well-differentiated HCC, and there is a need for additional diagnostic tests which could be applied to fixed tissue or blood samples. Previous studies focusing on the prevalence of arm-level SCNAs identified 1q+ and 8q+ as common events in HCC [10], significantly less common in DNs [3,4], and rare events in cirrhosis [11]. However, their discriminatory role has not been investigated previously. We aimed, therefore, to identify the utility of arm-level SCNAs as an aid for stratifying high- and low-risk pathology along the cirrhosis-dysplasia-carcinoma axis using low-pass sequencing.
High-quality deep sequencing analysis of the HCC genome has been conducted in various contexts and has led to thorough characterisation of its main genomic features [10,12-15]. However, costly and time-consuming deep sequencing is not required for detection of arm-level SCNAs. For several decades, conventional karyotyping and chromosomal microarrays were standard [16]. More recently, low-pass next-generation sequencing (NGS) was validated with as little as 5 ng of DNA needed to generate a copy number karyogram [17-19], with reports of higher accuracy at an average coverage of 0.25x and mosaic levels as low as 20% [20].
Cell-free DNA refers to non-encapsulated DNA circulating freely in the blood stream. In cancer patients, a variable proportion of cell-free DNA originates from tumour cells. HCC is known to possess relatively high amplitude gains in copy number on 1q and 8q in comparison to other parts of the genome [10,11]. Both chromosomal arms therefore offer natural signal amplification, boosting detectability within the plasma pool of cell-free DNA. For this reason, we tested the applicability of our study findings on a blood-derived cell-free DNA cohort.

Study design
The Cancer Genome Atlas (TCGA; https://portal.gdc.cancer.gov/) is a public repository which includes high-quality genomic sequencing data from cancer tissue as well as matched non-cancer tissue of patients with cancer. We first downloaded all the 'cancer' and 'non-cancer' copy number segment files of patients with liver hepatocellular carcinoma (TCGA-LIHC) which were used as the derivation cohort (Figure 1) [10]. We used the TCGA-LIHC non-cancer sub-cohort to generate thresholds above or below which copy number gains or losses were called in cancer samples, respectively. Next, we identified key arm-level SCNAs in the TCGA-LIHC cancer sub-cohort. Using the first validation cohort, we analysed the ability of key arm-level SCNAs to discriminate between low- and high-risk pathology. Lastly, we used the second validation cohort to test the principle that low-pass sequencing of cell-free DNA from blood samples offers a future early cancer screening tool for patients with liver cirrhosis using liquid biopsies.
Figure 1. Description of study cohorts. (A) Derivation/TCGA cohort: TCGA-LIHC non-cancer sub-cohort (n = 389) was used to generate thresholds above or below which copy number gains or losses were called in test samples, respectively. TCGA-LIHC cancer sub-cohort (n = 379) was used to identify key arm-level SCNAs in HCC. The thresholds were validated for false positives against 7457 patients constituting the TCGA threshold validation sub-cohort, which is non-cancer tissue DNA of patients diagnosed with 26 different types of cancer but not HCC. (B) Validation cohort 1 (formalin-fixed, paraffin-embedded) was used to validate the ability of key arm-level SCNAs identified from the derivation cohort in classifying low- and high-risk liver pathology. (C) Validation cohort 2 (blood-derived cell-free DNA). The total number of samples per cohort and the main clinical features of samples in each cohort are highlighted. HGDN, high-grade dysplastic nodules; LGDN, low-grade dysplastic nodules; MDHCC, moderately differentiated HCC; PDHCC, poorly differentiated HCC; WDHCC, well-differentiated HCC.
• Validation cohort 1 - formalin-fixed, paraffin-embedded (FFPE) tissue from explanted liver specimens of 59 patients with HCC and/or dysplastic nodules complicating cirrhosis due to hepatitis C (supplementary material, Table S1 for clinico-pathological characteristics and Table S2 for histopathology). Patients were recruited between August 1999 and April 2013. FFPE tissue was processed as detailed in Supplementary materials and methods. All H&E slides were reviewed by a single experienced liver histopathologist (JIW), who identified clear examples to represent the range of hepatocellular lesions, outlined the lesions on the glass slides, and recorded their differentiation and morphological pattern according to the World Health Organisation (WHO) classification [21]; the annotated H&E slides were then scanned using an automated scanning system with a ×20 objective to produce digital images and uploaded to a digital pathology server: https://bit.ly/31QugTa (supplementary material, Table S2) [22]. DNA was extracted from the nodules and from cirrhotic tissue geographically distant to the nodules. HCC nodules (n = 106, including WD-HCC n = 44) and DNs (n = 46, including high-grade DNs, n = 28, and low-grade DNs, n = 18) were retrieved from 59 patient livers.
• Validation cohort 2 - blood-derived cell-free DNA was obtained from 31 patients (HCC, n = 22, and cirrhosis without HCC, n = 9) from Nottingham, UK with various aetiologies (Figure 1 and supplementary material, Table S1 for clinico-pathological characteristics). Patients were recruited between September 2016 and September 2017. HCC was diagnosed radiologically according to European Association for the Study of the Liver (EASL) criteria [23].

Sample processing
Detailed DNA extraction, library preparation, quality control, sequencing, and bioinformatics methodology can be found in Supplementary materials and methods and Tables S3 and S4. In brief, for validation cohort 1, test DNA was extracted from FFPE dysplastic or malignant tissue. Control DNA was extracted from the patient's own cirrhotic tissue geographically distant to the nodule. Test and control DNA were characterised in comparison to a reference pool of normal DNA downloaded from the 1000 Genomes Project (see Supplementary materials and methods) [24].
For validation cohort 2, test cell-free DNA was extracted from the plasma of patients diagnosed with HCC. Control cell-free DNA was extracted from the plasma of patients diagnosed with liver cirrhosis and found to be free of cancer on imaging for 6 months after sample collection. Test and control samples were characterised in comparison to the patient's own buffy coat genome.
The average coverage per genome was calculated by multiplying the number of aligned reads by the read length in base pairs, or double the read length if paired-end, and dividing the result by 3 giga base pairs (supplementary material, Table S3). We identified the predicted proportion of tumour DNA within the eluted pool of extracted DNA using ABSOLUTE (https://software.broadinstitute.org/cancer/cga/absolute; supplementary material, Table S5) [25]. The proportion of tumour DNA within TCGA data was directly downloaded from Genomics Data Commons (https://gdc.cancer.gov/about-data/publications/pancanatlas) and can be found in supplementary material, Table S6.
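The coverage calculation described above can be written as a short function. This is a minimal sketch: the function name and the example read count are illustrative assumptions, and the 3 Gbp genome size is the value stated in the text.

```python
def average_coverage(aligned_reads: int,
                     read_length_bp: int,
                     paired_end: bool = False,
                     genome_size_bp: float = 3e9) -> float:
    """Average coverage = aligned reads x read length / genome size.

    Following the description in the text, the read length is doubled
    for paired-end sequencing. The 3 Gbp genome size is the figure
    quoted in the text; the function name is hypothetical.
    """
    effective_length = read_length_bp * (2 if paired_end else 1)
    return aligned_reads * effective_length / genome_size_bp


# Illustrative input only (not a value from Table S3):
# 35 million aligned reads, 151-bp paired-end sequencing.
example = average_coverage(35_000_000, 151, paired_end=True)
```

With these illustrative inputs the result falls within the 3-4x range reported for the cell-free DNA pool.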
Plasma cell-free DNA sequencing
An 8.5-ml blood sample was obtained from each patient and centrifuged within 2 h. The plasma and buffy coat portions were extracted into different cryovials. Both were re-centrifuged followed by re-extraction and storage at −80 °C until further analysis. DNA concentration was measured fluorometrically (PicoGreen®; Thermo Fisher Scientific, Waltham, MA, USA) and fragmentation assessed using an Agilent 2200 TapeStation (Agilent Technologies, Santa Clara, CA, USA). DNA libraries were prepared using tagged primers as detailed in Supplementary materials and methods and then labelled using unique 6-bp tags to enable multiplexing of libraries. Equal quantities of DNA libraries that passed quality control were pooled for cluster amplification and multiplexed on the same sequencing lane. Two DNA library pools were prepared: a cell-free DNA sample pool and a reference sample pool. The cell-free DNA sample pool included cell-free DNA samples from the HCC (test) group and the liver cirrhosis (control) group.
Ten nanograms of DNA library per sample was included in the cell-free DNA pool and sequenced (Illumina HiSeq3000; Illumina, San Diego, CA, USA) using paired-end sequencing with a read length of 151 and average coverage of 3-4x (IQR 2.8x-3.8x) per sample. A description of the reference sample pool can be found in Supplementary materials and methods. FastQ files were output by the sequencer, two files per sample, one for each read. File integrity was verified using MD5 checksums and quality was controlled using FastQC (Babraham Institute, UK, https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adaptor sequences were trimmed using Cutadapt [26]. Nucleotide sequences were aligned against the human genome assembly 19 (https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.13/) using Burrows-Wheeler aligner [27]. Sequences with a mapping quality of less than 37 were not used. The number of aligned reads can be found in supplementary material, Table S3. The aligned reference read lengths were trimmed to match aligned test read lengths. Each genome was divided into 100-kbp non-overlapping windows. The ratio of test to reference number of reads per window was normalised according to the most abundant ratio using CNAnorm [28] (supplementary material, Figure S1), where the most abundant ratio was considered as ratio 1 and the other normalised ratios were calculated accordingly. GC correction was performed using CNAnorm and breakpoints were called using DNAcopy [29].
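The idea of normalising window ratios so that the most abundant ratio becomes 1 can be illustrated with a simplified sketch. This is not the CNAnorm implementation (CNAnorm is an R package that also performs GC correction and smoothing); the histogram-based mode estimate, the function name, and the bin count are assumptions made for illustration only.

```python
import numpy as np


def normalise_to_most_abundant_ratio(test_counts, ref_counts, n_bins=200):
    """Scale per-window test/reference ratios so the modal ratio equals 1.

    test_counts, ref_counts: read counts per 100-kbp window, in the same
    window order. Windows with no reference reads are returned as NaN.
    """
    test = np.asarray(test_counts, dtype=float)
    ref = np.asarray(ref_counts, dtype=float)
    valid = ref > 0

    # Correct for different library sizes before taking the ratio.
    ratio = np.full(test.shape, np.nan)
    ratio[valid] = (test[valid] / test[valid].sum()) / (ref[valid] / ref[valid].sum())

    # Crude estimate of the most abundant ratio: centre of the fullest bin.
    hist, edges = np.histogram(ratio[valid], bins=n_bins)
    peak = int(np.argmax(hist))
    most_abundant = 0.5 * (edges[peak] + edges[peak + 1])

    return ratio / most_abundant


# Toy data: eight copy-neutral windows and two windows gained 1.5-fold.
normalised = normalise_to_most_abundant_ratio([100] * 8 + [150] * 2, [100] * 10)
```

In the toy data the copy-neutral windows end up close to 1 and the gained windows close to 1.5, which is the behaviour the text describes for ratio normalisation.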

Detection and definition of arm-level somatic copy number alterations
For the derivation cohort, copy number segment files were downloaded from TCGA, https://gdc.cancer.gov/, including the TCGA-LIHC cancer sub-cohort (n = 379) and the TCGA-LIHC non-cancer sub-cohort (n = 305 blood-derived and 84 solid tissue).
For the validation cohorts, copy number segment files were generated using CNAnorm [28]. Detailed bioinformatics may be found in Supplementary materials and methods. Centromeric locations were obtained from the University of California Santa Cruz (UCSC) genome browser and the mean value of normalised test to reference ratio for each autosomal arm was calculated (supplementary material, Table S7).
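The per-arm mean ratio can be sketched as follows. The data layout, function name, and the centromere coordinate in the example are hypothetical; real centromere positions would come from the UCSC genome browser as stated above.

```python
from collections import defaultdict
from statistics import mean


def arm_means(windows, centromere_bp):
    """Mean normalised ratio per chromosome arm.

    windows: iterable of (chrom, window_start_bp, ratio) tuples.
    centromere_bp: dict mapping chrom -> centromere position in bp.
    A window is assigned to the p arm if it starts before the
    centromere position, otherwise to the q arm.
    """
    by_arm = defaultdict(list)
    for chrom, start, ratio in windows:
        arm = "p" if start < centromere_bp[chrom] else "q"
        by_arm[f"{chrom}{arm}"].append(ratio)
    return {arm: mean(values) for arm, values in by_arm.items()}


# Toy example with a made-up centromere coordinate (not hg19 data).
toy_windows = [("1", 0, 1.0), ("1", 100_000, 1.0),
               ("1", 200_000, 1.2), ("1", 300_000, 1.4)]
toy_means = arm_means(toy_windows, {"1": 200_000})
```

Here `toy_means` holds one value for `1p` and one for `1q`, mirroring the arm-level summary reported in Table S7.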
Generation of thresholds using the TCGA-LIHC non-cancer sub-cohort (n = 389)
Two thresholds were generated for each autosomal arm to mark gains and losses. The thresholds were set at the fifth and 95th centile values for losses and gains, respectively (supplementary material, Table S8). Therefore, less than 5% of the TCGA-LIHC non-cancer sub-cohort exceeded the threshold for a gain or had values inferior to the threshold for a loss.
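A minimal sketch of the thresholding step, assuming the arm-level mean ratios of the non-cancer samples are available as plain lists. The function names and the input layout are illustrative assumptions; `numpy.percentile` uses linear interpolation by default, which may differ from the centile method used for Table S8.

```python
import numpy as np


def arm_thresholds(noncancer_ratios):
    """Return {arm: (loss_cutoff, gain_cutoff)} from non-cancer samples.

    noncancer_ratios: dict mapping arm name -> list of mean ratios
    observed in non-cancer samples. The loss cut-off is the 5th centile
    and the gain cut-off is the 95th centile, as described in the text.
    """
    return {
        arm: (float(np.percentile(values, 5)), float(np.percentile(values, 95)))
        for arm, values in noncancer_ratios.items()
    }


def call_arm(ratio, loss_cutoff, gain_cutoff):
    """Classify one arm-level ratio as 'gain', 'loss', or 'neutral'."""
    if ratio > gain_cutoff:
        return "gain"
    if ratio < loss_cutoff:
        return "loss"
    return "neutral"


# Toy non-cancer distribution (not TCGA data).
toy_thresholds = arm_thresholds({"1q": np.linspace(0.99, 1.01, 101)})
```

Because the cut-offs are fixed values rather than sample-specific z-scores, the same `call_arm` check can be applied unchanged to any new sample.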
Validation of thresholds using the TCGA threshold validation sub-cohort (n = 7457)
The TCGA threshold validation sub-cohort comprised non-cancer DNA sequences obtained from all TCGA patients who had 26 different types of cancer (supplementary material, Table S9). To investigate the potential for false-positive results, the thresholds were tested against copy number segment files of the TCGA threshold validation sub-cohort (n = 7457).

Proportion of TCGA-HCC samples passing the threshold
The TCGA-LIHC cancer sub-cohort was used to identify the proportion of tumour samples passing the threshold for each autosomal arm. Autosomal arms where the proportion of tumour samples passing thresholds was higher than 75% were identified. Figure 2 shows two examples of autosomal arms where the proportions of TCGA-HCC samples passing the threshold were higher than 75%.
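The selection of frequently altered arms can be sketched as a simple filter. The function name, data layout, and toy values are assumptions; only the 75% cut-off comes from the text.

```python
def arms_frequently_altered(tumour_ratios, thresholds, min_fraction=0.75):
    """Select arms where more than min_fraction of tumours pass a threshold.

    tumour_ratios: dict arm -> list of arm-level mean ratios in tumours.
    thresholds: dict arm -> (loss_cutoff, gain_cutoff).
    Returns {arm: (direction, fraction)} with direction '+' or '-'.
    """
    selected = {}
    for arm, ratios in tumour_ratios.items():
        loss_cutoff, gain_cutoff = thresholds[arm]
        n = len(ratios)
        gain_fraction = sum(r > gain_cutoff for r in ratios) / n
        loss_fraction = sum(r < loss_cutoff for r in ratios) / n
        if gain_fraction > min_fraction:
            selected[arm] = ("+", gain_fraction)
        elif loss_fraction > min_fraction:
            selected[arm] = ("-", loss_fraction)
    return selected


# Toy values only (not TCGA-LIHC data).
toy_selected = arms_frequently_altered(
    {"1q": [1.2, 1.3, 1.1, 1.25, 1.0], "2p": [1.0, 1.0, 1.02, 0.99, 1.0]},
    {"1q": (0.99, 1.01), "2p": (0.99, 1.01)},
)
```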

Statistical analyses
Cumulative frequency incorporates the frequency and amplitude of autosomal arm alterations within a group of lesions. This was used to display whole genome karyotype for groups within both validation cohorts (supplementary material, Figure S2).
To assess the capacity of key SCNAs identified using the TCGA/derivation cohort in discriminating low-and high-risk pathology, we analysed their prevalence within each group and calculated the odds ratio (OR) with 95% confidence interval (CI) according to Altman [30].
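The odds ratio and its confidence interval can be computed from a 2x2 table using the standard log-scale approach. The sketch below assumes this is the method meant by the reference to Altman; the 0.5 continuity correction for zero cells is a common convention added here as an assumption, and the counts in the example are invented for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a confidence interval computed on the log scale.

    a: SCNA present, high-risk     b: SCNA present, low-risk
    c: SCNA absent,  high-risk     d: SCNA absent,  low-risk
    If any cell is zero, 0.5 is added to every cell (an assumption,
    not necessarily what was done in the study).
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts for illustration (not the study's 2x2 table).
toy_or, toy_lower, toy_upper = odds_ratio_ci(20, 5, 10, 15)
```

The interval is symmetric on the log scale, which is why the reported intervals in this kind of analysis look wider above the point estimate than below it.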

Comparison of arm-level SCNAs in low- and high-grade dysplastic nodules
To identify the potential for arm-level SCNAs in aiding the discrimination of high-grade DNs from low-grade DNs, the prevalence of key arm-level SCNAs within both groups was examined. The likelihood of a dysplastic nodule being high grade (n = 28) as opposed to low grade (n = 18) in the presence or absence of 1q+, 4q−, 8p−, 8q+, and 17p− was measured using the OR. Development of high-grade dysplasia was associated with 1q+ OR = 8 (95% CI 1.5-41.5) and 8q+ OR = 5.8 (95% CI 1.4-24.5), as shown in Figure 3.

Comparison of 1q+ and 8q+ in low- and high-risk pathology
We defined liver cirrhosis and low-grade dysplasia as 'low-risk pathology', while high-grade dysplasia and cancer were defined as 'high-risk pathology'. As 1q+ and 8q+ were detected in high-grade DNs significantly more than in low-grade DNs, we aimed to explore their utility for discriminating between 'low-risk' versus 'high-risk' pathology. We therefore examined the prevalence of 1q+ and 8q+ within both groups. The odds of pathology being 'high risk' (n = 134) as opposed to 'low risk' (n = 50) in the presence or absence of 1q+ or 8q+ was measured using the OR. Development of 'high-risk' pathology was associated with 1q+ OR = 35.5 (95% CI 11.5-110) and 8q+ OR = 16 (95% CI 6.4-40.2).
Validation of thresholds using the TCGA threshold validation sub-cohort (n = 7457)
To investigate the potential for false-positive results, the thresholds were tested against non-cancer DNA sequencing obtained from all TCGA patients who had 26 different types of cancer (n = 7457) but not liver cancer. In this cohort, the thresholds were crossed in 445 (6%) of 1q, 404 (5.4%) of 8q, 396 (5.3%) of 4q, and 226 (3%) of 8p. None of the cases concurrently crossed more than one of the four thresholds.
Arm-level SCNAs associated with early features of malignancy are detectable using blood-derived cell-free DNA
Cell-free DNA from the plasma and reference DNA from the buffy coat of patients with HCC (test group, n = 22) was compared with that from patients with liver cirrhosis (control group, n = 9). The test group included patients within Milan criteria (n = 7) and patients with AFP < 20 ng/ml (n = 9) (Figure 4). None of the patients recruited to the control group developed HCC after a median follow-up of 22.4 months.
Supplementary material, Table S10 shows the sensitivity and specificity for each of the key arm-level SCNAs. Of the four alterations 1q+, 4q−, 8p−, and 8q+, at least two were present in 22/22 and at least three in 16/22 of the patients with HCC (Figure 4). All patients in the control group were negative for 1q+, 8p−, and 8q+, while four patients were positive for 4q−. None of the control group patients had more than one of the four arm-level SCNAs. Such patterns closely resembled those observed in the TCGA threshold validation sub-cohort and validation cohort 1 (Table 1 and Figure 5).
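The "at least two of four alterations" pattern reported above can be expressed as a simple counting rule. This is an illustrative sketch rather than a classifier defined by the study; the function names, the set-based input format, and the toy samples are assumptions.

```python
KEY_SCNAS = ("1q+", "4q-", "8p-", "8q+")


def is_positive(sample_calls, min_hits=2):
    """True if at least min_hits of the four key SCNAs were called."""
    return sum(1 for scna in KEY_SCNAS if scna in sample_calls) >= min_hits


def sensitivity_specificity(case_calls, control_calls, min_hits=2):
    """Sensitivity in cases and specificity in controls for the rule.

    case_calls, control_calls: lists of sets of SCNA labels per sample.
    """
    true_pos = sum(is_positive(s, min_hits) for s in case_calls)
    true_neg = sum(not is_positive(s, min_hits) for s in control_calls)
    return true_pos / len(case_calls), true_neg / len(control_calls)


# Toy samples only (not patient data).
toy_cases = [{"1q+", "8q+"}, {"1q+", "4q-", "8p-"}, {"8q+"}]
toy_controls = [set(), {"4q-"}]
toy_sens, toy_spec = sensitivity_specificity(toy_cases, toy_controls)
```

Requiring two concurrent alterations trades some sensitivity for specificity: an isolated 4q− call, as seen in some control patients, does not make a sample positive under this rule.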

Discussion
The European Association for the Study of the Liver (EASL) recommends the development of tools to stratify patients at high, intermediate, and low risk for HCC [23]. In this study, we found that among five key arm-level SCNAs in HCC, 1q+ and 8q+ were significantly associated with either early cancer or high-grade dysplasia, whereas such events were rare in low-grade dysplasia or cirrhosis. We demonstrated that the alterations can be detected using low-pass sequencing in blood-derived cell-free DNA of patients with HCC but not cirrhotic patients with no known HCC.
The detection in blood-derived cell-free DNA is likely related to two factors. Firstly, arm-level SCNAs are highly prevalent in HCC: TCGA data have revealed that arm-level SCNAs are more prevalent than SCNAs at gene level and more prevalent than common mutations such as those in TERT and CTNNB1 [10,31]. Secondly, the natural signal amplification at duplication hotspots such as 1q and 8q within a pool of cell-free DNA makes the signal more detectable despite the low proportion of circulating-tumour DNA. Cell-free DNA was directly compared against buffy coat DNA from the same patient as a reference, thus focusing on the cancer-related alterations. Previous studies found variable concordance between matching pairs of tumour and liquid biopsies, as recently reviewed [32,33]. This was not the focus of our study, which was on testing the utility of detecting specific chromosomal arm-level SCNAs using low-pass sequencing of cell-free DNA from patients with HCC. The prevalence of such SCNAs in cell-free DNA may be different to the tumour or the background liver tissue. Similarly, area under receiver operating curve analysis was not performed; the study was designed as a test of feasibility and not designed for in-depth analysis, which is the focus of currently ongoing work designed using PRoBE (prospective specimen collection, retrospective blind evaluation) [34].
There is variability in the literature regarding the optimal method of identifying a true copy number event [4,11,35]. The conventional z-score analysis is known to be affected by the depth of sequencing, due to its reliance on the standard deviation of the sequenced read density from the reference group [11,36-38]. We have generated fixed thresholds using TCGA sequencing data of matched non-cancer tissue obtained from patients diagnosed with HCC. The thresholds were validated on three independent datasets (TCGA dataset, an FFPE cohort, and a cell-free DNA cohort). More than 94% of 1q or 8q ratios obtained from non-cancer tissue (n = 7457) were within a tight range between 0.997 and 1.013, and with similar patterns observed across two independent datasets of patients with liver cirrhosis and no cancer sequenced in-house (Table 1 and Figure 5). This indicates a false-positive rate of less than 6% but much lower if both thresholds for 1q and 8q are crossed as this was not observed in any of the DNA tested across three independent control cohorts. On the other hand, either 1q or 8q threshold was crossed in 104/106 HCCs in validation cohort 1 and in 20/22 HCCs in validation cohort 2, while both 1q and 8q thresholds were crossed in 78/106 HCCs in validation cohort 1 and in 11/22 HCCs in validation cohort 2, which indicates a potentially satisfactory negative predictive value (supplementary material, Tables S7 and S8). The thresholds are easily reproducible, unaffected by the depth of sequencing, and external validation highlights the potential for promising accuracy (supplementary material, Table S9). A landmark study from Hong Kong investigating patients with hepatitis B virus found clear evidence of SCNAs in plasma DNA of 84.4% of patients who had HCC and 22.2% of patients with cirrhosis [11]. Our study showed similar results (Table 1) for patients with diverse background aetiologies and using low-pass sequencing.
Our study agrees with deep sequencing studies and SNP arrays reporting the common copy number features in HCC as shown in supplementary material, Figure S2 [10-12,14,39]. Two earlier studies investigating gene transcriptional profiles have proposed molecular markers targeted at discriminating stages of hepatocarcinogenesis. This included dysplastic nodules and early HCC. However, in both studies, most of the dysplastic nodules included were low-grade DNs (n = 10/16). Therefore, it is not clear if the same signatures would discriminate between high-grade DNs and early HCC [40,41]. The rate of HCC development is significantly higher in high-grade DNs than in low-grade DNs and regenerative nodules [8]. Moreover, an international consensus panel did not find difficulty in differentiating between low-grade DNs and early HCC. The diagnostic discrepancy arose in the discrimination of low-grade DNs from high-grade DNs and of high-grade DNs from well-differentiated HCC [1]. Our study included a considerable number of high-grade DNs (n = 28) aiming to address this issue. Previous studies identified the increase in chromosomal instability from DNs to HCC [3,5]. More recently, Torrecilla et al [4] reported copy number data on low-grade DNs (n = 14), high-grade DNs (n = 15), and small HCC (n = 17) and identified 1q+, 8q+, and 8q− as potential 'gate-keeper' events, due to their prevalence in dysplastic nodules. Our work agreed, characterising 1q+, 8p−, and 8q+, as well as 4q− and 17p−, as early events in hepatocarcinogenesis.
EASL clinical practice guidelines currently recommend biopsy for liver nodules larger than 1 cm that do not show typical HCC features on at least one out of two imaging modalities [23]. The American Association for the Study of Liver Diseases (AASLD) endorses the Liver Imaging Reporting And Data System (LI-RADS) and recommends biopsy for lesions classed as 'probably HCC (LR-4)' or 'malignant but not HCC (LR-M)' [42]. Histopathologists face inherent difficulties in discriminating between high-grade DNs and well-differentiated HCC [43,44]. EASL recommends the use of a panel of three immunohistochemistry (IHC) antibodies to aid the histological discrimination of high-grade DNs from well-differentiated HCC [23]. However, a recent collaboration of Eastern and Western expert pathologists validating the EASL recommended panel found a sensitivity of 52% and recommended two further IHC antibodies to raise the sensitivity to 93%, suggesting that this may be difficult to apply in routine clinical practice [9]. Our study showed that 1q+ and 8q+ were significantly more prevalent in high-grade DNs than in low-grade DNs. 1q+, 4q−, 8p−, and 8q+ were significantly more prevalent in well-differentiated HCC than in high-grade DNs. Our study suggests that identification of arm-level SCNAs (1q+, 4q−, 8p−, and 8q+) has the potential to improve the current inter-observer agreement on tissue histopathological distinction between high-grade DNs and well-differentiated HCC [1,9].
Our study made an attempt to characterise low-grade DNs, high-grade DNs, and well-differentiated HCC according to their broad chromosomal structure. The detection of such chromosomal features is technically simple and applicable in day-to-day clinical practice. For instance, cytogenetic analysis of BCR/ABL1 translocation, HER2 amplification, and ALK rearrangement is the current standard of care for guiding the management of chronic myeloid leukaemia [45], breast cancer [46], and lung adenocarcinoma [47], respectively. Unlike pre-malignant lesions in other cancers such as Barrett's oesophagus [48] and colonic polyps [49], high-grade dysplasia in the liver is currently treated similarly to any other 'low-risk' lesion, such as regenerative nodules. This could be related partially to the difficulty in identifying and discriminating different stages of premalignant hepatic nodules. With rising indications for targeted liver biopsy, and known histopathological challenges even amongst experts, incorporating cytogenetic examinations for 1q+ and 8q+ may provide more objective discrimination even for non-expert pathologists. Moreover, liquid biopsy is a future non-invasive tool, as such features can be detected using low-pass sequencing (3-4x coverage). This is significantly less costly and less time-consuming in comparison to deep sequencing, which is not required for the purpose of detection of key arm-level SCNAs. Low-pass sequencing is still likely to be superior to alternative techniques as it enables delicate discrimination of ratios between 0.99 and 1.01 using small quantities of starting DNA.
Our study had some limitations; most dysplastic nodules (n = 42/46) were extracted from livers that harboured HCC as well, and the results may be different in livers harbouring DNs without HCC. Conversely, such livers may or may not ever develop cancer and further longitudinal studies on the natural history of DNs are required to identify the most appropriate DNs for study of genomic predictors of malignant transformation. Torrecilla et al recently found significantly lower prevalence of arm-level SCNAs in DNs that were retrieved from cirrhotic livers that did not have cancer. The lower prevalence reported by Torrecilla et al may be related to the biology of such DNs or higher thresholds for calling SCNAs [4]. Moreover, a recent study based on phylogenetic analysis of single nucleotide variants and copy number profiles found evidence of independent growth of DNs and HCC within the same patient liver [50]. The majority, but not all (n = 34/44), of well-differentiated HCCs in validation cohort 1 were small (i.e. ≤2 cm; IQR = 11-21 mm). Our study was designed to discriminate histopathological features rather than nodule size. Our patient group in validation cohort 2 included generally more advanced cancers (only seven out of 22 were within Milan criteria, two of whom had liver transplantation and five were considered for transplants). Further work is required to determine whether early HCC which has less vascular invasion, or indeed dysplastic nodules, also releases cell-free DNA with detectable arm-level SCNAs into the peripheral circulation. Lastly, validation cohort 1 was all related to HCV, with a question about the generalisability, but the findings had no obvious correlation with aetiology in the cell-free DNA dataset.
In conclusion, 1q+ or 8q+ is associated with high-risk liver pathology, e.g. cancer or high-grade dysplasia, but not low-risk pathology, e.g. low-grade dysplasia and liver cirrhosis. Detection of 1q+, 4q−, 8p−, and 8q+ in the tissue may aid in distinguishing types of liver nodules and in subsequent decision making. Arm-level SCNAs can be detected in blood-derived cell-free DNA using low-pass sequencing, which may be useful as a tool for the surveillance, diagnosis, and monitoring of HCC in patients with cirrhosis.

SUPPLEMENTARY MATERIAL ONLINE
Supplementary materials and methods
Figure S1. Karyogram with associated density plot for sample representing normalised, smoothed, and segmented ratios
Figure S2. Cumulative frequency of genome-wide arm-level SCNAs in HCC tissue and cell-free DNA as well as dysplastic nodules
Figure S3. Tagged adaptors (referred to in Supplementary materials and methods)
Figure S4. Forward and reverse adaptors ligated to DNA inserts (referred to in Supplementary materials and methods)
Figure S5. Universal primer annealing with one of the two attachment sites to the flow cell (P5) (referred to in Supplementary materials and methods)
Figure S6. Index primer annealing to the enrichment PCR reaction mix; it adds the tag as well as the second flow-cell attachment site (P7) (referred to in Supplementary materials and methods)
Figure S7. Full DNA library design (referred to in Supplementary materials and methods)
Figure S8. The distribution of number of reads in test and reference genomes (referred to in Supplementary materials and methods)
Figure S9. Marked reduction in noise levels after trimming off poorly mapping windows (referred to in Supplementary materials and methods)
Table S1. Patient demographics
Table S2. Scanned slide identifiers, histopathology description, and source of tissue
Table S3. Sequencing. DNA quantity, DNA library quantity, quantity of library added to the pool, size selection, PCR enrichment, type of tag, paired sequencing, read length, sequencer model, total reads, and aligned reads
Table S4. Primer tags. Full primer sequences (mentioned only in Supplementary materials and methods)
Table S5. Tumour content of study cohorts. Predicted proportion of tumour DNA within the eluted pool of extracted DNA using ABSOLUTE
Table S6. Tumour content of the TCGA cohort. The proportion of tumour DNA within the eluted pool of DNA. Downloaded from Genomics Data Commons (https://gdc.cancer.gov/about-data/publications/pancanatlas)
Table S7. Test to reference ratio. The mean value of normalised and GC-corrected test to reference ratio for each autosomal arm
Table S8. Thresholds. A table outlining the 5th and 95th percentile thresholds for each chromosomal arm, below and above which losses and gains were called, respectively
Table S9. TCGA threshold validation sub-cohort
Table S10. Validation of frequent arm-level SCNAs on the blood-derived cell-free DNA cohort
Table S11. Data frame outlining normalised, smoothed, and segmented data within each window (referred to in Supplementary materials and methods)