Isolation and next generation sequencing of archival formalin‐fixed DNA

Abstract

DNA from archived organs is presumed unsuitable for genomic studies because of excessive formalin fixation. As next generation sequencing (NGS) requires short DNA fragments, and uracil-N-glycosylase (UNG) can be used to overcome deamination, there has been renewed interest in the possibility of genomic studies using these collections. We describe a novel method of DNA extraction capable of providing PCR amplicons of at least 400 bp in length from such excessively formalin-fixed human tissues. When compared with a leading commercial formalin-fixed DNA extraction kit, our method produced greater yields of DNA and reduced sequence variations. Analysis of PCR products using bacterial sub-cloning and Sanger sequencing from UNG-treated DNA unexpectedly revealed increased sequence variations compared with untreated samples. Finally, whole exome NGS was performed on a myocardial sample fixed in formalin for 2 years and compared with lymphocyte-derived DNA (as a gold standard) from the same patient. Despite the reduction in the number and quality of reads in the formalin-fixed DNA, we were able to show that bioinformatic processing by joint calling and variant quality score recalibration (VQSR) increased the sensitivity four-fold to 56% and doubled specificity to 68% when compared with a standard hard-filtering approach. Thus, high-quality DNA can be extracted from excessively formalin-fixed tissues, and bioinformatic processing can optimise the sensitivity and specificity of results. Sequencing of several sub-cloned amplicons is an important methodological step in assessing DNA quality.

Somatic mutations found in tumour biopsies are of prognostic importance and can be used to guide therapy in many cancers (Dancey et al., 2012). Unravelling the genetic basis of inherited disease is also reliant on DNA, and historical formalin-fixed (FF) or formalin-fixed paraffin wax-embedded (FFPE) samples may be the only source of DNA from affected but deceased family members. However, although only minimal fixation in formalin (preferably none) is suggested for tissues that might be used for genomic studies, in practice samples may have been fixed for weeks or months before embedding in paraffin wax. Moreover, there are many archives of congenitally malformed human hearts (and other organs) used for educational and anatomical research purposes (Crucean et al., 2017).
As the majority of genes causing congenital heart malformation still remain to be discovered, it is possible that these accurately phenotyped hearts, permanently stored in formalin to preserve integrity of the specimen, could provide a valuable resource to identify gene variants and to correlate causative mutations with morphological appearances.
FF tissues are not generally considered suitable for genetic/genomic analyses due to a range of corrupting effects directly attributed to formalin. One significant issue is the formation of methylene bridges between proteins and DNA (Metz et al., 2004).
However, these are potentially reversible (Hoffman et al., 2015) and in recently developed techniques such as chromatin immunoprecipitation (ChIP), formalin is specifically used to bond DNA fragments to relevant peptides and allow their isolation for sequencing (Nelson et al., 2006a). Formalin exposure also produces deamination of cytosine to uracil (Williams et al., 1999). Such deamination events occur during life, but uracil-N-glycosylase (UNG) normally excises the deaminated base, allowing DNA repair mechanisms to replace the missing base with reference to the complementary strand (Do, 2012). UNG is commercially available and has been suggested as a treatment for DNA extracted from FF material (Do, 2012). Historically, the major limitation to sequencing FF material was the extensive fragmentation of DNA (Suzuki et al., 1994), which prevents chain termination (Sanger) sequencing. However, next generation sequencing (NGS) technologies perform best with short-length templates: shearing or transposon-mediated cleavage efficiently produces fragments of 150-250 bp, which yield paired-end reads of 75 bases in length (Head et al., 2014).
Following our recent morphological study of congenitally malformed hearts (Crucean et al., 2017), we were asked by clinicians and anatomists whether this extensive collection of fully morphologically characterised organs, routinely stored in formalin, could be used to discover causative mutations by exome sequencing. In the absence of studies dealing with genomic interrogation of heavily formalin-fixed samples, we addressed this clinically important question. We first sought to isolate high-quality genomic DNA that would be suitable for both NGS and Sanger sequencing. Although successful isolation of DNA can be assessed in terms of yield and overall fragment length, for our studies we were more concerned with the efficacy of the extracted DNA as a template in the PCR reaction and the accuracy of the sequence obtained. As such, this study is the first to analyse systematically the distribution of errors in PCR amplicons through bacterial sub-cloning of transcripts. We also assessed the effect of enzymatic treatment with UNG and, surprisingly, found that a different range of artefacts is present in PCR amplicons after UNG is employed. Finally, we performed next generation exome sequencing on myocardial tissue that had been stored for 2 years in formalin and compared the results with those obtained from NGS of DNA freshly extracted from peripheral blood from the same patient. Although the high rates of false-positive results in FF-DNA preclude its use as a primary source of DNA for discovery of unknown causative genes, we suggest that it may be useful as a confirmatory DNA source.

| Tissue samples
A single piece of surgically discarded pulmonary artery, which had been stored in buffered formalin at room temperature for over 8 months, was used to investigate optimal methods for DNA extraction.
The comparison of genomic sequence by NGS used DNA extracted from a small biopsy of right ventricular myocardium taken from a heart that had been stored in buffered formalin for 2 years (following transplant operation). DNA obtained from peripheral blood lymphocytes was available from the same individual. In both cases, material was obtained from the Department of Cellular Pathology, Royal Victoria Infirmary, Newcastle upon Tyne, and studies were carried out under the approval of the Newcastle and North Tyneside 1, NRES Committee; REC Reference: 15/NE/0311.

| Tissue processing and digestion
Pieces of the pulmonary artery sample were cut into approximately 1-mm³ cubes, weighed and washed in glycine-Tris-EDTA (GTE) buffer (100 mM glycine, 10 mM Tris-HCl pH 8.0 and 1 mM EDTA) using a shaker water bath at 37°C, with changes of GTE buffer as indicated in experimental results. Tissues were then digested in 3 ml lysis buffer (1 M Tris-HCl pH 8.0, 10% sodium dodecyl sulphate [SDS], 0.5 M EDTA) supplemented with different quantities of proteinase K (Sigma-Aldrich) as indicated. Optionally, tissue was treated with 2 U uracil-N-glycosylase (UNG; New England Biolabs) at 50°C for 1 hr (Liu et al., 2013). Chelex resin (Bio-Rad) was added to a final concentration of 10% and the resulting solution was either heated to 100°C or autoclaved at 120°C and 980 mbar pressure for 10 min. After treatment with 40 µg ml⁻¹ RNase A (Sigma-Aldrich) for 5 min at room temperature, DNA was isolated by phenol-chloroform extraction, resuspended in 50 ml Tris-EDTA buffer and stored at −20°C (Green and Sambrook, 2012).
Extraction of DNA from right ventricular biopsy was performed according to this schedule, under the optimised conditions indicated in experimental results.
DNA was also extracted from the pulmonary artery sample using the GeneRead system (Cat. No. 180134, Qiagen Group, UK).

| Bacterial sub-cloning and Sanger sequencing
To facilitate Sanger sequencing, PCR products were inserted into the pGEM-T Easy Vector System II (Thermo Fisher Scientific, Inc.). Twenty colonies were sub-cloned and amplicons purified using the QIAprep Spin Miniprep Kit (Cat. No. 27104, Qiagen Group) according to the manufacturer's instructions. Plasmid insert DNA was subjected to Sanger sequencing using both forward SP6 (Cat. No. Q5011, Promega) and reverse T7 (Cat. No. Q5021, Promega) direction primers.

| Next generation exome sequencing
Libraries for NGS were produced using Illumina Nextera exome sequencing kits (Illumina). DNA obtained from formalin-fixed right ventricle was processed using the TruSeq Exome kit (Illumina, San Diego, CA, USA), and lymphocyte-derived DNA using the TruSeq Rapid Exome kit (Illumina), according to the manufacturer's instructions.
Quantification for NGS was carried out using the Genomic DNA ScreenTape Analysis kit on a 2200 TapeStation system.

| Analysis of exome sequencing
Bioinformatic analysis was carried out using the GATK 3.4 workflow on a high-performance computing cluster (Newcastle University). The bioinformatic processing scripts are available at https://github.com/AndrewSkelton/PID-WES-GATK3.4-SGE. Briefly, after reads were pooled and converted to BAM format, they were aligned to the hg19 genomic scaffold using BWA-MEM. Duplicate reads were identified, and realignment considering known insertions and deletions was carried out, using Picard Tools. Base quality score recalibration was then performed. As default settings, both hard-filtering and joint-calling analyses use data provided at the time of sequencing to limit analysis based on the quality of bases in the individual reads. This metric is the Phred (Q) score. For example, a score of Q30 describes a base with an error probability of 1 in 1,000, i.e. one that is likely to be 99.9% accurate. We were therefore able to re-run analyses at lower stringency (Q20: including data likely to be 99.0% accurate; Q10: data likely to be 90% accurate).
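The Phred relationship used above is simply Q = −10·log10(P). As an illustration (not part of the published pipeline), the conversions between Q score and error probability can be sketched in a few lines of Python:

```python
import math

def phred_to_error_prob(q: float) -> float:
    # Probability that the base call is wrong: P = 10^(-Q/10)
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    # Inverse conversion: Q = -10 * log10(P)
    return -10 * math.log10(p)

# Q30 -> 1-in-1,000 error (99.9% accurate); Q20 -> 1-in-100; Q10 -> 1-in-10
for q in (30, 20, 10):
    print(f"Q{q}: error probability {phred_to_error_prob(q):g}")
```

Lowering the acceptance threshold from Q30 to Q10 therefore admits bases that are only ~90% reliable, which is why additional downstream variant filtering becomes important.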

| Statistics
All experiments were run in triplicate and data are presented as mean and standard error of the mean (SEM). Statistical significance was evaluated using one-way ANOVA or unpaired t test (SPSS version 22, IBM). Only p < .05 was accepted as significant, and correction for multiple testing was included.

| Combined autoclaving and Chelex resin improve DNA quality
Prior to carrying out NGS, we first sought to extract genomic DNA that would produce optimal performance as a template for PCR. A standard method to isolate DNA from tissues is tissue digestion with proteinase K (PK), followed by heating to 100°C to inactivate the enzyme. After treatment with RNase, genomic DNA can then be recovered by salt precipitation or with a resin column for downstream applications (Green and Sambrook, 2012). Heating to a higher temperature has been suggested to improve DNA extraction (Idris, 2015) and we asked whether this, in combination with Chelex resin, might improve the quality of DNA extracted in comparison with previous methods (Legrand et al., 2001; Cao et al., 2003; Shi et al., 2004; Nelson et al., 2006b; Wang et al., 2013; Pandey et al., 2014; Idris, 2015). A single piece of human pulmonary artery stored in formalin for 8 months was used to evaluate different conditions for DNA extraction. Pieces were diced into 1-mm³ cubes and washed extensively in glycine-containing wash buffer. Digestion with repeated additions of PK (Cao et al., 2003; Shi et al., 2004) was then performed.
In comparison to the basic extraction protocol, there was a fourfold increase in DNA yield when samples were either autoclaved at 120°C or Chelex resin was employed (Figure 1b).
Surprisingly, when Chelex was combined with autoclaving, the yield was not significantly greater than from heating to 100°C alone (Figure 1b). In all cases, high-purity DNA (Nanodrop 260/280 absorption) was obtained (Figure 1c).
Gel electrophoresis and densitometry (Schindelin et al., 2012) were used to demonstrate the distribution of fragment lengths.
Having demonstrated the synergistic effect of Chelex resin and autoclaving on the quality of DNA obtained from FF tissue, we next asked whether it might be possible to reduce pre-washing and the amount of PK used, in order to reduce the time required and the cost of the procedure. Previous reports have suggested methods for extraction of genomic DNA from FF samples but, empirically, used extensive washing steps (Pandey et al., 2014) and large amounts of PK (Legrand et al., 2001; Pandey et al., 2014). To assess these requirements, we compared extensive washing and PK usage with protocols using less of each (Figure 2a). Reduction of washing from 72 to 24 hr duration did not affect the yield or quality of DNA obtained. Nor did washing at an increased temperature of 55°C (Legrand et al., 2001), or halving the amount of PK used, alter the yield, purity or fragment length distribution of the DNA isolated (Figure 2b-e). On the basis of these refinements, this final 'improved protocol' was used to isolate formalin-fixed DNA for the remaining studies (Figure 3).

| UNG improves DNA yield but does not improve the fidelity of amplicon sequence
UNG can be used to overcome the deamination effects of formalin by excising mutated uracil bases from the DNA strand (Do, 2012).
Used in vitro, this means that corrupted DNA template is severed, favouring amplification of intact template. Commercial kits designed for extraction of DNA from formalin-fixed tissues, for example the GeneRead system (Qiagen), utilise UNG in association with a PK digestion and resin-column DNA purification. We therefore evaluated the effect of adding UNG to our improved protocol and also compared these results with those obtained with the Qiagen GeneRead system.
Formalin fixation leads to corruption of sequence as well as severing of DNA (Williams et al., 1999; Do, 2012). We evaluated the extent of this sequence corruption, comparing our initial protocol with the improved protocol, with and without the addition of UNG.

| Comparison of next generation exome sequencing results from formalin-fixed tissue and lymphocyte-derived DNA
Finally, we asked how optimally extracted genomic DNA from formalin-fixed tissue (FF-DNA) would perform as a template for next generation exome sequencing, compared with lymphocyte-derived DNA (LD-DNA) (Figure 6b). Closer inspection indicated the SNPs were in keeping with formalin corruption effects, with G > A and C > T predominating (Figure 6c).

| Bioinformatic analysis impacts on sensitivity and specificity of variant calling
Once sequencing data have been pre-processed, they are ready to be compared with the human genomic scaffold, using HaplotypeCaller to identify genomic variants. Further filtering is required to remove artefacts due to erroneous reads. There are several ways to do this within the GATK toolkit and we were able to compare the performance of these on FF-DNA, using LD-DNA results as a 'gold standard'. Ideally, if FF-DNA were as reliable as LD-DNA, there would be complete matching of findings from both samples and therefore 100% specificity (no false positive or spurious findings) and 100% sensitivity (no false negative or missed findings). The first analysis compared variants produced by single-sample calling with those produced by joint calling.
We also took the opportunity to examine the effect of filtering based on the quality of the original sequencing reads, indicated by the Phred score (Q), which can be used to identify reads that have a high probability of exhibiting base-calling errors. For example, a score of Q30 describes a base with an error probability of 1 in 1,000, or likely to be 99.9% accurate, whereas a Q score of 10 (Q10) equates to a likely error rate of 1 in 10, or 90% accuracy. Normally Q30 is used, as high-quality genomic DNA is unlikely to contain many errors. However, in our case we wondered whether accepting a lower threshold for errors might maintain transcripts with errors in different places (as might be expected as a response to formalin fixation), providing more depth of accurate base sequence overall.
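The 'gold standard' comparison described here reduces to comparing two sets of variant calls. A minimal sketch follows (the variant identifiers are hypothetical; 'specificity' is used in the sense of the text, i.e. the fraction of FF-DNA calls that match the gold standard):

```python
def variant_concordance(gold: set, test: set) -> tuple:
    """Compare a test call set (e.g. FF-DNA) against a gold standard (e.g. LD-DNA).

    Returns (sensitivity, specificity): sensitivity is the fraction of
    gold-standard variants recovered (1.0 = no false negatives); specificity
    is the fraction of test calls that are true (1.0 = no false positives).
    """
    true_positives = gold & test
    return len(true_positives) / len(gold), len(true_positives) / len(test)

# Hypothetical variant IDs in chrom:pos:ref>alt form, purely illustrative
gold_calls = {"1:100:A>G", "2:200:C>T", "3:300:G>A", "4:400:T>C"}
ff_calls = {"1:100:A>G", "2:200:C>T", "5:500:G>T"}
sens, spec = variant_concordance(gold_calls, ff_calls)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")  # sensitivity 50%, specificity 67%
```

Stricter filtering shrinks the test set: removing false positives raises specificity, while removing true positives lowers sensitivity, which is the trade-off explored below.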
Using a Phred score of 10, separate VQSR analysis of lymphocyte DNA produced 66,402 variants, whereas FF-DNA produced 67,519 variants. The sensitivity and specificity were reduced to 55.8% and 58%, respectively. The increased stringency achieved by raising the Phred score to Q20 or Q30 had no effect on the number of variants reported in LD-DNA, but markedly reduced the number of variants reported from FF-DNA, to 58,709 at Q20 and 54,898 at Q30. This did not change the capability of the VQSR to identify variants correctly (sensitivity remained at 55.8%-55.9%) but efficiently reduced the number of false variants, increasing specificity from 58% to 63% and 67.4%, respectively. Taken together, these analyses indicate that group calling with the GATK pipeline is an effective method of filtering variants from FF-DNA and that there is additional benefit in using higher levels of stringency with FF-DNA.
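The effect of a Phred threshold can be illustrated on a single read. The sketch below assumes the standard Phred+33 (FASTQ) quality encoding and simply masks sub-threshold bases; it is illustrative only, and is not how GATK applies its quality filters internally:

```python
def mask_low_quality(seq: str, qual: str, q_threshold: int = 30) -> str:
    """Replace bases whose Phred+33-encoded quality is below the threshold with 'N'."""
    assert len(seq) == len(qual)
    return "".join(
        base if (ord(q_char) - 33) >= q_threshold else "N"
        for base, q_char in zip(seq, qual)
    )

# Hypothetical 4-base read: 'I' encodes Q40, '5' encodes Q20, '+' encodes Q10
print(mask_low_quality("ACGT", "II5+", q_threshold=30))  # ACNN
print(mask_low_quality("ACGT", "II5+", q_threshold=10))  # ACGT
```

At Q30 only the two highest-quality bases survive; at Q10 the whole read is retained, mirroring how a lower threshold keeps more (but less reliable) data in play.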

| DISCUSSION
In this study, we have optimised the extraction of DNA obtained from overly formalin-fixed human tissue and evaluated its performance as a template for chain termination (Sanger) sequencing and NGS. Proteinase K is a preferred reagent to digest tissue and liberate DNA for such molecular biology studies. However, the presence of residual formalin within tissue that might inactivate the enzyme has been a concern, and this has led to the production of protocols with extended washing and repeated addition of fresh reagent (Legrand et al., 2001; Cao et al., 2003; Shi et al., 2004; Pandey et al., 2014; Idris, 2015). This increases the time needed to extract DNA and also makes the process expensive. Our studies have shown that extensive washing is not required and that halving the amount of proteinase K used in published protocols does not affect the collection of high-quality DNA. It is likely that further reductions will be possible. Although our studies are focused on overly formalin-fixed and sub-optimal tissue samples, it is likely our protocols will be of value in other, less extremely corrupted, formalin-fixed samples. Following digestion, the extraction of DNA was enhanced both by autoclaving and by addition of the Chelex resin. Autoclaving provides more energy to separate covalent bonds (Shi et al., 2004) but the mechanism of action for Chelex is unclear. Chelex may sequestrate metal ions and protect from DNases (Walsh et al., 1991) or, alternatively, it may bind to and protect denatured single-strand template DNA, shielding it from fragments of complementary DNA that would normally interfere with the PCR reaction (Dietrich et al., 2013). The reduction in yield but improvement in PCR performance when both were used together may reflect greater stripping of contaminating fragments to reveal cleaner, but less abundant, high-quality template.
Thus, the efficacy of a DNA extraction protocol is reflected less by the yield or fragment length of the DNA collected and more by the length of the amplicons that can be generated, which closely matches the usable fragment length of the template genomic DNA. It is also possible that the quantity of DNA extracted from formalin-fixed tissues may be less accurately measured by processes such as spectrophotometry because of variable fragmentation.
Bacterial sub-cloning allowed us to identify clearly the range of transcript sequences and thus accurately determine the fidelity of sequence in the PCR template. Although the fragmentation of DNA produced by formalin exposure cannot be reversed, it has been suggested that the corruption of DNA by deamination to produce uracil residues can be overcome by treatment with UNG (Do, 2012). Our findings, however, indicate that deamination is unlikely to be responsible for the observed sequence errors.
However, it is clear that the results obtained using our improved protocol of DNA extraction, with and without UNG, are very similar.
Thus until the biology of UNG is more fully understood, there is no advantage from its use.
NGS is now a relatively cheap and rapid method of reading nucleic acids by generating millions of short overlapping sequences and it should be ideally suited to fragmented DNA, as shearing of DNA into short fragments is a requirement of the process. However, the un-corrupted DNA template obtained from FF tissues is diluted by other template fragments of inherent poor quality, leading to a reduced number of successful reads and more duplicates. Filtering high-quality reads from corrupted reads is essential and highly dependent on bioinformatic processes. The joint calling approach using VQSR and an exome catalogue is extremely useful. In this study, we have shown that although sensitivity cannot be improved by joint calling and variant quality score recalibration, changing the stringency of read quality by increasing Phred score filtering to Q30 has a major effect on the specificity.
Whereas for population-based studies some degree of error due to formalin corruption can be tolerated, and differences between groups can still be shown to be statistically significant, this is not appropriate for the analysis of individuals. With a maximal specificity of 67% and sensitivity of 56%, there is a major risk of missing a disease-causing variant or, conversely, suggesting one in error. Historically, formalin-fixed archival hearts have been used to investigate genes causing congenital heart malformations. For example, the Leipzig collection of malformed hearts comprises 292 formalin-fixed human hearts collected between 1954 and 1982 (Craatz et al., 2002) and has been used for genetic analysis: analysis of hearts within the collection suggested that a mutation in HAND1 (A126fs) might cause hypoplastic left heart syndrome (Reamon-Buettner et al., 2008).
Although the authors took care to determine whether this was a true finding, in retrospect it seems likely that it was an artefact of formalin exposure and unlikely that the variant was responsible for the malformation in the archived hearts. Notably, their study suggested that many hearts with different forms of hypoplastic left heart syndrome had the same A126fs variant in HAND1, and their findings have not been confirmed by subsequent genomic studies (Esposito et al., 2011) or animal modelling of the variant (Firulli et al., 2017).
Our study has shown that it is possible to extract DNA from formalin-fixed tissues for genomic studies involving Sanger sequencing and NGS. Importantly, careful bioinformatic processing can reduce false negative findings, but approximately half of all variants will not be detected owing to damage to the genomic DNA by formalin. We would therefore recommend that formalin-fixed tissue be used as a confirmatory source, for example when a specific variant has already been suggested by its presence in several family members; in isolation, DNA from formalin-fixed tissues is not a reliable source of novel variants. Finally, we suggest the use of bacterial sub-cloning and Sanger sequencing of PCR amplicons to confirm variants of possible clinical relevance.