In this review, the use of automated DNA sequencing techniques to determine the sequence specificity of compounds that interact with DNA is discussed. The sequence specificity of a DNA-damaging agent is an essential element in determining the cellular mechanism of action of a drug. A number of DNA-damaging compounds are mutagenic and carcinogenic, and several are widely used as cancer chemotherapeutic agents. The distribution of lesions in a sequence of DNA can give vital clues in the determination of the precise mechanism of interaction of the agent with DNA. The DNA sequence specificity of a number of DNA-damaging agents has been delineated using automated DNA sequencing technology, and these studies are discussed in this review. The current state-of-the-art methodology involves capillary electrophoresis with laser-induced fluorescence detection, usually on an Applied Biosystems ABI 3730 capillary sequencer. Compared with previous methods, this technique has higher resolution, greater sensitivity, higher precision and more rapid separation times, and it is safer and easier to perform. The two main methods to determine the DNA sequence selectivity of compounds that interact with DNA are described: end labelling and the polymerase stop assay. The interaction of the antitumour drug, bleomycin, with DNA is utilized to illustrate the recent technological advances.
capillary electrophoresis with laser-induced fluorescence
The determination of the DNA sequence specificity of a DNA-damaging agent is crucial in understanding the cellular mechanism of action of a drug. A large number of DNA-damaging agents have been shown to have mutagenic and carcinogenic properties, and several are in widespread clinical use as antitumour drugs. The sequence selectivity of a DNA-damaging agent can be an essential component in elucidating the exact mechanism of interaction of the agent with DNA. The DNA sequence specificity of a number of mutagenic and carcinogenic compounds has been determined including UV light, DMS and other alkylating agents (1). Similarly, the DNA sequence specificity of several DNA-damaging agents that are clinically used as antitumour drugs has been determined including cisplatin, bleomycin and chlorambucil (1). These studies have produced important information on the mode of action of these drugs as mutagenic, carcinogenic and cancer chemotherapeutic agents.
There are two main methods to determine the DNA sequence selectivity of compounds that interact with DNA: end labelling (Figure 1A) and the polymerase stop assay (linear amplification) (Figure 1B). The end-labelling technique can detect DNA damage caused by agents that result in strand breakage (or that can easily lead to strand breakage). The polymerase stop assay can detect DNA damage that does not cause strand breakage but can hinder the passage of DNA polymerase (or other enzyme).
With the end-labelling technique, a sequence of DNA is modified at one end to incorporate an easily detectable label, for example a radioactive atom or a fluorescent molecule (Figure 1A). The usual fluorescent label is 6-FAM (6-carboxyfluorescein). The usual radioactive label is 32P, which can be added either with [γ-32P]-ATP and polynucleotide kinase or with [α-32P]-dNTP and DNA polymerase. It is important that only one end of the DNA molecule is labelled, and it is straightforward to achieve this at the 5′-end via PCR with a single 5′-labelled oligonucleotide. However, it is more difficult to label only one 3′-end, and restriction enzyme cleavage and gel purification are usually required to achieve this. There is a PCR technique that can avoid this latter problem with the correct choice of primers (2,3).
With the polymerase stop assay, a labelled oligonucleotide is extended by DNA polymerase until polymerization terminates at the site of adduct formation (Figure 1B). The technique depends on the efficiency of DNA polymerase termination by the adduct. Cisplatin adducts, for example, can prevent the passage of DNA polymerase at a frequency of >90% (4). A further enhancement of the technique uses thermal cycling to linearly amplify the extension products to achieve a greater signal (5–7). Other enzymes can be used apart from DNA polymerases, such as RNA polymerase (8), λ exonuclease (9) and exonuclease III (10). The linear amplification technique can be utilized to examine the DNA sequence specificity in intact mammalian cells (11,12).
With both techniques, a means of determining the precise site of DNA damage is required. This is usually provided by DNA sequencing reactions: Maxam–Gilbert chemical sequencing for end labelling (Figure 2) and Sanger dideoxy sequencing for the polymerase stop assay. In the latter case, the same primer oligonucleotide is used for both DNA sequencing and the polymerase stop assay.
Figure 2 illustrates the information that can be obtained from a DNA sequence specificity experiment. Following on from Figure 1A, the bleomycin-cleaved DNA fragments were electrophoresed on a capillary gel, and the precise size and intensity of each fragment were established (Figure 2). With reference to the Maxam–Gilbert G+A sequencing reactions, the exact position of bleomycin DNA damage can be elucidated (Figure 2B). From these data, the DNA sequence specificity of bleomycin was determined to be 5′-GT-3′ and GC, with lesser cleavage at GA, GG and AT. This type of DNA sequence specificity information provides precise data concerning drug–DNA interactions and also enables a more detailed understanding of the mechanism of action of bleomycin to be deduced.
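The step from peak positions and intensities to a dinucleotide preference can be illustrated with a minimal sketch. It assumes that each damage peak has already been assigned to a nucleotide position and that the cleaved nucleotide is the 3′ base of the dinucleotide; the function names, the short sequence and the intensities are hypothetical and are not data from Figure 2.

```python
from collections import defaultdict


def dinucleotide_cleavage(sequence, intensities):
    """Sum cleavage intensity by dinucleotide.

    sequence    -- 5'->3' string of the analysed strand (hypothetical input)
    intensities -- dict mapping the 0-based index of the cleaved nucleotide
                   to its peak intensity (arbitrary units)

    Assumption: the cleaved nucleotide is the 3' base of the dinucleotide.
    """
    totals = defaultdict(float)
    for pos, value in intensities.items():
        if pos == 0:
            continue  # no 5' neighbour, so no dinucleotide can be assigned
        totals[sequence[pos - 1:pos + 1]] += value
    return dict(totals)


def rank_dinucleotides(totals):
    """Return (dinucleotide, summed intensity) pairs, strongest first."""
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Made-up illustration: three damage peaks in a nine-base sequence.
example_totals = dinucleotide_cleavage("AGTCGCAGA", {2: 50, 5: 30, 8: 5})
example_ranking = rank_dinucleotides(example_totals)
```

In a real analysis the summed intensities would normally also be divided by the number of times each dinucleotide occurs in the sequence, so that frequent dinucleotides are not over-represented.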
Another technique that can be used with an end-labelled sequence of DNA is a ‘footprinting’ or protection assay (13,14). ‘Footprinting’ is utilized to determine the sequence selectivity of compounds that do not form covalent adducts with DNA, for example protein transcription factors, intercalating agents and minor groove binders. With ‘footprinting’, a DNA-damaging agent, such as DNase I or DMS, is utilized to evenly damage DNA along an end-labelled sequence. When a protein or other agent is bound to DNA, it will protect the part of the DNA sequence where it is bound from the DNA-damaging agent. Hence, the DNA-binding site will be revealed by a diminution of DNA damage in that area, that is a ‘footprint’. ‘Footprinting’ suffers from the drawback that a site must be saturated to obtain a good signal and minor sites may go unnoticed.
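The idea of a footprint as a local diminution of damage can be sketched by comparing the damage profile obtained with and without the bound ligand. The function name, the 0.5 ratio threshold and the example values are illustrative assumptions only; they are not taken from the cited studies.

```python
def footprint_positions(control, bound, threshold=0.5):
    """Return positions protected from the damaging agent.

    control   -- dict of position -> damage intensity without the ligand
    bound     -- dict of position -> damage intensity with the ligand bound
    threshold -- hypothetical cut-off: a position is called protected when
                 its damage falls below this fraction of the control value
    """
    protected = []
    for pos in sorted(control):
        reference = control[pos]
        if reference <= 0:
            continue  # no damage in the control, so protection cannot be judged
        if bound.get(pos, 0.0) / reference < threshold:
            protected.append(pos)
    return protected


# Made-up profiles: damage is strongly reduced at positions 11 and 12.
control_profile = {10: 100.0, 11: 100.0, 12: 100.0, 13: 100.0}
bound_profile = {10: 95.0, 11: 20.0, 12: 30.0, 13: 90.0}
protected_sites = footprint_positions(control_profile, bound_profile)
```

A fixed ratio threshold also illustrates the drawback noted above: a weakly occupied minor site reduces the damage only slightly and would not pass the cut-off.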
A further approach with an end-labelled sequence of DNA is enzymatic detection of lesions. For example, endonuclease cleavage can be used to identify pyrimidine dimers induced by UV irradiation (15).
The Investigation of DNA-Damaging Agents Using Automated DNA Sequencers
In 1986, the advent of commercial automated DNA sequencers was a major milestone in DNA sequencing technology (16,17) that eventually led to the human genome project. The automated sequencers were also widely used in genetic polymorphism studies where deletions or insertions could be easily determined by the base pair resolution power of the automated machines. However, the automated sequencing technology could also be adapted to other less obvious uses, and this is the subject of this review: the determination of the DNA sequence specificity of compounds that interact with DNA, utilizing automated DNA sequencers.
With the introduction of automated DNA sequencers, the main change was the substitution of a fluorescent label for the previously used radioactive labelling. This had obvious safety benefits as well as greater sensitivity and precision, with the introduction of laser-induced fluorescence detection replacing film-based autoradiography (Table 1). It also allowed four different fluorescent ‘colours’, representing the four DNA bases, to be run in the same lane. A subsequent major development was the introduction of capillary electrophoresis instead of slab gel electrophoresis that resulted in further increased sensitivity and precision (18–20). The type of polymer used in the capillaries was a problem until Applied Biosystems developed the POP7 polymer matrix that minimized DNA–capillary interactions (21).
Table 1. The advantages of CE-LIF with an automated DNA sequencer compared with previous manual technology using radioactive labelling and slab gel electrophoresis

| Before the advent of the automated DNA sequencer | Current usage of CE-LIF with an automated DNA sequencer |
| --- | --- |
| Radioactive labelling (usually 32P) has safety concerns | Fluorescent labelling has no safety implications |
| Quantification of band intensity difficult | GeneMapper software makes quantification of peaks straightforward |
| Slab gel electrophoresis | Capillary electrophoresis |
| Lower resolution (nearest bp) | Higher resolution (0.1 bp) |
| Shorter DNA sequences can be analysed (100–200 bps) | Longer DNA sequences can be analysed (500–800 bps) |

CE-LIF, capillary electrophoresis with laser-induced fluorescence.
The first example of the use of an automated DNA sequencer to determine the DNA sequence specificity of a DNA-damaging agent was in 1991 (22). The authors used an Applied Biosystems 370A automated DNA sequencer with polyacrylamide slab gels to investigate the DNA sequence selectivity of dimethyl sulphate damage in a fluorescently end-labelled PCR product. The N7 methylated sites on the DNA were converted to strand breaks by treatment with piperidine. They stated at the end of their article: ‘The full potential of this technique becomes even more obvious when the large number of drugs that interact in a sequence-specific fashion with DNA are considered. In this case, this technique permits the analysis of DNA–drug interactions at unusually low drug concentrations. Moreover, minor binding sites can also be revealed. In addition, the rapid analysis of a large number of samples permits the examination of diverse sequences in an efficient manner’ (22). However, the research community did not immediately embrace this technique and continued to use radioactive labelling with conventional polyacrylamide gels to determine the DNA sequence specificity of DNA-damaging agents (1).
In a subsequent article, Glickman’s group used the technique to examine the distribution of UV-induced adducts in a fluorescently end-labelled PCR product and the correlation with the mutational spectrum (15). They utilized a polymerase stop assay with T7 DNA polymerase, the 3′-5′-exonuclease activity of T7 DNA polymerase and Micrococcus luteus UV endonuclease to detect the UV-induced adducts. They further refined this technique in another gene system (23). A fluorescently end-labelled PCR product was also employed to examine the sequence specificity of several alkylating agents (24).
The DNase I footprinting technique was modified for the automated DNA sequencer with a fluorescently end-labelled PCR product to examine the sequence selectivity of DNA-binding proteins (25). The precise site of ribozyme cleavage was determined using a fluorescently end-labelled substrate and an automated DNA sequencer (26). It has also been utilized to probe the secondary structure of a rRNA sequence (27).
With regard to other techniques, Hardman and Murray (28) employed thermal cycling (linear amplification) and Taq DNA polymerase to investigate the sequence specificity of hedamycin and cisplatin on an automated DNA sequencer. This methodology permitted a longer DNA sequence to be analysed with higher precision and accuracy.
The ligation-mediated PCR technique was adapted for the automated DNA sequencer and is capable of analysing the sequence specificity of DNA damage in a single-copy mammalian gene at base pair resolution (29). It has been used to study the sites of alkylation, mutation spectra and their repair in yeast cells (30) and transcription factor footprinting in rat cells (31).
The technology was improved by the introduction of capillary electrophoresis (Table 1). Capillary electrophoresis had significant advantages compared with conventional slab gel electrophoresis technology: higher resolution, greater sensitivity, higher precision, more rapid separation times, more efficient loading procedure and minimal postelectrophoresis manipulation (32).
In 2000, the protein-DNase I footprinting technique was updated for use with the capillary-based ABI Prism 310 DNA sequencer (14). The DNase I footprinting method was further updated in 2006 to provide a more accurate sizing of the cleavage products using dideoxy sequencing reactions (33). That study also compared the fluorescent-labelling technique with the previously used 32P radioactive-labelling method and found very similar results from both methods; however, the fluorescent-labelling technique has the several advantages described above. The DNase I footprinting technique has been further automated with a 96-well format to provide faster screening of compounds (34). In vivo and in vitro nucleosome footprinting in yeast cells has been performed with an automated DNA sequencer (35).
Capillary electrophoresis with laser-induced fluorescence (CE-LIF) has also been used to examine the GC/AT preference of DNA-binding agents (36,37), the levels of DNA damage in mammalian cells (38), the levels of abasic sites in DNA after damage by alkylating agents (39) and the accurate sizing of DNA fragments produced from alkylation by styrene oxide and subsequent cleavage (40).
More recently, using the ABI 3730 automated DNA capillary sequencer, the DNA sequence specificity of cisplatin and four analogues has been determined in a DNA sequence containing telomeric repeats (41,42). The sequence specificity of bleomycin damage in fluorescently end-labelled DNA containing 17 telomeric repeats has been determined using CE-LIF (3). For both cisplatin and bleomycin, intense damage occurred in the telomeric repeat regions.
The Current Situation
The current state-of-the-art technique to determine the sequence specificity of DNA-damaging agents would utilize fluorescent labelling and capillary electrophoresis on the ABI 3730 automated sequencer. For the polymerase stop assay, a fluorescently labelled primer with thermal cycling (linear amplification) would be employed. The most appropriate size standards are dideoxy sequencing reactions using the same fluorescently labelled oligonucleotide as primer. For work with the ABI 3730 automated DNA capillary sequencer, 6-FAM (6-carboxyfluorescein) is the preferred fluorescent label, because FAM-labelled primers can be synthesized economically, unlike the more expensive proprietary VIC, NED or PET fluorophores.
For the end-labelling technique, a fluorescently labelled PCR product would be used as the target DNA sequence (Figure 2) (3,40). For a 5′-end-labelled PCR product, a FAM-labelled oligonucleotide and an unlabelled oligonucleotide would be used as primers. For 3′-end labelling, fluorescent dUTP can be incorporated only at one 3′-end using specific primers and is the method of choice for generating fluorescently 3′-end-labelled DNA (2). Nguyen and Murray recently provided an improvement on this method (3). The most appropriate size standards are Maxam–Gilbert chemical sequencing reactions with the fluorescently labelled PCR product; the G+A reaction is the easiest and safest to perform (Figure 2) (43).
For protein ‘footprinting’, a fluorescently end-labelled PCR product would be the target DNA sequence (33). Other techniques to determine sequence selectivity could also be used with automated sequencers including ligation-mediated PCR (29–31), terminal transferase-dependent PCR (44), RNA polymerase (8) and exonuclease digestion (9,10).
Problems with the Use of CE-LIF and Automated DNA Sequencers
There are a number of potential problems with the use of CE-LIF with an automated DNA sequencer to determine the sequence specificity of DNA-damaging agents. These include the following: (i) correct sizing of products containing differing DNA sequences, (ii) comparing different fluorescent labels, (iii) overly damaged DNA with more than one lesion on the same DNA strand, (iv) artefact bands and (v) the precise chemical identity of the 5′- or 3′-ends. These problems are discussed in the following sections.
Correct sizing of products containing differing DNA sequences
During the process of capillary electrophoresis, the DNA molecules are separated as they pass through a sieving matrix, with smaller DNA fragments moving faster than larger DNA fragments (40). The problem of determining the correct size of a DNA fragment in capillary electrophoresis occurs as a result of the sizing method that the GeneMapper software (Applied Biosystems, Mulgrave, Australia) uses on the ABI sequencers. It involves LIZ fluorescently labelled size markers that are a different DNA sequence from the analysed sequence. Because DNA sequences of the same length (same number of nucleotides) but with a different DNA sequence can migrate with a different mobility on capillary gels, there can be a discrepancy between the actual size and the reported size from the GeneMapper software. This discrepancy can be as large as six bases (40). A correction table and algorithm have been used to reduce the discrepancy to <1 bp (40,45). However, the simplest way that this problem can be overcome is the use of Maxam–Gilbert or dideoxy DNA sequencing reactions on the same sequence of DNA being analysed, as DNA size markers on the capillary gels (Figure 2B) (40). In this manner, because the DNA sequencing reactions are performed on the same DNA sequence, this sizing problem is resolved.
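Sizing against a sequencing ladder run on the same DNA can be sketched as an interpolation between flanking ladder peaks, which is needed because a G+A ladder has no peak at pyrimidine positions. This is a minimal illustration under stated assumptions: the function name, the use of linear interpolation and the example values are hypothetical, and the migration values stand in for whatever unit the instrument software reports.

```python
from bisect import bisect_left


def position_from_ladder(ladder, peak):
    """Estimate the nucleotide position of a damage peak.

    ladder -- list of (migration, position) pairs from a sequencing reaction
              performed on the same DNA sequence (hypothetical input)
    peak   -- migration value of the damage peak, in the same units

    Returns the interpolated position rounded to the nearest nucleotide,
    or None when the peak lies outside the range covered by the ladder.
    """
    ordered = sorted(ladder)
    migrations = [migration for migration, _ in ordered]
    index = bisect_left(migrations, peak)
    if index < len(ordered) and migrations[index] == peak:
        return ordered[index][1]  # exact match with a ladder peak
    if index == 0 or index == len(ordered):
        return None  # no flanking ladder peaks, so no interpolation
    (m0, p0), (m1, p1) = ordered[index - 1], ordered[index]
    return round(p0 + (peak - m0) * (p1 - p0) / (m1 - m0))


# Made-up ladder: purines at positions 98, 100 and 101.
ga_ladder = [(101.3, 98), (103.2, 100), (104.1, 101)]
```

For this ladder, a damage peak at 102.2 falls between the peaks for positions 98 and 100 and is assigned to position 99, whereas a peak at 110.0 lies beyond the ladder and is left unassigned.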
Comparing different fluorescent labels
The fluorescent dyes that are used to label DNA can vary greatly in molecular weight and chemical identity. Hence, a DNA sequence that is labelled with different fluorophores will have different mobilities during electrophoresis. For the traditional DNA sequencing capability of the automated sequencers, the software has an algorithm that corrects for these differing mobilities. However, for DNA sequence specificity determination experiments, no reliable algorithms are available. To overcome this problem, because capillary-to-capillary variation is minimal in a particular electrophoresis run, a single fluorophore can be used and each experimental assay can be in a separate capillary. The LIZ size markers (for ABI machines) can be aligned and used to calibrate and standardize each electrophoresis run. In this manner, an experiment with several assays can each be performed in a different capillary in the one electrophoresis run. Generally, there would be several blanks, DNA sequencing reactions as size markers and DNA damage reactions in different capillaries in the one experiment that is electrophoresed at the same time. The GeneMapper software fluorescence traces can then be overlapped and used to analyse the experiment.
Overly damaged DNA with more than one lesion on the DNA strand
A problem arises if the damaged DNA strand has two lesions on the same DNA molecule. Because there is a label at one end of the DNA strand, only the lesion closest to the labelled end is detected (8,46–49). Hence, there is a bias towards shorter fragments that are closer to the labelled end of the molecule and larger fragments ‘appear’ to be damaged to a lesser extent. This problem can be ameliorated by not over-damaging the DNA to give a lower level of damage. However, this will lead to a low signal intensity of DNA damage and a lower signal-to-noise ratio. To overcome this problem, a correction algorithm has been developed to quantitatively allow for two lesions (or more) on the one DNA strand (8,47). This correction algorithm apportions a fraction of the damage closer to the primer, to damage sites further away from the primer using the following formula:
where An is the apparent percentage of damage at a particular base pair, Cn is the corrected percentage of damage, n is the position of the DNA damage band, and the closest damage band to the labelled end is n = 1 and for n = 1, Cn = An.
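The logic of such a correction can be sketched as follows. The expression used here, C_n = 100 × A_n / (100 − (A_1 + … + A_(n−1))), is an assumption adopted for illustration: it follows from supposing that a lesion at site n is only detected on strands with no lesion closer to the label, and it satisfies C_1 = A_1, but it may differ from the published formula (8,47). The function names and the example values are hypothetical.

```python
def apparent_from_true(true_percent):
    """Forward model: apparent damage when only the nearest lesion is seen.

    true_percent -- true percentage of damage at each site, ordered from the
                    site closest to the labelled end (n = 1) outwards
    """
    apparent = []
    remaining = 100.0  # percentage of strands with no lesion detected so far
    for value in true_percent:
        seen = value * remaining / 100.0
        apparent.append(seen)
        remaining -= seen
    return apparent


def correct_damage(apparent_percent):
    """Assumed correction: C_n = 100 * A_n / (100 - sum of A_i for i < n)."""
    corrected = []
    cumulative = 0.0
    for value in apparent_percent:
        remaining = 100.0 - cumulative
        if remaining <= 0:
            raise ValueError("all strands are already accounted for")
        corrected.append(100.0 * value / remaining)
        cumulative += value
    return corrected


# Made-up case: three sites, each truly damaged on 10% of strands.
observed = apparent_from_true([10.0, 10.0, 10.0])
recovered = correct_damage(observed)
```

In this made-up case the apparent damage declines with distance from the label (10%, 9% and 8.1%), and the correction restores 10% at every site, mirroring the bias and its removal described for the telomeric repeats.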
As an example, bleomycin DNA damage in a fluorescently end-labelled PCR product containing seventeen telomeric repeats, (GGGTTA)17, is shown in Figure 3. As can be observed, the bias is towards the shorter fragments and there is a gradual decline in the intensity of the damage bands as the fragments lengthen from T1 to T17 (Figure 3). Upon application of the correction algorithm, the band intensity associated with the seventeen telomeric repeats became very similar (Figure 4). This shows that the problem of more than one lesion on the same DNA strand can be overcome by the use of a correction algorithm.
Artefact bands
In our preliminary experiments with the Taq DNA polymerase/polymerase stop/linear amplification system on the automated DNA sequencer, a number of artefact bands that were not associated with DNA sequences were observed. These high-intensity bands were a problem because they obscured the damage bands. Using an ethanol precipitation step, these artefact bands could be removed (41). It is likely that the artefact bands are FAM fluorophores attached to nucleotides and/or oligonucleotides. The 5′- to 3′-exonuclease activity of Taq DNA polymerase has been shown to cleave oligonucleotides that are partially hybridized to DNA (50). It does not cleave off single nucleotides but rather cleaves off small oligonucleotides; hence, there are a number of artefact bands, consisting of small fluorescently labelled oligonucleotides, that migrate at anomalous relative mobilities. These artefact peaks can be identified because they have a broad symmetrical peak.
The precise chemical identity of the 5′- or 3′-ends
The capillary gels used in automated DNA sequencers can have problems with the mobilities of certain chemical groups. If the damaged DNA has different chemical ends, then the mobilities can vary depending on the precise molecular identity of the end of the DNA molecule. For example, for DNA molecules containing the same number of nucleotides, compared with DNA strands with a 3′-OH end, dideoxy sequencing products with a 3′-H end migrate on average 0.12 bps faster, a 3′-phosphate end group migrates on average 0.63 bps faster, and a 3′-phosphoglycolate end group migrates on average 4.4 bps faster (T.V. Nguyen and V. Murray, manuscript in preparation). These are average values, but there is also variation at different parts of the DNA molecule, with relative differences of up to 6 bps for the 3′-phosphoglycolate group. Hence, it is very important to use a molecular weight size standard that is chemically exactly the same at both the 5′- and 3′-ends in a DNA sequence specificity damage experiment. These mobility problems are not as crucial in traditional polyacrylamide–urea DNA sequencing gels, where the discrepancy is generally <1 bp (51).
In our work with bleomycin, we were able to examine the cleavage products with 3′-end labelling but not with 5′-end labelling (3). This is because with 3′-end labelling, bleomycin cleaves to give a 5′-phosphate, as do the Maxam–Gilbert G+A sequencing reaction (used as a size standard) and restriction enzyme digests. Hence, all the 5′-ends are chemically the same and a DNA sequence specificity experiment can be accurately performed. However, with 5′-end labelling, bleomycin cleaves to give a 3′-phosphoglycolate, the Maxam–Gilbert G+A sequencing reaction results in a 3′-phosphate, restriction enzyme digests have 3′-OH ends, and dideoxy sequencing reactions have a 3′-H end. As these 3′-ends can have different mobilities of up to 6 bps, an accurate DNA sequence specificity experiment cannot be performed on a capillary automated DNA sequencer. In this situation, an estimate with an error of several base pairs is the best that can be accomplished. To obtain an accurate determination of the DNA sequence specificity, the 3′-phosphoglycolate moiety could be chemically converted into a 3′-phosphate or 3′-OH (52). Alternatively, switching to a polymerase stop assay could be the optimal strategy in this case.
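The size of the mismatch between a sample and its size standard can be worked through using the average mobility differences quoted above. This is a small arithmetic sketch only: the function name is hypothetical, and the values are averages relative to a 3′-OH end, so they ignore the position-dependent variation of up to 6 bps.

```python
# Average extra mobility (in bps, relative to a 3'-OH end) quoted in the text.
AVERAGE_SHIFT_BPS = {
    "3'-OH": 0.0,
    "3'-H": 0.12,
    "3'-phosphate": 0.63,
    "3'-phosphoglycolate": 4.4,
}


def sizing_error(sample_end, standard_end):
    """Average number of bps by which a sample fragment runs ahead of a
    size-standard fragment of the same length but a different 3'-end."""
    return AVERAGE_SHIFT_BPS[sample_end] - AVERAGE_SHIFT_BPS[standard_end]


# Bleomycin product (3'-phosphoglycolate) against a Maxam-Gilbert G+A
# marker (3'-phosphate), as arises with 5'-end labelling.
bleomycin_vs_maxam_gilbert = sizing_error("3'-phosphoglycolate", "3'-phosphate")

# Same comparison against a dideoxy sequencing marker (3'-H).
bleomycin_vs_dideoxy = sizing_error("3'-phosphoglycolate", "3'-H")
```

On these averages, a bleomycin cleavage product would run about 3.8 bps ahead of a Maxam–Gilbert marker of the same length and about 4.3 bps ahead of a dideoxy marker, which is consistent with the error of several base pairs noted for 5′-end labelling.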
The use of automated DNA sequencers to determine the sequence specificity of DNA-damaging agents results in a more precise and accurate assessment than previous manual sequencing methods. The use of capillary-based automated DNA sequencers, for example the ABI 3730, has further advanced the technique to provide a more automated process with higher resolution, greater sensitivity, higher precision, more rapid separation times, a more efficient loading procedure and minimal postelectrophoresis manipulation. These newer techniques have been utilized to determine the detailed DNA sequence specificity of several biologically relevant DNA-damaging agents. In addition, a number of problems associated with the technique can be overcome utilizing various methodologies.
In the near future, next-generation massively parallel DNA sequencers, such as those produced by Illumina (short read) or Roche (long read), could be utilized (53,54). However, there are a number of technical issues that need to be overcome before this technology can be used for DNA sequence specificity studies. The main problem is the preparation of the DNA libraries, which requires a free 3′-OH group; as mentioned above, some agents do not cleave DNA, and others leave a blocking group at the 3′-end (for example, bleomycin). These problems can be surmounted, and it is expected that this technology will be used for DNA sequence specificity studies in the coming years.