Speed-up of DNA melting algorithm with complete nearest neighbor properties



We describe an optimized algorithm, faster and more accurate than previously described algorithms, for computing the statistical mechanics of denaturation of nucleic acid sequences according to the classical Poland-Scheraga type of model. Nearest neighbor thermodynamics is included in a complete and general way by rigorously treating nearest neighbor interactions, helix end interactions, and isolated base-pairs. This avoids the simplifications of previous approaches and achieves full generality and controllability with respect to thermodynamic modeling. The algorithm computes subchain partition functions by recursion, from which various quantitative aspects of the melting process, such as the base-pairing probability profiles, are easily derived. It reduces the algorithmic complexity of the partition function algorithm of Yeramian et al. (Biopolymers 1990, 30, 481–497): the computation time for a base-pairing probability profile drops from O(N²) to O(N), where N is the sequence length. This speed-up comes in addition to the speed-up due to the multiexponential approximation of the loop entropy factor, as introduced by Fixman and Freire (ref. 22) and applied by Yeramian et al. (ref. 25). The speed-up is, however, independent of the multiexponential approximation: in the exact case it reduces the time from O(N³) to O(N²). A method for representing very large numbers is described, which avoids numerical overflow in the partition functions for genomic length sequences. In addition to calculating the standard base-pairing probability profiles, we propose to use the algorithm to calculate various other probabilities (loops, helices, tails) for a more direct view of the melting regions and their positions and sizes. This can provide a better understanding of the physics of denaturation and the biology of genomes. © 2003 Wiley Periodicals, Inc. Biopolymers 70: 364–376, 2003
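The multiexponential idea mentioned above can be illustrated with a minimal sketch. It is not the algorithm of this paper, nor the published fit of the loop entropy factor. It only shows why writing a loop weight as a sum of exponentials, omega(n) = sum_k A_k exp(-B_k n), lets a sum over all loop sizes be updated by recursion instead of being recomputed at every position. The function names, the coefficients in COEFFS, and the input weights are all hypothetical and chosen for illustration.

```python
import math

def direct_sums(a, omega):
    """Quadratic time: S[j] = sum over i < j of a[i] * omega(j - i)."""
    n = len(a)
    return [sum(a[i] * omega(j - i) for i in range(j)) for j in range(n)]

def multiexp_sums(a, coeffs):
    """Same sums in time O(N * K), assuming omega(n) = sum_k A_k * exp(-B_k * n).

    For each exponential term k we carry
        t[k] = sum over i < j of a[i] * exp(-B_k * (j - i)),
    which advances from j to j + 1 by one multiply-add.
    """
    decay = [math.exp(-B) for _, B in coeffs]
    t = [0.0] * len(coeffs)
    out = []
    for x in a:
        # S[j] is the A_k-weighted combination of the running terms.
        out.append(sum(A * tk for (A, _), tk in zip(coeffs, t)))
        # Advance: t_k(j + 1) = exp(-B_k) * (t_k(j) + a[j]).
        t = [d * (tk + x) for d, tk in zip(decay, t)]
    return out

# Hypothetical (A_k, B_k) pairs, NOT a fitted loop entropy factor.
COEFFS = [(0.5, 0.1), (0.3, 1.0), (0.2, 0.01)]

def omega(n):
    """Loop weight defined exactly as the sum of exponentials above."""
    return sum(A * math.exp(-B * n) for A, B in COEFFS)

# Arbitrary positive per-position weights standing in for statistical weights.
weights = [1.0 + 0.1 * (i % 7) for i in range(200)]

slow = direct_sums(weights, omega)
fast = multiexp_sums(weights, COEFFS)
agree = all(math.isclose(s, f, rel_tol=1e-9, abs_tol=1e-12)
            for s, f in zip(slow, fast))
```

Here the two routines agree because omega is defined as an exact sum of exponentials. With a power-law loop entropy factor the exponential sum would only be an approximation, and its accuracy would depend on the fitted coefficients.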