Comparative analysis of genome maintenance genes in naked mole rat, mouse, and human

Genome maintenance (GM) is an essential defense system against aging and cancer, as both are characterized by increased genome instability. Here, we compared the copy number variation and mutation rate of 518 GM-associated genes in the naked mole rat (NMR), mouse, and human genomes. GM genes appeared to be strongly conserved, with copy number variation in only four genes. Interestingly, we found NMR to have a higher copy number of CEBPG, a regulator of DNA repair, and TINF2, a protector of telomere integrity. NMR, as well as human, was also found to have a lower rate of germline nucleotide substitution than the mouse. Together, the data suggest that the long-lived NMR, as well as human, has more robust GM than mouse, and identify new targets for the analysis of the exceptional longevity of the NMR.

(http://genome.ucsc.edu/; Kent et al., 2002) to map the corresponding human and mouse protein sequences to the different genomes, looking for full-length sequences, and used GeneWise and MUSCLE alignments to check for pseudogenes, gene structure, and homology.
We also checked whether the two genome maintenance genes duplicated in the NMR genome, TINF2 and CEBPG, are also duplicated in a closely related African mole rat species, the Damaraland mole rat (DMR) (Fang et al., 2014). Using the GLEAN gene annotation pipeline and GMAP (Wu and Watanabe, 2005), we found only one copy each of TINF2 and CEBPG in the DMR genome. Thus, because the NMR and DMR diverged ~26 million years ago, the genome maintenance gene duplications we found in the NMR may contribute to its 10+ years longer lifespan.

Comparing nucleotide substitution rates
To study how genes evolved in human, mouse, and naked mole rat, for each gene, we first extracted the longest coding and peptide sequences for each species from the GeneWise alignment files. We then aligned the nucleotide coding sequences based on the alignment of their corresponding peptide sequences, which we generated using MUSCLE (Edgar, 2004), and calculated the number of nucleotide substitutions per site between each possible pair of the three species. For each GM or random gene, we used Kimura's two-parameter model (Kimura, 1980) to calculate K_HM, K_HN, and K_MN, the nucleotide substitutions per site for the human-mouse, human-NMR, and mouse-NMR sequence comparisons, respectively. For these protein-coding genes, we also calculated Ka and Ks, the numbers of substitutions per nonsynonymous site and per synonymous site, respectively, for all three comparisons using Li's method (Li, 1993). To calculate the substitution rates, we used the divergence times reported previously (Kim et al., 2011): 93 million years ago (MYA) between human and rodent and 73 MYA between mouse and naked mole rat.
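As a rough illustration of the distance calculation described above, the following Python sketch computes the Kimura two-parameter distance K = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q) for two aligned coding sequences, where P and Q are the proportions of transitional and transversional differences. It is not the pipeline used in this study. The function names are hypothetical, and the conversion from distance to rate, r = K / (2T) with T the divergence time, is the conventional formula and is assumed here rather than taken from the text.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned nucleotide sequences.

    Columns containing a gap or any non-ACGT character are skipped.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")

    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguous base: not a comparable site
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine

    if sites == 0:
        raise ValueError("no comparable sites")

    p = transitions / sites       # proportion of transitional differences
    q = transversions / sites     # proportion of transversional differences
    arg1 = 1 - 2 * p - q
    arg2 = 1 - 2 * q
    if arg1 <= 0 or arg2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(arg1) - 0.25 * math.log(arg2)


def substitution_rate(k, divergence_time_years):
    """Substitutions per site per year, ASSUMING r = K / (2T).

    The factor 2 accounts for change along both lineages since the split.
    """
    return k / (2 * divergence_time_years)


# Toy sequences, for illustration only (not data from this study).
k_toy = k2p_distance("ATGGCCAAGCTG", "ATGGCTAAGTTG")
rate_toy = substitution_rate(k_toy, 93e6)  # 93 MYA: human-rodent divergence
```

With real data, the inputs would be the codon-level alignments derived from the MUSCLE peptide alignments, and the divergence time would be 93 or 73 million years depending on the species pair.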
To compare K, Ka, and Ks of a GM gene in each of the mouse, naked mole rat, and guinea pig lineages separately, we used the coding sequence of its human ortholog as the out-group and calculated K, Ka, and Ks between human and mouse, naked mole rat, and guinea pig, respectively (Supplementary Figure 2).
To compare K, Ka, and Ks of a random gene in each of the human, mouse, and naked mole rat lineages separately, we also used the coding sequence of its chicken ortholog as the out-group and calculated K, Ka, and Ks between chicken and human, mouse, and naked mole rat, respectively (Supplementary Figure 3).
We also investigated the nature of nonsynonymous codon changes to the protein-coding sequences among human, mouse, and naked mole rat. To do this, we used the aforementioned MUSCLE-generated peptide sequence alignments. Given the alignment of a particular gene, we considered only the residue sites with at least one sequence change, ignoring gaps and sites invariant among all three species. Using BLOSUM62, we scored the pairwise comparison between two species at these sites with nonsynonymous substitutions in a gene, and then summed these scores to give a final score for the gene between the two species. The smaller the score, the more different the two sequences are (and thus the more deleterious the changes are). We calculated the scores of the human-mouse, human-NMR, and mouse-NMR sequence comparisons for all genome maintenance genes and a set of random genes. With the human gene sequences as the reference, the alignment scores of genome maintenance genes are higher for naked mole rat than for mouse (P = 0.03888 by Wilcoxon rank sum test). However, randomly selected genes do not show this tendency (P = 0.7555 by Wilcoxon rank sum test). The results indicate that genome maintenance genes are more conserved in naked mole rat than in mouse.
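The site selection and scoring described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used in this study. The function names and the toy sequences are hypothetical, and TOY_BLOSUM62 is a small hand-copied excerpt of BLOSUM62 covering only the residue pairs in the example. A real analysis would load the full matrix, for example with Biopython's `substitution_matrices.load("BLOSUM62")`.

```python
# Small excerpt of BLOSUM62 (symmetric); only the pairs needed below.
TOY_BLOSUM62 = {
    ("K", "K"): 5, ("K", "R"): 2,
    ("D", "D"): 6, ("D", "E"): 2,
    ("L", "L"): 4, ("I", "L"): 2,
}


def toy_score(a, b):
    """Look up a residue pair in the excerpt, in either order."""
    key = (a, b) if (a, b) in TOY_BLOSUM62 else (b, a)
    return TOY_BLOSUM62[key]  # KeyError if the pair is not in the excerpt


def variable_ungapped_sites(aligned, gap="-"):
    """Indices of columns with no gap and at least one residue difference."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return [
        i for i in range(length)
        if all(seq[i] != gap for seq in aligned)
        and len({seq[i] for seq in aligned}) > 1
    ]


def pairwise_score(seq_a, seq_b, sites, score):
    """Sum of substitution-matrix scores for two sequences over the given sites."""
    return sum(score(seq_a[i], seq_b[i]) for i in sites)


# Hypothetical aligned peptides, for illustration only.
human = "MKDL-A"
mouse = "MRDL-A"
nmr = "MKEIQA"

sites = variable_ungapped_sites([human, mouse, nmr])
score_hm = pairwise_score(human, mouse, sites, toy_score)
score_hn = pairwise_score(human, nmr, sites, toy_score)
score_mn = pairwise_score(mouse, nmr, sites, toy_score)
```

In this sketch a site is kept if it varies in any of the three species, so a pair that happens to be identical at that site contributes its positive diagonal score. This is one reading of the procedure, and the original implementation may differ. The per-gene scores for the two comparisons against human could then be compared across genes with a rank sum test.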