Lineage‐specific genomics: Frequent birth and death in the human genome

Frequent evolutionary birth and death events have created a large quantity of biologically important, lineage‐specific DNA within mammalian genomes. The birth and death of DNA sequences is so frequent that the total number of these insertions and deletions in the human population remains unknown, although there are differences between these groups, e.g. transposable elements contribute predominantly to sequence insertion. Functional turnover – where the activity of a locus is specific to one lineage, but the underlying DNA remains conserved – can also drive birth and death. However, this does not appear to be a major driver of divergent transcriptional regulation. Both sequence and functional turnover have contributed to the birth and death of thousands of functional promoters in the human and mouse genomes. These findings reveal the pervasive nature of evolutionary birth and death and suggest that lineage‐specific regions may play an important but previously underappreciated role in human biology and disease.


Introduction
The great diversity observed between individuals and across species is reflected in high levels of genetic diversity. The study of this diversity between mammalian species has been possible on a genome-wide scale since the publication of the first complete drafts of the human and mouse genomes in 2001 [1] and 2002 [2], respectively. The subsequent emergence of next-generation sequencing technologies [3] has led to an explosion of whole-genome sequencing, such that public databases now host many mammalian genomes, and 39 of these can currently be directly viewed and compared through the Ensembl genome browser (www.ensembl.org) [4]. The first personal genome sequence was only published in 2007 [5], but it has since been joined by a number of studies, including the 1,000 Genomes Project, which sequenced over 1,000 individual genomes [6], and those from deCODE Genetics, which sequenced over 2,500 individual genomes from the Icelandic population [7].
This wealth of data has stimulated the field of comparative genomics, which investigates both the similarities and differences between genomes. Much early work focussed on identifying shared features between sequenced genomes and restricted itself to the small proportion of the genome which encodes for protein-coding genes. Many of these genes have been deeply conserved throughout evolution from yeast to human, the lineages of which diverged approximately one billion years ago [8]. The number of protein-coding genes found in vertebrate species is relatively constant and, unexpectedly, does not appear to correlate with our assumptions regarding organismal complexity [9]. Not all protein-coding genes, however, are evolutionarily ancient. For example, C20orf203 is found only in the human genome, and is absent from closely related primates. This gene is highly expressed in the brain, and is further upregulated in Alzheimer's disease, which suggests a potential role for this lineage-specific gene in the development of the disease [10]. There are now several reports of genes that have been born and died in various species, through a variety of mechanisms (fully reviewed by Kaessmann [11]).
Comparative genomics has also been applied to studying the remaining, non-coding, regions of the genome, which make up almost 99% of the genome [12] and contain a wealth of transcriptional and regulatory elements. MicroRNAs (miRNAs), short (approximately 22 nt long) non-coding RNA genes primarily involved in negative regulation of protein-coding genes [13], are often deeply conserved across a range of divergent species [14]. Long non-coding RNAs (lncRNAs) are a relatively unstudied class of non-coding transcripts that are over 200 bp long [15]. lncRNAs show modest evolutionary constraint, which has been interpreted as indicating that these sequences have been conserved across species because they encode a biological function [16][17][18]. Similarly, non-coding regulatory elements such as enhancers, which positively regulate gene expression at a distance [19], have been computationally predicted in regions which show increased evolutionary conservation across species. This approach has been demonstrated to have a 45% success rate in predicting enhancers using comparative genomics alone [20].
As for protein-coding genes, there are also corresponding examples of non-coding RNA genes that have emerged during evolution, such as the mouse-specific lncRNA Poldi. This lncRNA is restricted to the post-meiotic cells of the testis, and promotes sperm motility and testis development [21]. Similarly, the miRNA miR-941, which was born in the human lineage between one and six million years ago from the expansion of an evolutionarily unstable tandem repeat sequence, was recently discovered to be important for neurotransmitter signalling in the brain [22].
There are a handful of known examples of enhancer birth and death in the human genome. For example, the non-coding element HACNS1 has evolved rapidly in humans, and only the human sequence, not the orthologous primate sequence, is able to function as a limb enhancer in mouse reporter experiments [23]. Conversely, an enhancer for the AR gene, which is conserved across most mammalian species, has been deleted from the human genome. The activity of this enhancer is correlated with the formation of whiskers and penile spines in non-human mammals, and it has been speculated that this loss in humans may be linked to increased monogamous reproductive strategies relative to other primates [24].
On a genome-wide scale, non-coding elements are less conserved between mammalian species than protein-coding genes [25,26]. This has led many to speculate that organismal diversity is not in fact driven by changes to the protein-coding gene set, but by divergence in the regulatory mechanisms responsible for controlling their expression [27]. Because of the increased volatility of these elements, much recent work has focussed on the birth and death of non-coding, regulatory elements within the human genome.
In this essay, I will first discuss sequence turnover, where sequence is either inserted (born) or deleted (dies) along a lineage. I will then describe analyses of functional elements that have been identified through experimental profiling (Box 1) and were subsequently shown to completely turn over between lineages despite conservation of the underlying DNA sequence (functional turnover). Where such profiling has been done in multiple species, it is possible to further define functional turnovers as gains and losses down individual lineages and throughout this essay I will also refer to these as birth and death events, respectively. Finally, I will examine those studies that have considered both the birth and death of sequence and function in the same experimental system. I will show that the birth and death of entire regulatory elements are frequent occurrences within the human genome, and will suggest that future research is likely to focus on both the transcriptional regulatory and phenotypic consequences of these events to normal and perhaps also pathogenic human diversity.

Sequence turnover is common in the human genome
The insertion or deletion of sequence along one of the lineages that separates two species (collectively known as 'indels') results in gaps in the sequence alignments which describe the relationship between orthologous sequences within the two genomes being compared. Insertions can be discriminated from deletions by comparing the sequences of three or more genomes simultaneously. The principle of parsimony assumes that the most likely evolutionary history for a set of related sequences is the scenario that requires the minimal number of mutations. In this way, a sequence is defined as having been deleted if it is present in the outgroup species, while an inserted sequence will be absent from this species (Fig. 1). This type of analysis also allows one to identify the lineage along which the insertion or deletion has taken place. Most alignment tools and analysis programs, however, treat these gaps as missing data. Within the human population, there are likely to be many millions of polymorphic indels which are found in some, but not all, individuals. A study of 79 diverse human genomes reported almost two million small indels [28]. However, the limited overlap between this and other studies suggests that this is an underestimate of the total number of indels segregating within the human population, and that there are many indels yet to be discovered [28]. Longer regions of sequence that have either been inserted or deleted within an individual genome (known as structural variants and generally defined as longer than 1 kb) are less common, and only approximately 20,000 have so far been detected by the 1,000 Genomes Project [6]. Within individuals, these variants of different lengths have been found to be associated with differences in gene expression [29,30].
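The parsimony rule described in this section can be illustrated with a minimal Python sketch. It assumes a simplified setting in which each of three species (human, mouse and an outgroup such as pig, as in Fig. 1) either carries or lacks the sequence at an aligned position; the function name `classify_gap` and its boolean inputs are hypothetical and are not taken from any published pipeline.

```python
def classify_gap(human_present: bool, mouse_present: bool,
                 outgroup_present: bool) -> str:
    """Polarise a human-mouse alignment gap using an outgroup (parsimony).

    Each argument states whether the sequence is present in that species
    at the aligned position. This is an illustrative simplification.
    """
    if human_present == mouse_present:
        # Both species agree, so there is no gap between human and mouse.
        return "no indel between human and mouse"
    if outgroup_present:
        # The outgroup carries the sequence, so the ancestral state is
        # "present" and the species lacking it has lost it (one mutation).
        lineage = "mouse" if human_present else "human"
        return f"deletion on the {lineage} lineage"
    # The outgroup lacks the sequence, so the ancestral state is "absent"
    # and the species carrying it has gained it (one mutation).
    lineage = "human" if human_present else "mouse"
    return f"insertion on the {lineage} lineage"


# Gap in both mouse and the outgroup: sequence birth on the human lineage.
print(classify_gap(True, False, False))
# No gap in the outgroup: sequence death on the mouse lineage.
print(classify_gap(True, False, True))
```

Real analyses work on whole-genome alignments and must also handle missing or ambiguous outgroup data, which this sketch ignores.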
Polymorphic indels disrupt the coding sequence of over 6% of annotated human genes, but 72% of genes contain an intronic indel [31]. The reduced frequency of evolutionarily conserved, functional material within intronic sequences [32] implies that these coding sequence-disrupting indels are likely to confer a substantial genetic load. Many deletions are shared across populations, and have been present since humans migrated out of Africa [33]. The lower average frequency of deletions relative to insertions in the population suggests that sequence loss is more damaging than the birth of new sequence [34].
The distribution of indels throughout the genome has been used to quantify the amount of functional, but lineage-specific, sequence within the genome. This model assumes that indels occur randomly within the genome and that unexpectedly large distances between indels therefore contain sequence which is preserved by natural selection [35], presumably because this sequence conveys an as yet unknown biological function. By comparing the quantity of sequence defined to be functional using this metric across a range of pairwise species alignments, the authors determined that this quantity rapidly decreases as the evolutionary distance between the species being compared increases [36]. This implies that most functional material is conserved only within a narrow range of related species, and that there must be a rapid turnover of functional sequence and a large quantity of lineage-specific sequence within mammalian genomes. This rate of sequence turnover is not constant between genomes, and appears to be higher along the mouse lineage, where sequence is preferentially deleted at a particularly high rate [37,38]. The vast majority of this evolutionarily volatile sequence is found outside protein-coding gene borders, and it has been predicted that 110-143 Mb (50%) of functional non-coding DNA sequence within the human genome has turned over in the last 130 million years [39].
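The logic of the indel-distance model can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the published method: it assumes that indels fall independently with a single per-base probability (`indel_rate`, a hypothetical parameter), so that the number of gap-free bases between neighbouring indels follows a geometric distribution, and it counts how many long gap-free segments are observed beyond that neutral expectation.

```python
def expected_long_segments(n_segments: int, indel_rate: float,
                           min_length: int) -> float:
    """Expected number of gap-free segments of at least min_length bases.

    Under the assumed neutral model, a segment reaches min_length bases
    with probability (1 - indel_rate) ** min_length.
    """
    return n_segments * (1.0 - indel_rate) ** min_length


def excess_long_segments(segment_lengths, indel_rate: float,
                         min_length: int) -> float:
    """Observed minus expected count of long gap-free segments.

    A positive excess is the signal the model attributes to sequence
    that is protected from indels by selection.
    """
    observed = sum(1 for length in segment_lengths if length >= min_length)
    expected = expected_long_segments(len(segment_lengths), indel_rate,
                                      min_length)
    return observed - expected


# Toy data: four inter-indel segments and an unrealistically high rate,
# chosen only to keep the arithmetic easy to follow.
print(excess_long_segments([0, 1, 2, 3], indel_rate=0.5, min_length=2))  # 1.0
```

In practice the indel rate varies along the genome, so a single rate as assumed here would need to be replaced by a locally calibrated neutral expectation.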

What mechanisms drive sequence birth and death in the genome?
There are a number of molecular mutations that insert or delete sequence in the genome. Transposable elements, which are capable of jumping around the genome, make up approximately half of the human genome [40] and are divided into two major classes. Retrotransposons duplicate via an RNA intermediate (Fig. 2A) before the new copy is reintegrated into the genome at a distant site. Retrotransposition does not typically include the copying or movement of intronic and surrounding regulatory DNA. The other class, DNA transposons, uses a cut-and-paste mechanism (Fig. 2B) in which the entire DNA sequence is excised and then reintegrated into the genome. Repetitive elements, and particularly retrotransposons, are enriched at both species-specific enhancers and gene promoters [41]. Promoters, which are the site of RNA polymerase II complex assembly and transcription initiation [42], are enriched for repetitive elements only at sequences that have been inserted, rather than deleted, in both the human and mouse genomes [43]. One class of transposons, known as long terminal repeats (LTRs), is particularly common at tissue-restricted promoters, which is consistent with the previously reported role for LTRs in driving such a limited expression profile [44,45]. Despite the association between simple repetitive elements and sequence deletion [46], no such relationship was found between repetitive sequences and promoters which have been deleted along either the human or mouse lineages [43]. Instead, simple repeats were found to be enriched at newly inserted promoters that are broadly expressed, but it remains unknown which types of this family of repeats are responsible or how they drive widespread expression across tissues.
Sequence can also be inserted or deleted from the genome through the activity of normal cellular processes, such as recombination and replication. Unequal crossing-over takes place when non-homologous regions are paired during cell division. This can result in one chromosome gaining sequence and the other losing the same sequence (Fig. 2C), but this exchange of sequence need not be reciprocal as shown here. Indeed, a comparative analysis of the human, chimp and macaque genomes has suggested that recombination is more associated with the gain of sequence than sequence loss in the human genome [47]. DNA replication can also create indels through replication slippage (Fig. 2D), if the DNA polymerase skips over a region (known as 'forward slippage') to remove sequence or skips back to replicate a region twice (known as 'backward slippage'), resulting in the insertion of a second copy of the sequence. These replication errors are most frequent at regions containing nearby tandem duplications and, although it has been suggested that they are responsible for most births of recently arisen short insertions in the human lineage [48], replicative errors are actually more likely to be associated with the loss of sequence across the entire genome [47].
Figure 1. By comparison to a third outgroup species, the lineage along which a mutation took place can be identified and an alignment gap (indel) can be classified as an insertion or deletion. In this example, a gap in the orthologous sequence in both the mouse and pig genomes reveals that novel sequence has been inserted in the human genome (as shown by the blue triangle) and that there has been a birth of sequence on the human lineage. Conversely, if there is no gap in the orthologous pig sequence, then a sequence death (pink triangle) is inferred to have taken place on the mouse lineage.
Despite reports of sequence gain and loss within the human genome, their accurate discovery remains a difficult task, requiring the development of specialised computational pipelines [49]. The two commonly used genome-wide alignments have been built with different methods: the Ensembl EPO pipeline [50] builds alignments and reconstructs candidate ancestral genome sequences across multiple species simultaneously, while the UCSC BLASTZ alignments [51] are generated from small, local sequence alignments of two species which are then extended into larger blocks of related DNA sequences. Multi-species alignments are then built separately using these pairwise alignments. The difference in these approaches results in substantial differences in the amount of aligning sequence, e.g. UCSC aligns 1.0 Gb (33%) of the human genome to mouse while Ensembl aligns only 820 Mb (26%), and in similar discrepancies in the amount of sequence which is estimated to have been gained or lost within the human genome. Furthermore, progressive alignment algorithms incur a greater penalty when creating insertions rather than deletions [52], which hinders robust discrimination of these separate classes of mutations. Further improvements in our ability to identify regions which are inserted and deleted within whole-genome alignments and the human population are likely to take account of the different mechanisms, and the sometimes complicated evolutionary histories, which generate these events. Our increased knowledge of the driving forces behind these events should improve our ability to predict and accurately detect when an insertion or deletion has truly taken place, rather than, as now, solely defining them as positions within the genome where alignment pipelines fail to identify orthologous sequences.

Tissue-restricted regulatory elements show frequent functional turnover
Genome sequencing projects have been followed by a second wave of functional genomics studies, as exemplified by the work of the ENCODE consortium [53]. Functional genomics combines experimental techniques with advances in DNA sequencing to investigate the functional role of genes and other regulatory DNA sequences throughout the genome (see Box 1).
Large-scale functional turnover of both transcription factor (TF) binding and promoter locations has been reported. A comparison of four liver-specific TFs (FOXA2, HNF1A, HNF4A and HNF6) in human and mouse revealed that 41-89% of their binding locations within aligning sequence were found in only one of these two species [54], implying a substantial rate of functional turnover. This suggests that there are many births and deaths of these binding sites along the two lineages, but this could not be confirmed from the data published in this study (Fig. 1). This high rate of TF binding turnover in the liver takes place across much of the animal clade (Fig. 3A) [26] and can even be detected between individual rodent lineages, suggesting that these turnover events are evolutionarily very rapid [55].
Complete functional turnover of genetic elements, such as enhancers and promoters, is also frequent between mammalian species. While turnover is less prevalent in cis-regulatory modules that contain multiple TFs bound to the same locus [55], only 279 (less than 1%) of enhancers active in the liver alone are conserved across 10 placental genomes [41]. Promoters defined epigenetically by the presence of trimethylated histone H3 lysine 4 (H3K4me3) in the same system seem to be less susceptible than enhancers to this type of functional turnover [41], but this may not reflect the true turnover rate as promoters defined by their transcriptional output using CAGE turn over more frequently [56] (see also below 'Both sequence and functional turnover contribute to the birth and death of functional promoters' for a more detailed discussion where promoter turnovers were polarised into births and deaths). There may be further differences between promoter and TF-binding site functional turnover because these events at promoters are often accompanied by changes to the underlying sequence [43] while binding site locations within rodents can turn over without changes to the underlying DNA sequence [55]. It remains unclear what mechanisms, such as the lack of cooperative binding partners or a compaction of the local chromatin state, are responsible for driving these TF turnover events.
The function of DNA-binding factors or the length of their DNA-binding motif may also be related to their evolutionary volatility. For example, the binding locations of the insulator protein CTCF, which also has an unusually long binding motif, are much more conserved between mammalian species than those of most transcription factors, and are therefore less likely to be gained or lost between mammalian species [57]. Fifteen per cent (5,178/33,966) of alignable binding sites in human are also present in each of macaque, mouse, rat and dog [58]. Unlike TFs, which often possess tissue-specific roles in regulating gene expression, CTCF binding sites are largely consistent across tissues [59]. CTCF is important for regulating the three-dimensional structure of the genome, e.g. by insulating transcriptionally active from inactive regions [60], and it also demarcates the borders of DNA sequences which are anchored to the nuclear periphery [61]. Like individual CTCF binding sites, this structural role for CTCF appears to be conserved, as is the higher-order genome structure which it regulates [62].
Both the mouse ENCODE [63] and FANTOM5 [64] collaborative projects have carried out comparative functional genomics analyses across a range of tissues, and confirmed this rapid functional turnover throughout the human and mouse genomes. Tissue-restricted elements are more susceptible to turning over, perhaps due to the increased functional constraints on pleiotropic elements that are active across tissues, and the immune system and testis appear to be the tissues with the greatest rates of turnover [43]. While it is likely that many of the changes observed within immune cells are driven by positive natural selection to avoid host pathogenicity [65], it is currently unclear to what extent sexual selection and the locally elevated mutation rate at active sites in the germ cells [43,66,67,68] contribute to the functional element turnover within the testis.

The transcriptional and phenotypic consequences of functional turnover remain unclear
There is evidence that the turnover of an individual binding site can be compensated by the birth of a binding site at a second site within a gene locus (Fig. 3B). For example, approximately 25% of species-specific TF binding site functional losses in the liver were mirrored by the gain of a separate, species-specific, binding site within 10 kb [26]. Similarly, while 53% of genes targeted in an OCT4 knockdown in embryonic stem cells in both human and mouse contained nearby OCT4-NANOG binding, only 15% of these binding sites were found at the same position in both species [57]. These observations suggest that, while the rapid evolutionary turnover of TF binding sites may be driven by a high mutation rate at these sites [68], this may be matched by a strong selective pressure to prevent a subsequent divergence in the transcriptional output regulated by these factors [69].
Furthermore, the trans environment within the cell is more conserved than the individual elements themselves, as TF-to-TF interactions [70] and TF network topologies [71] are similar between human and mouse. These results are consistent with the independent observation that human chromosome 21, when inserted into mouse hepatocytes, behaves in largely the same manner as the human chromosome when in human cells. This again suggests that species-specific binding is due to changes at binding sites themselves but that the same trans-acting TFs can still drive DNA binding in both species [72].
There is little evidence that the majority of these lineage-specific elements, although defined by their functional activity, are directly involved in transcriptional regulation. It is known that not all TF binding sites have a direct effect on gene expression [73,74]. Binding sites that are identified in the liver and are conserved in multiple mammalian species show greater functional enrichments (e.g. disease ontology annotations), and are found near genes with a higher expression than species-specific binding sites [75], suggesting that those sites that have been gained or lost down individual lineages are less likely to be functionally important in positively driving gene expression. Furthermore, gene expression and nearby TF binding divergence in the liver do not appear to be generally correlated within closely related mouse species [69]. However, these conclusions are contradicted by the observation that mouse- and human-specific binding of the glucocorticoid receptor in macrophages was associated with species-specific upregulation of neighbouring genes upon glucocorticoid stimulation [76]. Further studies of this type are required to determine whether the liver or the macrophage is the more representative system.
Population genetics studies also support the argument that few regulatory elements that have been born along the human lineage possess a biological function. Although enriched for disease- and trait-associated variants, human-specific DHSs show a nucleotide diversity that is relatively high and comparable to that of fourfold-degenerate sites in exonic sequence, which further suggests that DHSs possess a relatively limited proportion of functional sequence [77]. These elements do, however, experience at least some purifying selection, as indicated by their reduced diversity relative to sequences defined biochemically to be inactive [78].
The ultimate test of functionality of lineage-specific regions is to disrupt them to determine their biological role. Whether, as predicted computationally [36], these elements are frequently responsible for a phenotype has yet to be tested in a systematic manner.

Functional births and deaths of regulatory elements may be associated with expression changes at nearby genes
With data from only two species, these studies are largely limited to describing turnover events, and matched data from a third species are required to discriminate functional births from deaths along individual lineages (Fig. 1). Shibata et al. [79] measured DNaseI hypersensitive sites (DHSs) in human, chimp and macaque fibroblasts and identified hundreds of gains and losses along both the human and chimp lineages. DHSs that were born along each lineage were associated with up-regulation of nearby target genes, while DHSs which died were associated with the concomitant downregulation of nearby genes. However, most differential gene expression could not be explained by the simple gain or loss of DHS sites. Both gained and lost DHSs were more likely to be experiencing positive selection specifically along the lineage in which they had been gained and lost, respectively. Active enhancers and promoters were similarly identified in embryonic limbs from human, macaque and mouse by profiling the location of acetylated histone H3 lysine 27 (H3K27ac) [80]. Promoters identified using these data have gained activity along the human lineage more rapidly than enhancers (13 vs. 11%), but the vast majority of both classes of elements are gained through the co-option (exaptation) of existing sequence rather than insertion of novel sequence [80]. A similar study in the same species mapped two epigenetic marks (dimethylated histone H3 lysine 4, H3K4me2, and H3K27ac) during human, macaque and mouse corticogenesis to confirm a high rate of human-specific promoter and enhancer birth [81]. These human-specific elements were frequently found at, or near to, genes important for cortical development, suggesting that they may play important roles in regulating human-specific aspects of this important biological process.
A collection of histone modifications and protein-binding sites have also been profiled in matched human, mouse and pig pluripotent stem cells, where divergence in the intensity of these binding factors at gene promoters is correlated with gene expression divergence [82]. However, these authors did not explicitly examine functional birth and death of these elements between the lineages studied.
These three-species experiments also differ from those mentioned above in their methodology for detecting lineage-specific elements. The studies described above which focused on liver-specific transcription factors identified binding regions in each species independently and then defined lineage-specific regions as those in orthologous regions for which no binding peak had been discovered in other species. The description of an individual region as being lineage-specific is dependent on the genome-wide alignments used to identify orthology, as these show clear discrepancies in the amount of sequence which can be aligned between species (see 'What mechanisms drive sequence birth and death in the genome?' above). The degree of overlap required to identify orthologous regions also affects the detection of functional turnover events. Some studies consider a single 1 bp overlap between regions as sufficient to define them as being conserved, while others have required at least a 50% overlap in reciprocal comparisons between species [41], which will increase the number of regions classed as lineage-specific from the same data. In contrast, the studies describing functional genomics data from other tissues [79][80][81] combine these alignments with statistical methods, such as edgeR [83], to detect lineage-specific regions as those orthologous regions which also show differential levels of histone modifications or chromatin accessibility between species. This approach does not depend on calling peaks in all species and will therefore account for regions with evidence for binding that just misses the threshold for calling a peak as significant within one of the related species.
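The effect of the overlap criterion can be made concrete with a short Python sketch. It assumes that the peaks from both species have already been projected into a common coordinate system through a genome alignment and are represented as half-open `(start, end)` intervals; the function names and the `min_reciprocal_fraction` parameter are hypothetical and only illustrate the two criteria mentioned above.

```python
def overlap_bp(a, b):
    """Number of bases shared by two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def is_conserved(a, b, min_reciprocal_fraction=None):
    """Decide whether two orthologous peaks count as the same element.

    With min_reciprocal_fraction=None a single shared base is enough.
    Otherwise the shared bases must cover at least that fraction of
    BOTH peaks (a reciprocal overlap requirement).
    """
    shared = overlap_bp(a, b)
    if min_reciprocal_fraction is None:
        return shared >= 1
    return (shared >= min_reciprocal_fraction * (a[1] - a[0])
            and shared >= min_reciprocal_fraction * (b[1] - b[0]))


human_peak = (100, 200)   # 100 bp peak, hypothetical coordinates
mouse_peak = (180, 400)   # 220 bp peak projected onto the same coordinates

print(is_conserved(human_peak, mouse_peak))                               # True
print(is_conserved(human_peak, mouse_peak, min_reciprocal_fraction=0.5))  # False
```

The same pair of peaks is therefore called conserved under the 1 bp rule but lineage-specific under the 50% reciprocal rule, which is why the choice of threshold changes the apparent turnover rate.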
The use of a statistical framework also makes it possible to quantitatively measure the confidence in a single region being truly lineage-specific and how these regions differ from those identified in the same system which show binding in all species, albeit at significantly different levels. Provided that suitable datasets are available to make these statistical assessments, these more complex approaches, which use more than simple genomic overlaps, will likely be considered the more robust way to detect functional turnover in future.

Both sequence and functional turnover contribute to the birth and death of functional promoters
While both mechanisms of birth and death in the genome, sequence turnover and functional turnover, are clearly important contributors to lineage-specific genomics, it is only recently that they have been explicitly investigated simultaneously in the same experimental system.
The FANTOM5 project, which identified promoter locations across a range of matched human and mouse cell lines and tissues, described the half-life of promoters when aligned to increasingly divergent species [64]. Evolutionary history varied with both expression profile and promoter class, where broadly expressed protein-coding promoters and tissue-restricted ncRNA promoters were more deeply conserved. These patterns have been similarly observed within aligning exonic sequence in both protein-coding genes in human [84] and lncRNAs in Drosophila [18].
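One way to think about a promoter half-life is sketched below in Python. This is an illustrative assumption rather than the FANTOM5 procedure: it supposes that the fraction of promoters still alignable decays exponentially with divergence time, estimates the decay rate by least squares through the origin on the log-transformed fractions, and converts it to a half-life. The function name and the toy numbers are hypothetical.

```python
import math


def promoter_half_life(divergence_times, fraction_retained):
    """Estimate a half-life assuming f(t) = exp(-rate * t).

    divergence_times: divergence from human for each comparison species
    fraction_retained: fraction of promoters alignable to that species
    The rate is fitted by least squares through the origin on log(f).
    """
    numerator = sum(t * math.log(f)
                    for t, f in zip(divergence_times, fraction_retained))
    denominator = sum(t * t for t in divergence_times)
    decay_rate = -numerator / denominator
    return math.log(2) / decay_rate


# Toy values only: half of the promoters retained at 100 time units and
# a quarter at 200, which corresponds to a half-life of 100 units.
print(round(promoter_half_life([100, 200], [0.5, 0.25]), 6))  # 100.0
```

Comparing half-lives estimated separately for different promoter classes would then express, in a single number per class, the differences in conservation depth described above.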
The sequences of a large number of promoters have been born or died along the human lineage (conservatively 2,472 and 2,818, respectively) since its divergence from mouse [43]. As seen for regulatory elements within the ENCODE datasets, the gain and loss of promoters is enriched within immune cells and the testes, while brain-biased promoters were less likely to show either type of sequence turnover. Genes that experienced at least one of these turnover events were enriched for evidence of positive selection acting on their coding sequence, suggesting promoter turnover may be related to adaptive evolution throughout the encoded protein, and not just at the turnover site [79]. However, within the human population, both inserted and deleted promoters showed no evidence of either positive or purifying selection, suggesting that, as for the species-specific TF binding sites described above, many of these may not be phenotypically relevant.
Many promoters whose sequence has been conserved between human and mouse have experienced functional turnover (22 and 13% of aligned promoters in human and mouse, respectively), as they show no detectable evidence of transcription in the opposing species. These species-specific promoters are specifically associated with decreased evolutionary constraint at the promoter elements [43]. Similar levels of evolutionary constraint were seen at promoters with matched, divergent or reduced expression between species, suggesting that differences in transcriptional output were not driven by sequence changes at the promoter or at cis-regulatory elements found at a constant distance from the promoter. This contrasts with the inverse correlation seen between expression and substitution rate divergence at promoters activated in lipopolysaccharide-stimulated macrophages [85]. Whether these differences are specific to the macrophage timecourse profiled here, or are a general feature of stress-response genes, remains unclear.

Figure 4. Current state of published, functional genomics data from various mammalian species which are related as shown in the phylogenetic tree. Branch lengths indicate the genome-wide estimate of the neutral substitution rate at fourfold degenerate sites. ChIP-seq datasets describing the location of several TFs [53], CTCF [57], H3K27ac and H3K4me3 [40] in the liver are available for up to 20 mammalian species, nine of which are shown here. Further studies of three species simultaneously have examined DHSs in fibroblasts [72]; various histone modifications and TFs in pluripotent stem cells [75]; H3K27ac in the developing limb [73]; and H3K4me2 and H3K27ac during corticogenesis [74]. The ENCODE, mouse ENCODE and FANTOM consortiums have published large collections of datasets from human and mouse tissues and cell lines. Comparative functional genomics studies within populations of the same species are likely to be a focus of future research.
While sequence gain and loss are clearly important factors in promoter evolution along the mouse and human lineages, the lack of well-matched data across more species remains the limiting factor for resolving the large number of functional turnovers into births and deaths at aligned sequence. As shown in Fig. 4, beyond the liver, the number of mammalian genomes that have been sequenced exceeds the number of tissues that have been comprehensively profiled experimentally in multiple species. Even when datasets from multiple tissues are available from consortia such as ENCODE and FANTOM, matched samples are usually only available for human and mouse, precluding the discrimination of functional gain from loss at aligning sequences.
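The role of a third species in separating gains from losses can be illustrated with a minimal sketch. It assumes a single outgroup species and at most one change in activity across the tree (simple parsimony); the function name, arguments and labels are hypothetical and are not taken from any of the cited studies.

```python
from typing import Optional


def classify_turnover(human: bool, mouse: bool, outgroup: Optional[bool]) -> str:
    """Classify promoter activity at an aligned sequence by simple parsimony.

    human, mouse: whether transcription is detected in each species.
    outgroup: activity in a hypothetical outgroup species, or None when
    no matched outgroup data exist (the usual human/mouse-only case).
    """
    # Same state in both species: no functional turnover to explain.
    if human == mouse:
        return "conserved" if human else "inactive"

    # Human and mouse disagree, but without an outgroup the ancestral
    # state is unknown, so a gain cannot be told apart from a loss.
    if outgroup is None:
        return "unresolved"

    active = "human" if human else "mouse"
    inactive = "mouse" if human else "human"

    # An active outgroup implies an active ancestor, so activity was lost
    # on the inactive lineage; an inactive outgroup implies a gain on the
    # active lineage.
    if outgroup:
        return f"death in {inactive}"
    return f"birth in {active}"


if __name__ == "__main__":
    print(classify_turnover(True, False, None))
    print(classify_turnover(True, False, True))
    print(classify_turnover(True, False, False))
```

With only human and mouse data every functional turnover falls into the "unresolved" class, which is the limitation described above. A real analysis would need to account for multiple outgroups, incomplete detection of transcription and differences between the tissues sampled.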

Conclusions and prospects
The birth and death of both sequence and function is a common occurrence within the human genome, and represents an important contributor to genetic diversity. These turnover events have been observed at both distal regulatory elements and functional promoters and confirm that, while a useful predictor, evolutionary conservation is not required to identify functioning, lineage-specific elements in the human genome.
The availability of large amounts of functional genomics data in both human and mouse has already allowed evolutionary turnover to be investigated across a number of tissues. Extending the available datasets to more distantly related species will permit functional turnover events to be resolved into births and deaths along individual lineages (Fig. 4). Investigating the dynamics of births and deaths within the human population should reveal any phenotypic consequences and whether they are associated with diseases such as autoimmune disorders, which could aid in the development of personalised strategies to treat them.
Investigations into the combined effects of sequence and functional turnover in the birth and death of genetic elements have only recently been attempted. Further work will likely focus on the direct relationship between these; for example, one might ask whether functional deaths result in biologically unimportant sequence that is then a target for sequence deletion and the complete removal of the element from the genome.
Despite this work, the biological relevance of these evolutionarily volatile elements remains unclear. Do they drive the diversification of gene expression profiles, or do they simply represent the neutral churn of redundant genetic elements in the genome? By carefully matching current datasets to increasing amounts of functional genomics data from multiple species [86], we now have an exciting opportunity to reveal the role of evolutionary birth and death in shaping the mammalian genome and its regulatory apparatus.