Domain shuffling and the increasing complexity of biological networks


  • Sandro J. de Souza

    Corresponding author
    1. Ludwig Institute for Cancer Research, São Paulo branch, São Paulo, Brazil
    2. Institute of Bioinformatics and Biotechnology, São Paulo, Brazil



Domains can spread among proteins in a process called domain shuffling and this has been identified as one of the major mechanisms leading to the formation of new proteins throughout evolution. This process has an impact on the topology of protein-protein interaction networks as it may create new hubs and also increase interconnectivity.


DDI, domain-domain interaction; PPI, protein-protein interaction.


The notion that “new” proteins were the major genetic units responsible for phenotypic novelty was prevalent in the late 1970s and in the 1980s. Although this notion is still valid for specific cases, especially those involving master genes like PAX6, nowadays it is generally believed that new phenotypes rarely emerge through the origin of a completely new protein 1. Many alternatives have been proposed. Some authors, for example, have suggested that altered gene expression patterns are responsible for the emergence of new traits 2, 3. The availability of genome-wide datasets from a variety of species and the consequent emergence of more systemic views on how genes and proteins act have led us to understand that the evolution of biological novelty (and consequently complexity) is a direct consequence of new functional pathways and modules or the rewiring of pre-existing ones. But how have these biological networks evolved? Most of the studies on this problem are centered on protein-protein interaction (PPI) networks, largely because this type of data is available for many species. Many authors have shown that gene or genome duplication seems to have an important role in the evolution of PPI networks 4–6. In support of this, paralogous proteins present a conserved pattern of interactions and therefore share more partners than expected by chance 5. Furthermore, networks modeled according to this duplication/divergence idea display topological properties very similar to those of real biological networks 4, 7, 8. Pereira-Leal et al. 6 showed that duplication of self-interacting nodes is crucial for the evolution of more interconnected networks. Compared to gene and genome duplication, little attention has been dedicated to domain shuffling.
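The duplication/divergence idea mentioned above can be made concrete with a small sketch. This is a generic toy model, not the specific model of any of the cited studies: the function name `duplicate_and_diverge`, the `loss_prob` parameter and the three-protein network are all assumptions introduced for illustration. A duplicated node inherits the partners of its parent, and each inherited interaction is then lost with some probability.

```python
import random

def duplicate_and_diverge(adj, node, copy, loss_prob, rng):
    """Add `copy` as a paralog of `node`; each inherited edge is lost with loss_prob."""
    adj[copy] = set()
    for partner in sorted(adj[node]):      # sorted() iterates over a snapshot
        if rng.random() >= loss_prob:      # the edge survives divergence
            adj[copy].add(partner)
            adj[partner].add(copy)
    return adj

# Hypothetical network: A is self-interacting and also binds B and C.
adj = {"A": {"A", "B", "C"}, "B": {"A"}, "C": {"A"}}
duplicate_and_diverge(adj, "A", "A2", loss_prob=0.0, rng=random.Random(0))

shared = (adj["A"] & adj["A2"]) - {"A", "A2"}
print(sorted(shared))      # partners shared by the two paralogs
print("A2" in adj["A"])    # the paralogs of a self-interacting node bind each other
```

With no divergence (`loss_prob=0.0`) the paralogs share all their partners, and because the parent was self-interacting, the two paralogs also interact with each other, which adds a new edge to the network. Raising `loss_prob` reduces the overlap between paralogs.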

The evolutionary history of domain shuffling

Domains are the most important functional units in proteins and a significant proportion of PPIs occur through domain-domain interactions (DDI). Evidence accumulated in the last 35 years has shown that domains can spread among proteins in a process called domain shuffling 9, 10. It is generally accepted, at least for eukaryotes, that the recombinations that lead to domain shuffling are mediated by intronic sequences 11–13 through exon shuffling events 14.

Domain shuffling has been identified as one of the major mechanisms leading to the formation of new proteins throughout evolution 11, 12, 15. Bursts of domain shuffling events are clearly associated with the emergence of biological novelty, such as multi-cellularity. Most of the proteins that compose the extra-cellular matrix were built by domain shuffling 15–17. The same is true for cell surface receptors, whose expansion in multi-cellular animals with a more developed nervous system is clearly associated with domain shuffling 18.

A feature that illustrates the widespread occurrence of domain shuffling is the excess of domains flanked, at the DNA level, by introns of the same phase (the position of the intron within the codon) in many metazoans 11–13. Such symmetric domains are a signature of this mechanism because the shuffling of a non-symmetric domain is expected to be deleterious, since it would disrupt the open reading frame of the host gene. There is a huge excess of domains flanked by phase 1 introns in complex eukaryotes, especially mammals (reviewed in refs. 15, 16).
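The reading-frame argument above can be checked with a little arithmetic. The sketch below assumes the standard definition of intron phase (the number of nucleotides of the interrupted codon that lie upstream of the intron: 0, 1 or 2); the function names are hypothetical and are not taken from the cited work. Under that definition, a coding unit flanked by introns of phases a (5′) and b (3′) has a length congruent to (b − a) modulo 3, so only symmetric units add a multiple of three nucleotides to the host gene.

```python
# Toy model of intron phases; function names are illustrative only.
# Phase = number of codon nucleotides lying upstream of the intron (0, 1 or 2).

def unit_length_mod3(phase_5p, phase_3p):
    """Coding length (mod 3) of a unit flanked by introns of the given phases."""
    return (phase_3p - phase_5p) % 3

def is_symmetric(phase_5p, phase_3p):
    return phase_5p == phase_3p

def host_frame_preserved(phase_5p, phase_3p):
    """Inserting the unit into a host intron keeps the downstream host frame
    only if the unit adds a multiple of three nucleotides."""
    return unit_length_mod3(phase_5p, phase_3p) == 0

def domain_read_in_native_frame(phase_5p, phase_3p, host_phase):
    """The inserted domain is translated in its original frame only if the
    host intron has the same phase as the symmetric flanking introns."""
    return is_symmetric(phase_5p, phase_3p) and phase_5p == host_phase

tolerated = [(a, b) for a in range(3) for b in range(3) if host_frame_preserved(a, b)]
print(tolerated)  # [(0, 0), (1, 1), (2, 2)]
```

Of the nine possible phase combinations, only the three symmetric ones leave the host frame intact, and a 1-1 domain is read correctly only when it lands in a phase 1 intron.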

Preliminary results from my lab have shown that domain shuffling has been important in protein evolution since the early days of eukaryotes, although a dramatic increase occurred in the last billion years (França et al., submitted). We have been able to show that domain shuffling events mediated by phase 1 introns are clearly associated with the emergence of multi-cellularity. While some unicellular organisms show an excess of domains flanked by phase 0 introns, multicellular animals have a dramatic excess of domains flanked by phase 1 introns. Trichoplax adhaerens, a primitive metazoan, shows an intermediate pattern with an excess of domains flanked by phase 0 or phase 1 introns (França et al., submitted).

Domain shuffling goes systems biology

In the last few years, we have witnessed an increasing interest in the role of domain shuffling in the origin and evolution of biological networks. For example, by mapping DDIs onto PPI networks, Itzhaki et al. 19 showed that there is a catalog of interacting domains repeatedly used throughout evolution to mediate PPIs. Although domain shuffling was not explicitly explored in their paper, their data are consistent with a model in which domain shuffling is important in defining the topology of PPI networks. Evlampiev and Isambert 8 modeled the evolution of PPI networks under a scenario of genome duplication, followed by extensive domain shuffling. They concluded that evolved networks are robust to extensive domain shuffling. This issue has been explored further by a few authors more recently.

First, Cancherini et al. 20 hypothesized that domain shuffling would impact the topology of PPI networks. That was confirmed when they observed that the average degree (the number of interactions of a node) was significantly higher in proteins with evidence of at least one domain shuffling event than in proteins having no detected domain shuffling events. Proteins with at least one domain shuffling event were also enriched in self-interacting proteins when compared to proteins with no detected domain shuffling events. This reinforces the notion that self-interacting proteins are important for the evolution of PPI networks. More interestingly, there was evidence of enrichment of “promiscuous” domains, i.e. those with several interacting partners, in the domain shuffling events across metazoa. This evidence led Cancherini et al. 20 to conclude that domain shuffling is one of the major mechanisms in the evolution of hubs (nodes with a high degree of interactivity) in PPI networks. All these aspects are illustrated in Fig. 1, where the shuffling of a single promiscuous domain (the red triangle) to another protein dramatically changes the topology of the sub-network, creating a new hub.

Figure 1.

Two sub-networks before (A) and after (B) a hypothetical domain shuffling event. The shuffling involved a promiscuous domain (red triangle from the central node to the node at the upper right corner). The event dramatically changes the topology of the sub-network, generating a new hub as well as promoting a more interconnected sub-network.
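The scenario in Fig. 1 can be mimicked with a minimal sketch in which PPIs are inferred from DDIs. Everything here is hypothetical: the protein and domain names, the rule that two proteins interact whenever any pair of their domains interacts, and the assumption that the donor protein keeps its own copy of the shuffled domain. It is not the procedure used by Cancherini et al.

```python
from itertools import combinations

def infer_ppi(proteins, ddi):
    """Two proteins are linked if any pair of their domains is a known DDI."""
    edges = set()
    for a, b in combinations(sorted(proteins), 2):
        if any(frozenset((x, y)) in ddi for x in proteins[a] for y in proteins[b]):
            edges.add((a, b))
    return edges

def degrees(proteins, edges):
    deg = {p: 0 for p in proteins}
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

# Hypothetical data: domain T is promiscuous (it binds d1, d2 and d3).
ddi = {frozenset(p) for p in [("T", "d1"), ("T", "d2"), ("T", "d3"), ("d4", "d5")]}
proteins = {"hub": {"T"}, "p1": {"d1"}, "p2": {"d2"}, "p3": {"d3"},
            "q": {"d4"}, "q1": {"d5"}}

before = degrees(proteins, infer_ppi(proteins, ddi))

# Shuffling event: protein q acquires a copy of the promiscuous domain T.
shuffled = dict(proteins, q=proteins["q"] | {"T"})
after = degrees(shuffled, infer_ppi(shuffled, ddi))

print(before["q"], after["q"])  # 1 4
```

In this toy network a single acquisition of T raises the degree of q from 1 to 4, turning it into a second hub, while the original hub keeps its three partners.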

One interesting aspect emerging from the study of Cancherini et al. 20 is the possibility that domain shuffling has a role not only in creating hubs, but also in increasing the inter-connectivity of networks. This is a critical aspect of the evolution of organismal complexity. Figure 1 also illustrates this, since the sub-network on the right (panel B) is clearly more interconnected than the sub-network in panel A (before the shuffling event). Furthermore, domain shuffling events could also allow cross-communication between functional modules and pathways. This could be achieved by the origin, mediated by domain shuffling, of a new protein that serves as a bridge between two functional modules or pathways. These connecting elements may facilitate the evolution of complex phenotypic traits.
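The bridging idea can be illustrated by counting connected components before and after a new protein appears. This is only a sketch under stated assumptions: the two modules, the chimeric protein `x` and its two edges are invented, and the edges of `x` are simply assumed to arise from interacting domains acquired from each module.

```python
from collections import defaultdict

def components(nodes, edges):
    """Return the connected components of an undirected graph."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        stack, comp = [n], set()
        while stack:                      # depth-first traversal
            cur = stack.pop()
            if cur in comp:
                continue
            comp.add(cur)
            stack.extend(adj[cur] - comp)
        seen |= comp
        comps.append(comp)
    return comps

# Two hypothetical functional modules with no interactions between them.
module_a = [("a1", "a2"), ("a2", "a3")]
module_b = [("b1", "b2"), ("b2", "b3")]
nodes = {"a1", "a2", "a3", "b1", "b2", "b3"}
n_before = len(components(nodes, module_a + module_b))

# Chimeric protein x carries one interacting domain from each module (assumed).
bridge = [("x", "a1"), ("x", "b1")]
n_after = len(components(nodes | {"x"}, module_a + module_b + bridge))

print(n_before, n_after)  # 2 1
```

A single chimeric protein is enough to merge the two modules into one connected sub-network, which is the kind of cross-communication described above.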

More recently, Chen et al. 21 reported that a significant fraction of new genes in the Drosophila genus are essential, despite their relatively recent appearance. Many of these “new” genes are actually chimeric, being products of either illegitimate recombination or retroposition. These authors argue, based on simulations using PPI networks, that the essential nature of these “new” genes is likely due to their effect on biological networks. It remains to be determined if the acquisition of interacting domains by these genes is the main driving force behind their effect on network topology.

It is reasonable to envisage that domain shuffling may have an important role in the evolution of other types of biological networks. For example, in a genome-wide analysis of domain shuffling across different organisms, Kawashima et al. 22 found several transcription factors to be vertebrate-specific products of domain shuffling. Trans-activation domains in these proteins seemed to have been acquired by domain shuffling. The same conclusion was reached by Rodriguez-Caso et al. 23 in their analysis of the human transcription factor PPI network. The spread of trans-activation domains, as well as other types of DNA-binding domains, would have a significant impact on the evolution of genetic regulatory networks.


It is generally believed that domain shuffling speeds up evolution by creating new proteins in a rapid fashion, an argument originally developed by Gilbert 14 for exon shuffling. Recent evidence, discussed here, brings new perspectives to the field. Rather than simply creating new proteins, the shuffling of interacting domains is able to redefine the topology of biological networks either by creating hubs or by increasing the interconnectivity of sub-networks. I envisage that the availability of genome sequences from many new species will allow the determination of the exact timings for the origins of domain shuffling events that created topological novelties in biological networks. Furthermore, the availability of sequences from many individuals within the same species will allow the identification of the evolutionary forces behind the emergence of these novelties. As a consequence, establishment of an association between genetic and phenotypic novelties will be facilitated.


This paper was written during my Tinker Visiting Professorship at the Department of Ecology and Evolution and at the Center for Latin American Studies, both at the University of Chicago. I am indebted to Manyuan Long and members of his group for exciting discussions on this topic.