• Open Access

A general strategy for cellular reprogramming: The importance of transcription factor cross-repression


  • Isaac Crespo,

    1. Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg, Luxembourg
  • Antonio del Sol

    Corresponding author
    1. Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Luxembourg, Luxembourg
    • Correspondence: Antonio del Sol, Ph.D., Luxembourg Centre for Systems Biomedicine (LCSB), 7, avenue des Hauts-Fourneaux, L-4362 Esch-sur-Alzette, Luxembourg. Telephone: (+352) 46 66 44 6982; Fax: (+352) 46 66 44 6982; e-mail: antonio.delsol@uni.lu


  • Author contributions: I. C. and A.d.S.: conceived the idea for the paper and contributed to writing the paper; I. C. wrote software, performed the experiments, and analyzed the data; A.d.S.: coordinated and supervised the project.


Transcription factor cross-repression is an important concept in cellular differentiation. A bistable toggle switch constitutes a molecular mechanism that determines cellular commitment and provides stability to transcriptional programs of binary cell fate choices. Experiments support that perturbations of these toggle switches can interconvert these binary cell fate choices, suggesting potential reprogramming strategies. However, more complex types of cellular transitions could involve perturbations of combinations of different types of multistable motifs. Here, we introduce a method that generalizes the concept of transcription factor cross-repression to systematically predict sets of genes whose perturbations induce cellular transitions between any given pair of cell types. Furthermore, to our knowledge, this is the first method that systematically makes these predictions without prior knowledge of potential candidate genes and pathways involved, providing guidance on systems where little is known. Given the increasing interest in cellular reprogramming in medicine and basic research, our method represents a useful computational methodology to assist researchers in the field in designing experimental strategies. Stem Cells 2013;31:2127–2135


The central role of transcription factor (TF) cross-repression in determining cell fate is one of the most important concepts to have emerged from years of lineage differentiation research [1-4]. In its simplest formulation, two regulators that negatively influence each other establish a bistable “toggle switch,” readily explaining the two mutually exclusive cell fate outcomes. More complicated schemes also include TF autoregulation and antagonistic cross-regulation of target genes. Several examples of these binary cell fate choice mechanisms have emerged in the last 10 years [5-14]. This knowledge can be integrated into a binary decision tree leading from embryonic stem cells to differentiated cells through different progenitors [1] (Fig. 1). This tree defines distinct paths between different cell types in a Waddington landscape [15-17], where cell types can be interpreted as stable steady states of cellular gene regulatory networks, termed attractors. Cross-repression motifs not only determine binary decisions in the tree but, because of their bistable behavior, characterized by mutually exclusive gene expression states, also play a key role in the stability of each possible cell fate. Furthermore, experimental evidence has demonstrated that perturbations of genes belonging to these motifs are able to trigger transitions between these binary cell fate choices [18, 19]. Indeed, although an attractor's stability is determined by a regulatory core composed of one or several interconnected positive feedback loops, known as positive circuits [20], these cross-antagonistic motifs have been shown to sit at the top of the hierarchical organization of the set of positive circuits whose states change from one binary cell fate choice to the other. Hence, these motifs constitute master switches between binary cell fate choices (intralineage transdifferentiation).
The strategy of perturbing the top positive circuits in such a hierarchical organization can be extended to transitions between any given pair of cellular phenotypes, even if they are not derived from a direct common progenitor. In particular, these transitions can include other types of cellular reprogramming, that is, the transition of a differentiated cell to another cell type, either to a progenitor cell (dedifferentiation) or to another differentiated cell type coming from a different progenitor cell (interlineage transdifferentiation). In these cases, a more complex set of positive circuits with mutually exclusive stable gene expression states could determine these transitions. This strategy leads to the identification of a small number of genes (reprogramming determinants) that trigger the transitions between different cellular phenotypes. Indeed, in the last decade several labs have experimentally demonstrated that, although cell types differ in the expression of thousands of genes, perturbation of a few reprogramming determinants is usually able to trigger cellular transitions from one stable cellular phenotype to another [21-23]. Nevertheless, these experiments [24, 25] have relied on a brute force search for effective cocktails of TFs to achieve the desired cellular transitions and therefore, due to the combinatorial complexity of this problem, they constitute a time- and resource-consuming strategy. This limitation, together with the increasing interest in cellular reprogramming, calls for strategies that systematically identify optimal combinations of reprogramming determinants capable of inducing cellular transitions. A number of computational models aiming at understanding cell fate and reprogramming have been proposed in the literature [24-29]. They attempt to model the dynamic behavior of specific parts of the gene regulatory network (GRN) that govern the dynamics of a larger network.
Although these models give some insights into the relevant network motifs in cell fate decisions, they are usually quite complex, relying on a large number of input parameters and constraints, and they consider only small fractions of previously known genes to model the regulatory mechanism. Most importantly, they do not provide a systematic platform to identify key regulatory motifs that guarantee cellular stability and are likely to be involved in the transitions between different stable cellular states. One step forward in this direction is the methodology developed by Chang et al. [25] to test, compare, and rank different recipes based on their simulated efficiency and fidelity in reprogramming somatic cells to induced pluripotent stem cells (iPS) in a model that considers a certain level of stochasticity. However, this methodology lacks any strategy to look for better combinations or to improve efficiency and fidelity, and it relies on a preliminary list of candidate genes both for the network reconstruction process and for the selection of combinations to test.

Figure 1.

Cell identity cascading landscape representing the cellular transcriptional program. Paths between pluripotent and differentiated cells, representing the cellular differentiation process, pass through stable expression profiles corresponding to multipotent progenitors. Binary cell fate decisions at the multipotent progenitor level are characterized by cross-repression motifs of competing transcription factors. Transdifferentiation events between somatic cells are divided into those between cells sharing a direct precursor cell (intralineage transdifferentiation), where the cross-repression motifs that determine cell fate decisions play a key role in stabilizing binary cell decisions and the transitions between them, and those between cells without a direct precursor (interlineage transdifferentiation), characterized by a more complex molecular mechanism underlying cellular transitions. Blue and red colors in cross-repression motifs and the GRN stability core represent mutually exclusive expression states for a given pair of cellular phenotypes, standing for downregulation and upregulation, respectively. “→” represents activation or positive regulation and “|” represents inhibition or negative regulation. Abbreviations: ES cells, embryonic stem cells; GRN, gene regulatory network.

Here, we propose a cellular transition-dependent method that identifies candidates for reprogramming determinants by focusing on stability motifs in gene regulatory networks. Given that the approach does not require a preliminary list of candidates, it can be applied to biological systems for which no prior knowledge of candidates is available. Our method initially searches for differentially expressed positive circuits (DEPCs), that is, positive circuits whose genes change their expression levels between two different cellular phenotypes. Furthermore, the hierarchical organization of these circuits is analyzed in order to identify master regulatory positive circuits, which directly or indirectly regulate the states of the other DEPCs.

Finally, given the stochastic nature of molecular interactions and abundances in gene regulatory networks affecting cellular reprogramming efficiency and fidelity, we use a previously introduced network topological characteristic termed retroactivity [30], which positively correlates with expression noise [31], in order to detect combinations of genes in master regulatory DEPCs that are more affected by expression noise and need to be controlled in order to minimize information loss during signal transmission in gene regulatory networks. These gene combinations are the best candidates for reprogramming determinants according to our model.

We selected three representative biological examples of cellular reprogramming with experimental information on reprogramming determinants inducing effective transitions between cellular phenotypes in order to assess the applicability of our method. These examples are the transdifferentiation from T-helper lymphocyte Th2 to Th1 (intralineage transdifferentiation), from myeloid to erythroid cells (interlineage transdifferentiation), and from fibroblast to hepatocyte (distant interlineage transdifferentiation). In the Th2-Th1 example, we identified GATA3 and T-bet as potential inducers of Th2 to Th1 T-helper transdifferentiation, which is in full agreement with previously reported experimental observations [32, 33]. Our results showed that cells committed to the myeloid lineage, which would otherwise become granulocytes or macrophages, can be reprogrammed to the erythroid lineage, which gives rise to megakaryocytes or erythrocytes, by perturbation of a single reprogramming determinant, that is, the activation of GATA1. This induced transition has been experimentally validated [19]. Finally, the application of our method to the example of fibroblast to hepatocyte reprogramming allowed us to detect combinations of reprogramming determinants that induce this cellular transition. Among these detected combinations, the combined activation of HNF4A and FOXA2 has been experimentally validated by the work of Sekiya and Suzuki published in 2011 [34].

In conclusion, here we propose, to our knowledge, the first method that systematically identifies combinations of genes (reprogramming determinants), which are potentially capable of inducing transitions between specific pairs of cellular phenotypes, without prior knowledge of possible candidates for reprogramming determinants. Our method generalizes the principle of TF cross-repression in binary lineage decisions in the sense that it searches for master regulatory positive circuits, which contribute to the stability of cellular gene regulatory networks, and whose genes are differentially expressed with respect to specific pairs of cellular phenotypes. Perturbations of combinations of genes belonging to these circuits that swap their steady stable states are expected to induce transitions between these phenotypes. We believe that considering the increasing interest of the research community in using cellular reprogramming in the establishment of cell disease models and regenerative medicine, our method constitutes a useful computational protocol that aims to assist researchers in the field in designing experimental strategies.


A popular framework for conceptualizing and describing cellular transitions is that of the landscapes proposed by Waddington [15-17], where cellular phenotypes may be seen as stable steady states (termed attractors) of GRNs, represented as wells separated by so-called epigenetic barriers. These barriers are established by those elements that stabilize GRNs in their attractors. Given that cellular reprogramming implies a transition between two stable cellular transcriptional programs (two attractors of the GRN), the corresponding GRN must be at least bistable. The presence of positive circuits or positive feedback loops in a GRN is a necessary condition for the existence of at least two attractors (multistability) [20]; the sign of a circuit is defined as the product of the signs of its edges, with activation counted as positive and inhibition as negative. Hence, some of the positive circuits constitute the stability elements of the GRN. In particular, there are positive circuits whose genes are differentially expressed between two given attractors. By swapping the states of these circuits it should be possible to induce transitions from one attractor to another, similarly to how transitions between cell types derived from a common progenitor cell can be induced by swapping the states of cross-repression motifs. Given the stochastic nature of molecular interactions in GRNs, perturbations of different combinations of genes belonging to these positive circuits can trigger these transitions with different efficacy.
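The sign rule for circuits can be illustrated with a minimal sketch. The toy network, the function names `simple_circuits` and `circuit_sign`, and the exhaustive enumeration strategy are assumptions made for illustration only; they are not the circuit detection procedure of the Methods section, and the enumeration is practical only for small networks.

```python
from math import prod


def simple_circuits(edges):
    """Enumerate the simple directed circuits of a small signed graph.

    `edges` maps (source, target) to a sign: +1 for activation, -1 for inhibition.
    Each circuit is reported once, as a list starting from its smallest node.
    """
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    circuits = []

    def extend(start, node, path):
        for nxt in successors.get(node, []):
            if nxt == start:
                circuits.append(list(path))  # the path closes back on its start
            elif nxt > start and nxt not in path:
                extend(start, nxt, path + [nxt])

    for start in sorted({node for edge in edges for node in edge}):
        extend(start, start, [start])
    return circuits


def circuit_sign(circuit, edges):
    """Product of the edge signs along the circuit (+1 means a positive circuit)."""
    n = len(circuit)
    return prod(edges[(circuit[i], circuit[(i + 1) % n])] for i in range(n))


# Hypothetical toy network: A and B repress each other (cross-repression),
# while B activates C and C represses B (a negative feedback loop).
toy_edges = {("A", "B"): -1, ("B", "A"): -1, ("B", "C"): +1, ("C", "B"): -1}
positive = [c for c in simple_circuits(toy_edges) if circuit_sign(c, toy_edges) > 0]
print(positive)
```

In this toy case only the mutual inhibition between A and B is a positive circuit, because the product of its two negative signs is positive.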

Description of the Method

Here, we propose a method to design reprogramming protocols based on the topological relationship between the elements involved in the stabilization of specific attractors. The hierarchical organization analysis of strongly connected components (SCCs) formed by one or more DEPCs allows us to identify combinations of genes belonging to master regulatory DEPCs that should be perturbed in order to directly or indirectly target all DEPCs and consequently to induce specific cellular transitions. Finally, we select among these combinations of genes those with the highest interface out-degree, defined as the number of genes that are directly regulated by them. The reason for this step is to minimize the retroactivity effect on master regulatory circuits [30, 31], which considers the increased time response of these circuits after noise or external perturbations. This allows us to minimize the expression noise due to retroactivity contextualized to the specific cellular transition under study. In other words, we select combinations of genes participating in more transcriptional regulation events in order to minimize the time response of DEPCs and the stochastic behavior of the GRN under perturbation, and therefore to minimize information loss during signal transmission. This strategy allows us to narrow down a huge combinatorial search problem to a set of minimal combinations that constitute alternative reprogramming protocols and the output of our method.

The method can be described with the following three steps, which are shown in Figure 2:

  1. Detecting master regulatory SCCs.
  2. Determining master regulatory DEPCs for each master regulatory SCC.
  3. Detecting reprogramming determinant genes within master regulatory circuits.
Figure 2.

Design of cellular reprogramming protocol in three steps. (A): Detecting master regulatory SCCs. In this first step, those positive circuits or positive feedback loops in the gene regulatory network (GRN) whose genes change their expression levels between two cellular phenotypes are selected from the population of network circuits. These DEPCs form SCCs. A hierarchical analysis in the space of these SCCs allows us to determine master regulatory SCCs. SCC 1 and 2 are located on the top of the hierarchy of the represented toy network without displaying connectivity between them. These SCCs should be independently perturbed to guarantee that the perturbation signal reaches every DEPC in the GRN. (B): Detecting master regulatory DEPCs. Within each master regulatory SCC, a master regulatory DEPC is determined based on a retroactivity score (interface out-degree) or, in other words, based on the number of genes directly regulated by this circuit. The master regulatory DEPC is the one with the highest interface out-degree. In this toy example, Circuit 1 (composed of genes “a,” “b,” and “c”) is the master regulatory DEPC of SCC 1, and Circuit 1 (composed of genes “p” and “o”) of SCC 2 is the other master regulatory DEPC. These master regulatory DEPCs are colored in red in the retroactivity ranking table. (C): Detecting reprogramming determinants. Once the master regulatory DEPCs have been determined, the selection of final reprogramming determinants is based on maximizing the sum of the individual interface out-degrees of the genes included in the combination. In this toy example, gene “a” is the one with the highest retroactivity within Circuit 1 of SCC 1. Similarly, gene “p” has the highest interface out-degree in its respective circuit and SCC. Therefore, the reprogramming determinants are “a” and “p” (both should be perturbed to induce the hypothetical cellular transition).
Blue and red colors in network nodes represent mutually exclusive expression states for a given pair of cellular phenotypes, standing for downregulation and upregulation, respectively. “→” represents activation or positive regulation and “|” represents inhibition or negative regulation. Abbreviations: DEPC, differentially expressed positive circuit; SCC, strongly connected component.

Detecting Master Regulatory SCCs

In order to detect master regulatory SCCs, that is, clusters of DEPCs that should be independently perturbed, it is necessary to detect and list all positive circuits or positive regulatory feedback loops. We also need to identify the network attractors corresponding to the two phenotypes of the cellular transition of interest. Once we have this information, we determine which of the positive circuits are DEPCs for this specific cellular transition, meaning that the expression levels of their genes change between the cellular phenotypes involved. These DEPCs can be clustered into SCCs, and these SCCs (if there is more than one) can be interconnected. In order to detect which SCCs should be independently perturbed to guarantee that all DEPCs are reached by the perturbation signal, we analyze the hierarchical organization of the SCCs formed by DEPCs. It is worth stressing that this hierarchical organization is cellular transition dependent, since it is based on positive circuits that change between the initial and final cellular phenotypes (see the Methods section for details about circuit detection, attractor computation, and hierarchical analysis).
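This step can be sketched on a toy network, as a rough illustration rather than the procedure of the Methods section. The sketch assumes Boolean attractors, treats a circuit as a DEPC only if every one of its genes differs between the two attractors, groups DEPCs that share genes into one cluster (circuits sharing a gene are mutually reachable), and calls a cluster a master if no other cluster can reach it along directed edges. All gene names, function names, and data are hypothetical.

```python
def is_depc(circuit, state_from, state_to):
    """Assumed criterion: every gene of the circuit differs between the attractors."""
    return all(state_from[g] != state_to[g] for g in circuit)


def cluster_circuits(circuits):
    """Merge circuits that share genes into disjoint, strongly connected gene sets."""
    clusters = []
    for circuit in circuits:
        merged, untouched = set(circuit), []
        for cluster in clusters:
            if cluster & merged:
                merged |= cluster
            else:
                untouched.append(cluster)
        clusters = untouched + [merged]
    return clusters


def reachable(sources, successors):
    """Genes that can be reached from `sources` by following directed edges."""
    seen, stack = set(), list(sources)
    while stack:
        for nxt in successors.get(stack.pop(), []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def master_clusters(clusters, edges):
    """Clusters that receive no direct or indirect regulation from another cluster."""
    successors = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
    downstream = [reachable(cluster, successors) for cluster in clusters]
    return [
        cluster
        for i, cluster in enumerate(clusters)
        if not any(j != i and downstream[j] & cluster for j in range(len(clusters)))
    ]


# Hypothetical toy data: edge signs are omitted because the circuits are given as input.
edges = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("d", "c"),
         ("p", "o"), ("o", "p"), ("o", "d"), ("x", "y"), ("y", "x")]
circuits = [["a", "b"], ["c", "d"], ["p", "o"], ["x", "y"]]
initial = {"a": 1, "b": 0, "c": 0, "d": 0, "p": 1, "o": 0, "x": 1, "y": 1}
final = {"a": 0, "b": 1, "c": 1, "d": 1, "p": 0, "o": 1, "x": 1, "y": 1}

depcs = [c for c in circuits if is_depc(c, initial, final)]
masters = master_clusters(cluster_circuits(depcs), edges)
print(masters)
```

Here the circuit formed by x and y is discarded because its state does not change, and the cluster containing c and d is not a master because both other clusters regulate it, so two clusters must be perturbed independently.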

Determining the Master Regulatory DEPCs for Each Master Regulatory SCC

The DEPC with the highest interface out-degree is considered the master regulatory circuit of each specific SCC. The interface out-degree of a circuit is the count of genes directly regulated by genes belonging to the circuit. These master regulatory DEPCs should be independently perturbed in order to induce the desired cellular transition, so a minimal combination of genes able to target all master regulatory DEPCs contains as many genes as there are such DEPCs. In other words, the perturbation of one gene per master regulatory DEPC is required. Since different minimal combinations (equal in number of genes) can arise from this procedure, we aim to select the best combinations according to their contribution to retroactivity. It is worth stressing that, although the interface out-degree could be calculated for any circuit in the GRN, the method pays attention only to those genes that belong to DEPCs when comparing two attractors, given that these are the circuits that are going to be destabilized in the original attractor and restabilized in the final one.
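The ranking of circuits within an SCC can be sketched as follows. The sketch assumes that the interface out-degree of a gene is the number of its distinct direct targets, including targets inside the circuit, and that the score of a circuit is the sum over its genes, in line with how circuit scores are reported for the biological examples. The network and the function names are hypothetical.

```python
def gene_out_degree(gene, edges):
    """Number of distinct genes directly regulated by `gene` (assumed definition)."""
    return len({target for source, target in edges if source == gene})


def circuit_out_degree(circuit, edges):
    """Circuit score, taken here as the sum of the out-degrees of its genes."""
    return sum(gene_out_degree(gene, edges) for gene in circuit)


def master_depc(depcs_in_scc, edges):
    """The DEPC with the highest interface out-degree within one SCC."""
    return max(depcs_in_scc, key=lambda circuit: circuit_out_degree(circuit, edges))


# Hypothetical SCC made of two DEPCs that share gene "c".
scc_edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "c"),
             ("a", "x"), ("a", "y"), ("b", "z")]
scc_depcs = [["a", "b", "c"], ["c", "d"]]

scores = {tuple(c): circuit_out_degree(c, scc_edges) for c in scc_depcs}
print(scores, master_depc(scc_depcs, scc_edges))
```

Note that `max` returns the first circuit in case of a tie; a fuller treatment would keep all tied circuits as alternatives.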

Detecting Reprogramming Determinant Genes

Within the master regulatory DEPCs, we identify the genes with the maximum interface out-degree. These are the genes that regulate the most targets and are therefore mainly responsible for the retroactivity of the DEPCs. This set of genes constitutes the reprogramming determinants. If more than one combination of reprogramming determinant candidates has the same number of genes and the same interface out-degree, all of them are considered reprogramming determinants according to our model, and they constitute alternative solutions.
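A minimal sketch of this selection step, including the handling of ties, is given below. The function names are hypothetical, and the first example uses made-up out-degrees for a toy network with two master circuits; the second example uses the values reported in the text for the myeloid-erythroid case, where GATA-1 and PU.1 both have an interface out-degree of 4.

```python
from itertools import product


def top_genes(circuit, out_degree):
    """All genes of a circuit that reach the maximum interface out-degree."""
    best = max(out_degree[gene] for gene in circuit)
    return [gene for gene in circuit if out_degree[gene] == best]


def reprogramming_combinations(master_circuits, out_degree):
    """One top gene per master circuit; ties yield alternative combinations."""
    return list(product(*(top_genes(c, out_degree) for c in master_circuits)))


# Hypothetical toy values: two master circuits, no ties.
toy_degree = {"a": 3, "b": 2, "c": 2, "p": 2, "o": 1}
toy_result = reprogramming_combinations([["a", "b", "c"], ["p", "o"]], toy_degree)
print(toy_result)

# Values stated in the text for the myeloid-erythroid example: one circuit, a tie.
tie_degree = {"GATA-1": 4, "PU.1": 4}
tie_result = reprogramming_combinations([["GATA-1", "PU.1"]], tie_degree)
print(tie_result)
```

The sketch only selects genes; whether each selected gene should be activated or inhibited follows from its expression state in the target attractor.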

Application of the Method to Three Illustrative Biological Examples

We selected three different biological examples of cellular reprogramming in order to illustrate and validate the applicability of our method as a generalization of the TF cross-repression concept. These examples provide an experimental validation of the identified sets of reprogramming determinants as effective inducers of transitions between cellular phenotypes. The Th2-Th1 and myeloid-erythroid examples are based on GRNs previously published by Mendoza [35] and by Krumsiek et al. and Dore and Crispino [36, 37], respectively. These two networks were constructed to describe the differentiation process of the corresponding human cell types. We showed that appropriate perturbations of these networks induce transdifferentiation between cell types with the same cellular precursor. The mouse fibroblast-hepatocyte reprogramming example illustrates the case of a cellular transition between two cell types that do not share the same direct cellular precursor. In this case, we reconstructed a literature-based GRN of differentially expressed genes (DEG) between both cell types [38]. This network was contextualized by an iterative network pruning procedure described in the Methods section and previously published [39]. This contextualized network is specific for the cellular transition under study and therefore suitable to describe input-output relationships or the network response under specific perturbations for a given initial stable network state (stable expression pattern).

Where possible, the networks for the three examples were enriched with experimentally validated and publicly available information about microRNA (miRNA) interactions [40, 41]. Details about the GRNs for these three biological examples are included in the Methods section and the Supporting Information.


T lymphocytes are classified as either T helper cells or T cytotoxic cells. T helper cells take part in cell- and antibody-mediated immune responses, and they are subdivided into Th0 (precursor) cells and effector Th1 and Th2 cells depending on the array of cytokines that they secrete [42]. The T-helper differentiation network determines the fate of the T-helper lineage [35], with three different attractors corresponding to the three different phenotypes (Th0, Th1, and Th2). We applied our method to a previously published GRN [35], which represents the regulatory mechanisms determining the basic T-helper types. This network includes T-bet and GATA-3, which form a cross-repression motif responsible for differentiation to either Th1 or Th2 from a common precursor (Th0). We applied our method in order to detect reprogramming determinants for the Th2-Th1 transdifferentiation. The SCC hierarchy analysis followed by the maximum retroactivity criterion allowed us to identify one master regulatory SCC with one master regulatory DEPC (named circuit 16 in Fig. 3A and the supplements) among the five DEPCs of this specific cellular transition. Circuit 16 corresponds to the positive feedback loop formed by GATA-3, T-bet, SOCS-1, IL-4R, and STAT-6. The interface out-degree of this circuit is 11, resulting from the sum of the interface out-degrees of all genes belonging to it. Within this master regulatory DEPC, two genes contribute equally to the circuit interface out-degree: GATA-3 and T-bet each have an interface out-degree of 4. According to the methodology presented here, both GATA-3 and T-bet constitute independent reprogramming determinants, by inactivation and activation, respectively. The predicted capability of T-bet to induce the transition from Th2 to Th1 is in full agreement with reported experimental results [18]. To our knowledge, there is no experimental evidence of either the capability or the incapability of GATA-3 to induce the transition from Th2 to Th1 when inactivated.

Figure 3.

Reprogramming determinants in three illustrative biological examples. (A): Th2-Th1 reprogramming. Activation of T-bet and, alternatively, inhibition of GATA-3 are predicted as effective perturbations to induce this cellular transition. (B): Cellular reprogramming from myeloid to erythroid cells. Activation of GATA-1 and inhibition of PU.1 are each predicted to be independently able to induce this cellular transition. (C): Cellular reprogramming from fibroblast to hepatocyte. In this particular case, no single gene is able to induce the cellular transdifferentiation according to our predictions. On the other hand, combined activation of HNF4A and FOXA2 is predicted as an effective combination of reprogramming determinants. Blue and red colors in network nodes represent mutually exclusive expression states for a given pair of cellular phenotypes, standing for downregulation and upregulation, respectively. “→” represents activation or positive regulation and “|” represents inhibition or negative regulation. Abbreviations: DEPC, differentially expressed positive circuit; GRN, gene regulatory network; SCC, strongly connected component.

It is worth mentioning that the cross-repression motif responsible for the binary cell decision between Th1 and Th2 from the precursor Th0 is embedded in the master regulatory SCC, and the detected master regulatory DEPC, circuit 16, includes the two genes forming the cross-repression motif. This example illustrates how a motif responsible for a cell fate decision can also participate in the stabilization of the derived cellular phenotypes and how its proper perturbation can trigger transitions between them.


Within hematopoiesis, there are several binary decisions leading from multipotent stem cells to different types of blood cells. One of these decisions, the one determining whether multipotent stem cells become erythroid precursor cells (later erythrocytes and megakaryocytes) or myeloid precursor cells (later macrophages and granulocytes), requires the participation of the TF cross-repression motif formed by GATA-1 and PU.1. As shown in Figure 3B, the application of our method to a previously published GRN [36, 37], in which this motif is embedded and connected with other multistable motifs, allowed us to identify GATA-1 as a reprogramming gene able to induce the transition from myeloid to erythroid precursor cells. This finding is in full agreement with the experimental results obtained by Heyworth et al. [19], who reported that myeloid precursors infected with an inducible form of GATA-1 generated erythroid colonies when GATA-1 was induced. Figure 3B also shows that in this example we found a single master regulatory circuit, named Circuit 12, with an interface out-degree of 8, which is formed by the mutual inhibition between GATA-1 and PU.1. In this particular case, we obtained two possibilities with an identical gene interface out-degree of 4: activation of GATA-1 and inhibition of PU.1. The activation of GATA-1 corresponds to the experiment performed by Heyworth et al. [19]. To our knowledge, there is as yet no experimental evidence showing that the inhibition of PU.1 is either able or unable to produce the same effect. As in the previous example, here we observe how a cross-repression motif not only participates in a binary cell fate decision but can also be exploited to respecify the cellular commitment of cells sharing the same precursor.
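The flipping of such a cross-repression motif can be illustrated with a deliberately reduced Boolean model. This is a sketch under strong assumptions: it contains only GATA-1 and PU.1 with mutual inhibition, uses synchronous updates, and ignores the rest of the published network [36, 37], so it is not the model analyzed in this work. The function names and the clamping mechanism are hypothetical.

```python
from itertools import product


def step(state, clamp=None):
    """One synchronous update of the two-gene toggle: each gene is on iff the other is off.

    `state` is (GATA1, PU1); `clamp` optionally forces genes to fixed values,
    mimicking a sustained perturbation such as induced expression.
    """
    gata1, pu1 = state
    nxt = {"GATA1": int(not pu1), "PU1": int(not gata1)}
    if clamp:
        nxt.update(clamp)
    return (nxt["GATA1"], nxt["PU1"])


def run_to_fixed_point(state, clamp=None, max_steps=20):
    """Iterate until the state stops changing; fail if no fixed point is reached."""
    for _ in range(max_steps):
        nxt = step(state, clamp)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("no fixed point reached")


# Stable states of the unperturbed toggle: exactly one of the two genes is on.
fixed_points = [s for s in product((0, 1), repeat=2) if step(s) == s]

myeloid_like = (0, 1)  # PU.1 on, GATA-1 off
perturbed = run_to_fixed_point(myeloid_like, clamp={"GATA1": 1})
released = run_to_fixed_point(perturbed)  # perturbation withdrawn
print(fixed_points, perturbed, released)
```

In this toy model the state reached under forced GATA-1 expression persists after the perturbation is withdrawn, which is the qualitative behavior expected from a bistable switch.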


Normally, hepatocytes differentiate from hepatic progenitor cells to form the liver during regular development. However, hepatic programs can also be activated in different cells under particular stimuli or upon fusion with hepatocytes. The transition from mouse fibroblasts to hepatocyte-like cells induced by the perturbation of specific combinations of TFs has been previously reported by several authors [34, 38]. As shown in the table included in Figure 3C, in this case the SCC hierarchical analysis allowed us to identify two master regulatory SCCs, one including circuit 2 (including NR5A2 and FOXA2) and one including circuits 0, 7, and 4 (including genes AGT, PPARGC1A, UCP2, and HNF4A). Within the latter SCC, the DEPC named circuit 0 is the one with the highest interface out-degree, which is 20. Then, we proceeded to identify reprogramming determinants by targeting both master regulatory circuits. Within circuit 2, the gene that contributes the most to the circuit retroactivity is FOXA2, with an interface out-degree of 5. Within circuit 0, HNF4A is the gene with the highest contribution to the circuit retroactivity, with an interface out-degree of 9. Therefore, the final combination of reprogramming determinants is HNF4A and FOXA2. Both genes should be activated to trigger the transition from fibroblast to hepatocyte. This result is supported by the work of Sekiya and Suzuki published in 2011 [34]. These authors experimentally validated three different combinations of two TFs able to induce the transition from mouse fibroblast to hepatocyte, including HNF4A and FOXA2. This cellular transition constitutes a good example of reprogramming cells without a common direct precursor (interlineage transdifferentiation). Details about attractors, circuits, and gene interface out-degrees for the three biological examples are included in the Supporting Information.


Cellular reprogramming, including the conversion of one differentiated cell type to another (transdifferentiation) or to a more immature cell (dedifferentiation), constitutes an invaluable tool for studying cellular changes during development and differentiation and has enormous relevance for regenerative medicine and disease modeling. Although substantial progress has been made in developing experimental reprogramming techniques, to date the scientific community is still faced with challenges such as the identification of optimal sets of genes whose repression and/or activation are capable of reprogramming one cell type to another (reprogramming determinants), and the elucidation of molecular changes and relevant pathways involved in these transitions [9]. Furthermore, there is currently no methodology able to systematically predict reprogramming determinants that could guide the design of cellular reprogramming experiments. The development of computational models of the transcriptional regulation that underlies cellular transitions would help to predict these reprogramming determinants. Moreover, the analysis of gene regulatory network properties has allowed the identification of functionally relevant motifs of interactions that could play a role in cellular transitions. In particular, TF cross-antagonism has been described as a mechanism that plays a key role in cell fate decisions. A bistable toggle switch constitutes a molecular cross-repression motif that determines cellular commitment and provides stability to the gene regulatory networks underlying transcriptional programs of binary cell fate choices. Experimental evidence indicates that flipping the stable states of these toggle switches produces interconversion between binary cell fate choices. Nevertheless, interlineage transdifferentiation and dedifferentiation could involve perturbation of combinations of cross-repression motifs together with other multistable motifs.
Here, we propose a method that considers the connectivity of these different multistable motifs in order to systematically identify sets of reprogramming determinants able to induce transitions from differentiated cells to other cell types, either to progenitor cells (dedifferentiation) or to other differentiated cell types (transdifferentiation). Our strategy rests on the identification of a subset of all network positive circuits (a necessary condition for network multistability) whose genes are differentially expressed between the cellular states involved in these transitions. We termed the circuits in this subset differentially expressed positive circuits (DEPCs). Furthermore, a hierarchical organization of these circuits allows us to detect master regulatory positive circuits, which directly or indirectly regulate the states of the other DEPCs. By focusing on genes belonging to these master regulatory circuits, we dramatically reduced the number of possible combinations of reprogramming determinants.

However, some of these gene combinations in master regulatory DEPCs are more influenced by expression noise, which affects signal transmission in gene regulatory networks and consequently decreases reprogramming efficiency and fidelity. This is because these genes participate in a larger number of regulatory interactions, so a limited concentration of the gene product has to interact with several targets apart from the one that closes the DEPC. In other words, the gene product has to be distributed among different regulated targets, so the probability that the DEPC signal feedback is broken by chance is higher (neglecting differences in molecular affinities, which are assumed to be similar). Hence, in order to increase signal transmission, our method proposes these gene combinations as reprogramming determinants. It is worth mentioning that our model considers some of the important events influencing reprogramming efficiency and fidelity, such as the role of noise in network dynamics and the regulatory interactions mediated by miRNAs. However, other factors, such as epigenetic modifications that block the activation of certain genes, can affect the expected network behavior after specific perturbations. Furthermore, it has been experimentally shown that epigenetic modifications can prevent the reversibility of cellular reprogramming in some cases [43]. In addition, our model does not take into account the different response delays of distinct regulatory interactions. Nevertheless, given that the purpose of our method is the identification of reprogramming determinants rather than a detailed description of network dynamics, we consider that our model provides reasonable predictions. More accurate predictions will require addressing these considerations in the future.
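The selection rule discussed above, choosing within each master regulatory circuit the gene with the highest interface out-degree, can be sketched in a few lines of Python. This is only an illustration under a stated assumption: the interface out-degree of a gene is taken here to be the number of targets it regulates other than the one that closes the circuit. The function names and the toy network are hypothetical and are not taken from the original implementation.

```python
from collections import defaultdict

def interface_out_degree(edges, circuit):
    """For each gene of `circuit`, count its targets other than its successor
    in the circuit (ASSUMED definition of interface out-degree).

    edges:   iterable of (regulator, target) pairs
    circuit: list of genes in regulatory order; the last gene regulates the first
    """
    targets = defaultdict(set)
    for regulator, target in edges:
        targets[regulator].add(target)
    degrees = {}
    for i, gene in enumerate(circuit):
        successor = circuit[(i + 1) % len(circuit)]
        degrees[gene] = len(targets[gene] - {successor})
    return degrees

def pick_determinant(edges, circuit):
    """Return the circuit gene with the highest interface out-degree."""
    degrees = interface_out_degree(edges, circuit)
    return max(degrees, key=degrees.get)

# Hypothetical toy network: A and B form a circuit; A also regulates X and Y.
toy_edges = [("A", "B"), ("B", "A"), ("A", "X"), ("A", "Y"), ("B", "X")]
toy_degrees = interface_out_degree(toy_edges, ["A", "B"])
toy_choice = pick_determinant(toy_edges, ["A", "B"])
```

In this toy case gene A has two targets outside the circuit and gene B has one, so A would be proposed as the determinant for this circuit.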

Interestingly, although no methodological constraint or theoretical limitation prevents genes other than transcription factors from being reprogramming determinants, to date, in blind applications of the method, TFs have always come up as reprogramming determinants. It is worth mentioning that the applicability of the method presented here is restricted to cellular transitions between stable states or stable expression patterns and is constrained by the availability of information to reconstruct the corresponding GRN, as explained in more detail in the Methods section.


Networks Reconstruction

Among the selected biological examples, Th2-Th1 and Myeloid-Erythroid reprogramming illustrate the case of transdifferentiation between two cell types sharing a direct common precursor. We based our analysis on previously published GRNs describing the regular differentiation process of T-helper cells and cell fate decisions during hematopoiesis [35-37]. These two published networks were enriched with experimentally validated miRNA interactions that are publicly available in two different databases, TransmiR [40] and miRTarBase [41], which include information about genes regulating miRNAs and genes regulated by miRNAs, respectively. Only miRNAs forming closed loops with network genes and, therefore, able to affect the stability of the network were included (Table 1).

Table 1. miRNAs included in the biological examples

miRNA      Interactions
mir-145    IFN-B → mir-145
           mir-145 -| STAT1
mir-34a    mir-34A -| PU.1
           CEBPA → mir-34A
mir-155    mir-155 -| FLI1
           PU.1 → mir-155
           mir-155 -| PU.1

“→” represents activation and “-|” represents inhibition.
Abbreviation: miRNA, micro RNA.

The Fibroblast-Hepatocyte reprogramming example illustrates a distant (interlineage) cellular transdifferentiation. Therefore, no canonical previously published network can be exploited to detect the reprogramming determinants. Such reprogramming requires the reconstruction of a GRN contextualized to this specific cellular transition.

Given that the final goal is to induce the transition from one specific cell phenotype to another, the network is constructed from the elements that change between these two states, that is, the differentially expressed genes (DEGs) between these two conditions or cell types obtained from microarray experiments. We scanned the literature and collected 24 genes known to play a relevant role in liver development and function and reported to be differentially expressed between fibroblasts and hepatocytes in previous works [44-47]. We then attempted to connect these genes using interactions harvested from the literature indexed in PubMed. For this specific purpose, we used the information contained in the ResNet mammalian database from Ariadne Genomics (http://www.ariadnegenomics.com/). The ResNet database includes biological relationships and associations that have been extracted from the biomedical literature using Ariadne's MedScan technology [48, 49]. More specifically, we included interactions annotated in the ResNet mammalian database in the categories Expression, PromotorBinding, and Regulation. In the Expression category, an interaction indicates that the regulator changes the protein level of the target by regulating its gene expression or protein stability. In the PromotorBinding category, an interaction indicates that the regulator binds the promoter of the target. Finally, in the Regulation category, an interaction indicates that the regulator changes the activity of the target. Similar resources for network reconstruction are the Ingenuity Pathway Analysis (IPA) tool of Ingenuity Systems (http://www.ingenuity.com/) and the Transfac tool (http://www.biobase-international.com).

Once we had a raw GRN from the literature, we proceeded to remove interactions inconsistent with the expression data by iterative network pruning. The removed interactions are those apparently not active in the biological context under study. It should be taken into account that interactions from the literature usually come from different biological contexts, such as different cell types, tissues, or even species. This network pruning allows us to reduce the number of “false” interactions and to obtain a contextualized network. The algorithm applied for this network pruning [39] was originally conceived to predict missing expression values in gene regulatory networks but can also be applied to contextualize a network when all the expression values in two given cellular phenotypes or stable transcriptional programs are known. Basically, the algorithm exploits the consistency between predicted stable states and stable states known from experimental data to guide the iterative network pruning that contextualizes the network to the biological conditions under which the expression data were obtained. This process requires the booleanization of the cellular phenotypes derived from the experimental expression data: genes considered upregulated or downregulated at a given p value threshold (usually p < .05 for a standard t test) are assigned the values “1” and “0,” respectively. This is because a Boolean model is assumed to compute the network attractors. An evolutionary algorithm, more specifically an estimation of distribution algorithm [50], samples the probability distribution of positive feedback loops (positive circuits) and individual interactions within the subpopulation of the best-scored networks at each iteration of the pruning algorithm. 
The resulting contextualized network is based not only on previous knowledge about local connectivity but also on a global network property (stability), which makes the predictions (the remaining set of interactions) robust against noisy sources of information and network incompleteness. Although we tried to enrich this network with miRNA interactions as we did in the two previous examples, none of the miRNAs involved in regulatory loops or circuits with differentially expressed genes was found to be experimentally validated for mouse. More details about the network reconstruction process for the Fibroblast-Hepatocyte reprogramming example are included in the Supporting Information. The main properties of the GRNs of the three biological examples are shown in Table 2.
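The booleanization step described above can be illustrated with a minimal Python sketch. It assumes that each gene comes with a log fold change (target phenotype versus source phenotype) and a t-test p value, and that a gene upregulated in the target phenotype is "1" there and "0" in the source phenotype. The function name, the argument names, and the toy values are hypothetical and invented for illustration only.

```python
def booleanize(log_fold_change, p_values, alpha=0.05):
    """Turn a differential expression comparison into two Boolean phenotypes.

    log_fold_change: dict gene -> log fold change (target vs. source phenotype)
    p_values:        dict gene -> p value of the comparison
    alpha:           significance threshold (the text mentions p < .05)
    Genes that are not significantly changed are left out of both phenotypes.
    """
    source_state, target_state = {}, {}
    for gene, lfc in log_fold_change.items():
        if p_values[gene] >= alpha or lfc == 0:
            continue  # not differentially expressed
        target_state[gene] = int(lfc > 0)            # upregulated -> 1, downregulated -> 0
        source_state[gene] = 1 - target_state[gene]  # complementary value in the source
    return source_state, target_state

# Invented toy values for four hypothetical genes.
toy_lfc = {"G1": 3.1, "G2": 2.4, "G3": -1.8, "G4": 0.2}
toy_p = {"G1": 0.001, "G2": 0.004, "G3": 0.010, "G4": 0.600}
toy_source, toy_target = booleanize(toy_lfc, toy_p)
```

Here G4 is dropped because its change is not significant, and the two resulting dictionaries play the role of the Boolean phenotypes that the pruning algorithm compares with the computed attractors.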

Table 2. Main properties of the gene regulatory networks of the three biological examples
  1. Abbreviation: miRNA, micro RNA.

Myeloid- Erythroid133419152
Fibroblast- Hepatocyte275646100

Network Transformation in a Directed Acyclic Graph

The first step of the method, named “Detecting master regulatory SCCs” in the Results section, requires the hierarchical analysis of a subnetwork of the complete GRN that includes only DEPCs and all the genes and interactions connecting them. This subnetwork contains positive feedback loops, so it has to be transformed before its hierarchy can be analyzed. The transformation of this subnetwork of connected DEPCs into a directed acyclic graph (DAG) was performed by contracting strongly connected DEPCs, that is, SCCs of DEGs, into single super-nodes. This network transformation allows the hierarchical analysis of the network following the method described by Jothi et al. [51], which locates SCCs at different levels of the hierarchy and identifies the master regulatory SCCs at the top of the hierarchy pyramid.
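The contraction of SCCs into super-nodes can be sketched as follows. This is a simplified illustration, not the procedure of Jothi et al. [51]: SCCs are found with Tarjan's algorithm, and "top of the hierarchy" is taken to mean super-nodes that receive no edge from any other super-node, which is an assumption made here for brevity. The function names and the toy graph are hypothetical.

```python
def strongly_connected_components(nodes, edges):
    """Tarjan's algorithm (recursive; fine for small graphs).
    Returns a dict mapping each node to the index of its SCC."""
    succ = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
    index, low, stack, on_stack, component = {}, {}, [], set(), {}
    counter = [0, 0]  # [next visit index, next component id]

    def visit(u):
        index[u] = low[u] = counter[0]
        counter[0] += 1
        stack.append(u)
        on_stack.add(u)
        for v in succ[u]:
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:  # u is the root of an SCC: pop its members
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component[w] = counter[1]
                if w == u:
                    break
            counter[1] += 1

    for n in nodes:
        if n not in index:
            visit(n)
    return component

def master_sccs(nodes, edges):
    """Contract SCCs into super-nodes and return those with no incoming DAG edge."""
    component = strongly_connected_components(nodes, edges)
    members = {}
    for node, comp in component.items():
        members.setdefault(comp, set()).add(node)
    dag_edges = {(component[u], component[v])
                 for u, v in edges if component[u] != component[v]}
    regulated = {target for _, target in dag_edges}
    return [genes for comp, genes in members.items() if comp not in regulated]

# Hypothetical toy graph: circuit {A, B} regulates circuit {C, D}, which regulates E.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "D"), ("D", "C"), ("D", "E")]
toy_masters = master_sccs(toy_nodes, toy_edges)
```

In the toy graph the super-node {A, B} is the only one without upstream regulators, so it would be the single master regulatory SCC under this simplified criterion.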

During the application of this network transformation to the three examples included in this work, we also forced the method to work on differentially expressed negative circuits (DENCs) instead of DEPCs to illustrate the failure of the method when the wrong stability element is considered. Interestingly, we could not find a single DENC in any of the three examples, despite the relative abundance of negative circuits in the three GRNs (17, 11, and 11 for Th2-Th1, Myeloid-Erythroid, and Fibroblast-Hepatocyte, respectively, whereas the corresponding numbers of positive circuits are 29, 25, and 19). Consequently, it was not possible to perform the network transformation into a DAG and the subsequent hierarchical analysis because there was no SCC of negative circuits to analyze. This finding is consistent with the role of positive circuits or positive feedback loops as the cornerstone of multistable behavior in networks of interacting elements.

Circuits' Detection

Johnson's algorithm [52] was implemented to detect all elementary feedback circuits in the network. A feedback circuit is a path in which the first and the last nodes are identical. A path is elementary if no node appears twice. A feedback circuit is elementary if no node other than the first and the last appears twice. Once we have all elementary feedback circuits, we select the positive feedback circuits, that is, the feedback circuits containing an even number of inhibiting edges. Elementary feedback circuit detection, positive feedback circuit sorting, and DEPC detection were all implemented in Perl.
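The definitions above can be made concrete with a short Python sketch. Note that it does not implement Johnson's algorithm: for brevity it uses a plain depth-first enumeration, which returns the same elementary circuits on small networks but scales worse. A circuit is treated as positive when the product of its edge signs is +1, which is equivalent to containing an even number of inhibitions. Function names are hypothetical; the PU.1/mir-155 edges come from Table 1, whereas X and Y form an invented cross-repression pair.

```python
def elementary_circuits(signed_edges):
    """Enumerate elementary circuits by depth-first search.

    signed_edges: dict {(regulator, target): +1 (activation) or -1 (inhibition)}
    Each circuit is reported once, starting from its smallest node in sorted order.
    """
    succ = {}
    for u, v in signed_edges:
        succ.setdefault(u, []).append(v)
    nodes = sorted({n for edge in signed_edges for n in edge})
    rank = {n: i for i, n in enumerate(nodes)}
    circuits = []

    def extend(start, path):
        for v in succ.get(path[-1], []):
            if v == start:
                circuits.append(list(path))      # the path closes into a circuit
            elif rank[v] > rank[start] and v not in path:
                extend(start, path + [v])        # keep the path elementary

    for s in nodes:
        extend(s, [s])
    return circuits

def circuit_sign(circuit, signed_edges):
    """Product of edge signs: +1 for a positive circuit, -1 for a negative one."""
    sign = 1
    for i, u in enumerate(circuit):
        v = circuit[(i + 1) % len(circuit)]
        sign *= signed_edges[(u, v)]
    return sign

def differentially_expressed_positive_circuits(signed_edges, degs):
    """Positive circuits whose genes are all differentially expressed."""
    return [c for c in elementary_circuits(signed_edges)
            if circuit_sign(c, signed_edges) == 1 and all(g in degs for g in c)]

toy_net = {
    ("PU.1", "mir-155"): +1, ("mir-155", "PU.1"): -1,  # from Table 1: negative circuit
    ("X", "Y"): -1, ("Y", "X"): -1,                    # invented toggle switch: positive circuit
}
toy_circuits = elementary_circuits(toy_net)
toy_depcs = differentially_expressed_positive_circuits(toy_net, degs={"X", "Y", "PU.1"})
```

The PU.1/mir-155 loop contains a single inhibition and is therefore negative, while the cross-repression pair contains two inhibitions and is kept as a positive circuit.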

Attractor Computation

We assumed a Boolean model with a synchronous updating scheme [53] to compute attractors, using our own implementation [39] of the algorithm described by Garg et al. [54]. The logic rule applied by default is the following: a gene becomes active if none of its inhibitors and at least one of its activators is active; otherwise the gene is inactive. If different regulatory rules are known for specific genes, this knowledge can be included in the model. The results of the attractor computation were consistent with those obtained using previously published software to compute attractors in Boolean systems (BoolNet [55], GenYsis [54]).
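The default logic rule and the synchronous updating scheme can be illustrated with a small Python sketch. It is not the algorithm of Garg et al. [54]: it simply enumerates every initial state, which is feasible only for very small networks, and it is meant solely to show how the rule produces attractors. The function names and the two-gene toy network (cross-repression plus self-activation) are hypothetical.

```python
from itertools import product

def next_state(state, activators, inhibitors):
    """Synchronous update with the default rule: a gene is active if at least
    one activator is active and no inhibitor is active."""
    new = {}
    for gene in state:
        activated = any(state[r] for r in activators.get(gene, ()))
        inhibited = any(state[r] for r in inhibitors.get(gene, ()))
        new[gene] = int(activated and not inhibited)
    return new

def attractors(genes, activators, inhibitors):
    """Exhaustively simulate all 2^n initial states and collect the attractors.
    Each attractor is a tuple of states; cycles are rotated to a canonical form."""
    found = set()
    for values in product((0, 1), repeat=len(genes)):
        state = dict(zip(genes, values))
        key, seen = values, []
        while key not in seen:
            seen.append(key)
            state = next_state(state, activators, inhibitors)
            key = tuple(state[g] for g in genes)
        cycle = seen[seen.index(key):]           # states visited repeatedly
        start = cycle.index(min(cycle))
        found.add(tuple(cycle[start:] + cycle[:start]))
    return found

# Hypothetical toggle switch: A and B repress each other and activate themselves.
toy_genes = ["A", "B"]
toy_activators = {"A": ["A"], "B": ["B"]}
toy_inhibitors = {"A": ["B"], "B": ["A"]}
toy_attractors = attractors(toy_genes, toy_activators, toy_inhibitors)
```

For this toy switch the stable states are (A on, B off), (A off, B on), and the state with both genes off; the state with both genes on is not stable because each gene is silenced by the other.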

Minimal Input Data for the Method Usage and Limitations

Given that our methodology considers transitions between attractor states, it requires the availability of expression data for stable cellular phenotypes. In addition, if the GRN has been experimentally validated and its attractors are consistent with the cellular phenotypes under study, our methodology can be applied directly. Otherwise, the GRN has to be reconstructed from publicly available data, and therefore the applicability of our methodology could be limited by the availability of information. In this case, the reliability of the resulting GRN can be estimated by evaluating how well the stable states of this network coincide with the experimental expression data. We usually assumed a threshold of 70% to consider a GRN worth processing. For instance, in the Fibroblast-Hepatocyte example, after the network contextualization process, the attractor computation of the resulting GRN revealed a match with the expression data of 76% for both conditions (fibroblasts and hepatocytes), meaning that 76% of the gene expression values in the network are correctly predicted for these two conditions. The remaining 24% of the gene expression values are not correctly predicted for two possible reasons: incompleteness of the network or wrongly assumed regulatory rules in specific cases. It is worth noting that our method for contextualizing GRNs rests on the removal of inconsistent regulatory interactions rather than on the addition of new interactions, and therefore adding new predicted interactions could improve the description of the expression data. This is a very relevant point; although it is beyond the scope of the present work and constitutes a challenging computational problem, it should definitely be pursued in order to improve our methodology.
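The reliability estimate described above amounts to the fraction of genes whose Boolean value in a computed attractor agrees with the booleanized expression data. A minimal sketch, with hypothetical function names and invented toy values, could look like this; the 0.70 default mirrors the 70% threshold mentioned in the text.

```python
def attractor_match(attractor, expression):
    """Fraction of genes whose attractor value equals the booleanized expression value.
    Only genes present in both dictionaries are compared."""
    shared = [gene for gene in attractor if gene in expression]
    if not shared:
        raise ValueError("no genes in common between attractor and expression data")
    agreeing = sum(attractor[gene] == expression[gene] for gene in shared)
    return agreeing / len(shared)

def worth_processing(attractor, expression, threshold=0.70):
    """Apply the acceptance threshold to the match score."""
    return attractor_match(attractor, expression) >= threshold

# Invented toy phenotype with four hypothetical genes; one value is mispredicted.
toy_attractor = {"G1": 1, "G2": 1, "G3": 0, "G4": 0}
toy_expression = {"G1": 1, "G2": 1, "G3": 0, "G4": 1}
toy_score = attractor_match(toy_attractor, toy_expression)
toy_ok = worth_processing(toy_attractor, toy_expression)
```

In practice the score would be computed separately for each of the two phenotypes, as in the Fibroblast-Hepatocyte example where both conditions are evaluated.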


The methodology presented here constitutes, to our knowledge, the first strategy that systematically provides lists of combinations of reprogramming determinants for cellular reprogramming events involving two given cellular phenotypes without prior knowledge of the potential candidates and pathways involved. Consequently, the method is easily transferable to different biological systems, providing guidance even without expertise in a given biological process. In particular, this method is suitable for cellular transdifferentiation, especially when transitions occur between different cellular lineages. Indeed, interlineage transdifferentiation involves significant changes in several molecular mechanisms that increase the complexity of this type of reprogramming and therefore hinder the prediction of reprogramming determinants. Hence, given the increasing interest in various applications of cellular reprogramming in medicine and basic research, our method represents a useful computational methodology to assist researchers in the field in designing experimental strategies, especially when very little is known about a specific biological system.


This research was supported by funding from the Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, for the open access charge.

Disclosure of Potential Conflicts of Interest

The authors indicate no potential conflicts of interest.