Alterations in gene expression programmes are controlled by sequence-specific DNA-binding proteins that interact with the epigenetic regulatory machinery. The sum of such processes comprises a gene regulatory network, and differentiation processes involve transitions between such networks. However, while great progress has been made in identifying network components, the list is not complete, and we still do not fully understand how they work together. In this article, I argue that one reason for this lack of knowledge is that we still do not understand what controls the cell stage- and cell state-specific regulation of individual genes, and I review examples highlighting this notion.
Our understanding of how gene expression is controlled has been an intensive focus of research in the last decades. During this time, many of the principal advances have not been achieved by genome-wide studies, but by studying individual genes as models for a specific mode of gene regulation. This includes the egg-white protein genes of the chicken or the mouse mammary tumour virus as a model for steroid regulation, the globin genes as a model for gene regulation in erythroid cells and the immunoglobulin genes as B-cell-specific models, just to name a few (for recent reviews see refs [1-5]). Many different groups have painstakingly worked on many different genes and have discovered an astonishing complexity of principles involved in gene regulation.
We have learnt a great deal from these single-gene studies. We obtained insight into which factors are involved in controlling tissue-specific gene regulation, although this picture is far from complete. We have discovered that development and transcriptional regulation of cell-type-specific genes are intimately linked. We discovered sequence-specific DNA-binding proteins and how they interact with the chromatin template. We now know that transcription factors can be activators and repressors depending on the genomic context. We found out that the methylation of DNA and the chromatin landscape participate in gene regulation and that transcription factors recruit co-factors to change this landscape. We learned that genes from higher eukaryotes are regulated by scattered cis-regulatory elements and that these elements interact with each other in physical space. We discovered the mechanisms of how signals are transmitted into the nucleus and how they impact on gene expression profiles and cell fate decisions. Only recently, we learned that enhancers can generate non-coding (nc) RNAs and that these RNAs have an impact on the expression of neighbouring genes and/or recruit members of the epigenetic regulatory machinery to these genes. We discovered long ncRNAs, microRNAs, Piwi-interacting RNAs, tinyRNAs, all of which impact on cellular states, and which in combination make our cells a dynamic and complex entity that reflects the complexity of the entire organism. To address these issues, researchers developed techniques such as reporter gene assays, chromatin-immunoprecipitation (ChIP) studies or DNA-sequence manipulation techniques and established cellular and organismal model systems.
In essence, all of these studies were aimed at answering the central question of how all of the different mechanisms listed above translate into gene regulatory networks and so into cellular phenotypes. Armed with the knowledge described above, we have embarked on an exciting new journey where we study the actions of different factors in a global fashion. These studies have confirmed what we have inferred from single-gene studies. Have these studies answered the central question? I would argue: not yet. The reason for this is that genome-wide studies so far have only looked for factors that are already known from single-gene studies. Yet, we still do not fully understand how individual genes actually function, as there is a multitude of factors employing a variety of regulatory mechanisms and individual genes use all combinations of these. The frequency of mRNA production can be stochastic, can be controlled by a circadian clock or may be connected to the cell cycle, and individual cells may respond differently to external signals. Gene expression can be controlled by regulating elongation, by attenuation/pausing of RNA polymerase, by non-coding RNA and by the chromatin landscape. Genes can display molecular memory, where actual gene expression is controlled by past environmental exposures, as shown by elegant studies in plants. They can be regulated by cell-type-specific signalling that modulates the activity of specific transcription factors, by the control of factor levels through auto-regulation and microRNAs, and by the control of factor activity through post-translational modification and binding-site affinities on the DNA. The binding stability of transcriptional regulators to DNA can be governed via protein–protein interactions, nuclear localization and the control of cis-element interaction.
For all of these mechanisms, we have not yet found a reliable set of rules that would allow us to predict, from the precise molecular interactions, the behaviour of one specific gene in one specific regulatory context, let alone how all genes work together (for an excellent discussion on the topic see ref. ). In essence, the gene regulation field has matured into the realm of high complexity. This is always bad, because it impacts on funding and publication. Neither funding bodies nor journals like studies that are too detailed and too specialized. However, I would argue that we must not give up on such studies and must retain the skills that make them possible. There are so many more basic principles to work out. I illustrate this notion with a few examples.
Years ago I heard a talk about one of the many biomarkers that predicted clinical outcome in cancer. The speaker praised proteomics and announced that we could now do away with gene regulation studies of cancer cells, as proteomics would tell us about the actual effectors that are de-regulated in such cells. He went on to describe a spot on a two-dimensional gel that correlated with bad prognosis, and identified this spot as HIF-1α, a transcription factor. What can we learn from this anecdote? First, thanks to the people working on single-gene regulation, who discovered HIF-1α as a hypoxia-induced factor up-regulating the erythropoietin gene and later showed that this is a general response in most cells, we know what this spot on the gel actually does. Second, a MEDLINE search of papers published on HIF-1α and cancer yielded more than 3500 hits, and we now know a great deal about how it influences the expression of many different genes. However, we still do not really know the exact mechanism by which HIF-1α contributes to tumour pathology. It is obvious that such mechanistic insight is the key to any therapeutic intervention, i.e. we would like to surgically switch such a specific tumour-associated factor on or off, but do not know how. We need studies using well-characterized single-gene model systems to work this out.
We currently are also unable to predict how specific perturbations in the genome influence a gene regulatory network. This is exemplified by a large number of genome-wide association studies that map DNA-sequence changes (single nucleotide polymorphisms; SNPs) in people with various diseases and ask the question whether these changes in some way contribute to the disease phenotype and associate with clinical prognosis. Surprisingly, the vast majority of these SNPs mapped to sequences outside the coding region. This leaves us with the baffling scenario that we see an effect of regulatory sequence alterations at the systemic level, that in some way alter the expression of the associated gene, but we have no idea about the mechanism by which these systemic changes come about. Moreover, even if such SNPs alter transcription factor-binding sites, their regulatory consequences are difficult to predict. For most genes, we do not have precise information about how their cis-regulatory elements are regulated in development, which factors bind and function in specific cell types, respond to signals and drive gene expression. What we ideally would like to do is to take such genes, and study in detail the effects of single base-pair changes in cis-regulatory elements on the developmental regulation of this gene and on the systemic impact of deregulation. This has been done only for a handful of genes, and only at a very basic level. An example is the Pu.1 (Sfpi1) locus, which encodes a transcription factor that is crucial for the development of myeloid cells. At this locus, an SNP within an essential cis-element alters the temporal regulation of Pu.1 expression during differentiation. An SNP at another binding site within the same element is correlated with leukaemia in humans, but we do not really know why. 
It is likely that reduced levels of this factor lead to temporal disturbances at downstream regulator genes but to interpret such results we need to have some ideas of critical levels below which factor complexes fall apart. What we would really need are biochemical experiments that define how interactions between the multitude of factors binding to a specific gene define its cell-specific expression status and devise rules that then can serve as models for global studies. However, I dare say that trying to obtain funding for such studies will be an uphill battle. Thanks to the major journals being forever on the hunt for ‘new concepts’ and neglecting people who work out the mechanistic details of such concepts (and debunk many of them on the way), it will also be more and more difficult to find people who have the skills and the willingness to actually perform such experiments.
While we are busy describing the systemic features of individual cell types, we still do not know all the rules governing the temporal regulation of gene expression during development. Very few individual genes have actually been studied at this dynamic level, and I would argue that we have little idea of how genes respond to and regulate transitions between different cell types, many of which are regulated by signalling processes. For the few genes that have been allowed to be studied in more detail, such as the α-globin locus [1, 12], a breathtakingly complex picture emerges with basically the entire gene regulatory machinery, including non-coding RNAs, being involved in making sure that this gene is expressed at the right level, in the right cell and at the right time. An even more complex picture emerges for genes that are regulators themselves. An example is again the Pu.1 locus. The statement above also applies to this gene, with the additional feature that its product regulates its own expression and that expression levels are crucial for the balanced formation of mature myeloid cells. Genetic studies demonstrated that such auto-regulatory loops at regulator genes are a common feature of developmental pathways, again indicating that parameters such as binding-site affinities and concentration dependence of factor interactions are an essential part of dynamic gene expression control. That this is a real scenario was recently confirmed. Moreover, transcription factors and the epigenetic regulatory machinery themselves are subject to signalling-dependent regulation of their activity by an ever-increasing zoo of post-translational modifications. We know very little of how these modifications impact on a given regulatory module and the interaction of different regulatory modules, i.e. single cis-regulatory elements within a single gene locus or a defined genomic neighbourhood.
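To make the point about concentration dependence concrete, here is a minimal sketch of a generic positive auto-regulatory loop. It is a textbook-style toy model (basal synthesis, cooperative self-activation described by a Hill function, first-order decay), not a model of the Pu.1 locus or of any measured system. All names and parameter values (`production`, `simulate`, `k_half`, `hill_n` and so on) are assumptions chosen purely for illustration.

```python
def production(x, basal, vmax, k_half, hill_n):
    """Synthesis rate of the factor: basal transcription plus
    cooperative self-activation (Hill function of its own level x)."""
    return basal + vmax * x**hill_n / (k_half**hill_n + x**hill_n)


def simulate(x0, k_half=0.5, basal=0.05, vmax=1.0, hill_n=4,
             decay=1.0, dt=0.01, steps=5000):
    """Integrate dx/dt = production(x) - decay * x with the explicit
    Euler method and return the factor level after `steps` time steps.

    k_half is the factor level giving half-maximal activation; a larger
    value stands in for a weaker binding site (hypothetical units).
    """
    x = x0
    for _ in range(steps):
        x += dt * (production(x, basal, vmax, k_half, hill_n) - decay * x)
    return x


if __name__ == "__main__":
    # Same gene, same parameters, different starting levels of the factor.
    print("low start, strong site: ", round(simulate(0.1), 3))
    print("high start, strong site:", round(simulate(0.6), 3))
    # Weaker binding site (higher k_half): the loop can no longer sustain itself.
    print("high start, weak site:  ", round(simulate(0.6, k_half=1.0), 3))
```

With these illustrative values the loop is bistable: a cell starting above a critical factor level settles in a high-expression state, one starting below it settles in a low-expression state, and weakening the binding site abolishes the high state altogether. Even in this deliberately simple sketch the outcome depends on quantities, such as affinity, cooperativity and turnover, that are known for very few real genes.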
Last, but not least, we have not even touched upon the complexity of how non-coding RNAs impact on developmental processes. The DNA and RNA worlds are intricately connected, as RNA molecules can target specific genes and recruit protein complexes, but at the moment we only have very rudimentary ideas of how DNA and non-coding RNAs interact and how this impacts on gene regulation. Moreover, the act of transcription itself can alter the regulatory output of a gene by altering its chromatin architecture. Such basic mechanisms still need to be worked out, and it requires model gene studies to do this. I would argue that the more parameters we define at the mechanistic level using bottom-up studies, the more predictive our system-wide studies are going to be. Once we know what to look for, we will be able to connect single genes to gene regulatory networks, and we can combine top-down modelling approaches with perturbation studies to try and get an idea of why a cellular system behaves one way or another. We should not delude ourselves that measuring single or even several genome-wide features will be a safe path to personalized medicine or allow us to reprogramme the epigenome in a controlled fashion. We still have a long way to go and we should value those who dig deep and walk the long walk.