The emergence of information organization in biology



Systems of information organization are commonly understood to be human-designed and human-implemented, and are often treated as synonymous with classification systems. This holds across many technologies, from library classification systems like the Decimal Classification (DC) to newer systems like user-driven tagging. However, there is a broader view of information organization systems that encompasses biological systems of information (like the genetic code and the immune system) in addition to these human-engineered systems. Understanding the evolutionary principles and possible trajectories of all these systems – artificial and natural – will contribute to our understanding of human-designed systems by presenting a much more general picture of the essential features of functional information systems.

We have observed the following properties of information organization systems, and take them as assumptions grounding the discussion that follows. First, the organization of information is inherently a balance between dynamic and stable properties: an organization system must accommodate innovation and reflect change in its environment while still having enduring utility. Second, information organization appears in many different contexts: at a biological level, organisms use organization to adaptive, functional advantage; formal classifications are used throughout library and information science (LIS); and informal information organization systems are found in all aspects of the human environment. Third, information organization is an emergent phenomenon – it hasn't “always been there”, but has emerged many times in many contexts, ranging from evolved organization in biology to engineered systems in LIS. In all cases, organization emerges in response to some need, and emerges as a collective phenomenon. Finally, evidence from evolutionary genetics and from investigations of abstract models of language strongly suggests that information organizations of high dynamic complexity and robustness can evolve naturally – without planning or intent.

What is biological information organization?

Examples of biological information organization range from the most fundamental cellular systems, like the organization of the genome, to the most complex products of biological information – language and culture. These systems of organized information have been implicated as the crucial developments in the evolution of higher organisms capable of coordinated action and complex communications (Szathmary and Maynard-Smith, 1995). In the genome, organization is exhibited in gene ordering, clustering of related genes, and the structure of coding and non-coding material. These properties exist for a variety of functional reasons; for example, co-located genes can be processed and expressed more rapidly by the cell. Genome organization is highly consistent across a wide range of organismal classes (Koonin and Wolf, 2008), indicating that it is functionally important and was a relatively early evolutionary development.

Two of the most significant products of biological complexity are human language and culture. Language may be thought of as a structured system of information, used both for communication and for the organization of information. It is the product of certain aspects of biological evolution, like the human brain and complex vocal apparatus. Additionally, language itself is a constantly evolving communal phenomenon, and its communicative and organizational aspects are mutually reinforcing. It exhibits its own characteristic evolutionary dynamics, while having much in common with other aspects of biological organization. Language is necessary for the existence of culture and society, and enables the development and specification of our artificial systems of organization and classification. While language and culture are not necessarily “biological”, they are clearly shaped and constrained by the underlying biology and are the most complex informational products of the biological evolutionary trajectory.

Information emergence

The evolution of informational systems in biology is a problem of emergent information and semantics: starting from non-biological chemistry, how did representational and communicative information arise? One of the earliest and most obvious examples is the evolution of the genetic code, but the range of emergent information systems spans all areas of biology – from the smallest molecular systems through modern human language. We know that both organisms and human-developed organization systems must be able to adapt to changing demands throughout their life-cycles. Because the problem of engineering the accommodation of all contingencies into systems at their inception is known to be intractable (Olson, 1998), we instead want to address it by creating systems able to learn and adapt. Investigations into the evolution of artificial languages have demonstrated that relatively complex representational and semantic systems can be evolved autonomously (Swarup and Gasser, 2008). This suggests that building software systems able to evolve and use their own organizational principles and content is a practical goal.


Building on the work of Eigen and Schuster (1982), Szathmary (1999), and Kauffman (2000), we have developed a theoretical model of spatially localized, interacting chemical cycles as the earliest entities (agents) able to encode and reproduce information. A chemical cycle that is able to accurately reproduce itself embodies a fundamental informational process – copying. Using a multi-agent simulation framework, we are investigating ways that this very primitive form of information can evolve to represent and encode an agent's environment. Biological evolution as we understand it is a process of encoding environmental states and adaptations to those states in a stable, functional storage medium for later retrieval and deployment by living agents, and this evolutionary process requires reliably inheritable information in order to function. Understanding possible evolutionary scenarios for the development of inherited information will give us insight into the fundamental building blocks of information organization.
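
The idea of copying as a primitive informational process, carried out by spatially localized agents, can be illustrated with a very small toy simulation. The sketch below is not the model described above; it is a deliberately simplified stand-in built on assumptions of our own choosing. Each agent is reduced to a bit string sitting on a ring of sites, and the names copy_with_errors and simulate, along with the parameters n_sites, length, error_rate, and steps, are hypothetical and introduced only for illustration. At each step a randomly chosen occupied site copies its string into a neighboring site, with each bit flipped independently at the given error rate.

```python
import random


def copy_with_errors(template, error_rate, rng):
    """Copy a tuple of bits, flipping each bit independently
    with probability error_rate (an assumed, uniform noise model)."""
    return tuple(b ^ 1 if rng.random() < error_rate else b for b in template)


def simulate(n_sites=50, length=16, error_rate=0.01, steps=2000, seed=0):
    """Spread copies of one 'master' string around a ring of sites.

    Returns (number of occupied sites, fraction of occupied sites
    that still hold an exact copy of the master string).
    """
    rng = random.Random(seed)
    master = tuple(rng.randint(0, 1) for _ in range(length))

    # Spatial localization: a ring of sites, initially empty except one.
    sites = [None] * n_sites
    sites[0] = master

    for _ in range(steps):
        i = rng.randrange(n_sites)
        if sites[i] is None:
            continue  # empty sites cannot reproduce
        # Copy into a randomly chosen nearest neighbor, overwriting it.
        j = (i + rng.choice((-1, 1))) % n_sites
        sites[j] = copy_with_errors(sites[i], error_rate, rng)

    occupied = [s for s in sites if s is not None]
    faithful = sum(1 for s in occupied if s == master)
    return len(occupied), faithful / len(occupied)


if __name__ == "__main__":
    for rate in (0.0, 0.01, 0.1):
        occupied, fraction = simulate(error_rate=rate)
        print(f"error_rate={rate}: {occupied} sites occupied, "
              f"{fraction:.2f} hold the original string")
```

Running the script with several error rates gives a rough feel for why reliable inheritance matters: with error-free copying every occupied site retains the original string, while noisier copying is expected to erode it. The toy omits selection, environment, and chemistry entirely, so it should be read only as an illustration of copying fidelity in a spatial setting.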


The creation of human information organization systems that can exercise a degree of autonomy in their interaction with the world – so that they can be flexible and adapt to changing conditions – is an enterprise that will benefit from an understanding of the principles of biological information systems that are already able to do this. Research into these principles is required, as we currently have only a very high-level, abstract picture of what they might be. We are attacking this problem, beginning with a theoretical account and computer model of the emergence of hereditary information in primitive biology. Elaborating the core, abstract features of biological self-organizing information systems is an essential first step in developing this understanding.


I am grateful to Les Gasser, Carl Woese, Allen Renear, and Karen Wickett for much helpful discussion and criticism.