On the state of scientific English and how to improve it – Part 2

Define what you mean

I learned a new word the other day: “ripsnorter”, which according to the Merriam-Webster dictionary means “something extraordinary”. The stimulus for its use – communicated in an email bulletin from F1000 under the heading ‘ENCODE: “A ripsnorter of a controversy”’ – certainly lives up to the definition: it is the paper recently published in Genome Biology and Evolution by Dan Graur et al. [1], which essentially takes the ENCODE consortium to task over the meaning and application of a single word, “function”, and its derivative “functional”. Grossly simplified, a major part of the thesis of Graur et al. is that ENCODE [2] did not distinguish between two importantly different meanings of the word “function”: function as an evolutionarily selected effect or trait, and function as a causal role. The latter can – as Graur et al. point out – refer to something as trivial as the observation that the middle two fingers of the human hand separate the index finger from the little finger, and hence have the “function” of separators; i.e. it is an empirical description of one of potentially many ways in which a relationship between two or more entities may be conceived. However, it says nothing about the evolutionary processes that shaped the particular arrangement of, and interaction between, the given entities. That distinction is crucial if one is interested in the evolution of the genome in response to selective pressures. Still, in fairness to ENCODE, that doesn't stop one finding potentially informative patterns and features in the genome using an alternative definition of “function”. Would it have diminished the value of the ENCODE work to have divided the claimed 80% of the human genome to which biochemical function was assigned into smaller, more specifically defined, percentages? Perhaps the opposite.

And now for our second word – possibly the strongest candidate for the most incorrectly or vaguely used word in biology today: “complexity”. When considering the relative complexity of organisms – a popular topic in the literature – which complexity are we discussing? Organismal, genomic, developmental, metabolic, behavioural…? And within genomic complexity, is it the number of genes, regulatory motifs, regulatory architecture, the genetic “interactome”, the proteome, etc.? So often, the nature of the complexity that readers are directed to consider simply isn't sufficiently defined; and, worse, “complex” often seems to be confused with “complicated” – a term understood in everyday language as “consisting of many parts – possibly intricately interconnected – the whole of which is hard to understand”, or simply “confusing”: quite a vague meaning. So the problem is that there is an imprecise everyday understanding of “complex” or “complexity”, alongside several specific scientific definitions (see, e.g., http://en.wikipedia.org/wiki/Complexity). When considering the genome as encoded information under the effect of neutral drift and various selective pressures, it's crucial to know whether, for example, we're talking about a definition of complexity rooted in information science.

The two words (and their derivatives) highlighted above have something in common: they are used in everyday language with very flexible meanings, but they also appear in science with specific, and often quite different, meanings. The problem is that those meanings are frequently either assumed by the scientific users, or even neglected, and are not revisited in thought or word at the point of use. A reader who does not assume the same meaning will probably either be considerably influenced by the everyday meaning, or be convinced of an implied meaning that might not be close enough to the true scientific meaning; furthermore, that implied meaning might even be inappropriate in the given context. There is another possibility, and this might be better in certain circumstances: the reader might simply fail to grasp the concept altogether. The bottom line is: when using such words, always define extensively the sense in which you are using them, and then briefly remind the reader of this definition, as necessary, later in your manuscript.


Andrew Moore