Commentary: Never trust your word processor


Some time last year, I had an intriguing conversation with a medical student, who approached me with the words: “…I know you teach biochemistry, could you please explain something to me? What is DANN-Polymerase?” I thought about this for a few seconds and had to admit that I had never heard of an enzyme by that name. I asked about the context, and she produced a booklet designed to prepare German medical students for their biochemistry exams. The exams are multiple choice, the booklet contained lists of sample questions, and there, in print, were questions on DANN-Polymerase. From the answer options, it was not hard to guess that this was simply a misspelling: the questions were obviously about DNA-Polymerase.

I have to admit that it took me a while to work out exactly how that error had made its way into a German book. The term DNA-Polymerase is spelled the same way in German and in English, while DANN (actually, “dann”) is the German word for “then”. And, surprise: when I typed DNA-Polymerase into my German version of Microsoft Word, it automatically “corrected” it to DANN-Polymerase, without asking. I tried this on colleagues’ computers, and whether the “correction” happened depended on the language version of the software. I then experimented with a number of settings buried deep in the bowels of the software and eventually managed to turn this function off. I felt that I should be the only one with the privilege of producing typos on my computer.

I talked to colleagues about the phenomenon; everyone was slightly amused, and several of them gave me further examples. One of them is serious: in a publication in BMC Bioinformatics, Zeeberg et al. report the automatic and irreversible renaming of genes by Microsoft Excel [1]. While using Excel to evaluate microarray datasets, the authors noticed that genes seemed to disappear from their extensive tables. In fact, the genes had been renamed automatically: Excel interpreted some gene identifiers containing a capital “E” as floating-point numbers in scientific notation, and converted genes with such inventive names as “SEPT7” into dates. Because only a fraction of gene names is affected, the authors suggest that such mistakes often go unnoticed and corrupt databases. They even cite a number of online database resources that already contain such errors.
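To make the problem described by Zeeberg et al. concrete, here is a minimal Python sketch that flags identifiers a spreadsheet might silently convert, so they can be checked before a gene list is opened in one. The function name `coercion_risk` and both patterns are illustrative assumptions, not taken from the cited paper: they approximate, but do not reproduce, Excel's type inference, which varies with software version and locale settings.

```python
import re
from typing import Optional

# Hypothetical heuristics: they approximate, but do not reproduce, a
# spreadsheet's type inference, which depends on version and locale.
# Digits, a capital (or lowercase) "E", digits: looks like scientific notation.
SCI_NOTATION = re.compile(r"^\d+(\.\d+)?E[+-]?\d+$", re.IGNORECASE)
# A month abbreviation followed by a day-like number, e.g. SEPT7 or MARCH1.
MONTH_PREFIX = re.compile(
    r"^(JAN|FEB|MAR|MARCH|APR|MAY|JUN|JUL|AUG|SEP|SEPT|OCT|NOV|DEC)-?(\d{1,2})$",
    re.IGNORECASE,
)


def coercion_risk(symbol: str) -> Optional[str]:
    """Return 'number' or 'date' if the symbol looks convertible, else None."""
    if SCI_NOTATION.match(symbol):
        return "number"
    match = MONTH_PREFIX.match(symbol)
    if match and 1 <= int(match.group(2)) <= 31:
        return "date"
    return None


if __name__ == "__main__":
    for gene in ["SEPT7", "2310009E13", "MARCH1", "TP53", "DEC1", "BRCA2"]:
        print(gene, coercion_risk(gene))
```

Flagged identifiers can then be protected, for example by importing the column explicitly as text rather than letting the spreadsheet guess its type.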

I immediately searched PubMed for “DANN-Polymerase” and found several papers, typically by German scientists, in which this word-processor enzyme had made it into the Materials and Methods section. But today, a colleague complained to me about an even better example of artificial intelligence. Whenever he wrote about the amino acid “proline”, his word processor turned it into “praline”. This happens in English versions of Microsoft Word as well, and “praline” has made its way into PubMed in dozens of cases, in publications from all over the world. When I typed “praline” into the PubMed search window, I was amused to see it reply: “Did you mean: proline (50271 items)”.
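A crude post-hoc check can catch such artifacts before a manuscript or exam goes to print. The following Python sketch scans text for a hypothetical, hand-maintained list of known autocorrect artifacts (`SUSPECT_TERMS`, an assumption for illustration) and reports each hit together with the term the author most likely meant. It complements, rather than replaces, proofreading by someone who knows the subject.

```python
import re
from typing import List, Tuple

# Hypothetical, hand-maintained list of known autocorrect artifacts,
# mapped to the term the author most likely intended.
SUSPECT_TERMS = {
    "praline": "proline",
    "pralines": "prolines",
    "DANN-Polymerase": "DNA-Polymerase",
}


def find_suspects(text: str) -> List[Tuple[int, str, str]]:
    """Return (line number, matched text, suggested term) for every hit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for wrong, right in SUSPECT_TERMS.items():
            # Whole-word, case-insensitive match, so "pralines" does not
            # also count as "praline", and plain German "dann" is ignored.
            pattern = r"\b" + re.escape(wrong) + r"\b"
            for match in re.finditer(pattern, line, re.IGNORECASE):
                hits.append((lineno, match.group(0), right))
    return hits


if __name__ == "__main__":
    manuscript = (
        "Samples were treated with DANN-Polymerase.\n"
        "The peptide is rich in praline residues."
    )
    for hit in find_suspects(manuscript):
        print(hit)
```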

Clearly, the autocorrect function of word processors causes a number of problems: mistakes in databases are serious, while pralines showing up in proteins may merely be funny. But I want to return to the original finding, in a question from a German medical exam. Especially in multiple-choice tests, where students typically have little time and are often prepared only for “typical” questions (i.e., are not trained to recognize and see past errors such as “DANN-Polymerase”), such typos are not acceptable. Intriguingly, it seems that such typos penalize good students more than poor ones [2]. I will not say more about multiple-choice testing as a system; it has been discussed extensively in this and in other journals [3]. But it is clear that, wherever this system is applied, spell checking should not be left to a word processor.


The author is grateful to P. Szczesny, V. Braun, S. Dunin-Horkawicz, and H. Pall for alerting him to examples of misleading autocorrections by word processors.