Standards for the publication of mouse mutant studies


  • W. E. Crusio,

    Corresponding author
    1. Centre de Neurosciences Intégratives et Cognitives, Université de Bordeaux, CNRS UMR 5228, Bat B2 - Avenue des Facultés, 33405 Talence, France
  • D. Goldowitz,

    1. Centre for Molecular Medicine and Therapeutics, Child and Family Research Inst, Dept Medical Genetics, University of British Columbia, Vancouver, BC, Canada
  • A. Holmes,

    1. Section on Behavioral Science and Genetics, Laboratory for Integrative Neuroscience, National Institute on Alcohol Abuse and Alcoholism, Rockville, MD, USA
  • D. Wolfer

    1. Institute of Anatomy, University of Zurich, Zurich, Switzerland
    2. Institute of Human Movement Sciences and Sport, ETH Zurich, Zurich, Switzerland

*W. E. Crusio, Centre de Neurosciences Intégratives et Cognitives, Université de Bordeaux, CNRS UMR 5228, Bat B2 - Avenue des Facultés, 33405 Talence, France. E-mail:

Since Genes, Brain and Behavior (G2B) started publication in 2002, the journal has become one of the favored outlets for, among many other subjects, mouse mutant studies. Unfortunately, a significant proportion of mutant studies suffer from serious methodological problems, which are the most frequent reason for rejection without review of manuscripts submitted to G2B. These problems are not always appreciated in the field, and authors sometimes protest such summary rejections by arguing that they followed commonly used procedures that many other journals publish without any problem. Of course, the real issue is whether or not these commonly used procedures are scientifically valid. The problems vary from improper breeding procedures to invalid choices of control animals. Examples of such studies can be found in virtually every scientific journal, even the most prestigious ones. Because of the paucity of methodological information that many so-called high-impact journals provide nowadays, it is often impossible to verify what genetic background was employed and whether proper procedures were followed, and one just has to hope for the best. Many of these problems were pointed out by Gerlai (1996), and it is disappointing and rather distressing, to say the least, that more than a decade after Gerlai’s original commentary appeared, articles continue to be published that violate some of the most basic requirements for the proper conduct of mutant studies. This situation is not only frustrating to authors, reviewers and editors alike; to the extent that these design faults are obvious fatal flaws, it also constitutes a waste of time, effort and research resources, not to mention the ethical implications of unnecessarily using live animals for flawed studies. 
The following editorial will therefore outline the minimum standards a mutant study needs to comply with in order to be considered for publication in G2B (Table 1). Manuscripts that do not adhere to these minimum requirements will be rejected without review, whereas manuscripts that do not provide sufficient experimental detail will be returned to their authors for completion before being sent out for review. It is hoped that other journals publishing mutant work will follow our lead, resulting in a critical improvement of the designs of mutant studies. (Sometimes authors refer to an earlier publication for ‘more details’. Although this may be acceptable to avoid repeating more arcane details of some experimental procedure, any manuscript should always contain all experimental details necessary for reviewers (and readers!) to understand what was done.)

Table 1.  Checklist for reports on mutant studies
1. Present full strain and substrain information and use correct nomenclature. If animals were purchased, full information on the vendor should be included. Sufficient details of breeding procedures need to be presented to permit others to replicate an experiment under the same (or at least highly similar) conditions.
2. Do not ignore or belittle the potential biasing effect of flanking alleles without solid evidence.
3. Do not maintain mutants and wild-type animals as separate lines if they are derived from a segregating population.
4. If mutants are maintained in a randomly-bred colony, it is essential that only WT and mutant littermates are compared with each other and that no single breeding pair produces a disproportionate number of either one of the experimental groups.
5. If mutants are maintained as a congenic strain by repeated backcrossing to a standard inbred strain, it is strongly recommended to maintain the backcrossing procedure for as long as possible.
6. Mutants derived from a congenic strain can be compared to wild-type animals from the recipient inbred strain, provided there are no substrain differences between the strain used for backcrossing and the strain used to derive the control animals.
7. When a mutation is generated by transgenesis, more than one expressing founder line is needed for the initial analysis.
8. Do not pool data from males and females, even if sample sizes are low.

In what follows we will briefly discuss the most common problems with the design of mutant studies and explain why they are not acceptable. For more detailed discussions and acceptable alternative approaches, the reader is referred to Crusio (2004), Gerlai (1996, 2000, 2001) and Wolfer et al. (2002). It should be noted that basically all of the problems discussed are common to any mutant study, regardless of whether the mutation was induced in a targeted way by means of transgenesis or homologous recombination (‘knock-out’), was induced randomly following chemical treatment or radiation, or occurred spontaneously. In addition, most of the recommendations given obviously also apply to species other than rodents.

The genetic background and flanking allele problems

By now, of course, everybody has at least heard about the so-called ‘flanking allele’ problem, to which attention was originally drawn by Gerlai (1996). The flanking allele problem is often referred to as a bias due to the ‘genetic background’. Although this is technically correct, it should be noted that the genetic background contains much more than just the alleles flanking a certain mutation on one particular chromosome (Crusio 1996). For the sake of clarity, we prefer to reserve the term ‘flanking allele problem’ for biasing effects of alleles that flank a mutation and are derived from the donor strain [generally: the embryonic stem (ES) cell donor] in which the mutation was induced. We will use ‘genetic background effects’ for genetic influences due to genes located outside this flanking region (often even on different chromosomes).

It has been shown repeatedly that widely divergent effects of a given mutation can be observed depending on the genetic background onto which the mutation has been backcrossed (e.g. Holmes & Hariri 2003; Threadgill et al., 1995). Examples even exist in which completely opposite effects have been reported for one and the same mutation backcrossed onto different backgrounds (Ivanco & Greenough 2002; Mineur & Crusio 2002). As for the possible biasing effects of flanking alleles, Bolivar et al. (2001) rather ingeniously even used these effects to localize Quantitative Trait Loci.

Nomenclature of inbred strain backgrounds

There is a large literature documenting behavioral and neural differences between substrains of inbred strains such as 129 (Montkowski et al. 1997), C57BL/6 (Crusio et al. 1991; Jamot et al. 1994), C3H (Heimrich et al. 1988) and DBA/2 (van Abeelen & Hughes 1986), to mention but a few. ES cells used to generate null mutations are mostly derived from one of the many strains belonging to the 129 family (Simpson et al. 1997). There are many documented differences between the different 129 strains (Festing et al. 1999), coat color being one of them (Simpson et al. 1997). Because complications due to flanking alleles and pervasive effects of the genetic background are widely known to occur, the use of complete and correct strain nomenclature, including full substrain information, is essential (Wotjak 2003). Detailed instructions on nomenclature rules and lists of existing strains and substrains with their correct strain abbreviations (including revised nomenclature for strains formerly designated as, e.g. ‘129/SvEv’, etc.) can be found on the website of The Jackson Laboratory (Bar Harbor, ME, USA; for general mouse information, see:; for nomenclature rules, see:

The flanking allele problem

Although this problem is mostly known in connection with knockout studies, it should be noted here that it will, in fact, occur whenever a mutation arises or is generated on one background and is then transferred to another one by intercrossing (this therefore includes most transgenic and mutagenesis studies). Only when a mutation occurs on an inbred background and is maintained in that background will an experimental bias due to flanking alleles be absent. In consequence, in most experimental designs mutants and wild-type controls will differ not just at the locus of the mutation. Almost always, alleles derived from the strain in which the mutation was generated will flank the mutated gene, whereas potentially different alleles derived from the strain to which the initial mutation carrier was backcrossed will flank the wild-type allele (Gerlai 1996). Several strategies to test for possible effects of these flanking alleles have been proposed (Bolivar et al. 2001; Crusio 2004; Wolfer et al. 2002). Unfortunately, these solutions are sometimes rather impractical or even impossible; invariably they are laborious and costly. It would be counterproductive to insist that every mutant study should entail control experiments for flanking allele effects, and G2B will continue to publish mutant studies even in the presence of a potential bias from the flanking allele region. However, authors should avoid wrong or downright disingenuous statements to the effect that ‘the flanking allele problem has been solved’ (not really…), that it ‘most probably does not play a role here’ (how do you know?), or that ‘we backcrossed for over 10 generations and this takes care of the flanking allele problem’ (it does not, see: Crusio 1996, 2004; Gerlai 1996 and below). 
Only maintaining the mutation in a heterogeneous, segregating population (whether by continuous backcrossing to a standard inbred strain or in a randomized genetic background) will gradually reduce the size of the flanking allele region over time (Crusio 2004). Ideally, researchers could genotype animals for markers in the flanking allele region and choose for breeding those individuals that have undergone appropriate recombinations, thereby reducing the size of the flanking region derived from the ES donor (Behringer 1998). As is the case for cross-breeding control experiments, however, this is not always practical. Some unbiased experimental designs have been proposed elsewhere (Crusio 2004; Gerlai 2000, 2001; Wolfer et al. 2002; Zimmer 1996).
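As a rough illustration of how slowly the flanking region shrinks, the sketch below estimates the expected length of the donor-derived segment surrounding a mutation that is kept by selecting carriers. It rests on a simple textbook model that is an assumption here and not part of this editorial: crossovers fall independently at an average rate of one per Morgan per meiosis, and the chromosome is long enough for its ends to be ignored. The function name is hypothetical.

```python
def expected_flanking_cm(meioses: int) -> float:
    """Expected total donor segment (both sides of the mutation), in cM.

    Assumption: with crossovers placed independently at one per Morgan per
    meiosis, the distance from the mutation to the nearest crossover seen
    in any of `meioses` meioses is exponentially distributed with mean
    1/meioses Morgan (= 100/meioses cM) on each side.
    """
    if meioses < 1:
        raise ValueError("need at least one meiosis")
    return 2 * 100.0 / meioses


# Number of meioses the mutation-bearing chromosome has passed through
# in heterozygous carriers (illustrative values only).
for m in (5, 10, 20):
    print(f"{m:2d} meioses: about {expected_flanking_cm(m):.0f} cM around the mutation")
```

Under these assumptions, ten meioses would still leave a donor-derived segment on the order of 20 cM around the mutation, which is why backcrossing alone cannot be said to have removed the flanking alleles.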

Breeding systems used to maintain mutants

Several breeding systems have been used to maintain a mutant and produce experimental and control animals. The absolutely worst way of doing this is to produce homozygous mutants and wild-types from an originally segregating population (e.g. an F2 generation between the ES-cell donor and another strain) and establish independent homozygous +/+ and −/− lines. This will, in effect, create two new recombinant inbred strains that because of random fixation of alleles (genetic drift) will differ for many genes, not just the mutation, after even one single generation. Although it is fortunately becoming rare, this genetically naive and completely improper breeding system is still used from time to time. Obviously, it is equally unacceptable to maintain homozygous mutants in this way and then use either an F1 or F2 population derived from the two parental strains as an ‘approximate control’.

Basically, there are two correct ways to maintain a mutation (Wolfer et al. 2002). The first is to continuously breed heterozygotes with each other, choosing animals from within the population either randomly or nonrandomly by preferentially mating animals that are not closely related (to maintain a higher level of genetic variation in the colony). Although this may be acceptable, it is not recommended for the long term: backcrossing to a standard inbred strain, such as C57BL/6, is preferable (see Wolfer et al. 2002). The second is to backcross the mutation to one or more inbred strains. Most often, C57BL/6J animals are used for this purpose.

If mutants are maintained in a randomly segregating population (the first case mentioned in the above paragraph), it is absolutely necessary that experimental knock-out (KO) and wild-type (WT) animals are littermates, derived from as many different breeding pairs as possible. The reasons for this are twofold. First, because the population is randomized, dams will differ genetically (even though they will all be heterozygous for the mutation) and may therefore provide differential maternal care, either behaviorally (Carlier et al. 1982) or by providing milk of a different composition (Ragueneau 1987). Second, because dams and sires are genetically different, each breeding pair will provide experimental animals with, on average, a different genetic background. It is therefore also important that experimental groups are balanced as much as possible with regard to the breeding pairs from which the subjects are derived. Deriving most wild-type animals from one breeding pair and most mutants from another one would obviously constitute a serious bias in the experimental design. Ideally, each breeding pair and even each litter provides the same number of experimental animals to both the WT and KO groups. If this is not practical, possible litter effects must be included as a between-subject factor in any statistical analysis.
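The balancing requirement described above can be checked mechanically before an experiment starts. The following is a minimal sketch, assuming each subject is recorded as a (breeding pair, genotype) tuple with genotypes labeled "WT" or "KO"; the record format, function names and example cohort are hypothetical.

```python
from collections import Counter, defaultdict


def pair_contributions(animals):
    """Count WT and KO subjects contributed by each breeding pair.

    `animals` is an iterable of (breeding_pair_id, genotype) tuples,
    with genotype given as "WT" or "KO" (hypothetical record format).
    """
    counts = defaultdict(Counter)
    for pair_id, genotype in animals:
        counts[pair_id][genotype] += 1
    return counts


def unbalanced_pairs(animals):
    """Return the breeding pairs that do not supply equal numbers to both groups."""
    counts = pair_contributions(animals)
    return sorted(
        pair_id
        for pair_id, c in counts.items()
        if c["WT"] != c["KO"]  # a missing genotype counts as zero
    )


# Made-up cohort: pair2 supplies only WT animals, pair3 more KO than WT.
cohort = [
    ("pair1", "WT"), ("pair1", "KO"),
    ("pair2", "WT"), ("pair2", "WT"), ("pair2", "WT"),
    ("pair3", "KO"), ("pair3", "KO"), ("pair3", "WT"),
]
print(unbalanced_pairs(cohort))
```

Pairs flagged in this way are candidates for rebalancing; where that is not possible, breeding pair or litter would be entered as a factor in the analysis, as stated above.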

If animals are backcrossed to a standard inbred strain such as C57BL/6J (note that ‘C57/Bl6’ is both incomplete and an incorrect use of nomenclature), a so-called congenic strain (Green 1966) is created. With each additional generation of backcrossing, 50% of the remaining alleles from the original genetic background (the donor line) will be lost (on average). After a minimum of 10 generations of backcrossing, the line will be 99% identical genetically to the recipient strain and is considered congenic with it. As noted elsewhere (Crusio 2004), this figure of 99% is only applicable in the absence of selection and therefore does not apply to the flanking gene region. Once a line is fully congenic, it is acceptable to derive homozygous mutants and maintain them in this way. In this situation, WT control animals can be taken from the parental recipient strain (but not from a different substrain than was used for the derivation of the congenic) and need not necessarily be littermates, although this still remains preferable. It goes without saying that mutant and control lines should be bred in the same facility and should be age-matched. A necessary caveat with this design is that any differences found between mutants and WT animals can be due either to direct effects of the gene on the phenotype studied or to indirect pre- or postnatal maternal effects in the not unthinkable case that +/+ and −/− dams provide different maternal environments (Crusio 2004). However, it is not recommended to breed mutants as a separate congenic line for more than about 10 generations, because of the risk that de novo mutations will lead to biasing genetic differences between the congenics and the parental recipient strain. At a bare minimum, one generation of backcrossing should be applied every 10 generations, but more often (say, every five generations) is better. 
The ideal way of maintaining a mutant line remains, of course, to consistently backcross carriers to wild-type animals from a standard inbred strain: in this way the flanking allele region will slowly be reduced in size over time, and heterozygous animals will be readily available to generate WT and KO homozygous (and, if so desired, also heterozygous) littermates on a standardized genetic background.
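The halving of the donor contribution with each backcross generation can be written out as a short calculation. This is a sketch only: it assumes that generations are counted with the F1 as generation 1 and that no selection acts on the loci considered, so it describes regions not linked to the mutation. The function name is hypothetical.

```python
def expected_donor_fraction(generation: int) -> float:
    """Expected donor share of the genome at loci unlinked to the mutation.

    Assumption: generations are counted with the F1 as generation 1, and
    each further backcross halves the remaining donor contribution.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return 0.5 ** generation


for n in (1, 2, 5, 10):
    donor = expected_donor_fraction(n)
    print(f"N{n}: donor {donor:.4%}, recipient {1 - donor:.4%}")
```

These are expectations; individual animals will vary around them, and the region flanking the selected mutation does not follow this schedule.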

Using male and female animals

Frequently, only small sample sizes are available for a newly generated mutant and some authors then proceed by pooling data from males and females to obtain larger samples. Obviously, such pooling is only allowable if a two-way analysis of variance does not indicate any sex effects or sex × genotype interactions (and even in this case not pooling is statistically preferable). The preferred way of handling this situation remains, of course, to include sex (not ‘gender’, which has social connotations and should almost exclusively be reserved for human studies) as a factor in any statistical analyses. Sometimes the argument is made that this was not done because ‘sample sizes are too low to reliably detect any sex effects’. It should be realized that if this is true, the same applies to mutation effects, and sample sizes need to be increased before any reliable conclusions can be drawn from the data.
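To make the two-way analysis concrete, the sketch below computes the F ratios for genotype, sex and their interaction in a fully crossed, balanced design, using only the standard library. It is an illustration under stated assumptions (equal cell sizes, at least two animals per cell, at least two levels per factor); the function name and the data are made up, and obtaining P-values would additionally require an F distribution from a statistics package.

```python
from itertools import product
from statistics import mean


def two_way_anova_balanced(cells):
    """F ratios for genotype, sex and their interaction.

    `cells` maps (genotype, sex) -> list of measurements. Assumes a fully
    crossed, balanced design (same n >= 2 in every cell).
    """
    genotypes = sorted({g for g, _ in cells})
    sexes = sorted({s for _, s in cells})
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(cells[(g, s)]) != n for g, s in product(genotypes, sexes)):
        raise ValueError("balanced design with n >= 2 per cell required")

    grand = mean(x for values in cells.values() for x in values)
    cell_mean = {key: mean(values) for key, values in cells.items()}
    geno_mean = {g: mean(cell_mean[(g, s)] for s in sexes) for g in genotypes}
    sex_mean = {s: mean(cell_mean[(g, s)] for g in genotypes) for s in sexes}

    a, b = len(genotypes), len(sexes)
    # Sums of squares for the main effects, the interaction and the residual.
    ss_geno = b * n * sum((m - grand) ** 2 for m in geno_mean.values())
    ss_sex = a * n * sum((m - grand) ** 2 for m in sex_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_geno - ss_sex
    ss_within = sum(
        (x - cell_mean[key]) ** 2 for key, values in cells.items() for x in values
    )

    ms_within = ss_within / (a * b * (n - 1))
    return {
        "genotype": (ss_geno / (a - 1)) / ms_within,
        "sex": (ss_sex / (b - 1)) / ms_within,
        "genotype x sex": (ss_inter / ((a - 1) * (b - 1))) / ms_within,
    }


# Made-up scores in which the KO effect is present in males only.
data = {
    ("WT", "M"): [10.0, 12.0],
    ("WT", "F"): [14.0, 16.0],
    ("KO", "M"): [20.0, 22.0],
    ("KO", "F"): [12.0, 14.0],
}
for effect, f_ratio in two_way_anova_balanced(data).items():
    print(f"{effect}: F = {f_ratio:.1f}")
```

In this invented data set the interaction term has the largest F ratio, which is exactly the kind of pattern that pooling males and females would conceal.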

Single founder lines in transgenic studies

When a transgene is inserted into a mouse’s genome, the insertion site is determined more or less randomly. As a result, the expression of an endogenous gene may be disrupted by the insertion and this may cause considerable phenotypical effects (e.g. Cases et al. 1995). To check whether an observed phenotypical effect is due to the transgene, it is not enough to check whether the transgene is actually being expressed. The only way to ascertain that an effect is due to the transgene itself is by generating more than one line, as it is highly unlikely that both lines would have the transgene inserted at exactly the same place in the genome. The additional effort of generating more than one line is partially offset if the different transgenic lines happen to have different expression levels of the transgene (for instance, because of differing copy numbers). If different levels of gene expression correlate with the phenotype of interest, then this is strong additional evidence for the phenotypical effects of the transgene. It is therefore essential for the first characterization of a transgenic mutant to have more than one founder line available. For follow-up studies of an already well-established model (like most lines used as models for Alzheimer’s disease), this is less critical. Of course, any new phenotypic change that is described for such a mutant could potentially be an indirect effect, and this has to be discussed in a clear and fair way.


The above-discussed issues are summarized in the recommendations listed in Table 1.


We would like to thank Dr. Robert Gerlai (Toronto, Canada) for valuable suggestions.