A statistics primer



Statistical input into an experimental study is often not considered until the results have already been obtained. This is unfortunate, as inadequate statistical planning ‘up front’ may result in conclusions which are invalid. This review will consider some of the statistical considerations that are appropriate when planning a research study.


Our recent editorial in the Journal of Small Animal Practice (Flaherty and others 2011) highlighted our perception that, in general, Statistics is a subject that is poorly understood by many veterinary clinicians, but – at the same time – one of huge importance to the successful design and interpretation of any research study. To this end, we proposed to produce a series of short articles on statistics in veterinary clinical studies.

In this, the first of the series, a recently published paper is used to illustrate the way in which one may approach a research project from a statistical perspective and some of the most commonly asked questions. This example and sample questions will subsequently be used in a series of articles that explore in detail some common statistical issues. The article under discussion (Bell and others 2011), although focused on anaesthesia, provides a framework to consider first how to design an experiment and second how such an experiment might be analysed statistically.

We start with a recipe for designing an experiment, and then use the published study to illustrate the highlighted steps (detailed below) and to raise some of the common questions.

Recipe for design

  • Find out as much about the problem as possible, making use of specialist knowledge in the subject domain of the experiment as well as statistical knowledge. An understanding of previous work in the area is invaluable.
  • Define the objectives of the study clearly. A failure to do this will make subsequent analysis very difficult, if not impossible.
  • Determine the types of data to collect and the conditions under which the data should be collected. This might be as simple as identifying the location (e.g. Small Animal Hospital, at first consultation), who would make the observations (clinician, vet nurse, student) and the variables to be measured (such as arterial blood pressure, body condition).
  • Identify the necessary information required as inputs. Other information such as age, sex and breed might not be of primary interest but could be important to record.
  • Define the study boundaries, i.e. the time periods and conditions to which the experimental results will apply (the scope of inference required). Here we are referring to the population of animals (e.g. all healthy dogs); if we are studying first consultations in the Small Animal Hospital, we might also specify a period, such as between 2007 and 2009.
  • Choose an appropriate experimental design (of which there will be a number).
  • Select the study units (animals) and allocate them to a treatment.
  • Carry out the experiment.
  • Analyse, interpret and report on the results.

Define the objectives of the study clearly

The objective of any study is to answer particular scientific questions. What is measured or recorded, and the nature and levels of the treatments, must be chosen so that these questions can be answered. The objectives of the study also provide a first indication of the statistical tools that might be required to assess the outcomes, an important step in getting from the data to answers to the substantive questions that the study poses.

In our example we have an anaesthetic study to compare the effects of different premedicants on certain physiological variables and propofol induction requirements in dogs (Bell and others 2011).

Specifically, in this case, the researchers wish to assess three types of outcome:

  • the degree of sedation achieved by the different premedicants,
  • the effects on the cardiovascular and respiratory systems, and
  • the dose of propofol required for induction of anaesthesia

following premedication with one of three treatment regimes:

  • “high dose” dexmedetomidine with buprenorphine
  • “low dose” dexmedetomidine with buprenorphine
  • acepromazine with buprenorphine.

Determine the types of data needed to answer the scientific question(s)

The nature of the data to be collected also helps to identify the types of statistical methods that will be used.

The data being collected in this study are of two distinct types:

  • continuous, i.e. can take any value (heart rate, respiratory rate, blood pressure) and
  • categorical (induction quality, pre-induction sedation and incidence of myoclonus) – these variables had four levels (so comprising an ordinal categorical variable as each level is ranked in terms of severity from least to most).

Why does the data type matter? Because the statistical methods and types of analysis that are appropriate depend on the data types involved. For instance, how we analyse pre-induction sedation score (a rating scale) will be different from how we model heart rate (a continuous measure). The reason for this is that all statistical techniques make assumptions, and what assumptions it is safe to make depend on the type of data that we are dealing with.

From the Bell and others (2011) study:

  • heart rate,
  • respiratory rate, and
  • arterial blood pressure

were recorded using appropriate monitoring equipment (heart rate and arterial blood pressure) or by direct observation (respiratory rate).

From the same study:

  • induction quality,
  • pre-induction sedation quality, and
  • incidence of muscle twitching

were scored by the researchers on 4 point scales.

  • From the study, propofol induction requirements were recorded in μg/mL (anaesthetic induction was undertaken with a propofol target-controlled infusion system, which delivers the drug to a set blood concentration (the “target”) as opposed to a set “dose”).
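To illustrate why the data type matters, the following minimal Python sketch summarises a continuous measure and an ordinal score in different ways. The values, the 0 to 3 scoring and the function names are hypothetical and are not taken from Bell and others (2011); the point is only that a mean and standard deviation suit a continuous measure, whereas a median and counts per level are safer for a rating scale.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical values for illustration only; these are not data from the study.
heart_rate = [88, 92, 75, 101, 84, 79]   # continuous: beats per minute
sedation_score = [0, 2, 1, 3, 2, 2]      # ordinal: assumed scale from 0 (least) to 3 (most)


def summarise_continuous(values):
    """Mean and standard deviation are natural summaries of a continuous measure."""
    return {"mean": mean(values), "sd": stdev(values)}


def summarise_ordinal(scores):
    """Median and per-level counts respect the ranking of the levels
    without assuming that the levels are equally spaced."""
    return {"median": median(scores), "counts": dict(sorted(Counter(scores).items()))}


print(summarise_continuous(heart_rate))
print(summarise_ordinal(sedation_score))
```

Taking the mean of the sedation scores would treat the step from 0 to 1 as equal to the step from 2 to 3, which is an assumption that a rating scale does not justify.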

Conditions under which the data are collected

This study recorded heart rate, respiratory rate and non-invasive arterial blood pressure at a number of times:

  • immediately prior to induction of anaesthesia,
  • at successful tracheal intubation,
  • at 3 minutes post-intubation, and
  • at 5 minutes post-intubation.

So the time-points at which the observations were to be made were specified, and the dogs had measurements recorded at each of these four different time-points. This is an important component of this study, and one that we must take account of in our analysis. This is sometimes described as a longitudinal study, as we make observations on the same animal at a number of time periods (sometimes also described as a time course experiment).
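A longitudinal layout of this kind can be sketched in Python as follows. The records, heart rate values and function names are hypothetical; the sketch only shows one way of keeping each dog's four measurements together, so that changes are assessed within the same animal.

```python
from collections import defaultdict

TIME_POINTS = [
    "pre-induction",
    "intubation",
    "3 min post-intubation",
    "5 min post-intubation",
]

# Hypothetical records: (dog id, treatment group, time point, heart rate in beats/min).
records = [
    (1, 1, "pre-induction", 90), (1, 1, "intubation", 96),
    (1, 1, "3 min post-intubation", 94), (1, 1, "5 min post-intubation", 92),
    (2, 3, "pre-induction", 60), (2, 3, "intubation", 66),
    (2, 3, "3 min post-intubation", 64), (2, 3, "5 min post-intubation", 63),
]


def by_dog(rows):
    """Group the repeated measurements so each dog's time course stays together."""
    courses = defaultdict(dict)
    for dog_id, _group, time_point, value in rows:
        courses[dog_id][time_point] = value
    return dict(courses)


def change_from_baseline(course):
    """Within-dog change relative to the pre-induction value."""
    baseline = course[TIME_POINTS[0]]
    return {t: course[t] - baseline for t in TIME_POINTS[1:]}


for dog_id, course in by_dog(records).items():
    print(dog_id, change_from_baseline(course))
```

Treating the eight rows as eight independent observations would ignore the fact that they come from only two dogs, which is why the analysis must take the repeated measurements into account.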

Study design (including how much data we need)

The study recruited subjects and randomly allocated them to a treatment group. The authors describe their study as randomized and blinded. We know that we are comparing two different premedicant combinations and assessing two different doses of one of the combinations (giving a total of three treatment groups). So what do all these terms tell us?

Randomized – Subjects were allocated at random to the different treatment groups.

Blinded – The practitioner (or observer) did not know which treatment group a particular patient (dog in this case) had been allocated to.

The subjects were healthy dogs, and we need to assume that these dogs are representative of the whole population of healthy dogs and are not in some material way different to other dogs that might receive these treatments. The authors reported some further characteristics of the sample of dogs (quoting means and standard deviations for body mass and age) so that readers may judge for themselves how typical the dogs used are of the wider population (more detail on the concept of means and standard deviations will be given in the next article of the series).


Sixty healthy dogs (American Society of Anesthesiologists status I/II). Mean (SD) body mass 28·0 ±9·1 kg, and mean age 3·4 ±2·3 years.

One might naturally ask “why 60 healthy dogs?” This question touches on the statistical concept of power, that is, the ability of the experiment and analysis to detect differences or effects if they are actually present. How many sample units (dogs in this case) are required for a particular study is one of the most commonly asked statistical questions; answering it requires a sample size calculation to find out how many dogs would be needed to have a reasonable chance of finding differences that are genuinely present. This calculation needs to be carried out before beginning the study, usually using data from a previously published similar study or from a pilot study. We describe this topic only very briefly here, as it will be the subject of a further article.

Sample size calculations

Without giving much detail, “how many” depends on a number of factors, including the inherent variability of the variable of interest (e.g. arterial blood pressure) between patients, the size of the effect induced by the particular treatment (e.g. how much the blood pressure is altered in the three different premedication groups), and how sure we want to be of our answer.

Large variability and small effect will require a large sample size. If blood pressure was highly variable in the population of dogs, then it is likely that we will only be able to detect differences between the three premedication groups in terms of their effect on blood pressure, if these differences are very large (i.e. relative to the variability); alternatively if we want to study a small effect (i.e. there is very little difference between the groups in terms of their effect on the variable of interest), then we will need to increase the number of dogs recruited to the study, perhaps to an unmanageable level.

The more sure we want to be of the answer (i.e. the less likely we are to draw an incorrect conclusion), the more dogs we will need.
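The interplay of these three factors can be sketched with a standard normal-approximation formula for comparing the means of two groups. This is an illustration only and not the calculation used by Bell and others (2011): it covers two groups where the study had three, and the function name, the standard deviation of 15 mmHg and the differences of 10 and 5 mmHg are hypothetical values.

```python
from math import ceil
from statistics import NormalDist


def dogs_per_group(sd, difference, alpha=0.05, power=0.80):
    """Approximate number of dogs per group needed to compare two group means
    (two-sided test, normal approximation, equal group sizes).

    sd         -- between-dog standard deviation of the variable of interest
    difference -- smallest difference between group means worth detecting
    alpha      -- significance level
    power      -- chance of detecting the difference if it is really there
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) * sd / difference) ** 2)


# Hypothetical blood pressure example: SD of 15 mmHg between dogs.
print(dogs_per_group(sd=15, difference=10))              # moderate effect
print(dogs_per_group(sd=15, difference=5))               # smaller effect, more dogs
print(dogs_per_group(sd=15, difference=10, power=0.90))  # more certainty, more dogs
```

Halving the difference to be detected, or doubling the variability, roughly quadruples the number of dogs required, because the sample size depends on the square of the ratio of variability to effect size.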

Select the study units and allocate them to a treatment

Having identified our subjects, we then have to consider how to allocate a dog to a treatment group. The study used random allocation.

Dogs were allocated randomly to receive 15 μg/kg buprenorphine combined with either

  • 30 μg/kg acepromazine (group 1),
  • 62·5 μg/m² dexmedetomidine (group 2), or
  • 125 μg/m² dexmedetomidine (group 3) administered intramuscularly.

Under a random allocation scheme, each dog has the same chance (1/3) of being allocated to each of the three treatment groups – in this way we avoid any hidden biases. One way of achieving random allocation in this case would be to imagine a three-sided dice; for each dog at enrolment, the dice would be rolled and the dog allocated to a treatment group on the basis of the result. Of course, today we would use a random number generator or a set of random number tables. There are many ways that we could achieve random allocation; the table shown below gives the dog id and a treatment allocation for nine cases (four allocated to treatment 1, one to treatment 2 and four to treatment 3). The treatment row was randomly generated. As we continue using such a scheme there is no guarantee that the same number of dogs would be allocated to each treatment, but by modifying the scheme it is possible to ensure that equal numbers of dogs are allocated to each treatment (although from an analysis point of view this is not essential).

[Table: dog id and randomly generated treatment group for nine dogs]
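Both schemes can be sketched in a few lines of Python. The first function is the “three-sided dice” approach; the second uses randomly permuted blocks of three, which is one possible modification that keeps the group sizes equal and is not necessarily the one used in the study. The function names and the seed are arbitrary choices for illustration.

```python
import random

TREATMENTS = [1, 2, 3]


def simple_allocation(n_dogs, rng):
    """Each dog independently has a 1/3 chance of each treatment;
    the resulting group sizes are not guaranteed to be equal."""
    return [rng.choice(TREATMENTS) for _ in range(n_dogs)]


def block_allocation(n_dogs, rng):
    """Randomise the order of treatments within consecutive blocks of three,
    so group sizes are exactly equal when n_dogs is a multiple of three."""
    allocation = []
    while len(allocation) < n_dogs:
        block = TREATMENTS[:]
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_dogs]


rng = random.Random(2011)  # arbitrary seed so the sketch is reproducible
print(simple_allocation(9, rng))
print(block_allocation(9, rng))
```

In practice the allocation list would be generated before enrolment begins and concealed from those assessing the dogs, so that the blinding is preserved.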

Analyse, interpret and report on the results

In this article, we will keep this section extremely brief, not because it is not an important topic, but because we will need to build up statistical knowledge as the series of articles progresses before we are in the position of being able to discuss this completely.

The methods of statistical analysis that we need depend on both the specific scientific question of interest and the data type of each of the outcomes being considered, and so are very much study dependent.

In this study, the different outcomes were analysed using combinations of:

  • chi-squared tests,
  • Fisher’s exact tests,
  • Kruskal-Wallis tests,

for the rating scale variables, and

  • one way ANOVA
  • general linear model ANOVA

for the continuous measures (we will not consider the details of any of these procedures in this article).

In all cases the researchers assumed statistical significance at a P value of <0·05. What on earth does statistical significance mean, and why this figure of 0·05?

Both statistical significance and the P value are to be found decorating most scientific papers and reports. To understand them fully, we need to consider statistical hypothesis testing (which we will do in a later article), but for this brief introduction, we will simply say that the P value is the probability of observing a result at least as extreme as the one obtained if the treatments applied (in this case the different premedicant drugs/doses) in fact had no effect, so that any apparent difference arose by chance alone. In essence, the P value allows us to assess how easily chance alone could have produced the result of our study; it is not the probability that the result is due to chance. A statistical significance of <0·05 says that, in the absence of a real effect, a treatment effect as large as that observed in a particular study would occur by chance less than one time in 20 (i.e. <5% of the time, 5% being 0·05 when decimalised). Scientific convention has settled on the value of 0·05 as a judgement of how much risk it is reasonable to take of publishing a positive result when there is in fact no effect, and thereby inadvertently misleading other scientists.
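One way to make the idea of “occurring by chance” concrete is a permutation test, sketched below in Python. This is a swapped-in illustration and not one of the tests used by Bell and others (2011); the heart rate values, group labels and function name are hypothetical. If the premedicant made no difference, the group labels would be arbitrary, so we shuffle them many times and count how often a difference in means at least as large as the observed one turns up.

```python
import random
from statistics import mean

# Hypothetical heart rates (beats/min) for two premedication groups; not study data.
group_a = [92, 88, 101, 95, 90, 97]
group_b = [74, 81, 69, 85, 78, 72]


def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Proportion of random re-allocations whose absolute difference in means
    is at least as large as the difference actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        shuffled_a, shuffled_b = pooled[:len(a)], pooled[len(a):]
        if abs(mean(shuffled_a) - mean(shuffled_b)) >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles


print(permutation_p_value(group_a, group_b))
```

A small proportion means that chance re-allocation alone rarely produces a difference as large as the one observed, which is what a small P value conveys.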

One very important point to remember is that statistical significance does not always translate directly to veterinary or clinical importance; again, we will discuss this in a later article.

Next steps

Where do we go from here? A series of short articles is planned to appear at two-monthly intervals. The next will address some of the basic statistical concepts needed before we can deal with the more complex issues. The third will cover the topic of hypothesis tests and confidence intervals. The fourth will tackle the issue of sample size calculations. The fifth and sixth will address correlation and regression. The seventh and eighth will discuss designed experiments (analysis of variance), longitudinal studies (sometimes called repeated measures) and multiple comparisons.

We would welcome feedback and comments on the topics as we go through the series, and can be contacted at Derek.


Conflict of interest

None of the authors of this article has a financial or personal relationship with other people or organisations that could inappropriately influence or bias the content of the paper.