Design issues for drug epidemiology
Alex McMahon, Robertson Centre for Biostatistics, Boyd Orr Building, University of Glasgow, Glasgow, G12 8QQ, Scotland.
Despite the difficulties involved in designing drug epidemiology studies, these studies are invaluable for investigating the unexpected adverse effects of drugs. The aim of this paper is to discuss various aspects of study design, particularly those issues that are not easily found in either textbooks or review papers. We have also compared and contrasted drug epidemiology with the randomized controlled trial (RCT) wherever possible. Drug epidemiology is especially useful in the many situations where the RCT is not suitable, or even possible. The study base has to be defined before the appropriate cohort of subjects is assembled. If all of the cases are identified, then a referent sample of controls may be assembled by random sampling of the study base. If all of the cases cannot be assembled, a hypothetical secondary base may need to be created. Preferably, only new-users of the drug should be included, and the risk-ratio will be different for acute users and chronic users. Studies will usually only be possible when researching the unintended effects of drugs. It is difficult to study efficacy because of confounding by indication. In occasional circumstances it may be possible to study efficacy (examples are given). Discussion of the dangers of designing with generalisability in mind is provided. Additionally, the similarities in study design between drug epidemiology and the RCT are discussed in detail, as well as the design-characteristics that cannot be shared between the two methods.
Designing studies in drug epidemiology can be a very complicated business. Despite the difficulties, many useful studies have been performed that combined pharmacoepidemiological techniques with large record-linkage databases of drug prescriptions and clinical outcomes [1, 2]. Generally speaking, there is a consensus amongst modern epidemiologists with regard to epidemiological methods, and study design in particular. Even though there are recent controversies raging in the world of epidemiology, more basic errors such as the desire for controls in a case-control study to be ‘very healthy’ are still being made. Against a backdrop of recent confusion in (specifically) drug epidemiology, a leading researcher decided it was necessary to publish guidelines for the design of such studies, even though something similar had been published in 1978. The purpose of the current paper is to discuss various aspects of study design that are of interest to the authors. We have attempted to highlight issues that are not easily found in either textbooks or review papers. We have also compared and contrasted drug epidemiology with the randomized controlled trial wherever possible.
Why use drug epidemiology?
Although the randomized controlled trial is the best way to demonstrate the effects of pharmaceuticals, these trials are expensive to carry out. It is probable that the majority of medical research is carried out using purely observational data, i.e. no trial intervention or control was used. Some researchers have even criticised the dominance of the controlled trial, and even randomization itself, as the best methodology for carrying out clinical research [8, 9]. It is sometimes forgotten that the real strength of randomization is not to be found in obscure statistical subtleties; it is the ability to demonstrate cause and effect. Because of the many sources of bias in an observational study, demonstrating cause and effect is much more difficult and it is often necessary to rely on evidence from outwith a study. In a very old paper by Cochrane, an anecdote is provided in which the eminent statistician Sir Ronald Fisher was asked what could be done in observational studies to make that step to causation. Fisher's answer, in seeming contradiction of Occam's Razor (i.e. the simplest answer is the most likely), was to ‘make your theories elaborate’. He meant that observational studies have to consider as many different reasons for an association as can be thought of, and try to rule out as many as possible.
Although the importance of randomization cannot be overestimated, there are many situations where the randomized controlled trial (RCT) is not suitable. This may be due to either the practicalities of carrying out a study, or because randomization is often unethical. For example, a clinical trial can only detect very frequent adverse drug effects, as too few subjects will typically be studied [12, 13] and the observational methods of pharmacovigilance (as invented by Finney) have to be used instead. Other examples of difficulties with RCTs include the following: RCTs are not useful for studying drug interactions or genetic disposition to diseases; drugs are used for indications other than the licensed ones; it may be unethical to randomize some groups of people such as pregnant women or children (although this principle may be misused when the real reason for exclusion is merely convenience); and finally, it is not ethical to design an RCT that examines drug overdose.
Defining the cohort
The first thing to do when a hypothesis has been developed, and study design has begun, is to define the cohort within which the study will be conducted. The source population is the population which has the available data and measurements that are required to answer the questions. The study population is the remaining population after all inclusion and exclusion criteria have been applied. When the period of time over which the study population will be studied is added to the equation, we then have the study base. The study base is therefore the members of the study cohort during the time that will be used in the study [17–19]. A cohort study is constructed explicitly within the study base by examining exposure patterns and looking for an association with the disease (or ‘event’, or ‘outcome’) of interest. Defining the base is also important for case-control studies, because these studies also take place within the study base. When described this way, it is obvious that a case-control study is merely a special type of sample of the underlying cohort. If a case-control study is to be carried out then a complete census of all of the events, or cases, must be assembled into the case-series. At the very least, the case-series should be a ‘random’ sample of all of the cases [17, 21]. In other words, the cases in a case-control study are the same cases that would be used in an equivalent cohort study. The difference between a cohort study and a case-control study is that in the latter type of study we sample the cohort to provide a set of controls.
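The view of a case-control study as a sample of the underlying cohort can be illustrated with a small deterministic sketch. The person-time figures, case counts and function names below are hypothetical and are not taken from any study. The sketch also assumes that controls are sampled in proportion to the person-time that each exposure group contributes to the study base, and it uses the expected control split rather than a random draw.

```python
def rate_ratio(cases_exp, pt_exp, cases_unexp, pt_unexp):
    """Cohort analysis: ratio of incidence rates in the full study base."""
    return (cases_exp / pt_exp) / (cases_unexp / pt_unexp)


def expected_controls(n_controls, pt_exp, pt_unexp):
    """Expected exposed/unexposed split of controls sampled from the base
    in proportion to person-time (an assumption of this sketch)."""
    share_exp = pt_exp / (pt_exp + pt_unexp)
    return n_controls * share_exp, n_controls * (1 - share_exp)


def exposure_odds_ratio(cases_exp, cases_unexp, ctrl_exp, ctrl_unexp):
    """Case-control analysis: exposure odds in cases over odds in controls."""
    return (cases_exp / cases_unexp) / (ctrl_exp / ctrl_unexp)


# Hypothetical study base: person-years and events by exposure status.
PT_EXPOSED, PT_UNEXPOSED = 20_000.0, 80_000.0
CASES_EXPOSED, CASES_UNEXPOSED = 60, 80

cohort_rr = rate_ratio(CASES_EXPOSED, PT_EXPOSED, CASES_UNEXPOSED, PT_UNEXPOSED)
ctrl_exp, ctrl_unexp = expected_controls(400, PT_EXPOSED, PT_UNEXPOSED)
case_control_or = exposure_odds_ratio(
    CASES_EXPOSED, CASES_UNEXPOSED, ctrl_exp, ctrl_unexp
)

print(f"cohort rate ratio:       {cohort_rr:.2f}")
print(f"case-control odds ratio: {case_control_or:.2f}")
```

Both analyses use the same cases. Under the stated sampling assumption the exposure odds ratio from the controls reproduces the rate ratio from the full cohort, while requiring exposure data on only 400 controls rather than the whole base.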
The secondary base
Sometimes, trying to imagine the study base for a set of cases that have already been assembled can be difficult, although this exercise is instructive in itself. Alternatively, the base may be defined but we cannot identify all of the cases, and we may only be able to identify a particular subset of the cases. Either of these two scenarios fails the simple rules above. A valid study might still be possible when this happens by imagining a secondary base, although the study might not be representative of a real population when doing this. The secondary base is an artificial base for which the set of cases is indeed complete [17, 21].
The most common example of this is the hospital-based case-control study, where the available cases are only those cases which ‘ended up’ in a particular hospital [18, 23]. The controls may have to be drawn from either the hospital catchment area, so that they would have gone to hospital if they developed the disease, or they must be actual patients in the hospital, if the patients in the catchment area would not necessarily have been in the case-series if they developed the disease. This type of study is easier to understand when data are available for all of the hospitals in a particular population. At least there is no concern over the case-mix of a particular hospital; we only worry about the possibility of cases occurring in the community not reaching hospital at all. For serious diseases, it may be possible to argue that the vast majority of cases have been identified (with the possible addition of community death-registration data).
Note that if hospitalized controls are being used, it is important to exclude patients who have been admitted for a disease that is associated with the study drug in order to prevent selection bias. Also, in all studies the subjects with prior study-events will usually have to be excluded because of contraindication with the study drug [7, 24], unless the effect of the drug (usually an adverse effect) is hitherto unknown. Generally, subjects who have evidence of illnesses that are risk factors for the outcome of interest, and for which the study drug is either indicated or contraindicated, should also be excluded. An example would be a study of the association between nonsteroidal anti-inflammatory drugs (NSAIDs) and gastric bleeding. Previous endoscopic examination will be a risk factor for bleeding, and will also be a contraindication for prescribing of NSAIDs. Therefore, including these patients would bias the size of the toxic effect downwards, so patients with prior endoscopies should be excluded.
It may be tempting to keep contraindicated subjects in a study when the researcher is struggling with an under-powered sample-size. There could be a desire to settle a serious drug-toxicity problem as early as possible. Although the study may indeed have more power by including a larger number of events, doing this will create other statistical difficulties. Risk factors that are contraindications will produce very strong interaction effects. Essentially, the increased rate of adverse events that is due to exposure will only manifest itself in the subjects without contraindications, although the underlying event rate may be higher in the subjects with contraindications. For example, in a previous study of the gastric toxicity of NSAIDs, the incidence rate (per thousand person years of exposure) for subjects with prior ulcer healing drugs was 8.02 for exposed subjects, and 9.08 for unexposed subjects. For subjects without prior ulcer healing drugs, the incidence rate was 4.45 for exposed subjects and 1.61 for unexposed subjects. Therefore the rate-ratios for exposure vs nonexposure were 0.88 for subjects with this particular contraindication to NSAIDs, and 2.77 for subjects without the contraindication. This is a very strong interaction (P = 0.004), which means that we should not combine these subgroups.
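The stratum-specific rate-ratios in this example can be recomputed directly from the quoted incidence rates, as in the short sketch below. The variable and function names are illustrative only. Because the published rates are themselves rounded to two decimal places, a recomputed ratio may differ from the published figure in the final digit. The P value for the interaction cannot be reproduced here, as that would need the underlying event counts and person-time, which are not quoted.

```python
def rate_ratio(rate_exposed, rate_unexposed):
    """Ratio of the incidence rate in exposed subjects to that in unexposed."""
    return rate_exposed / rate_unexposed


# Incidence rates per 1000 person-years, as quoted in the text.
rates = {
    "prior ulcer healing drugs": {"exposed": 8.02, "unexposed": 9.08},
    "no prior ulcer healing drugs": {"exposed": 4.45, "unexposed": 1.61},
}

stratum_rr = {
    name: rate_ratio(r["exposed"], r["unexposed"]) for name, r in rates.items()
}

for name, rr in stratum_rr.items():
    print(f"{name}: rate ratio = {rr:.2f}")
```

The two ratios lie on opposite sides of 1, which is why a single pooled estimate would describe neither subgroup.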
Some of the published reasons for exclusions in a study are not essential for the purposes of removing dangerous sources of bias. It has been suggested that we should only include cases of uncertain cause, and exclude cases with an ‘alternate proximate cause’. This is a bit like censoring the irrelevant deaths in a survival analysis (for example in a Kaplan-Meier plot). These alternate cases may not necessarily be biased towards either exposed subjects or unexposed subjects. When this type of exclusion is considered the analysis may be carried out both ways, including and excluding these cases, out of curiosity about what happens when this possible source of error (i.e. ‘noise’) has been removed from the results.
It is important to consider the timing of events in relation to the start of drug exposure. Another criterion that should be applied to drug epidemiology studies of drug-toxicity is that a study should only include new users of the drug of interest. Any previous adverse experiences with a drug will be a contraindication to future exposure. Similarly, past use of a drug, and chronic prescribing in particular, will tend to be associated with nonsusceptibility to any adverse effect of the drug. This means that we should expect the risk of an event to be higher in acute users of a drug than in more chronic users [24, 27–29]. It could be argued that, in some studies, the previous use of a drug might cause a relatively mild form of confounding by contraindication. Some researchers may carry out at least one analysis that includes the previous users due to concerns with the study being under-powered. If the study was one of drug-efficacy (assuming that a study was possible) the role of previous prescribing would not be as obvious. There could be a selection bias due to the differing effects of diagnoses in the ‘distant-past’ and more recent diagnoses, so perhaps a study of efficacy should be analysed separately for these different types of patient, as a check of whether the results are affected.
For the stated reasons, it is not desirable to mix together subjects with different patterns of drug usage. Also, if acute users and chronic users are mixed together then the important assumption that the hazard rate is constant across time may be violated. This assumption underlies both cohort and case-control studies, and can also be broken if evolving clinical practice creates change in the pattern of contraindications. Analyses that examine the cumulative effects of repeated prescriptions should be restricted to only those subjects with chronic prescribing, and only after establishing the start of exposure. New-use of the drug can be established by creating an ‘inception cohort’ from the date of the first prescription. In practice this might be difficult if a database begins collecting data from a particular date, in which case a sacrificial screening period may be used to screen out past users [26, 33].
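A minimal sketch of how an inception cohort with a sacrificial screening period might be assembled is given below. The function name, the record layout, the dates and the 365-day screening length are all hypothetical choices made for illustration. The sketch also assumes that every patient was registered in the database for the whole of the screening period, so that an absence of prescriptions during that period is informative.

```python
from datetime import date, timedelta


def inception_cohort(prescriptions, database_start, screening_days):
    """Return {patient_id: index_date} for apparent new users.

    prescriptions: iterable of (patient_id, prescription_date) tuples.
    A patient is kept only if their first recorded prescription falls on or
    after the end of the screening period; anyone prescribed the drug during
    the screening period is treated as a past user and dropped.
    """
    screening_end = database_start + timedelta(days=screening_days)
    first_seen = {}
    for patient_id, when in prescriptions:
        if patient_id not in first_seen or when < first_seen[patient_id]:
            first_seen[patient_id] = when
    return {
        pid: first for pid, first in first_seen.items() if first >= screening_end
    }


# Hypothetical prescription records for three patients.
records = [
    ("A", date(1989, 3, 1)),    # prescribed during screening: past user
    ("A", date(1990, 6, 1)),
    ("B", date(1990, 2, 10)),   # first prescription after screening: new user
    ("C", date(1989, 12, 31)),  # last day of screening: past user
    ("C", date(1991, 1, 5)),
]

cohort = inception_cohort(records, database_start=date(1989, 1, 1), screening_days=365)
print(cohort)
```

The data recorded during the screening period are sacrificed in the sense that they are used only to classify past users, and contribute no person-time or events to the analysis.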
The greatest successes of epidemiology have been with the unintended effects of exposures such as adverse effects, and much less has been learned about the intended effects, such as efficacy. In drug epidemiology this is due to powerful biases that confound an association with the indication for a treatment. If we wish to compare treated patients with untreated patients, and we assume that prescribing is rational, then the treated patients will automatically have a higher rate of any disease that the drug is intended to treat (or possibly cure). Therefore a drug that actually helps patients will appear to be risky. This means that studying efficacy with observational data is extremely difficult, and usually impossible. This can be seen in a positive light; the reverse is true of randomized clinical trials, because they are an excellent method of examining efficacy, and are not usually suitable for looking at unintended effects. As an aside, this should serve as a warning to the emerging field of ‘outcomes research’, which examines the ‘effectiveness’ of health technologies [25, 35]. The usefulness of routinely collected medical data, for the evaluation of treatments or medical interventions in general, is likely to be limited.
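The mechanism of confounding by indication can be made concrete with a small numerical sketch. Every number below is hypothetical and chosen only for illustration: a drug that halves the risk of the outcome in every patient, two severity groups with different untreated risks, and ‘rational’ prescribing that favours the more severely ill. The calculation uses expected counts, so no random sampling is involved.

```python
# All numbers are hypothetical and chosen only to illustrate the mechanism.
TRUE_RISK_RATIO = 0.5  # the drug halves the risk of the outcome in every patient

# (number of patients, untreated risk, proportion treated) by disease severity
strata = {
    "severe": (1000, 0.20, 0.9),
    "mild": (1000, 0.05, 0.1),
}


def crude_risk_ratio(strata, true_rr):
    """Risk in all treated patients divided by risk in all untreated patients,
    ignoring severity (expected counts, no random sampling)."""
    treated_n = treated_events = untreated_n = untreated_events = 0.0
    for n, base_risk, p_treated in strata.values():
        n_treated = n * p_treated
        n_untreated = n - n_treated
        treated_n += n_treated
        treated_events += n_treated * base_risk * true_rr
        untreated_n += n_untreated
        untreated_events += n_untreated * base_risk
    return (treated_events / treated_n) / (untreated_events / untreated_n)


crude = crude_risk_ratio(strata, TRUE_RISK_RATIO)
print(f"true risk ratio within each stratum:    {TRUE_RISK_RATIO:.2f}")
print(f"crude risk ratio, treated vs untreated: {crude:.2f}")
```

With these assumed inputs the crude comparison gives a risk ratio above 1, even though the drug is protective within each severity group. Here severity is known exactly and could be adjusted for; in real data the severity of the indication is rarely measured well enough to do so.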
Despite the problems, there is clearly a demand for epidemiological studies of drug-effectiveness, especially for drugs that are being used for unlicensed indications. There are in fact some situations where nonexperimental methods could be used to demonstrate efficacy. The effect of the drug could be so dramatic that no comparator group is required, for example the use of naloxone in patients who are comatose with opiate poisoning. A disease could be stable or predictable so that fluctuations may be attributable to an exposure; for example insulin use and glycaemic control in diabetes. However, drugs can appear efficacious simply due to ‘regression to the mean’, when patients will get well over time, especially if selected into a cohort at a time of severe illness. In theory, if the severity of an indication could be measured exactly then this could be adjusted for in an analysis, although this is usually not possible [39, 40]. In practice, epidemiological studies of drug-efficacy suffer from uncontrollable bias due to confounding by indication.
When designing a study we have seen that various exclusions have to be made in order to create a study base that is as unbiased as possible. A large part of a source population may be thrown out of a study in this way. This is paralleled by the inclusion and exclusion criteria used in clinical trials. The subjects in an RCT will not usually be a random sample of any population at all, although they are often considered to be just that. What is important is whether or not the treatment works for those patients who were randomized. The result is still valid using “proof within the trial” and the results only provide an approximation to what will happen in a true population [10, 42]. Emphasis on the ‘representativeness’ of the subjects in either a clinical trial or an observational study can be damaging to the design of a study [17, 41, 43]. It should be safer to generalize a treatment difference (e.g. active vs placebo) than the actual success rate in the treated subjects.
There are some aspects of study design with clinical trials that are worth trying to emulate in an observational study. Let us consider the RCT again. As was mentioned, the primary purpose of a clinical trial is not to be representative of a population; it is to find a difference between treatments (equivalence studies are being ignored, because ‘equivalence is different’). This simple point has firm grounds in the philosophy of science. Simply put, we are trying to refute the suggestion that the null hypothesis is true, and not trying to prove that the alternative is true. Having said that, it is desirable to have as wide a selection of patients in a trial as possible, as we are only excluding subjects who cannot be randomized for ethical reasons. There have often been justified criticisms of RCTs, stating that they are usually excessively restrictive in their choice of subjects [44, 46–49]. In this respect the RCT does not follow the paradigm of experimental science because the experimental ‘units’ are not intended to be homogeneous (e.g. genetically similar rats). Some researchers may disagree with this view, but we remain convinced.
The main criticism is that if RCTs exclude certain types of patient, then this makes it difficult to extrapolate the results. As we have said, although patients should not be unnecessarily excluded, this external application of study-results may often be less problematic than doctors may think. Simply put, the treatment effect may be smaller or larger for patients with different prognostic factors, rather than nonexistent or even harmful for some patients [46, 50]. Ironically, when a study does have wide entry criteria, some researchers may attempt to claim that the drug does not work for a particular subgroup! This is of course an unacceptable practice, and one study has humorously pointed out that their drug did not appear to work for patients born under two of the astrological birth signs. The most likely treatment response in a subgroup is that for the whole study.
Where an RCT does follow the experimental paradigm is in the idea of experimental control. In an RCT, the conduct of the study has to be tightly controlled in order to give the best chance of detecting a difference between the treatments. This is why it is more difficult to find a treatment effect with either an intention-to-treat analysis, because the study conduct was not as good as that hoped for, or in a so-called ‘pragmatic’ study. In an observational study exclusions may be made in the spirit of a clinical trial [52, 53], but for very different reasons. Because there is no randomization of treatments, the only alternative that may help in preventing bias is ‘judicious selection of subjects’. Restricting entry into the study in ways similar to the RCT can make the results less biased, and in some cases very like the results of equivalent RCTs. Although the reason for exclusions is different, researchers should be wary of over-reliance on the flawed concept that study subjects should be representative of a population [7, 17, 41, 43] and not be afraid of ‘throwing away data’. In the past authors have over-emphasized the virtues of generalisability with observational research. As we have discussed, perhaps observational studies have not always been restrictive enough and RCTs have often been too restrictive, which raises the possibility that there is an optimum level of subject-restriction that is common to both methods.
At one time case-control studies were ridiculed as inferior retrospective ‘trohoc’ studies (i.e. cohort spelt backwards). Nowadays it is recognized that both cohort and case-control studies may be conducted either prospectively or retrospectively. The idea that the concept of ‘directionality’ can distinguish between the two is now rejected as ‘founded on nonsense’ [58, 59]. In practice, most observational research is probably retrospective since this type of research often takes advantage of data that has been recorded for other purposes. However, thinking about directionality can help in designing a study, because observational research is usually theoretically ‘retrospective’ for an entirely different reason.
In an RCT the anchor of time upon which the study is built is the point of randomization. An observational study does not have this anchor, and will usually be centred on the time of the outcome, at the end of the study. This may seem counter to intuition, but consider the following quote concerning cohort studies from over 20 years ago by the eminent epidemiologist Liddell: ‘our research is prospective more in appearance than in fact, and as in all such studies, it is only possible, in practice, to classify the subjects’ exposure in terms of length and intensity after it has ended’. In other words, although a study may be conducted prospectively, and cause and effect reasoning is always in a forwards direction, observational variables will usually have to be measured backwards from the outcome.
The idea of a ‘baseline’ may not therefore mean much in drug epidemiology because the exposure might actually be in transit through the hypothetical baseline, which is really just a ‘study start day’ when data recording began. Exceptions might be studies where the baseline (i.e. ‘time zero’) could be created because the study also begins with a disease (e.g. myocardial infarction, and subsequent recurrence). In theory a record-linkage system that covers an entire population could simply stay in business until all of the subjects' prescriptions are recorded, from birth to death.
There is also no way to force exposure to be constant in the manner of a controlled trial (notwithstanding the usual problems of patient compliance). That exposure cannot be kept constant, and time is anchored at the outcome, can often be virtues of observational research. A parallel-group clinical trial could never discover that it is the blood alcohol immediately before a road accident that matters, or that it is the physical activity in the hours before a myocardial infarction that is the cause. The earlier exposure in these two examples did not matter at all.
The clinical trial paradigm
We have noted that, firstly, a clinical trial is anchored around the time of randomization and that exposure is held constant throughout the study and at the time of an outcome, and secondly, observational studies are usually assessed backwards from the outcome. Although there are many useful lessons from the clinical trial, these points mean that the clinical trial ultimately fails as a paradigm for drug epidemiology [41, 61]. This discovery certainly met some resistance when it was first suggested, and is probably still not appreciated by many researchers today.
In conclusion, the importance of randomization cannot be overestimated. In many situations a drug epidemiology study may be difficult, or even impossible to carry out. Indeed, anticipating when studies are impossible is an important skill in study design. However, it can also be difficult or impossible (at least ethically) to carry out a randomized controlled trial. Happily, these are often the very situations when drug epidemiology is at its most useful.