Suppose a patient presents with severe headaches. These headaches can be treated with a pill, of which four brands are currently on the market. The brands are all 100% effective, yet some studies show that one of the four carries a higher risk of a certain side effect. If the side effect occurs, there is no equally effective treatment for the headaches. Would you even consider prescribing the brand that may have an increased risk of the side effect? A physician who believes that his foremost obligation is not to cause harm (primum non nocere) will not.
It seems we are facing a similar question following a recent report on differences in immunogenicity between various factor VIII (FVIII) concentrates in patients with haemophilia A. The report provided incidences of the development of neutralizing antibodies (inhibitors) against FVIII, a severe side effect of treatment with FVIII, according to product brand. Among 574 consecutive newly treated children with severe haemophilia A, recombinant and plasma-derived FVIII products conferred similar incidences of inhibitor development, and switching among products was not associated with inhibitor development. Yet the adjusted incidence rate of inhibitors was 60% higher during treatment with the second-generation FVIII, known as Kogenate® Bayer (Kogenate® FS; Bayer AG; Bayer HealthCare LLC, Berkeley, CA, USA), also known as Helixate® NexGen (Helixate® FS; Bayer AG; Bayer HealthCare LLC, distributed by CSL Behring), than during treatment with third-generation full-length FVIII, known as Advate™ (Baxter Healthcare Corporation, Thousand Oaks, CA, USA & Neuchâtel, Switzerland). Kogenate FS and Helixate FS are identical medicines containing octocog alfa, produced by recombinant DNA technology using baby hamster kidney cells. They were authorized throughout the EU on 4 August 2000. Advate is produced by recombinant DNA technology using Chinese hamster ovary cell lines.
The interpretation of this finding and its clinical implications have been heavily debated in the literature [2, 3], at scientific meetings, and by regulatory authorities in Europe and the USA. The central question is the following: ‘Is this single observation sufficient to impact clinical care, i.e. to stop prescribing Kogenate/Helixate?’
In all the debates, three major aspects of the study relevant to its interpretation seem to have been ignored: first, the prior probability of an excess risk with some products; second, the difference in sensitivity to spurious results between studies of effects and studies of side effects; and third, the age-old adage primum non nocere.
Clearly, every scientist will agree that a single study cannot provide definite proof. A finding from any type of study can be the result of chance. Bad luck. In fact, as David Hume already stated in the 18th century, no proof can ever be derived from empirical activities. Nevertheless, we find some studies more convincing than others. Prior beliefs influence how convincing we consider study results to be. We do not believe results that are completely implausible, e.g. a randomized study showing an effect of homoeopathy. In the case of the immunogenicity of FVIII concentrates, a strong prior belief in the absence of differences is not reasonable: the expectation that all concentrates would invariably confer a similar risk of inhibitor development seems less plausible than the expectation that there might be differences. After all, such a difference has been observed before, after a slight modification in the production process of a plasma-derived concentrate.
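The role of prior belief can be made concrete with Bayes' rule on the odds scale. The sketch below is purely illustrative: the function name, the two prior probabilities and the likelihood ratio are hypothetical values chosen for the example, not estimates taken from any study.

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Update a prior probability with a likelihood ratio (Bayes' rule on the odds scale)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical: the data are assumed to be 10 times more probable
# if a true difference exists than if it does not.
LIKELIHOOD_RATIO = 10.0

# Hypothetical priors: one for a claim considered implausible beforehand,
# one for a claim considered as likely true as false.
for label, prior in [("implausible claim", 0.01), ("plausible claim", 0.50)]:
    post = posterior_probability(prior, LIKELIHOOD_RATIO)
    print(f"{label}: prior {prior:.2f} -> posterior {post:.2f}")
```

With identical evidence, the implausible claim remains improbable whereas the plausible claim becomes highly probable, which is why the same study result can reasonably be judged differently depending on what was known beforehand.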
An essential but largely ignored argument in the discussions about the clinical implications of the RODIN study findings is that the design, the interpretation and the clinical consequences of a study are completely different for studies of side effects as opposed to studies of intended, therapeutic effects. Although randomization is nearly always required for the study of intended effects, there is a large body of literature pointing out that this is almost never the case for the study of side effects. In fact, most aetiological studies, including studies into side effects, are by necessity non-randomized, from the earliest studies on the side effects of smoking (lung cancer) to the latest studies into the effects of genetic variants.
As stated before, among previously untreated patients with severe haemophilia A the RODIN study found a 60% increase in the incidence of inhibitors associated with second-generation recombinant factor VIII (rFVIII) compared to third-generation rFVIII products. This finding may be the result of a causal effect, in which case the explanation should be sought in the manufacturing process. Alternatively, the finding could in theory be spurious: the result of confounding, measurement or selection bias, or chance. We will first discuss the likelihood of the non-causal explanations and then the biological plausibility of a causal effect.
The RODIN study is an observational study, not a randomized experiment. Observational studies that compare treatment effects are notoriously prone to confounding by indication because, in general, a treatment is chosen according to the clinical profile of a patient, leading to imbalance in the characteristics of the patients being compared. This confounding by indication is the only reason to apply randomization: to break the link between prescription and prognosis. When there is no confounding by indication, observational studies yield results that are as valid as those of randomized studies. This is why we believe non-randomized studies on the association between smoking and lung cancer. In the RODIN study, confounding would have occurred if the decision for one or another FVIII product had depended on the patient's perceived risk of inhibitor development at the time of the decision, i.e. at the start of treatment. We currently know only three inhibitor predictors that can be recognized at the time of that decision: certain F8 genotypes, a positive family history of inhibitors, and severe haemorrhage or surgery at first FVIII infusion. The presence of these predictors indicates an increased risk of inhibitors. However, if anything, all three factors were less prevalent among the patients who were treated with the second-generation rFVIII, not more. So, confounding by indication cannot explain the observed association. It is worth noting that all three risk factors were most prevalent among patients who received plasma-derived products. This suggests that some clinicians preferentially gave plasma-derived products to high-risk patients. All three risk factors were measured accurately, which allowed for appropriate adjustment for them in the statistical analyses. We conclude that confounding cannot explain the observed association.
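How the distribution of a risk factor over treatment groups affects a crude comparison, and how adjustment removes that effect, can be illustrated with a stratified (Mantel-Haenszel) rate ratio. This is a minimal sketch of one common adjustment technique, not necessarily the one used in the RODIN analyses; all counts and person-time values are hypothetical and were chosen so that the rate ratio is 1.5 within each risk stratum while high-risk patients are under-represented in the exposed group.

```python
from typing import NamedTuple, Sequence


class Stratum(NamedTuple):
    exposed_cases: int
    exposed_time: float      # person-time on the product of interest
    unexposed_cases: int
    unexposed_time: float    # person-time on the comparator product


def crude_rate_ratio(strata: Sequence[Stratum]) -> float:
    """Rate ratio ignoring the stratification variable."""
    exposed_rate = sum(s.exposed_cases for s in strata) / sum(s.exposed_time for s in strata)
    unexposed_rate = sum(s.unexposed_cases for s in strata) / sum(s.unexposed_time for s in strata)
    return exposed_rate / unexposed_rate


def mantel_haenszel_rate_ratio(strata: Sequence[Stratum]) -> float:
    """Mantel-Haenszel summary rate ratio for person-time data."""
    numerator = sum(
        s.exposed_cases * s.unexposed_time / (s.exposed_time + s.unexposed_time) for s in strata
    )
    denominator = sum(
        s.unexposed_cases * s.exposed_time / (s.exposed_time + s.unexposed_time) for s in strata
    )
    return numerator / denominator


# Hypothetical data: first stratum is low-risk, second stratum is high-risk.
HYPOTHETICAL_STRATA = [
    Stratum(exposed_cases=30, exposed_time=1000.0, unexposed_cases=10, unexposed_time=500.0),
    Stratum(exposed_cases=9, exposed_time=100.0, unexposed_cases=30, unexposed_time=500.0),
]

print(f"crude rate ratio:    {crude_rate_ratio(HYPOTHETICAL_STRATA):.2f}")
print(f"adjusted rate ratio: {mantel_haenszel_rate_ratio(HYPOTHETICAL_STRATA):.2f}")
```

In this constructed example the crude ratio falls below 1 although the product raises the rate by 50% in both strata. When known risk factors are less prevalent in the exposed group, confounding by those factors would mask an excess risk rather than create one.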
Some have argued that findings from retrospective studies should be mistrusted. The RODIN report intentionally did not mention the terms retrospective or prospective because the use of these terms is strongly discouraged by the STROBE guidelines. Because the terms are ambiguous, STROBE suggests clearly reporting how patients were selected and how data were collected, so that the reader can judge the influence of potential biases. For while it is true that with a prospective design the researchers can choose which variables to include, the final judgement on the credibility of a study should be based on whether or not relevant variables were included and to what extent data were missing. The RODIN report clearly describes all eligible patients and exclusions, with the reasons for exclusion, in Fig. 1, and very few data were missing. In the Supplementary material on missing data we read, ‘In 12.4% of patients, the F8 gene mutation type was missing. These values were imputed by the prevalences of the F8 gene mutations in this study population. If more than 10 exposure days on on-demand treatment were missing, the patient was excluded from the analyses. If less than 10 exposure days of on-demand treatment were missing, these missing dates of exposure days were unconditionally imputed with the middle of the period of the date before and after the missing date. It was assumed that these treatments were given for minor bleeds. Overall, 0.6% of dates and products were imputed. The missing values of the variables ethnicity (missing for one patient) and family history of inhibitors (missing for 30 patients) were imputed using multiple linear regression methods’. This shows that missingness was limited and appropriately dealt with in the analyses.
Therefore, there is no likely reason why the partly ‘retrospective’ nature of this study would have caused a spurious association between the type of recombinant product (second- or third-generation) and inhibitors. Moreover, only extremely unlikely scenarios could make the missing data a source of bias.
There may have been misclassification of inhibitor status due to differences between laboratories. The levels of inhibitors were measured in the laboratories of the participating centres instead of in a central laboratory. One would again need elaborate scenarios for this to explain the increased number of inhibitors in patients on second-generation rFVIII. Many factors may increase the noise-to-signal ratio, but none of these will lead to the identification of a spurious signal. Moreover, all high-titre inhibitors will be detected, irrespective of the inhibitor assay used or the screening frequency, and will therefore be correctly classified as inhibitors. The increased inhibitor risk associated with treatment with second-generation rFVIII compared to third-generation rFVIII was also found when only high-titre inhibitor development was considered, confirming that the observed association cannot be explained by differences in laboratory methods. This leads to the conclusion that it is very unlikely that the observed association is explained by measurement error.
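The statement that measurement noise dilutes rather than creates a signal can be checked with a small calculation. The sketch below assumes that assay errors are unrelated to the product used (non-differential misclassification) and that the assay performs better than chance; the true risks, sensitivity and specificity are hypothetical values chosen for illustration.

```python
def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Proportion classified as inhibitor-positive under imperfect measurement."""
    true_positives = sensitivity * true_risk
    false_positives = (1.0 - specificity) * (1.0 - true_risk)
    return true_positives + false_positives


def observed_risk_ratio(
    risk_exposed: float, risk_unexposed: float, sensitivity: float, specificity: float
) -> float:
    """Risk ratio as it would be measured when both groups share the same error rates."""
    return observed_risk(risk_exposed, sensitivity, specificity) / observed_risk(
        risk_unexposed, sensitivity, specificity
    )


# Hypothetical true risks giving a true risk ratio of 1.6.
TRUE_RISK_EXPOSED = 0.32
TRUE_RISK_UNEXPOSED = 0.20

# Hypothetical assay performance, identical for both products.
SENSITIVITY = 0.90
SPECIFICITY = 0.95

true_rr = TRUE_RISK_EXPOSED / TRUE_RISK_UNEXPOSED
measured_rr = observed_risk_ratio(TRUE_RISK_EXPOSED, TRUE_RISK_UNEXPOSED, SENSITIVITY, SPECIFICITY)
null_rr = observed_risk_ratio(TRUE_RISK_UNEXPOSED, TRUE_RISK_UNEXPOSED, SENSITIVITY, SPECIFICITY)

print(f"true risk ratio:     {true_rr:.2f}")
print(f"measured risk ratio: {measured_rr:.2f}")
print(f"measured risk ratio when there is no true difference: {null_rr:.2f}")
```

Under these assumptions the measured ratio lies between 1 and the true ratio, and a true ratio of 1 stays at 1. Only errors that differ systematically between products could produce a spurious excess.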
Of 648 eligible patients, 574 were analyzed. Is it possible that the exclusion of 74 patients created a spurious effect of second-generation rFVIII on inhibitors among the remaining 574 patients? Again, that would require a complex scenario, in which the excluded patients were either predominantly low inhibitor-risk patients who used second-generation rFVIII, or high inhibitor-risk patients who used third-generation rFVIII. This is far-fetched, even more so when one looks at the reasons for most exclusions: no or pending informed consent in one centre (n = 22) and insufficient data available (n = 32).
Another cause of selection bias might be incomplete follow-up of a part of the study population. However, patients were censored from the analyses at the time of last follow-up. Follow-up for the reported analyses ended in May 2011 (see Supplementary material). There is no reason to believe that the end of follow-up was influenced by anything other than calendar time. This implies that selection bias cannot explain the observed association.
Some have suggested that the comparison between the specific recombinant products was not clearly described as a separate objective in the protocol of the RODIN study and should therefore be regarded as a post hoc analysis, with a correspondingly different interpretation of the P-value. The study protocol is publicly available at the PedNet website (www.pednet.nl); it states: ‘The objective of the RODIN Study is to examine the role of risk factors for inhibitor development among patients with mild/moderate/severe haemophilia A and B. Potential risk factors for inhibitor development: treatment characteristics like age of first infusion of FVIII, type and purity of coagulation factor concentrates used for treatment, dose and frequency of clotting factor administrations. Number, type and severity of haemorrhages (joint, muscle or intracranial), surgery, infections, use of antibiotics and other medications, allergic diseases, vaccinations and the response to vaccinations, FVIII/FIX gene mutation, family history of inhibitors and duration of breast feeding’. As the analysis was prespecified, the P-value may be interpreted in a frequentist fashion, which implies that the probability of observing a difference at least this large, if in reality there is no difference, is less than 2%.
None of the proposed non-causal explanations accounts satisfactorily for the observed association, so the likelihood that the association is causal is considerable. A biological mechanism underpinning the observation, although not necessary, would lend further credibility. The literature does indeed provide evidence that may explain the observations. A plausible explanation is that the second-generation rFVIII contains more FVIII protein in aggregate form.
But even when this is seen as at most a partial explanation, we cannot let the limitations in our current knowledge negate a potential deleterious effect. There is a long list of drugs shown to have unexpected side effects after they had been licensed, ranging from short-acting calcium antagonists to ximelagatran to third-generation contraceptives, and in all cases the mechanism was unknown at the time the side effect was identified. This is quite logical, for if the mechanism of a side effect had been known before that time, the drug would not have passed the licensing stage. Many drug effects exist for which the mechanism is unknown, and this is true even for intended effects. For instance, the mechanism of action of aspirin was only revealed more than 60 years after recognition of its clinical effects. Still, we should learn from repetition: in the 1990s we witnessed that a particular FVIII concentrate, after the viral inactivation method was changed from solvent detergent to pasteurization, caused a sharp increase in inhibitor incidence. Twenty years on, we still do not know the mechanism.
Having discussed the interpretation of the observed association, let us return to the clinical consequences. These are far simpler and can be derived from the principle of primum non nocere. This principle must lead to the conclusion that efficacy needs to be proven, but harm does not. Reasonable doubt about safety is sufficient rationale to refrain from the use of a drug, unless there are no alternatives or only less effective alternatives. Our surprise over serious side effects attributed to a drug we thought safe should not lead us to disbelief and irresponsible continuation of its use. Moreover, while class effects (all types of a drug have the same effect) are more rule than exception for the main therapeutic effect, they are less common for side effects. This is because drugs are developed for the main effect, and not for the side effect.
Development of inhibitory antibodies against FVIII is an adverse drug reaction. The risk of an adverse drug reaction may be acceptable when the potential desired effect of the drug is substantial, as with chemotherapy for cancer. If, however, a drug with equal effectiveness (and costs) but possibly substantially fewer side effects is available, this should be the preferred drug. Given the availability of a number of rFVIII brands with equal effectiveness, the heated and highly theoretical discussions about the implications of the observed increased risk of inhibitors with one of these products are not only surprising, but also misguided and misleading.
Finally, clinicians are the guides of the patient and their task is first and foremost to inform the patient about the available knowledge. Let a well-informed patient decide.
Inhibitor formation is a multicausal process, and FVIII exposure is undeniably a necessary cause of inhibitors. Despite all efforts of the regulatory authorities, it will be impossible to know the exact immunogenicity of a new product before its authorization. The best way to monitor a FVIII product's immunogenicity is to collect information on all FVIII infusions of a product and to compare inhibitor occurrence between products. A central database with such information, under the ownership of patients, is the most efficient, fastest and most reliable way to learn whether a product carries a higher or lower risk of inhibitors. With such a database in place we would have known earlier that Kogenate/Helixate indeed induces more inhibitors than other products.
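Comparing inhibitor occurrence between products in such a database comes down to comparing incidence rates. The sketch below shows a crude incidence rate ratio with an approximate 95% confidence interval based on the standard error of the log rate ratio. The function name, the counts and the exposure totals are hypothetical; a real analysis would also adjust for known risk factors.

```python
import math
from typing import Tuple


def incidence_rate_ratio(
    cases_a: int, time_a: float, cases_b: int, time_b: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Return (rate ratio, lower limit, upper limit) for product A versus product B.

    Uses the large-sample approximation SE(log IRR) = sqrt(1/cases_a + 1/cases_b).
    """
    if min(cases_a, cases_b) <= 0 or min(time_a, time_b) <= 0:
        raise ValueError("case counts and exposure time must be positive")
    ratio = (cases_a / time_a) / (cases_b / time_b)
    se_log = math.sqrt(1.0 / cases_a + 1.0 / cases_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical registry totals: inhibitor cases and cumulative exposure per product.
irr, ci_low, ci_high = incidence_rate_ratio(cases_a=40, time_a=2000.0, cases_b=25, time_b=2000.0)
print(f"incidence rate ratio {irr:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

The more infusions and inhibitor cases the database accumulates, the narrower this interval becomes, which is the sense in which a complete central registry would detect a difference between products earlier.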