Tying research question and analytical strategy when variables are affected by medication use

Ill‐defined research questions could be particularly problematic in an epidemiological setting where measurements fluctuate over time due to intercurrent events, such as medication use. When a research question fails to specify how medication use should be handled methodologically, arbitrary decisions may be made during the analysis phase, which likely leads to a mismatch between the intended question and the performed analysis. The mismatch can result in vastly different or meaningless interpretations of estimated effects. Thus, a research question such as “what is the effect of X on Y?” requires further elaboration, and it should consider whether and how medication use has affected the measurements of interest. In our study, we will discuss how well‐defined questions can be formulated when medication use is involved in observational studies. We will distinguish between a situation where an exposure is affected by medication use and where the outcome of interest is affected by medication use. For each setting, we will give examples of different research questions that could be asked depending on how medication use is considered in the estimand and discuss methodological considerations under each question.


Plain Language Summary
Ill-defined research questions could be particularly problematic in an epidemiological setting where measurements fluctuate over time. One of the reasons for measurements to change over time is medication use. When a research question fails to specify how medication use should be handled methodologically, arbitrary decisions may be made during the analysis phase, which likely leads to a mismatch between the intended question and the performed analysis. The mismatch can result in vastly different or meaningless interpretations of results. Thus, a research question such as "what is the effect of X on Y?" requires further elaboration, and it should consider whether and how medication use has affected the measurements of interest. In our study, we discussed how well-defined questions can be formulated when medication use is involved in observational studies. We gave examples of different research questions that could be asked depending on how medication use is considered in the research aim and discussed methodological considerations under each question.

| INTRODUCTION
A well-defined research question is the cornerstone of research.
Depending on the research question, different theoretical considerations and statistical analyses are required, and most importantly, estimated effects should be interpreted differently. 1,2 Unfortunately, researchers may start performing statistical analyses before their research question is settled with sufficient detail.
Analyses are done first, and the meaning of the estimated effect remains vague. 3 Ill-defined research questions are particularly problematic in an epidemiological setting where measurements fluctuate or change over time. Medication use is one important cause of such change, as medications are prescribed to target specific measurements. A research question that fails to specify how medication use should be handled methodologically may lead to arbitrary decisions during the analysis phase and a subsequent mismatch between the intended research question and the performed analysis.
Suppose that different researchers are interested in the effect of blood pressure (BP) on myocardial infarction (MI) risk. Some researchers may exclude individuals using antihypertensive drugs. The result would be interpreted as the effect of BP on MI in the subset of medication non-users, and it may not be transportable to medication users. Others may be interested in untreated BP values and take a modeling approach to reconstruct BP values without medication, for example, by using methods that account for measurement error. 4 Still others may ignore the medication information and consider the effect of observed BP in the total population, where BP might have been lowered by medication. Similar problems arise when BP is studied as an outcome. Thus, a research question such as "what is the effect of X on Y?" requires further elaboration, and it should consider whether and how medication use has affected the measurements of interest.
Numerous authors in causal inference have stressed that exposures should be well-defined. 5 As practical guidance, several authors 4,10-12 have discussed statistical methods that can be used when measurements are affected by medication use. However, our recent review of the handling of medication use in medical papers 13 demonstrated that a majority of studies featured vaguely formulated research questions and unclear research aims. Invalid methods were often used, and a justification for the chosen method was rarely given. Despite efforts to raise awareness, medication use as an intercurrent event was overlooked in the majority of reviewed papers. Therefore, in this paper, we emphasize the importance of further elaborating on ostensibly straightforward research questions when the exposure or the outcome variable is affected by medication use.
We describe several types of research questions of interest to applied researchers; some are formulated within the framework of causal inference, others are more explorative in nature. When considering a cause, we take a practical pluralistic perspective: not only manipulable interventions but also "states," such as having a certain level of BP, can be studied as causes. 14, 15 We discuss how medication use is incorporated into each research question and which design considerations or methodological challenges may arise. Additionally, we warn against some common approaches to handling medication use that generally fail to yield interpretable results.
We start by discussing a situation in which an exposure, possibly time-varying, is affected by medication use, and consider five different research aims. We then consider five research aims for the situation in which the outcome of interest may be affected by medication use. We conclude with a general discussion.

| SITUATION 1: THE EXPOSURE IS AFFECTED BY MEDICATION USE
Imagine a researcher who is interested in the effect of BP on the severity of COVID-19 in patients who have just tested positive for the coronavirus. The time of the positive test is indicated by t.
The outcome, severity of COVID-19, is measured at a certain moment after t. Individuals' BP levels have changed over time before time t, and some people started using antihypertensive drugs at some moment before time t. Depending on the research setting, BP may have been measured repeatedly before time t or only at t.
The initial research question, "the effect of BP on the severity of COVID-19," is not well defined; it ignores the fact that BP varies over time and does not specify which BP values are of interest. For simplicity of the further discussion, let us assume three categories of study participants (Figure 1A). In category A, individuals had a high BP for a prolonged period and never used antihypertensive drugs. Individuals in category B also had a history of high BP but started using antihypertensive drugs before t. Thus, at time t, their BP is lower than before taking the medication. In category C, individuals had normal BP over time without medication. We use this example to discuss different possible research questions of interest. Throughout the paper, we assume that all confounding factors are measured and dealt with appropriately.

| The interest is the currently observed exposure value
One may hypothesize that the BP value currently observed at time t determines COVID-19 severity; for example, people with higher BP values are at a higher risk (e.g., because of inflammation or vessel wall stress), and people with lower values (whether controlled naturally or by antihypertensive medication) are at a lower risk. This is illustrated in Figure 1A, where the BP measurements as observed at time t are used as the exposure in the analysis. The analysis here is relatively straightforward. In principle, medication use does not need to be added as an extra variable in the model, unless the medication affects the outcome independently of BP (i.e., medication use is a confounder).
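To make the exposure definitions contrasted in Figure 1 more concrete, the sketch below encodes one hypothetical participant per category (A, B, C) and derives the exposure under three of the questions: the BP observed at time t, the last pre-medication measurement used as a proxy for untreated BP, and restriction to medication non-users. It is a minimal illustration under stated assumptions; the class `Participant`, the helper functions, and all BP values are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    label: str
    # (time in years, systolic BP in mmHg), ordered by time; last entry is at time t
    bp_history: list[tuple[float, float]]
    # time at which antihypertensive medication was started, or None if never
    med_start: Optional[float] = None


def observed_bp_at_t(p: Participant) -> float:
    """Exposure if the interest is the BP value currently observed at time t."""
    return p.bp_history[-1][1]


def last_untreated_bp(p: Participant) -> float:
    """Proxy for untreated BP at time t: the last measurement taken before
    medication was started (the value at t for never-treated persons)."""
    if p.med_start is None:
        return p.bp_history[-1][1]
    before = [bp for time, bp in p.bp_history if time < p.med_start]
    if not before:
        raise ValueError(f"{p.label}: no BP measurement before medication start")
    return before[-1]


def untreated_population(participants: list[Participant]) -> list[Participant]:
    """Study population if the interest is medication non-users only."""
    return [p for p in participants if p.med_start is None]


# Hypothetical trajectories for categories A, B, and C (time t = year 10).
participants = [
    Participant("A", [(0, 150), (5, 155), (10, 160)]),
    Participant("B", [(0, 150), (5, 155), (10, 130)], med_start=7),
    Participant("C", [(0, 120), (5, 118), (10, 121)]),
]

for p in participants:
    print(p.label, observed_bp_at_t(p), last_untreated_bp(p))
print([p.label for p in untreated_population(participants)])
```

Only person B's exposure differs between the first two definitions, and the proxy ignores any change in BP between the last pre-medication visit and time t.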
Figure 1.
(A) The interest is the currently observed exposure value. What is the effect of the currently observed BP value on the severity of COVID-19? BP levels observed at time t are used as the exposure.
(B) The interest is the exposure trajectory before time t. What is the effect of the history of BP on the severity of COVID-19? BP trajectories for each individual are estimated from the repeated measurements.
(C) The interest is in untreated values at time t. What is the effect of untreated BP at time t on the severity of COVID-19? The last measurement of BP before using medication is used as a proxy for the untreated BP at time t for person B.
(D) Interest in the effect of an intervention on BP. What would have happened if all participants with high BP had been treated with antihypertensive drugs? Aim to estimate the causal effect of medication use on the relationship between BP and the outcome.
(E) The interest is the untreated population only. What is the effect of BP on the severity of COVID-19 among the people who did not use antihypertensive drugs? Person B is excluded from the study population.

Table 1. Summary of Section 2 (the exposure is affected by medication use) and Section 3 (the outcome is affected by medication use).

| Section | The interest is in | Research question example | When or why |
|---|---|---|---|
| 2.3 | the untreated exposure value | What is the effect of untreated BP at time t on the severity of COVID-19? | Untreated BP values at time t better reflect the medical condition than the observed BP after medication. |
| 2.4 | the effect of an intervention on the exposure | What would have happened if no one had been treated with antihypertensive drugs? | A causal effect of intervening on BP on the relationship between BP and the outcome is of interest. |
| 2.5 | the untreated population only | What is the effect of BP on the severity of COVID-19 among the people who did not use antihypertensive drugs? | The subpopulation of medication non-users is of interest. |
| 3.1 | the observed value of the outcome | What is the difference in observed BP at age 40 between individuals born with and without genetic factor A? | The total effect of gene A on BP that may in part be mediated by using antihypertensive drugs is of interest. |
| 3.2 | the outcome value unaffected by medication use | What is the effect of the genetic factor A on BP at age 40 if no one would have used antihypertensive drugs? | The biological effect of gene A on BP is of interest, and antihypertensive drug use is considered to have altered the effect of interest. |
| 3.3 | medication use as part of the outcome | What is the effect of the genetic factor A on the risk of hypertension at age 40? | The fact that a person started using antihypertensive medication is a part of the outcome. |
| 3.4 | the outcome values while being untreated | What is the difference in BP between individuals born with and without genetic factor A, while being untreated? | Only the measurements before treatment may be of interest. More meaningful in situations where measurement after intercurrent events is undefined, e.g., quality of life compared over time between treatment groups only in those still alive. |
| 3.5 | the untreated population | What is the difference in BP between individuals born with and without genetic factor A, in those untreated at age 40? | It resembles a per-protocol analysis of an RCT. Questionable whether this approach corresponds to any sensible and clinically relevant estimand. |

| The interest is the exposure trajectory before time t
Alternatively, the interest may be in the history of BP: what is the effect of the history of BP on the severity of COVID-19? When BP has been measured repeatedly before time t, BP trajectories can be estimated for each individual from the repeated measurements (Figure 1B).
Still, the "effect of the history of BP" is vaguely defined and needs to be specified. For example, one could be interested in the cumulative BP values during a certain period before t (estimated by the area under the curve), the mean value of BP in a specific period, or the increase in BP over a certain period. In any case, the length of the period of interest before time t should be well defined.
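The three history summaries mentioned above (cumulative BP, mean BP, and increase in BP within a defined period before t) can be computed from repeated measurements, for example as in the following sketch. It assumes measurements are ordered in time, uses the trapezoidal rule for the area under the curve, and divides that area by the observed time span to obtain a time-weighted mean. The function names, the window length, and the values are illustrative assumptions, not a prescribed method.

```python
def auc_trapezoid(times, values):
    """Area under the BP curve by the trapezoidal rule (mmHg x years)."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def summarize_bp_history(times, values, t, window):
    """Summaries of BP measured in the window [t - window, t].

    The AUC covers only the span between the first and last measurement
    inside the window; no extrapolation to the window edges is attempted.
    """
    points = [(ti, vi) for ti, vi in zip(times, values) if t - window <= ti <= t]
    if len(points) < 2:
        raise ValueError("need at least two BP measurements in the window")
    ts = [ti for ti, _ in points]
    vs = [vi for _, vi in points]
    auc = auc_trapezoid(ts, vs)
    return {
        "auc": auc,                                    # cumulative BP
        "time_weighted_mean": auc / (ts[-1] - ts[0]),  # mean BP over the span
        "increase": vs[-1] - vs[0],                    # change over the span
    }


# Hypothetical measurements at years 0, 2, 4, 6 with t = 6 and a 4-year window.
print(summarize_bp_history([0, 2, 4, 6], [130, 140, 150, 150], t=6, window=4))
# {'auc': 590.0, 'time_weighted_mean': 147.5, 'increase': 10}
```

Whichever summary is chosen, any BP lowering by medication is already contained in the measured trajectory, and the window length has to be fixed in advance.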
Notably, medication use is not added as a variable in the model; instead, the effect of medication use is incorporated in the analysis through its effect on subsequent BP levels. Furthermore, in this scenario, confounders should be measured at the time when the follow-up starts.

| The interest is the untreated exposure value
Alternatively, the interest may be in untreated BP values, when these are considered to better reflect the medical condition than the observed BP after medication. A corresponding research question is: what is the effect of untreated BP at time t on the severity of COVID-19? For a person in category B, the last BP measurement before starting medication can be used as a proxy for the untreated BP at time t (Figure 1C).

| SITUATION 2: THE OUTCOME IS AFFECTED BY MEDICATION USE
As an illustration, we consider four hypothetical individuals in Table 2 and Figure 2A. Persons a1 and b1 were both born with gene A, which causes high BP. Individual b1 starts using medication. Persons a0 and b0 are identical to a1 and b1, respectively, except that they were born without the gene and did not develop high BP. Persons a0 and b0 share identical characteristics, and the difference between them in Figure 2 only reflects random inter-individual variability. A summary of the research interests is given in Table 1.

Figure 2.
(A) Interest in the current value of the outcome. What is the difference in observed BP at age 40 between individuals born with and without the genetic factor A? BP levels observed at time t are used as the outcome.
(B) Interest in the outcome value unaffected by medication use. What is the effect of the genetic factor A on BP at age 40 if no one would have used antihypertensive drugs? When repeated measurements of BP are available, the untreated BP level of person b1 at time t could be estimated, for instance, by inverse probability weighting.
(C) Consider medication use as part of the outcome. What is the effect of the genetic factor A on the risk of hypertension at age 40? Based on the observed BP at time t, individuals are categorized into hypertension (HTN; medication users or having high BP) and no hypertension (no HTN; normal BP) cases.
(D) Only interested in the outcome values while being untreated. What is the difference in BP between individuals born with and without genetic factor A, while being untreated? Person b1 is followed only until he started using medication.

| The interest is the observed value of the outcome
Firstly, the BP levels as observed can be the outcome of interest (Figure 2A). For example, we may want to compare observed BP levels at age 40 of individuals with gene A to those of similar individuals born without the gene. In this type of research question, one is interested in the total effect of the exposure on the outcome; that is, an effect that may in part be mediated by using antihypertensive drugs. In counterfactual notation, we are interested in the average total effect of A on the outcome, $E[Y^{A=1}] - E[Y^{A=0}]$, where $Y^{A=1}$ is the potential outcome when setting A to 1 and $Y^{A=0}$ is the potential outcome when setting A to 0. Young et al. referred to this contrast as the "effect without elimination of competing events." In the clinical trial context, this is referred to as the "treatment policy strategy-estimand." 9 The principle of such an analysis corresponds to an intention-to-treat analysis in an RCT, as the data are analyzed using the observed outcomes, ignoring any intercurrent event or protocol deviation. Therefore, under question 3.1, medication use would be ignored in the analysis.
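Under question 3.1, the analysis reduces to comparing observed outcomes between exposure groups while ignoring medication. A minimal sketch with four hypothetical persons (values invented for illustration, not taken from Table 2), assuming the exposure groups are exchangeable so that no confounding adjustment is needed:

```python
from statistics import mean

# Hypothetical BP at age 40 for the four illustrative persons:
# (label, genetic factor A, observed systolic BP in mmHg, on medication)
records = [
    ("a1", 1, 145, False),
    ("b1", 1, 130, True),   # BP lowered by antihypertensive drugs
    ("a0", 0, 120, False),
    ("b0", 0, 122, False),
]


def total_effect(records):
    """Crude difference in mean observed BP between A = 1 and A = 0.

    Medication status is deliberately ignored (treatment policy strategy).
    A crude difference is only valid if the exposure groups are exchangeable;
    otherwise confounding must be handled first.
    """
    exposed = [bp for _, a, bp, _ in records if a == 1]
    unexposed = [bp for _, a, bp, _ in records if a == 0]
    return mean(exposed) - mean(unexposed)


print(total_effect(records))  # 137.5 - 121 = 16.5
```

Because b1's observed BP includes the drug effect, part of the gene's effect that runs through medication use is absorbed into this contrast, which is exactly what the total effect is meant to capture.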

| The interest is the outcome value unaffected by medication use
Alternatively, the interest could be the biological effect of gene A on BP, where antihypertensive drug use may alter this effect. Here we would ask research questions such as: what is the effect of the genetic variant A on BP at age 40 if no one had used antihypertensive drugs? In counterfactual notation, we are interested in the effect $E[Y^{A=1,med=0}] - E[Y^{A=0,med=0}]$, where $Y^{A=1,med=0}$ is the potential outcome of Y when A is set to 1 and no medication would have been used. This is called "the effect under elimination of competing events." 22 In a clinical trial context, it is referred to as the "hypothetical strategy-estimand." 9 Figure 2B depicts this scenario.
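The gap between the total effect of Section 3.1 and this hypothetical estimand can be made concrete with a small simulation. The sketch below assumes a hypothetical data-generating process (gene A raises BP by 10 mmHg; medication is mostly started when an earlier BP exceeds 140 mmHg and then lowers BP by 20 mmHg; prescribing depends only on measured variables) and simplifies follow-up to one earlier visit and one outcome measurement. The inverse probability weights are estimated as stratum frequencies, which is one simple form of inverse probability weighting and not necessarily the implementation used in the cited references.

```python
import random
from statistics import mean


def simulate(n, seed=1):
    """Hypothetical data-generating process; every number is an assumption."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        a = int(rng.random() < 0.5)                   # genetic factor A
        bp1 = 120 + 10 * a + rng.gauss(0, 10)         # BP at an earlier visit
        high1 = bp1 > 140                             # measured driver of prescribing
        med = rng.random() < (0.8 if high1 else 0.02)
        y_untreated = bp1 + 5 + rng.gauss(0, 5)       # BP at age 40 without drugs
        y = y_untreated - 20 if med else y_untreated  # drugs lower BP by 20 mmHg
        rows.append((a, high1, med, y))
    return rows


def crude_difference(rows, untreated_only=False):
    """Mean observed BP difference between A = 1 and A = 0."""
    def group(level):
        return [y for a, _, med, y in rows
                if a == level and not (untreated_only and med)]
    return mean(group(1)) - mean(group(0))


def ipw_hypothetical_effect(rows):
    """IPW estimate of E[Y^{A=1,med=0}] - E[Y^{A=0,med=0}].

    Untreated persons are weighted by 1 / P(untreated | A, high1), with the
    probability estimated as a stratum frequency. Assumes no unmeasured
    drivers of prescribing and P(untreated | A, high1) > 0 in every stratum.
    """
    counts = {}
    for a, high1, med, _ in rows:
        n_all, n_untreated = counts.get((a, high1), (0, 0))
        counts[(a, high1)] = (n_all + 1, n_untreated + (not med))

    def weighted_mean(level):
        num = den = 0.0
        for a, high1, med, y in rows:
            if a == level and not med:
                n_all, n_untreated = counts[(a, high1)]
                weight = n_all / n_untreated
                num += weight * y
                den += weight
        return num / den

    return weighted_mean(1) - weighted_mean(0)


rows = simulate(20_000)
total = crude_difference(rows)                                  # question 3.1
untreated_naive = crude_difference(rows, untreated_only=True)   # drops treated persons
hypothetical = ipw_hypothetical_effect(rows)                    # question 3.2
print(round(total, 1), round(untreated_naive, 1), round(hypothetical, 1))
```

Under these assumptions, the true hypothetical effect is 10 mmHg and the total effect is about 7.9 mmHg in expectation, because exposed persons are treated more often. Simply dropping treated persons does not recover the hypothetical effect either: the exposed persons who remain untreated are selected toward lower BP.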
Suppose repeated measurements of BP are available and all factors influencing medication use are measured. In that case, this estimand can be estimated using repeated measurement methods, such as linear mixed models or generalized estimating equations with inverse probability weighting. 5,21 The BP levels after medication use will not be used in these analyses. If no repeated measurements of BP are available, other methods for handling an outcome variable affected by medication use, such as adding the mean medication effect to the treated measurements or fitting a censored regression model, 4,10-12,23,24 may be used.

| Considering medication use as part of the outcome
Alternatively, the fact that a person started using antihypertensive medication may be considered part of the outcome. A research question could then be: what is the effect of the genetic factor A on the risk of hypertension at age 40? In this case, the outcome is dichotomized into hypertension (high BP and/or use of antihypertensive medication) and no hypertension (normal BP and no medication use). This is illustrated in Figure 2C. In other scenarios, an ordinal scale could be an alternative (e.g., categorizing fasting glucose level into normal glucose, impaired glucose, and diabetes, where diabetes is defined as a glucose level above a certain threshold or use of diabetes medication). In clinical trials, this type of scenario is called the "composite variable strategy-estimand."
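A composite outcome of this kind can be coded directly from observed values and medication status. In the sketch below, the 140 mmHg systolic cut-off and the fasting glucose cut-offs of 5.6 and 7.0 mmol/L are assumptions for illustration (guideline definitions differ), and the four records are hypothetical.

```python
# Cut-offs below are illustrative assumptions; clinical definitions vary.
SBP_HYPERTENSION_MMHG = 140.0
GLUCOSE_IMPAIRED_MMOL_L = 5.6
GLUCOSE_DIABETES_MMOL_L = 7.0


def has_hypertension(sbp, on_antihypertensives):
    """Composite outcome: high BP and/or use of antihypertensive medication."""
    return on_antihypertensives or sbp >= SBP_HYPERTENSION_MMHG


def glucose_category(fasting_glucose, on_diabetes_drugs):
    """Ordinal composite: 'normal' < 'impaired' < 'diabetes'."""
    if on_diabetes_drugs or fasting_glucose >= GLUCOSE_DIABETES_MMOL_L:
        return "diabetes"
    if fasting_glucose >= GLUCOSE_IMPAIRED_MMOL_L:
        return "impaired"
    return "normal"


def hypertension_risk_difference(records):
    """P(hypertension | A = 1) - P(hypertension | A = 0).

    records: (genetic factor A, observed SBP in mmHg, on medication) tuples.
    """
    def risk(level):
        cases = [has_hypertension(sbp, med) for a, sbp, med in records if a == level]
        return sum(cases) / len(cases)
    return risk(1) - risk(0)


# Hypothetical BP at age 40 for persons a1, b1 (treated), a0, b0.
records = [(1, 145, False), (1, 130, True), (0, 120, False), (0, 122, False)]
print(hypertension_risk_difference(records))                      # 1.0
print(glucose_category(6.0, False), glucose_category(5.0, True))  # impaired diabetes
```

Person b1 counts as a hypertension case despite an observed BP below the cut-off, because starting medication is part of the outcome definition.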

| Only interested in the outcome values while being untreated
In Section 3.2, the interest was in the effect of the gene on untreated BP measurements in the total population. In this section and Section 3.5, we consider two strategies that restrict the population based on medication use. Sometimes only the measurements before treatment may be of interest. In that case, one could compare outcomes between the exposure groups at each time point using only the individuals still untreated at that time; in other words, the exposure groups are compared conditionally on being untreated (Figure 2D). This approach may be called the "while untreated strategy," analogous to the "while on treatment-estimand" and "while alive-estimand" discussed in the EMA guideline.
In general, this comparison will not answer a causal research question because of selection bias; the comparison only involves individuals who are still untreated at the time of comparison. Suppose people born with the genetic variant A (exposed group) are more likely to use antihypertensive drugs. As time passes, more people in the exposed group will be excluded from the comparison, and the remaining individuals in the exposed and unexposed groups are no longer comparable. This issue will arise even if the groups were exchangeable at baseline. When persons can go on and off treatment (treatment episodes), defining a "while untreated strategy" becomes even more complicated, because measurements in an untreated period following a period of drug use may in some instances also be considered "while untreated." In this case, the definition of "while untreated" should be carefully considered with the clinical context in mind.
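The "while untreated" comparison can be sketched as follows, using invented BP values for the four illustrative persons (not the values of Table 2) and assuming that medication, once started, is never stopped. At each age, only persons who have not yet started medication contribute; the dictionaries and function name are hypothetical.

```python
from statistics import mean

# Hypothetical systolic BP (mmHg) by age for the four illustrative persons.
measurements = {
    "a1": {30: 132, 35: 138, 40: 145},
    "b1": {30: 140, 35: 155, 40: 130},   # starts medication at age 37
    "a0": {30: 118, 35: 119, 40: 120},
    "b0": {30: 121, 35: 121, 40: 122},
}
med_start = {"a1": None, "b1": 37, "a0": None, "b0": None}
exposure = {"a1": 1, "b1": 1, "a0": 0, "b0": 0}   # genetic factor A


def while_untreated_difference(measurements, med_start, exposure):
    """At each age, mean BP (A = 1) minus mean BP (A = 0), using only
    persons who are still untreated at that age."""
    ages = sorted({age for series in measurements.values() for age in series})
    result = {}
    for age in ages:
        groups = {0: [], 1: []}
        for pid, series in measurements.items():
            start = med_start[pid]
            still_untreated = start is None or age < start
            if age in series and still_untreated:
                groups[exposure[pid]].append(series[age])
        if groups[0] and groups[1]:
            result[age] = mean(groups[1]) - mean(groups[0])
    return result


print(while_untreated_difference(measurements, med_start, exposure))
# {30: 16.5, 35: 26.5, 40: 24}
```

At age 40, b1 (the exposed person with the highest BP before treatment) has left the comparison, so the exposed group is represented only by the person who stayed untreated; with many persons, the same mechanism produces the selection bias described above.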

| Could the interest be only in the untreated population?
Some studies exclude from their analysis all measurements of individuals who started medication during follow-up, including the measurements taken before they started medication use. The difference from Section 3.4 is that here the measurements before medication use are removed as well.
This approach resembles a per-protocol analysis of an RCT, where only the participants who completed the follow-up without protocol deviation are included in the analysis. 27 Defining whether an individual belongs to the population of interest (i.e., people who remain untreated throughout follow-up) based on an event happening after the follow-up started (i.e., medication use) is risky. The longer the follow-up, the more people will start using medication and consequently be excluded from the comparisons, even for the time before they started using medication. Consequently, this approach can lead to substantial selection bias. 28,29 It is questionable whether this approach corresponds to any sensible and clinically relevant estimand. One of the estimands mentioned by the EMA is the "principal stratum-estimand," which is the effect in subpopulations where a particular intercurrent event would or would not occur. In our example, a principal stratum could be individuals who would not use antihypertensive medication if their BP were elevated (e.g., because they have an aversion to medication or are not aware that their BP is too high). We decided not to discuss this in detail, as research questions using potential medication use to define a subpopulation are rarely considered. The corresponding analysis is challenging because whether a person is a medication non-user can be observed only if their BP becomes high during the follow-up.
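For contrast, the following sketch implements the exclusion approach on invented data for the four illustrative persons: anyone who ever starts medication during follow-up is removed entirely before the exposure groups are compared. The data, dictionary layout, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical systolic BP (mmHg) by age; b1 starts medication at age 37.
measurements = {
    "a1": {30: 132, 35: 138, 40: 145},
    "b1": {30: 140, 35: 155, 40: 130},
    "a0": {30: 118, 35: 119, 40: 120},
    "b0": {30: 121, 35: 121, 40: 122},
}
med_start = {"a1": None, "b1": 37, "a0": None, "b0": None}
exposure = {"a1": 1, "b1": 1, "a0": 0, "b0": 0}   # genetic factor A


def exclude_ever_treated(measurements, med_start):
    """Drop all measurements of anyone who starts medication during follow-up,
    including those taken before medication was started."""
    return {pid: series for pid, series in measurements.items()
            if med_start[pid] is None}


def difference_by_age(measurements, exposure):
    """At each age, mean BP (A = 1) minus mean BP (A = 0)."""
    ages = sorted({age for series in measurements.values() for age in series})
    result = {}
    for age in ages:
        exposed = [s[age] for pid, s in measurements.items()
                   if age in s and exposure[pid] == 1]
        unexposed = [s[age] for pid, s in measurements.items()
                     if age in s and exposure[pid] == 0]
        if exposed and unexposed:
            result[age] = mean(exposed) - mean(unexposed)
    return result


kept = exclude_ever_treated(measurements, med_start)
print(sorted(kept))                        # ['a0', 'a1', 'b0']
print(difference_by_age(kept, exposure))   # {30: 12.5, 35: 18, 40: 24}
```

Here b1 is missing even at ages 30 and 35, before he started medication, so at every age the exposed group consists only of the person who never needed treatment; membership in the analysis is determined by an event that happens later in follow-up.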

| DISCUSSION
Situations with medication use can be much more complex as multiple medications can be used simultaneously and/or switching between medications may occur. It is also possible that both the exposure and the outcome measurements are affected by medication use.
In addition to medication use, behavioral changes after baseline (e.g., starting to exercise regularly to regulate high BP) could also affect measurements of interest. Examples are also not limited to BP and BP medication; they could involve other measurements, such as glucose or lipid levels, and other types of drugs. Numerous sources of potential bias beyond those discussed in this paper should be critically considered as well (e.g., how to properly adjust for confounding, or an ill-defined time zero of follow-up leading to immortal time bias). 28,35 The complexity of the situation, however, should not discourage tackling the problem of measurements affected by medication use.
Rather, it requires additional caution when defining research questions and more rigorous planning on how medication should be handled in the analysis. In any given case, we advise researchers to consciously set a research question and corresponding analytic strategy for handling medication use based on the clinical aim and underlying assumptions.

ACKNOWLEDGMENTS
We thank Dr Sonja Swanson (Erasmus University Medical Center) for a valuable discussion and providing comments on the manuscript.

FUNDING INFORMATION
None.