Simultaneous assessment of short-term gastrointestinal benefits and cardiovascular risks of selective cyclooxygenase 2 inhibitors and nonselective nonsteroidal antiinflammatory drugs: An instrumental variable analysis

Authors

  • Sebastian Schneeweiss,

    Corresponding author
    1. Harvard Medical School and Harvard School of Public Health, Boston, Massachusetts
    • Associate Professor of Medicine and Epidemiology, Harvard Medical School, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital, 1620 Tremont Street, Suite 3030, Boston, MA 02120
    • Drs. Solomon and Schneeweiss have in the past received research funding from Merck and Pfizer to study the safety of selective cyclooxygenase 2 inhibitors; however, neither company funded this study.

  • Daniel H. Solomon,

    1. Harvard Medical School, Boston, Massachusetts

  • Philip S. Wang,

    1. Harvard Medical School, Boston, Massachusetts
  • Jeremy Rassen,

    1. Harvard Medical School, Boston, Massachusetts
  • M. Alan Brookhart

    1. Harvard Medical School, Boston, Massachusetts

Abstract

Objective

To simultaneously assess the short-term reduction in risk of gastrointestinal (GI) complications and increase in risk of acute myocardial infarction (MI) by celecoxib compared with rofecoxib and several nonselective nonsteroidal antiinflammatory drugs (NSAIDs) using instrumental variable analysis.

Methods

A population of 49,711 Medicare beneficiaries ages 65 years and older who initiated nonselective NSAID or selective cyclooxygenase 2 inhibitor therapy between January 1, 1999, and December 31, 2002, was identified. The increase in risk of GI complications and MI within 180 days after initiation of NSAID (rofecoxib, diclofenac, ibuprofen, and naproxen compared with celecoxib) therapy was assessed using instrumental variable analysis.

Results

Compared with nonselective NSAIDs, celecoxib reduced the risk of GI complications by 1.4 per 100 users but increased the risk of MI by 0.3 per 100 users. Rofecoxib decreased GI complications by 1.1 per 100 users and increased the risk of MI by 0.3 per 100 users. Using celecoxib as the reference exposure showed an increase in the MI risk for rofecoxib (risk difference [RD] 1.40, 95% confidence interval [95% CI] −0.20, 3.01) and diclofenac (RD 6.07, 95% CI −0.02, 12.15). The RD for naproxen as well as its upper 95% CI was the lowest of all NSAIDs (RD −0.30, 95% CI −2.74, 2.14) and there was no significant difference in GI complication rates among all NSAIDs.

Conclusion

In this instrumental variable analysis, diclofenac and rofecoxib had the least favorable benefit–risk balance among NSAIDs in older adults.

The withdrawal of rofecoxib as a consequence of accumulating evidence of its cardiac risks (1–5) has subjected all nonsteroidal antiinflammatory drugs (NSAIDs), including the other marketed selective cyclooxygenase 2 (COX-2) inhibitor, to intensified scrutiny. A Food and Drug Administration advisory committee concluded that the safety profile of COX-2 selective and nonselective NSAIDs needed to be reevaluated with respect to cardiac as well as gastrointestinal (GI) events (6–8).

Although results of randomized controlled trials (RCTs) are available on the relative GI safety of selective COX-2 inhibitors (1, 9) and their risk of acute myocardial infarction (MI) (4, 10–15), most of these RCTs compared a selective COX-2 inhibitor against placebo or a single active comparator (diclofenac, ibuprofen, or naproxen), making it difficult to compare all relevant agents one against the other for both GI and cardiac safety outcomes (16). The RCTs also excluded elderly patients with multiple comorbidities, despite the fact that in routine care, most NSAIDs are prescribed to older adults who have increased baseline risks for GI complications and cardiac events and that physicians must choose between several available NSAIDs (17). This has led to a situation in which physicians treating elderly patients are both hesitant to start patients on selective COX-2 inhibitors and have no comprehensive evidence base from which to determine which nonselective NSAID has the most favorable balance between GI and cardiac risks. Thus, there is an urgent need for direct comparisons regarding the GI toxicity as well as cardiac safety of all NSAIDs in older adults (6). Existing RCTs cannot answer these questions, and a recently initiated RCT comparing celecoxib versus ibuprofen and naproxen (18) will not provide answers soon.

Large health care utilization database studies can examine the safety profiles of several NSAIDs in elderly patients without long delays and without putting patients at risk as a large-scale RCT would (19). However, such epidemiologic studies have been criticized for their incomplete control of important risk factors for cardiac events and GI toxicity, such as the use of aspirin, body mass index, physical activity, smoking, alcohol consumption, and relevant laboratory results (20). Since these risk factors influence the physician's choice of an NSAID and affect patient outcomes, failure to adjust for these risk factors will lead to confounding bias (21). Standard statistical methods can be used to adjust for measured selection factors but fail to adjust for unobserved factors.

To overcome this inability to control for residual confounding by unobserved factors, we applied an analytic approach, well known in economics as instrumental variable estimation (22), that can provide unbiased estimates of causal effects in nonrandomized studies (23) by mimicking random assignment of patients into groups of different likelihood for treatment (24). Although such analyses are rarely used in medicine because of the lack of suitable instruments that fulfill a set of standard assumptions, some instrumental variable estimation studies have influenced the practice of medicine (25, 26).

We have proposed and evaluated an instrument in a related study based on an estimate of the prescribing physician's preference for each of the different therapeutic alternatives under study (27). In that study of the short-term GI side effects of NSAIDs, our instrument was found to yield results that were in close agreement with RCTs, which conventional regression analyses did not (27). The justification for use of this instrument is based on the observation that the prescribing of a specific NSAID in older adults is driven more strongly by physician preference than by recorded patient characteristics (28, 29). Under several assumptions detailed below, instrumental variable estimation provides unbiased estimates of the potential association between specific NSAIDs and health outcomes (23).

Using instrumental variable estimation, we sought to simultaneously assess the reduction in risk of GI complications and the increase in risk of acute MI within 180 days after being started on rofecoxib, ibuprofen, diclofenac, and naproxen against a common comparison group of new celecoxib users.

PATIENTS AND METHODS

Study design and patient population

We conducted a cohort study of patients initiating NSAID therapy among Medicare beneficiaries 65 years of age and older who were enrolled in the Pharmaceutical Assistance Contract for the Elderly (PACE) program provided by the state of Pennsylvania. To be eligible for PACE, annual income must be <$13,000 if single and <$16,200 if married but not low enough to qualify for Medicaid. Enrollees in PACE were eligible for inclusion in the study population if they filled a prescription for any NSAID between January 1, 1999, and December 31, 2002, and demonstrated continuous health care system use (filling at least 1 prescription and utilizing health care services during each of 3 6-month periods before the index date defined below).

Initiation of NSAID use was defined as when an eligible beneficiary filled at least 1 prescription for an NSAID between January 1, 1999, and December 31, 2002, but did not use any NSAID during the 18 months prior to the index date. The index date was the first date an NSAID prescription was filled. Such “new user designs” ensure that all patient covariates are measured before the start of treatment and reduce the potential for under-ascertainment of adverse effects associated with the initiation of a drug (30, 31). Followup was for 180 days after the initiation of therapy; secondary analyses limited followup to 60 and 120 days.

Outpatient and inpatient diagnoses, procedure codes, and dates of all inpatient and outpatient services were obtained from Medicare claims data. All personal identifiers were transformed into anonymous study numbers to protect the privacy of patients and physicians. The study was approved by the Institutional Review Board of the Brigham and Women's Hospital, and data use agreements were obtained from the Centers for Medicare and Medicaid Services.

Study outcomes

Severe GI complications were defined as either a hospitalization for GI hemorrhage and peptic ulcer disease complications, including perforations coded as International Classification of Diseases, Ninth Revision (ICD-9) discharge diagnoses 531.x, 532.x, 533.x, 534.x, 535.x, or 578.x in the first or second position (to exclude in-hospital complications), or a physician service code for GI hemorrhage. These definitions were validated in 1,762 patients in a hospital discharge database in Saskatchewan, Canada, with a composite positive predictive value (PPV) of 90% (32). Acute MI was defined as a hospitalization with ICD-9 code 410 in the first or second position and a stay of at least 4 days and a maximum stay of 180 days (PPV 94%) (33). The first occurrence of either outcome was included in the analysis. Outpatient cardiac deaths could not be captured reliably and were not included as outcomes.

Exposure to NSAIDs

Prescription drug information was assessed based on pharmacy claims from PACE, with detailed and highly accurate information (34, 35) on drug name, dosage, quantity, and date of dispensing. All NSAIDs commercially available in the US during the study period were fully covered by PACE, with a fixed copayment of $6. Ibuprofen and naproxen were available over the counter but were also fully covered. New NSAID exposure at study start was grouped into 1) rofecoxib, 2) diclofenac, 3) naproxen, 4) ibuprofen, 5) other NSAIDs (etodolac, fenoprofen, flurbiprofen, indomethacin, ketoprofen, ketorolac tromethamine, meclofenamate sodium, nabumetone, oxaprozin, piroxicam, sulindac, and tolmetin sodium), and 6) the common reference group of celecoxib. Celecoxib was chosen as the reference group because it was the largest single NSAID user group. There were too few new users of valdecoxib to perform meaningful analyses. The exposure risk window for an NSAID dispensing was assumed to be 180 days. Because some new users of NSAIDs have short-term indications for NSAID therapy, we conducted secondary analyses for 60 days and 120 days. Patients discontinuing treatment were not censored from the analysis because of the potential for overestimation or underestimation by possible copredictors for discontinuation and outcome, while the current analysis could lead to some underestimation analogous to that seen in intent-to-treat (ITT) analyses of RCTs.

Instrumental variable analysis.

We estimated the effect of COX-2 selective and nonselective NSAID exposure on GI complications and acute MIs using instrumental variable estimation to reduce confounding by unmeasured factors (36). The instrumental variable that we used was based on an estimate of the physician's preference for prescribing each of the NSAIDs under study. This instrument has been described in more detail in a related study by our group (27). First, we classified each patient by the NSAID he or she was prescribed (actual treatment), and then examined the most recent prior prescription written by the same physician to any other patient in the cohort. This most recent NSAID prescription to a patient initiating NSAID therapy was used as an indication of the physician's current prescribing preference (expected treatment or instrument) (27). Accordingly, if the last new NSAID prescription written by a physician was for celecoxib, then for the current patient, the physician was classified as a “celecoxib prescriber.” If 2 or more prescriptions were filled the same day, 1 was randomly picked to determine prescribing preference. This instrument was then used as a surrogate for the actual treatment, and is largely independent of patient characteristics.
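The construction of the preference instrument described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `assign_preference_instrument`, the tuple layout, and the toy records are assumptions introduced here, and same-day ties are simply left in input order rather than broken at random as in the study.

```python
def assign_preference_instrument(prescriptions):
    """Pair each new NSAID prescription with the prescriber's previous one.

    `prescriptions` is an iterable of (physician_id, iso_date, patient_id, drug)
    tuples, one per patient initiating NSAID therapy (hypothetical layout).
    The first prescription seen for a physician only establishes the
    preference and is therefore not returned.
    """
    last_drug = {}  # physician_id -> drug of the most recent prior prescription
    rows = []
    # sorted() is stable, so same-day prescriptions keep their input order
    # (the study picked one at random; that step is omitted in this sketch).
    for physician, date, patient, drug in sorted(
        prescriptions, key=lambda r: (r[0], r[1])
    ):
        if physician in last_drug:
            rows.append(
                {
                    "patient": patient,
                    "physician": physician,
                    "actual": drug,                      # actual treatment
                    "instrument": last_drug[physician],  # expected treatment
                }
            )
        last_drug[physician] = drug
    return rows


records = [
    ("dr_a", "1999-01-05", "p1", "celecoxib"),
    ("dr_a", "1999-02-10", "p2", "naproxen"),
    ("dr_a", "1999-03-01", "p3", "naproxen"),
    ("dr_b", "1999-01-20", "p4", "rofecoxib"),
]
for row in assign_preference_instrument(records):
    print(row["patient"], row["actual"], row["instrument"])
```

In this toy input, patients p1 and p4 are each the first patient of their physician, so they define a preference but receive no instrument value themselves, which mirrors the reduction in sample size for the instrumental variable analysis.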

An instrumental variable is a factor that is related to treatment but is unrelated to observed and unobserved patient risk factors. It must also be unrelated to outcome, other than through its relationship to treatment. These are 2 fundamental assumptions for valid instrumental variable estimation (23, 37). The more strongly an instrumental variable is related to the actual treatment, the more precise the estimate will be. Using preference as an instrument is based on the premise that, for most users, the type of NSAID prescribed is unrelated to patient risk factors after adjustment for known patient attributes. Preference will therefore be related to actual treatment, but not directly related to outcome. Using the last prescription written as a measure of preference takes into account that NSAID prescribing patterns may have varied over the 4-year study period, and is therefore potentially more valid than using information on all prescriptions written by a physician within a recent time period (27).

The key assumptions for valid instrumental variable estimation are similar to those of RCTs. In RCTs, the random assignment into exposed and unexposed groups does not have an effect on outcome directly, but rather, through the assigned group's close correlation with actual treatment. The RCT framework is analogous to that of our instrumental variable analysis, in which randomized assignment is replaced by the patient's quasi-random choice of a physician, independent of the physician's prescribing preference. In RCTs, randomization balances measured and unmeasured patient risk factors across treatment arms at the time that randomization occurs. In a similar vein, we examined the balance of measured patient attributes between instrumental variable categories at the time of treatment initiation; if the balance of measured confounders could be demonstrated, one would think that such balance carried over to unmeasured confounders as well. Finally, similar to an ITT analysis in an RCT, our analysis assumed that patients remained on continuous therapy for the duration of the study.

We used 2-stage ordinary least squares (OLS) regression for the instrumental variable estimation, and risk differences (RDs) are reported per 100 patients. In the first stage, we modeled actual treatment as a function of the instrument to estimate expected treatments for each patient. In a second stage, we modeled study outcomes as a function of the expected treatment, as opposed to the actual treatment. Both stages were adjusted for measured patient characteristics (see below) (38). Linear regression to estimate RD is valid in large samples such as the one in the present study. Because patient-level observations were clustered by physicians, standard errors of the regression parameters were computed robustly to account for the within-physician correlation of outcomes (Stata software, version 9; Stata, College Station, TX).
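The two stages described above can be illustrated with a small simulation. The sketch below is a deliberately simplified assumption, not the study's model: it uses a single binary treatment, a single binary instrument, no measured covariates, and simulated data with an unmeasured confounder `u`. It returns point estimates only, with no cluster-robust standard errors. All names and probabilities are hypothetical.

```python
import random


def simple_ols(x, y):
    """Intercept and slope of the least-squares line y ~ x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def two_stage_least_squares(y, d, z):
    """Risk difference for treatment d on outcome y, using instrument z."""
    # Stage 1: model actual treatment as a function of the instrument.
    a1, b1 = simple_ols(z, d)
    d_hat = [a1 + b1 * zi for zi in z]  # expected treatment
    # Stage 2: model the outcome as a function of the expected treatment.
    _, rd = simple_ols(d_hat, y)
    return rd


rng = random.Random(0)
true_rd = 0.02  # simulated causal risk difference (2 per 100), an assumption
z, d, y = [], [], []
for _ in range(200_000):
    u = rng.random() < 0.5                          # unmeasured confounder
    zi = rng.random() < 0.5                         # instrument
    di = rng.random() < 0.1 + 0.5 * zi + 0.3 * u    # treatment depends on z and u
    yi = rng.random() < 0.05 + true_rd * di + 0.10 * u
    z.append(int(zi))
    d.append(int(di))
    y.append(int(yi))

_, ols_rd = simple_ols(d, y)             # conventional estimate, confounded by u
iv_rd = two_stage_least_squares(y, d, z)
print(f"OLS RD per 100: {100 * ols_rd:.2f}")
print(f"IV  RD per 100: {100 * iv_rd:.2f}")
```

Because `u` raises both the chance of treatment and the chance of the outcome, the conventional estimate is pulled away from the simulated effect, whereas the two-stage estimate should land closer to it at the cost of a wider sampling error.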

Conventional multivariate analysis.

The instrumental variable analysis was compared with a conventional multivariate analysis using the identical study design. We modeled GI complications and acute MIs in 2 separate OLS linear regressions as a function of the previously specified NSAID groups, using celecoxib as the common reference group. Analyses were adjusted for risk factors of NSAID-associated gastrotoxicity and acute MI (39–41): age (≥75 years), history of peptic ulcer disease, prior GI hemorrhage, past use of gastroprotective drugs, warfarin sodium use, oral glucocorticoid use, and history of coronary artery disease, hypertension, or congestive heart failure in the 18 months before the index date, as well as concurrent use of gastroprotective drugs. Other markers of comorbidity included nursing home residence, history of hospitalization in an acute care facility, number of ambulatory physician visits, Charlson comorbidity score (42), and acute comorbid disease activity measured as the number of different medications each patient received in the year prior to the index date (43). RDs and their 95% confidence intervals (95% CIs) are reported. We repeated the conventional analysis using propensity scores, controlling for all patient characteristics. Propensity scores are known to balance all measured covariates (44) and, in our study, consisted of the predicted probability of receiving each of the 5 study NSAIDs in contrast to celecoxib. These 5 pairwise propensity scores were then entered as deciles in 5 separate linear regression models for each of the 2 study outcomes to estimate adjusted RDs.

We used the Sargan test (45) to evaluate whether OLS regression (our conventional analysis) provided a consistent estimator of individual model parameters compared with the instrumental variable analysis, the null hypothesis being that the OLS analysis is consistent. This test is based on the assumption that the instrumental variable estimate is consistent. If this test indicated that the instrumental variable analysis did not improve validity over the OLS (P > 0.05 by 2-sided test), we interpreted the OLS parameter estimates because of their smaller variance.

RESULTS

During the study period, 50,360 new users of NSAIDs were identified; 649 patients were excluded from the analysis since their prescriptions had missing physician ID numbers. The final sample size for the conventional analysis was 49,711. The instrumental variable analysis was limited to 37,650 patients since the first prescription by each physician was used to assess his or her NSAID preference and did not contribute to the second stage of the instrumental variable estimation.

Celecoxib and rofecoxib were the 2 most frequently prescribed NSAIDs in this population (Table 1). There was substantial imbalance in most clinical covariates between actual exposure groups, including prior use of gastroprotective drugs as a marker for preexisting GI irritability, which was present in 28% of celecoxib users and 18% of naproxen users (Table 1). Due to the large sample size, all P values were less than 0.001. When patient characteristics were compared using instrumental variable status (the expected NSAID exposure), this imbalance was substantially reduced, thereby suggesting that unmeasured confounders are also more balanced, which would in turn lead to less biased effect estimates (Table 2).

Table 1. Characteristics of 49,711 new NSAID users or selective COX-2 inhibitor users among subscribers to the Pharmaceutical Assistance Contract for the Elderly in Pennsylvania in 1999–2003*

| Characteristic | Celecoxib (n = 19,842) | Rofecoxib (n = 12,232) | Diclofenac (n = 1,817) | Ibuprofen (n = 5,353) | Naproxen (n = 4,139) | Other NSAIDs (n = 6,328) | Maximum difference in percentage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Age ≥75 years | 14,850 (75) | 9,245 (76) | 1,205 (66) | 3,456 (65) | 2,601 (63) | 4,251 (67) | 13 |
| Female | 17,046 (86) | 10,506 (86) | 1,514 (83) | 4,308 (80) | 3,420 (83) | 5,063 (80) | 6 |
| Charlson comorbidity score ≥1 | 15,169 (76) | 9,189 (75) | 1,289 (71) | 3,834 (72) | 2,814 (68) | 4,591 (73) | 8 |
| Use of >4 distinct drugs in prior year | 15,006 (76) | 9,122 (75) | 1,222 (67) | 3,671 (69) | 2,694 (65) | 4,266 (67) | 11 |
| >4 physician visits in prior year | 14,327 (72) | 8,594 (70) | 1,267 (70) | 3,269 (61) | 2,542 (61) | 4,286 (68) | 11 |
| Hospitalized in prior year | 6,041 (30) | 3,767 (31) | 421 (23) | 1,544 (29) | 964 (23) | 1,669 (26) | 8 |
| Nursing home resident | 1,636 (8) | 1,037 (8) | 75 (4) | 397 (7) | 180 (4) | 345 (5) | 4 |
| Prior use of gastroprotective drugs | 5,636 (28) | 3,154 (26) | 388 (21) | 1,116 (21) | 753 (18) | 1,344 (21) | 10 |
| Prior use of warfarin sodium | 2,704 (14) | 1,552 (13) | 106 (6) | 321 (6) | 233 (6) | 493 (8) | 8 |
| Prior use of oral steroids | 1,710 (9) | 1,092 (9) | 165 (9) | 446 (8) | 268 (6) | 495 (8) | 3 |
| History of osteoarthritis | 9,969 (50) | 5,585 (46) | 776 (43) | 1,649 (31) | 1,276 (31) | 2,198 (35) | 19 |
| History of rheumatoid arthritis | 1,130 (6) | 473 (4) | 73 (4) | 121 (2) | 101 (2) | 181 (3) | 4 |
| History of peptic ulcer disease | 771 (4) | 418 (3) | 55 (3) | 116 (2) | 96 (2) | 159 (3) | 2 |
| History of gastrointestinal hemorrhage | 358 (2) | 193 (2) | 16 (1) | 58 (1) | 39 (1) | 83 (1) | 1 |
| History of hypertension | 14,453 (73) | 8,887 (73) | 1,280 (70) | 3,731 (70) | 2,887 (70) | 4,470 (71) | 3 |
| History of congestive heart failure | 6,114 (31) | 3,617 (30) | 429 (24) | 1,332 (25) | 842 (20) | 1,727 (27) | 11 |
| History of coronary artery disease | 3,303 (17) | 1,966 (16) | 263 (14) | 797 (15) | 554 (13) | 990 (16) | 3 |

* Values are the number (%) of patients. NSAID = nonsteroidal antiinflammatory drug; COX-2 = cyclooxygenase 2.

Table 2. Characteristics of 37,650 new NSAID users by their instrumental variable status*

| Characteristic | Celecoxib (n = 16,111) | Rofecoxib (n = 9,096) | Diclofenac (n = 1,451) | Ibuprofen (n = 3,416) | Naproxen (n = 2,853) | Other NSAIDs (n = 4,723) | Maximum difference in percentage |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Age ≥75 years | 11,653 (72) | 6,650 (73) | 1,048 (72) | 2,429 (71) | 2,054 (72) | 3,357 (71) | 2 |
| Female | 13,597 (84) | 7,685 (84) | 1,212 (84) | 2,838 (83) | 2,387 (84) | 4,032 (85) | 2 |
| Charlson comorbidity score ≥1 | 12,127 (75) | 6,772 (74) | 1,037 (71) | 2,503 (73) | 2,030 (71) | 3,464 (73) | 4 |
| Use of >4 distinct drugs in prior year | 11,809 (73) | 6,596 (73) | 1,035 (71) | 2,403 (70) | 2,014 (71) | 3,328 (70) | 3 |
| >4 physician visits in prior year | 11,292 (70) | 6,176 (68) | 1,020 (70) | 2,230 (65) | 1,875 (66) | 3,233 (68) | 5 |
| Hospitalized in prior year | 4,725 (29) | 2,515 (28) | 419 (29) | 995 (29) | 744 (26) | 1,255 (27) | 3 |
| Nursing home resident | 1,282 (8) | 674 (7) | 98 (7) | 259 (8) | 192 (7) | 312 (7) | 1 |
| Prior use of gastroprotective drugs | 4,164 (26) | 2,032 (22) | 392 (27) | 811 (24) | 659 (23) | 1,150 (24) | 5 |
| Prior use of warfarin sodium | 1,926 (12) | 1,050 (12) | 156 (11) | 328 (10) | 280 (10) | 481 (10) | 2 |
| Prior use of oral steroids | 1,377 (9) | 722 (8) | 113 (8) | 274 (8) | 215 (8) | 396 (8) | 1 |
| History of osteoarthritis | 7,481 (46) | 3,918 (43) | 641 (44) | 1,355 (40) | 1,128 (40) | 2,038 (43) | 6 |
| History of rheumatoid arthritis | 794 (5) | 337 (4) | 61 (4) | 118 (3) | 89 (3) | 175 (4) | 2 |
| History of peptic ulcer disease | 543 (3) | 275 (3) | 48 (3) | 102 (3) | 82 (3) | 149 (3) | 0 |
| History of gastrointestinal hemorrhage | 240 (1) | 129 (1) | 22 (2) | 46 (1) | 36 (1) | 70 (1) | 1 |
| History of hypertension | 11,652 (72) | 6,578 (72) | 1,027 (71) | 2,435 (71) | 2,020 (71) | 3,376 (71) | 1 |
| History of congestive heart failure | 4,707 (29) | 2,575 (28) | 391 (27) | 997 (29) | 765 (27) | 1,279 (27) | 2 |
| History of coronary artery disease | 2,603 (16) | 1,352 (15) | 220 (15) | 533 (16) | 408 (14) | 699 (15) | 2 |

* Values are the number (%) of patients. The number of nonsteroidal antiinflammatory drug (NSAID) users in the instrumental variable analysis is reduced by 12,061 patients because the first dispensing of the target drugs by each physician was used to determine the instrumental variable status of the subsequent patient (see Patients and Methods for details). Since this reduction in sample size is unrelated to patient or physician characteristics, this will not bias subsequent comparisons between conventional and instrumental variable analyses.

To evaluate the strength of the instrument, we calculated the proportion of NSAID types at the index date that was correctly predicted by the instrumental variable based on the previous NSAID prescription by the same physician (Table 3). The proportion of patients being treated with an NSAID that was the same as the last prescription by the same physician varied between 15% (diclofenac) and 54% (celecoxib). These proportions were substantially higher than those of the alternative treatment choices, indicating a strong instrument (25).

Table 3. Distribution of the type of index NSAID, depending on the last NSAID prescribed by the same physician*

| Last NSAID | Index NSAID: celecoxib | Index NSAID: rofecoxib | Index NSAID: diclofenac | Index NSAID: ibuprofen | Index NSAID: naproxen | Index NSAID: other NSAIDs | Total |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Celecoxib | 54.2 | 22.6 | 2.6 | 6.5 | 5.2 | 9.0 | 100 |
| Rofecoxib | 30.7 | 47.2 | 2.1 | 6.7 | 5.8 | 7.6 | 100 |
| Diclofenac | 37.6 | 20.5 | 15.0 | 8.5 | 6.1 | 12.3 | 100 |
| Ibuprofen | 30.7 | 21.3 | 2.8 | 24.5 | 8.1 | 12.6 | 100 |
| Naproxen | 29.9 | 21.8 | 3.2 | 8.8 | 24.2 | 12.1 | 100 |
| Other NSAIDs | 35.3 | 21.2 | 4.1 | 9.4 | 8.7 | 21.2 | 100 |

* Values are the percentage of instances in which the last nonsteroidal antiinflammatory drug (NSAID) prescribed by the physician to the index patient was the same as the NSAID prescribed by that physician to the previous patient. For each NSAID, the proportion of patients for whom the last NSAID (i.e., the instrument) agreed with the actual treatment is shown in boldface.
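A cross-tabulation of this kind can be derived from pairs of (last NSAID, index NSAID). The sketch below shows one way to compute the row percentages; the function name `row_percentages` and the toy pairs are assumptions for illustration and do not reproduce the study data.

```python
from collections import Counter, defaultdict


def row_percentages(pairs):
    """Percentage distribution of the index drug within each last-drug group.

    `pairs` is an iterable of (last_drug, index_drug) tuples (hypothetical layout).
    """
    counts = defaultdict(Counter)
    for last_drug, index_drug in pairs:
        counts[last_drug][index_drug] += 1
    table = {}
    for last_drug, row in counts.items():
        total = sum(row.values())
        table[last_drug] = {drug: 100.0 * k / total for drug, k in row.items()}
    return table


pairs = [
    ("celecoxib", "celecoxib"),
    ("celecoxib", "celecoxib"),
    ("celecoxib", "rofecoxib"),
    ("celecoxib", "naproxen"),
    ("naproxen", "naproxen"),
    ("naproxen", "celecoxib"),
]
table = row_percentages(pairs)
# Agreement between instrument and actual treatment sits where both names match.
for drug, row in table.items():
    print(drug, row.get(drug, 0.0))
```

The cell where the last drug and the index drug coincide is the agreement between instrument and actual treatment, which is the quantity the text uses to judge instrument strength.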

Within 180 days, a total of 746 patients had a GI complication and 698 patients had an acute MI. The unadjusted 180-day risk of GI complications per 100 ranged from 1.27 (ibuprofen) to 1.73 (rofecoxib) and, due to confounding, made the coxibs appear more gastrotoxic than most nonselective NSAIDs (Table 4).

Table 4. Risks and unadjusted RDs for GI complications and acute MI after 180 days, stratified by NSAID group and calculated for the actual treatment groups*

| NSAID group | GI complications: events | GI complications: exposed | GI complications: risk | GI complications: RD (95% CI) | Acute MI: events | Acute MI: exposed | Acute MI: risk | Acute MI: RD (95% CI) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Celecoxib | 291 | 19,842 | 1.47 | | 313 | 19,842 | 1.58 | |
| Rofecoxib | 212 | 12,232 | 1.73 | 0.27 (−0.02, 0.55) | 191 | 12,232 | 1.56 | −0.02 (−0.30, 0.26) |
| Diclofenac | 29 | 1,817 | 1.60 | 0.13 (−0.47, 0.73) | 28 | 1,817 | 1.54 | −0.04 (−0.63, 0.56) |
| Ibuprofen | 68 | 5,353 | 1.27 | −0.20 (−0.54, 0.15) | 64 | 5,353 | 1.20 | −0.38 (−0.72, −0.04) |
| Naproxen | 60 | 4,139 | 1.45 | −0.02 (−0.42, 0.38) | 42 | 4,139 | 1.01 | −0.56 (−0.91, −0.21) |
| Others | 86 | 6,328 | 1.36 | −0.11 (−0.44, 0.22) | 60 | 6,328 | 0.95 | −0.63 (−0.92, −0.33) |

* GI = gastrointestinal; MI = myocardial infarction; NSAID = nonsteroidal antiinflammatory drug; 95% CI = 95% confidence interval.
† Risk difference (RD) is per 100 patients.
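The unadjusted risk differences in Table 4 follow directly from the event counts. As a worked check, the sketch below computes the risk difference per 100 patients for rofecoxib versus celecoxib (GI complications) with a Wald-type 95% interval. The choice of the Wald interval is an assumption made here, since the text does not state how the unadjusted intervals were obtained, and the function name is hypothetical.

```python
import math


def risk_difference_per_100(events_exp, n_exp, events_ref, n_ref, z=1.96):
    """Risk difference (exposed minus reference) per 100 patients with a Wald CI."""
    p_exp = events_exp / n_exp
    p_ref = events_ref / n_ref
    rd = p_exp - p_ref
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_exp * (1 - p_exp) / n_exp + p_ref * (1 - p_ref) / n_ref)
    return 100 * rd, 100 * (rd - z * se), 100 * (rd + z * se)


# Counts taken from Table 4: rofecoxib 212/12,232 versus celecoxib 291/19,842.
rd, lower, upper = risk_difference_per_100(212, 12232, 291, 19842)
print(f"RD per 100: {rd:.2f} ({lower:.2f}, {upper:.2f})")
```

For this pair of counts the result agrees, after rounding, with the tabulated value of 0.27 (−0.02, 0.55), which suggests that a simple large-sample interval is at least consistent with the reported figures.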

While the conventional analysis failed to show a meaningful decrease in short-term gastrotoxicity by the coxibs, the instrumental variable estimation showed a significant short-term risk reduction by celecoxib and rofecoxib (Table 5). Additional adjustment of the conventional models for deciles of propensity scores did not change results by more than ±5%. Sargan tests indicated that ordinary least squares regression did not provide consistent estimates compared with instrumental variable estimation (P < 0.05).

Table 5. RDs for GI complications and acute MI during the first 60, 120, and 180 days after the start of selective COX-2 inhibitor therapy compared with all nonselective NSAIDs combined*

| Drug and followup | Conventional multivariate adjusted analysis (OLS): GI complication, RD (95% CI) | Conventional multivariate adjusted analysis (OLS): acute MI, RD (95% CI) | Instrumental variable adjusted analysis: GI complication, RD (95% CI) | Instrumental variable adjusted analysis: acute MI, RD (95% CI) |
| --- | --- | --- | --- | --- |
| Celecoxib, 60 days | −0.13 (−0.30, 0.03) | 0.15 (0.00, 0.29) | −1.07 (−2.07, −0.07) | −0.10 (−0.94, 0.73) |
| Celecoxib, 120 days | −0.18 (−0.40, 0.04) | 0.33 (0.14, 0.52) | −1.63 (−2.91, −0.35) | −0.22 (−1.32, 0.88) |
| Celecoxib, 180 days | −0.18 (−0.43, 0.07) | 0.34 (0.10, 0.57) | −1.42 (−2.89, 0.04)§ | −0.68 (−2.01, 0.64) |
| Rofecoxib, 60 days | 0.10 (−0.11, 0.30) | 0.15 (−0.01, 0.31) | −1.12 (−2.15, −0.10) | −0.27 (−1.17, 0.62) |
| Rofecoxib, 120 days | 0.07 (−0.197, 0.33) | 0.32 (0.09, 0.54) | −1.12 (−2.52, 0.28)§ | 0.40 (−0.86, 1.66) |
| Rofecoxib, 180 days | 0.11 (−0.19, 0.41) | 0.30 (0.03, 0.58) | −1.13 (−2.71, 0.45) | 0.71 (−0.80, 2.23) |

* COX-2 = cyclooxygenase 2; OLS = ordinary least squares (see Table 4 for other definitions).
† Adjusted for age, sex, hypertension, congestive heart failure, coronary artery disease, osteoarthritis, rheumatoid arthritis, peptic ulcer disease, hemorrhage, race, gastroprotective drug use, warfarin sodium use, steroid use, Charlson index, physician visits, hospitalizations, and nursing home residence.
‡ P < 0.05 versus conventional multivariate adjusted analysis by Sargan test, i.e., there was a significant difference between results obtained by conventional multivariate analysis and results obtained by instrumental variable analysis, suggesting that one should prefer the instrumental variable analysis.
§ 0.05 < P < 0.1 versus conventional multivariate adjusted analysis by Sargan test, i.e., there was a significant difference between results obtained by conventional multivariate analysis and results obtained by instrumental variable analysis, suggesting that one should prefer the instrumental variable analysis.

Significantly increased risks of MI were found in rofecoxib users (RD 0.30 per 100) as well as celecoxib users (RD 0.34 per 100) compared with nonselective NSAIDs in the conventional multivariate analysis. Here the Sargan tests did not indicate any difference compared with the instrumental variable analysis and we therefore interpret the statistically more efficient conventional OLS estimate.

Based on the instrumental variable analysis in Table 5, celecoxib reduced the risk of GI complications by 1.4 per 100 users at 180 days compared with nonselective NSAIDs. However, by OLS estimate, it increased the risk for MI by 0.3 per 100 users. Rofecoxib decreased GI complications by 1.1 per 100 users based on the instrumental variable analysis, but increased the risk of MI by 0.3 per 100 (by OLS estimate).

A final instrumental variable model (Table 6) using celecoxib as the reference exposure showed an increase in the risk of MI for rofecoxib (RD 1.40, 95% CI −0.20, 3.01) and diclofenac (RD 6.07, 95% CI −0.02, 12.15). While this instrumental variable analysis was not powered to detect small effects of naproxen on MI, the RD as well as the upper 95% CI for naproxen were the lowest of all NSAIDs (RD −0.30, 95% CI −2.74, 2.14).

Table 6. Results obtained in an instrumental variable model comparing 2 selective COX-2 inhibitors and 3 nonselective NSAIDs for differences in risk of GI complications and acute MI during the first 180 days after the start of therapy*

| Drug | Instrumental variable adjusted analysis: GI complications, RD per 100 (95% CI) | Instrumental variable adjusted analysis: acute MI, RD per 100 (95% CI) |
| --- | --- | --- |
| Celecoxib | 0.00 (reference) | 0.00 (reference) |
| Rofecoxib | 0.30 (−1.28, 1.89) | 1.40 (−0.20, 3.01) |
| Diclofenac | 5.09 (−1.18, 11.36) | 6.07 (−0.02, 12.15)§ |
| Ibuprofen | 0.88 (−1.93, 3.68) | −0.01 (−2.49, 2.46) |
| Naproxen | 0.74 (−2.04, 3.52) | −0.30 (−2.74, 2.14) |

* COX-2 = cyclooxygenase 2 (see Table 4 for other definitions).
† Instrumental variable analysis adjusted for age, sex, hypertension, congestive heart failure, coronary heart disease, osteoarthritis, rheumatoid arthritis, peptic ulcer disease, hemorrhage, race, past and concurrent gastroprotective drug use, warfarin sodium use, steroid use, Charlson index, physician visits, hospitalizations, and nursing home residence.
‡ 0.05 < P < 0.1 versus conventional multivariate analysis by Sargan test, i.e., there was a significant difference between results obtained by conventional multivariate analysis and results obtained by instrumental variable analysis, suggesting that one should prefer the instrumental variable analysis.
§ P < 0.05 versus conventional multivariate analysis by Sargan test, i.e., there was a significant difference between results obtained by conventional multivariate analysis and results obtained by instrumental variable analysis, suggesting that one should prefer the instrumental variable analysis.

DISCUSSION

We have used instrumental variable estimation methods in a large claims database study to simultaneously assess the short-term benefits and risks of NSAIDs with respect to GI complications and acute MI in elderly patients. We did this in an effort to reduce the potential for bias by unobserved confounders, including unmeasured behavioral risk factors and over-the-counter aspirin/NSAID use. Such head-to-head comparisons of risks and benefits of several NSAIDs can help physicians make more informed choices about NSAID prescribing in elderly patients who are currently unstudied in randomized trials but are the largest group of NSAID users.

The instrumental variable analysis revealed that celecoxib and rofecoxib both produce a significant short-term reduction in GI complications compared with all nonselective NSAIDs combined. When compared with celecoxib, diclofenac and rofecoxib appear to have the least favorable safety profile among individual NSAIDs, with an increased risk for acute MI and no GI benefits within 6 months after initiation, while in these elderly patients, naproxen has a benefit–risk balance similar to that of celecoxib.

The results from this simultaneous head-to-head comparison of potential benefits and risks suggest that while celecoxib and rofecoxib have some GI benefits in elderly patients, there are significant cardiovascular risks involved with rofecoxib and diclofenac and lesser risks for celecoxib and naproxen. Based on these data, the benefit–risk balance is therefore least favorable for rofecoxib and diclofenac in elderly patients. If these estimates were multiplied by estimates of the quality of life after GI complications and after acute MI, it is likely that the benefit–risk balance would tilt even further.

We used instrumental variable estimation in an effort to reduce confounding. We observed evidence that the conventional regression analysis of the GI complication outcome has residual confounding, as shown in the effect estimates and in the Sargan tests. Although a direct comparison of our results with those of RCTs is not appropriate because of substantial differences in study populations (age and the restrictive RCT inclusion criteria), it is still reassuring that the findings of our instrumental variable analysis seem consistent with those of RCTs. For example, the reduction in GI risk of celecoxib compared with ibuprofen and diclofenac was −0.96 per 100 in the Celecoxib Long-Term Arthritis Safety Study (9). These results could not be reproduced using conventional multivariate regression techniques or propensity score analyses because both methods are unable to adjust for unobserved patient risk factors (46, 47). The instrumental variable analysis did not change the interpretation of the association of NSAID use with MI. This is not surprising, because MIs were unexpected outcomes during the study period and, in contrast to the risk of GI complications, cardiovascular risk factors did not influence physicians' prescribing decisions, resulting in analyses less biased by confounding by indication.
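The contrast between conventional and instrumental variable estimates can be illustrated with a small simulation. The sketch below is not the model used in this study: it assumes a single binary instrument (a hypothetical indicator of physician preference), one unobserved binary risk factor, and the simple Wald ratio estimator rather than a covariate-adjusted regression. All probabilities and the true risk difference are invented for illustration.

```python
import random

TRUE_RD = -0.02  # assumed true risk difference (treated minus untreated)


def simulate(n=200_000, seed=0):
    """Return (instrument, treatment, outcome) triples.

    The confounder u drives both treatment and outcome but is never
    recorded, mimicking an unmeasured patient risk factor.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)  # hypothetical physician preference
        u = int(rng.random() < 0.3)  # unobserved risk factor
        x = int(rng.random() < 0.2 + 0.4 * z + 0.3 * u)  # treatment received
        y = int(rng.random() < 0.05 + TRUE_RD * x + 0.10 * u)  # outcome event
        rows.append((z, x, y))
    return rows


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def naive_rd(rows):
    """Crude risk difference comparing treated with untreated patients."""
    treated = mean(y for _, x, y in rows if x == 1)
    untreated = mean(y for _, x, y in rows if x == 0)
    return treated - untreated


def wald_iv_rd(rows):
    """Wald estimator: effect of instrument on outcome / effect on treatment."""
    reduced_form = (mean(y for z, _, y in rows if z == 1)
                    - mean(y for z, _, y in rows if z == 0))
    first_stage = (mean(x for z, x, _ in rows if z == 1)
                   - mean(x for z, x, _ in rows if z == 0))
    return reduced_form / first_stage


rows = simulate()
print("true RD :", TRUE_RD)
print("naive RD:", round(naive_rd(rows), 4))
print("IV RD   :", round(wald_iv_rd(rows), 4))
```

Under these assumptions the crude comparison absorbs the effect of the unrecorded risk factor, whereas the instrument-based ratio is not distorted by it, at the cost of a larger standard error.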

As with other statistical approaches, the validity of instrumental variable estimation relies on assumptions. First, the instrument must be related to the actual exposure, which we could demonstrate in our study.
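One way to examine this first assumption is to regress the treatment actually received on the instrument and inspect the strength of that first-stage relationship. The following is a minimal sketch with made-up counts and a single binary instrument without covariates; it is not the first-stage model fitted in this study. A commonly cited rule of thumb treats a first-stage F statistic above roughly 10 as evidence against a weak instrument.

```python
def first_stage(instrument, treatment):
    """Return (slope, F statistic) from a simple regression of treatment on instrument."""
    n = len(instrument)
    mean_z = sum(instrument) / n
    mean_x = sum(treatment) / n
    cov = sum((z - mean_z) * (x - mean_x)
              for z, x in zip(instrument, treatment)) / n
    var_z = sum((z - mean_z) ** 2 for z in instrument) / n
    var_x = sum((x - mean_x) ** 2 for x in treatment) / n
    slope = cov / var_z                 # difference in treatment probability
    r_squared = cov ** 2 / (var_z * var_x)
    f_stat = r_squared * (n - 2) / (1 - r_squared)  # one-regressor F test
    return slope, f_stat


# Hypothetical data: 100 patients per instrument level,
# 30% treated when the instrument is 0 and 70% treated when it is 1.
instrument = [0] * 100 + [1] * 100
treatment = [1] * 30 + [0] * 70 + [1] * 70 + [0] * 30

slope, f_stat = first_stage(instrument, treatment)
print("first-stage difference:", round(slope, 3))
print("first-stage F:", round(f_stat, 1))
```

With a binary instrument and binary treatment, the slope is simply the difference in the proportion treated between the two instrument levels.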

Second, an instrument must not be correlated with patient risk factors conditional on measured and adjusted covariates. We demonstrated that large imbalances of risk factors among the actual treatment groups (Table 1) were substantially reduced in the instrumental variable analysis (Table 2), suggesting to us that unmeasured risk factors were likely to be equally well balanced. However, some of what appears to be physician preference for a specific NSAID could actually be a clustering of patients with higher risk of study outcomes within specific practices. For example, physicians who were seeing patients of higher cardiovascular risk may have more frequently prescribed rofecoxib. Such a group of physicians could be rheumatologists, since rheumatoid arthritis is related to an increased risk of MI and rheumatologists may also have been more likely to prescribe rofecoxib. Such clustering of unmeasured risk factors in physicians would bias our instrumental variable estimates. We performed several subgroup analyses by restricting the study population to specific physician and patient characteristics to test the sensitivity of the instrumental variable estimation. While confidence limits of the instrumental variable estimates became wider due to the smaller sample sizes, point estimates did not vary meaningfully, suggesting that such clustering bias is unlikely in this study (27).
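The balance argument can be made concrete by comparing the prevalence of a measured risk factor across actual treatment groups and across levels of the instrument. The sketch below uses invented counts for a hypothetical binary risk factor; it does not reproduce Table 1 or Table 2, and balance on measured factors can only suggest, not prove, balance on unmeasured ones.

```python
# Hypothetical cell counts keyed by (treatment, instrument, risk_factor).
CELLS = {
    (1, 1, 1): 18, (1, 1, 0): 52,
    (1, 0, 1): 12, (1, 0, 0): 18,
    (0, 1, 1): 3,  (0, 1, 0): 27,
    (0, 0, 1): 7,  (0, 0, 0): 63,
}


def prevalence_gap(cells, group_index):
    """Risk-factor prevalence in group 1 minus group 0.

    group_index selects the grouping variable within each key:
    0 groups by actual treatment, 1 groups by the instrument.
    """
    totals = {0: 0, 1: 0}
    with_factor = {0: 0, 1: 0}
    for key, count in cells.items():
        group = key[group_index]
        totals[group] += count
        with_factor[group] += count * key[2]  # key[2] is the risk factor flag
    return with_factor[1] / totals[1] - with_factor[0] / totals[0]


print("imbalance by treatment :", round(prevalence_gap(CELLS, 0), 3))
print("imbalance by instrument:", round(prevalence_gap(CELLS, 1), 3))
```

In this made-up example the risk factor is far more unevenly distributed across treatment groups than across instrument levels, which is the pattern the balance comparison is meant to detect.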

Third, the instrument must not be associated with the study outcome other than through the actual exposure. While it can generally be assumed that a physician's preference for a specific NSAID cannot directly influence the next patient's outcome other than through the actual treatment, physicians can influence an outcome in ways other than through the choice of the study drugs. For example, physicians who are high prescribers of celecoxib may also be more likely to screen for and treat heart disease more aggressively.

Finally, instrumental variable methods can yield biased estimates of treatment effect in uncommon situations in which strong unmeasured effect modifiers are used by physicians to make prescribing decisions (36). This limitation applies equally to conventional analyses, but, as we showed, the instrumental variable analysis achieved a better balance of measured patient risk factors than the conventional analysis.

Our study, which focused on initiators of NSAIDs, used an analysis comparable with ITT analysis in RCTs. However, in routine care, many patients are known not to adhere to NSAID therapy over a prolonged period of time. This is comparable with the nonadherence and treatment crossover that is to be expected in an uncontrolled routine care setting and can lead to attenuated effect estimates. In a secondary analysis, we excluded NSAID initiators whose initial prescription had ≤15 days of supply, suggesting one-time use. Such an analysis did not meaningfully alter the instrumental variable and conventional estimates. We did not censor patients who did not refill NSAID prescriptions, since discontinuation is likely informative (due to treatment intolerance or a lack of effectiveness) and may therefore introduce bias.

Despite the large study size, our instrumental variable approach yielded estimates with large standard errors. Statistical inefficiency is a limitation of instrumental variable methods, particularly those based on weak instruments. In our analysis, the wide confidence intervals limited the conclusions that we could draw. The precision of instrumental variable estimates can be improved by further increasing the study size. Medicare Part D data will contain drug exposure information on most Medicare beneficiaries, providing a vast, population-level dataset. Such large datasets of elderly patients will also result in more patients per physician and may therefore allow us to construct a more precise estimate of preference using multiple, recently written prescriptions. This will lead to a stronger instrument and more precise estimates. The current study size also did not allow for stratifying patients into levels of NSAID dosage, so the interpretation of results is limited to typical initial dosing in older adults.
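The dependence of precision on study size and instrument strength can be sketched with a rough approximation. The function below assumes a binary instrument with equally sized groups and a Wald-type ratio estimator, and it ignores sampling error in the first stage; the function name and all input values are hypothetical and are not taken from this study.

```python
import math


def approx_iv_se(outcome_risk, n_per_arm, first_stage_diff):
    """Approximate standard error of a Wald-type IV risk difference.

    The standard error of the difference in outcome risk between the two
    instrument levels is divided by the first-stage difference in treatment
    probability, so a weaker instrument inflates the result.
    """
    reduced_form_se = math.sqrt(outcome_risk * (1 - outcome_risk) * 2 / n_per_arm)
    return reduced_form_se / first_stage_diff


base = approx_iv_se(outcome_risk=0.07, n_per_arm=25_000, first_stage_diff=0.2)
larger_study = approx_iv_se(outcome_risk=0.07, n_per_arm=100_000, first_stage_diff=0.2)
stronger_instrument = approx_iv_se(outcome_risk=0.07, n_per_arm=25_000, first_stage_diff=0.4)

print("baseline SE           :", base)
print("4x study size         :", larger_study)
print("2x instrument strength:", stronger_instrument)
```

Under this approximation, quadrupling the study size and doubling the strength of the instrument each halve the standard error, which is why a more precise preference measure can matter as much as additional patients.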

Although most readers of medical journals are now used to ratio effect measures such as relative risks, RDs are more directly interpretable as the risk attributable to a specific exposure. Relative risks tend to become smaller in populations with high background risk independent of the exposure effect (48) and are less useful unless such baseline risks are reported. We used linear regression analysis to estimate RDs. Approximating a binomial error distribution with a normal distribution is justifiable given the large number of study subjects.
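Two points in this paragraph can be checked with simple arithmetic: the same absolute excess risk yields a smaller relative risk when the background risk is higher, and a linear regression of a binary outcome on a binary exposure returns the RD as its slope. The sketch below uses invented risks and counts and a single exposure without covariates, so it is a simplification of the regression models used here.

```python
def risk_difference_and_ratio(baseline_risk, excess_risk):
    """Return (RD, RR) for an exposure that adds excess_risk to baseline_risk."""
    exposed_risk = baseline_risk + excess_risk
    return exposed_risk - baseline_risk, exposed_risk / baseline_risk


def ols_slope(exposure, outcome):
    """Slope from ordinary least squares of outcome on a single exposure."""
    n = len(exposure)
    mean_x = sum(exposure) / n
    mean_y = sum(outcome) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(exposure, outcome))
    var = sum((x - mean_x) ** 2 for x in exposure)
    return cov / var


# Same excess risk of 1 per 100 at a low and a high background risk.
print(risk_difference_and_ratio(0.01, 0.01))  # RR is 2.0
print(risk_difference_and_ratio(0.10, 0.01))  # RR is close to 1.1

# Hypothetical cohort: 8 events among 100 exposed, 5 among 100 unexposed.
exposure = [1] * 100 + [0] * 100
outcome = [1] * 8 + [0] * 92 + [1] * 5 + [0] * 95
print("OLS slope:", ols_slope(exposure, outcome))  # equals the RD of 0.03 up to rounding
```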

Our instrumental variable analysis found that both rofecoxib and celecoxib have clinically significant gastroprotective effects and that rofecoxib and diclofenac appear to increase the short-term risk of MI relative to celecoxib. These results suggest that diclofenac and rofecoxib have the least favorable benefit–risk balance among NSAIDs in older adults, while naproxen, ibuprofen, and celecoxib are more favorable.
