Funding agencies: This study was supported by the Chief Scientist Office of the Scottish Government (Clinical Academic Fellowship CAF/12/05) and the University of Aberdeen.
Relevant conflicts of interest/financial disclosures: Dr. Macleod is funded by a Clinical Academic Fellowship from the Scottish Chief Scientist Office. Dr. Taylor received funding from the University of Aberdeen to carry out the research related to this manuscript. Dr. Counsell reports no financial disclosures relating to the research covered in the manuscript. We declare we have no conflicts of interest.
Full financial disclosures and author roles may be found in the online version of this article.
Despite major advances in the understanding of its pathophysiology and genetics, many aspects of the prognosis of Parkinson's disease (PD) remain unclear. Improved understanding of PD prognosis would allow better information and perhaps tailored treatment to be given to patients and caregivers, improved health-service planning, and improved clinical trial design. Knowledge of predictors of poor prognosis may enable individually tailored predictions and allow targeting of treatments, particularly if this information is incorporated into a prognostic model.
One important aspect of PD prognosis is mortality, which has been reviewed nonsystematically in several papers.[1-5] These showed generally increased but markedly heterogeneous mortality ratios, with no formal analysis of what caused the heterogeneity. A recent meta-analysis of mortality in PD included only 8 studies and excluded retrospective studies and studies reporting unadjusted risk ratios or standardized mortality ratios (SMRs). Several authors have stated that mortality from PD fell after the introduction of levodopa (L-dopa) in the late 1960s.[7-9] One review concluded that mortality had fallen initially but subsequently increased to a level similar to that of the pre–L-dopa era. However, these observations were based on only one pre–L-dopa study as the comparator, and reviews that included another pre–L-dopa study have expressed uncertainty about this.[3, 5]
Given the absence of a comprehensive systematic review and the ongoing uncertainty of the impact of PD on mortality, we aimed to: 1) conduct a systematic review of studies of mortality in PD; 2) perform meta-analysis of mortality outcomes; 3) use meta-regression to explore heterogeneity in mortality estimates; and 4) describe which independent predictors of mortality have been identified in previous studies.
We developed a protocol with predefined inclusion and exclusion criteria (http://www.abdn.ac.uk/iahs/documents/AM_Mortality_systematic_review_protocol.pdf). We sought to include all observational studies of mortality in PD, with follow-up of at least 1 year, reporting quantitative measures of mortality (either comparisons with a control population or survival in a PD cohort) or post-mortem series reporting disease duration at death. We excluded highly selected cohorts (for example, only very-young-onset, demented, or surgically treated patients); studies of parkinsonism in general; studies in which the diagnosis of PD was based on death certification (because it has low sensitivity for detecting PD); and studies only published in abstract form. We included studies of males only and studies of only those aged older than 55 years or older than 65 years (because these would include most patients with PD) but excluded studies of only those aged older than 70 years.
We searched several electronic databases (MEDLINE 1946 to 2012, Embase 1947 to 2012, CINAHL 1988 to 2012, and Web of Science 1970 to 2012, last searched 12 October 2012) and reviewed reference lists of included studies and relevant reviews identified in the search. The electronic searches used keyword and free-text terms for PD combined with terms for mortality or prognosis (see Supplemental Data Appendix 1). Electronic searches were validated against hand searches of Movement Disorders and Neurology from 2006 to 2010. Sensitivity and specificity were calculated using the hand searches as the gold standard.
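As a minimal sketch of this validation step, sensitivity and specificity can be computed from a two-by-two comparison of the electronic search against the hand search. The function name and all counts below are hypothetical and are not taken from the review.

```python
def search_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity of an electronic search,
    treating the hand search as the gold standard.

    tp: relevant articles retrieved by the electronic search
    fn: relevant articles missed by the electronic search
    fp: irrelevant articles retrieved
    tn: irrelevant articles correctly not retrieved
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts for illustration only.
sens, spec = search_accuracy(tp=14, fp=9, fn=1, tn=91)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```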
Assessment of Studies
References were downloaded to bibliographic software and de-duplicated. Titles and abstracts of studies identified from the search strategy were reviewed by 2 authors (A.D.M. and K.S.M.T.), and the full text of potentially relevant articles was obtained. Foreign language articles were translated using a translator or the website translate.google.co.uk. We predefined a scale to assess risk of bias in each study (see Supplemental Data Appendix 2), but we have only reported 4 items (representativeness of the cohort, adequate confirmation of diagnosis, comparability of controls, and excessive losses to follow-up), because the other items (selection of control and reliability of outcome assessment) scored a low risk of bias in all studies.
Data on methodological features, demographic characteristics, and mortality outcomes were extracted from full-text articles to a paper data extraction form by either A.D.M. or K.S.M.T., unblinded to study details (blinding is time-consuming and only minimally influences results). The data extraction from a subset of studies was checked independently by C.E.C. Any differences were resolved by discussion.
Some data processing was necessary. Where possible, standard deviations were calculated from standard errors; relative risks were calculated from counts of deaths; median survival was measured from Kaplan-Meier plots; and the confidence intervals (CIs) of SMRs were calculated using Ury's shortcut method.
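The conversions described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the relative-risk interval uses the usual log-scale approximation, and the SMR interval uses Byar's approximation as a stand-in for Ury's shortcut method, which is not reproduced here.

```python
import math


def sd_from_se(se, n):
    """Standard deviation recovered from the standard error of a mean."""
    return se * math.sqrt(n)


def relative_risk(deaths_pd, n_pd, deaths_ctrl, n_ctrl, z=1.96):
    """Relative risk from counts of deaths, with a log-scale CI."""
    rr = (deaths_pd / n_pd) / (deaths_ctrl / n_ctrl)
    se_log = math.sqrt(1 / deaths_pd - 1 / n_pd + 1 / deaths_ctrl - 1 / n_ctrl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def smr_ci_byar(observed, expected, z=1.96):
    """SMR with an approximate CI (Byar's approximation, NOT Ury's method)."""
    o1 = observed + 1
    lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
    upper = o1 * (1 - 1 / (9 * o1) + z / (3 * math.sqrt(o1))) ** 3
    return observed / expected, lower / expected, upper / expected


# Hypothetical inputs for illustration only.
print(sd_from_se(se=0.5, n=100))
print(relative_risk(deaths_pd=30, n_pd=100, deaths_ctrl=20, n_ctrl=100))
print(smr_ci_byar(observed=100, expected=50))
```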
Meta-analyses on mortality ratios (SMRs and relative risks [RRs]) and time until death data were performed using the DerSimonian and Laird random-effects model. When mortality ratios were reported at multiple times, we used the measurement nearest to 10 years. Meta-analysis of ratios was performed on a logarithmic scale. The standard error of log(SMR) was calculated thus:
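The exact expression for the standard error is not shown in this text. The sketch below therefore assumes one common choice, back-calculating the standard error of the log ratio from a reported 95% CI, and then pools the log ratios with the DerSimonian and Laird estimator. Function names and study values are hypothetical.

```python
import math


def se_log_from_ci(lower, upper, z=1.96):
    """Assumed approach: SE of log(ratio) from a CI symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def dersimonian_laird(log_estimates, std_errors):
    """DerSimonian-Laird random-effects pooling on the log scale.

    Returns (pooled_log, se_pooled, tau2, i2_percent).
    """
    k = len(log_estimates)
    if k < 2:
        raise ValueError("need at least two studies")
    w = [1 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, log_estimates)) / sum(w)
    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_estimates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_estimates)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return pooled, se_pooled, tau2, i2


# Hypothetical studies: (mortality ratio, lower 95% limit, upper 95% limit).
studies = [(1.4, 1.1, 1.8), (1.6, 1.2, 2.1), (2.2, 1.7, 2.8)]
logs = [math.log(r) for r, _, _ in studies]
ses = [se_log_from_ci(lo, hi) for _, lo, hi in studies]
pooled, se, tau2, i2 = dersimonian_laird(logs, ses)
print(math.exp(pooled), math.exp(pooled - 1.96 * se), math.exp(pooled + 1.96 * se), i2)
```

Pooling on the log scale and exponentiating afterwards keeps the confidence interval for the ratio asymmetric and strictly positive.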
Studies were stratified into inception (i.e., all participants recruited at, or soon after, diagnosis) and non-inception cohorts. Heterogeneity was assessed using the I² statistic. Small-study effects were assessed by visual inspection of a funnel plot and Egger's test. We used random-effects meta-regression, fitted with a residual maximum likelihood algorithm, to explore prespecified demographic and study-quality variables (as listed in Supplemental Data Table 3) that might explain heterogeneity in the study estimates of mortality ratio. We also performed meta-regression analysis on mean time-to-death data in post-mortem studies. We first performed univariable meta-regression and then created a multivariable meta-regression model using a backward stepwise method with a P-value of 0.1 as the cutoff for retention in the model, with a Monte Carlo permutation to lower the possibility of spurious results. We assessed the robustness of the final model by re-running the model with each variable removed in turn.
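To illustrate the small-study assessment, the sketch below implements the regression behind Egger's test: the standardized effect (estimate divided by its standard error) is regressed on precision (1 divided by the standard error), and an intercept far from zero suggests funnel-plot asymmetry. The function name and data are hypothetical. A P-value would additionally require a t distribution with n - 2 degrees of freedom, which is outside the Python standard library and is omitted here.

```python
import math


def egger_intercept(effects, std_errors):
    """Ordinary least squares of (effect / se) on (1 / se).

    Returns (intercept, se_intercept, t_statistic).
    """
    n = len(effects)
    if n < 3:
        raise ValueError("need at least three studies")
    x = [1 / se for se in std_errors]
    y = [e / se for e, se in zip(effects, std_errors)]
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual variance with n - 2 degrees of freedom.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_int = math.sqrt(rss / (n - 2) * (1 / n + mx ** 2 / sxx))
    t_stat = intercept / se_int if se_int > 0 else float("nan")
    return intercept, se_int, t_stat


# Hypothetical log mortality ratios and standard errors.
log_ratios = [0.34, 0.47, 0.79, 0.18, 0.55]
ses = [0.12, 0.14, 0.25, 0.08, 0.20]
print(egger_intercept(log_ratios, ses))
```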
We plotted survival probability by follow-up duration and performed inverse-variance–weighted least squares regression (variance calculated as p[1 – p]/n, where p = proportion surviving and n = number in the cohort) of survival probability on time, using the latest measured survival in each study. We also plotted survival probabilities within those studies that reported survival at multiple follow-up times. Studies restricted to older patients (cutoffs older than 55 years) were not included in analyses of survival probability, because older age groups have shorter survival. We used descriptive statistics for median survival data and assessed the effect of different baselines for measurement (diagnosis, onset, or recruitment) with the Wilcoxon rank-sum test. Prognostic factors reported in individual studies were tabulated, including all factors examined for association with mortality (where reported) and which of these were independently associated with mortality. All statistical analyses were performed using Stata version 12.1 (StataCorp LP, College Station, TX, USA).
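The weighted regression of survival probability on follow-up time can be sketched as below, using the variance p(1 - p)/n given above so that each study's weight is n / (p(1 - p)). The function name and data are hypothetical, and the closed-form weighted least squares fit is only one way to carry out this analysis.

```python
def weighted_survival_fit(years, surviving, cohort_sizes):
    """Inverse-variance-weighted least squares of survival proportion on time.

    Each study contributes weight n / (p * (1 - p)); p must lie strictly
    between 0 and 1. Returns (intercept, slope).
    """
    weights = [n / (p * (1 - p)) for p, n in zip(surviving, cohort_sizes)]
    sw = sum(weights)
    xbar = sum(w * x for w, x in zip(weights, years)) / sw
    ybar = sum(w * y for w, y in zip(weights, surviving)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(weights, years))
    sxy = sum(w * (x - xbar) * (y - ybar)
              for w, x, y in zip(weights, years, surviving))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Hypothetical studies: follow-up (years), proportion surviving, cohort size.
years = [3, 5, 8, 10, 12]
surviving = [0.88, 0.74, 0.61, 0.49, 0.42]
sizes = [120, 300, 85, 450, 200]
intercept, slope = weighted_survival_fit(years, surviving, sizes)
print(f"intercept={intercept:.3f}, slope per year={slope:.3f}")
```

A slope of about -0.05 would correspond to survival falling by roughly 5 percentage points per year of follow-up.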
Figure 1 shows the results of the searches. One further study was identified from the validation hand-searches. The overall sensitivity and specificity of the electronic search strategy were 93% and 91%, respectively. Eighty-eight studies were included in the review.
The methods of included studies are displayed in Supplemental Data Table 1. Twenty studies were inception cohorts, 56 were non-inception cohorts, and 12 were retrospective series of patients who had all died (10 of which were autopsy series and 2 only contained clinical data). Forty-seven studies were hospital based, 24 were community based, 3 included separate hospital- and community-based cohorts, 12 were trial based, and 2 did not specify. Four studies were carried out in the pre–L-dopa period, and 8 straddled the introduction of L-dopa. Sixty-eight studies only included idiopathic PD, 5 also included a small proportion of cases with post-encephalitic parkinsonism, 14 included other degenerative forms of parkinsonism (older studies of PD before the importance of some atypical features was recognized and studies of parkinsonism based on coding for PD or pharmacy data without validation of cases), and 1 did not state which diagnoses were included but included only patients in early trials of L-dopa. Many studies had exclusion criteria relating to atypical features, but 12 had more extensive exclusion criteria, predominantly relating to co-morbid diseases. Six studies only included older adults (older than 55 years or older than 65 years), and 2 only included men. Only 5 studies met all 4 quality criteria. Sixteen studies met the representativeness criterion, 47 had adequate confirmation of diagnosis, 49 used comparable controls, and 54 had acceptable losses to follow-up. Some studies with overlapping groups of participants were included in the review (see Supplemental Data Table 1), although no overlapping studies were included in the same analysis. The results of the included studies are displayed in Supplemental Data Table 2.
Forty-two studies were included in the meta-analysis of mortality ratios (Fig. 2). The references for studies included in this and every other analysis are given in Supplemental Data Appendix 3. The overall mortality ratio in the inception cohorts (9 studies; 1,801 patients; median follow-up duration, 9 years) was 1.52 (95% CI, 1.25-1.78), but heterogeneity was high (I², 73.5%). Most of the heterogeneity was attributable to a single study (Hoehn, 1967), and when this study was removed, the overall mortality ratio was 1.41 (95% CI, 1.28-1.55) with low heterogeneity (I², 3.8%). The non-inception cohorts (33 studies; 27,480 patients; median follow-up duration, 7 years) had mortality ratios ranging from 0.90 to 3.79, with major heterogeneity (I², 95.1%) and no studies clearly driving the heterogeneity. Calculation of a pooled measure of effect was therefore inappropriate. Repeating the meta-analysis without the studies of only men or older people did not significantly reduce the heterogeneity. A funnel plot (Supplemental Data Figure 1) did not show any evidence of publication bias or other small-study effect (Egger's test, P = 0.87).
The results of univariable meta-regression analyses on mortality ratios are shown in Supplemental Data Table 3. Only time from recruitment to outcome measurement was statistically significant, with mortality ratios decreasing with longer follow-up. In the final multivariable meta-regression model, inception cohorts, measurements at longer follow-up duration, earlier study recruitment year, and post–L-dopa studies were associated with lower mortality ratios (Supplemental Data Table 4). The proportion of between-study variance explained by the model was 33.7%, but the residual heterogeneity was still very high (91.9%). None of the variables in the multivariable model was robust to the sensitivity analyses (Supplemental Data Table 5), and the L-dopa era variable in particular was not robust, because the only 2 pre–L-dopa studies had heterogeneous mortality ratios.
Nine studies reported mortality ratios at multiple times (Supplemental Data Fig. 2). Ratios increased with longer follow-up in most of these studies. Seventeen studies reported mortality ratios by sex (approximately 3,340 men and 2,190 women). Meta-analysis of mortality ratios stratified by sex (Supplemental Data Fig. 3) demonstrated major heterogeneity in study estimates of mortality ratios in both men and women (81.9% and 83.5%, respectively), so pooled estimates were not presented. Univariable meta-regression of sex on these data showed no significant differences between males and females (P = 0.17).
Forty-five studies (27,458 patients) reported survival proportion at specific times (Fig. 3A). Where survival was reported at multiple points in a study, only the latest follow-up is shown. The regression coefficient was –0.045 (95% CI, –0.056 to –0.033). Major heterogeneity was found between the studies, but, on average, survival was reduced by approximately 5% per year of follow-up. Nineteen studies reported survival at multiple follow-up points (Fig. 3B). Despite marked heterogeneity, the rates of decline in survival over time seem to be approximately constant after the first few years. Eighteen studies reported median survival, 5 inception and 13 non-inception cohorts (Supplemental Data Table 6), which ranged from 6 to 22 years. Significant differences were found between studies measuring survival from different baselines (from diagnosis, onset, or recruitment), but there were few studies in each group.
Disease Duration at Death
Meta-analysis of 10 studies (1,306 patients) reporting the mean disease duration at death (Fig. 4) showed major heterogeneity (I², 94.1%). Removing the 2 non-autopsy studies only led to a minor reduction in heterogeneity (I², 92.1%). Univariable random-effects meta-regression (Supplemental Data Table 7) showed only one significant variable, age at diagnosis, but two significant variables in the final multivariable model: studies with earlier recruitment period (i.e., from longer ago) and increasing mean age at diagnosis were associated with shorter disease duration (Supplemental Data Table 8).
Twenty-one studies reported independent predictors of mortality, and these results can be found in full in Supplemental Data Table 2. Table 1 presents frequencies of studies that found the factors independently predictive. Several studies did not report negative findings, so quantitative analysis of these data was not possible.
Table 1. Prognostic factors independently associated with mortality in two or more studies
Prognostic factor | Studies Reporting Factor Independently Associated With Increased Mortality (N)/Studies Which Examined the Association (N)
PIGD phenotype, prominent bradykinesia or lack of tremor | 2, 7, 16, 23, 32, 65, 78, 97
Higher parkinsonian impairment score | 16, 75, 82, 85, 92, 97
Presence of psychosis or hallucinations | 16, 23, 75, 78
Presence of extensor plantar response |
Mortality ratios ranged widely, but almost all of the studies showed increased mortality in PD. Most mortality ratios lay between 1.2 and 2.4, but, because of the major heterogeneity in the included studies, calculation of an overall measure was inappropriate. Inception cohorts gave more reliable information and had a pooled mortality ratio of approximately 1.5. Little of the heterogeneity was explained by the meta-regression analyses, probably because of the use of study-level variables rather than individual-patient data. Mortality ratios appear to increase with increased duration of follow-up within studies. Major heterogeneity also was found in the measures of survival studied, but survival decreased on average by 5% per year, and the rate of decline is approximately constant between studies. The duration from disease onset to death in a series of deceased patients ranged from 7 to 14 years, again with major heterogeneity, some of which was explained by year of study and age at diagnosis. Older age at onset and the presence of dementia were most consistently found to be independent predictors of mortality.
The results of the meta-regression analyses to explore the heterogeneity in mortality estimates must be regarded cautiously given that none of the variables was robust to sensitivity analyses; the number of studies in the time-to-death analysis was small; and the residual heterogeneity, after adjusting for the variables in the meta-regression model, remained high. Additionally, the L-dopa–era variable was inherently non-robust because the only 2 pre–L-dopa studies had quite different results. We therefore do not think there is good evidence that L-dopa has led to a reduction in mortality in PD. Our data suggest that mortality ratios in PD have increased over time, but why is unclear. Perhaps life expectancy has risen more quickly in the general population than in PD, rather than any increase occurring in PD mortality.
Several methodological attributes of individual studies may introduce bias to reported mortality measures. Major sources of bias include the use of non-inception cohorts, which introduces selection biases, including a survival bias (those with longer disease duration will be more likely to be recruited)[21, 22]; recruitment of patients from hospital or specialist clinics only (older people and those with lower socioeconomic status are less likely to be referred); differences in diagnostic accuracy (“possible” PD may have a higher mortality rate than “definite” PD); exclusion of patients with, for example, co-morbidities (leading to underestimation of mortality ratios); or the use of noncomparable controls (the use of healthier controls will overestimate mortality ratios, for example). Measuring mortality ratios or survival from disease onset rather than diagnosis overestimates the time at risk as no deaths in patients with PD before diagnosis will be detected (an immortal time bias), thus underestimating mortality in PD,[25-28] and patients' recall of onset is often unreliable. Other potential sources of bias include retrospective data collection and high losses to follow-up. Autopsy studies are particularly prone to selection biases.
Specific factors relating to study design may confound the association between PD and mortality. Increased life expectancy over time in the general population may influence survival rates, but the effect of this on mortality ratios is unclear. Treatment also may alter prognosis over time, but we did not find any robust evidence that L-dopa has reduced mortality, and we are not aware of data to suggest that any other treatments alter survival. We have demonstrated some empirical evidence that mortality ratios within studies increase over longer durations of follow-up but some weaker evidence of a trend in the opposite direction between studies.
This systematic review has several strengths. We used a protocol with pre-specified plans for analysis, performed comprehensive searches with broad inclusion criteria, did not use language restrictions, and assessed individual studies' methodologies and results thoroughly.
The main limitations of this review relate to the limitations of the primary studies. As we have discussed, several methodological limitations were present in many of the studies that are associated with risk of bias. A more detailed quality assessment scale might have provided a more refined assessment of the risk of bias, but given the generally poor quality on the crude scale we used, a more sophisticated scale would be unlikely to yield additional insights. Some studies reporting mortality as a secondary outcome may have been missed in the database searches. Additionally, variations in the type of data reported and poor reporting of demographic, clinical, or methodological data in some studies introduced reporting bias into this review. A major limitation to our efforts to explore heterogeneity is the use of study-level variables, because many factors that influence variability within studies may not be apparent at the study level (the “ecological fallacy”).[29, 30] Additionally, publication bias is likely in the reporting of predictive factors of mortality, because several studies only report positive findings. Most of the studies in this review were from Europe or North America, so generalizability to other geographical areas may be limited.
Another potential limitation is the combination of different ratio measures in the meta-analyses. Although this does introduce some heterogeneity, it is preferable to excluding studies and thereby reducing statistical power. Some authors have argued that SMRs are inherently noncomparable, because large differences in the age structure of populations may introduce biases, but this is also true in comparisons of studies with recruited controls, and significant variations in age structures rarely occur in practice. In any case, random-effects meta-analysis assumes inherent non-comparability in the studies analyzed.
In conclusion, PD is associated with increased mortality, approximately 1.5 times the control mortality in inception cohorts, and a decrease in survival of approximately 5% per year of follow-up. However, poor study quality and heterogeneity in study methods and patients studied have hampered synthesis of mortality data in this review. Further high-quality studies of mortality need to be performed. We recommend that these studies should, as a minimum: 1) be inception cohorts; 2) be community based; 3) have expert confirmation of diagnosis using validated diagnostic criteria; 4) have no exclusion criteria (other than those relating to accuracy of diagnosis); 5) have prospective follow-up; 6) measure long-term outcomes; and 7) use diagnosis as a baseline for measurements. Several such studies are underway.[13, 35-38] To maximize their potential value, individual-patient-data meta-analysis of such studies could be performed. Additionally, the data on prognostic factors from this review could be used to guide the choice of predictors in the development of prognostic models.
We thank Kathleen Perkins for help with translation.
1. Research Project: A. Conception, B. Organization, C. Execution; 2. Statistical Analysis: A. Design, B. Execution, C. Review and Critique; 3. Manuscript Preparation: A. Writing the First Draft, B. Review and Critique. A.D.M.: 1B, 1C, 2A, 2B, 2C, 3A; K.S.M.T.: 1B, 1C, 3B; C.E.C.: 1A, 1C, 2C, 3B.
The authors made no disclosures. A.D.M.: funded by a Clinical Academic Fellowship from the Scottish Chief Scientist Office and receives research funding from Parkinson's UK. K.S.M.T.: none. C.E.C.: has received research funding from Parkinson's UK, the Scottish Chief Scientist Office, the National Institute for Health Research and the Engineering and Physical Sciences Research Council.