Use of the Shizuoka Hip Fracture Prognostic Score (SHiPS) to Predict Long‐Term Mortality in Patients With Hip Fracture in Japan: A Cohort Study Using the Shizuoka Kokuho Database

ABSTRACT Hip fractures are common in patients of advanced age and are associated with excess mortality. Rapid and accurate prediction of the prognosis using information that can be easily obtained before surgery would be advantageous to clinical management. We performed a population‐based retrospective cohort study using an 8.5‐year Japanese claims database (April 2012–September 2020) to develop and validate a predictive model for long‐term mortality after hip fracture. The study included 43,529 patients (34,499 [79.3%] women) aged ≥65 years with first‐onset hip fracture. During the observation period, 43% of the patients died. Cox regression analysis identified the following prognostic predictors: sex, age, fracture site, nursing care certification, and several comorbidities (any malignancy, renal disease, congestive heart failure, chronic pulmonary disease, liver disease, metastatic solid tumor, and deficiency anemia). We then developed a scoring system called the Shizuoka Hip Fracture Prognostic Score (SHiPS); this system was established by scoring based on each hazard ratio and classifying the degree of mortality risk into four categories based on decision tree analysis. The area under the receiver operating characteristic (ROC) curve (AUC) (95% confidence interval [CI]) of 1‐year, 3‐year, and 5‐year mortality based on the SHiPS was 0.718 (95% CI, 0.706–0.729), 0.736 (95% CI, 0.728–0.745), and 0.758 (95% CI, 0.747–0.769), respectively, indicating good predictive performance of the SHiPS for as long as 5 years after fracture onset. Even when the SHiPS was individually applied to patients with or without surgery after fracture, the prediction performance by the AUC was >0.7. These results indicate that the SHiPS can predict long‐term mortality using preoperative information regardless of whether surgery is performed after hip fracture.

Exploration of prognostic factors after hip fracture in Japan, a leading Asian country in terms of population aging, may contribute to appropriate medical management and reduced post-onset mortality. In this study, we used a Japanese insurance claims database covering 8.5 years from 2012 to 2020 to identify preoperative prognostic factors and to develop a scoring system, the Shizuoka Hip Fracture Prognostic Score (SHiPS), to predict long-term (5-year) mortality after hip fracture.

Data sources
The Shizuoka Kokuho Database (SKDB) is an insurance claims database in Shizuoka prefecture, Japan, that was expanded to 2,398,393 individuals (1,303,667 [54.4%] women) collected over the 8.5-year period from April 2012 to September 2020. (32) Shizuoka is located in central Japan, has a population of approximately 3.7 million people, and is characterized by the standard climate, demographics, and economy of Japan. The SKDB includes the data from the National Health Insurance (NHI) for individuals aged <75 years and the Latter-stage Elderly Medical Care System (LSEMCS) for individuals aged ≥75 years among residents in Shizuoka prefecture. These data comprise information on age; sex; diagnosis based on the International Statistical Classification of Diseases and Related Health Problems, Tenth Revision (ICD-10); medical treatment; prescribed medications and their dates of administration; the level of Japanese nursing care certification; and the accurate date of death. The SKDB has been used as a data source in several published studies. (33)(34)(35)

Study design and population
This population-based retrospective cohort study was performed using the SKDB. The cohort was defined as the period from the registration date of health insurance agencies or April 1, 2012, whichever occurred later, to the date of insurance withdrawal (for transferring to other insurance, or to welfare public assistance, or death) or September 30, 2020, whichever occurred earlier. The index date was determined as the first onset of hip fracture in individuals with at least 1 year of continuous subscribership after cohort entry. Patients with hip fractures were defined by ICD-10 code S72.0 (femoral neck), S72.1 (pertrochanteric), and S72.2 (subtrochanteric).
The baseline period was set as the 1 year before the onset of hip fracture and was used to exclude patients aged <65 years or who had already experienced any hip fracture during that time. In addition, patients with two or more of these hip fracture ICD-10 codes were excluded from the study population because the exact fracture site could not be determined from the claims data.
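The cohort window and eligibility rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the argument layout, and the handling of a February 29 entry date are hypothetical, and the exclusion of patients with a prior hip fracture during the baseline year is omitted for brevity.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

STUDY_START = date(2012, 4, 1)    # start of the SKDB period
STUDY_END = date(2020, 9, 30)     # end of the SKDB period
HIP_FRACTURE_CODES = {"S72.0", "S72.1", "S72.2"}


def observation_window(registered: date, withdrawn: Optional[date]) -> Tuple[date, date]:
    """Later of registration/study start, earlier of withdrawal/study end."""
    start = max(registered, STUDY_START)
    end = STUDY_END if withdrawn is None else min(withdrawn, STUDY_END)
    return start, end


def add_one_year(d: date) -> date:
    """Same calendar day one year later (Feb 29 maps to Feb 28; an assumption)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def is_eligible(registered: date, withdrawn: Optional[date], fracture_date: date,
                age_at_fracture: int, fracture_codes: Iterable[str]) -> bool:
    """Apply the window, 1-year subscription, age, and single-site rules."""
    start, end = observation_window(registered, withdrawn)
    if not (start <= fracture_date <= end):
        return False
    if fracture_date < add_one_year(start):   # needs >= 1 year of baseline
        return False
    if age_at_fracture < 65:
        return False
    sites = set(fracture_codes) & HIP_FRACTURE_CODES
    return len(sites) == 1                    # exactly one fracture site code
```

For example, a patient registered in 2010 who fractured a femoral neck on April 1, 2013 at age 80 passes all checks, whereas the same patient with both S72.0 and S72.1 recorded would be excluded.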

Outcome and covariates
The primary outcome was death after the first onset of hip fracture, and the duration from the onset of hip fracture to death was observed. The baseline period was used to collect the patients' demographic information and comorbidities. We used the Charlson comorbidity index (CCI) and Elixhauser comorbidity index (ECI), both of which are widely used for comorbidities, as potential predictive factors. (36,37) We also collected the Japanese nursing care certification records, which are certified strictly based on a patient's physical and mental status, and could be an indicator of activities of daily living (ADL). The nursing care assessment, based on the process in Japan's long-term care insurance system, (38) is limited to persons aged ≥65 years or persons aged ≥40 years with specific diseases. The level of nursing care is categorized as requiring help level 1 or 2 and long-term care level 1 to 5, and all persons who had been certified at any level were included as nursing care-certified patients in this study.

Timing of surgical approaches
We investigated whether surgical or conservative approaches were implemented after hip fracture. The surgical procedures included total hip arthroplasty, bipolar hip arthroplasty, and open surgery. Because the SKDB only provides information for the month in which the surgery was performed, the preoperative waiting times were classified as follows: within 1 month, 1 to 2 months, 2 to 3 months, 3 to 4 months, 4 to 5 months, 5 to 6 months, and more than 6 months after fracture onset.
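Because only the month of surgery is available, the waiting-time categories can be derived from a difference in calendar months. The sketch below is an assumption about how such a mapping could work (a difference of 0 months is labeled "within 1 month", 1 month is "1 to 2 months", and so on); the source does not state the exact rule, and the function name and `(year, month)` tuple layout are hypothetical.

```python
from typing import Tuple

WAITING_LABELS = [
    "within 1 month",
    "1 to 2 months",
    "2 to 3 months",
    "3 to 4 months",
    "4 to 5 months",
    "5 to 6 months",
    "more than 6 months",
]


def waiting_category(fracture_ym: Tuple[int, int], surgery_ym: Tuple[int, int]) -> str:
    """Map (year, month) of fracture and surgery to a waiting-time label."""
    fy, fm = fracture_ym
    sy, sm = surgery_ym
    diff = (sy - fy) * 12 + (sm - fm)      # whole calendar months elapsed
    if diff < 0:
        raise ValueError("surgery month precedes fracture month")
    return WAITING_LABELS[min(diff, 6)]    # 6 or more months share one label
```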

Statistical analysis
Continuous and categorical variables are summarized using mean ± standard deviation and frequency (percentage). To compare baseline characteristics between the patients who survived and those who died after hip fracture, the t test and chi-squared test were used for continuous and categorical variables, respectively. The survival rate was calculated using the Kaplan-Meier method, and the log-rank test was used to compare groups.
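For readers unfamiliar with the Kaplan-Meier method mentioned above, the product-limit estimate can be written as a short pure-Python sketch. The function name and input layout are illustrative assumptions; the analyses in this study were run in SAS, R, and EZR, not with this code.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct death time.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / n_at_risk   # product-limit step
            curve.append((t, survival))
        n_at_risk -= leaving                     # deaths and censored leave the risk set
    return curve
```

With four patients followed for 1, 2, 3, and 4 years, of whom the second is censored, the estimate drops to 0.75 at year 1, 0.375 at year 3, and 0 at year 4.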
To develop a mortality prediction scoring system after hip fracture, two-thirds of all patients were randomly selected as the training data set. The remaining one-third of patients were used as the test data set. Univariate and multivariate Cox proportional hazards regression analyses were conducted to explore prognostic factors using the training data set. We calculated the hazard ratio (HR), 95% confidence interval (CI) based on the Wald test, and corresponding p value. The variables used in the multivariate model were sex, age, season of onset, fracture site, nursing care certification, and CCI/ECI. Spearman's rank correlation coefficient was used to check correlations between potential predictors (correlated: ≥0.4). All potential independent predictors were entered into the multivariate model.
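The random two-thirds/one-third split can be illustrated as follows. This is only a sketch: the seed, the function name, and the use of Python's `random` module are assumptions, and the authors' actual randomization procedure is not described in the text.

```python
import random
from typing import List, Sequence, Tuple


def split_train_test(patient_ids: Sequence[int], train_fraction: float = 2 / 3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle patient IDs reproducibly and split them into training and test sets."""
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)          # seed is a hypothetical choice
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


train_ids, test_ids = split_train_test(range(43529))
```

Applied to the 43,529 patients in this study, this rule yields 29,019 training and 14,510 test patients, which matches the data set sizes reported in the Results.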
Next, HR values were converted to logarithms, multiplied by the same number, and then rounded to the nearest integer to develop the scores for the SHiPS. (39) The risk classification of the SHiPS was determined based on a conditional inference tree analysis. First, the data were sequentially divided into two groups according to the SHiPS. Next, the permutation test was used to compare the two groups, and the variable with the minimum p value was selected as the grouping node. This method was repeated for each subgroup until no further separation was significant or the minimum node was reached. To evaluate the performance of the scoring system and the classification, Uno's C-index was calculated throughout the entire survival time. (40) To assess the predictive performance at the 1-year, 3-year, and 5-year time points for scoring, time-dependent areas under the receiver operating characteristic (ROC) curve (AUCs) were calculated. (41) Because no variable in this study had missing values, no imputation was performed in any analysis. A two-sided p value of <0.001 was considered statistically significant. All statistical analyses were carried out using SAS version 9.4 (SAS Institute Inc., Cary, NC, USA), R version 4.1.1 (The R Foundation for Statistical Computing, Vienna, Austria), and EZR version 1.54 (Saitama Medical Center, Jichi Medical University, Saitama, Japan), which is a graphical user interface for R. (42)

Ethics statement
All patient-related data were anonymized to protect the participants' confidentiality. The ethics committee of the Shizuoka Graduate University of Public Health approved the study protocol (SGUPH_2021_001_020).
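The score construction described in the statistical analysis (log-transformed HRs multiplied by a common constant and rounded to the nearest integer) can be sketched as below. The multiplier of 10 and the example HRs are hypothetical values chosen for illustration only; the actual constant and point values are those reported by the authors in Fig. 1 and are not reproduced here.

```python
import math


def hr_to_points(hazard_ratio: float, multiplier: float) -> int:
    """Convert a hazard ratio to integer points: round(multiplier * ln(HR))."""
    if hazard_ratio <= 0:
        raise ValueError("hazard ratio must be positive")
    scaled = multiplier * math.log(hazard_ratio)
    # Round half up explicitly; Python's built-in round() rounds half to even.
    return int(math.floor(scaled + 0.5))


# Hypothetical HRs, not taken from the study's tables.
example_hrs = {"factor A": 1.8, "factor B": 1.4, "reference": 1.0}
example_points = {name: hr_to_points(hr, multiplier=10) for name, hr in example_hrs.items()}
```

Working on the log scale keeps the points additive: summing points across factors corresponds to multiplying the underlying hazard ratios.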

Study population and surgical treatment of hip fracture
A flowchart of this study is shown in Fig. S1 Table 2).

Characteristics of patients with hip fracture
To develop and validate a scoring system, the evaluated patients were randomly divided into training and test data sets. The training data set included 29,019 patients (23,012 [79.3%] women), and the test data set included 14,510 patients (11,397 [78.5%] women) (Fig. S1). The characteristics of the patients with first-onset hip fracture in the training and test data sets are shown in Table 1. There were no significant differences in the patients' characteristics between the training and test data sets.
Of the total patients, 43.2% in the training data set (median follow-up time: 2.07 years) and 42.7% in the test data set (median: 2.09 years) died after sustaining the hip fracture. Compared with the patients who survived after the onset, those who died tended to be male, be older, be nursing care-certified, and have more comorbidities. Fracture onset tended to be less frequent in summer and more frequent in winter. The most common fracture sites in patients who died were femoral neck fractures (S72.0) and pertrochanteric fractures (S72.1).

Association between surgery after hip fracture and mortality
Table 2 shows patients who underwent surgery after a first hip fracture and those who did not, classified by survival and death. A total of 53.0% (n = 6,097) of patients without surgery and 39.5% (n = 12,636) with surgery died (p < 0.001). Almost all of the patients who underwent surgery, whether alive or dead, had surgery within 1 month of fracture. Table S1 also shows the characteristics of patients who underwent surgery after fracture onset and those who did not.

Identification of prognostic factors after hip fracture
We evaluated potential prognostic factors using univariate and multivariate Cox regression analyses with the training data set (n = 29,019) (Table 3). Of the variables with a p value of <0.001 in the univariate analysis, nursing care certification and dementia were correlated as judged by Spearman's rank correlation coefficient (absolute value of ≥0.4) (Table S2). We selected nursing care certification in this study because it is more widely used in Japan than a dementia diagnosis and could reflect a population with declining ADL due to various factors.
In the multivariate analysis, we identified the following independent prognostic factors for mortality: male sex, age, fracture site (femoral neck and pertrochanteric), nursing care certification, and several comorbidities (cerebrovascular disease, any malignancy, renal disease, congestive heart failure, chronic pulmonary disease, moderate or severe liver disease, metastatic solid tumor, and deficiency anemia). The results of Cox regression analysis using the test data set are shown in Table S3 and did not differ from those of the training data set.
If we had selected dementia as a potential prognostic factor instead of nursing care certification in the multivariate analysis, the prognostic factors would have remained almost the same, with cerebrovascular disease newly added although its HR was not large (Table S4).

Use of SHiPS system for mortality after hip fracture
A scoring system (SHiPS) was constructed to predict postfracture mortality by scoring the identified prognostic factors based on their HR resulting from the multivariate Cox regression model. Figure 1 shows the format for calculating risk scores based on the SHiPS. The maximum score was 64 points, with each predictor score ranging from 0 to 16 points. The C-index (95% CI) was 0.695 (95% CI, 0.691-0.700) in the training data set. By plotting ROC curves using the training data set, the AUC (95% CI) of 1-year, 3-year, and 5-year mortality based on the SHiPS was 0.738 (95% CI, 0.729-0.746), 0.753 (95% CI, 0.746-0.760), and 0.782 (95% CI, 0.772-0.791), respectively. Figure S2 shows the conditional inference tree fitting for the classification of mortality risk based on the SHiPS, resulting in 0 to 9 points being classified as low risk, 10 to 17 points as moderate risk, 18 to 24 points as high risk, and 25 to 64 points as very high risk (also see Fig. 1). Based on this risk classification, the Kaplan-Meier curves for the training data set are shown in Fig. S3. The survival probability decreased at all time points in the order of low, moderate, high, and very high-risk categories.
The proportion of deaths according to the SHiPS for all patients, including those in the training data set, is shown in Fig. 2. The minimum SHiPS was 0 and the maximum was 51. As the SHiPS increased, the proportion of patients who died also increased.
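The four risk categories reported for the SHiPS (0 to 9, 10 to 17, 18 to 24, and 25 to 64 points) translate directly into a lookup. The function name below is hypothetical; only the cut-points come from the text.

```python
def ships_risk_category(score: int) -> str:
    """Map a total SHiPS (0 to 64 points) to its mortality risk category."""
    if not 0 <= score <= 64:
        raise ValueError("SHiPS must be between 0 and 64 points")
    if score <= 9:
        return "low"
    if score <= 17:
        return "moderate"
    if score <= 24:
        return "high"
    return "very high"
```

A total of 26 points, for instance, falls in the very high-risk category.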

Validation of the SHiPS system
The SHiPS system was evaluated for its predictive performance by plotting ROC curves using the test data set (n = 14,510). The AUC (95% CI) of 1-year, 3-year, and 5-year mortality based on the SHiPS was 0.718 (95% CI, 0.706-0.729), 0.736 (95% CI, 0.728-0.745), and 0.758 (95% CI, 0.747-0.769), respectively, indicating adequate predictive value for the SHiPS system for as long as 5 years after fracture onset. Figure 3 shows the Kaplan-Meier curves for the mortality risk category based on the SHiPS in the test data set. Similar to the training data set, the test data set also showed a lower survival probability in the higher risk category throughout the observation period. Additionally, the point estimations of the 1-year, 3-year, and 5-year survival rates are shown in Table 4; worse survival rates were found in the higher risk category at all time points.
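The idea behind a concordance measure can be illustrated with Harrell's C-index, which is simpler than the Uno's C-index and time-dependent AUC used in this study (Uno's version additionally weights pairs by the inverse probability of censoring). The sketch below is therefore a stand-in for intuition, not the method applied here, and the function name is an assumption.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the earlier death has the higher score.

    A pair (i, j) is comparable when patient i died and did so before
    patient j's follow-up ended. Tied scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                      # censored patients cannot anchor a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to a score that carries no ranking information, and 1.0 to a score that orders every comparable pair correctly.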

Predictive performance of SHiPS for patients with or without surgery after hip fracture
The SHiPS was developed to predict the mortality risk based solely on information that can be collected at the time of the fracture, regardless of whether the patient undergoes surgery following fracture onset. We were interested in how well the SHiPS would perform when limited to patients who did or did not undergo surgery. Fig. S4 shows the Kaplan-Meier curves for the mortality risk category based on the SHiPS for the patients who underwent surgery (n = 10,612) (Fig. S4A) and those who did not undergo surgery (n = 3,898) (Fig. S4B) in the test data set. The point estimations of the 1-year, 3-year, and 5-year survival rates for each are shown in Table 4. Based on our findings, the SHiPS was considered to work well as an accurate predictor for individual patients with or without surgery in the test data set.

Discussion
In this study, we analyzed 43,529 advanced-age patients with hip fracture using a large-scale population-based claims database over an 8.5-year period from 2012 to 2020. We identified preoperative prognostic factors and developed and validated a novel scoring system (the SHiPS) to predict long-term mortality up to 5 years after hip fracture. The SHiPS consists of information that can be easily obtained before surgery, including sex, age, fracture site, nursing care certification as an indicator of ADL, and several comorbidities (any malignancy, renal disease, congestive heart failure, chronic pulmonary disease, liver disease, metastatic solid tumor, and deficiency anemia). The advantage of the SHiPS is the ability to assess long-term risk of mortality at the onset of hip fracture, regardless of whether surgery is subsequently performed. Rapid and accurate risk assessment may help anesthesiologists and surgeons to better care for patients from a short-term perspective and may affect clinical management. In the long term, it may also be helpful in considering treatment strategies alongside rehabilitation and treatment of comorbidities. It also facilitates more effective communication with the patient and his or her relatives to help them understand the prognosis and make rational decisions about clinical management. In the future, it will also help in the development of new medical services and treatment technologies.
The SHiPS is calculated by weighting the HR of each prognostic factor. The total number of points is used to classify patients into low-risk, moderate-risk, high-risk, and very-high-risk groups (Fig. 1). For example, an 80-year-old man with a femoral neck fracture, nursing care certification, and renal disease would be classified into the very-high-risk group with a total SHiPS of 26 points using the risk score form shown in Fig. 1. As shown in Table 4, his predicted 1-year, 3-year, and 5-year survival probability is 62.5%, 29.2%, and 12.8%, respectively.
Our study suggests that the SHiPS can adequately predict postfracture mortality based on preoperative information at the onset time, regardless of whether surgery is performed after the fracture. Surgery was not considered to be a prognostic factor because it was an intermediate variable in our survival analysis and because we aimed to make the SHiPS a useful predictive tool applicable to both surgically and conservatively treated patients. For example, 26.4% of the patients in this study were treated without surgery; this is a significantly higher percentage than the 3.6% in the NHFS development analysis. (25) However, from a risk modeling approach (43) perspective, our data validating the SHiPS in patients with and without surgery after fracture showed that survival was higher in those with surgery in all risk categories (Table 4, Fig. S4). The association between surgical indication and subsequent mortality is an important topic, (44,45) and our results showed that patients who underwent surgery had better survival as a consequence. The reason why more people in this study did not undergo surgery is unknown. However, there are two possible reasons why people generally do not opt for surgery. First, surgery may not be applicable to elderly patients or those with underlying medical conditions for whom the risks associated with surgery and subsequent rehabilitation are too high. Second, patients and their families can choose not to undergo surgery because of concerns about the risks and uncertainties associated with surgery.

Fig. 2. The proportion of deaths according to the SHiPS. The scores of the SHiPS were tallied in 2-point increments, and the proportion of deaths for each score was calculated. The range of SHiPS for the study population was from 0 to 51. Due to their small number, scores above 46 were grouped together. The mortality risk categories of low (gray), moderate (red), high (green), and very high (blue) were displayed using background colors based on the SHiPS.
Of the predictive factors identified in this study, male sex, older age, and nursing care certification indicating prefracture ADL and dementia have already been reported in many studies, (7)(8)(9)(10)(11)(12)(13)(14)(16)(17)(18)(19)(20)(21)(22)(24)(25)(26)(27) and our results are considered reasonable. With respect to the fracture site, intertrochanteric fractures (7) and trochanteric fractures (9) are reportedly associated with higher mortality than femoral neck fractures. In the present study, however, femoral neck fractures defined as S72.0 in the ICD-10 and pertrochanteric fractures defined as S72.1 were found to have a slightly more harmful effect on mortality than subtrochanteric fractures defined as S72.2. A low hemoglobin level on admission was identified as a prognostic factor in previous reports, (25,27) and our results also showed that deficiency anemia defined by the ECI was a prognostic factor. With respect to comorbidities, six of 17 diagnoses of the CCI were associated with the prognosis in the present study. Some studies, such as the Deyo Charlson index, (46,47) weight individual diseases, but unfortunately, most prognostic studies of hip fractures have focused on the number of comorbidities, (10,11,17,20,25) making it difficult to understand the involvement of individual diseases. Although some of the conditions identified in this study have been shown to adversely affect mortality (including cerebrovascular disease, (12) renal disease, (12,16,24) congestive heart failure, (13,14,24) chronic lung disease, (8,13,24) liver disease, (17,21) cancer, (12,16,19,24) and anemia (23)), few studies have comprehensively clarified the extent to which each comorbidity contributes to increased mortality, as in the present study. This is a novel finding. Moreover, if the number of CCI is used as a predictor, it is treated as a single variable. However, if each disease is examined separately, as in this study, the number of predictors increases with the number of diseases, which improves the predictive performance.
Several limitations of this study must be considered. First, the insurance database used in this study does not have records on past hip fracture, smoking habits, BMI, pregnancy, clinical laboratory values, or other unknown factors. Second, due to the significant cost of institutionalization following a fracture, it would be meaningful to use institutionalization-free survival as an alternative outcome. However, this outcome could not be used in this study due to the difficulty in identifying the codes for institutionalization. Third, the exact surgery date is not recorded in the database; only the month in which the surgery was performed is available. Previous studies have discussed surgical waiting times after hip fracture in the range of hours to days, (48) but in our study, we could only analyze the data on a monthly basis. Fourth, the direct cause of death is not recorded in the database. We were also unable to determine whether any rehospitalizations had occurred before death. Fifth, the diagnosis of hip fracture and other comorbidities were based on ICD-10 codes and were not clinically confirmed, and their validation studies have not been performed. Sixth, this database covered only some residents in Shizuoka prefecture. Still, we minimized the impact of this bias by classifying the cohort by age and performing a multivariate analysis. Seventh, although internal validation was conducted to evaluate the performance of the SHiPS system, the test data set was derived from the same database and may not represent proper external validation. Eighth, nursing care certification is a system unique to Japan and is more widely used than the diagnosis of dementia in Japan. Nursing care certification is an indicator equivalent to disability. Furthermore, we attempted to develop a scoring system for cases in which we adopted the dementia diagnosis as a prognostic factor instead of nursing care certification (Table S4), and we confirmed that the prediction accuracy was sufficient in such cases as well (data not shown). Ninth, this study was conducted in Japan, which has a universal health insurance system. It is expected that similar trends would be observed in countries without universal health insurance systems, although the survival probability would be expected to be slightly lower. Finally, we could not directly compare SHiPS with the previously reported scores because we do not have all the variables that make up previous scores. However, we found it interesting that, by using a completely independent Japanese claims database, the variables such as age, sex, ADL/dementia, anemia, and malignancy were consistent with the previous scores. Despite these limitations, this study allowed us to examine the prognostic factors of post-fracture mortality and to develop a clinically useful scoring system.

In conclusion, our study demonstrates that the SHiPS is an adequate scoring system for predicting post-fracture mortality over a long-term period using preoperative information. Therefore, we believe the SHiPS will be helpful for clinicians, caregivers, and researchers working with the growing number of advanced-age patients who sustain hip fractures.