A systematic review of risk stratification tools internationally used in primary care settings

Abstract
Background and Aims In our current healthcare situation, the burden on healthcare services is increasing, with higher costs and increased utilization. Structured population health management has been developed as an approach to balance quality with increasing costs. This approach identifies sub-populations with comparable health risks, to tailor interventions for those that will benefit the most. Worldwide, the use of routine healthcare data extracted from electronic health registries for risk stratification approaches is increasing. Different risk stratification tools are used on different levels of the healthcare continuum. In this systematic literature review, we aimed to explore which tools are used in primary healthcare settings and assess their performance.
Methods We performed a systematic literature review of studies applying risk stratification tools with health outcomes in primary care populations. Studies in Organisation for Economic Co-operation and Development countries published in English-language journals were included. Search engines were utilized with keywords, for example, “primary care,” “risk stratification,” and “model.” Risk stratification tools were compared based on different measures: area under the curve (AUC) and C-statistics for dichotomous outcomes and R² for continuous outcomes.
Results The search provided 4718 articles. Specific selection criteria such as primary care populations, generic health utilization outcomes, and routinely collected data sources identified 61 articles, reporting on 31 different models. The three most frequently applied models were the Adjusted Clinical Groups (ACG, n = 23), the Charlson Comorbidity Index (CCI, n = 19), and the Hierarchical Condition Categories (HCC, n = 7). Most AUC and C-statistic values were above 0.7, with ACG showing slightly improved scores compared with the CCI and HCC (typically between 0.6 and 0.7).
Conclusion Based on statistical performance, the validity of the ACG was the highest, followed by the CCI and the HCC. The ACG also appeared to be the most flexible, with the use of different international coding systems and measuring a wider variety of health outcomes.


| INTRODUCTION
For several decades, healthcare costs have been rising. This has been attributed to aging populations and innovative ways of curing and treating diseases, leading to an increased prevalence of chronic illnesses and comorbidities among community-dwelling older people. 1 Patients also increasingly demand choice in how their healthcare is organized and tend to utilize more care. Furthermore, the needs for healthcare are not evenly distributed within populations. In Western countries, the sickest 5% of the population account for 50% of the total healthcare costs. 2 In order to maintain high-quality healthcare, resources should be distributed according to the needs of the population instead of the demand. One way of dealing with this is to allocate resources according to the individual care needs in subpopulations. Predicting healthcare utilization and health outcomes based on needs provides opportunities to allocate resources more appropriately. Predictions of health outcomes through risk stratification can be used to tailor proactive clinical care, to install preventive measures, to restructure healthcare, and to improve insight for healthcare professionals. In the long run, this approach will help improve the quality of care and reduce the costs. 3,4 A way to monitor and predict costly patient outcomes, such as hospitalization, high-care utilization, and emergency department visits, is through the use of structured population health management programs. Population health management is an approach that aims to improve the health of a defined group of people and to strive for more equitable distribution of health outcomes within the group. In population health management programs, an important step is to stratify individuals within a specific subpopulation according to the risk of experiencing an adverse event, such as defined undesirable health outcomes or the extent of their healthcare utilization.
Stratification analyses are often performed based on the use of routinely collected healthcare data. Typically, the high-risk subpopulation comprises a small percentage of the total population. The medium- and low-risk subpopulations are much larger, with around 35% of the overall population classified as medium risk and 60% as low risk. 2 The identification of people classified on their respective risk estimates is referred to as risk stratification. Population segmentation is performed before risk stratification. Segmentation can be performed based on general characteristics such as age, gender, and specific diseases but also on morbidity and healthcare utilization patterns. A discussion of segmentation was outside the scope of this study.
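The tier sizes mentioned above can be illustrated with a minimal Python sketch that ranks people by a predicted risk score and assigns them to tiers. This is an assumption-laden illustration, not the procedure of any reviewed tool: the function name `stratify_by_rank`, the rank-based cut-offs, and the 5% high-risk share (implied by the 35% and 60% figures) are all hypothetical choices.

```python
from typing import Dict


def stratify_by_rank(risk_scores: Dict[str, float],
                     high_share: float = 0.05,
                     medium_share: float = 0.35) -> Dict[str, str]:
    """Assign each person to a risk tier by rank of predicted risk.

    The shares are illustrative assumptions: 5% high risk is implied by
    the 35% medium / 60% low split quoted in the text.
    """
    # Highest predicted risk first.
    ranked = sorted(risk_scores, key=risk_scores.get, reverse=True)
    n_high = round(len(ranked) * high_share)
    n_medium = round(len(ranked) * medium_share)

    tiers: Dict[str, str] = {}
    for position, person in enumerate(ranked):
        if position < n_high:
            tiers[person] = "high"
        elif position < n_high + n_medium:
            tiers[person] = "medium"
        else:
            tiers[person] = "low"
    return tiers


if __name__ == "__main__":
    # Hypothetical population of 100 people with made-up risk scores.
    scores = {f"person_{i}": i / 100 for i in range(100)}
    tiers = stratify_by_rank(scores)
    print(sum(t == "high" for t in tiers.values()), "people in the high-risk tier")
```

In practice the cut-offs would more likely be set on absolute predicted risk or on available care capacity than on fixed population shares; the fixed shares are used here only to mirror the proportions quoted in the text.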
Many methods for risk stratification exist internationally. Current literature regarding risk stratification models focuses predominantly on stratifying hospital populations, based on readily available hospital data. However, primary care data have great potential to improve healthcare quality and reduce health costs. 5 Especially in countries where primary care registries cover nearly 100% of the total population, such as the Netherlands and the United Kingdom (UK), these routinely collected primary care data offer the opportunity to assess the whole population. The distribution of risk in a primary care population differs from that in a hospital or specialized care population. Current literature also mainly focuses on risk stratification models with disease-specific outcomes, whereas this study focuses on more generic utilization outcomes such as the risk of hospitalization, emergency department visits, future high healthcare utilization, and high pharmaceutical expenditures.
The aim of this study was to perform a systematic literature review to describe and assess the performance of different risk stratification tools with generic health utilization outcomes using routinely collected data and with possibilities of application to the European context, such as in Dutch primary care. Based on the description of the performance of the tools, we recommend the risk stratification tool best suited for usage in Dutch primary care.

| METHODS
The PRISMA statement for conducting and reporting systematic literature reviews was followed throughout the literature review process. 6 This review was conducted through searches in the search engines PubMed and Embase. The search string, which contained both keywords and MeSH terms, is shown in the Supporting Information S1. The most important keywords were "primary care," "risk stratification," and "model." EndNote X8.2 was used as the reference manager for the articles. The search string was produced in collaboration with the Leiden University Medical Center (LUMC) Walaeus library.
The PRISMA flow diagram displays the numbers of included and excluded articles (Figure 1).

| Inclusion criteria
The search characteristics are specified by the Population, Intervention, Control, and Outcome method. In our research, the population is the primary care population. Therefore, we only included articles that discuss models applied to primary care populations. The interventions investigated were the risk stratification approaches and models that are applied to primary care data. Outcomes investigated are risks of hospitalization, high healthcare costs, emergency department visits, high pharmaceutical drug expenditure, mortality, and other generic health utilization outcomes. The inclusion criteria narrowed the search down to a context more applicable to a European primary care situation with a gatekeeper's role, such as the Dutch primary care system.

| Exclusion criteria
Articles that used risk stratification tools on populations consisting of hospitalized patients or patients seeking consultation with a specialist (eg, an oncologist or cardiologist) were excluded. These patients were not considered to represent those in a primary care setting. In addition, research looking at disease-specific outcomes was excluded, as this review aims at exploring general population outcomes. Articles that were not freely accessible or not available in English were also excluded.

| Assessing performance of models
The different models were compared on three aspects: frequency of use, statistical diagnostic validity, and performance in primary care.
For each identified risk stratification model, the frequency of use of the model was presented, taking into account all included studies.
For the assessment of the statistical diagnostic validity, reviewed studies were divided into application, validation, and comparison studies.
In the application studies, risk stratification tools were applied for purposes other than assessing their statistical diagnostic validity. Therefore, application studies did not present any statistical diagnostic measures of the risk stratification tools. In the validation studies and in most of the comparison studies, statistical diagnostic measures of the applied risk stratification tools were provided. Area under the curve (AUC) and C-statistics for models with dichotomous outcomes and R² values for models with continuous outcomes were used to validate risk stratification tools. Models with AUC or C-statistic values between 0.5 and 0.6 were classified as performing poorly, values between 0.6 and 0.7 were considered sufficient, and values above 0.7 were considered good. 9 Ten of the reviewed papers, the comparison studies, compared more than one risk stratification tool in the same study population with the same record data, enabling a more appropriate comparison between risk stratification tools. Most of the comparison studies presented statistical diagnostic values, as most of them are also validation studies.
FIGURE 1 PRISMA flowchart displaying numbers of included and excluded articles
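The discrimination thresholds described above can be sketched in Python as follows. This is a minimal illustration, not the computation used in any of the reviewed studies: the function names are hypothetical, the C-statistic is computed by plain pairwise comparison (for a dichotomous outcome it equals the AUC), and the handling of values exactly on a boundary (0.6 or 0.7) is an assumption, since the text does not specify it.

```python
from itertools import product
from typing import Sequence


def c_statistic(predicted: Sequence[float], observed: Sequence[int]) -> float:
    """Share of (event, non-event) pairs in which the person with the
    event received the higher predicted risk; ties count as half."""
    events = [p for p, y in zip(predicted, observed) if y == 1]
    non_events = [p for p, y in zip(predicted, observed) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for event_risk, non_event_risk in product(events, non_events):
        if event_risk > non_event_risk:
            concordant += 1.0
        elif event_risk == non_event_risk:
            concordant += 0.5
    return concordant / (len(events) * len(non_events))


def classify_discrimination(value: float) -> str:
    """Label an AUC or C-statistic with the categories used in this review.

    Assumption: boundary values 0.6 and 0.7 fall in "sufficient".
    """
    if value > 0.7:
        return "good"
    if value >= 0.6:
        return "sufficient"
    if value >= 0.5:
        return "poor"
    return "below chance"


if __name__ == "__main__":
    # Made-up predictions for four people, two of whom were hospitalized.
    risks = [0.9, 0.8, 0.3, 0.2]
    hospitalized = [1, 0, 1, 0]
    value = c_statistic(risks, hospitalized)
    print(value, classify_discrimination(value))
```

The pairwise loop is quadratic in the number of people and is meant only to make the definition visible; for real registry data a library routine would be the practical choice.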
For performance in primary care, we assessed the type of routinely collected data that are used as input of the model. Models using input data available in Dutch primary care health records were assumed to have a good potential performance in Dutch primary care.

| RESULTS
A total of 31 risk stratification models were identified in the literature.
The three most frequently applied tools, taking into account all included studies, are the Adjusted Clinical Groups (ACG), the Charlson Comorbidity Index (CCI), and the Hierarchical Condition Categories (HCC). These three main risk stratification tools are presented in Table 1, with predicted outcomes and diagnostic values. Assessment of these tools, their diagnostic validity, and applicability in primary care are described in order. The remaining 28 risk stratification tools can be found in the Supporting Information S2.

| Adjusted clinical groups: 23 Studies
The ACG is the most frequently applied risk stratification tool in our review. The ACG system is a risk stratification model designed at Johns Hopkins University. The model was originally developed to predict and measure multimorbidity in a population. The ACG system is a measure of comorbidity and can predict utilization costs, hospitalization, and emergency department visits. The model is able to use patients' data from electronic health records (EHRs), insurance claims, disease registries, and health status surveys. 10 Minimal input data for the model are healthcare diagnoses in a specific time interval, gender, and age, based on which the ACG system assigns each person to one of 93 ACG categories. These categories represent expected healthcare utilization. In addition, different probabilities for future utilization of healthcare services are calculated. This information can be used by healthcare professionals to make informed clinical and administrative decisions. 4
Of the 23 ACG studies, eight provided statistical diagnostic values for the accuracy of the model, calculated for different outcomes. For prediction of hospitalization, the model was diagnostically assessed three times, with AUC and C-statistic values between 0.73 and 0.82. 4,11,12 The diagnostic accuracy can be classified as good.
In one study, a C-value of 0.67 is presented for prediction of emergency department visitation, which classifies as sufficient, and a C-value of 0.76 for prediction of high total costs, again classifying as good. 4 Three other studies presented R² values between 0.37 and 0.41 for explaining the variation of healthcare costs by the ACG model. 10,13,14 Variations in high utilization of different healthcare services, such as primary care visits, specialists' visits and numbers of diagnostic imaging tests, diagnoses, and hospitalizations, are discussed in three studies, with R² values ranging from 0.24 to 0.77. 13,15,16 ACG is highly suitable for application in primary care populations, as using International Classification of Primary Care (ICPC) codes as input is possible. 10 ICPC codes are used to classify complaints and diagnoses of patients in many primary care settings, such as in the Netherlands. This information is stored in EHRs. The model uses other input variables such as age, gender, pharmaceutical information, and previous visitation, stored in the EHR as well.
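For readers less familiar with the R² values reported above, the following Python sketch shows the standard coefficient of determination, that is, the share of the variation in an observed continuous outcome (such as costs) that a model's predictions account for. The function name and the cost figures are hypothetical, and the reviewed studies may have used variants (for example, adjusted R²) that this sketch does not cover.

```python
from typing import Sequence


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("observed and predicted must be non-empty and equal in length")

    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variation")
    return 1.0 - ss_residual / ss_total


if __name__ == "__main__":
    # Made-up annual costs for four people and a model's predictions.
    observed_costs = [100.0, 200.0, 300.0, 400.0]
    predicted_costs = [150.0, 150.0, 350.0, 350.0]
    print(r_squared(observed_costs, predicted_costs))
```

Read this way, an R² of 0.40 means that the model accounts for 40% of the variation in the outcome across the study population.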

| Charlson comorbidity index: 19 Studies
The CCI is the second-most studied risk stratification model. The CCI was developed by Charlson and colleagues in 1987 and was originally an age-comorbidity index that predicted a relative risk of death within a year for hospital-admitted cancer patients. 17 Since that time, many adjustments have been made, and in addition to mortality predictions, the model is now used to predict hospitalization, emergency department visitation, future healthcare utilization, and morbidity in wider populations. The system categorizes the population into six categories, based on the presence of comorbidities and chronic conditions, of which a weighted sum is provided (from zero conditions as category 1 to five or more conditions as category 6). 18,19 The model investigates the effect of multimorbidity and predicts several outcomes.
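The six-category grouping described above can be written as a very small Python sketch. It is a simplification under stated assumptions: the function name `cci_category` is hypothetical, the input is a plain count of conditions, and the published Charlson condition weights are deliberately not reproduced here.

```python
def cci_category(condition_count: int) -> int:
    """Map a number of comorbid conditions to one of six categories.

    Simplifying assumption: an unweighted count is used. Zero conditions
    map to category 1, and five or more conditions map to category 6.
    """
    if condition_count < 0:
        raise ValueError("condition_count cannot be negative")
    return min(condition_count, 5) + 1


if __name__ == "__main__":
    for count in (0, 2, 5, 8):
        print(count, "conditions -> category", cci_category(count))
```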
Variations of the CCI exist, and the validity of its predictions has been consistently investigated. 4 From the 18 studies in which the CCI or a modification was used,  14,18,19 For healthcare utilization of different healthcare services, R² values were between 0.13 and 0.26. 15,16,23 Input variables for the CCI include combinations of age, race, gender, mental illness, pregnancy, drug or alcohol addiction, type of health plan, type of provider, number of therapeutic classes, and number of medications prescribed. The CCI is fit for use with primary care data but focuses primarily on the absence or presence of chronic conditions, apart from other demographics. Although there is no evidence in the included studies of use of the CCI with ICPC codes, the coding system used in Dutch primary care, there is evidence for use with Read codes, a British primary care coding system. 24 Possibilities to use the model with coding systems other than International Classification of Diseases (ICD) codes are therefore very likely.
The software algorithm for CCI is published and available. 4

| Hierarchical condition categories: Seven studies
The third most frequently studied model (n = 7) is the HCC. This model was first designed and implemented by the Centers for Medicare and Medicaid Services (CMS) to adjust capitation payments for enrolees with higher risk than others. The model uses demographic data of patients as well as ICD 10th revision (ICD-10) diagnosis codes.
ICD codes are used by all American healthcare service providers. 25 The ICD classification is adapted in other countries, yet these codes are most prominently used in hospital administrative registries. 26 Based on this information, the model categorizes a patient into one of 70 aggregated condition categories, which contributes to an individualized risk score.   department visitation, but a much higher C-statistic of 0.70 for prediction of high total costs. 4 A major concern regarding this model is that it makes use of ICD codes rather than ICPC codes, making it difficult to apply in Dutch primary care settings.

| Comparison studies
A total of 10 papers compared more than one risk stratification tool applied within the same study populations. However, only five articles compared more than one of the three above-mentioned risk stratifica-

| Remaining risk stratification tools
In addition to the three above-mentioned risk stratification tools, 28 other tools were identified within this systematic literature review.
One of the 28 identified risk stratification tools is called the Elixhauser Index or Method and was mentioned in five studies. The Elixhauser Index uses a set of 30 dichotomous variables as comorbidity measures. 27 Outcomes concern high utilization and pharmaceutical expenditure. One out of the five studies, mentioning the Elixhauser Index, provided C-statistics between 0.62 and 0.74 for different health utilization outcomes. 21 The study by Ou and colleagues com- For the applicability in primary care, evidence shows that the ACG has the possibility to make use of ICPC codes, the coding system of the

| Further research
Of all the articles included in this study, only a small percentage explicitly define "risk stratification." With the growing need for tailored care and health management approaches, a precise definition will be useful. Risk stratification and other terms such as population segmentation are now used interchangeably. Studies contributing to a generalized definition of the term risk stratification will be of great scientific and practical value. By using the same definition, miscommunications regarding the meaning of risk stratification will be reduced, and information on highly performing methods and implementations thereof can be shared more effectively.
With this review, we studied which risk stratification tools are best suited for the European primary care setting. However, primary care settings differ between countries. To find the most suitable tool for a specific primary care system, the performance of different tools should be investigated within the same setting, centered on desired outcomes. Based on the results of this literature review, further studies assessing the performance of desired risk stratification models will be beneficial for Dutch primary care.

| CONCLUSION
In conclusion, based on application frequency, statistical validity, and the diagnosis coding systems supported, we suggest the ACG as the best model for use in European primary care settings, such as Dutch primary care.
However, further local assessment of the ACG system is needed to ensure proper implementation.

ACKNOWLEDGMENTS
The authors acknowledge the Walaeus library of the Leiden University Medical Center for their collaboration in generating the search strategy for our PubMed and Embase search. The authors also acknowledge the contribution of Josanne Mansveld in the study screening process.

CONFLICT OF INTEREST
All authors declare that there is no conflict of interest that could be perceived as prejudicing the impartiality of the research reported.

AUTHOR CONTRIBUTIONS
Conceptualizing

TRANSPARENCY STATEMENT
Authors affirm that the manuscript is an honest, accurate and transparent account of the study being reported, that no important aspects of this study have been omitted and that there were no discrepancies from the study as planned.

DATA AVAILABILITY STATEMENT
Data sharing not applicable to this article as no new data were created or analyzed during this study.

ETHICS STATEMENT
Ethical approval was not required for this study.