New practical scoring system to predict post‐endoscopic retrograde cholangiopancreatography pancreatitis: Development and validation

Abstract Background and Aim Post‐endoscopic retrograde cholangiopancreatography (ERCP) pancreatitis (PEP) is the most serious adverse event of ERCP. Therefore, it is important to identify high‐risk patients who require prophylactic measures. The aim of this study was to develop a practical prediction model for PEP that identifies high‐risk patients. Methods Patients who underwent ERCP at three tertiary hospitals between April 2010 and September 2012 were enrolled. The dataset was divided into a training set (two centers) and validation set (one center). Using the training set, univariable and multivariable analyses were performed to identify predictive factors for PEP. We constructed a scoring system and evaluated reproducibility using the validation set. Results A total of 2719 ERCPs were analyzed. PEP developed in 133 cases (4.9%). Risk factors (adjusted odds ratios [OR]) in the training set were a history of PEP (OR: 4.2), intact papilla (OR: 2.4), difficult cannulation (OR: 1.9), pancreatic guidewire‐assisted biliary cannulation (OR: 2.2), pancreatic injection (OR: 2.1), pancreatic intraductal ultrasonography (IDUS)/sampling from the pancreatic duct (OR: 2.2), and biliary IDUS/sampling from the biliary tract (OR: 2.8). A scoring system was constructed using these seven clinical variables. The areas under the receiver‐operating characteristic curve of this model were 0.799 in the training set and 0.791 in the validation set. In the high‐risk group at a score of 3 or higher, the incidence of PEP was 13.4%, and all severe/fatal cases were in the high‐risk group. Conclusions This scoring system helps to predict each patient's risk and select preventive measures.


Introduction
Endoscopic retrograde cholangiopancreatography (ERCP) is an invaluable procedure for the diagnosis and management of pancreaticobiliary diseases. However, it occasionally causes adverse events such as pancreatitis, bleeding, perforation, and infection. Acute pancreatitis is the most common adverse event of ERCP, and post-ERCP pancreatitis (PEP) may be severe or fatal. The reported incidence of PEP varies widely, between 1.6 and 15%, and a systematic survey reported that the cumulative incidence of PEP was 3.47% and case fatality was 0.11%. 1 Therefore, prophylaxis for PEP is a clinically important issue.
Several strategies have been proposed to prevent PEP, including appropriate patient selection, the rectal administration of nonsteroidal anti-inflammatory drugs (NSAIDs), and aggressive hydration with lactated Ringer's solution. [2][3][4][5][6] Pancreatic stenting is considered the most promising prophylactic procedure for high-risk patients because it maintains the outflow of pancreatic juice. 7 However, attempting prophylactic pancreatic stenting (PPS) carries a risk of pancreatitis, pancreatic ductal damage, and inward migration. 8 Since prophylactic measures sometimes lead to adverse events, they are regarded as a double-edged sword. To safely perform ERCP, it is important to individually estimate the risk of PEP in each patient and identify high-risk patients.
Over the past few decades, risk factors for PEP have been the focus of many studies; however, there are currently only a few reliable methods for predicting PEP in individual patients. 6,9,10 Friedland et al. were the first to propose a scoring system to predict post-ERCP pancreatitis. 11 This scoring system consisted of pain during the procedure, pancreatic duct cannulation, a history of PEP, and the cannulation attempt score. However, since pain during the procedure is difficult to measure, a prediction model for PEP needs to consist of more objective and measurable components.
Three more prediction models have recently been reported. However, these models are not suitable for use in clinical practice because of their low discriminability, complexity, or lack of external validation. [12][13][14] In the present study, we aimed to develop and validate a practical prediction model for PEP based on the ERCP database of three major tertiary hospitals in the western metropolitan area of Japan.

Methods
Setting/participants. In our retrospective cohort study, 2976 consecutive ERCP procedures performed between April 2010 and September 2012 were extracted from the databases of Kyoto University Hospital (Kyoto), Kitano Hospital (Osaka), and Yodogawa Christian Hospital (Osaka).
Among these procedures, transpapillary ERCPs were included in the analysis, whereas those performed on patients younger than 18 years, with comorbid acute pancreatitis, or with altered gastrointestinal anatomy, such as Roux-en-Y reconstruction, were excluded. All ERCP patients stayed in the hospital for at least 24 h after the procedure to monitor the clinical manifestations of pancreatitis, and received continuous intravenous fluid infusion from around the time of ERCP until the following morning.
Outcomes. The main outcome was incident PEP. The diagnosis of PEP was based on the consensus definition: new or worsened abdominal pain, new or prolonged hospitalization for at least 2 days, or an increased post-procedure (>24 h) serum amylase level three-fold the upper normal limit or higher. [15][16][17] The severity of PEP was defined by modified Cotton's criteria: mild PEP was defined as a 2-3-day extension of the hospital stay or fasting; moderate PEP as a 4-10-day extension of the hospital stay or fasting; and severe PEP as a >10-day extension of the hospital stay or fasting, hemorrhagic pancreatitis, pancreatic necrosis, pseudocyst, or the need for percutaneous drainage or surgical intervention. Other outcomes included the adverse events defined in the 2010 American Society for Gastrointestinal Endoscopy (ASGE) lexicon for endoscopic adverse events. 18
Predictors. We assessed two types of candidate predictive factors. Patient-related factors included younger age, female sex, history of PEP, the absence of chronic pancreatitis, normal serum bilirubin, hyperamylasemia before ERCP, periampullary diverticulum, intact papilla, and a suspected sphincter of Oddi dysfunction. Procedure-related factors included difficult cannulation, pancreatic guidewire (PGW)-assisted biliary cannulation, precut sphincterotomy, biliary sphincterotomy, biliary balloon sphincter dilation, pancreatic injection, PPS, pancreatic drainage, endoscopic biliary stenting, endoscopic naso-biliary drainage, endoscopic metallic stenting, the extraction of biliary stones, biliary intraductal ultrasonography (IDUS), pancreatic IDUS, and sampling from the biliary tract and pancreatic duct. In the present study, younger age was defined as <60 years, chronic pancreatitis as the presence of pancreatic stones, and intact papilla as no previous sphincterotomy or papillary balloon dilation or stenting. Difficult cannulation was defined as that taking more than 15 min. Pancreatic drainage included pancreatic stenting and naso-pancreatic drainage, except for PPS. Sampling was obtained by brushing cytology or forceps biopsy.
Statistical methods. Categorical data are presented as numbers (percentage) and continuous data as means (standard deviation) for normally distributed data and medians (range) for skewed numerical data.
The dataset was divided into a training set and validation set. The training set used data from two centers, Kyoto University and Yodogawa Christian Hospital, and the validation set used data from one center, Kitano Hospital.
To develop the prediction model, we assessed the multicollinearity of the predictor variables and selected representatives. A univariable analysis was then performed using the chi-squared test for each of the potential predictors. Candidate predictors were those with P values <0.2 in the chi-squared test, in addition to known definite risk factors. A multiple logistic regression analysis was then performed using backward stepwise methods, and the odds ratio (OR) and its 95% confidence interval (CI) were reported.
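The univariable screening step can be illustrated with a small sketch. It assumes a Pearson chi-squared test without continuity correction on a 2x2 table (the text does not state which variant was used), and the function names and all counts are hypothetical, chosen only for illustration.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table.

    a: factor present, PEP      b: factor present, no PEP
    c: factor absent, PEP       d: factor absent, no PEP
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("a marginal total is zero; the test is undefined")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def is_candidate(p_value, threshold=0.2):
    """Screening rule from the text: keep predictors with P < 0.2."""
    return p_value < threshold


# Hypothetical counts, not taken from the study.
stat, p = chi_squared_2x2(30, 270, 40, 1660)
print(round(stat, 2), is_candidate(p))
```

A loose threshold such as 0.2 is a screening device: it keeps plausible predictors in play for the multivariable model rather than testing them for significance.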
We constructed a scoring system to predict PEP based on the findings of the multivariable analysis. To generate a simple integer-based point score for each predictor variable, scores were given by multiplying the β coefficient by 10 and rounding up or down to the nearest integer. The overall risk score for each patient was calculated by summing the scores of all components.
To assess the calibration of the scoring system, the incidence of PEP was plotted against the total score, and a visual inspection of the histogram and the Hosmer-Lemeshow goodness-of-fit test were performed. To assess the discrimination of the scoring system, we drew a receiver-operating characteristic (ROC) curve for the risk of PEP and calculated the area under the ROC curve.
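For discrimination, the area under the ROC curve equals the probability that a randomly chosen case with PEP has a higher score than a randomly chosen case without PEP, with ties counted as one half. The sketch below computes it that way; the function name and the toy data are hypothetical and are not study data.

```python
def roc_auc(scores, outcomes):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores:   risk score per procedure
    outcomes: 1 if PEP occurred, 0 otherwise
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # tied scores contribute half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six procedures with scores 0..5.
toy_scores = [0, 1, 2, 3, 4, 5]
toy_outcomes = [0, 0, 1, 0, 1, 1]
print(roc_auc(toy_scores, toy_outcomes))  # 8 of 9 case/non-case pairs are ordered correctly
```

Because an integer score produces many ties, the half-credit rule for ties matters here more than it would for a continuous predictor.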
Overfitting and optimism in the training set were evaluated using the bootstrap method by sampling with replacement for 2000 iterations. The area under the ROC curve was calculated in each resampling. Optimism was calculated as the difference between training performance and bootstrap performance. 19,20 We also evaluated the calibration and discrimination of the validation set and considered risk stratification based on the score-incidence graph and ROC curve in all cases.
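The bootstrap step can be sketched as follows, taking the description literally: resample procedures with replacement, recompute the area under the ROC curve in each resample, and report optimism as the apparent value minus the mean bootstrap value. This is a simplified reading; whether the model was refitted within each resample is not stated, and the function names, seed, and toy data are assumptions.

```python
import random


def roc_auc(scores, outcomes):
    """AUC as the share of case/non-case pairs ranked correctly (ties = 0.5)."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_optimism(scores, outcomes, n_iter=2000, seed=0):
    """Return (apparent AUC, mean bootstrap AUC, optimism).

    Outcomes must be coded 0/1 and contain both classes.
    """
    if not 0 < sum(outcomes) < len(outcomes):
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    n = len(scores)
    apparent = roc_auc(scores, outcomes)
    boot = []
    while len(boot) < n_iter:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one outcome class
            boot.append(roc_auc([scores[i] for i in idx], ys))
    mean_boot = sum(boot) / len(boot)
    return apparent, mean_boot, apparent - mean_boot


# Hypothetical toy data, not study data.
toy_scores = [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5]
toy_outcomes = [0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]
apparent, mean_boot, optimism = bootstrap_optimism(toy_scores, toy_outcomes, n_iter=200)
print(round(apparent, 3), round(mean_boot, 3), round(optimism, 3))
```

An optimism close to zero suggests that the apparent performance in the training set is not inflated by overfitting.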
Analyses were performed using R 4.0.3 (The R Foundation for Statistical Computing, Vienna, Austria) and JMP 10 (SAS Institute, Cary, NC, USA).
Ethics. The present study was approved by the Institutional Review Boards of Kyoto University, Yodogawa Christian Hospital, and Kitano Hospital. This study was registered in the University Hospital Medical Information Network (UMIN000038243).

Results
Patient characteristics. During the study period, 2976 ERCPs were performed at the three participating hospitals. Among these, 257 ERCPs were excluded for the following reasons: patient age <18 years, 10; comorbid acute pancreatitis, 70; altered gastrointestinal anatomy, 114; anastomosis or fistula, 32; missing data, 19; and others, 12. Therefore, 2719 ERCPs were included in the present study. In the patient cohort, 40% were women, and the median age was 67.2 years. Biliary and pancreatic diseases accounted for 73.2% and 26.8% of ERCPs, respectively. The success rate of selective cannulation was 98.5% for all patients and 96.5% for those with intact papilla. The training set consisted of 1969 cases from two centers, while the validation set consisted of 750 cases from one center, and their characteristics were similar (Table 1).
Predictor selection. Biliary IDUS and sampling from the biliary tract had multicollinearity for PEP, and their frequencies were low. Therefore, these factors were combined as biliary IDUS and/or sampling from the biliary tract. Similarly, pancreatic IDUS and tissue sampling from the pancreatic duct were combined as pancreatic IDUS and/or sampling from the pancreatic duct. Thirteen candidate factors with P values <0.2 were selected in the univariable analysis of the training set: female sex, a history of PEP, the absence of chronic pancreatitis, intact papilla, difficult cannulation, PGW-assisted biliary cannulation, precut sphincterotomy, biliary sphincterotomy, pancreatic injection, PPS, pancreatic IDUS and/or sampling from the pancreatic duct, the extraction of biliary stones, and biliary IDUS and/or sampling from the biliary tract (Table 3).
The multivariable logistic regression analysis of the training set identified seven significant independent risk factors: two patient-related and five procedure-related factors.
Construction of the scoring system. The probability of pancreatitis was predicted using the following equation:
$$
\begin{aligned}
\text{Risk score} ={}& 2 \times \text{history of PEP} + 1 \times \text{intact papilla} + 1 \times \text{difficult cannulation} \\
&+ 1 \times \text{PGW-assisted biliary cannulation} + 1 \times \text{pancreatic injection} \\
&+ 1 \times \text{pancreatic IDUS/sampling from the pancreatic duct} \\
&+ 2 \times \text{biliary IDUS/sampling from the biliary tract}
\end{aligned}
$$
Performance of the scoring system. In the training set, the incidence of PEP was 1.2% at score 0 (n = 661), 0.7% at score 1 (n = 426), 4.5% at score 2 (n = 399), 10.1% at score 3 (n = 248), 13.6% at score 4 (n = 158), and 24.7% at score 5 or higher (n = 77). The incidence of PEP tended to increase according to the total score (Fig. 1). The P value of the Hosmer-Lemeshow [...]
External validation. In the validation set, the incidence of PEP was 0% at score 0 (n = 252), 3.7% at score 1 (n = 163), 4.6% at score 2 (n = 132), 8.3% at score 3 (n = 97), 15.7% at score 4 (n = 70), and 16.7% at score 5 or higher (n = 36). The score-incidence graph of the validation set was similar to that of the training set (Fig. 1). The P value of the Hosmer-Lemeshow goodness-of-fit test on the validation set was 0.7814. The area under the ROC curve on the validation set was 0.791, which was similar to performance in the training set (0.799).
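The integer score given under Construction of the scoring system amounts to a weighted sum of seven yes/no findings. A minimal sketch follows; the point values are those of the equation, while the dictionary keys and the function name are hypothetical labels introduced here.

```python
# Point values from the scoring equation; key names are illustrative labels.
WEIGHTS = {
    "history_of_pep": 2,
    "intact_papilla": 1,
    "difficult_cannulation": 1,
    "pgw_assisted_biliary_cannulation": 1,
    "pancreatic_injection": 1,
    "pancreatic_idus_or_sampling": 1,
    "biliary_idus_or_sampling": 2,
}


def pep_risk_score(findings):
    """Sum the points of every factor recorded as present (True).

    findings: dict mapping factor name -> bool; missing factors count as absent.
    """
    unknown = set(findings) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown factor(s): {sorted(unknown)}")
    return sum(points for name, points in WEIGHTS.items() if findings.get(name, False))


# Example: intact papilla with difficult cannulation and PGW-assisted cannulation.
example = {
    "intact_papilla": True,
    "difficult_cannulation": True,
    "pgw_assisted_biliary_cannulation": True,
}
print(pep_risk_score(example))  # 1 + 1 + 1 = 3
```

The two patient-related factors are known before the procedure, so a partial score is available in advance and can be updated as procedure-related factors occur.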
Risk stratification. Based on the score-incidence graph and ROC curve, all cases were stratified into two groups: a low-risk group (scoring 2 points or less) and a high-risk group (scoring 3 points or more). The incidence of PEP was 2.0% at scores of 0-2 (n = 2033), and 13.4% at a score of 3 or higher (n = 686). All cases with severe or fatal PEP were classified as the high-risk group (Table 5). Among the 1381 ERCPs performed on patients with neither an intact papilla nor a history of PEP, 96.2% had a score of 2 or less (low risk). Among these 1381 ERCPs, 19 cases developed PEP, with an incidence of 1.4%.
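The group sizes above can be checked against the per-score counts reported for the training and validation sets. The sketch below pools those counts using the cutoff of 3 or higher for the high-risk group; the variable and function names are hypothetical, and the key 5 stands for "score 5 or higher".

```python
# Per-score procedure counts as reported in the Results (key 5 = score 5 or higher).
TRAINING_N = {0: 661, 1: 426, 2: 399, 3: 248, 4: 158, 5: 77}
VALIDATION_N = {0: 252, 1: 163, 2: 132, 3: 97, 4: 70, 5: 36}

HIGH_RISK_CUTOFF = 3  # a score of 3 or higher is classified as high risk


def risk_group(score):
    """Classify a total score as 'low' (0-2) or 'high' (3 or higher)."""
    return "high" if score >= HIGH_RISK_CUTOFF else "low"


def pooled_group_sizes(*tables):
    """Add up procedure counts per risk group across the given score tables."""
    sizes = {"low": 0, "high": 0}
    for table in tables:
        for score, n in table.items():
            sizes[risk_group(score)] += n
    return sizes


sizes = pooled_group_sizes(TRAINING_N, VALIDATION_N)
print(sizes)  # {'low': 2033, 'high': 686}
print(sum(sizes.values()))  # 2719 procedures in total
```

The pooled totals reproduce the reported 2033 low-risk and 686 high-risk procedures, which together make up the 2719 analyzed ERCPs.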

Discussion
Our new scoring system, which considers the respective weight of seven predictive factors, showed excellent discriminability and reproducibility. This scoring system consisted of a history of PEP, intact papilla, difficult cannulation, PGW-assisted biliary cannulation, pancreatic injection, pancreatic IDUS/sampling from the pancreatic duct, and biliary IDUS/sampling from the biliary tract, which are well-known risk factors. 21,22 PEP developed in 13.4% of cases with a score of 3 or higher, and all severe or fatal PEP cases were classified with a score of 3 or higher. Therefore, a score of 3 or higher needs to be considered as high risk and proactive prophylactic measures need to be taken to prevent PEP.
According to the 2020 European Society of Gastrointestinal Endoscopy (ESGE) recommendations, 100 mg of diclofenac or indomethacin is administered immediately before ERCP to all patients without contraindications. 6 However, prior to ERCP, it is possible to determine that most ERCPs for patients with neither an intact papilla nor a history of PEP are low risk. Since NSAIDs sometimes cause adverse events, such as hypersensitivity reactions, these patients do not require NSAIDs before ERCP; NSAIDs only need to be administered after ERCP if the score reaches 3 or higher. 23,24
ASGE and ESGE recommend PPS for high-risk patients with easy pancreatic stenting: PGW-assisted biliary cannulation, transpancreatic sphincterotomy, and repeated inadvertent main pancreatic duct cannulation. 6,10 We also recommend PPS for patients with a score of 3 or higher and easy pancreatic stenting, particularly those with PGW-assisted attempts at biliary cannulation. 25 Among patients receiving PGW-assisted biliary cannulation, the incidence of PEP was reported to be significantly lower in those with PPS than in those without stenting (2.9% vs 23%). 26 The effectiveness of aggressive hydration with lactated Ringer's solution was recently demonstrated. 3,4,27,28 In patients not at risk of fluid overload, it may be useful to initiate aggressive hydration when the score reaches 3 or higher.
Moreover, hospitalization, post-ERCP blood tests, fasting, and other types of post-ERCP management need to be provided based on the risk level.
The scoring system of Friedland et al. consisted of pain during the procedure, pancreatic duct cannulation, a history of PEP, and the cannulation attempt score. 11 The most prominent difference between the previous scoring system and ours is objectivity. "Pain during the procedure" and "the cannulation attempt score" are difficult to measure objectively because "pain during the procedure" is markedly affected by sedation levels, the types of analgesic agents used, and patient characteristics, and the definition of "cannulation attempts" is also ambiguous. In addition, our system uniquely included "intact papilla", which may markedly affect the difficulty of ERCP, and "PGW-assisted biliary cannulation," which is closely related to PPS. All of our factors are easy to measure, and, thus, our scoring system may be more reliable in clinical practice.
Chiba et al. proposed a prediction model using a propensity score analysis. 14 Their model consisted of five factors: intact papilla, PGW-assisted biliary cannulation, difficult cannulation, pancreatic injection, and the absence of a pancreatic stent, four of which are also examined in our model. Therefore, intact papilla, PGW-assisted biliary cannulation, difficult cannulation, and pancreatic injection are regarded as key factors for predicting PEP. Their model uses a propensity score to estimate the risk of PEP, which needs to be confirmed against a complex look-up table, whereas our model uses a simple addition of integer scores, which can be easily used by clinicians in the endoscopy room.
The present study had several limitations. NSAIDs were not evaluated because prophylactic NSAIDs had not been used at the three hospitals during the study period. In Japan, the national health insurance system only covers up to 50 mg of diclofenac or indomethacin, and the effectiveness of low-dose NSAIDs has not been demonstrated. Another limitation is that the present study was ERCP-based, not patient-based. Repeated ERCPs in the same patient were treated as independent procedures, and this may have induced clustering effects. However, the clinical courses of first and second ERCPs were not necessarily similar, even in the same patient. Therefore, we included all ERCPs.
In conclusion, this scoring system will serve as a useful prediction tool for PEP in clinical practice. For high-risk patients with a score of 3 or higher, we recommend aggressive preventive measures and close monitoring after ERCP.