Superiority of the new sex‐adjusted models to remove the female disadvantage restoring equity in liver transplant allocation

Model for End‐stage Liver Disease (MELD) and MELDNa are used worldwide to guide graft allocation in liver transplantation (LT). Evidence exists that females are penalized in the present allocation systems. Recently, new sex‐adjusted scores have been proposed with improved performance with respect to MELD and MELDNa. GEMA‐Na, MELD 3.0 and sex‐adjusted MELDNa were developed to improve the prediction of 90‐day dropout from the list. The present study aimed to evaluate the accuracy and calibration of these scores in an Italian setting.


| INTRODUCTION
… 2-4 Starting in 2016, MELD Sodium (MELDNa) replaced the MELD score, thanks to its higher ability to predict survival and to stratify patients on the waiting list into risk classes. 5,6 … 9-12 This phenomenon led to various efforts to conceive a more equitable system. Attempts have been made at sex-balancing creatinine among the MELDNa variables, although this proved ineffective because of the inherent limitations related to muscle loss and sarcopenia in most patients with CLD. 13 Recently, MELD 3.0 was developed using a simulated allocation system based on data from all the candidates registered on the liver transplant waiting list in the 2016-2018 US national registry. Female sex and serum albumin were added to the MELDNa variables, allowing the reclassification of 8.8% of decedents to higher scores and, therefore, to better chances of receiving LT, particularly female patients with lower creatinine values due to smaller muscle mass. 14 Furthermore, a single-centre experience from the United States also developed a sex-adjusted MELDNa score, in which female LT recipients showed more features of decompensation despite lower median MELDNa scores. 15 Finally, a large population (n = 7682) from the UK Transplant Registry was used to create the Gender-Equity Model for liver Allocation sodium (GEMA-Na): this score was externally validated in an Australian cohort (n = 1638), showing improved discrimination and a significant reclassification benefit compared with existing scores. 16 Italian LT patients form a peculiar population owing to specific donor-related (e.g. up to 50% prevalence of extended criteria donors) and recipient-related characteristics (e.g. older age). 17 The Italian allocation system is based on MELDNa, offering national priority to patients with MELDNa ≥30.
18,19 How the novel sex-adjusted scores 14-16 fit into such a particular setting has not yet been investigated. Moreover, the discrimination ability and calibration of these scores have not been externally validated in a multicentre population.
We hypothesized that the recently proposed scores predict dropout more accurately than the unadjusted scores in CLD patients awaiting LT.
Using the data of four LT centres in Rome, Italy, 20 we aimed to evaluate the predictive capacity of MELD 3.0, sex-adjusted MELDNa and GEMA-Na scores in an independent Italian cohort.

| Study design
This is a retrospective multicentre observational study investigating the data of cirrhotic patients listed for a first LT. Data were prospectively recorded for allocation purposes.
The Institutional Review Board of the coordinating centre approved the present study. The study is reported according to the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines.

| Setting
The study involved four LT centres in Rome, namely Sapienza University, Tor Vergata University, Catholic University and San Camillo Hospital. The four collaborating centres shared the listing criteria, the donor pool and the allocation system.

| Population
We performed a retrospective analysis of the data of 855 adult cirrhotic patients listed for LT. The study was performed following the ethical standards of the Declaration of Helsinki.

| Outcomes
The primary outcome of the study was dropout up to 90 days from the time of listing. Dropout was defined as death on the waiting list or delisting due to clinical deterioration. Patients were censored if they remained alive and active on the waiting list at 90 days, or if they underwent transplantation or were removed for reasons other than worsening before that timepoint. The last follow-up date was 31 March 2019.
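As an illustration of how this endpoint can be derived from waiting-list records, the Python sketch below maps each patient to a (time, event) pair. The record fields (`days_followed`, `days_to_exit`, `exit_reason`) and the reason codes are hypothetical names chosen for the example, not the structure of the registry used in the study. Transplantation and removal for other reasons are treated as censoring, mirroring the definition above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical exit-reason codes that count as dropout
# (death on the list or delisting for clinical deterioration).
DROPOUT_REASONS = {"death", "delisted_worsening"}


@dataclass
class ListingRecord:
    days_followed: float                  # days from listing to last contact
    days_to_exit: Optional[float] = None  # days from listing to leaving the list
    exit_reason: Optional[str] = None     # e.g. "death", "transplant", "other"


def dropout_90d(rec: ListingRecord, horizon: float = 90.0) -> Tuple[float, int]:
    """Return (time in days, event flag) for the 90-day dropout endpoint."""
    if rec.days_to_exit is not None and rec.days_to_exit <= horizon:
        # Left the list within the window: an event only for dropout reasons;
        # transplantation or removal for other reasons is censored at that time.
        return rec.days_to_exit, int(rec.exit_reason in DROPOUT_REASONS)
    # Still listed, or left after the window: censored at 90 days
    # or at the last follow-up, whichever comes first.
    return min(rec.days_followed, horizon), 0
```

A competing-risk formulation, in which transplantation is a competing event rather than a censoring event, is a possible alternative; it is not the approach described above.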

| Definitions
For the calculation of MELD, MELDNa, MELD 3.0 with or without albumin, sex-adjusted MELDNa and GEMA-Na, the following standard formulas were used, as previously described: … and 137 mmol/L. 6 Sex-adjusted MELDNa was calculated according to the score map proposed by Sealock et al. 15 GEMA was calculated according to the formula: … The Royal Free Hospital Glomerular Filtration Rate (RFH-GFR) was calculated at waiting-list inclusion according to the original formula: $45.9 \times \text{creatinine}^{-0.836} \times (\text{urea} \ldots$ All the models were calculated at waiting-list inclusion, and the resulting values were rounded to the nearest whole number. For patients receiving exception points (e.g. HCC), the laboratory scores were calculated without adding extra points.
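Because the formulas in the paragraph above were lost, the sketch below uses the widely published OPTN/UNOS formulations of MELD and MELDNa (sodium bounded between 125 and 137 mmol/L) and the MELD 3.0 equation with albumin reported by Kim et al. These are assumptions about the variants used here, not the authors' own code. Units are assumed to be mg/dL for bilirubin and creatinine, g/dL for albumin and mmol/L for sodium. Sex-adjusted MELDNa (a score map) and GEMA-Na (which requires RFH-GFR) are not reproduced.

```python
import math


def _clamp(x: float, lo: float, hi: float) -> float:
    """Bound x to the interval [lo, hi]."""
    return max(lo, min(x, hi))


def meld(bilirubin: float, inr: float, creatinine: float, dialysis: bool = False) -> float:
    """Unrounded MELD (OPTN form): values below 1.0 set to 1.0, creatinine capped at 4.0."""
    bili = max(bilirubin, 1.0)
    inr_ = max(inr, 1.0)
    cr = 4.0 if dialysis else _clamp(creatinine, 1.0, 4.0)
    return 10 * (0.957 * math.log(cr) + 0.378 * math.log(bili)
                 + 1.120 * math.log(inr_) + 0.643)


def meld_na(bilirubin: float, inr: float, creatinine: float, sodium: float,
            dialysis: bool = False) -> float:
    """Unrounded MELDNa (OPTN form); the sodium term is applied only when MELD > 11."""
    m = meld(bilirubin, inr, creatinine, dialysis)
    if m <= 11:
        return m
    na = _clamp(sodium, 125.0, 137.0)
    return m + 1.32 * (137 - na) - 0.033 * m * (137 - na)


def meld_3(female: bool, bilirubin: float, inr: float, creatinine: float,
           sodium: float, albumin: float, dialysis: bool = False) -> float:
    """Unrounded MELD 3.0 with albumin; creatinine capped at 3.0, albumin bounded 1.5-3.5."""
    bili = max(bilirubin, 1.0)
    inr_ = max(inr, 1.0)
    cr = 3.0 if dialysis else _clamp(creatinine, 1.0, 3.0)
    na = _clamp(sodium, 125.0, 137.0)
    alb = _clamp(albumin, 1.5, 3.5)
    return (1.33 * (1 if female else 0)
            + 4.56 * math.log(bili)
            + 0.82 * (137 - na)
            - 0.24 * (137 - na) * math.log(bili)
            + 9.09 * math.log(inr_)
            + 11.14 * math.log(cr)
            + 1.85 * (3.5 - alb)
            - 1.83 * (3.5 - alb) * math.log(cr)
            + 6)
```

As in the text, the final value would be rounded to the nearest whole number, e.g. `round(meld_na(2.0, 1.5, 2.0, 130.0))`. The official OPTN calculator also rounds intermediate terms, which this simplified sketch does not.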

| Statistical analysis
Continuous variables were reported as medians and interquartile ranges (IQRs). Categorical variables were reported as numbers and percentages. Missing data for study covariates always involved fewer than 10% of patients. In all cases, missing data were handled with single imputation, replacing each missing value with the median of nearby points. The median rather than the mean was used because of the potentially skewed distribution of the variables.
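The single-imputation step can be sketched as follows: each missing value is replaced by the median of the closest valid observations on either side of it, in the order in which the records are stored. This follows the idea of the "median of nearby points" option of SPSS Replace Missing Values; the span of two valid values per side is an assumption, and edge handling in SPSS may differ from this simplified version.

```python
from statistics import median
from typing import List, Optional, Sequence


def impute_median_nearby(values: Sequence[Optional[float]], span: int = 2) -> List[float]:
    """Replace each None with the median of up to `span` valid values on each side.

    Neighbours are taken from the original (non-imputed) values only.
    """
    result: List[float] = []
    for i, v in enumerate(values):
        if v is not None:
            result.append(v)
            continue
        before = [x for x in values[:i] if x is not None][-span:]
        after = [x for x in values[i + 1:] if x is not None][:span]
        neighbours = before + after
        if not neighbours:
            raise ValueError("no valid neighbouring values to impute from")
        result.append(median(neighbours))
    return result
```

For example, in `[1.0, 2.0, None, 4.0, 10.0]` the gap is filled with the median of 1, 2, 4 and 10, i.e. 3.0; the median is less affected than the mean by the outlying value 10.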
The discriminatory ability of GEMA-Na, sex-adjusted MELDNa and MELD 3.0 with and without albumin was tested against MELD and MELDNa. The ability of each model to rank listed patients according to their risk of dropout within 90 days was evaluated with the c-statistic using the method of Harrell et al. The c-statistics and their 95% CIs were reported. 21 The method described by Kang et al. was used to compare the c-statistics. 22 The Brier score and the Brier Skill score were reported to explore the additional accuracy in predicting the risk of 90-day dropout provided by the newly proposed scores. 23 A positive Brier Skill score indicates better performance of the tested model than MELD, which was considered the reference score.
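Harrell's c-statistic for the 90-day dropout endpoint can be sketched as below: among all usable pairs (those in which the patient with the shorter follow-up had the event), it is the fraction in which that patient also had the higher score, with tied scores counted as one half. This is a simplified O(n²) illustration that skips pairs with tied follow-up times; it does not reproduce the 95% CIs or the Kang et al. comparison of correlated c-statistics.

```python
from typing import Sequence


def harrell_c(time: Sequence[float], event: Sequence[int], score: Sequence[float]) -> float:
    """Harrell's concordance index; a higher score is taken to mean higher dropout risk."""
    concordant = 0.0
    usable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # only a patient with an observed event can anchor a usable pair
        for j in range(n):
            if time[j] > time[i]:  # j was still at risk when i dropped out
                usable += 1
                if score[i] > score[j]:
                    concordant += 1.0
                elif score[i] == score[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by their time to dropout.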
The models' reliability was tested by comparing expected and observed survival rates. The Greenwood-Nam-D'Agostino test was used to assess the calibration between observed and expected event rates. 24 Bar calibration plots were constructed to visually represent the agreement between predictions and observations in each risk group.
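A minimal sketch of the Greenwood-Nam-D'Agostino statistic, assuming the formulation of Demler et al.: patients are split into G risk groups, and in each group the Kaplan-Meier probability of dropout by 90 days is compared with the mean predicted probability, weighted by the Greenwood variance. The grouping and the predicted probabilities are inputs the analyst must supply; the p-value would then come from a chi-square distribution with G − 1 degrees of freedom (e.g. `scipy.stats.chi2.sf`), which is not computed here.

```python
from typing import Hashable, Sequence, Tuple


def km_failure(times: Sequence[float], events: Sequence[int],
               horizon: float) -> Tuple[float, float]:
    """Kaplan-Meier probability of an event by `horizon` and its Greenwood variance."""
    surv = 1.0
    greenwood = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e and t <= horizon})
    for t in event_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if e and ti == t)
        surv *= 1.0 - deaths / at_risk
        if at_risk > deaths:
            greenwood += deaths / (at_risk * (at_risk - deaths))
    return 1.0 - surv, surv ** 2 * greenwood


def gnd_statistic(times: Sequence[float], events: Sequence[int],
                  predicted: Sequence[float], groups: Sequence[Hashable],
                  horizon: float = 90.0) -> Tuple[float, int]:
    """Return the GND chi-square statistic and its degrees of freedom (G - 1)."""
    chi2 = 0.0
    labels = sorted(set(groups))
    for g in labels:
        idx = [i for i, gi in enumerate(groups) if gi == g]
        observed, variance = km_failure([times[i] for i in idx],
                                        [events[i] for i in idx], horizon)
        expected = sum(predicted[i] for i in idx) / len(idx)
        if variance <= 0:
            raise ValueError(f"zero variance in group {g!r}; merge it with a neighbour")
        chi2 += (observed - expected) ** 2 / variance
    return chi2, len(labels) - 1
```

Demler et al. recommend collapsing groups with very few events before applying the test, which is why the sketch refuses groups with zero variance instead of silently dropping them.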
The reclassification rate was defined as the proportion of patients whose score changed by at least 2 points when MELD was compared with each of the other scores. Patients gaining at least 2 points with the new models were considered upgraded on the list, whereas patients losing at least 2 points were considered downgraded.
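The reclassification rule can be sketched as follows, taking each patient's MELD and new score at listing; the function name and the returned dictionary are illustrative choices, not part of the original analysis.

```python
from typing import Dict, Sequence


def reclassification(meld_scores: Sequence[float], new_scores: Sequence[float],
                     threshold: float = 2) -> Dict[str, float]:
    """Count patients upgraded or downgraded by at least `threshold` points versus MELD."""
    if len(meld_scores) != len(new_scores):
        raise ValueError("score lists must have the same length")
    pairs = list(zip(meld_scores, new_scores))
    upgraded = sum(1 for m, s in pairs if s - m >= threshold)
    downgraded = sum(1 for m, s in pairs if m - s >= threshold)
    return {
        "upgraded": upgraded,
        "downgraded": downgraded,
        "rate": (upgraded + downgraded) / len(pairs),
    }
```

For instance, MELD scores of 10, 15, 20 and 25 against new scores of 13, 15, 18 and 26 give one upgrade, one downgrade and a reclassification rate of 0.5.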
To calculate the number of potential lives saved, we first counted the transplantations performed within 90 days in the whole cohort and considered this number equal to the number of organs available within this period. Patients were then ranked according to each prioritization score, and the cohort was stratified into patients who would have been prioritized by both models and patients who would have been prioritized differently by either of them. The number of potential lives saved was obtained by subtracting the number of patients reaching the primary outcome in the group prioritized by the new score from the number of patients reaching the primary outcome in the MELD-prioritized group, divided by the total number of patients included. 16

A p-value of <.05 was considered significant for all analyses, and all tests were two-tailed. We used the SPSS statistical package …

When the population was stratified according to sex, several relevant differences emerged between males and females regarding the underlying liver disease, HCC, anthropometric characteristics and creatinine (Table 1). In detail, HCV and alcohol were more common causes of cirrhosis in males, whereas biliary cirrhosis was more common in females. Notably, median MELD (p = .22) and MELDNa (p = .08) did not differ between the two groups. Conversely, when the sex-adjusted scores were calculated, the median value of each score was significantly higher in females than in males. No differences in dropout were observed between males and females.
In a sub-analysis restricted to females, the c-statistics of all the explored scores declined, with GEMA-Na showing the best concordance (.79; 95% CI = .70-.88; p = .002). No statistically significant differences were observed when the scores were compared with MELD as the reference.

| Dropout prediction improvement with the proposed models
To quantify the magnitude of the prediction improvement provided by the sex-adjusted scores, the Brier score and the Brier Skill score were calculated. Since a perfect model has a Brier score of 0, GEMA-Na had the best (lowest) value (.188) among the scores, followed by MELD 3.0 with albumin (.233) and sex-adjusted MELDNa (.254).
According to the Brier Skill score, MELD always showed lower prediction ability than the new scores. In detail, GEMA-Na had the greatest prediction improvement (4.4%), followed by MELD 3.0 with albumin (3.0%) and sex-adjusted MELDNa (2.4%) (Table 3).
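The two accuracy measures can be sketched as below. The Brier score is the mean squared difference between the predicted probability of 90-day dropout and the observed 0/1 outcome; the Brier Skill score expresses the relative reduction in Brier score against a reference model, here MELD, as $\text{BSS} = 1 - \text{BS}_{\text{model}} / \text{BS}_{\text{MELD}}$. The sketch assumes each score has already been converted into a 90-day probability (the conversion model is not specified here) and ignores censoring, which censoring-weighted Brier scores would handle.

```python
from typing import Sequence


def brier_score(pred: Sequence[float], outcome: Sequence[int]) -> float:
    """Mean squared difference between predicted dropout probability and the 0/1 outcome."""
    if len(pred) != len(outcome):
        raise ValueError("pred and outcome must have the same length")
    return sum((p - y) ** 2 for p, y in zip(pred, outcome)) / len(pred)


def brier_skill_score(bs_model: float, bs_reference: float) -> float:
    """1 - BS_model / BS_reference; positive when the model beats the reference."""
    return 1.0 - bs_model / bs_reference
```

With illustrative values (not taken from this study), a model Brier score of 0.19 against a reference of 0.20 gives a skill score of 0.05, i.e. a 5% improvement over the reference.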
As shown in Figure 1, all the tested scores showed good calibration when the predicted dropout events were compared with the observed ones.
Within the first 90 days after inclusion, using GEMA-Na instead of MELD could potentially save one in nine dropouts, corresponding to one dropout saved per 285 patients included. Using MELD 3.0 with albumin instead of MELD would potentially save one in 13 dropouts, or one dropout per 428 patients included. Similarly, using sex-adjusted MELDNa instead of MELD would potentially save one in 13 dropouts, or one dropout per 428 patients included (Table 3).
In a sub-analysis including only women, GEMA-Na showed the best results: using GEMA-Na instead of MELD would potentially save one in three dropouts, or one dropout per 54 women included. All the other scores showed smaller numbers of patients saved.
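The lives-saved calculation described in the Methods can be sketched as a simple simulation: the same number of grafts is assigned to the top-ranked patients under MELD and under the new score, and only the differently prioritized patients are compared. The sketch follows one interpretation, which is an assumption: a dropout counts as avoidable when the patient would have been prioritized only by the new score, and as newly caused when the patient would have been prioritized only by MELD. Ties are broken by input order, also an assumption. For orientation, "one dropout per N patients" is the cohort size divided by the net number saved: 855 / 285 = 3 for GEMA-Na, while 855 / 2 = 427.5, rounded to the reported 428.

```python
from typing import Dict, Sequence, Set


def top_k(scores: Sequence[float], k: int) -> Set[int]:
    """Indices of the k highest scores; ties are broken by input order (assumption)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    return set(order[:k])


def lives_saved(meld: Sequence[float], new: Sequence[float],
                dropout: Sequence[int], n_organs: int) -> Dict[str, float]:
    """Net dropouts avoided by allocating `n_organs` grafts with the new score instead of MELD."""
    top_meld = top_k(meld, n_organs)
    top_new = top_k(new, n_organs)
    new_only = top_new - top_meld   # would receive a graft only under the new score
    meld_only = top_meld - top_new  # would receive a graft only under MELD
    saved = sum(dropout[i] for i in new_only) - sum(dropout[i] for i in meld_only)
    return {
        "differently_prioritized": len(new_only) + len(meld_only),
        "saved": saved,
        "patients_per_life_saved": len(meld) / saved if saved > 0 else float("inf"),
    }
```

With five patients and two grafts, a new score that moves the two patients who dropped out into the top two yields two dropouts saved, i.e. one per 2.5 patients.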

| DISCUSSION
This study evaluated the reliability and accuracy of the recently developed sex-adjusted models for reclassifying patients with cirrhosis awaiting LT in a homogeneous Italian multicentre cohort. Our …

TABLE 1 General patient characteristics and stratification according to sex.

FIGURE 1 Concordance between observed and predicted curves in the investigated scores.

… proposed model represents the critical point for its large-scale application, as suggested by the criticism of MELD 3.0 following the study of Kim et al. 25,26 This is the first study investigating the calibration of MELD 3.0 and sex-adjusted MELDNa. By contrast, the calibration and reclassification abilities of GEMA-Na have already been investigated in the original study proposing the score. 16 … 28-31 In this perspective, an evolving transplant candidate population needs continuous refinement of organ allocation tools to ensure and maintain fairness in LT.
Furthermore, the enlargement of the donor pool and the availability of novel algorithms capable of stratifying the risk of early allograft failure prompted us to refine the scores used to predict recipient outcomes. 17,32 There has been growing awareness of sex-related inequities in recent years. … 34,35 Nevertheless, to date, none of the tools proposed to mitigate these inequities in LT has obtained multicentre external validation, and none has reached widespread use in clinical practice. 11,36 The recent proposal of replacing MELDNa with the sex-balanced MELD 3.0 has rekindled the debate on disparities in access to LT. 14 MELD 3.0 allowed a net reclassification of 8.8% of deceased patients (14.9% for women) who would have had better chances of survival had the system been in place. 14 In parallel, Sealock et al. demonstrated sex-related differences in all the variables of the MELDNa score. The logarithmic nature of the score magnifies these differences, resulting in higher MELDNa values in males than in females despite higher decompensation counts in female LT candidates. 15 A score map of sex-adjusted MELDNa values was derived from the laboratory values of a single-centre mixed population of healthy controls, patients with CLD and LT recipients. This had a less pronounced effect on the reclassification of patients on the waiting list (i.e. female individuals had a 1% higher transplant rate compared with a .7% higher rate for male individuals). 15 Rodríguez-Perálvarez et al. created the GEMA-Na score using a very rigorous statistical methodology. 16 A European population was used to create and internally validate the score, while an Australian cohort was used for external validation. GEMA-Na showed improved discrimination in predicting mortality or delisting due to clinical deterioration within the first 90 days after waiting-list inclusion compared with MELDNa (internal validation: .77 vs. .74, p = .006; external validation: .77 vs. .75, p = .014). Considering the patients who would have been differently prioritized for transplantation, one in 21 deaths could potentially be avoided by using GEMA-Na instead of MELDNa, showing improved discrimination and a significant reclassification benefit. 16

The composition of the CLD population in Italy has its peculiarities, mainly older donor and recipient age and a higher prevalence of HCC. Our work is the first to test the fit of the United States-derived MELD 3.0 and sex-adjusted MELDNa in an LT population outside the United States, and to explore the performance of GEMA-Na in a Southern European population.
Females accounted for 19.1% of our population, while the female-to-male ratio of Italian CLD patients reported in the literature ranges between 1:3 and 1:7. 28,31,37,38 However, historical data do not reflect the evolution in the epidemiology of decompensated liver cirrhosis due to the introduction of HCV direct-acting antivirals (DAAs). Looking at sex differences by aetiology, male sex clearly predominates in alcoholic cirrhosis, which is currently the first indication for LT, with a male-to-female ratio of 2.3. In a recent European study analysing the ELTR database for post-transplant outcomes, males clearly outnumbered females (70.5% vs. 29.5%). Nevertheless, few data on sex differences are reported in the literature from an intention-to-treat perspective, which is also a limitation of our work. 39 The American series 14 reported a proportion almost twice as high (36.8%). As documented in numerous studies, our data confirm the imbalance between female patients potentially needing LT and those who actually receive it.
Analysing different MELD-based scoring systems, we compared the performance of each in our study population and confirmed the improved dropout prediction ability of the novel scores compared with MELD. However, the new sex-adjusted scores do not appear to add significant discriminatory power compared with MELDNa, which is currently used in most allocation systems.
To quantify the magnitude of the prediction improvement provided by the novel scores, we applied the Brier Skill score. 23 Interestingly, with MELD set as the reference, we observed improved performance of the sex-adjusted scores compared with currently widely used scores for the prediction of 90-day dropout, with GEMA-Na being the best score, with a 4.4% prediction improvement.
Focussing on the number of avoidable dropouts after reclassifying patients with the new scores, it is noteworthy that one in nine dropouts could theoretically be avoided using GEMA-Na, with one dropout avoided for every 285 patients listed for transplantation. Even more strikingly, when only women were considered, using GEMA-Na instead of MELD would potentially save one in three dropouts, or one dropout per 54 women included.

Correlation plot of MELD and GEMA-Na. Based on the number of transplanted patients after the first 90 days (n = 296), the highest-ranked patients according to each score separately were assigned a liver graft, as represented by the horizontal (graft granted by MELD) and vertical (graft granted by GEMA-Na) lines.
This finding potentially extends the role of the novel scores to a previously untested population that is epidemiologically different, if not unique.
As reported by Sealock et al., 15 female LT candidates had lower values of the MELDNa components and of the score itself, despite a higher number of recorded decompensation events before transplant. It is plausible that the lower MELDNa scores observed in women affect the timing of listing.
Female patients may reach the threshold for listing (MELDNa 15 in many transplant programmes 40,41) only after a longer disease course and more CLD-related complications. Therefore, many female patients are presumably deprived of the opportunity of LT because of intervening clinical deterioration and fatal complications. In this context, sex-corrected scores may allow timely access to the list, thus increasing the chance of receiving a life-saving LT.
Our study suggests that the novel sex-adjusted scores have the potential to minimize and possibly resolve sex-related inequality in access to LT, as indicated by the higher median values of these scores in the female group.
In detail, in our series, 6.2% of patients had biliary cirrhosis as the indication for LT (16.0% of females vs. 3.9% of males). Biliary cirrhosis mainly results from primary biliary cholangitis (PBC) and primary sclerosing cholangitis (PSC). PBC has an overwhelming female predominance (92%), while PSC has a 2:1 male-to-female ratio. There is some evidence of higher waitlist mortality and dropout in patients with PBC, especially in the higher MELDNa categories, 42 whereas patients with PSC show a lower mortality rate. 42,43 MELD and MELDNa have lower predictive ability in patients with PBC than in other aetiologies and prove to be suboptimal tools in this category of patients. 44,45 It is possible to hypothesize, at least in part, a sex-driven effect to explain this, and to speculate that the new sex-based scores can correct or mitigate it. New, specifically designed studies are needed to address this issue.
Recent evidence in cirrhotic patients with ascites has shown that long-term albumin administration improves survival, 46 prevents complications 47 and reduces the recurrence of ascites. 48

Our work has some limitations. First, the study is retrospective, although this characteristic is shared by most studies testing new prognostic scores in LT. The introduction of such scores into clinical practice will inevitably require a prospective, real-life testing phase to verify their effects on liver graft allocation and on the waiting list, as pointed out in some comments on the Kim and Sealock papers. 25,27,49 Moreover, unlike Kim et al., we included patients with HCC in the analysed population. As dropout due to tumour progression was not considered among the study outcomes, we believe this aspect is unlikely to have affected our results significantly, also considering that dropout from the list because of death from tumour progression is a rare event.
Another relevant issue is that the limited sample size of the present study and the small number of events at 90 days prevented specific analyses of model accuracy in women. In addition, the study was not powered to detect differences in waiting-list outcomes between men and women.
In conclusion, sex adjustment of the scores used for organ allocation could be a useful tool for removing the penalization of female candidates. In our cohort, the novel scores outperformed MELD. However, no statistically significant differences in discrimination ability were observed when the new scores were compared with MELDNa.
Yet several issues remain unsolved, such as how to weight muscle mass and height in CLD patients (e.g. tall women with greater muscle mass should not be prioritized over men with the opposite characteristics) and the type of kidney dysfunction.
Recently, a height-adjusted MELD has been proposed using a simulation algorithm on data from the US Scientific Registry of Transplant Recipients (SRTR). This approach could provide an additional tool to counterbalance the penalization of women in the allocation system without overcorrecting for taller women.

FUNDING INFORMATION
None for the present work.

CONFLICT OF INTEREST STATEMENT
The authors have no conflicts of interest to declare about the present study.

The study population comprised adult (≥18 years) cirrhotic patients listed for LT from 1 January 2012 to 31 December 2018. Non-inclusion criteria were (a) age <18 years, (b) non-cirrhotic disease or acute liver failure as the main indication for waiting-list inscription, (c) combined transplantations and (d) re-transplantation. Patients with hepatocellular carcinoma (HCC) on cirrhosis were included in the study. Although the MELD value of HCC patients was corrected for the allocation process by adding exception points, 19 this study considered HCC patients with their 'biochemical' MELD and not with HCC exception points.

Conclusions: Validation and reclassification of the sex-adjusted score GEMA-Na confirm its superiority over MELD in predicting short-term dropout, also in an Italian setting.

KEYWORDS: allocation, cirrhosis, equity, GEMA, liver transplantation, MELD Na, MELD, MELD 3.0, Na, sex

Key points
Waitlist priority for liver transplantation is defined through prognostic scores (namely, MELD and MELDNa). In our population, new scores considering sex (GEMA-Na, MELD 3.0 and sex-adjusted MELDNa) better predict the risk of death and delisting, thus improving equity in organ allocation.
Albumin administration can clearly affect the MELD 3.0 score with albumin. No data on albumin supplementation were available in our database. How albumin supplementation affects the ability of the score to predict outcomes in cirrhotic patients with ascites listed for LT is unknown. Future studies are needed to balance the potential beneficial effects of long-term albumin supplementation against the reduced chance of LT resulting from a lower MELD 3.0 score.

A balance between further refinements to MELD-based allocation systems and avoiding over-sophisticated scores should be the objective of future research, to improve equity without hampering the usability and widespread adoption of novel systems.

AUTHOR CONTRIBUTIONS
Giuseppe Marrone, Valerio Giannelli, Alfonso Wolfango Avolio, Gabriele Spoletini and Quirino Lai are responsible for the conception, design, analysis and writing of the study; Giuseppe Marrone, Valerio Giannelli, Flaminia Ferri, Ilaria Lenci, Raffaella Lionetti, Martina Milana, Tommaso Maria Manzia and Quirino Lai were involved in the collection and interpretation of data; Giuseppe Marrone, Salvatore Agnes, Alfonso Wolfango Avolio, Leonardo Baiocchi, Giammauro Berardi, Giuseppe Maria Ettorre, Stefano Ginanni Corradini, Antonio Grieco, Nicola Guglielmo, Massimo Rossi, Gabriele Spoletini, Giuseppe Tisone and Quirino Lai participated in data management, review and editing of the manuscript.

Variables | Total, N = 855, median (IQR) or n (%) | Males, 692 (80.9%) | Females, 163 (19.1%) | p-value
TABLE 2 Concordance for MELD, MELDNa, sex-adjusted MELDNa, MELD 3.0 with and without albumin, and GEMA-Na. MELD score used as reference.
a Evaluated with the method described by Kang et al. 22
Abbreviations: GEMA-Na, Gender-Equity Model for liver Allocation Sodium; MELD, Model for End-stage Liver Disease; MELDNa, Model for End-stage Liver Disease Sodium.

TABLE 3 Accuracy of sex-adjusted scores compared with currently widely used scores for the prediction of 90-day dropout. MELD is the reference.