The Kidney Allocation Score: Methodological Problems, Moral Concerns and Unintended Consequences

Authors

Benjamin Hippen*


* Corresponding author: Benjamin Hippen, benjaminhippen@gmail.com

Abstract

The growing disparity between the demand for and supply of kidneys for transplantation has generated interest in alternative systems of allocating kidneys from deceased donors. This personal viewpoint focuses attention on the Kidney Allocation Score (KAS) proposal promulgated by the UNOS/OPTN Kidney Committee. I identify several methodological and moral flaws in the proposed system, concluding that any iteration of the KAS proposal should be met with more skepticism than sanguinity.

Introduction

The impetus for replacing the current system of allocation comes from a broad consensus in the transplant community that current approaches to increasing supply or decreasing demand are insufficient, and that the current system of organ allocation is suboptimal in morally compelling ways. Specifically, the UNOS/OPTN Kidney Committee has argued that the current system (i) places extended-criteria donor kidneys inefficiently, resulting in high discard rates; (ii) engenders unpredictable patterns of organ allocation; (iii) produces variable access to organs based on blood group and geography, as well as lower rates of transplantation for highly sensitized recipients; and (iv) produces demographic mismatches between donors and recipients, leading to more death with graft function and to more organs of poorer quality being transplanted into healthier recipients, which increases the need for retransplantation (1), p. 10–11. The Kidney Allocation Score (KAS) is offered as a remedy to each of these alleged inadequacies.

The variables incorporated in and excluded from the life-years from transplant (LYFT) component of the KAS are listed in Table 1. LYFT is defined as the estimated number of years of life with a transplant minus the estimated number of years of life remaining on dialysis, adjusted for quality of life. While the Committee has registered opposition to using LYFT maximization as the sole measure of how organs are allocated, LYFT calculations are offered as a remedy for the ‘wastage’ of kidneys under the current system, with the example of an ‘old’ donor organ transplanted into a ‘young’ recipient most frequently cited.
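The definition of LYFT can be illustrated with a deliberately simplified calculation. This is a minimal sketch, not the Committee's model: the actual LYFT calculations estimate expected survival from models fitted to registry data (5), whereas here the expected durations are supplied directly as hypothetical inputs. The 0.8 weight for dialysis years is the quality-of-life adjustment attributed to the Kidney Committee; applying the same weight to dialysis years after graft failure is an assumption made for illustration.

```python
def lyft(graft_years, dialysis_years_after_failure, dialysis_only_years, qol_dialysis=0.8):
    """Hypothetical, simplified life-years-from-transplant (LYFT) calculation.

    Years with a functioning graft count fully; years on dialysis are weighted by
    qol_dialysis (the 0.8 quality-of-life adjustment cited for the Kidney Committee).
    Weighting post-failure dialysis years the same way is an illustrative assumption.
    """
    # Quality-adjusted life-years if the candidate is transplanted.
    with_transplant = graft_years + qol_dialysis * dialysis_years_after_failure
    # Quality-adjusted life-years if the candidate remains on dialysis.
    without_transplant = qol_dialysis * dialysis_only_years
    return with_transplant - without_transplant


# Hypothetical candidate: 10 years of graft function followed by 2 years back on
# dialysis, versus 7 years of survival if left on dialysis.
print(round(lyft(10, 2, 7), 2))  # 6.0
```

Under these assumptions the candidate's LYFT is 11.6 minus 5.6, or 6.0 quality-adjusted life-years. The subtraction itself is trivial; the concerns raised below are about how reliably the survival estimates feeding it can be predicted.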

Table 1.  Variables included in and excluded from calculations of life-years from transplant (LYFT) (adapted from (1))

Included variables
 Candidate age at offer
 Zero antigen mismatch
 Degree of mismatch at the HLA-DR loci
 Candidate and donor located in same donor service area
 Donor after cardiac death
 Donor age
 Donor cause of death
 Donor CMV serology
 Donor hypertension
 Donor weight
 Candidate years on dialysis at offer
 Candidate BMI
 Candidate albumin
 Candidate diabetes status
 Candidate previous transplant
 Candidate CPRA
 Candidate diagnosis of polycystic disease
Excluded variables¹
 Candidate diagnosis of angina
 Candidate cerebrovascular disease
 Candidate peripheral vascular disease
 Candidate previous malignancy
 Candidate gender
 Candidate insurance status
 Candidate diagnosis of hypertension or drug-treated hypertension
 Candidate type of dialysis
 Candidate race/ethnicity
 Candidate HLA-A and HLA-B loci
 Candidate diagnosis of glomerulonephritis
 Candidate diagnosis of hypertension
¹Exclusion criteria consisted of lack of ‘objectivity, statistical significance, clinical importance, or data quality’.

Accordingly, the Kidney Committee has sought comment on how to integrate two other variables into the KAS to mitigate the undesirable consequences of a LYFT-only system. Since time on dialysis has long been recognized as a significant risk factor for early graft loss (2), patients with an extended vintage on dialysis would be disadvantaged under a LYFT-dominated allocation system, hence the suggestion that a patient's time on dialysis also be taken into account. Since elderly candidates would also be a priori disadvantaged by LYFT, a mitigating ‘Donor Profile Index’ is introduced primarily to preferentially siphon kidneys from young donors to young recipients and the growing number of kidneys available from extended-criteria donors to older recipients.

However, using LYFT calculations for allocation in any fashion raises methodological and moral concerns, and may give rise to unintended but foreseeable consequences (Table 2). The methodological concerns take precedence, since whether or not LYFT ought to be part of how organs are allocated depends entirely on whether LYFT calculations are accurate. In the language of Kantian moral philosophy, ‘ought implies can’ (3). If the reproducible accuracy of LYFT calculations is doubtful, the moral case for introducing LYFT into a system of allocation is moot.

Table 2.  Overview of objections to the KAS
Methodological flaws of moral interest
 No prospective validation of any iteration of KAS, in the context of the failure of prior prognostic studies
 Low index of concordance
   • LYFT calculations cannot reproducibly provide estimates of prognosis, and therefore cannot sustain supporting moral claims for increasing life-years from transplant
 Risk factors may not be prognostic factors
   • Unintentional discrimination without increasing LYFT
 Limitations of the data set
   • Use of diagnostic categories with insufficient granularity (e.g. diabetes)
Moral concerns
 Little transparency about which patient subgroups would be disadvantaged by KAS
 No assessment of harms to balance claims of benefit from KAS
 No assessment of variable interpersonal comparisons of utility
 Insufficient discussion of why special dispensation for some groups at the expense of others is justified
Unintended, foreseeable consequences
 No account of foreseeable, undesirable effects of implementing KAS on trends in living donation
 No forward projections of the effect on the waiting list of preferential allocation to patients with high LYFT scores
   • Homogenization of the list and gaming KAS at the margins
   • ‘Trolling’ for organs by listing noncandidates with high LYFT scores

Methodological Concerns About LYFT

A crucial concern regarding the accuracy of LYFT calculations is that the model itself is based entirely on a retrospective analysis of prospectively collected data, used to estimate future outcomes. There has been no prospective validation of LYFT calculations to date. This should give pause since, as Meier-Kriesche and Kaplan have shown (Figure 1), past projections of allograft half-lives have proven to be vast overestimates compared with observed outcomes (4).

Figure 1.

Projected versus actual Kaplan–Meier allograft half-lives, 1988–1995. From (4).

These past discrepancies impose a higher burden of proof on new projections of allograft outcomes, and a recent report by Wolfe and colleagues on the methodological underpinnings of LYFT only heightens this concern (5). One way of establishing the internal validity of a predictive formula is to examine its index of concordance (IOC, also called the C-statistic) within a data set. Typically, the data set is divided into two demographically similar groups. A predictive formula derived from the first group is then applied to the second group, and the IOC measures how well its predictions discriminate between patients in the second group who do and do not experience the outcome. In practice the IOC ranges from 0.5 to 1.0: an IOC of 0.5 indicates that the prediction is no more accurate than chance, like a coin flip, whereas an IOC of 1.0 reflects perfect discrimination. Applied to the LYFT calculations, the IOC was 0.60 for waitlist survival, 0.68 for patient survival and 0.57 for graft survival (5). In short, the transplant community is being asked to invest substantial moral confidence in a survival calculation that has never been prospectively validated and whose predictive accuracy is only slightly better than chance. A score that poorly predicts survival cannot carry the weight of moral arguments that depend on accurate and reproducible predictions of survival.
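The 0.5-to-1.0 scale described above is the scale of Harrell's concordance index (the C-statistic): across all comparable pairs of patients, it is the fraction in which the patient who failed first had been assigned the higher predicted risk. The following is a minimal Python sketch of that calculation, assuming invented data; in a split-sample check the risk scores would come from a model fitted on the first group, and the times and outcomes from the second. The tie-handling rules below are choices of this sketch, and Wolfe and colleagues' exact procedure may differ.

```python
from itertools import combinations


def concordance_index(times, events, risk):
    """Harrell's concordance index (C-statistic) for right-censored survival data.

    times:  follow-up time for each patient
    events: True if the outcome (e.g. death or graft loss) was observed,
            False if follow-up was censored
    risk:   predicted risk score (higher = expected to fail sooner)
    """
    concordant = 0.0
    usable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # tied follow-up times are skipped in this simplified sketch
        # Let `a` be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # censored first: we cannot tell who failed first
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1.0  # the earlier failure was ranked as higher risk
        elif risk[a] == risk[b]:
            concordant += 0.5  # tied predictions count as half, i.e. a coin flip
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable


times = [1, 2, 3, 4]
events = [True, True, True, True]
print(concordance_index(times, events, [4, 3, 2, 1]))  # 1.0: perfect ranking
print(concordance_index(times, events, [1, 1, 1, 1]))  # 0.5: no better than chance
```

On this scale, a value of 0.57 means the score correctly orders only modestly more than half of the comparable pairs.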

Furthermore, the LYFT calculations include variables which, taken individually, may be associated with waitlist, patient and graft survival, but may not significantly change the IOC of a LYFT calculation. To use a risk factor for prognostic purposes, the risk factor must also be able to discriminate between groups of recipients with different outcomes (e.g. graft survival versus graft loss) with high sensitivity and specificity (6,7). Absent that feature, false-positive and false-negative prognostic calculations abound. If a risk factor does not reliably discriminate between two opposite outcomes, adding that factor to the prognostic model will not meaningfully increase the IOC. The moral import of including such variables in an organ allocation score is to codify discrimination against candidates who fall afoul of them, without increasing the prognostic power of the LYFT calculation, and thus without any reliable assurance that the allocation decision increases life-years from transplant.
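A short worked example shows how a factor can be a genuine risk factor yet a weak prognostic one. The sketch assumes a hypothetical yes/no factor present in 20% of candidates that doubles the risk of an outcome from 10% to 20%. For a single binary predictor, the C-statistic equals the average of its sensitivity and specificity. The numbers are invented for illustration and are not drawn from the LYFT data.

```python
def binary_factor_c(prevalence, risk_exposed, risk_unexposed):
    """C-statistic (AUC) of a single yes/no risk factor used alone to predict an outcome.

    For a binary predictor, AUC = (sensitivity + specificity) / 2.
    """
    # Overall probability of the outcome in the population.
    p_event = prevalence * risk_exposed + (1 - prevalence) * risk_unexposed
    # P(factor present | outcome occurred)
    sensitivity = prevalence * risk_exposed / p_event
    # P(factor absent | outcome did not occur)
    specificity = (1 - prevalence) * (1 - risk_unexposed) / (1 - p_event)
    return (sensitivity + specificity) / 2


# Hypothetical factor present in 20% of candidates that doubles risk (10% -> 20%).
print(round(binary_factor_c(0.20, 0.20, 0.10), 3))  # 0.576
```

Despite doubling the risk, the factor on its own discriminates only modestly better than chance (about 0.58, against 0.5 for a coin flip), because two-thirds of the patients who experience the outcome do not carry the factor.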

Why does LYFT perform poorly? Any answer is of necessity speculative, but one tenable hypothesis can be drawn from the limits of the data set itself. To take one example, the diagnosis of diabetes in LYFT is based entirely on whatever criteria happen to be employed by the individuals reporting the diagnosis to the registry. Accordingly, the definition of diabetes used in the LYFT calculation lumps together diabetes as the cause of ESRD and diabetes as a comorbidity; it does not distinguish patients with end-organ damage from diabetes, which is clearly correlated with patient outcomes (e.g. coronary artery disease, retinopathy), from patients without it; and it cannot account for the duration or control of diabetes. Although this lack of granularity is a built-in limitation of the data set, using this definition of ‘diabetes’ in a LYFT calculation clearly disadvantages patients who carry the diagnosis but not the extensive morbidities of diabetes. The transplant community should not accept a radical restructuring of how organs are allocated based on flawed and substantially incomplete data, even if it is ‘the best we can do’. Pointing to the good-faith efforts of our statisticians, which are not in doubt, is simply not sufficient, morally speaking.

Moral Concerns About LYFT

The purported benefits of incorporating LYFT into a KAS are familiar, if methodologically dubious. But even if these methodological concerns were easily assuaged, conspicuously absent from the discussion of the KAS, in any of its iterations, is any account of which groups of recipients would be disadvantaged by the new system. For example, none of the published iterations of the LYFT models discusses predictions of which listed candidates, and how many, are likely to die on the list. If LYFT is a strict utility calculation (maximizing LYFT), it is missing a feature common to nearly all utility calculations in other contexts: minimizing harm. KAS proponents repeatedly emphasize the benefit of more years from transplant, but what about those who will remain on dialysis or die without a transplant? If years from transplant are a calculable benefit, is not death a calculable harm? If death is a harm to those who die on the waiting list, then how many life-years gained from transplant are required to ‘cancel out’ the harm of death?

Older recipients and interpersonal comparisons of utility

KAS calculations raise other perennial problems for utility functions: incongruities in interpersonal comparisons of utility, and discount rates over time for graft survival. While LYFT does discount time on dialysis relative to time with a transplant (the Kidney Committee included a 0.8 quality-of-life adjustment for years of survival on dialysis vs. transplantation), LYFT treats a year as a year as regards graft survival. Years 15 → 20 off dialysis for a young transplant recipient are valued the same as year 0 → 1 off dialysis for each of five recipients in their 60s at the time of transplant. So, ceteris paribus, if resource limitations compel a zero-sum choice between the two, maximizing LYFT entails that years 15 → 20 off dialysis for a young person (five life-years) are more valuable than years 0 → 4 off dialysis for a single 60-year-old recipient (four life-years). This follows directly from valuing the maximization of the number of life-years for recipients taken as an entire group (8).
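The arithmetic above, and the difference a discount rate for time would make to it, can be sketched as follows. LYFT as described does not discount graft years, which corresponds to a rate of 0.0 below. The 3% annual rate is purely a hypothetical value chosen to show how the comparison changes once later years count for less than earlier ones, and discounting each year from its start is likewise an assumption of the sketch.

```python
def discounted_years(start_year, n_years, annual_rate):
    """Sum n_years of life beginning at start_year, each discounted back to the time of transplant.

    A year beginning at time t receives weight 1 / (1 + annual_rate) ** t;
    an annual_rate of 0.0 reproduces LYFT's "a year is a year" treatment.
    """
    return sum(1 / (1 + annual_rate) ** t for t in range(start_year, start_year + n_years))


YOUNG = (15, 5)  # years 15 -> 20 off dialysis for a young recipient
OLDER = (0, 4)   # years 0 -> 4 off dialysis for a 60-year-old recipient

for rate in (0.0, 0.03):  # 0.03 is a hypothetical discount rate
    young = discounted_years(*YOUNG, rate)
    older = discounted_years(*OLDER, rate)
    print(f"rate={rate:.2f}  young={young:.2f}  older={older:.2f}")
```

Undiscounted, the young recipient's later years prevail (5.00 versus 4.00); at the hypothetical 3% rate the ranking reverses (roughly 3.03 versus 3.83). The choice of rate, if any, is itself a value judgment rather than a purely technical parameter.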

Is this morally uncontroversial? Consider: the fastest-growing cohort of incident ESRD patients consists of those aged 45–64, a trend mirrored by the ages of new incident waitlisted patients (9). Five additional years of life for a 60-year-old, and for that person's family and friends, might be quite valuable indeed, especially when compared to zero additional years of life. In fact, it could plausibly be argued that especially empathetic younger patients who would stand to get 20 rather than 12 or 15 years (with a lower-quality organ) might agree that their additional 5- to 8-year benefit should not come at the expense of the 4 or 5 years added to the life of the 60-year-old patient. But as a societal judgment about allocation policy, any generalization about the attitudes of this or that generation of dialysis-dependent patients is almost certainly wrong, since pluralities of opinion on obligations to others, and on the value and quality of one's life, are by no means easily or usefully demarcated by age.

As an example of the difficulties of generalizing about attitudes toward transplantation based on age, consider the question of whether there should be a ‘discount rate for time’ on dialysis, a slightly different way of thinking about how to integrate ‘dialysis time’ into an allocation system. Suppose one wanted to justify preferentially transplanting the young at the expense of the old. One approach is to argue that time on dialysis is worse for young people than for old people, not because of an abridged life span (the young live longer on dialysis), but because dialysis interferes with the activities more associated with youth (raising a family, holding a full-time job, etc.), with the implication that older persons have already had an opportunity to do these sorts of things and therefore have an inferior moral claim to an organ (10). The problem is that the reasoning can just as plausibly be reversed: such a system would systematically deprive older patients of the opportunity to enjoy their retirement, their adult children and grandchildren, the chance to travel and enjoy their remaining years after a lifetime of taxpaying and work, etc. (11).

Group membership and special pleading

Even when it is possible to accurately prognosticate utility gains based on being a member of a demographic group, the emergence of public, vociferous conflict between groups is entirely foreseeable if the transition is made to allocating a scarce resource on the basis of group membership, rather than based on individual need for an organ. Under conditions of scarcity, of all candidates who might potentially benefit, some will be advantaged at the expense of others. A kidney allocated to ‘A’ is therefore not allocated to ‘B’, ‘C’ or ‘D’. If it is the considered opinion of LYFT's proponents that some recipients should receive priority over others in the allocation of kidneys, transparency requires explicitly stating and defending such claims, which would also mean being explicit about which groups are likely to be disadvantaged in the process. Undoubtedly, elderly recipients, recipients with diabetes, recipients with polycystic kidney disease and other groups not already accorded special dispensation will desire the opportunity to proffer their competing claims to organs.

This is not novel territory in discussions of organ allocation. Special pleading for priority in the current allocation system has been explicitly defended for pediatric patients, highly sensitized recipients and prior living donors with kidney disease. But if increasing transplantation for specific ethnic groups is judged to be a virtue of the new allocation system, surely those who will be disadvantaged by the new system are owed some justification. At a minimum, any proposal to revise the allocation system should be transparent about who is harmed, as well as who benefits, relative to the current system. It is this crucial point that undergirds a litany of moral concerns about incorporating LYFT calculations into allocation decisions.

In short, it is prima facie unclear why a system of allocation of a legally designated public resource, which deliberately privileges the collective gains or losses of some demographic groups over others, is morally justified. And it is simply implausible in a pluralistic society to assume that broad consensus on this question exists. Even if such consensus were shown to exist in the transplant community, nowhere has it been shown that the transplant community possesses superior moral authority to impose its collective will on how a public resource ought to be distributed. This is another confusion of facts (expertise on predicting outcomes after transplantation) with values (arguments regarding which allocation system is more justified than others).

Recognizing the problem, the proponents of KAS suggest recourse to the principles of allocation elucidated by the Department of Health and Human Services (HHS) (1), p. 9–10, which stipulate that organ allocation should be predicated on ‘sound medical judgment’, ‘seek to achieve the best use of donated organs’, ‘be specific for each organ type’ and ‘be designed to avoid wasting organs…promote patient access to transplantation, and to promote the efficient management of organ placement’. But these principles offer little substantive help, since there is demonstrable disagreement over what constitutes the ‘best’ use of organs, and over whether it is even possible to ration a scarce resource both ‘equitably’ and ‘efficiently’ without abandoning the debate by sanctifying one's own favorite solution with the benediction of ‘best medical judgment’. These high-minded concepts are so elastic as to be nearly devoid of content, merely an adornment over which conflicting theories of resource allocation compete. Whatever one's considered views on the matter, debating and dissecting fundamental disagreements about what these vague principles ought to mean for the purpose of organ allocation should come at the beginning of any discussion of resource allocation, unencumbered by any obligation to arrive at ‘consensus’, since such an obligation forecloses the possibility that ‘we’ may never reach broad agreement on these points (12).

Robust moral arguments to privilege certain demographic groups over others in the allocation of a scarce resource can be marshaled and critically defended. But such debate requires a transparent accounting of both the anticipated harms and the anticipated benefits. Since organs are (statutorily and historically) understood as a public resource, knowing who among the public is disadvantaged by a new allocation system is as morally important as knowing who (allegedly) benefits.

Unintended Consequences

Even if the methodological and moral concerns with the KAS system, however constructed, might be overcome, there are plausible reasons to believe that changing the allocation of deceased-donor organs from a system based on waiting time and blood type to a hybrid calculation of utility and controversial social justice goals will have a number of foreseeable unintended consequences.

Implementing KAS may result in broad, unintended changes in patterns of living donation, a concern already borne out by the results of the existing allocation exception for pediatric recipients. As Ross and Thistlethwaite reported, although pediatric recipients manifestly benefit more from living donor kidneys than from deceased donor kidneys, the institution of the pediatric exception in 2005 resulted in fewer living donations directed to pediatric recipients (Figure 2) (13). At the other end of this trend, an allocation system dominated by LYFT will result in subgroups of patients with lower LYFT scores enduring longer waiting times, which in turn have been correlated with higher rates of transplantation from living donors, an outcome equally inconsistent with the goal of maximizing LYFT (14).

Figure 2.

Trends in living and deceased donation after implementation of the pediatric exception in 2005 (from SRTR).

Both trends suggest additional unintended and undesirable consequences for recipients. First, young, healthy adult patients with minimal comorbidities will be more likely to forgo living donor offers in favor of easier access to deceased donor organs. Second, an iteration of KAS that privileges recipients with high predicted LYFT scores will result in a homogenization of the waiting list, an invitation to gaming KAS scores at the margins. Since the rate of growth of the youngest (age 18–34) cohort of listed recipients has been flat for a decade, disproportionately removing those patients from the list will quickly and substantially change the demographic makeup of the existing list, rendering the LYFT-driven KAS scores of the remaining recipients (largely middle-aged, majority diabetic) very similar. Such marginal differences in LYFT scores will either (i) substantially raise the importance of other variables such as time on dialysis, or (ii) encourage gaming of specific demographic variables at the margins, to give one recipient a slight but suddenly crucial LYFT-calculated edge over tens of thousands of others. Finally, a KAS system in which either LYFT or time on dialysis is disproportionately weighted may encourage ‘trolling’. Trolling occurs when patients whose favorable demographic features yield high KAS scores, but who are known not to be candidates for transplantation for other reasons, are listed in order to attract organ offers. When these patients are suddenly ‘discovered’ not to be candidates after an organ offer is accepted, the local backup allocation waivers offered to avoid excessive cold ischemia time would undermine the allocation patterns originally intended by KAS.
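Why a homogenized list invites gaming at the margins can be seen with a toy calculation. The sketch assumes a hypothetical list of 1,001 other candidates whose scores are spread evenly, first over a wide range (a heterogeneous list) and then over a range one-tenth as wide (a homogenized list), and asks how many candidates a patient overtakes after the same small score increase. The score scale, list size and increment are invented for illustration and do not correspond to any actual KAS formula.

```python
import bisect


def candidates_overtaken(scores, own_score, bump):
    """Count how many other candidates a patient passes when their score rises by `bump`.

    `scores` holds the other candidates' scores in ascending order (higher = higher priority).
    """
    ahead_before = len(scores) - bisect.bisect_right(scores, own_score)
    ahead_after = len(scores) - bisect.bisect_right(scores, own_score + bump)
    return ahead_before - ahead_after


def evenly_spaced(n, low, high):
    """Hypothetical list of n scores spaced evenly from low to high."""
    step = (high - low) / (n - 1)
    return [low + i * step for i in range(n)]


heterogeneous = evenly_spaced(1001, 0.0, 10.0)  # scores 0.00, 0.01, ..., 10.00
homogenized = evenly_spaced(1001, 4.5, 5.5)     # same list size, one-tenth the spread

print(candidates_overtaken(heterogeneous, 5.005, 0.05))  # 5
print(candidates_overtaken(homogenized, 5.0005, 0.05))   # 50
```

The same increment that moves a patient past 5 others on the heterogeneous list moves them past 50 on the homogenized one, which is the sense in which small, possibly manipulable differences become suddenly crucial.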

What Is to Be Done?

The United States has a system of allocation for kidneys from deceased donors already in place. Whatever its flaws, this system has the virtue of being easily understood by patients and transplant professionals alike. Trivial as some critics of the current system may believe this to be, the relationship between a clear and widespread comprehension of how the allocation system works and the maintenance of public trust in the system of allocation should not be underestimated. The methodological flaws, moral inadequacies and unintended consequences of a KAS-based allocation system, however its various components are weighed, unjustifiably threaten this hard-won edifice in exchange for a dubious promise of gains, predicated on calculations that are both questionable and inadequately comprehended by the larger transplant community and its many lay constituencies.

Still, challenges to the current system of allocation offer occasion to reflect on some of its virtues. In addition to being clear and easily understood, the current system recognizes that individual recipients have prima facie individual claims to a public resource. These individual claims are a serviceable default moral position: until argued otherwise, the claims of individuals in need of an organ ought to be treated equally, and equal privileging of individual claims to a public resource invites us to cast a jaundiced eye on exceptions and special pleading from all quarters. The problem of allocation (or rationing) (15) remains, but the starting point for the discussion is clear.

This starting point casts a different light on considerations of utility. On this view, the expected utility (and disutility, or harm) of various options is considered relative to the individual in question. While this does not get one much further along in the discussion of scarce resource allocation, employing utility calculations in this fashion privileges maximizing utility for discrete individuals over maximizing group utility. KAS, on the other hand, is designed to maximize group utility while mitigating the repugnant conclusions of utility maximization by singling out some disenfranchised populations for priority at the expense of others. Politically, the current system treats individuals as such, whereas the KAS system, however unintentionally, subsumes individual claims to an organ under demographic identities without clear empirical substantiation or plausible moral justification.

None of these stipulations will significantly increase the number of kidneys available for transplantation, and the daunting challenges of addressing demographic and geographic disparities in access to organs remain. The excruciations of rationing organs in the face of rising demand were the impetus for the gains in deceased donation made by the Organ Donation Breakthrough Collaborative. The successful efforts of the Collaborative in increasing procurement rebutted the assumption that there were few gains to be made in deceased donation. But the inadequacy of these efforts to meet the growing demand for organs implies that any attempt to realize promised gains of any significance by revising the current allocation system is a futile errand (16). This plausible possibility stands as a challenge to the various anathemas against introducing a spectrum of incentives to increase living donation (17,18). The demoralizing alternative is to inform more and more of our patients that they are simply beyond our assistance.

Just as negative studies also offer positive knowledge, the hard work and good-faith efforts of the Kidney Committee have not been in vain. Rather, the KAS system demonstrates the methodological and moral limits of using a robust prognostication scoring system for allocating kidneys based on the data sets currently available. Awareness and confession of the limits of our empirical knowledge has deep moral value. It usefully reminds all of us that in the practice of medicine, facts are ultimately in the service of moral obligations to individual patients.
