Big Data in Organ Transplantation: Registries and Administrative Claims


  • A. B. Massie,

    1. Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD
    2. Department of Epidemiology, Johns Hopkins School of Public Health, Baltimore, MD
  • L. M. Kucirka,

    1. Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD
    2. Department of Epidemiology, Johns Hopkins School of Public Health, Baltimore, MD
  • D. L. Segev

    Corresponding author
    1. Department of Surgery, Johns Hopkins University School of Medicine, Baltimore, MD
    2. Department of Epidemiology, Johns Hopkins School of Public Health, Baltimore, MD


The field of organ transplantation benefits from large, comprehensive, transplant-specific national data sets available to researchers. In addition to the widely used Organ Procurement and Transplantation Network (OPTN)-based registries (the United Network for Organ Sharing and Scientific Registry of Transplant Recipients data sets) and United States Renal Data System (USRDS) data sets, there are other publicly available national data sets, not specific to transplantation, which have historically been underutilized in the field of transplantation. Of particular interest are the Nationwide Inpatient Sample and State Inpatient Databases, produced by the Agency for Healthcare Research and Quality. The USRDS database provides extensive data relevant to studies of kidney transplantation. Linkage of publicly available data sets to external data sources such as private claims or pharmacy data provides further resources for registry-based research. Although these resources can transcend some limitations of OPTN-based registry data, they come with their own limitations, which must be understood to avoid biased inference. This review discusses different registry-based data sources available in the United States, as well as the proper design and conduct of registry-based research.


Abbreviations: AHRQ, Agency for Healthcare Research and Quality; CIT, cold ischemic time; CMS, Centers for Medicare and Medicaid Services; DDKT, deceased-donor kidney transplant; ESRD, end-stage renal disease; MELD, Model of End-Stage Liver Disease; NIS, Nationwide Inpatient Sample; OPO, organ procurement organization; OPTN, Organ Procurement and Transplantation Network; SAF, standard analysis file; SID, State Inpatient Database; SRTR, Scientific Registry of Transplant Recipients; TCR, Transplant Candidate Registration; TRF, Transplant Recipient Follow-up; TRR, Transplant Recipient Registration; UHC, University Health Consortium; UNOS, United Network for Organ Sharing; USRDS, United States Renal Data System.


Studies based on national registries and other administrative data sets have made enormous contributions to the field of organ transplantation. Registry-based studies offer a number of advantages over clinical trials or prospective cohort studies. They are relatively quick and inexpensive to conduct, and ethical approval is often straightforward. Typically, registries allow for many more subjects than would be feasible with primary data collection; the larger sample size and multicenter nature of many registries enhance study power and allow researchers to conduct sophisticated multivariate or multilevel analyses. Registries also draw from more transplant centers than would be feasible in most cohort studies (in particular at small or nonacademic centers), meaning that inferences from registry-based studies are likely to generalize across the United States.

Historically, most registry-based studies in organ transplantation have used data collected by the Organ Procurement and Transplantation Network (OPTN). However, other large data sets exist that contain transplant-related data not available in OPTN-based data sets. Additionally, linkages of OPTN-based data sets to novel data sources can address questions that may not be answerable from OPTN data alone.

This review will explore various national data sets in the context of their applicability to transplant research. For each data set, we will outline the data provided, identify key strengths and limitations, provide illustrative examples of research using the data and discuss relevant analytical and study design considerations. Additionally, we will discuss the proper design and conduct of registry-based studies.

OPTN-Based Data Sources

Since 1987, OPTN has collected data on all transplant recipients and waitlist registrants for solid organ transplantation, as well as all live and deceased organ donors. Unique data collection forms exist for each organ, and separate forms exist for adult and pediatric patients [1]. Most data are collected via one of four forms:

  • The Transplant Candidate Registration (TCR) form includes information at the time of listing: demographic data (e.g. age at listing, race, gender); prior transplant history; basic clinical information (e.g. height, weight, ABO); co-morbidities (e.g. diabetes, peptic ulcer, angina) and organ-specific information (e.g. exhausted access for kidney, portal vein thrombosis and transjugular intrahepatic portosystemic shunt for liver). A TCR is completed for every waitlist registration; if a patient registers twice (for the same organ after organ failure, for the same organ at two different centers or for a different organ), two TCRs are completed.
  • The Transplant Recipient Registration (TRR) form includes information from the initial transplant admission: pretransplant clinical data (e.g. height, weight, functional status); infectious disease status (e.g. HIV, cytomegalovirus, Epstein-Barr virus); data on the transplant procedure (e.g. cold ischemia time [CIT], procedure type); posttransplant clinical data (e.g. acute rejection during the initial hospitalization, creatinine at discharge for kidney, bilirubin and international normalized ratio for liver) and information on immunosuppressive medications. A TRR is completed for every live-donor and deceased-donor transplant; if a patient receives several transplants over time, a TRR is completed for each transplant.
  • The Transplant Recipient Follow-up (TRF) form includes information at each visit following a transplant: vital status, cause of death if applicable, graft status, patient education and employment status and clinical information (e.g. height and weight, infectious disease detection). Theoretically, a TRF is completed for every surviving transplant recipient at 6 and 12 months posttransplant and at 12-month intervals thereafter until the organ fails or the patient dies.
  • The Deceased Donor Registration (submitted by the organ procurement organization [OPO]) and Living Donor Registration forms (submitted by the hospital performing the donor operation) include information at the time of organ donation: donor demographics, co-morbidities, infectious disease status and cause of death (for deceased donors) or postoperative clinical information (for live donors).

Additional forms include the Living Donor Follow-up form, the Donor Histocompatibility form and the Post-Transplant Malignancy form. All forms are available at

In addition to the forms described above, the OPTN records waitlist status updates (e.g. Model of End-Stage Liver Disease [MELD] score changes for liver waitlist registrants, waitlist removals, status changes from active to/from inactive) and data generated through the organ allocation process (match runs including organ acceptance and/or decline).

The United Network for Organ Sharing

The OPTN data are linked by United Network for Organ Sharing (UNOS) to the Social Security Death Master File to augment ascertainment of candidate and recipient death. The resulting data are available free of charge to researchers and have been used in numerous important studies of transplantation [2-4].

The Scientific Registry of Transplant Recipients

The Scientific Registry of Transplant Recipients (SRTR) supplements OPTN data with data from various secondary sources. Notably, the SRTR obtains additional ascertainment of graft failure and death from the Centers for Medicare and Medicaid Services (CMS), cancer ascertainment from the Surveillance, Epidemiology and End Results program and additional death ascertainment from the National Death Index. The SRTR data set is used to compile SRTR program-specific reports [5, 6] and has provided data for many high-impact papers in organ transplantation [7-9]. The SRTR provides standard analysis files (SAFs) to researchers by request, for a fee. The markedly improved ascertainment of kidney graft loss is a key difference between SRTR and UNOS data, and constitutes a strong argument for using SRTR (rather than UNOS) data for analysis of kidney transplant outcomes: A 2005 study found that, of 4040 graft failures reported by either the OPTN or CMS, 22% were reported only by the OPTN and 13% were reported only by CMS [1].

The United States Renal Data System

The United States Renal Data System (USRDS) includes data on all patients in the United States who have developed end-stage renal disease (ESRD) requiring renal replacement therapy (either dialysis or a kidney transplant) since 1995. Data from 1988 to 1994 are available but include only patients insured by Medicare. In contrast to the OPTN data, which include only transplant recipients and waitlist registrants, the USRDS data set contains data on patients irrespective of access to transplantation. Data are drawn from a variety of sources, including the CMS, OPTN, the ESRD Networks and USRDS special studies [10].

Providers are required to file the CMS Medical Evidence Report (Form-2728) within 45 days of ESRD onset. This form captures demographics, insurance coverage, primary cause of renal failure, dialysis type, dialysis access type, laboratory values (e.g. HbA1c, creatinine), co-morbidities (e.g. chronic obstructive pulmonary disease, diabetes, myocardial infarction), functional status and access to nephrology care [10]. The form has several limitations, however. It records no data on severity of co-morbidities, and validation studies have shown low sensitivity for some co-morbidities [11]. Furthermore, for most patients, this form is filed only at ESRD onset, so changes over time cannot be assessed.

Place, time and cause of death are ascertained for all patients via the CMS ESRD Death Notification (Form-2746), required to be filed by the provider within 45 days of a patient death. Other outcomes are ascertained from OPTN data and include listing for transplantation, receipt of a transplant and graft loss. OPTN data are included in the SAFs; researchers can obtain additional transplant analytic files, which include the UNOS kidney and kidney pancreas transplant follow-up data sets linked to USRDS [10].

For the subset of ESRD patients insured through Medicare, claims data are available and capture detailed longitudinal information on diagnoses, co-morbidities (via ICD-9 codes), treatment modalities and cost. Information on hospitalizations is derived from institutional claims, allowing for analyses of hospital readmissions [12, 13]. Beginning in 2006, data sets including detailed prescription drug information are available from Medicare Part D. In addition to Medicare claims data for ESRD patients, the USRDS data set also contains all claims from a randomly chosen 5% sample of Medicare participants irrespective of ESRD, allowing researchers to study relevant outcomes (e.g. chronic kidney disease or progression to ESRD) in the general Medicare population.

A disadvantage of Medicare claims is that inferences may not be generalizable to the non-Medicare population. Importantly, while all patients with ESRD are eligible for Medicare, patients under 65 lose eligibility 3 years after receipt of a kidney transplant. Furthermore, although claims data can be highly informative and allow longitudinal assessment of co-morbidities, they do not capture co-morbidities perfectly. A 2010 single-center study comparing ascertainment of cardiovascular disease events via USRDS-derived Medicare claims data versus electronic medical records found that Medicare claims captured 82–91% of events, depending on the algorithm used [14].

In addition to patient-level data, information about each dialysis facility is ascertained annually through a variety of sources including the CMS Facility Compare data, the Independent Renal Facility Cost Report (CMS 265–94) and the CDC National Surveillance of Dialysis Associated Diseases. This can be linked to patient data through Form-2728, which captures dialysis facility at ESRD onset. Examples of facility-level data include volume, number of deaths, facility ownership, chain affiliation, freestanding versus hospital-based, zip code and staffing (e.g. number of nurse practitioners, social workers, etc.) [10].

Researchers have used the USRDS data set to study a variety of topics including access to transplantation [15, 16], survival [17], complications [18], facility ownership and access to transplant [19] and treatment costs [20]. USRDS SAFs are available for a fee from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK); in addition to the Core data set, various SAFs (e.g. Medicare payment data, 5% Medicare sample, transplant data set) are sold separately [10].

Other Data Sources

Pharmacy and private claims data

Some researchers have created novel linkages between OPTN data and external claims data. Due to missingness or inconsistency in linking variables, linkage may require a complex algorithm [21]. Private claims provide the same advantages and limitations as discussed above for Medicare claims, albeit with different potential selection biases. Pharmacy claims potentially provide better ascertainment of immunosuppression and other posttransplant medication than OPTN data, but come with their own challenges and limitations. Linkages to private payer claims data have been used in studies of costs of liver transplantation [22] and postdonation morbidity in live kidney donors [23].
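The idea of a linkage algorithm that tolerates missing or inconsistent linking variables can be illustrated with a minimal sketch of tiered deterministic matching. This is not the algorithm of reference [21]; the field names (`ssn`, `dob`, `sex`, `zip`, `transplant_date`), the tiers and the record layout are all hypothetical, chosen only to show the structure. Each tier is tried from strictest to loosest, and a match is accepted only when it is unique on both sides.

```python
from collections import defaultdict

# Hypothetical linking tiers, tried from strictest to loosest.
# Field names are illustrative; real registry and claims files differ.
LINK_TIERS = [
    ("ssn", "dob", "sex"),
    ("ssn", "dob"),
    ("dob", "sex", "zip", "transplant_date"),
]


def _key(record, fields):
    """Return a tuple key, or None if any linking field is missing."""
    values = tuple(record.get(f) for f in fields)
    return None if any(v in (None, "") for v in values) else values


def link_records(registry, claims):
    """Match each registry record to at most one claims record.

    A match is accepted only when the key is unique on both sides, so
    ambiguous candidates stay unlinked rather than being guessed.
    Returns {registry_id: (claims_id, tier_index)}.
    """
    links = {}
    used_claims = set()
    for tier, fields in enumerate(LINK_TIERS):
        reg_index, clm_index = defaultdict(list), defaultdict(list)
        for r in registry:
            k = _key(r, fields)
            if k is not None and r["id"] not in links:
                reg_index[k].append(r["id"])
        for c in claims:
            k = _key(c, fields)
            if k is not None and c["id"] not in used_claims:
                clm_index[k].append(c["id"])
        for k, reg_ids in reg_index.items():
            clm_ids = clm_index.get(k, [])
            if len(reg_ids) == 1 and len(clm_ids) == 1:
                links[reg_ids[0]] = (clm_ids[0], tier)
                used_claims.add(clm_ids[0])
    return links
```

Recording the tier at which each pair was linked lets an analyst report how many matches relied on weaker keys and rerun analyses on the strictly matched subset as a sensitivity check.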

Nationwide Inpatient Sample

The Nationwide Inpatient Sample (NIS), maintained by the Agency for Healthcare Research and Quality (AHRQ), contains data on hospitalizations at about 1000 hospitals across the United States, comprising a 20% sample of hospitals [24, 25]. The sample varies from year to year. Sampling is stratified across five criteria (geographic region, public vs. private, urban vs. rural, teaching vs. nonteaching and bed size). Data available through the NIS include patient demographics; ICD-9-CM diagnoses and procedures; hospital charges and length of stay; discharge disposition; anonymized physician and hospital identifiers, and hospital characteristics (e.g. geographic region, teaching status, bed size). Much of the data in the NIS (notably charges, physician IDs and many diagnoses/co-morbidities) are unavailable in OPTN data. In contrast with Medicare claims data, the NIS includes patients with public and private payers, as well as uninsured patients. Also, since the NIS is a sampling of all hospitalizations, it includes patients who are not on a transplant waiting list but may nevertheless be of interest for transplant-related research questions.

However, NIS data have several important limitations. First, despite the stratified sampling mechanism used to create the NIS, a random sample of all hospitals is not necessarily an unbiased sample of transplant centers or transplant patients. For example, the total number of kidney transplants in the NIS sample rose from 2967 (at 47 distinct centers) in 2007 to 4119 (at 42 distinct centers) in 2008, even though the total number of transplants in the United States showed little increase (OPTN data). Also, NIS data lack longitudinal information (i.e. a single individual cannot be tracked across multiple hospitalizations). Therefore, NIS-based studies are limited to short-term, same-hospitalization outcomes such as cost, complications or perioperative mortality. Finally, because identifiers cannot be released, NIS cannot be linked to OPTN data.

Despite these limitations, NIS data have been used to examine exposures unavailable in OPTN data (including hospital and surgeon characteristics [26, 27] and Clostridium difficile exposure [28]); complication and cost outcomes [28, 29]; potential deceased donors who are ineligible for donation due to HIV [30] and insurance status of deceased donors [31]. NIS files may be purchased from AHRQ.

State Inpatient Databases

The State Inpatient Databases (SID) contain data on hospital admissions from individual states. Forty-seven states (excluding Alabama, Delaware and Idaho) participate in the SID [32]. They contain many of the same elements as the NIS, and in fact the NIS sample is drawn from the SID. Unlike the NIS data, which represent a 20% sample of hospitals in the United States, the SID are comprehensive (100%) for the states and years for which they are available. Also, for some states and some years, “revisit files” allow researchers to link multiple hospital admissions for a single individual [24], and some states provide anonymous physician identifiers, allowing researchers to link patients in multiple hospitals treated by the same physician [33]. While national studies using the SID would be possible in principle, they would be expensive and logistically complex, since data for each state and year must be purchased separately and data use agreements must be negotiated separately.

Data from the SIDs have been used to study perioperative complications of live liver donors in New York [34] and the relationship between hospital/surgeon volume and inpatient mortality in liver resection and transplantation in Maryland, Florida and New York [33]. SID files may be purchased from the HCUP, Rockville, MD.

Additional data sources

The University Health Consortium (UHC) is an alliance of 120 academic medical centers and 290 affiliated hospitals in the United States. The UHC maintains a database, available to member institutions, of de-identified patient data, including patient demographics, ICD-9 diagnosis and procedure codes, and billing and cost data [35]. UHC data have been used to study perioperative complications in live liver donation [34] and linked to OPTN data in studies of costs of liver transplantation [35, 36]. Data are available from the UHC,

Even without linkage, other national data sets can be used as negative controls in comparison to transplant data. For example, we have previously compared long-term survival in live kidney donors (from OPTN data) to matched, healthy nondonor controls drawn from the Third National Health and Nutrition Examination Survey (NHANES III) [4, 37].

A summary of the advantages and disadvantages of the data sets described above appears in Table 1.

Table 1. Summary of selected data sources available for transplant research

Data source | Population | Strengths | Weaknesses
UNOS | Live and deceased donors, transplant candidates, transplant recipients | Represents entire US transplant population; longitudinal follow-up; available free of charge | Lacks many co-morbidities; poor graft loss ascertainment
SRTR | Live and deceased donors, transplant candidates, transplant recipients | Represents entire US transplant population; longitudinal follow-up; good graft loss ascertainment | Lacks many co-morbidities
USRDS | ESRD patients, 5% sample of Medicare | Longitudinal follow-up; ESRD incidence on entire US population; rich claims data | Claims data limited to Medicare participants
NIS | Inpatients at 20% sample of US hospitals | Contains diagnoses and procedures unavailable from OPTN sources | No longitudinal follow-up; no long-term outcomes; 20% sample may not be representative of transplant population
SID | Inpatients at hospitals in most US states | Contains diagnoses and procedures unavailable from OPTN sources; link multiple records of one patient in some cases | No long-term outcomes; each state/year must be purchased separately
External linkages | Depends on linked data set | Access novel data beyond the scope of standard data sets | Linkage is challenging

ESRD, end-stage renal disease; NIS, Nationwide Inpatient Sample; OPTN, Organ Procurement and Transplantation Network; SID, State Inpatient Database; SRTR, Scientific Registry of Transplant Recipients; UNOS, United Network for Organ Sharing; USRDS, United States Renal Data System.

The Conduct of Registry-Based Studies

By the time a clinical trial or other prospective study begins, the investigators have already had to design a protocol and justify it to an institutional review board (and, likely, to funders). This process helps researchers carefully consider the questions they wish to address and the appropriateness of their methods, reducing the likelihood that severe design flaws will derail an expensive study or lead to biased inference.

With registry-based studies, barriers to the conduct of research are much lower. These data have already been gathered and are available at the start of the study. Even if investigators wrote a research protocol at the outset, they face no technical barriers in modifying analytical plans or investigating new questions on the fly. However, careful, a priori study design is as important for conducting proper analysis of registry data as it is for prospective studies or clinical trials. We will now consider general concepts and common pitfalls of study design and analytical approach in the context of registry-based studies.

Posing a research question

Proper study design begins with a well-articulated hypothesis. A researcher may hypothesize, for example, that prolonged CIT is associated with increased risk of graft loss in deceased-donor transplants. The hypothesis may come from existing knowledge of biological processes, from clinical observation or from a research finding in another field. In a prospective study, the hypothesis cannot come from the data themselves, because the data have not yet been collected.

Large registries can place thousands of variables at an investigator's fingertips. A few hours of clever coding could automatically examine pairwise correlations among all the variables in the OPTN database. However, many seemingly high correlations are inevitably spurious, a result of either confounding or statistical chance. Investigators should resist the temptation to fish for statistical associations in the absence of a biologically plausible hypothesis. Modern statistical packages generally include commands to perform stepwise regression, a set of techniques that add or drop candidate exposures according to their statistical association with the outcome, with no regard for biological plausibility; in most situations, stepwise regression is best avoided.
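How readily chance alone produces "significant" pairwise correlations can be shown with a small simulation. The sketch below is illustrative only: it generates variables that are independent noise by construction (the sizes of 40 variables and 200 observations are arbitrary assumptions) and counts the pairs whose correlation passes a nominal two-sided 5% test based on the Fisher z-transformation.

```python
import math
import random
from itertools import combinations


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious(n_vars=40, n_obs=200, seed=0):
    """Count nominally significant correlations among pure-noise variables.

    Every variable is independent by construction, so each "hit" is a
    false positive. Returns (hits, number_of_pairs_tested).
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    z_crit = 1.96  # two-sided 5% critical value, normal approximation
    pairs = list(combinations(range(n_vars), 2))
    hits = 0
    for i, j in pairs:
        z = math.atanh(pearson_r(data[i], data[j])) * math.sqrt(n_obs - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits, len(pairs)


if __name__ == "__main__":
    hits, total = count_spurious()
    print(f"{hits} of {total} pairs look 'significant' by chance alone")
```

With 40 variables there are 780 pairs, so roughly 5% of them (about 39) are expected to pass the test even though no true association exists; a registry with thousands of variables multiplies the problem accordingly.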

Population selection

A well-defined research question inherently addresses a specific population. For example, an investigation of the association between CIT and graft loss in deceased-donor kidney transplant (DDKT) recipients addresses the population of DDKT recipients. The study design should reflect the population of interest. There are three populations for the researcher to consider: the target population, the source population and the study population [38]. The target population refers to a category of patients about whom the researcher hopes the study will give valid inference: for example, adult DDKT recipients, as above. The individual membership of the target population inherently cannot be specified; generally, the goal of clinical research is to provide insight into disease processes or treatments of future patients.

By contrast, the source population is a specific, enumerable set of individuals with the characteristics that define the target population. For example, a researcher might choose all adult, first-time, deceased-donor kidney-only recipients from 2005 to 2012 appearing in the SRTR registry. The study population consists of individuals who are actually included in a study. In a prospective study, individuals eligible for inclusion may not be contacted, or may refuse consent; such individuals fall in the source population, but not the study population. In registry studies, the source population and the study population are generally the same, unless the study design calls for using data from only a subset of eligible individuals (e.g. in a matched design, the matching algorithm may select only a subset of study patients from the source population) [4, 39].

In selecting a source population, the researcher must strike a balance between including a broad range of patients representative of the target population, and excluding atypical individuals whose outcomes might bias the results. For example, analyses of the general waitlist or transplant population sometimes exclude pediatric patients, patients with a prior history of transplant, and/or multi-organ registrants/recipients in order to describe the experience of a “typical” adult patient [8, 40]; consequently, analyses of these populations may not generalize to the excluded groups. Exact inclusion/exclusion criteria will depend on the nature of the research question, but typically, criteria will include at least an age range (e.g. adult patients), a date range (e.g. transplants from 2005 to 2010) and an organ/procedure type (e.g. kidney-only deceased-donor transplant recipients).
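Inclusion and exclusion criteria of the kind named above can be written as an ordered list of filters, so that the number of records removed at each step is available for reporting. This is a minimal sketch: the field names (`organ`, `donor_type`, `age_at_transplant`, `prior_transplant`, `transplant_date`) are hypothetical and do not correspond to actual SAF variable names.

```python
from datetime import date

# Hypothetical field names; criteria are applied in order so that
# exclusions can be reported step by step.
CRITERIA = [
    ("kidney-only", lambda r: r["organ"] == "kidney"),
    ("deceased donor", lambda r: r["donor_type"] == "deceased"),
    ("adult", lambda r: r["age_at_transplant"] >= 18),
    ("first transplant", lambda r: not r["prior_transplant"]),
    ("2005 to 2010",
     lambda r: date(2005, 1, 1) <= r["transplant_date"] <= date(2010, 12, 31)),
]


def select_source_population(records):
    """Apply each criterion in turn.

    Returns (included_records, {criterion_label: n_excluded_at_that_step}).
    """
    remaining = list(records)
    excluded = {}
    for label, keep in CRITERIA:
        kept = [r for r in remaining if keep(r)]
        excluded[label] = len(remaining) - len(kept)
        remaining = kept
    return remaining, excluded
```

Keeping the per-step exclusion counts makes it straightforward to state in a manuscript exactly how the source population was derived, and to judge which excluded groups the results may not generalize to.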

A study will suffer from selection bias if individuals in the study population are not representative of the target population, and this difference affects inference. In a comprehensive registry of transplant recipients, selection bias will not be a problem unless investigator-specified inclusion criteria are flawed. However, in registries that include only a subset of patients (e.g. the NIS 20% sample, or Medicare claims, which contain data only on Medicare patients), some selection bias may be inherent in the data set.

Data quality

In prospective studies, investigators work to ensure standardization of measurement, data collection and data entry. Investigators of registry studies do not have this luxury. Registry data may be collected at hundreds of different transplant centers, by thousands of individuals. Often, data are not gathered primarily for research purposes, but rather in the course of clinical care, billing or regulation. As a result, missing data and mismeasured data are realities of most registries. Careful exploratory data analyses of key variables are necessary to identify potential threats to data quality, and proper analytical techniques are required to avoid bias.

Data missingness may arise by design (e.g. in the case of MELD at transplant in liver recipients, which is missing from all OPTN data prior to the introduction of MELD-based allocation in 2002) or because a value was not recorded or entered by treatment providers (e.g. CIT, which is missing for 30.3% of live-donor transplants between 1990 and 2005 [41]). In OPTN data, elements used for organ allocation and recipient priority determination are generally the least missing, and those used in the SRTR program-specific reports (PSRs) are a close second; all other elements require careful exploration before they are trusted for a research study. Strategies for dealing with missingness include (in order of increasing robustness): case-wise deletion (complete-case analysis), missing indicator variables and imputation [42]. Researchers need to be aware that, by default, most statistical software packages address missingness with case-wise deletion without warning the user. For example, a study of liver transplant recipients between 1990 and 2005 that “adjusts for MELD” effectively becomes a study of only those recipients between 2002 and 2005 (because of the missingness pattern described above), with no warning that most of the study population was dropped. Published examples of naïveté in the face of missing data are, disappointingly, not uncommon.
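The exploratory checks described here can be sketched in a few lines. The example below assumes records held as dictionaries with hypothetical field names (`tx_year`, `meld`); it tabulates missingness by year, makes the cost of case-wise deletion explicit, and shows the missing-indicator alternative. Imputation is not shown.

```python
from collections import defaultdict


def missingness_by_year(records, variable, year_field="tx_year"):
    """Fraction of records with `variable` missing (None), by year."""
    total, missing = defaultdict(int), defaultdict(int)
    for r in records:
        year = r[year_field]
        total[year] += 1
        if r.get(variable) is None:
            missing[year] += 1
    return {y: missing[y] / total[y] for y in sorted(total)}


def complete_cases(records, covariates):
    """Case-wise deletion that reports how many records it removes.

    Returns (kept_records, n_dropped).
    """
    kept = [r for r in records
            if all(r.get(c) is not None for c in covariates)]
    return kept, len(records) - len(kept)


def add_missing_indicator(records, variable, fill=0.0):
    """Missing-indicator approach: keep every record and flag the gap."""
    out = []
    for r in records:
        is_missing = r.get(variable) is None
        row = dict(r)
        row[variable] = fill if is_missing else r[variable]
        row[variable + "_missing"] = int(is_missing)
        out.append(row)
    return out
```

A by-year table that jumps from entirely missing to entirely present at a single date, as MELD does in 2002, signals missingness by design rather than sporadic non-entry, and calls for restricting the study period explicitly rather than letting the software do so silently.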

Measurement error (and the risk for misclassification bias) can occur for a variety of reasons: Patients may not remember their health history, or may not report their history honestly; lab results may be flawed; providers may make mistakes in recording or entering data. In most statistical techniques, some data points are more influential than others, meaning that they have a greater effect on the overall summary statistic [43]. This is particularly likely to be true of outlier points. For example, if a weight of 250 pounds is mistakenly recorded as 250 kg, the erroneous measurement may lead to an artificially high estimate of mean weight among a group of patients. Measurement error can be difficult to detect, but researchers should examine the distribution of key variables, particularly over time. While not all outliers are influential in a statistical model, researchers should carefully consider values that fall outside of the normal range, and how influential these observations are (a determination for which statistical methods exist).
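The effect of a single unit error on a mean, and one simple way to flag such values for review, can be sketched as follows. The robust z-score based on the median and the median absolute deviation (MAD) is one common screening rule among several, and the threshold of 3.5 is a conventional choice rather than a recommendation from this review; the weights are invented.

```python
from statistics import mean, median

LB_TO_KG = 0.45359237


def flag_outliers(values, threshold=3.5):
    """Indices whose robust z-score (median/MAD based) exceeds threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to scale by; nothing can be flagged
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


if __name__ == "__main__":
    # Invented weights in kg; the last was really 250 lb entered as 250 kg.
    weights_kg = [70, 82, 65, 90, 77, 250]
    corrected = weights_kg[:-1] + [250 * LB_TO_KG]
    print("mean with error:  ", round(mean(weights_kg), 1))
    print("mean corrected:   ", round(mean(corrected), 1))
    print("flagged positions:", flag_outliers(weights_kg))
```

In this example the single erroneous entry raises the mean from about 82.9 kg to about 105.7 kg. A flagged value is only a candidate for review; a weight of 250 kg is implausible but not impossible, so flagged records should be examined rather than deleted automatically.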

Statistical models

Modern statistical software largely automates the process of performing hundreds of statistical analyses, from simple analyses such as a t-test to complicated multilevel regression models. However, these tools make it easy for researchers to ignore the mathematical assumptions underlying statistical models. For example, linear regression assumes that the expected value of an outcome variable Y varies linearly with each exposure variable x; that variance in the outcome is constant across all values of x; and that residuals are normally and independently distributed [43]. If these assumptions are not checked, linear regression may easily lead to mistaken inference. The next section of this review will consider specific statistical models of particular relevance to transplantation.
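The linear regression assumptions listed above can be stated compactly. The notation is introduced here for illustration only: subjects are indexed by i, exposures by j = 1, ..., p, and the error term is written as epsilon.

```latex
\begin{align*}
Y_i &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + \varepsilon_i,
  \qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2), \\
\mathbb{E}[Y_i \mid x_i] &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij}
  && \text{(linearity)}, \\
\operatorname{Var}(Y_i \mid x_i) &= \sigma^2
  && \text{(constant variance)}.
\end{align*}
```

Each line corresponds to a check that can be made on the fitted model: plots of residuals against each exposure for linearity, residuals against fitted values for constant variance, and a normal quantile plot of residuals for normality.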

Survival models

The most common form of outcome in transplantation is time-to-event or survival outcome, in which patients at risk for an event (e.g. graft failure or death) are followed starting at a defined point (e.g. date of transplant) until either the time of event or end-of-follow-up (censorship). Nonparametric models (e.g. Kaplan–Meier curves) make no assumption about the distribution of times to event; they can be fit to any survival data set [44]. Semi-parametric models make some assumptions about the distribution of events. For example, Cox proportional hazards models allow the risk of an event over time (the hazard) to vary according to the data. However, when two groups of patients are compared, Cox models assume that the relative risk of the event in one group compared to the other group (the hazard ratio) stays constant over time. For example, a hazard ratio of 2.0 comparing kidney recipients who had received a prior kidney transplant to first-time kidney recipients implies that the risk of graft failure is twofold higher for retransplant recipients at all times, from the day of transplant through the duration of the graft. Cox proportional hazards models have found wide use in transplantation [7, 8, 17]. However, caution is warranted. Some exposures (such as surgery) may increase risk in the short term while decreasing risk in the long term, in which case the assumptions of a Cox model are violated. Fully parametric models assume that hazard fits a specific distribution, such as a Weibull distribution [45] or generalized gamma distribution [46]. The proportional hazards assumption is not a requirement for some parametric models. All of the models described above assume that the risk of censorship is independent of the risk of the outcome of interest (the assumption of noninformative censorship).
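As an illustration of the nonparametric approach, the Kaplan–Meier estimate multiplies, at each observed event time, the fraction of at-risk patients who survive that time. The sketch below is a bare-bones version for intuition, with invented follow-up times; in practice the survival routines of an established statistical package would be used, which also provide confidence intervals.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient
    events: 1 if the event was observed at that time, 0 if censored
    Returns [(t, S(t))] at each distinct event time. Patients censored at
    time t are counted as still at risk for events occurring at t.
    """
    n_events = Counter(t for t, e in zip(times, events) if e)
    n_exits = Counter(times)  # events plus censorings at each time
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(n_exits):
        d = n_events.get(t, 0)
        if d:
            surv *= 1 - d / at_risk
            curve.append((t, surv))
        at_risk -= n_exits[t]
    return curve


if __name__ == "__main__":
    # Invented follow-up times in years; 0 marks a censored patient.
    times = [1, 2, 2, 3, 4, 5]
    events = [1, 1, 0, 1, 0, 1]
    for t, s in kaplan_meier(times, events):
        print(f"t = {t}: S(t) = {s:.3f}")
```

Censored patients contribute to the at-risk denominator up to their censoring time and then leave it without lowering the curve, which is valid only under the assumption of noninformative censorship described above.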

Prediction models

Statistical models intended to predict individual outcomes are common in transplantation; examples include the MELD score [47] and the Kidney Donor Risk Index [48]. Prediction models require assessment of the model's predictive accuracy. Predictive accuracy can be partitioned into calibration (agreement between predicted and observed data) and discrimination (chance that an observation with a higher observed value will have a higher predicted value). Calibration can be assessed with Hosmer–Lemeshow tests [49] or calibration plots [50]. Discrimination can be assessed with the area under the receiver operating characteristic curve [49], or with the Net Reclassification Index or the Integrated Discrimination Improvement [51]. Prediction models should be validated by using them to predict data separate from the data used to create the model. This can take the form of internal validation, in which a model is validated using additional records from the same data set used for model building (e.g. cross-validation [52] or a bootstrap [53]), or external validation, in which a model is validated using a separate data source [54]. An example of using several different methods to validate a prediction model in transplantation is our work on the probability of discard/delay model for deceased-donor kidneys [55].
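The two components of predictive accuracy can be made concrete for a binary outcome. The sketch below computes discrimination as the concordance (c) statistic, which for a binary outcome equals the area under the receiver operating characteristic curve, and tabulates calibration as mean predicted risk against observed event rate within groups of ranked predictions, the quantities a calibration plot displays. Function names and the toy data are illustrative assumptions; no formal Hosmer–Lemeshow test is performed.

```python
def c_statistic(y_true, y_prob):
    """Probability that a randomly chosen patient with the event has a
    higher predicted risk than one without it (ties count one half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return concordant / (len(pos) * len(neg))


def calibration_table(y_true, y_prob, n_bins=10):
    """Rows of (mean predicted risk, observed event rate), one per bin,
    with bins formed from predictions ranked lowest to highest."""
    ranked = sorted(zip(y_prob, y_true))
    n = len(ranked)
    rows = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if chunk:
            mean_pred = sum(p for p, _ in chunk) / len(chunk)
            obs_rate = sum(y for _, y in chunk) / len(chunk)
            rows.append((mean_pred, obs_rate))
    return rows
```

For honest estimates, both functions should be applied to predictions made for records that were not used to fit the model, whether those come from cross-validation folds, bootstrap resamples or an external data source.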

Statistical software

A variety of statistical packages are commonly used for statistical analysis of transplant data. SAS (SAS Institute, Cary, NC) has been commercially available since the 1970s and has wide use in health research. SAFs from UNOS and the SRTR are made available in SAS format, although software exists to convert them to other formats. Stata (StataCorp, College Station, TX) is a more recent alternative, which has become common in academia. An advantage of Stata is that it is available for a relatively low one-time cost, whereas an SAS license requires users to pay a yearly fee. R (R Foundation, Vienna, Austria) is a free, open-source statistical package. All three packages have scripting languages that allow researchers to write complex custom programs. R and Stata both benefit from large libraries of user-written functions that supplement the official programs.


Registry-based studies have made substantial contributions to the field of transplantation, and will no doubt continue to do so in the future. In addition to the commonly used OPTN-based data sets, there are a number of other registries that can be used to address a wide variety of research questions beyond the scope of the OPTN data. However, the availability of rich, comprehensive data sets does not obviate the need for careful study design. Properly conducted, registry-based research can provide novel insights to improve patient care and advance our understanding of the field of organ transplantation.


This work was supported by grant number 1R01DK0960008 from the NIDDK. The analyses described here are the responsibility of the authors alone and do not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products or organizations imply endorsement by the US Government.


The authors of this manuscript have no conflicts of interest to disclose as described by the American Journal of Transplantation.