Microarray Diagnosis of Antibody-Mediated Rejection in Kidney Transplant Biopsies: An International Prospective Study (INTERCOM)

Abstract

In a reference set of 403 kidney transplant biopsies, we recently developed a microarray-based test that diagnoses antibody-mediated rejection (ABMR) by assigning an ABMR score. To validate the ABMR score and assess its potential impact on practice, we performed the present prospective INTERCOM study (clinicaltrials.gov NCT01299168) in 300 new biopsies (264 patients) from six centers: Baltimore, Barcelona, Edmonton, Hannover, Manchester and Minneapolis. We assigned ABMR scores using the classifier created in the reference set and compared them with the conventional assessment documented in the pathology reports. INTERCOM documented uncertainty in conventional assessment: In 41% of biopsies where ABMR features were noted, the recorded diagnoses did not mention ABMR. The ABMR score correlated with ABMR histologic lesions and donor-specific antibodies, but not with T cell–mediated rejection lesions. The agreement between ABMR scores and conventional assessment was identical to that in the reference set (accuracy 85%). The ABMR score was more strongly associated with failure than conventional assessment, and when the ABMR score and conventional assessment disagreed, only the ABMR score was associated with early progression to failure. INTERCOM confirms the need to reduce uncertainty in the diagnosis of ABMR, and demonstrates the potential of the ABMR score to impact practice.

Abbreviations
ABMR, antibody-mediated rejection
ATAGC, Alberta Transplant Applied Genomics Center
AUC, area under the receiver operating characteristic curve
BFC403, 403 biopsies for cause
DSA, donor-specific antibody
GN, glomerulonephritis
IDI, integrated discrimination improvement
IFTA, atrophy/fibrosis
INT300, 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries
MRS, median risk score
NRI, net reclassification index
SDMS, Scientific Data Management System
TCMR, T cell–mediated rejection

Introduction

Antibody-mediated rejection (ABMR) is the major cause of kidney transplant failure [1-3], but the conventional assessment of ABMR is problematic. ABMR was first identified on the basis of microcirculation lesions and circulating donor-specific antibody (DSA) [4] and was later found to be associated with immunostaining for complement factor C4d [5]. The diagnosis depends on three sets of features: histologic lesions, C4d staining and DSA [6], each with its own limitations. Histology assessment has poor reproducibility, as shown by the low agreement between observers [7], and relies on nonspecific lesions (microcirculation changes, fibrosis, arterial fibrous intimal thickening) that also occur in other conditions such as T cell–mediated rejection (TCMR) and glomerulonephritis (GN) [8]. C4d staining is performed with two different methodologies and misses the majority of cases [9], and DSA measurements vary between laboratories [10]. There is intrinsic heterogeneity within ABMR, for example, in acuity, chronicity, antibody subclass and antigen targeted. Because expert pathologists have failed to agree on criteria for diagnosing C4d-negative ABMR [11], there is no standard system for diagnosing this major disease. Moreover, any arbitrary consensus that eventually emerges will not solve this problem: Alternative systems that are proposed must be evaluated in prospective trials, and will remain subject to the inherent limitations of histology and C4d staining. This problem of a weak diagnostic gold standard, which is widespread in disease studies, not only affects patient care but makes the assessment of new tests difficult—a problem known as Reference Standard-related bias [12, 13].

Because ABMR alters gene expression [14-16], measurement of transcripts in biopsies may be a solution to this impasse. We recently developed a microarray-based test for ABMR in a prospective study of 315 patients undergoing 403 biopsies for cause (the “BFC403” population) [14]. To assign conventional diagnoses in the absence of a gold standard, we first created a diagnostic Reference Standard for interpreting the conventional features [14]. Using the Reference Standard labels, we developed classifiers that used microarray measurements to assign an ABMR score between 0 and 1.0 to each biopsy, with an arbitrary cutoff of 0.2 selected as positive. The genes selected by the classifier algorithm were primarily endothelial, plus some that reflected interferon-γ effects and NK cells. The ABMR score correlated with the presence of DSA and microcirculation lesions, with the Reference Standard diagnosis of ABMR and with graft survival. The accuracy of the ABMR score for the Reference Standard diagnoses was 85%, but some discrepancies were noted as expected given the controversies in the conventional diagnoses and the uncertainties about the interpretation of the ABMR score at the 0.2 cutoff.

Because newly developed diagnostic tests must be validated in a new population, we undertook the present prospective INTERCOM study to assess the ability of published microarray-based tests to identify disease states such as TCMR and ABMR in unknown biopsies, and to estimate the potential impact of the tests on practice. As recently published for the TCMR score [17], the study design simulated that of the multicenter studies of immunosuppressive drugs, in which the conventional diagnosis is assigned by the clinicians and pathologists in established centers using their standard of care. In the present analysis, the conventional diagnosis or suspicion of ABMR is compared to the centrally assigned ABMR score. To reflect actual practice and acknowledge the controversy among experts on the criteria for C4d-negative ABMR, we did not attempt to “correct” the local assessment or perform central review, since there is no basis for assuming any one central reviewer is more correct than the experienced teams in these centers. We collected 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries (INT300) to sample the international kidney transplant population. The ABMR scores were assigned from the microarray measurements by a classifier developed in the reference set. We compared the ABMR score to the local assessment of lesions, DSA and diagnoses or suspicion of ABMR, and studied the relationship of the molecular and conventional assessments to graft survival after the biopsy. The hypothesis was that the ABMR score would confirm the findings in the reference set, and would thus (i) agree with the conventional assessment of ABMR in most cases; (ii) reproduce the patterns of discrepancies seen in the original BFC403 population and (iii) add prognostic insight beyond the conventional assessment of ABMR.

Methods

Patient population, specimens and data collection

INTERCOM (clinicaltrials.gov NCT01299168) was a prospective observational study involving Edmonton, Minneapolis, Baltimore, Barcelona, Hannover and Manchester. Local institutional review board approval was obtained in each center. Inclusion criteria specified all kidney transplant recipients ≥18 years of age undergoing a kidney biopsy for clinical indications who were able and willing to give informed consent. However, the centers enrolled two patients under age 18, and these patients were retained in the study.

The local biopsy assessment was performed as per standard of care, without knowledge of the microarray results. The local C4d staining method was either immunofluorescence (University of Alberta and University of Minnesota at Fairview) or deparaffinized immunoperoxidase (University of Maryland, Hospital Vall d'Hebron, Hannover Medical School and Manchester Royal Infirmary). Positivity was determined using published guidelines [6]: diffuse staining by immunofluorescence (Grade 3) or diffuse/focal by immunoperoxidase (Grades 2 and 3). One 5 mL serum sample was collected for local HLA antibody testing at the time of biopsy.

The study required one additional biopsy core beyond those obtained as standard of care. The six centers contributed 320 samples collected between October 2008 and March 2012, of which 300 were suitable for microarray processing. The 20 excluded biopsy specimens were either too small or improperly collected and stabilized. We also included control kidney samples from histologically normal areas of six nephrectomies for renal carcinoma.

Data collection

Copies of the original pathology reports (with patient identifiers removed), along with relevant clinical and laboratory data, were sent to the Alberta Transplant Applied Genomics Center (ATAGC). This information was stored in anonymous fashion in a Scientific Data Management System (SDMS) at the University of Alberta.

Classification of the local center pathology reports

The local assessment of ABMR based on conventional features (histology, C4d staining and DSA) was often expressed using terms of uncertainty such as “suspicious for” and “rule out.” To compare the local assessment to the ABMR score, a pathologist in Edmonton (Dr. M. Mengel) read all reports to ensure that the diagnoses recorded in our SDMS represented the opinion or suspicion of the local center as stated in the pathology report. For comparison with the reference set, these diagnoses were assembled into categories previously used for the reference set [14] without knowledge of the ABMR scores, using the rules outlined in Table S1.

Biopsy processing and microarray analyses

To prevent mRNA degradation, the study core was immediately stabilized in RNAlater® (Life Technologies, Carlsbad, CA), kept at 4°C for 24 h and stored at −20°C until processing. Samples from distant centers were batched and shipped to ATAGC on dry ice for processing on Affymetrix U133 2.0 microarrays (Santa Clara, CA).

We normalized microarray results using the Bioconductor “RefPlus” package. Because Affymetrix changed their labeling kit before the INT300 samples were processed, BFC403 and INT300 were normalized as separate batches. The BFC403 expression values were then adjusted for batch effects using the Ratio-G method [18]. All analyses used “R” version 2.12.1 (64-bit) [19], with various libraries from Bioconductor 2.8. Microarray expression files for BFC403 are posted on the Gene Expression Omnibus website (GSE36059). Files for INT300 will be posted at GEO on publication.

Assigning ABMR scores

ABMR scores were assigned using a classifier algorithm built using the BFC403 reference set [14], details of which can be found in Supplementary Methods. Since INT300 can be considered an independent test set, the entire BFC403 data set was used to train the classifier, rather than using the cross-validation method from the original paper. The classifier output is a score between 0.0 and 1.0, reflecting the probability that ABMR is operating in the biopsy.

Survival analysis

Since graft failure is uncommon in patients undergoing an indication biopsy less than 1 year after transplantation [1], survival analyses were restricted to late (>1 year) biopsies. Within this subpopulation, if a patient had more than one biopsy, the first that had either histologic or molecular ABMR was used. If none had ABMR, the earliest biopsy was used. In order to compare the predictive ability of the molecular and histologic diagnoses, a time cutoff was required and we chose 3 years postbiopsy. All biopsies that were functioning at 3 years were given a censor time of 3 years, and assigned a graft status of “working,” regardless of any failures in the post–3-year period. Thus, all survival analyses in this study refer to 3-year survival. Cox regression and Kaplan–Meier plots were performed using the “R” package “survival.” Comparison of the predictive ability of different models used the integrated discrimination improvement (IDI), net reclassification index (NRI) and median risk score (MRS) improvement methods in the “R” package “survIDINRI” [20].
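The biopsy selection and 3-year censoring rules described above can be sketched in a few lines. This is a minimal illustration, not the study's actual code (the analyses were run in R). The `Biopsy` record, its field names and the 365-day year convention are assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumed convention: one year = 365 days, so the cutoff is 1095 days.
THREE_YEARS_DAYS = 3 * 365


@dataclass
class Biopsy:
    """Hypothetical record; field names are illustrative, not from the study."""
    patient_id: str
    days_post_transplant: int        # time of biopsy after transplantation
    has_abmr: bool                   # histologic or molecular ABMR
    days_to_failure: Optional[int]   # days from biopsy to graft failure, None if none observed
    followup_days: int               # days of follow-up after the biopsy


def select_biopsy(biopsies: List[Biopsy]) -> Optional[Biopsy]:
    """Pick one late (>1 year) biopsy for a patient.

    The first late biopsy with ABMR is used; if none has ABMR,
    the earliest late biopsy is used. Returns None if there are no late biopsies.
    """
    late = sorted(
        (b for b in biopsies if b.days_post_transplant > 365),
        key=lambda b: b.days_post_transplant,
    )
    if not late:
        return None
    for b in late:
        if b.has_abmr:
            return b
    return late[0]


def censor_at_three_years(b: Biopsy) -> Tuple[int, bool]:
    """Return (time_days, failed) with administrative censoring at 3 years."""
    if b.days_to_failure is not None:
        if b.days_to_failure <= THREE_YEARS_DAYS:
            return b.days_to_failure, True
        # Failure after 3 years: treated as working at the 3-year cutoff.
        return THREE_YEARS_DAYS, False
    # No failure observed: censor at end of follow-up or at 3 years.
    return min(b.followup_days, THREE_YEARS_DAYS), False
```

The resulting (time, status) pairs are the form of input that a Cox regression or Kaplan–Meier routine expects.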

Results

The study analyzed 300 unselected biopsies for clinical indications from 264 patients. The INT300 population was similar to the BFC403 reference set, as expected, since both comprised unselected indication biopsies (Table S2). The follow-up time was shorter in the more recent INT300 (384 days) than in BFC403 (1136 days).

INTERCOM was designed to compare central microarray assessment to local standard-of-care interpretation of the biopsy. Unlike the reference set, which had to be assessed centrally to develop a new Reference Standard after recognizing the importance of C4d-negative ABMR, INT300 was designed to validate the ABMR score in new biopsies and to assess its potential impact in established centers. Thus, we compared the ABMR score to the local pathology reports recording and interpreting the conventional features (histology, C4d, DSA). Because local assessment of ABMR features was often reported in text, the report was interpreted by a pathologist at ATAGC (Dr. M. Mengel) to reflect the local interpretation, including suspected ABMR, and permit comparison with the reference set. Thus, no changes were made to the local assessments and there was no central reading of histology, C4d or DSA.

Conventional assessments by the local center

As expected, there was considerable uncertainty in the conventional assessment of ABMR in these internationally recognized centers, echoing the lack of agreement in the Banff group [11]: 19 of 46 biopsies (41%) with ABMR features reported in the text did not mention ABMR or possible ABMR as primary or secondary diagnoses (Table 1). This was particularly apparent in C4d-negative biopsies with ABMR features, where 15/27 (56%) did not mention ABMR as a diagnosis. In contrast, there was certainty about TCMR: 30/32 reports (94%) describing diagnostic TCMR features indicated TCMR as the primary or secondary diagnosis.

Table 1. Interpretation of local center pathology report, reflecting histology, C4d staining and DSA
Columns 3 to 5 describe how the finding appeared in the local report.
Histology-DSA diagnosis | n | Clear statement in the final diagnosis | Statement of suspicion in the final diagnosis | Features noted only in the text
All ABMR or mixed | 46 | 18 | 9 | 19
C4d+ ABMR | 13 | 9 | 1 | 3
C4d− ABMR | 27 | 6 | 6 | 15
Mixed | 6 | 3 | 2 | 1
TCMR | 32 | 29 | 1 | 2
  1. ABMR, antibody-mediated rejection; DSA, donor-specific antibody; TCMR, T cell–mediated rejection.
  2. Clear statement: cited ABMR, humoral rejection or TCMR inside “Diagnosis” section, WITHOUT uncertainty phrases (probable, possible, suspicious).
  3. Indication of suspicion: cited ABMR or “humoral rejection,” TCMR inside “Diagnosis” section WITH uncertainty phrases (probable, possible, suspicious, rule out).
  4. Features noted only in the text: The report did not specifically mention ABMR or humoral rejection, or TCMR or cellular rejection, inside the “Diagnosis” section. However, the text noted the ABMR features and raised the possibility of ABMR.

The local centers identified or suspected ABMR in 46/300 biopsies (15%): 13 C4d-positive, 27 C4d-negative and six mixed (all C4d-negative; Table 2). This is lower than the frequency of ABMR and mixed rejection in the reference set (87/403 biopsies, 22%), but the INTERCOM centers classified 20 biopsies as transplant glomerulopathy with no mention of ABMR, compared to only four in the reference set (Table 2). Transplant glomerulopathy can result from several conditions and is not synonymous with ABMR [21].

Table 2. List of diagnoses based on histology, C4d and DSA
INTERCOM (INT300)¹ columns are interpreted from local biopsy reports; reference set (BFC403) columns use the Reference Standard classification.
Histology-DSA diagnosis | INT300 early (<1 year) | INT300 late (>1 year) | INT300 all (% of total) | BFC403 early (<1 year) | BFC403 late (>1 year) | BFC403 all (% of total)
ABMR-related | | | | | |
C4d-positive ABMR | 2 | 11 | 13 (4%) | 3 | 14 | 17 (4%)
C4d-negative ABMR | 1 | 26² | 27 (9%) | 3 | 45 | 48 (12%)
C4d-positive mixed rejection | 0 | 0 | 0 (0%) | 0 | 9 | 9 (2%)
C4d-negative mixed rejection | 1 | 5 | 6 (2%) | 2 | 11 | 13 (3%)
Total ABMR | 4 | 42 | 46 (15%) | 8 | 79 | 87 (22%)
Transplant glomerulopathy | 2 | 18 | 20 (7%) | 1 | 3 | 4 (1%)
TCMR | 26 | 6 | 32 (11%) | 27 | 8 | 35 (9%)
Borderline | 24 | 22 | 46 (15%) | 29 | 13 | 42 (10%)
PVN | 9 | 4 | 13 (4%) | 10 | 2 | 12 (3%)
Glomerulonephritis | 3 | 37 | 40 (13%) | 6 | 35 | 41 (10%)
Acute kidney injury | 14 | 0 | 14 (5%) | 50 | 0 | 50 (12%)
Atrophy/fibrosis | 8 | 30 | 38 (13%) | 6 | 34 | 40 (10%)
Others | 2 | 6 | 8³ (3%) | 11 | 5 | 16 (4%)
No major abnormalities | 24 | 19 | 43 (14%) | 34 | 42 | 76 (19%)
Total | 116 | 184 | 300 (100%) | 182 | 221 | 403 (100%)
  • ABMR, antibody-mediated rejection; BFC403, 403 biopsies for cause; DSA, donor-specific antibody; INT300, 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries; PVN, polyoma virus nephropathy; TCMR, T cell–mediated rejection.
  • ¹The local assessment of ABMR and suspicious for ABMR was interpreted from the pathology reports, and included such terms as probable, suspicious for and rule out.
  • ²One biopsy classified as ABMR with C4d not done was considered as C4d-negative ABMR.
  • ³Others (n = 8) included “acute tubular necrosis” (n = 1), C4d staining with no pathology (n = 1), histological diagnosis not defined (n = 1), pyelonephritis (n = 1) and diabetic nephropathy (n = 4).

Thirteen of 46 ABMR biopsies (28%) in INT300 were C4d-positive, similar to the reference set (30%).

Assigning the ABMR scores

The ABMR scores in INT300 (Figure 1) were assigned by the classifier generated in the reference set and were divided into high or low using the cutoff of 0.2 from the reference set (see Supplementary Methods for details). As in the reference set, positive ABMR scores were mainly in late biopsies (54/184), compared to 9/116 in early biopsies.

Figure 1.

Relationship between the ABMR score and the ATAGC Reference Standard classification diagnoses in INT300. Horizontal ordering within each diagnosis is random. The horizontal line shows the arbitrary threshold of 0.2 for defining high versus low ABMR scores. The different symbols represent time posttransplantation: early (<1 year: empty triangles) and late (>1 year: solid triangles). ATAGC, Alberta Transplant Applied Genomics Center; INT300, 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries; ABMR, antibody-mediated rejection; TG, transplant glomerulopathy; M, mixed; TCMR, T cell–mediated rejection; Bord., borderline; PVN, polyoma virus nephropathy; GN, glomerulonephritis; AKI, acute kidney injury; NOMOA, no major abnormalities; Oth., others; Neph., nephrectomies.

Correlation of the ABMR scores with histologic lesions

As in the reference set, an ABMR score >0.2 in INT300 correlated with ABMR lesions: double contours (cg), peritubular capillary inflammation (ptc) and glomerulitis (g) (Table 3), as well as with atrophy/fibrosis (IFTA) and arteriolar hyalinosis. (Atrophy and fibrosis are strongly associated with ABMR and are accepted by the Banff consensus as diagnostic features of ABMR [6].) However, many biopsies with only IFTA had low ABMR scores (Figure 1). There was no correlation with TCMR lesions (tubulitis and interstitial inflammation) or intimal arteritis.

Table 3. Gamma-rank correlations between ABMR scores¹ and histologic lesions in the 300 INTERCOM biopsies and in the reference set
Lesion group | Histologic lesion | Correlation, INTERCOM (INT300) | Correlation, reference set [14]
ABMR-related lesions | Peritubular capillaritis (ptc) | 0.62*** | 0.81***
ABMR-related lesions | Glomerulitis (g) | 0.56** | 0.77***
ABMR-related lesions | Glomerular double contours (cg) | 0.74*** | 0.80***
TCMR-related lesions | Interstitial inflammation (i) | 0.15 | 0.21*
TCMR-related lesions | Tubulitis (t) | −0.02 | 0.15
TCMR/ABMR-related lesion | Intimal arteritis (v) | −0.24 | 0.29
Atrophy-scarring-related lesions | Interstitial fibrosis (ci) | 0.57*** | 0.50***
Atrophy-scarring-related lesions | Tubular atrophy (ct) | 0.51*** | 0.52***
Atrophy-scarring-related lesions | Arterial fibrous intimal thickening (cv) | 0.16 | 0.23**
Atrophy-scarring-related lesions | Arteriolar hyalinosis (ah) | 0.29* | 0.36***
Time posttransplant | Early (<1 year) vs. late (>1 year) | 0.39*** | 0.34***
  • ABMR, antibody-mediated rejection; INT300, 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries; TCMR, T cell–mediated rejection.
  • ¹For purposes of analysis, the ABMR scores were split into high (>0.2) vs. low (≤0.2).
  • *p < 0.05.
  • **p < 0.01.
  • ***p < 0.001.
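
For readers unfamiliar with rank correlations on ordinal lesion grades, the following sketch shows one way such a coefficient can be computed. It assumes that the "gamma-rank correlation" in Table 3 refers to Goodman and Kruskal's gamma, which the source does not state explicitly. The function name and the toy data are illustrative and are not taken from the study.

```python
from typing import Sequence


def goodman_kruskal_gamma(x: Sequence[float], y: Sequence[float]) -> float:
    """Goodman-Kruskal gamma: (concordant - discordant) / (concordant + discordant).

    Pairs tied on either variable are ignored, which suits a dichotomized
    score (0 = low, 1 = high) compared with ordinal lesion grades (0-3).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = 0
    discordant = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    if concordant + discordant == 0:
        raise ValueError("gamma is undefined when every pair is tied")
    return (concordant - discordant) / (concordant + discordant)


# Toy example (invented data): 1 = ABMR score >0.2, 0 = score <=0.2,
# paired with an ordinal lesion grade for the same biopsy.
high_score = [1, 1, 0, 0, 0]
lesion_grade = [3, 2, 0, 1, 0]
gamma = goodman_kruskal_gamma(high_score, lesion_grade)
```

In the toy data every high-score biopsy has a higher lesion grade than every low-score biopsy, so all untied pairs are concordant and gamma equals 1.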

The relationship of the ABMR score to the local diagnosis/suspicion of ABMR and with DSA

The mean ABMR score was higher in biopsies with a local diagnosis/suspicion of ABMR, compared to other biopsies, whether the biopsy was C4d-positive or C4d-negative (Table 4). The ABMR score was also higher in biopsies from patients who had DSA at the time of biopsy, compared to those with no HLA antibody or with HLA antibody that was not donor specific (Table 4).

Table 4. ABMR scores grouped by conventional ABMR¹ assessment, C4d staining and HLA antibody status in INTERCOM
Biopsy and HLA antibody assessment | n | ABMR score, mean ± SD
Local diagnosis or suspicion of ABMR based on histology, C4d³ and DSA | |
1. All ABMR | 46 | 0.31 ± 0.20
C4d+ ABMR | 13 | 0.39 ± 0.23
C4d− ABMR | 32 | 0.28 ± 0.18
2. All non-ABMR biopsies | 254 | 0.09 ± 0.11
C4d+ non-ABMR biopsies | 6 | 0.16 ± 0.11
C4d− non-ABMR biopsies | 236 | 0.09 ± 0.20
HLA antibody status of the patient at the time of biopsy⁴ (n = 256 biopsies with HLA antibody testing available) | |
1. DSA+ | 75 | 0.21 ± 0.19
2. NDSA | 81 | 0.10 ± 0.11
3. PRA− | 100 | 0.09 ± 0.10
  • ABMR, antibody-mediated rejection; DSA, donor-specific antibody; NDSA, non-DSA; PRA, panel reactive antibody; SD, standard deviation.
  • One biopsy, classified as ABMR with C4d not done, was used only for analysis when comparing the “all ABMR” group.
  • ¹ABMR here includes mixed rejection.
  • ²p-Values (Wilcoxon test, uncorrected) for the between-group comparisons were displayed as graphical brackets in the original table and are not reproduced here.
  • ³Thirteen biopsies with C4d staining not available.
  • ⁴Patients with no HLA antibody assessment (n = 44) were excluded from this analysis.

Positive ABMR scores were strongly associated with the local diagnosis/suspicion of ABMR (Table 5). The ABMR score was also high in 6/20 biopsies diagnosed as transplant glomerulopathy. Most biopsies with other diagnoses such as TCMR did not have high ABMR scores.

Table 5. Agreement of the ABMR score >0.2 with local assessment of ABMR in INT300, and accuracy statistics for INT300 and BFC403
Histologic assessment in INT300 (columns 2 to 4: ABMR; columns 5 to 8: non-ABMR)
ABMR score | ABMR (C4d+ and C4d−) | Mixed | Total ABMR | Transplant glomerulopathy | TCMR | No ABMR, TCMR or transplant glomerulopathy | Total non-ABMR | Total
>0.2 | 29 | 3 | 32 | 6 | 3 | 22 | 31 | 63
≤0.2 | 11 | 3 | 14 | 14 | 29 | 180 | 223 | 237
Total | 40 | 6 | 46 | 20 | 32 | 202 | 254 | 300
Prediction of the local diagnosis of ABMR or Mixed Rejection by the ABMR score >0.2¹
Population | n | Accuracy (%) | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | AUC
INT300 | 300 | 85 | 69 | 87 | 50 | 94 | 0.85
BFC403 | 403 | 85 | 67 | 90 | 64 | 91 | 0.88
  • AUC, area under the receiver operating characteristic curve; ABMR, antibody-mediated rejection; BFC403, 403 biopsies for cause; INT300, 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries; NPV, negative predictive value; PPV, positive predictive value; TCMR, T cell–mediated rejection.
  • Chi-squared test comparing all ABMR-related vs. non-ABMR: p-value < 0.0001.
  • Accuracy, sensitivity, specificity, PPV and NPV based on a cutoff ABMR score of 0.2.
  • ¹For these analyses, transplant glomerulopathy is regarded as non-ABMR.

The agreement between the ABMR score and the conventional assessment was similar to that in the reference set, with accuracy of 85% in both. The area under the receiver operating characteristic curve (AUC) of the ABMR score for the conventional assessment was 0.85 in INT300 and 0.88 in the reference set (Table 5 and Figure S1). While these calculations were based on transplant glomerulopathy being regarded as non-ABMR to reflect the local assessment, they were not significantly changed when we excluded transplant glomerulopathy or considered it to be ABMR.
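
The agreement statistics for INT300 follow directly from the 2 × 2 counts in Table 5, with transplant glomerulopathy counted as non-ABMR. The short sketch below recomputes them as a worked check. The function name is illustrative, and the counts are the Table 5 totals (32 score-positive/ABMR, 31 score-positive/non-ABMR, 14 score-negative/ABMR, 223 score-negative/non-ABMR).

```python
from typing import Dict


def diagnostic_stats(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard 2x2 agreement statistics, returned as fractions in [0, 1]."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # score-positive among locally assessed ABMR
        "specificity": tn / (tn + fp),   # score-negative among locally assessed non-ABMR
        "ppv": tp / (tp + fp),           # locally assessed ABMR among score-positive
        "npv": tn / (tn + fn),           # locally assessed non-ABMR among score-negative
    }


# INT300 counts from Table 5 (ABMR score >0.2 vs. local diagnosis/suspicion of ABMR).
int300 = diagnostic_stats(tp=32, fp=31, fn=14, tn=223)
for name, value in int300.items():
    print(f"{name}: {100 * value:.1f}%")
```

The recomputed values (accuracy 85.0%, sensitivity 69.6%, specificity 87.8%, PPV 50.8%, NPV 94.1%) agree with the INT300 row of Table 5 to within about one percentage point, and the reported figures appear to be truncated rather than rounded.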

Discrepancy analysis

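The ABMR score and the local assessment disagreed in 45 biopsies: 31 were ABMR score positive/local assessment negative and 14 were ABMR score negative/local assessment positive (Table 5). This pattern of discrepancies was similar to the reference set [14], and will need further experience to resolve given the state of the Reference Standard for conventional diagnosis and the arbitrariness of the 0.2 cutoff for the ABMR score.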

Eighteen of 31 ABMR score positive/local assessment negative biopsies were either DSA positive or DSA unknown, and appeared to be true ABMR, including six diagnosed as transplant glomerulopathy. These 18 also had higher scores than the remaining 13 cases, which tended to have scores between 0.2 and 0.3, and may represent false positive ABMR scores at the arbitrary cutoff of 0.2.

The 14 ABMR score negative/local assessment positive biopsies may reflect histology false positives due to TCMR, which can induce ABMR lesions, and “false negative” ABMR scores due to inactive ABMR (n = 4) or treatment before biopsy (n = 2).

The association of graft survival with the ABMR score and conventional assessment

We compared the ABMR score >0.2 to the local assessment of ABMR (defined as C4d+ ABMR, C4d− ABMR or mixed rejection) for prediction of 3-year graft survival in patients with late (>1-year posttransplant) biopsies. In univariable Cox regression, both the ABMR score (p = 3 × 10⁻⁸) and the histologic diagnosis/suspicion of ABMR (p = 0.01) were associated with death-censored graft loss (32 failures in 166 patients, including 13 failures in 38 patients locally assessed as ABMR, and 24 failures in the 50 patients with positive ABMR scores). In multivariable analyses using both the ABMR score and local assessment of ABMR, only the ABMR score was significantly associated with graft loss in INT300, as was the case in the reference set (Table 6).

Table 6. ABMR score and death-censored graft survival using Cox regression in late biopsies
Variable | Hazard ratio | 95% Confidence interval | p-Value
INT300 (n = 166 patients with 32 failures) | | |
Univariable analyses | | |
Molecular ABMR score | 2.83 | (1.96–4.1) | 3 × 10⁻⁸
Local diagnosis/suspicion of ABMR (including mixed) | 2.42 | (1.19–4.91) | 0.01
Multivariable analysis | | |
Molecular ABMR score | 2.93 | (1.97–4.36) | 1 × 10⁻⁷
Local diagnosis/suspicion of ABMR (including mixed) | 0.84 | (0.39–1.8) | 0.66
BFC403 (n = 186 patients with 60 failures) | | |
Univariable analyses | | |
Molecular ABMR score | 1.61 | (1.26–2.04) | 0.0001
Reference Standard diagnosis of ABMR (including mixed) | 1.61 | (1.26–2.04) | 0.03
Multivariable analysis | | |
Molecular ABMR score | 1.57 | (1.19–2.08) | 0.001
Reference Standard diagnosis of ABMR (including mixed) | 1.09 | (0.61–1.97) | 0.77
  1. ABMR, antibody-mediated rejection; BFC403, 403 biopsies for cause; INT300, 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries.

We evaluated the ability of the ABMR score to add predictive value to the conventional assessment, and vice versa, using the IDI, NRI and MRS methods that assess the ability of one diagnostic system to improve the predictive power of another (see Methods section) [20]. The ABMR score improved the 3-year survival prediction significantly: IDI = 29% improvement, 95% confidence limits 6–50%; NRI = 60% improvement, 95% confidence limits 19–85% and MRS = 29% improvement, 95% confidence limits 0–73%. Adding the local assessment to ABMR score did not add predictive value: improvement of 1%, −21% and 0%, respectively, for IDI, NRI and MRS.

Patients were divided into four categories based on whether they were ABMR score >0.2 positive (S+) or negative (S−) and conventional assessment positive (C+) or negative (C−). Figure 2 shows a Kaplan–Meier plot for this analysis. When the ABMR score was positive, graft loss was rapid in the first year, regardless of the local assessment (S+C+ and S+C−). In comparison, S−C+ biopsies had little graft loss in the first year, but were at risk later. The S+C− population had significantly lower graft survival than did the S−C+ population (p = 0.003 by log-rank test).
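
The four-way grouping can be expressed as a small helper. This is a hedged sketch rather than the study's code. The function name is hypothetical, and the labels use an ASCII hyphen for the negative groups, i.e. a negative ABMR score and a negative conventional assessment. A score of exactly 0.2 is treated as negative, consistent with the high (>0.2) vs. low (≤0.2) split used elsewhere in the paper.

```python
ABMR_SCORE_CUTOFF = 0.2  # threshold carried over from the reference set


def sc_category(abmr_score: float, conventional_abmr: bool) -> str:
    """Label a biopsy by molecular (S) and conventional (C) ABMR status.

    abmr_score: classifier output between 0.0 and 1.0.
    conventional_abmr: True if the local center diagnosed or suspected
    ABMR or mixed rejection.
    """
    s = "S+" if abmr_score > ABMR_SCORE_CUTOFF else "S-"
    c = "C+" if conventional_abmr else "C-"
    return s + c


# Illustrative inputs (invented scores, not study data).
examples = [
    sc_category(0.35, True),    # score positive, locally assessed as ABMR
    sc_category(0.35, False),   # score positive, not locally assessed as ABMR
    sc_category(0.20, True),    # score at the cutoff counts as negative
    sc_category(0.05, False),   # negative by both assessments
]
```

Grouping the per-patient survival records by this label gives the four strata compared in the Kaplan–Meier analysis.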

Figure 2.

Kaplan–Meier graft survival curves. Based on one random biopsy per patient in the patients receiving late (>1-year posttransplantation) biopsies in INT300. The ABMR score is dichotomized into S+ (>0.2) and S− (≤0.2), as is the conventional assessment of ABMR/mixed as C+ and C− by the local center. ABMR, antibody-mediated rejection; INT300, 300 new indication biopsies from consenting subjects in six kidney transplant programs from five countries.

Discussion

This prospective study of unselected indication biopsies compared central microarray-based assessment of ABMR to local assessment by conventional features (histology, C4d and DSA) in established centers. The goal was to examine the potential of the previously defined ABMR score to circumvent the vexing problems associated with the conventional diagnosis and thus move toward a more reliable gold standard for this major disease. We confirmed the uncertainty in the conventional assessments in these internationally known centers: 41% of reports noting ABMR diagnostic features in the text failed to report ABMR as the diagnosis, often using transplant glomerulopathy as the diagnosis, which is not equivalent to ABMR [21]. In contrast, TCMR diagnostic features almost always led to a diagnosis of TCMR. ABMR in INT300 was mostly late and C4d-negative, similar to the reference set. The ABMR score correlated with histologic ABMR lesions and was higher in biopsies with ABMR diagnosed or suspected locally and in biopsies from DSA-positive patients. However, the ABMR score in C4d-negative ABMR was similar to C4d-positive ABMR, confirming concerns about C4d staining. Thus, microarray-based testing of biopsies assesses the probability that ABMR is operating in an unknown biopsy without knowledge of conventional features. The relationship between the ABMR score and conventional assessment in INT300 was the same as in the reference set (e.g. accuracy 85%). In survival analysis, the ABMR score added predictive power to the conventional assessment but not vice versa. Grafts with positive ABMR scores often progressed to early loss, whereas those with negative scores had few early losses even when assessed as ABMR by conventional criteria. This differential survival in the discrepant groups illustrates how the molecular phenotype adds prognostic insight to the conventional assessment, but further studies will be necessary to interpret the discrepancies, for example, the meaning of ABMR scores below the 0.2 arbitrary cutoff.

The potential value of the combination of the ABMR score (S) with conventional assessment (C) is underscored by the analysis shown in Figure 2, in which S+C+ and S+C− kidneys both progress rapidly but the S−C+ kidneys nevertheless are at risk for late progression to failure after 1 year. We are currently re-examining these groups, in the belief that the ABMR score cutoff of 0.2 is in fact underestimating the extent of smoldering ABMR in the S−C+ group. It now seems likely that this group progresses very slowly because the activation of its microcirculation is much less than in the S+ groups as currently defined, but that there is significant molecular activity in the range below 0.2 and true risk may extend as low as 0.05. Thus, some biopsies in the S−C+ group will actually be S+C+ when the cutoff is lowered. The result illustrates how the comparison of molecular scores with conventional assessment stimulates a new generation of research directions. However, the diagnostic and prognostic applications of the ABMR score should be evaluated separately.

The present study illustrates how to address a problem faced by many types of disease studies, namely the difficulty of developing molecular diagnostic tests against a flawed gold standard [13, 22]. This is particularly acute when the standard of care is based on locally managed tests such as histology and C4d staining, where reproducibility and inter-observer and inter-center agreement are limited. This problem is not solved by imposition of consensus rules with unknown accuracy and impact on actual practice in established centers. One must avoid Reference Standard–related bias, that is, insisting on close agreement between the new test and the flawed conventional assessment. Nevertheless, the correlations between the ABMR score and reference points such as lesion grades, DSA and survival are reassuring, and the agreement between the findings in INT300 and the reference set (accuracy 85%), including the similar pattern of discrepancies, confirms that the ABMR score and conventional assessments are fundamentally related to the underlying ABMR disease process despite their differences and their limitations.

The local conventional assessment was selected as the comparator because it determines therapy and has been the universal end point for all clinical trials in kidney transplantation, for example, the belatacept [23, 24] and Symphony studies [25]. Central histologic reading has never been used to replace the assessment in the local center because it is just another opinion using the same flawed criteria, with no evidence that it is superior to local assessment in experienced centers. Moreover, the question of the relationship of the ABMR score to central assessment has been answered in the BFC403 analysis [14].

The understanding of ABMR as a disease is complicated by its inherent heterogeneity and dynamic range, and the understanding of this disease will evolve as the ABMR score and conventional phenotyping systems are used in parallel. The scoring of histologic features and their organization into new rules must be based on actual biopsy data and outcomes: arbitrary opinion-based consensus solves little, and must always be evaluated in prospective trials of unselected indication biopsies. For example, we derived criteria for C4d-negative ABMR as the ATAGC Reference Standard (http://atagc.med.ualberta.ca/), but we have also evaluated alternative histology scoring systems and found no clear winner. The cutoffs used for features such as DSA, histologic lesion scores and C4d staining are currently empirical, and their validity, reproducibility and use in transplant centers are unknown. As shown by the ABMR score, arbitrary cutoffs always raise tradeoffs between specificity and sensitivity. The potential modifying effect of therapies must also be established. It is unlikely that one disease definition will incorporate all phenotypes, active and inactive, acute or chronic, due to anti-HLA Class I and Class II, antibody against other alloantigens (e.g. MICA), and potentially anti-AT1 and other autoantibodies producing similar phenotypes [26, 27]. The key will then not be to propose a new consensus definition, but to maintain prospective studies of alternative diagnostic definitions and disease classifications that reflect a dynamic view that can lead ultimately to evidence-based therapy.
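
The tradeoff between specificity and sensitivity that any cutoff imposes can be made concrete with a small sketch. It is illustrative only: the function name, the rule that a score strictly above the cutoff is called positive, and the eight scores and reference labels are hypothetical, and the reference label itself stands in for a comparator that, as discussed above, is imperfect in practice.

```python
def sensitivity_specificity(scores, labels, cutoff):
    """Return (sensitivity, specificity) of the rule `score > cutoff`.

    `labels` holds the reference call for each biopsy (True = ABMR).
    Returns None for a measure whose denominator is zero.
    """
    tp = sum(1 for s, y in zip(scores, labels) if s > cutoff and y)
    fn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and y)
    tn = sum(1 for s, y in zip(scores, labels) if s <= cutoff and not y)
    fp = sum(1 for s, y in zip(scores, labels) if s > cutoff and not y)
    sens = tp / (tp + fn) if (tp + fn) else None
    spec = tn / (tn + fp) if (tn + fp) else None
    return sens, spec

# Hypothetical scores and reference labels, invented for illustration only.
toy_scores = [0.65, 0.31, 0.12, 0.07, 0.15, 0.03, 0.02, 0.01]
toy_labels = [True, True, True, True, False, False, False, False]

for cutoff in (0.2, 0.05):
    sens, spec = sensitivity_specificity(toy_scores, toy_labels, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

In this toy set, lowering the cutoff from 0.2 to 0.05 raises sensitivity from 0.50 to 1.00 at the cost of reducing specificity from 1.00 to 0.75; real data would need to be examined in the same way before any cutoff is changed.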

By establishing the operational feasibility of central molecular testing of biopsies to augment local conventional assessments and establish prognosis, INTERCOM moves transplantation in the direction taken in oncology, where tests such as OncotypeDX [28] and the 70-gene signature [29, 30] are emerging. The ABMR score will also be useful in validating alternative systems for conventional assessment and will facilitate new multicenter trials of prevention or treatment, for example, by identifying patients at risk for rapid progression to failure. Indeed, the relationship of the ABMR score to the utility of therapy should now be a priority. The ultimate value of the ABMR score and other precision diagnostics [31] will lie in directing therapy to improve outcomes. The cost of microarrays is rapidly falling, and the value of the microarray approach will be augmented because other tests will be performed in parallel on the same microarray, for example, the TCMR score [32], acute kidney injury score [33] and risk score [34]. Because the genes used are not kidney-specific, they may be useful starting points for looking at the molecular phenotype of ABMR in heart and lung transplants, where the gold standard is even more problematic. The present study illustrates a pattern for designing studies in these applications: develop a reference set with complete conventional characterization and clinical outcomes; establish a new Reference Standard conventional classification on a reference set of biopsies; use the Reference Standard labels to train the molecular test; and validate the new test against the conventional assessment in a new biopsy set.

The ABMR score will be useful in refining conventional phenotyping, but it will probably find its main application in augmenting the existing tests. One of the next priorities will be to determine how the histologic lesions, DSA and ABMR score can be integrated to create a more complete and clinically relevant understanding of each case. In parallel, the limitations of molecular phenotyping should be acknowledged and addressed, including cost, complexity and sources of variance in the classifier outputs.

Acknowledgments

We are grateful to Dr. M. Mengel for reviewing the biopsy reports from the centers to help categorize them. We would like to thank Richard Ugarte at the University of Maryland for help with organizing samples and clinical data. This research has been supported by funding and/or resources from Novartis Pharma AG, Canada Foundation for Innovation and in the past by Genome Canada, the University of Alberta Hospital Foundation, Roche Molecular Systems, Hoffmann-La Roche Canada Ltd., the Alberta Ministry of Advanced Education and Technology and the Roche Organ Transplant Research Foundation and Astellas. Dr. Halloran held a Canada Research Chair in Transplant Immunology until 2008 and currently holds the Muttart Chair in Clinical Immunology.

Disclosure

The authors of this manuscript have conflicts of interest to disclose as described by the American Journal of Transplantation: P. F. Halloran holds shares in Transcriptome Sciences, Inc., a company with an interest in molecular diagnostics. The other authors have no conflicts of interest to disclose.
