Keywords:

  • Allograft rejection;
  • Banff schema;
  • microarrays;
  • prediction

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Materials and Methods
  5. Results
  6. Discussion
  7. Acknowledgments
  8. References
  9. Supporting Information

The transcriptome has considerable potential for improving biopsy diagnoses. However, to realize this potential the relationship between the molecular phenotype of disease and histopathology must be established. We assessed 186 consecutive clinically indicated kidney transplant biopsies using microarrays, and built a classifier to distinguish rejection from nonrejection using predictive analysis of microarrays (PAM). Most genes selected by PAM were interferon-γ-inducible or cytotoxic T-cell associated, for example, CXCL9, CXCL11, GBP1 and INDO. We then compared the PAM diagnoses to those from histopathology, which are based on the Banff diagnostic criteria. Disagreement occurred in approximately 20% of diagnoses, principally because of idiosyncratic limitations in the histopathology scoring system. The problematic diagnosis of ‘borderline rejection’ was resolved by PAM into two distinct classes, rejection and nonrejection. The diagnostic discrepancies between Banff and PAM in these cases were largely due to the Banff system's requirement for a tubulitis threshold in defining rejection. By examining the discrepancies between gene expression and histopathology, we provide external validation of the main features of the histopathology diagnostic criteria (the Banff consensus system), recommend improvements and outline a pathway for introducing molecular measurements.


Introduction

While histopathology is the basis for assessing needle biopsies in clinical medicine, it is subjective and opinion-based, and therefore an imperfect gold standard. Kidney transplant rejection is diagnosed using histopathologic features defined by an international consensus, the Banff classification (1). This assigns a diagnosis of T-cell-mediated rejection (TCMR) or antibody-mediated rejection (ABMR) based on empirically derived rules and semiquantitatively graded lesions. Histopathology diagnoses correlate with treatment response and graft outcome (2–4), but their accuracy has never been validated because no independent system for assessing rejection exists. Potential areas of concern include: arbitrary lesion grades (0–3); subjectivity in assigning these grades; and variation between biopsy cores. Together, these problems limit reproducibility. Agreement between two pathologists on lesion scoring is 10–50% and on diagnosis 45–70% (5–7), and even intraobserver reproducibility is only approximately 80–85%. The Banff criteria also include questionable rules. For example, tubulitis t1 versus t2 differs by four versus five lymphocytes in one tubular cross section anywhere in the cortex, with a Kappa value between two observers of 0.17 (5), that is, in the range of pure chance, and is thus essentially nonreproducible. This creates diagnostic uncertainty because the t1/t2 interface distinguishes borderline TCMR from TCMR (5,7). TCMR can also be diagnosed on the basis of one inflammatory cell underneath the arterial endothelium (‘isolated v lesions’), but whether this is actually rejection is unknown. These limitations, highlighted in previous analyses (8) and at the 2007 Banff meeting (9), affect treatment and produce inaccurate endpoints for clinical trials. Objective molecular measurements could improve biopsy diagnoses if the relationship between histopathology and molecular phenotype could be established.

In kidney transplant biopsies, many molecules show altered expression due to the allogeneic response or other injury (8,10–13). Sarwal et al. (10) found that transcripts related to inflammation were elevated in some rejecting biopsies but not in others. Flechner et al. (11) compared biopsies from five TCMR and two borderline cases to biopsies lacking rejection and reported rejection-specific gene sets that included immunity- and inflammation-related transcripts. Their classifiers distinguished rejection from normal kidneys, but did not distinguish rejection from other types of renal dysfunction. In clinical practice, a diagnostic system for rejection must be able to distinguish between rejection and other disease processes.

Because TCMR and ABMR have similar molecular profiles (8), we built a gene-based classifier to diagnose rejection, regardless of its type. Due to the imperfect histopathology ‘gold standard’, a molecular classifier should not agree completely with that standard, given the unselected nature of this population that includes many nonrejecting but nonetheless dysfunctional kidneys. By examining the discrepancies between classifier predictions and histopathology, we are able to recommend improvements to the current Banff system, and outline a pathway for introducing molecular measurements.

Materials and Methods

Patient population

The study was approved by the University of Alberta Health Research Ethics Board (Issue 5299). The initial study population consisted of 194 consecutive biopsies for cause from 143 patients at the University of Alberta (September 2004–March 2007) and the University of Illinois, Chicago (November 2006–February 2007, approval by the University of Illinois Institutional Review Board, Protocol 2006-0544), plus eight control samples from nephrectomy biopsies at the University of Alberta. The controls were used to calculate pathogenesis-based transcript (PBT) scores (8), and were not otherwise used. To reduce potential bias from pseudoreplication in patients with multiple biopsies, the following randomization test was done. Samples were split into two groups: rejecting and nonrejecting (based on criteria specified below). Within each group, a Spearman correlation between all biopsy pairs was calculated. The correlation was measured across the expression levels of all probesets passing the interquartile range (IQR) filter. Any correlation between a biopsy pair from the same patient falling in the upper 5% of the distribution of correlations resulted in the removal of the more recent biopsy. This eliminated three rejecting and five nonrejecting biopsies, leaving 186 samples. The predictive ability of a PBT-based classifier was published previously using the earliest 138 biopsies from this study's University of Alberta population (8), plus five samples not included in this study because of the pseudoreplication concern. The analysis represented in that manuscript was limited to three previously defined transcript sets and did not incorporate a gene-based assessment of the molecular phenotype and molecular classification of rejection.
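
The pseudoreplication screen described above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the record keys ('id', 'patient', 'date', 'expr'), the function names and the exact way the upper-5% cutoff is located are assumptions made for the example. It is meant to be run separately on the rejecting and the nonrejecting group.

```python
from itertools import combinations

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation (assumes neither vector is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def biopsies_to_drop(samples, upper_fraction=0.05):
    """samples: list of dicts with hypothetical keys
    'id', 'patient', 'date' (ISO string) and 'expr' (filtered probeset values).
    Returns the ids of the more recent biopsy in every same-patient pair
    whose correlation falls in the upper tail of all pairwise correlations."""
    pairs = [(spearman(a["expr"], b["expr"]), a, b)
             for a, b in combinations(samples, 2)]
    corrs = sorted(c for c, _, _ in pairs)
    # Assumed cutoff rule: first value belonging to the top `upper_fraction`.
    cutoff = corrs[int((1 - upper_fraction) * len(corrs))]
    drop = set()
    for c, a, b in pairs:
        if a["patient"] == b["patient"] and c >= cutoff:
            later = a if a["date"] > b["date"] else b
            drop.add(later["id"])
    return drop
```

Pairs from different patients contribute to the reference distribution but are never removed; only a same-patient pair that looks unusually similar loses its later biopsy.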

Biopsy acquisition and histopathology evaluation

Biopsy cores were obtained under ultrasound guidance by spring-loaded needles (ASAP Automatic Biopsy, Microvasive, Watertown, MA). In addition to cores for routine diagnostic histopathology assessment, one 18-gauge biopsy core was collected for gene expression analysis after receiving written consent and placed immediately in RNALater, kept at 4°C for 4–24 h, then stored at −20°C. For diagnostic evaluation, paraffin sections for each biopsy were prepared, stained and graded according to the recently updated Banff criteria by a renal pathologist (9,14,15).

Diagnostic classifications

Histopathologic diagnoses were based on the Banff classification scheme. For patients with histopathologic rejection, we defined a clinical rejection ‘episode’ based on retrospective assessment of functional changes during the clinical course by two nephrologists, independent of the transcript results, based on compatible histopathology with clinically apparent functional changes: decrease in estimated glomerular filtration rate (GFR) ≥25% from baseline (up to 4 months preceding biopsy to include cases with infrequent visits) and/or response to therapy (an increase in estimated GFR ≥25% within 1 month), in the absence of alternative explanations (e.g. obstruction, calcineurin inhibitor toxicity).

Microarray experiments

Following homogenization in 0.5 mL of Trizol reagent (Invitrogen, Carlsbad, CA), total RNA was extracted and purified using the RNeasy Micro Kit (Qiagen, Ont., Canada) (average 4 μg/core). RNA (1–2 μg) was labeled using the GeneChip® HT One-Cycle Target Labeling and Control Kit. Quality of labeled cRNA was assessed on an Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA) (RNA integrity number >7) before hybridization to the HG_U133_Plus_2.0 GeneChip (Affymetrix, Santa Clara, CA). Detailed protocols are available in the Affymetrix Technical Manual (http://www.affymetrix.com).

PBTs

PBTs were derived previously from mouse transplant models and in vitro human cell lines. For each biopsy, PBT scores were calculated as follows: the fold-change of each probeset in the PBT versus the mean value of that probeset in eight controls was determined. The mean of these fold-changes across all probesets in the PBT was then used as that biopsy's PBT score. In all cases, log2 transformed values were used. The definitions and algorithms for the PBTs referenced in the Tables can be found in the following sources: interferon-γ inducible transcripts GRIT1, GRIT2 (16); cytotoxic T-cell-associated transcripts CAT1, CAT2 (12), and QCAT (17); natural killer cell-associated transcripts NKAT (18), macrophage-associated transcripts CMAT (19), AMA (20), B cell-associated transcripts BAT (21), endothelial transcripts ENDAT (unpublished data), injury and repair transcripts IRIT and IRIT5 (22) and GRIT-like transcripts GRIT-L (unpublished data).
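
As a rough illustration of the PBT score described above, the sketch below averages per-probeset log2 fold changes against the control mean. The function name and data layout are assumptions for the example, and the control mean is taken on the log2 scale, which is one reading of the statement that log2 values were used throughout.

```python
from statistics import mean

def pbt_score(biopsy_log2, control_log2, pbt_probesets):
    """biopsy_log2: dict probeset -> log2 expression in one biopsy.
    control_log2: dict probeset -> list of log2 expression values in controls.
    pbt_probesets: probeset ids belonging to one PBT.
    Returns the mean log2 fold change versus controls across the PBT."""
    log2_fold_changes = [
        # On the log2 scale a fold change is a difference.
        biopsy_log2[p] - mean(control_log2[p])
        for p in pbt_probesets
    ]
    return mean(log2_fold_changes)
```

A score of 1.0 would correspond to an average twofold increase over controls for the probesets in that set.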

Gene filtering

Nonspecific IQR filtering was used to eliminate probesets with low variation across the dataset. A total of 11 691 of the original 54 675 probesets passed this filtering step and were retained for further analysis. The list of differentially expressed genes between rejecting and nonrejecting samples was generated with the Bioconductor package ‘limma’ (23), using the Benjamini and Hochberg correction for false discovery rates.
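
The two generic steps named above, nonspecific IQR filtering and Benjamini-Hochberg adjustment, can be sketched as below. This does not reproduce limma's moderated statistics; the IQR cutoff `min_iqr` is a hypothetical parameter, since the threshold actually used is not stated here, and the quartile convention is likewise an assumption.

```python
from statistics import quantiles

def iqr(values):
    """Interquartile range using inclusive quartiles (an assumed convention)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

def iqr_filter(expr, min_iqr):
    """expr: dict probeset -> list of log2 values across all samples.
    Keeps probesets whose IQR is at least `min_iqr` (hypothetical cutoff)."""
    return {p: v for p, v in expr.items() if iqr(v) >= min_iqr}

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Probesets whose adjusted p-value falls below the chosen false discovery rate (0.01 in the class comparison reported in Results) would be called differentially expressed.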

For the purpose of the classifier analysis, we defined two classes based on the histopathologic diagnosis:

  1. Rejecting ([TCMR, n = 33], [ABMR, n = 15] and [TCMR + ABMR (Mixed), n = 3]). N_rej = 51.
  2. Nonrejecting (BK virus [BK, n = 6], borderline TCMR [n = 27] and ‘other’ [n = 102]). N_non-rej = 135.

As described above, a further class labeling distinguished clinical ‘episodes’ of rejection from nonrejecting:

  1. Rejecting ([TCMR, n = 24], [ABMR, n = 15] and [TCMR + ABMR (Mixed), n = 3]). N_rej = 42.
  2. Nonrejecting. N_non-rej = 144.

Unless otherwise stated, all results are based on histopathology.

Classifier statistics

Classifier statistics were calculated using a multiple resampling method (24). The 186 samples were split 50:50 into training and test sets of size 93. Stratified sampling (25) was used to maintain equal proportions of rejecting cases in each set. This splitting was done 1000 times, and each of the classifier methods was trained and evaluated using the same set of 1000 splits. All reported classifier statistics are based on averages over all 1000 test sets.
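
A minimal sketch of the repeated stratified 50:50 splitting is given below. The function name, the seed and the way odd-sized classes are divided (the spare sample alternates between halves so that 51 rejecting and 135 nonrejecting samples give two sets of 93) are assumptions for illustration, not details taken from the cited methods.

```python
import random

def stratified_splits(labels, n_splits=1000, seed=0):
    """Yield (train_idx, test_idx) pairs. Each split halves every class
    separately, so both halves keep nearly the same class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_splits):
        train, test = [], []
        extra_to_train = True  # which half receives the spare sample of an odd class
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            half = len(shuffled) // 2
            if len(shuffled) % 2:
                if extra_to_train:
                    half += 1
                extra_to_train = not extra_to_train
            train.extend(shuffled[:half])
            test.extend(shuffled[half:])
        yield sorted(train), sorted(test)
```

Because every split places half of the samples in the test set, each sample is expected to appear in roughly 500 of the 1000 test sets, and its reported prediction can be taken as the average over those appearances.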

Results

Patient demographics

We analyzed 194 consecutive renal transplant biopsies from 143 patients biopsied between one week and 32 years posttransplant. Demographics are shown in Table 1 (more extensive demographics are shown in Table S1). Eight repeated biopsies within single patients were removed to avoid possible pseudoreplication issues (see Materials and Methods). Of the remaining 186 biopsies from 143 patients (16 (8.5%) from the University of Illinois, the others from the University of Alberta), histopathology diagnosed 33 as TCMR and 27 as borderline TCMR. Retrospective review of the clinical course of these 60 biopsies (described in Materials and Methods) confirmed 24 as rejection ‘episodes’ (19 TCMR and five borderline TCMR biopsies). Fifteen biopsies were histologically diagnosed as ABMR (with diffuse peritubular capillary C4d staining and respective histopathology (15)). All ABMR met clinical episode criteria. Three biopsies met criteria for mixed TCMR and ABMR; these also met the criteria for clinical rejection episodes. Thus 51 biopsies had histologic rejection (ABMR, TCMR or mixed ABMR/TCMR), and 27 had borderline TCMR. Six biopsies were found to have BK virus infection, while the remaining 102 were classified as ‘other’ (acute tubular necrosis, calcineurin inhibitor toxicity, fibrosis and atrophy (IFTA), glomerulonephritis, etc.). The most common indication for biopsy was deterioration in function (61%), followed by stable but impaired function (12%) and proteinuria (11%). The average time of the biopsy posttransplantation was 45 ± 63 months (57 ± 51 in biopsies diagnosed as ABMR, 52 ± 69 in borderline TCMR and 27 ± 40 in TCMR).

Table 1.  Demographics, immunosuppression and indication for biopsy

Patient demographics [n = 143 patients]                 n (%)
Recipient gender (% male)                               89 (59%)
Race
 Caucasian                                              85 (59%)
 Black                                                  17 (12%)
 Other                                                  20 (14%)
 Unknown                                                21 (15%)
Primary disease
 Diabetic nephropathy                                   23 (16%)
 Hypertension/large vessel disease                      16 (11%)
 Glomerulonephritis/vasculitis                          60 (42%)
 Interstitial nephritis/pyelonephritis                  12 (8%)
 Polycystic kidney disease                              18 (13%)
 Others                                                 8 (6%)
 Unknown etiology                                       6 (4%)
Previous transplant                                     11 (8%)
Donor gender (% male)                                   55 (39%)
Donor type (% deceased donor transplants)               72 (50%)

Clinical characteristics at time of biopsy [n = 186]
Maintenance immunosuppressive regimens at biopsy
 MMF, tacrolimus, steroid                               70 (37%)
 MMF, tacrolimus                                        13 (7%)
 MMF, cyclosporine, steroid                             35 (19%)
 MMF, steroids                                          7 (4%)
 Others                                                 57 (31%)
Indication for biopsy
 Primary nonfunction                                    8 (4%)
 Rapid deterioration of graft function                  45 (24%)
 Slow deterioration of graft function                   69 (37%)
 Stable-impaired graft function                         28 (15%)
 Investigate proteinuria                                21 (12%)
 Follow-up from previous biopsy                         9 (5%)
 Others                                                 6 (3%)

Gene expression differences between histologically rejecting and nonrejecting kidneys

Class comparison:  Comparison of the 51 biopsies with histologic rejection versus the 135 nonrejecting samples by a 2-tailed Bayesian t-test identified 1693 differentially expressed probesets at a false discovery rate of 0.01: 1543 were higher in histologic rejection, and 150 lower. Table S2 lists the top 100 probesets by p-value, 74 of which were previously annotated in our experimental systems as members of pathogenesis-based transcript sets (PBTs). Of these 74, 69 (93%) were interferon-γ- and rejection-inducible transcripts (GRITs) (16) or cytotoxic T-cell-associated transcripts (CATs) (12). By comparison, only 40 of the top 100 probesets were annotated by membership in KEGG pathways. The most frequently seen pathways were cytokine–cytokine receptor interaction, toll-like receptor signaling pathway, antigen processing and presentation, and natural killer cell-mediated cytotoxicity.

Class prediction:  We used predictive analysis of microarrays (PAM) (26) to predict the molecular rejection/nonrejection status of the 186 biopsy samples using a multiple resampling validation method (24). The classifier was trained using the histopathology defined classes, comparing the 51 biopsies with Banff rejection (ABMR, TCMR and mixed ABMR/TCMR) to the remaining 135 nonrejecting biopsies (borderline TCMR, BK virus and ‘other’) using 1000 training set/test set splits. The median number of probesets selected by PAM in the 1000 training sets was 24. Table 2 lists the top genes used by PAM to identify the rejection phenotype (PAM rejection), ordered by the proportion of times each gene was incorporated into the 1000 resampled classifier gene sets. As with the class comparison, most of the top genes had previously been annotated in experimental systems as PBTs, primarily as interferon-γ-inducible transcripts. There was considerable overlap between the genes used most frequently by the classifier and the top genes identified by class comparison.

Table 2.  Genes most frequently included in the PAM classifier

Affymetrix hgu133plus2 probeset ID  Gene¹     PBT²                 Proportion sampled³  Rejection/nonrejection expression ratio
210163_at                           CXCL11    GRIT2                0.999                5.6
202270_at                           GBP1      CAT2                 0.991                3.1
203915_at                           CXCL9     GRIT1                0.99                 4.7
210029_at                           INDO      GRIT2                0.973                3.2
204533_at                           CXCL10    GRIT2                0.972                3.8
229390_at                           FAM26F    CAT1                 0.952                3.5
238581_at                           GBP5      CAT2                 0.945                2.7
235229_at                                                          0.928                2.9
242907_at                           GBP2      GRIT1 CAT2           0.802                2.7
205890_s_at                         UBD       GRIT1                0.76                 2.8
226474_at                           NLRC5                          0.663                2.1
235175_at                           GBP4      CAT1                 0.618                2.1
205269_at                           LCP2      CAT1 CAT2            0.581                2.6
238725_at                                                          0.436                2.1
200629_at                           WARS      GRIT2                0.398                1.9
204279_at                           PSMB9     GRIT1                0.376                2.1
229937_x_at                         LILRB1                         0.374                2.3
205488_at                           GZMA      CAT1 CAT2 NKAT QCAT  0.37                 2.4
204655_at                           CCL5      GRIT2 CAT1 CAT2      0.359                2.5
202307_s_at                         TAP1      CAT1 CAT2            0.344                2.0
204205_at                           APOBEC3G  CAT2                 0.328                2.1
204103_at                           CCL4      CAT2                 0.314                2.2
206914_at                           CRTAM     CAT2                 0.301                1.9
210164_at                           GZMB      CAT1 CAT2 NKAT QCAT  0.296                2.1
214567_s_at                         XCL2                           0.285                2.1
229560_at                           TLR8      CMAT                 0.277                2.4
212588_at                           PTPRC                          0.257                2.8
222838_at                           SLAMF7    CAT2 GRIT2           0.257                2.6
206366_x_at                         XCL1      CAT2 NKAT            0.25                 2.0
205758_at                           CD8A      CAT1 CAT2 NKAT QCAT  0.249                2.1

  ¹ For genes with multiple probesets, only the one sampled most frequently is shown.
  ² PBTs are identified in the Methods section.
  ³ Proportion sampled = the proportion of times the probeset was chosen in the 1000 resampled PAM classifier gene sets.

Comparing the molecular classification with the histopathology phenotype

Classification by PAM was generally in agreement with the histologic classification (Figure 1). Of the 51 biopsies with Banff rejection, 35 (13 ABMR, 20 TCMR and 2 mixed ABMR/TCMR) were classified as rejection by PAM using 0.5 as the rejection probability threshold (sensitivity = 0.69). Of 135 biopsies with Banff nonrejection, 108 were called nonrejection by PAM (specificity = 0.80). In 43 biopsies, histopathology and PAM did not agree, 16 because of Banff rejection PAM nonrejection (BanffR PAMNR), and 27 because of Banff nonrejection, PAM rejection (BanffNR PAMR) (overall accuracy = 0.77). Of the 19 TCMR cases later retrospectively identified as rejection ‘episodes’ based on the clinical course, 14 (74%) were called rejection by PAM. We examined these discrepancies in detail.
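
The agreement figures quoted above follow directly from the four counts, treating the Banff diagnosis as the reference. The short sketch below reproduces the arithmetic; the function and key names are illustrative only.

```python
def diagnostic_summary(tp, fn, tn, fp):
    """Summary statistics for a 2 x 2 table with Banff as the reference.
    tp: Banff rejection, PAM rejection      fn: Banff rejection, PAM nonrejection
    tn: Banff nonrejection, PAM nonrejection fp: Banff nonrejection, PAM rejection"""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "discordant": fn + fp,
    }

# Counts reported in the text: 35 of 51 Banff rejections and
# 108 of 135 Banff nonrejections were called the same way by PAM.
summary = diagnostic_summary(tp=35, fn=16, tn=108, fp=27)
print({k: round(v, 2) for k, v in summary.items()})
```

Rounded to two decimals this gives a sensitivity of 0.69, a specificity of 0.80 and an accuracy of 0.77, with 43 discordant biopsies.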

Figure 1. PAM rejection probabilities. Each symbol represents the predicted rejection probability for one sample. Each prediction is the average value over the ∼500 times that sample was in one of the 1000 test sets. Samples are grouped horizontally according to their histopathology diagnoses. Triangles represent patients later diagnosed as having a clinical rejection episode. Samples marked ‘X’ received steroid treatment within the week prior to the biopsy. The lone sample marked by a square received ATG treatment in the 6 weeks prior to the biopsy. Samples with probabilities over 0.5 are predicted to be rejecting.

BanffR PAMNR biopsies: Of 16 Banff-rejection, PAM nonrejection biopsies, four had been treated with steroids less than 1 week prior to biopsy. In addition, one had been given antithymocyte globulin (ATG) treatment 6 weeks prior to the biopsy, resulting in lymphopenia. Thus treatment before the biopsy appeared to suppress the molecular features despite persistence of the histology features. Two additional biopsies represented a state previously defined as ‘isolated v lesion’ (8,9). Two other biopsies were very close to the 0.5 rejection cutoff definition of the molecular classifier.

BanffNR PAMR biopsies: We observed 27 BanffNR PAMR biopsies (3 BK cases, 12 borderline TCMR and 12 others). Overall assessment of the biopsies classified as borderline TCMR (n = 27) showed that the biopsies in this category were not homogeneous but were divided by PAM into two distinct molecular phenotypes: 44% were classified as rejection and 56% as nonrejection. All five biopsies called borderline TCMR by histopathology but later identified as rejection episodes by clinical review were called rejection by PAM. Within the Banff borderline category, biopsies that PAM called rejection had higher degrees of inflammation, namely interstitial infiltrate (i-score) and glomerulitis (g-score), than did the biopsies PAM called nonrejection (Figure 2A). Such biopsies were called nonrejection by Banff, in part, because they had more tubular atrophy and interstitial fibrosis (IFTA) (higher ci and ct scores). This precludes a histopathologic diagnosis of rejection since, by the current Banff rules, tubulitis cannot be scored in atrophic tubules (see Discussion). The same pattern was seen in the ‘other’ category (Figure 2B): PAM rejection biopsies had more inflammation than PAM nonrejection biopsies (i-, t- and g-scores), but also had higher IFTA. Thus many of the BanffNR PAMR cases reflect flaws in histopathology: in biopsies called nonrejection by Banff criteria, PAM revealed a continuum of rejection probabilities, and identified biopsies with significant abnormalities that were missed by the current histopathology rules. We applied to these biopsies the recently suggested but as yet provisional Banff total i-score, which takes all cortical inflammation into account, including that in areas of IFTA. The BanffNR PAMR samples had significantly more total-i than did the BanffNR PAMNR samples (42.3% vs. 14.3%, p = 6 × 10⁻⁶).

Figure 2. Average Banff scores in high- versus low-probability PAM rejection calls. Samples with PAM predictions over 0.5 are considered high probability. The borderline TCMR category contained 12 high and 15 low PAM probability cases, while the ‘other’ category contained 12 high and 86 low PAM probability cases. Banff lesions: t = tubulitis; i = interstitial inflammation; g = glomerulitis; ct = tubular atrophy; ci = interstitial fibrosis; cg = glomerulopathy.

Robustness of molecular classification by PAM

Effect of repeat biopsies on classifier probabilities:  To see if inclusion of repeat biopsies affected the results, we repeated the PAM analysis excluding all repeat biopsies, that is, using only the first biopsy from all 143 patients. The Spearman correlation between the PAM rejection probabilities based only on this subsample, compared with the probabilities from the same 143 biopsies from the full 186 sample analysis, was 0.993. None of the 143 samples changed their PAM rejection diagnosis by crossing the 0.5 rejection probability threshold.

Robustness of PAM classification to changes in the gold standard:  Retrospective chart review permitted us to assign a clinical diagnosis of rejection to 42 biopsies. The clinical episode and histopathology rejection diagnoses differed from each other by a total of 19 rejection calls: 5 histopathology borderline TCMR cases (labeled nonrejecting for the purpose of building the classifier) were classified as clinical TCMR, and 14 Banff TCMRs (labeled rejecting for the classifier) were classified as clinical nonrejection. To see what effect reclassification of cases by clinical criteria would have on PAM predictions, we repeated the PAM analysis using clinical episodes as the gold standard for rejection. Although 18 samples switched their diagnostic labels, the PAM-derived probabilities of rejection were very similar to those derived from the histopathology gold standard (Spearman r = 0.996), with the average absolute difference between the probabilities predicted by the two methods being only 0.032. Only seven samples out of 186 changed their PAM diagnosis by crossing the 0.5 rejection probability threshold. In each of these seven pairs, both probabilities were in the 0.43 to 0.56 range.

Ranking individual genes for their predictive ability

The predictive value of a test can be assessed by accuracy, sensitivity or specificity, depending on the costs associated with false negatives versus false positives. We compared the best individual genes when using single gene threshold classifiers using three criteria: (a) accuracy, (b) specificity at a sensitivity of 0.7 and (c) sensitivity at a specificity of 0.8 (Figure 3). A separate classifier was built for each criterion, using the same 1000 dataset splits used in the PAM analysis. For each gene in each split, the gene expression threshold that provided the best training set predictions was used to predict the rejection status of each sample in the corresponding test set. The genes were then ranked by their average predictive ability over the 1000 test sets. Despite different selection criteria, there was considerable overlap in the top genes, which largely represented interferon-γ-induced and cytotoxic T-cell-associated transcripts. The ranking order changed slightly between the three tests, but the genes identified as the top 10 by one test could always be found among the top 50 based on the others, indicating the robustness of the genes and the overall similarity between assessment methods. The top genes all performed similarly and resembled the predictions obtained by PAM, indicating that each is detecting the same stereotyped disturbance. Given the known error in the histopathology diagnoses, it is not possible to determine which of the classifiers is objectively ‘best’.
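
A single-gene threshold classifier of the kind described above can be sketched as follows for criterion (a), accuracy. The function names, the candidate-cutoff scheme (midpoints between distinct training values) and the assumption that higher expression indicates rejection are illustrative choices rather than details taken from the analysis.

```python
def best_accuracy_threshold(values, labels):
    """Choose the expression cutoff that maximises training-set accuracy
    when samples above the cutoff are called rejecting (label 1).
    Assumes the gene is higher in rejection, as for the genes in Table 2."""
    distinct = sorted(set(values))
    # Candidate cutoffs: below all values, between neighbours, above all values.
    cutoffs = (
        [distinct[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
        + [distinct[-1] + 1.0]
    )

    def accuracy(cut):
        hits = sum((v > cut) == bool(y) for v, y in zip(values, labels))
        return hits / len(values)

    return max(cutoffs, key=accuracy)

def predict(values, cutoff):
    """Apply a trained cutoff to test-set expression values."""
    return [int(v > cutoff) for v in values]
```

In a resampling scheme, the cutoff would be fitted on each training half and applied to the matching test half, and genes would then be ranked by their average test-set performance. Criteria (b) and (c) would instead fix the training-set sensitivity or specificity when choosing the cutoff.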

Figure 3. Ranking of top genes for classifying rejection. (A) Genes ranked by classification accuracy. (B) Genes ranked by specificity, based on training set thresholds determined by the cutoff achieving 70% sensitivity. (C) Genes ranked by sensitivity, based on training set thresholds determined by the cutoff achieving 80% specificity. We labeled the same 22 genes in each panel: the top 10 genes in any of the panels are labeled in all three panels, along with CXCL9 and CXCL10 (based on their high PAM ranking), and GZMA, GZMB and PRF1 (based on their literature-described association with rejection). For genes with multiple probesets, only the best probeset is shown. ‘NA’ = Affymetrix hgu133plus2 probeset 235229_at, which does not correspond to any known gene.

Discussion

In this large, unselected sequential series of human kidney allograft biopsies for cause, we predicted rejection diagnoses using PAM, and compared these diagnoses with those assigned by histopathology. The unselected nature of the data is critical: ours is the first study that includes the full spectrum of kidney transplant disease states encountered in the clinic. Thus, we avoid the overoptimistic accuracies, sensitivities and specificities reported when comparing rejecting biopsies with biopsies from normal, healthy patients. We found that single-gene threshold classifiers performed similarly to PAM. Thus, the molecular phenotype of the biopsies was robust, whether using multi- or single-gene-based classifiers. Moreover, the probability of rejection assigned by the molecular phenotype agreed strongly with the diagnostic lesions defined in the Banff consensus classification, for example, interstitial infiltrate, glomerulitis, tubulitis and IFTA. In diagnosing rejection, molecular predictions and histopathology were mostly in agreement, but showed discrepancies in approximately 20% of biopsies. Inadequacies in the current histopathology classification, and prebiopsy steroid treatment, explain most of these discrepancies. Predictive accuracies were similar for many of the top molecules, indicating that many individual molecules could be used interchangeably to classify rejection. Molecular classification indicates some areas where the current criteria used by histopathology should be reexamined, namely the reliance on tubulitis, particularly in biopsies with scarring, and the ‘isolated v’ definition of TCMR in biopsies showing minimal intimal arteritis without concomitant interstitial infiltrate.

The molecular phenotype of allograft biopsies has particularly strong negative predictive value: rejection can be excluded with considerable certainty if the transcriptome lacks the inflammatory disturbance that is the large-scale stereotyped change in gene expression. Several TCMR samples classified as nonrejection by PAM had received high-dose steroid treatment before the biopsy and displayed less molecular disturbance than expected, suggesting that molecular features are suppressed by treatment more quickly than are histopathology lesions. These treated cases are therefore not ‘false negatives’ by PAM: treatment rapidly reverses typical TCMR episodes, and thus the molecular features probably reflect the true state of the tissue more reliably than the histopathologic lesions, which can linger despite successful treatment.

Because many of the cases where histopathology and PAM disagree reflect problems in the current histopathology scoring system, the potential exists for improving the agreement by correcting the problematic areas within the Banff classification system. An example of this is seen in biopsies with scarring: PAM diagnosed rejection but the histologic criteria could not. The current Banff rules for diagnosing TCMR forbid assessment of tubulitis and interstitial inflammation in scarred areas (IFTA). Since a diagnosis of TCMR requires a minimal degree of tubulitis (t-score >1) in addition to interstitial infiltration (i-score >1), it is difficult to diagnose TCMR in cases with advanced scarring. Scarring explains why many cases with molecular rejection were classified by Banff as borderline TCMR or ‘other’: the BanffNR PAMR cases had more IFTA and total inflammation (interstitial infiltrate) than did those that were BanffNR PAMNR. They could not meet the t1 versus t2 threshold because too few nonatrophic tubules were available to score tubulitis. Today, under potent immunosuppression, the majority of allografts function longer and experience later biopsies due to slow functional deterioration, showing advanced IFTA as the histological picture. Since the current Banff rules preclude the diagnosis of rejection in these late biopsies, no therapeutic intervention is undertaken.

The molecular phenotype separates borderline TCMR into two distinct subgroups, rejection and nonrejection, which will improve management of these troublesome biopsies. This agrees with previous literature showing that some borderline TCMR cases have improved function after antirejection treatment or progress to TCMR if untreated (27,28), whereas others do not behave like rejection (27,28) and resolve spontaneously (29). The heterogeneity of borderline TCMR cases revealed by the molecular assessment indicates that some of these biopsies are indeed true TCMR, while others are nonrejecting, a finding that will improve clinical management of these otherwise ambiguously diagnosed patients. Thus, the PAM-derived molecular phenotype serves as an independent standard to guide restructuring of the histology grading system, particularly since it reinforces previously documented weak points.

The genes that best separate rejection from nonrejection are those induced by interferon-γ. TCMR and ABMR show extensive sharing of their molecular features (8) despite their differences in histopathology, because both produce inflammatory compartments that trap T cells, and both trigger IFNG production. Antibody recruits NK cells by virtue of Fc and complement receptors, and creates an inflammatory compartment that may trap effector memory T cells. Both effector memory T cells and NK cells express many CATs and can release IFNG. In addition, the tissue injury induced by either ABMR or TCMR can attract inflammatory cells. Many different individual genes or gene sets can be used to measure this disturbance, and these are able to classify the cases with similar accuracies. It is therefore to be expected that studies focusing on a single gene will find it to be strongly associated with rejection, as long as that gene is one of the several hundred sharing the stereotyped inflammatory response (e.g. GZMB, PRF1, etc.). Different classifier algorithms, training sets and class definitions will change the order of the ‘best’ genes, but the top gene lists will have significant overlaps whichever test method is used.

The overwhelming strength of the inflammatory signal explains why different definitions of the gold standard gave almost identical PAM rejection probabilities. A classifier bases its predictions not only on the training set class definitions, but also on underlying structure present in the data. If there is strong underlying structure and the class definitions are at least highly correlated with the ‘truth’, the predictions will not be unduly influenced by errors in the gold standard. However, the predictions must necessarily disagree with the gold standard, histopathology, in some cases, because that standard itself contains significant error. This is to be expected: the histopathology system was formulated from subjective, empirically derived expert opinions, without the benefit of independent biological and mechanistic validation (1). The histology consensus created criteria that correlated with outcomes, but these arbitrary decisions should be revisited now that the transcriptome presents an independent reference point. Thus, the implications of the present findings will be first a revised Banff histopathology classification that more accurately predicts the molecular state, followed by a new system combining molecular measurements and histopathology. This general principle should be a prototype for examining the relationship between molecular phenotypes and histopathology, and ultimately for defining the mechanistic basis for disease.
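The argument that a classifier can tolerate errors in its gold standard when the data have strong underlying structure can be sketched numerically. The example below is an illustration only, on simulated data. It uses a plain nearest-centroid rule as a stand-in, not PAM itself (PAM additionally shrinks the centroids), and the 20% label error rate, the number of genes and the size of the expression shift are all assumed values.

```python
import random


def simulate(n_per_class=100, n_genes=20, shift=2.0, flip_fraction=0.2, seed=0):
    """Return simulated expression data, true labels, and error-prone labels."""
    rng = random.Random(seed)
    truth = [0] * n_per_class + [1] * n_per_class
    # Every gene is shifted upward in the true rejection class (label 1).
    data = [[rng.gauss(shift * y, 1.0) for _ in range(n_genes)] for y in truth]
    # Imperfect "gold standard": flip a fixed fraction of the true labels.
    noisy = list(truth)
    for i in rng.sample(range(len(truth)), int(flip_fraction * len(truth))):
        noisy[i] = 1 - noisy[i]
    return data, truth, noisy


def class_centroids(data, labels):
    """Mean expression profile of the samples assigned to each class."""
    cents = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(data, labels) if y == cls]
        cents[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Assign a sample to the class with the nearest centroid."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(cents, key=lambda cls: sqdist(cents[cls]))


def agreement(a, b):
    return sum(p == q for p, q in zip(a, b)) / len(a)


# Train on the error-prone labels only, then evaluate on independent samples.
train_x, _, train_noisy = simulate(seed=1)
test_x, test_truth, test_noisy = simulate(seed=2)
cents = class_centroids(train_x, train_noisy)
preds = [predict(cents, x) for x in test_x]

accuracy_vs_truth = agreement(preds, test_truth)
agreement_with_noisy = agreement(preds, test_noisy)
print("agreement with true classes:       ", accuracy_vs_truth)
print("agreement with error-prone labels: ", agreement_with_noisy)
```

In this setup the classifier never sees the true classes, yet its predictions track them more closely than they track the flawed labels it was trained on. Its disagreement with the imperfect standard therefore reflects error in that standard rather than error in the classifier.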

Acknowledgments

The authors wish to thank Dr. Zija Jacaj for help with collection of the clinical data and Vido Ramassar, Anna Hutton, Kara Allanach, and Stacey Lacoste for technical support.

This research has been supported by funding and/or resources from Genome Canada, Genome Alberta, the University of Alberta, the University of Alberta Hospital Foundation, Alberta Advanced Education and Technology, Roche Molecular Systems, Hoffmann-La Roche Canada Ltd., the Alberta Ministry of Advanced Education and Technology, the Roche Organ Transplant Research Foundation, the Kidney Foundation of Canada, and Astellas Canada. Dr. Halloran also holds a Canada Research Chair in Transplant Immunology and the Muttart Chair in Clinical Immunology.

Supporting Information

Table S1: Demographics of the study population.

Table S2: Probesets differentially expressed between rejecting and nonrejecting samples.

Please note: Wiley-Blackwell is not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Filename                   Size    Description
AJT_2694_sm_TableS1.doc    141K    Supporting info item
AJT_2694_sm_TableS2.doc    165K    Supporting info item
