Solving the dilemma of the immunohistochemical and other methods used for scoring estrogen receptor and progesterone receptor in patients with invasive breast carcinoma

Authors

  • Edwin R. Fisher M.D.,

    Corresponding author
    1. National Surgical Adjuvant Breast and Bowel Project Pathology Center and Allegheny General Hospital, West Penn Allegheny Health System, Pittsburgh, Pennsylvania
    • Allegheny General Hospital, Cancer Center, 5th Floor, 320 East North Avenue, Pittsburgh, PA 15212
    • Fax: (412) 359-8685

  • Stewart Anderson Ph.D.,

    1. Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania
  • Scott Dean M.B.A.,

    1. Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, Pittsburgh, Pennsylvania
  • David Dabbs M.D.,

    1. Department of Pathology, Magee-Womens Hospital, Pittsburgh, Pennsylvania
  • Bernard Fisher M.D.,

    1. National Surgical Adjuvant Breast and Bowel Project Operations Center and Allegheny General Hospital, West Penn Allegheny Health System, Pittsburgh, Pennsylvania
  • Richard Siderits M.D.,

    1. Department of Pathology, Robert Wood Johnson University Hospital, Hamilton, New Jersey
  • Jeffrey Pritchard D.O.,

    1. National Surgical Adjuvant Breast and Bowel Project Pathology Center and Allegheny General Hospital, West Penn Allegheny Health System, Pittsburgh, Pennsylvania
  • Telma Pereira M.D.,

    1. National Surgical Adjuvant Breast and Bowel Project Pathology Center and Allegheny General Hospital, West Penn Allegheny Health System, Pittsburgh, Pennsylvania
  • Charles Geyer M.D.,

    1. National Surgical Adjuvant Breast and Bowel Project Operations Center and Allegheny General Hospital, West Penn Allegheny Health System, Pittsburgh, Pennsylvania
  • Norman Wolmark M.D.

    1. National Surgical Adjuvant Breast and Bowel Project Operations Center and Allegheny General Hospital, West Penn Allegheny Health System, Pittsburgh, Pennsylvania

Abstract

BACKGROUND

The authors attempted to resolve the dilemma posed by the lack of unanimity concerning the optimal immunohistochemical (IHC) method for determining and scoring estrogen receptor (ER) and progesterone receptor (PR).

METHODS

Sections for IHC were prepared from paraffin embedded tumor samples from 402 patients with lymph node positive breast carcinoma who had biochemical receptor values (obtained with the dextran-coated charcoal [DCC] method) and who were enrolled in a prospective, randomized trial (National Surgical Adjuvant Breast and Bowel Project protocol B-09). IHC receptors were scored independently by two observers according to percent, intensity, and any-or-none algorithms. Results from these evaluations and from two computer-assisted evaluations, DCC, and common pathologic characteristics were analyzed for optimum splits for positive reactions in univariate and multivariate analyses using a tree-structured model. Concordance, sensitivity, and specificity were determined between the DCC method and all other methods.

RESULTS

Interobserver agreement and concordance between the DCC method and the other methods and among the methods were high. Univariate analyses revealed that a positive ER score obtained with all methods was related significantly to overall survival (OS) at 5 years and at 10 years. Results related to PR scores and disease-free survival and recurrence-free survival were less consistent. In multivariate analysis, it also was found that all methods for scoring ER predicted a better prognosis for OS in patients with an unfavorable lymph node status at 5 years and 10 years. Patients in a favorable lymph node status group were discriminated further by nuclear grade.

CONCLUSIONS

All IHC methods for scoring ER appeared valid as prognostic indicators of OS in patients with positive lymph nodes. The any-or-none IHC method, by virtue of its simplicity, represents an appropriate choice for practical use. Cancer 2005. © 2004 American Cancer Society.

Most of the information relating to prognosis and tumor response after the administration of tamoxifen to patients with invasive breast carcinoma has been derived from quantitative estimates of estrogen receptor (ER) and progesterone receptor (PR) by using the ligand-binding, dextran-coated charcoal (DCC) technique. However, during the past decade, the latter largely has been supplanted by microscopic immunohistochemical (IHC) methods with or without computer assistance for the detection and quantification of these receptors. The IHC methods have been preferred because of their relative simplicity, low cost, speed of performance, application to small samples, precise identification of reactive elements, simple methods of fixation and storage and, finally, their ability to be applied to archival material.

Strong concordance between the visual and computer-assisted IHC and DCC findings has been demonstrated amply. There also have been relatively large numbers of nonrandomized studies that, despite their limitations of relatively small numbers and varieties of treatment, have revealed univariate and, in some instances, multivariate associations between IHC receptors and tumor response to endocrine treatment and prognosis.1–19

Despite these purportedly favorable features of IHC assessments of receptors, surveys conducted in the United Kingdom (which also included 25 other countries),20–22 in Ireland,23 and in the United States,24 as well as the vast majority of recorded studies relating to IHC, have disclosed not only technical differences but also a variety of methods used for scoring and dichotomizing the results as either positive or negative. These methods range from an infrequently used, simple “any-or-none” algorithm to more specific percentages of stained nuclei, or their intensity alone, or their sum or quotient. More complex algorithms consist of the quotient or the sum of the percentages of cells encountered for each intensity and a weighted factor of the latter or obtaining the sum of the average intensity and proportion, as noted above. The definitions of the techniques used for these purposes occasionally are vague. Indeed, in many reports, it is uncertain whether the intensity represents the most intense reaction, the predominant reaction, or the average reaction. It is noteworthy that one group of investigators12 found that the results obtained with all six possible methods of determining IHC ER and PR, including the “any-or-none” method, were predictive of a tumor response to tamoxifen in a cohort of patients with metastatic carcinoma; with the latter method the most common paradigm used for determining outcome despite the lack of adequate controls. A National Institutes of Health consensus statement25 relating to adjuvant therapy for breast carcinoma in 2000 concluded that patients with tumors that exhibited “any extent” of receptors should be treated, implying the use of an any-or-none measurement. The literature also revealed that the splits between positive and negative IHC receptor status have been almost exclusively arbitrary and variable.

We favor the use of the word “split(s)”26 rather than “cut-off” for this purpose, because the plural of the latter lacks a statistical or biologic meaning. To our knowledge, only one group,18 which provided an excellent overview of the IHC problem, has espoused the need to base splits on patient outcome. Both proportion and intensity were used to score ER and PR in most of their studies, but only proportion was used in one of their investigations.19 Other investigators1 based their IHC scores on the proportion and intensity of ER-stained cells in the abstract of their report but failed to mention that fact in the Materials and Methods section of the article. Splits for positive proportion scores were noted as ≥ 10% in 1 report11 and just > 10% in an earlier study6 by the same group.

In the current study, we attempted to resolve some of these issues by relating basic IHC methods (any or none, proportion, and intensity), computer-assisted and DCC estimates for ER and PR, and other pathologic characteristics from protocol B-09 of the National Surgical Adjuvant Breast and Bowel Project (NSABP) to overall survival (OS), disease-free survival (DFS), and recurrence-free survival (RFS). It should be emphasized that the current investigation was not intended to represent a clinical or pathologic update of NSABP B-09 or a measure of response to tamoxifen. B-09 is a prospectively randomized clinical trial that was initiated in 1977 to evaluate the merits of combined L-phenylalanine mustard and 5-fluorouracil with tamoxifen (PFT) and without tamoxifen (PF) in women with primary invasive breast carcinoma and positive lymph node status (NSABP, Stage 2; TNM, T1,T2,T3, N+, M0).27

MATERIALS AND METHODS

There were 1891 patients who, after providing informed consent, were enrolled in NSABP B-09 and randomized to receive either PF or PFT after undergoing radical or modified radical mastectomy. An additional 806 patients who received PFT were registered but were not randomized after entry was concluded in 1980.

Tumor ER was required for entry into the study. PR assays were added later during the first year of patient accrual. Specimens of the primary tumor were assayed for these receptors by using sucrose density gradient, DCC titrations with Scatchard analysis or DCC with a single saturating dose (both methods are referred to herein as DCC). Tests for all patients in the study were performed according to an NSABP quality-control program.27 Results were expressed as femtomoles per mg of cytosol protein (fm/mg), and values ≥ 10 fm/mg were considered positive.

Microscopic sections that were prepared routinely after formalin fixation of tumor and related tissues from all patients were forwarded to the NSABP pathology headquarters by pathologists at participating institutions for further study. IHC assessments of nuclear ER and PR were performed using available paraffin blocks of primary tumor. The current analysis included 402 patients with follow-up who were eligible for the study and whose pathologic material was considered adequate, including satisfactory internal controls. Of the 402 patients who were included in the analysis, 201 patients were randomized to the PF arm, 188 patients were randomized to the PFT arm, and 13 patients were registered to the PFT arm. The median potential time on study for these patients was 23.2 years.

All pathologic studies, including those related to IHC, were performed in a “blinded” manner without knowledge of any other pathologic or clinical information, including DCC status. The IHC data were generated from independent observations by two of the authors (E.R.F. and D.D.) and were based on the invasive carcinomatous elements only. Differences were resolved by joint examination of the slides with a two-headed microscope, and the final reconciled value was used in all statistical analyses.

The procedures utilized for the IHC demonstration of ER and PR were identical except for the antibodies used. Monoclonal mouse antihuman ER (clone 1D5; Dako Corporation, Carpinteria, CA) was used for ER, and monoclonal mouse antihuman PR (clone 1A6; Novocastra-Vector Laboratories Inc., Burlingame, CA) was used for PR. Sections from paraffin blocks that were prepared after routine fixation with formalin and processing were mounted on charged slides, deparaffinized, and rehydrated. Antigen retrieval was performed with a microwave procedure utilizing a Citra solution (Bio Genex, San Ramon, CA). Slides were placed in a Bio Genex Optimax Plus automated stainer. Endogenous peroxidase was then blocked with 3% hydrogen peroxide. This was followed by a supersensitive detection system (link and label; Bio Genex). ER antibody was applied at 1:80 dilution, and PR antibody was applied at 1:60 dilution. Reactions were then colorized with diaminobenzidine. An ethyl-green counterstain was applied after removal of the slides from the stainer. The slides were then rinsed, cleaned, and mounted. Each batch of IHC slides was accompanied by a positive and negative control, and reliance on an internal control was mandatory for unreactive sections, as noted earlier.

ER and PR were scored in the IHC sections by the following five procedures: 1) Any proportion of any positive degree of intensity was considered positive (the so-called any-or-none method). 2) The proportion of stained nuclei present was scored subjectively as 0 (absent staining), 1 (1–33% of nuclei stained), 2 (34–66% of nuclei stained), or 3 (67–100% of nuclei stained). 3) The intensity of the stain was scored subjectively according to the following categories: 0 (absent staining), 1 (weak staining), 2 (moderate staining), and 3 (strong staining). The weak, moderate, and strong categories reflected the ability to recognize the positive reactions under high-power, medium-power, and low-power microscopic magnifications, respectively. The predominant reaction represented the final score. 4) The product of the scores from the second and third methods produced a range from 0 to 9. 5) The sum of the scores from the second and third methods produced a range from 0 to 6.
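The five visual scoring procedures can be summarized in a short sketch. The function, class, and field names are illustrative and do not come from the study. In the study the proportion category was assigned subjectively by the observers, so the handling of fractional percentages here (for example, treating any value above 0 and up to 33 as category 1) is an assumption made only to keep the example runnable.

```python
from dataclasses import dataclass


def proportion_score(percent_stained: float) -> int:
    """Map the percent of stained nuclei to the 0-3 proportion category.

    Boundaries follow the text (1-33%, 34-66%, 67-100%); the treatment of
    fractional values between categories is an assumption of this sketch.
    """
    if not 0 <= percent_stained <= 100:
        raise ValueError("percent_stained must be between 0 and 100")
    if percent_stained == 0:
        return 0
    if percent_stained <= 33:
        return 1
    if percent_stained <= 66:
        return 2
    return 3


@dataclass(frozen=True)
class IHCScores:
    any_or_none: int  # 1 if any staining of any intensity, otherwise 0
    proportion: int   # 0-3
    intensity: int    # 0-3 (predominant reaction)
    product: int      # proportion x intensity, 0-9
    total: int        # proportion + intensity, 0-6


def score_receptor(percent_stained: float, intensity: int) -> IHCScores:
    """Derive all five visual scores from one proportion/intensity reading."""
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2, or 3")
    proportion = proportion_score(percent_stained)
    # Absent staining must be absent on both scales.
    if (proportion == 0) != (intensity == 0):
        raise ValueError("proportion and intensity must both be zero or both nonzero")
    return IHCScores(
        any_or_none=int(proportion > 0),
        proportion=proportion,
        intensity=intensity,
        product=proportion * intensity,
        total=proportion + intensity,
    )
```

For example, a tumor with 50% of nuclei stained at moderate intensity would receive a proportion score of 2, a product of 4, a sum of 4, and a positive any-or-none result.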

Quantitative assessments were made on the same sections that were used for the IHC evaluation, by two different laboratories for ER and by one laboratory for PR. In one assessment, a computerized image system (CAS-200; Lombard, IL) was used for estimating both receptors; in the other assessment, an automated cellular imaging system (ACIS; ChromaVision, San Juan Capistrano, CA) measured ER only. In the former assessment, scores were obtained for intensity and percentage alone as well as for a quantitative IHC indicator, which was represented by the product of the percentage and intensity divided by 10. Scores for the ACIS system represented averages for the percent and intensity of six foci that were selected from the most reactive area or “hot spot” of invasive carcinoma in the section. The highest estimate for these two determinations from the latter site also was scored.
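As a rough illustration of the computer-assisted scores, the sketch below computes the CAS quantitative indicator and the four ACIS summaries. The function names and dictionary keys are hypothetical, the instruments' intensity units are not stated in the text, and reading "the highest estimate" as the maximum over the six foci is an assumption.

```python
from statistics import mean


def cas_quantitative_indicator(percent: float, intensity: float) -> float:
    """Quantitative IHC indicator as described: (percent x intensity) / 10."""
    return percent * intensity / 10


def acis_summary(foci):
    """Summarize six (percent, intensity) readings taken from the hot spot.

    Returns the average and maximum percent and the average and maximum
    intensity. Taking the maximum across the six foci is an assumption.
    """
    if len(foci) != 6:
        raise ValueError("expected readings from six foci")
    percents = [p for p, _ in foci]
    intensities = [i for _, i in foci]
    return {
        "avg_percent": mean(percents),
        "max_percent": max(percents),
        "avg_intensity": mean(intensities),
        "max_intensity": max(intensities),
    }
```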

Three endpoints—DFS, RFS, and OS—were considered in the current analyses. The times to these endpoints were calculated from the date of surgery. Events for the calculation of DFS were any breast carcinoma recurrences, occurrences of contralateral breast carcinoma, other primary carcinomas, or deaths before these events. Events for the calculation of RFS were breast carcinoma recurrences at local, regional, and distant anatomic sites. Contralateral breast carcinoma was not included as an event in the determination of RFS. All deaths from any cause were used for the calculation of OS. The current analyses were based on information received through December 31, 2001. Because the numbers of events and deaths after 10 years were relatively few, the analyses were truncated at 10 years.
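The endpoint definitions and the 10-year truncation can be expressed as a small sketch. The event labels, the function name, and the convention that an event occurring exactly at the horizon still counts are assumptions introduced for illustration; they are not taken from the study's analysis code.

```python
# Event kinds that count toward each endpoint, following the definitions above.
DFS_EVENTS = {"recurrence", "contralateral_breast", "other_primary", "death"}
RFS_EVENTS = {"recurrence"}  # local, regional, or distant; not contralateral
OS_EVENTS = {"death"}        # death from any cause


def time_to_endpoint(events, follow_up_years, qualifying, horizon=10.0):
    """Return (time_in_years, observed) for one patient and one endpoint.

    events: list of (years_from_surgery, kind) tuples.
    follow_up_years: time from surgery to last contact or death.
    The patient is censored at the horizon when no qualifying event
    occurred on or before it.
    """
    times = [t for t, kind in events if kind in qualifying and t <= follow_up_years]
    if times and min(times) <= horizon:
        return min(times), True
    return min(follow_up_years, horizon), False
```

A patient with a contralateral tumor at 3 years, a recurrence at 6.5 years, and death at 12 years would therefore contribute a DFS event at 3 years, an RFS event at 6.5 years, and a censored OS time of 10 years.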

The Spearman ρ statistic was used to calculate the degree of association among IHC, DCC, and computer-assisted methods of assessing ER and PR. Agreement was determined for instances in which dichotomous assessments were made (e.g., IHC any or none) by calculating their sensitivity and specificity with those obtained by the DCC method. For these calculations, the DCC method was considered the gold standard. Interobserver concordance of the “any-or-none” IHC scores also was analyzed by the use of the Cohen κ statistic.
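The agreement statistics named above can be sketched in plain Python as follows. This is a minimal illustration on 0/1 calls and numeric scores, with hypothetical function names; it is not the software used in the study.

```python
from math import sqrt


def two_by_two(reference, test):
    """Count (tp, fp, fn, tn) for paired 0/1 calls, reference method first."""
    pairs = list(zip(reference, test))
    tp = sum(1 for r, t in pairs if r == 1 and t == 1)
    fp = sum(1 for r, t in pairs if r == 0 and t == 1)
    fn = sum(1 for r, t in pairs if r == 1 and t == 0)
    tn = sum(1 for r, t in pairs if r == 0 and t == 0)
    return tp, fp, fn, tn


def sensitivity_specificity(reference, test):
    """Sensitivity and specificity of `test` against the reference standard."""
    tp, fp, fn, tn = two_by_two(reference, test)
    return tp / (tp + fn), tn / (tn + fp)


def cohen_kappa(rater_a, rater_b):
    """Cohen kappa: agreement beyond chance for two dichotomous ratings."""
    tp, fp, fn, tn = two_by_two(rater_a, rater_b)
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n**2
    return (observed - expected) / (1 - expected)


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)
```

In this arrangement the DCC calls would be passed as `reference` and the dichotomous IHC calls as `test`, whereas the two observers' any-or-none calls would be passed to `cohen_kappa`.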

In addition to ER and PR assessments, other pathologic and clinical factors were screened univariately for their relations to outcome at 5 years and at 10 years. These included the numbers of pathologically involved lymph nodes, macroscopic and clinical tumor size obtained from the original pathology and clinical reports, patient age, and treatment. Nuclear grade was assessed by the senior author (E.R.F.) according to criteria that have been described previously in many publications. Although we originally used a three-tiered system of grades, it became evident that there were too few tumors with nuclei that warranted a Grade 1 score. Because of this, the system was dichotomized to indicate only 2 grades, which were designated as “good” and “poor”, with Grade 1 and 2 considered good and Grade 3 considered poor. Briefly, a good nuclear grade consisted of tumor cells with nuclei that were only slightly larger than the nuclei seen in normal ducts. Their configuration was regular, and nucleoli, when present, were small. The nuclei of poor-grade tumors were larger, often irregular, and vesicular with more conspicuous nucleoli compared with the nucleoli in good-grade tumors. When both grades were found in a particular tumor, it was graded as poor. We were interested not only in determining which factors were related significantly to 5-year and 10-year outcomes but also in determining the value of each factor that yielded the greatest discrimination among outcomes. To accomplish this, we employed a screening technique, which is referred to sometimes as “tree-structured” modeling or “recursive partitioning.”28–30 Accordingly, the data for each factor were split into two mutually exclusive groups or “nodes”. For a given partition, the average times to outcome were calculated within each of the two “nodes”. This process was repeated for every allowable split until a “best” or “optimal” split was found. 
An optimal split for a single factor was defined by the value at which the outcome of interest had the greatest variability across the nodes or, conversely, the most homogeneity within each node. Optimal splits were obtained using a model implemented by Therneau and Atkinson.31 Their algorithm both “grows” and “prunes” a “tree” by adjusting a “cost-complexity parameter,” which limits the number of homogeneous nodes according to a risk-benefit criterion.32
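The search for an optimal univariate split can be illustrated with a deliberately simplified sketch. It tries every allowable threshold for one factor and keeps the one that minimizes the within-node sum of squares of the outcome times, which is equivalent to maximizing the variability between the two nodes. This is not the Therneau and Atkinson implementation: it ignores censoring, performs no pruning, and uses hypothetical function names and a hypothetical `min_node_size` parameter.

```python
def within_ss(values):
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(factor, outcome, min_node_size=1):
    """Find the threshold c that best separates outcome into <= c and > c.

    Returns (threshold, total_within_node_ss), or None when no allowable
    split exists (for example, when the factor takes a single value).
    """
    best = None
    # The largest observed value cannot be a threshold: "> c" would be empty.
    for c in sorted(set(factor))[:-1]:
        left = [o for f, o in zip(factor, outcome) if f <= c]
        right = [o for f, o in zip(factor, outcome) if f > c]
        if len(left) < min_node_size or len(right) < min_node_size:
            continue
        total = within_ss(left) + within_ss(right)
        if best is None or total < best[1]:
            best = (c, total)
    return best
```

Applied to an any-or-none score there is only one candidate threshold (0 versus 1), whereas a continuous measure such as a DCC value offers one candidate per distinct observed value.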

When an optimal split was obtained univariately for each factor, a corresponding P value was obtained with a Cox proportional hazards model. If the P value was < 0.05, then the associated factor was considered a candidate for multivariate analysis. The variables that consistently were associated significantly with the 5-year and 10-year outcomes or that otherwise were important clinically were then entered into a multivariate tree-structured model. In some analyses, the cost-complexity parameter was adjusted to ensure that a parsimonious model was obtained. The multivariate tree-structured modeling approach allowed us to explore relations among factors and outcomes that are difficult to quantify using standard techniques, such as Cox proportional hazards models.

In the displays of the multivariate results, each “node” is depicted by either an oval or a rectangular box. The rectangular boxes represent “terminal nodes” in which no further splitting was warranted, whereas the ovals represent nodes in which further splitting was necessary to classify the various risk groups appropriately. Within each oval or rectangle, the number of patients who died, the number of patients at risk, and the resulting percentage mortality are displayed for the corresponding subgroup of patients, as described above the oval or rectangle. From these percentages, it can be seen that the patients who had better outcomes, on average, are represented on the left side of each split, and patients who had poorer outcomes are represented on the right side of each split.

RESULTS

Agreement of Patient Characteristics in the Pathologic Subset and the Total Cohort

There was no significant difference in patient age, numbers of lymph node metastases, clinical and pathologic tumor size, positive ER or PR status estimated by DCC, or mortality rates in the pathologic subset and in the total cohort.

Interobserver Agreement of IHC Estimates for ER and PR

Spearman coefficients of correlation between the two observers for the initial estimates of receptor status were 87% for both ER percent and ER intensity, and they were 84% and 86% for PR percent and PR intensity, respectively. The κ statistics for agreement between observers for “any-or-none” IHC methods were 89% for ER and 79% for PR. The differences between the observers most commonly appeared to be related to the presence of overlooked foci of faint reactivity.

Concordance, Sensitivity, and Specificity of Methods for Measuring ER and PR

It was found that the concordance of the dichotomous IHC method “any or none” with the DCC estimates, which traditionally are designated as positive at ≥ 10 fm/mg, was good, with a κ value of 50%, sensitivity of 91%, and specificity of 56% for ER and a κ value of 42%, sensitivity of 84%, and specificity of 57% for PR.

The Spearman coefficients of correlation between the DCC method and the other IHC and computer-assisted methods also were good, ranging from 48% to 58% for ER and from 42% to 50% for PR. High correlations were observed between the various IHC and computer-assisted methods and ranged from 47% to 87% for ER and from 64% to 75% for PR.

Significant Univariate Splits for Methods for Measuring ER, PR, and Other Variables and their Relation to Outcomes

Tables 1 and 2 reveal that a positive ER status established by the splits for all of the methods studied was related significantly to a favorable OS at 5 years and 10 years. A similar relation was observed between ER and RFS, except that no significant split was detected for the DCC and CAS percentage scores. These latter scores, as well as the IHC percentage scores, also were not significant for DFS at 5 years. Only the ACIS split for ER was related significantly to DFS and RFS at 10 years.

Table 1. Univariate Significance of Optimal Splits for Predictive Variables of Overall Survival, Disease-Free Survival, and Recurrence-Free Survival through 5 Years
| Measure | OS | DFS | RFS |
|---|---|---|---|
| ER | | | |
| DCC^a | ≤ 11, > 11^b (0.0001)^c | NS^d | NS |
| Any/none | 0, 1 (< 0.0001) | 0, 1 (0.0053) | 0, 1 (0.0014) |
| Intensity (I) | 0, > 0 (< 0.0001) | ≤ 1, > 1 (0.0010) | ≤ 1, > 1 (0.0005) |
| % | 0, > 0 (< 0.0001) | NS | 0, > 0 (0.0030) |
| % × I | 0, > 0 (< 0.0001) | ≤ 2, > 2 (0.0034) | ≤ 2, > 2 (0.0010) |
| % + I | ≤ 3, > 3 (0.004) | ≤ 3, > 3 (0.0027) | ≤ 3, > 3 (0.0024) |
| CAS % | ≤ 3, > 3 (< 0.0001) | NS | NS |
| CAS Q^e | ≤ 3.5, > 3.5 (< 0.0001) | ≤ 3.7, > 3.7 (0.0049) | ≤ 4.1, > 4.1 (0.0030) |
| CAS I | ≤ 9, > 9 (< 0.0001) | NS | ≤ 12, > 12 (0.0027) |
| ACIS avg I | ≤ 16.67, > 16.67 (< 0.0001) | ≤ 80.67, > 80.67 (0.0013) | ≤ 25.7, > 25.7 (0.0001) |
| ACIS max I | ≤ 94, > 94 (< 0.0001) | ≤ 85, > 85 (0.0010) | ≤ 94, > 94 (0.0001) |
| ACIS avg % | ≤ 56.54, > 56.54 (< 0.0001) | ≤ 62.7, > 62.7 (0.0012) | ≤ 62.7, > 62.7 (0.0009) |
| ACIS max % | ≤ 8.03, > 8.03 (< 0.0001) | ≤ 93.2, > 93.2 (0.0014) | ≤ 93.01, > 93.01 (0.0003) |
| PR | | | |
| DCC | ≤ 41, > 41 (0.0045) | ≤ 8, > 8 (0.0009) | ≤ 155, > 155 (0.0056) |
| Any/none | 0, 1 (0.0020) | NS | NS |
| Intensity (I) | ≤ 2, > 2 (0.0004) | NS | NS |
| % | NS | NS | NS |
| % × I | ≤ 1, > 1 (0.0051) | NS | NS |
| % + I | ≤ 2, > 2 (0.0036) | NS | NS |
| CAS % | NS | NS | NS |
| CAS Q | NS | NS | NS |
| CAS I | NS | NS | NS |
| Others | | | |
| Nuclear grade^f | ≤ 1, > 1 (0.0003) | NS | ≤ 1, > 1 (0.0045) |
| No. positive LNs | ≤ 8, > 8 (< 0.0001) | ≤ 4, > 4 (< 0.0001) | ≤ 4, > 4 (< 0.0001) |
| Tumor size (cm): clinical | ≤ 2.0, > 2.0 (0.0019) | ≤ 2.0, > 2.0 (0.0014) | ≤ 1.7, > 1.7 (0.0006) |
| Tumor size (cm): pathologic | ≤ 3.3, > 3.3 (0.0038) | ≤ 2.2, > 2.2 (0.0014) | ≤ 1.6, > 1.6 (0.0012) |
| Age at entry (yrs) | NS | NS | NS |

OS: overall survival; DFS: disease-free survival; RFS: recurrence-free survival; ER: estrogen receptor; DCC: dextran-coated charcoal; NS: no split; I: intensity; CAS: computerized image system; Q: quantity; ACIS: automated cellular imaging system; avg: average; max: maximum; LN: lymph node.
^a Measured in fm/mg.
^b The two values separated by commas indicate the best split value using a tree-structured model.
^c The value in parentheses is the P value associated with the best split.
^d No split found.
^e Measured as (% × I) / 10.
^f ≤ 1 = good; > 1 = poor.

Table 2. Univariate Significance of Optimal Splits for Predictive Variables of Overall Survival, Disease-Free Survival, and Recurrence-Free Survival through 10 Years
| Measure | OS | DFS | RFS |
|---|---|---|---|
| ER | | | |
| DCC^a | ≤ 10, > 10^b (0.0032)^c | NS^d | NS |
| Any/none | 0, 1 (0.0008) | NS | NS |
| Intensity (I) | 0, > 0 (0.0019) | NS | NS |
| % | 0, > 0 (0.0029) | NS | NS |
| % × I | ≤ 2, > 2 (0.0022) | NS | NS |
| % + I | ≤ 3, > 3 (0.0153) | NS | NS |
| CAS % | ≤ 3, > 3 (0.0003) | NS | NS |
| CAS Q^e | ≤ 3.5, > 3.5 (< 0.0001) | NS | NS |
| CAS I | ≤ 9, > 9 (0.0001) | NS | NS |
| ACIS avg I | ≤ 25.7, > 25.7 (< 0.0001) | NS | NS |
| ACIS max I | ≤ 94, > 94 (0.0004) | NS | NS |
| ACIS avg % | ≤ 59.21, > 59.21 (0.0011) | NS | NS |
| ACIS max % | ≤ 83.81, > 83.81 (< 0.0001) | NS | NS |
| PR | | | |
| DCC | ≤ 41, > 41 (0.0044) | ≤ 8, > 8 (0.0015) | NS |
| Any/none | NS | NS | NS |
| Intensity (I) | ≤ 2, > 2 (0.0017) | NS | NS |
| % | NS | NS | NS |
| % × I | NS | NS | NS |
| % + I | NS | NS | NS |
| CAS % | NS | NS | NS |
| CAS Q | NS | NS | NS |
| CAS I | ≤ 43, > 43 (0.0047) | NS | NS |
| Others | | | |
| Nuclear grade^f | ≤ 1, > 1 (0.0017) | NS | NS |
| No. positive LNs | ≤ 4, > 4 (< 0.0001) | ≤ 4, > 4 (< 0.0001) | ≤ 4, > 4 (< 0.0001) |
| Tumor size: clinical | ≤ 2, > 2 (0.0003) | ≤ 2.3, > 2.3 (0.0003) | ≤ 2, > 2 (0.0010) |
| Tumor size: pathologic | ≤ 2.4, > 2.4 (0.0012) | ≤ 1.6, > 1.6 (0.0022) | ≤ 1.6, > 1.6 (0.0005) |
| Patient age (yrs) | NS | NS | NS |

OS: overall survival; DFS: disease-free survival; RFS: recurrence-free survival; ER: estrogen receptor; DCC: dextran-coated charcoal; NS: no split; I: intensity; CAS: computerized image system; Q: quantity; ACIS: automated cellular imaging system; avg: average; max: maximum; LN: lymph node.
^a Measured in fm/mg.
^b The two values separated by commas indicate the best split value using a tree-structured model.
^c The value in parentheses is the P value associated with the best split.
^d No split found.
^e Measured as (% × I) / 10.
^f ≤ 1 = good; > 1 = poor.

DCC and IHC (any or none, intensity, and the product or sum of percent and intensity, but not percent alone) dichotomized PR, revealing a significantly better OS at 5 years for a positive reaction. Only the positive split for PR by the DCC method was related significantly to DFS and RFS for that observation period. Over 10 years, only positive DCC, IHC, and CAS intensities for PR were related significantly to OS. Only a positive PR status determined by the DCC method favorably discriminated DFS, but none of the methods were related significantly to RFS for this longer observation period.

The splits for the numbers of positive lymph nodes and for clinical and pathologic tumor sizes were related significantly to all outcomes at both 5 years and 10 years. Similar correlations between nuclear grade and outcomes were found, except for 5-year DFS and 10-year RFS, for which no splits were evident for nuclear grade. No splits could be established for patient age or treatment for any of the outcomes at any period. It is interesting to note that the split values for the various methods of determining positive ER or PR status not only occasionally were different for the various outcomes but also were different for the 5-year and 10-year periods of observation.

Multivariate Analyses of Prognostic Variables for Outcome

The number of lymph node metastases was identified as the most significant discriminant for OS, DFS, and RFS, with the worst prognoses for patients who had > 8 involved lymph nodes at 5 years and > 4 involved lymph nodes at 10 years. Of the 402 patients, 128 (32%) had died at 5 years; 46 of 90 patients (51%) died in the group that had the worst lymph node status (> 8 positive lymph nodes), compared with only 26% of patients who had fewer positive lymph nodes (≤ 8 positive lymph nodes). Among the patients in the group with poor lymph node status, positive ER status, as determined by all IHC methods and by the DCC method, indicated a more favorable prognosis compared with negative ER status. The mortality rate ranged from 36% to 44% in the ER-positive group and from 64% to 67% in the ER-negative group. For patients with more favorable lymph node status (≤ 8 positive lymph nodes), nuclear grade, but not ER status, further discriminated the 26% mortality rate. Among patients in the good nuclear grade group, only 18% died, compared with 40% of patients in the poor nuclear grade group. These observations are presented in Figure 1.

Figure 2 reveals the tree-structured findings at 10 years. It is noteworthy that poor lymph node status discriminated the frequency of mortality among the 193 of 402 patients (48%) who died. In this group, the optimal split was for patients who had ≤ 4 positive lymph nodes versus patients who had > 4 positive lymph nodes. Patients who had a high degree of lymph node involvement (> 4 positive lymph nodes) were split further according to ER status: the mortality rate ranged from 56% to 59% among patients with positive ER status for all IHC methods and for the DCC method, whereas it ranged from 75% to 76% among patients with negative ER status. Nuclear grade was effective in delineating mortality in patients who had favorable lymph node status (≤ 4 positive lymph nodes), with a uniform mortality rate of 46% among patients who had tumors with poor nuclear grade compared with 30% among patients who had tumors with good nuclear grade.
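The reported percentages can be checked with simple arithmetic. In the sketch below, the counts for the group with ≤ 8 positive lymph nodes at 5 years are derived by subtraction, on the assumption that the two lymph node groups together account for all 402 patients; those derived counts are not stated in the text.

```python
def pct(deaths: int, at_risk: int) -> int:
    """Mortality as a whole-number percentage."""
    return round(100 * deaths / at_risk)


# Counts reported in the text.
deaths_5yr, patients = 128, 402
deaths_poor_ln, patients_poor_ln = 46, 90  # > 8 positive lymph nodes
deaths_10yr = 193

# Derived by subtraction (assumption: the two groups partition the cohort).
deaths_fav_ln = deaths_5yr - deaths_poor_ln    # 82
patients_fav_ln = patients - patients_poor_ln  # 312

overall_5yr = pct(deaths_5yr, patients)
poor_ln_5yr = pct(deaths_poor_ln, patients_poor_ln)
fav_ln_5yr = pct(deaths_fav_ln, patients_fav_ln)
overall_10yr = pct(deaths_10yr, patients)
```

Under that assumption the derived figures agree with the reported 32%, 51%, 26%, and 48%.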

Figure 1.

Frequency (%) mortality by “tree-structured” multivariate analysis for overall survival at 5 years according to all immunohistochemical methods (A–C) and the dextran-coated charcoal (DCC) method (D) for scoring estrogen receptor (ER). (A) ER intensity (I). (B) ER percent. (C) ER any or none (A/N). (D) ER dextran-coated charcoal. LNs: lymph nodes; NG: nuclear grade.

Figure 2.

Frequency (%) mortality by “tree-structured” multivariate analysis for overall survival at 10 years according to all immunohistochemical and dextran-coated charcoal (DCC) methods for scoring estrogen receptor (ER). (A) ER intensity (I). (B) ER percent. (C) ER any or none (A/N). (D) ER dextran-coated charcoal. LNs: lymph nodes; NG: nuclear grade.

The 5-year results relating to DFS (data not shown) also revealed a better prognosis for patients with poor lymph node status who had tumors that exhibited positive ER status regardless of the method used for ER measurement. It was found that patients with this positive ER status also had a better prognosis after receiving PFT rather than receiving only PF. ER status was not identified as a significant prognostic factor for DFS at 10 years. It also was found that treatment with PFT and a subsequent positive ER status, as determined using all methods, discriminated RFS favorably in patients who had poor lymph node status at 5 years, but not at 10 years. No variable was identified that had a significant split other than lymph node status at 10 years. The prognosis for DFS, as well as RFS, in the favorable lymph node group was better for patients who had tumors that measured ≤ 2.2 cm; and this tumor-size variable replaced nuclear grade, as noted above, for OS in patients who had favorable lymph node status.

The relation of PR status to OS generally was similar to that of ER but less consistent. All methods for PR further discriminated OS for patients in the worse lymph node group (> 8 positive lymph nodes) at 5 years. However, only PR status determined by the DCC method and tumor size had splits at 10 years. Treatment with PFT discriminated OS favorably in patients who had tumors that measured > 1.1 cm. However, those findings were “farther down the tree”; and, according to the tree-structured method, treatment would be considered a weak indicator of outcome.

DISCUSSION

The current findings reaffirm the results of others demonstrating acceptable concordance, sensitivity, and specificity between the results of the DCC method and the visual IHC and computer-assisted methods for dichotomizing positive and negative estimates of ER and PR. However, it is interesting to note that almost all previous studies were based on agreements between the DCC method and only one particular visual or computerized method rather than different methods for determining ER and PR status; in the current study, we also demonstrated that the different methods exhibited agreement with one another. Most notably, the splits used for all comparisons, as well as in our other analyses, were based on outcomes rather than subjective judgments. This information strongly suggests that all of the methods examined in this study for determining ER and PR status have similar prognostic value. This may account for the general historic commonality of end results relating to receptors despite the differences in methods used for scoring.

This possibility appeared to be substantiated by the results of the univariate analyses, which revealed that positive ER status, regardless of the method used for its demonstration, was associated with a favorable OS during both 5 years and 10 years of observation. In addition, positive ER status was only slightly less consistent for DFS and RFS during these observation periods. It is interesting to note that the DCC method for determining ER status was not identified as a predictor of DFS or RFS. The methods used for assessing PR status consistently were less predictive for OS, DFS, and RFS compared with the methods used for assessing ER status during both periods of observation. However, using the DCC method for determining PR status, unlike for ER status, was singularly predictive for DFS over 5 years and 10 years of observation and for RFS over 5 years of observation. Statements indicating that the IHC methods are superior to DCC may not be entirely correct. In the context of our data, the univariate analysis for ER seemed to represent a more consistent prognostic indicator for OS than PR. Our results indicate at the least that ER represents a unique but strong univariate predictor for both long-term OS and short-term OS.

The identification of receptors by multivariate tree-structured analyses further strengthens the value of all IHC methods as prognostic indicators. It should be recognized that a more traditional multivariate model initially failed to demonstrate any significance for receptors in this regard. It was believed at that time that this lack of effect may have been due to the overwhelming influence of positive lymph node status. The primacy of lymph node status as a prognostic indicator long has been appreciated. Indeed, as expected, the model utilized in this study for detecting prognostic variables by multivariate analysis ranked the number of positive lymph nodes as the most significant variable. That analysis also revealed that ER and, less consistently, PR were next in the hierarchy for discriminating OS among patients with the least favorable lymph node status (> 8 lymph nodes over 5 years of observation, and > 4 lymph nodes over 10 years of observation). More important, the prognostic value of ER was observed with all methods used for this purpose. Nuclear grade exerted a similar effect in patients who had a more favorable lymph node status during both observation periods. The tree-structured analyses allowed us to identify the effects of ER status and PR status more easily in the less favorable lymph node subgroups and to identify nuclear grade more easily in patients who had more favorable 5-year and 10-year outcomes than would have been obtained by traditional parametric or semiparametric statistical models. Using standard multivariate techniques, we also were able to validate that the variables and splits used in the analysis of our cohort of 402 patients were significant in the eligible B-09 patients with follow-up and sufficient pathologic measurements who were not included in our cohort.
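The idea of an outcome-based split can be illustrated with a deliberately simplified Python sketch. It scans candidate cutpoints on one variable (for example, the number of positive lymph nodes) and keeps the one that best separates a binary outcome, scored with a 2 x 2 chi-square statistic. This is a swapped-in simplification and not the method used in this study: tree-structured survival analyses typically score splits with statistics that account for censored follow-up, such as the log-rank statistic. The function names, the min_size parameter, and the toy data are hypothetical.

```python
def chi_square_2x2(low_events, high_events):
    """Pearson chi-square (no continuity correction) for events in two groups."""
    a = sum(low_events)            # events in the low group
    b = len(low_events) - a        # non-events in the low group
    c = sum(high_events)           # events in the high group
    d = len(high_events) - c       # non-events in the high group
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0                 # one margin is empty: no separation to score
    return n * (a * d - b * c) ** 2 / denom


def best_split(values, events, min_size=2):
    """Return (cutpoint, score) for the split `value <= cutpoint` versus
    `value > cutpoint` with the largest chi-square on the binary outcome.

    values:   one numeric covariate per patient (assumed: positive nodes).
    events:   1 if the outcome event occurred, 0 otherwise (censoring is
              ignored here, which is the main simplification).
    min_size: smallest group size allowed on either side of a split.
    """
    best_cut, best_score = None, 0.0
    for cut in sorted(set(values))[:-1]:   # the largest value cannot be a cut
        low = [e for v, e in zip(values, events) if v <= cut]
        high = [e for v, e in zip(values, events) if v > cut]
        if len(low) < min_size or len(high) < min_size:
            continue
        score = chi_square_2x2(low, high)
        if score > best_score:
            best_cut, best_score = cut, score
    return best_cut, best_score


# Hypothetical toy data (not study data): node counts and a binary outcome.
nodes = [1, 2, 3, 10, 11, 12]
died = [0, 0, 0, 1, 1, 1]
cut, score = best_split(nodes, died)   # cut == 3, score == 6.0
```

A tree is grown by applying the same search recursively within each resulting subgroup and across all candidate variables. Variables chosen near the root, such as lymph node status here, are the strongest discriminators, whereas those chosen only in deeper subgroups are weaker indicators of outcome.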

It was not surprising that the long-term prognostic value of PR was not as consistent as ER in light of the univariate findings. Results from other investigators concerning the value of PR in this regard have been inconsistent.33–35 In the current investigation, the rank of PR was replaced by tumor size at 10 years for OS. In fact, there may be some uncertainty with regard to the optimal immunohistochemical antibody used for this purpose.36

The high degree of agreement for the initial scores for all of the basic IHC methods (any or none, intensity, and percentage) noted between the two observers indicates their reproducibility. Thus, the IHC methods, at least for determining ER status in lymph node-positive patients, appear to satisfy all of the requirements for representing a surrogate for the DCC method. The simplicity of the dichotomous any-or-none algorithm appears to be an appropriate selection for practical use and avoids the delusion of precision implied by the more complex techniques for receptor assessment. Its use also should result in more patients receiving antiestrogen therapy. A positive internal control was required to verify a true-negative reaction in this study, because materials were obtained from different sources, and there may have been variation in fixation and subsequent embedding techniques. This does not appear to be necessary in individual laboratories when external controls and test samples have been processed and stained identically. Finally, it is noteworthy that the IHC data were obtained in this study from paraffin-embedded tissues that had been prepared 21 years previously.

It is uncertain whether our findings and conclusions relating to the IHC methods are applicable to patients who have negative lymph node status. There are very few studies in this regard, and their results either have failed to discriminate prognostic relations of receptors in cohorts of patients with both positive and negative lymph nodes37 or have indicated a role for receptors in lymph node-positive patients only.9
