Can a gene-expression classifier with high negative predictive value solve the indeterminate thyroid fine-needle aspiration dilemma?

Authors

William C. Faquin, MD, PhD
Department of Pathology, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts

Corresponding author: Department of Pathology, Warren 219, Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114; Fax: (617) 573-3389


Abstract

The recent study by Alexander et al validates the effectiveness of the Afirma test, and suggests a potential ancillary role for this unique test when appropriately applied in the evaluation of thyroid nodules classified by fine-needle aspiration as indeterminate. Cancer (Cancer Cytopathol) 2013. © 2013 American Cancer Society

Thyroid fine-needle aspiration (FNA) plays a pivotal role in the initial clinical evaluation of thyroid nodules.1–5 Each year, more than 450,000 thyroid FNAs are performed in the United States. Although thyroid FNA accurately classifies a majority of thyroid nodules as either “benign” or “malignant,” approximately 15% to 30% of thyroid FNAs fall into an indeterminate category (which includes “atypia of undetermined significance/follicular lesion of undetermined significance” [AUS/FLUS], “suspicious for a follicular neoplasm/follicular neoplasm,” and “suspicious for malignancy”), a so-called gray zone with regard to clinical management.6–9 The recent study by Alexander and colleagues10 offers further insight into a promising new approach to the indeterminate thyroid FNA, one that could substantially change the management of a subset of nodules in this category. Although the novel molecular test evaluated in their study is unquestionably a significant advance, it also raises problematic issues concerning its application to thyroid FNAs and its perceived cost savings.

The Veracyte Afirma Gene Classifier

In contrast to other molecular tests, including those championed by Nikiforov and colleagues and the Asuragen miRInform thyroid panel (Asuragen Inc., Austin, Tex), which detect a specific mutation or gene rearrangement with a high positive predictive value for carcinoma (typically papillary thyroid carcinoma),11–13 the Afirma test (commercially owned by the Veracyte Corporation; South San Francisco, Calif) studied by Alexander and colleagues relies on a “benign gene expression fingerprint” to identify benign nodules among indeterminate thyroid FNAs with a high negative predictive value (NPV), comparable to that of an initial “benign” cytologic diagnosis.10 The Afirma test is essentially a “rule-out” test for thyroid cancer, and the validation study by Alexander et al is the third and largest in a series of validation studies of this test.14,15 The test measures the expression of a panel of 167 genes in mRNA isolated from the thyroid FNA sample and uses a multidimensional algorithm to identify samples with a benign gene expression pattern.10,14,15 Because the Afirma gene expression classifier is proprietary, the relative weights given to the individual genes in the panel remain unpublished. The test assigns each thyroid FNA to 1 of 2 categories, “benign” or “suspicious,” although by the manufacturer's own definition the latter category might better be called “indeterminate.” The test currently costs $3350, plus an additional charge for cytologic evaluation in some cases.

The validation study by Alexander et al was a double-blind, prospective, multicenter investigation conducted over 19 months at 49 sites.10 An important strength of the study is that it included samples from both community practices and large academic centers. Of 4812 thyroid FNAs from nodules measuring ≥1.0 cm, 577 were classified as “indeterminate,” and only 265 of these were ultimately selected to form the study's cohort of indeterminates (49% AUS/FLUS, 31% FN, and 21% suspicious for malignancy). Overall, the Afirma test had an NPV of 93% and a sensitivity of 92% (Table 1).10 On histologic follow-up, 7 of 85 cancers (8.2% false-negative rate) had been classified incorrectly as “benign”; within the important “AUS/FLUS” and “suspicious for a follicular neoplasm/follicular neoplasm” subsets, the false-negative rates were 9.7% and 10%, respectively (Table 1). The authors attributed these false negatives to insufficient RNA (secondary to insufficient cellularity) in the FNA sample used for gene expression analysis.10 Veracyte may be able to lower the false-negative rate by finding a better way to control for limited cellularity in FNA samples; at present, however, a false-negative rate of 9% to 10% for “AUS/FLUS” and “suspicious for a follicular neoplasm/follicular neoplasm” specimens appears to be what one should expect from the commercial test.
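The sensitivity and false-negative rate quoted above are complementary proportions of the same group of histologically confirmed cancers. A minimal Python sketch of that arithmetic, using only the overall counts given in the text (85 cancers, 7 called “benign”); the function name is illustrative and is not part of the study or of any Veracyte software:

```python
def rule_out_metrics(cancers_total: int, cancers_called_benign: int) -> dict[str, float]:
    """Sensitivity and false-negative rate of a rule-out test among confirmed cancers.

    sensitivity         = TP / (TP + FN)
    false-negative rate = FN / (TP + FN) = 1 - sensitivity
    """
    fn = cancers_called_benign                  # cancers the test wrongly called "benign"
    tp = cancers_total - cancers_called_benign  # cancers correctly left as "suspicious"
    return {
        "sensitivity": tp / (tp + fn),
        "false_negative_rate": fn / (tp + fn),
    }


# Overall cohort figures from the text: 7 of 85 cancers were called "benign".
overall = rule_out_metrics(cancers_total=85, cancers_called_benign=7)
print(
    f"sensitivity {overall['sensitivity']:.1%}, "
    f"false-negative rate {overall['false_negative_rate']:.1%}"
)
```

Rounded, 78/85 and 7/85 give the 92% sensitivity and 8.2% false-negative rate reported in Table 1.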

Table 1. Summary of the Afirma Test Applied to the Indeterminate Bethesda Categories (a)

Bethesda Category | No. (% of Total) | Sensitivity, % | Specificity, % | False-Negative Rate, % | NPV, %
AUS/FLUS | 129 (48.7) | 90 | 53 | 9.7 | 95
Suspicious for FN | 81 (30.6) | 90 | 49 | 10 | 94
Suspicious for malignancy | 55 (20.8) | 94 | 52 | 5.9 | 85
Total | 265 | 92 | 52 | 8.2 | 93

(a) Abbreviations: AUS/FLUS, atypia of undetermined significance/follicular lesion of undetermined significance; FN, follicular neoplasm; NPV, negative predictive value.
Summarized from Alexander EK, Kennedy GC, Baloch ZW, et al. Preoperative diagnosis of benign thyroid nodules with indeterminate cytology. N Engl J Med. 2012;367:705-715.10

At 92%, the overall sensitivity of the Afirma test was very good. Not surprisingly, however, the overall specificity was much lower at 52%, meaning that 48% of histologically benign thyroid nodules remained in the test's “suspicious” (or “indeterminate”) category. A key achievement of the Afirma test was an NPV of 95% for thyroid FNAs diagnosed as “AUS/FLUS.” It is not clear, however, why the rate of malignancy in the study's AUS/FLUS group (24%) was higher than that reported in The Bethesda System for Reporting Thyroid Cytopathology (TBSRTC) (5%-15%) and nearly identical to that of the “suspicious for a follicular neoplasm/follicular neoplasm” group (25%). TBSRTC distinguishes “AUS/FLUS” as a separate category precisely because of its intermediate risk of malignancy, in the 5% to 15% range. Of the 129 AUS/FLUS cases in the study's cohort, the Afirma test reclassified 55 as “benign” and 74 as “suspicious”; on histologic follow-up, 46 of the latter were benign, so that 62% of the “suspicious” results in this group were false positives. With so many false positives, a large number of surgeries will still be performed for benign disease even when the Afirma test is used. These data highlight the danger of using a molecular test reflexively, which can lead to treating a test result rather than the whole patient. The importance of clinical judgment over reflex testing in deciding which indeterminate thyroid nodules can be followed and which should be surgically resected cannot be overemphasized.
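The AUS/FLUS figures above can be checked against Table 1 by arranging them as a 2 × 2 table. A minimal Python sketch: the 55 “benign” calls, the 74 “suspicious” calls, and the 46 benign nodules among the latter come from the text, whereas the 3 cancers among the “benign” calls are an assumption inferred from the 24% malignancy rate (about 31 of 129) and the 9.7% false-negative rate, not a number stated by the study. The class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2 x 2 table of test result versus histologic outcome (names are hypothetical)."""

    tp: int  # malignant on histology, "suspicious" by the test
    fp: int  # benign on histology, "suspicious" by the test
    tn: int  # benign on histology, "benign" by the test
    fn: int  # malignant on histology, "benign" by the test

    @property
    def total(self) -> int:
        return self.tp + self.fp + self.tn + self.fn

    def prevalence(self) -> float:
        return (self.tp + self.fn) / self.total

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)

    def benign_share_of_suspicious(self) -> float:
        # The "62%" in the text: benign nodules among all "suspicious" calls (1 - PPV).
        return self.fp / (self.fp + self.tp)


BENIGN_CALLS, SUSPICIOUS_CALLS, BENIGN_AMONG_SUSPICIOUS = 55, 74, 46
# ASSUMPTION: 3 cancers among the 55 "benign" calls (inferred, not stated in the text).
FN_ASSUMED = 3

aus_flus = ConfusionCounts(
    tp=SUSPICIOUS_CALLS - BENIGN_AMONG_SUSPICIOUS,
    fp=BENIGN_AMONG_SUSPICIOUS,
    tn=BENIGN_CALLS - FN_ASSUMED,
    fn=FN_ASSUMED,
)

for label, value in [
    ("malignancy rate", aus_flus.prevalence()),
    ("sensitivity", aus_flus.sensitivity()),
    ("specificity", aus_flus.specificity()),
    ("NPV", aus_flus.npv()),
    ("benign share of 'suspicious' calls", aus_flus.benign_share_of_suspicious()),
]:
    print(f"{label}: {value:.0%}")
```

Under that assumption, the counts reproduce the AUS/FLUS row of Table 1 (sensitivity 90%, specificity 53%, NPV 95%) along with the 24% malignancy rate and the 62% figure. Note that 46/74 is the proportion of “suspicious” calls that were benign (1 minus the positive predictive value); the conventional false-positive rate, FP/(FP + TN), would under the same assumption be 46/98, about 47%.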

For FNAs in the “suspicious for a follicular neoplasm/follicular neoplasm” group, the NPV was also impressive at 94%. In contrast, the test performed less well for thyroid FNAs diagnosed as “suspicious for malignancy” (NPV, 85%). According to the study's authors, and given the test's overall low specificity, the Afirma test is not intended for thyroid FNAs diagnosed cytologically as “benign” or “malignant,” nor would it be appropriate for a nondiagnostic FNA. Moreover, as Alexander et al point out, its lower NPV in the “suspicious for malignancy” category means that it is not a very reliable test for that Bethesda category either.
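Part of the difference in NPV between Bethesda categories may be arithmetic: for a fixed sensitivity and specificity, NPV falls as the prevalence of malignancy in the tested group rises, and the “suspicious for malignancy” category carries a higher risk of malignancy than the other indeterminate categories. The same relationship means that NPVs measured in a cohort with an unusually high AUS/FLUS malignancy rate (24%) may not transfer unchanged to a laboratory whose AUS/FLUS rate falls in the TBSRTC range. A minimal Python sketch of Bayes' theorem, assuming (as a simplification) that the rounded AUS/FLUS sensitivity (90%) and specificity (53%) from Table 1 stay constant across prevalences; the function name is hypothetical.

```python
def npv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = P(benign | test "benign") by Bayes' theorem.

    NPV = spec * (1 - p) / (spec * (1 - p) + (1 - sens) * p)
    """
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    true_negative = specificity * (1.0 - prevalence)   # benign nodules called "benign"
    false_negative = (1.0 - sensitivity) * prevalence  # cancers called "benign"
    return true_negative / (true_negative + false_negative)


# ASSUMPTION: AUS/FLUS sensitivity and specificity (Table 1, rounded) do not change with prevalence.
SENS, SPEC = 0.90, 0.53

# TBSRTC's quoted 5%-15% range for AUS/FLUS versus the 24% observed in the study cohort.
for p in (0.05, 0.15, 0.24):
    print(f"malignancy rate {p:.0%}: NPV {npv_from_prevalence(SENS, SPEC, p):.1%}")
```

At the study's 24% malignancy rate the formula gives an NPV of about 94%, close to the 95% observed; at rates in the TBSRTC range, the same test characteristics would yield a higher NPV.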

Where, then, does the Afirma test have the most potential to affect the management of indeterminate thyroid FNAs? Based on the validation study by Alexander et al, it appears to be most useful for thyroid FNAs initially diagnosed as “suspicious for a follicular neoplasm/follicular neoplasm.” Under TBSRTC, the typical management for nodules with this FNA diagnosis is surgery, most often thyroid lobectomy.16,17 Using the Afirma test, 52% of thyroid FNAs in this indeterminate category could be reclassified as “benign,” potentially sparing these patients surgery. Unfortunately, the “suspicious for a follicular neoplasm/follicular neoplasm” category represents only approximately 30% of the indeterminate FNAs in the study by Alexander et al and, based on findings from other groups, only 2% to 8% of all thyroid FNAs.18,19

A significant and strategically important aspect omitted from many discussions of molecular testing in thyroid cytology, and a serious omission from discussions of the Afirma test in particular, is the role in decision making of repeat FNA as recommended by TBSRTC. Although Alexander and colleagues were able to reclassify 43% of thyroid FNAs diagnosed as “AUS/FLUS” as “benign,” 57% of those aspirates remained in the “suspicious” category. By comparison, several groups, including ours and groups that include authors of the Alexander et al study, have demonstrated that a repeat FNA alone, without any ancillary molecular test, can accurately reclassify ≥50% of nodules in the “AUS/FLUS” category as “benign.”1,18,20,21 In addition, 1% to 2% of repeat FNAs in the “AUS/FLUS” category are diagnosed as “suspicious for malignancy” or higher and proceed directly to surgery without the need for further testing. Thus, for the “AUS/FLUS” group, which accounts for the largest proportion of indeterminate thyroid FNAs, a repeat FNA appears to be at least as good as, if not better than, the Afirma test for triaging patients, and it achieves this at a substantially lower cost than ancillary molecular testing. One may also ask whether a targeted review of “AUS/FLUS” cases, either by intralaboratory consensus or by pathologists experienced with difficult “AUS/FLUS” cases, might make cytologic interpretation even better than a reflex molecular test. Such a review could be done immediately and at a significantly lower cost than reflex molecular testing. On the other hand, if its higher cost were not an issue, the Afirma test would have the advantage of triaging patients on the basis of the initial thyroid FNA rather than delaying triage until a second FNA procedure. Indeed, a significant problem with waiting for a second FNA in an “AUS/FLUS” case is that many patients decide to go directly to surgery (often for a benign nodule) rather than wait for further cytologic evaluation.20 A further potential role for the Afirma test would be in the small subset of nodules that are again diagnosed as “AUS/FLUS” on repeat FNA; however, the cost-effectiveness of this approach has yet to be examined.

Although they are outside the scope of the validation study by Alexander et al, several critical issues surrounding the Veracyte Corporation's implementation of the Afirma test could significantly affect the cytology community and need to be mentioned. First is the unfortunate recommendation by the Veracyte Corporation (and the tendency of some medical centers) that the Afirma test be performed reflexively after an indeterminate thyroid FNA diagnosis. Doing so takes the clinician out of the equation and leads to increased and unnecessary health care costs. One can envision various clinical scenarios in which direct surgical intervention, rather than further testing, might be the best approach for a patient with an indeterminate thyroid FNA, based on other factors such as a combination of family history, nodule size, mass effect, lymphadenopathy, and sonographic features. Second, although the Veracyte Corporation has permitted several academic centers, including ours, to perform their own cytologic interpretation before submitting a thyroid FNA for the Afirma test, smaller community-based and independent laboratories are required to submit their thyroid FNA samples directly to the Veracyte Laboratory (Austin, Tex) for cytologic analysis, bypassing the local cytopathologist altogether. Although it may be argued that this approach is necessary to control for variation in the diagnostic classification of thyroid FNAs, the policy nonetheless seems misguided. At several recent pathology and cytology conferences, I witnessed widespread and deep concern among cytopathologists and cytotechnologists at the prospect of losing their practices' thyroid FNAs entirely. Given the panel of well-known pathologists and cytopathologists involved in the validation study, the cytology community can be encouraged that these leaders in the field will be strong advocates for it as the Afirma test becomes more widely used. Clearly, these issues will demand further discussion and action by cytopathologists, cytotechnologists, and cytology organizations to ensure that cytology remains central to the evaluation and diagnosis of thyroid nodules.

In conclusion, the study by Alexander et al validates the effectiveness of the Afirma test and suggests a potential ancillary role for this unique test when it is appropriately applied to thyroid nodules classified by FNA as indeterminate. In a large subset of indeterminate cases, however, the Afirma test appears to be no more effective than a repeat FNA. The test does seem well suited to the subset of thyroid FNAs classified as “suspicious for a follicular neoplasm/follicular neoplasm” when it is used in conjunction with clinical and sonographic findings rather than as a reflex test. Many questions remain, however, about the implementation of the Afirma test with regard to reflex testing, the roles of the cytopathologist and the clinician in decision making, and overall cost reduction.

FUNDING SOURCES

No specific funding was disclosed.

CONFLICT OF INTEREST DISCLOSURES

The author made no disclosures.
