Noninvasive genomic detection of melanoma

Background: Early detection and treatment of melanoma is important for optimal clinical outcome, leading to biopsy of pigmented lesions deemed suspicious for the disease. The vast majority of such lesions are benign. Thus, a more objective and accurate means for detection of melanoma is needed to identify lesions for excision.
Objectives: To provide proof-of-principle that epidermal genetic information retrieval (EGIR™; DermTech International, La Jolla, CA, U.S.A.), a method that noninvasively samples cells from stratum corneum by means of adhesive tape stripping, can be used to discern melanomas from naevi.
Methods: Skin overlying pigmented lesions clinically suspicious for melanoma was harvested using EGIR. RNA isolated from the tapes was amplified and gene expression profiled. All lesions were removed for histopathological evaluation.
Results: Supervised analysis of the microarray data identified 312 genes differentially expressed between melanomas, naevi and normal skin specimens (P<0·001, false discovery rate q<0·05). Surprisingly, many of these genes are known to have a role in melanocyte development and physiology, melanoma, cancer, and cell growth control. Subsequent class prediction modelling of a training dataset, consisting of 37 melanomas and 37 naevi, discovered a 17-gene classifier that discriminates these skin lesions. Upon testing with an independent dataset, this classifier discerned in situ and invasive melanomas from naevi with 100% sensitivity and 88% specificity, with an area under the curve for the receiver operating characteristic of 0·955.
Conclusions: These results demonstrate that EGIR-harvested specimens can be used to detect melanoma accurately by means of a 17-gene genomic biomarker.

Description of the 17-gene melanoma classifier
Data S1. Details of Strategy for Melanoma Class Prediction Modeling
Data S2. Assay of melanoma and nonmelanoma specimens by quantitative real-time reverse transcription-polymerase chain reaction using the 17-gene classifier recapitulates microarray results

Fig S1.
Photomicrographs of a melanoma, not identified on initial histopathological evaluation, which was detected by the 17-gene melanoma classifier. The initial histopathological diagnosis of a mid-lumbar skin lesion (A, B) was a Clark naevus. The 17-gene classifier, however, characterized the EGIR-harvested specimen as a melanoma (see specimen denoted by the arrow in Fig 3). Therefore, serial sectioning of the biopsy was performed, and the sections were re-reviewed by both the primary and central dermatopathologists. Based on these additional sections (see representative photomicrographs shown in C and D), the pathology of the lesion was revised to malignant melanoma, superficial spreading type, Clark's level II and Breslow thickness 0.37 mm, arising in association with a compound naevus, with moderate host response.
Photomicrographs of the biopsied mid-lumbar skin lesion: (A, B) sections show skin with irregularly nested and single melanocytic naevus cells along the dermal-epidermal junction. Nests bridge between adjacent rete ridges, and melanocytic naevus cells are also seen in the dermis. Dermal changes include fibroplasia, mild mononuclear cell inflammation and pigment incontinence. (C, D) sections show focal areas of large atypical melanocytes with pleomorphic nuclei, abundant cytoplasm and granular pigment. Single and nested atypical melanocytes are prominent in the lower epidermis, with single cells also seen in the mid-epidermis. There are focal areas of atypical epithelioid melanocytes invading downward below the junctional atypia into the papillary dermis, surrounded by a moderate lymphocytic host response.

Data S1. Details of Strategy for Melanoma Class Prediction Modeling
Array data containing 76 melanomas and 126 naevi were processed and normalized using GCRMA from Bioconductor (http://www.bioconductor.org). Gene targets with an expression value <100 across all 202 samples were filtered out from further consideration, leaving 22,526 genes for further analysis. The samples were then divided into a training dataset of 37 melanomas and 37 naevi and a test dataset of 39 melanomas and 89 naevi.
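The low-expression filter described above can be illustrated with a minimal sketch. It assumes that "expression value <100 across all 202 samples" means a gene is removed only when it stays below 100 in every sample; the function name, gene identifiers and values are hypothetical and are not taken from the study data.

```python
def filter_low_expression(expression, threshold=100.0):
    """Keep genes whose expression reaches the threshold in at least one sample.

    `expression` maps a gene identifier to its list of per-sample values.
    Assumption: a gene is dropped only if it is below the threshold in
    every sample.
    """
    return {
        gene: values
        for gene, values in expression.items()
        if any(v >= threshold for v in values)
    }


# Hypothetical toy data: three genes measured in four samples.
toy = {
    "GENE_A": [12.0, 40.5, 88.1, 63.0],      # below 100 everywhere: removed
    "GENE_B": [15.0, 250.0, 90.0, 30.0],     # reaches 100 once: kept
    "GENE_C": [500.0, 420.0, 610.0, 380.0],  # always above 100: kept
}
kept = filter_low_expression(toy)
print(sorted(kept))
```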
The training set data were then analyzed for differential gene expression by means of a t-test with multiple-testing correction (Westfall & Young permutation method¹; p < 0.05, FDR < 0.05). Starting with the 22,526 genes, 422 differentially expressed genes were identified.
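To make the permutation-based correction concrete, the sketch below computes single-step maxT adjusted p-values, one common variant of the Westfall & Young approach. The text does not say which variant or software was used, so this is an illustration under that assumption, not the authors' pipeline. The Welch t statistic, the function names and the toy data are all assumptions.

```python
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two groups of values."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def maxt_adjusted_pvalues(data, labels, n_perm=1000, seed=0):
    """Single-step maxT adjusted p-values from label permutations.

    data   : gene -> list of expression values, one per sample
    labels : list of 0/1 class labels in the same sample order
    """
    rng = random.Random(seed)

    def abs_t(lbls):
        out = {}
        for gene, values in data.items():
            a = [v for v, l in zip(values, lbls) if l == 1]
            b = [v for v, l in zip(values, lbls) if l == 0]
            out[gene] = abs(welch_t(a, b))
        return out

    observed = abs_t(labels)
    exceed = {gene: 0 for gene in data}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Largest statistic over all genes under this permutation.
        max_t = max(abs_t(shuffled).values())
        for gene, t in observed.items():
            if max_t >= t:
                exceed[gene] += 1
    return {gene: (exceed[gene] + 1) / (n_perm + 1) for gene in data}


# Hypothetical toy data: 6 "melanoma" (1) and 6 "naevus" (0) samples.
labels = [1] * 6 + [0] * 6
data = {
    "GENE_UP": [9.1, 9.4, 8.8, 9.0, 9.3, 8.9, 5.0, 5.2, 4.9, 5.1, 4.8, 5.3],
    "GENE_FLAT": [5.0, 5.3, 4.8, 5.1, 4.9, 5.2, 5.1, 4.9, 5.2, 5.0, 5.3, 4.8],
}
adjusted = maxt_adjusted_pvalues(data, labels)
significant = [g for g, p in adjusted.items() if p < 0.05]
```

Comparing each gene's observed statistic with the permutation distribution of the maximum statistic is what controls the error rate across all genes tested at once.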
The 422 genes from the training dataset were further analyzed by the stochastic gradient boosting method²,³ (TreeNet [Salford Systems, Inc.]) for class prediction modeling. The performance of each model was subsequently evaluated on the test dataset of 39 melanomas and 89 naevi. The following parameters were used for class model building: learn rate of 0.001, subsample fraction of 0.5, influence trimming factor of 0.1, M-regression breakdown of 0.9, and cross entropy (likelihood) as the optimal logistic model selection criterion. The number of trees used for model building was set to 10,000, the maximum number of nodes per tree was set to 6, the minimum number of training observations in terminal nodes was set to 10, and the maximum number of most-optimal models for which to save summary results was set to 1. A threshold of 0.5 was used for classification, as is the default setting for TreeNet.
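As a rough illustration of what the learn rate, subsample fraction and cross-entropy criterion do in stochastic gradient boosting, here is a minimal pure-Python sketch. It is not TreeNet and does not reproduce its settings: it uses two-leaf trees instead of trees with up to 6 nodes, 300 trees with a learn rate of 0.1 instead of 10,000 trees at 0.001, and it omits influence trimming. All function names and the toy data are assumptions.

```python
import math
import random


def fit_stump(X, residuals, rows):
    """Fit a two-leaf regression tree to the residuals of the sampled rows.

    Returns (feature, threshold, left_value, right_value), or None when no
    feature takes two distinct values in the subsample.
    """
    best = None
    for j in range(len(X[0])):
        values = sorted({X[i][j] for i in rows})
        # Candidate thresholds are midpoints between neighbouring values.
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [residuals[i] for i in rows if X[i][j] <= threshold]
            right = [residuals[i] for i in rows if X[i][j] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            sse = (sum((r - left_mean) ** 2 for r in left)
                   + sum((r - right_mean) ** 2 for r in right))
            if best is None or sse < best[0]:
                best = (sse, j, threshold, left_mean, right_mean)
    return None if best is None else best[1:]


def stump_output(stump, x):
    j, threshold, left_value, right_value = stump
    return left_value if x[j] <= threshold else right_value


def boost(X, y, n_trees=300, learn_rate=0.1, subsample=0.5, seed=0):
    """Return a list of stumps whose leaf values are scaled by learn_rate."""
    rng = random.Random(seed)
    scores = [0.0] * len(X)  # log-odds for each training sample
    model = []
    for _ in range(n_trees):
        # Negative gradient of the cross-entropy loss: label minus probability.
        residuals = [yi - 1 / (1 + math.exp(-s)) for yi, s in zip(y, scores)]
        # The "stochastic" part: each tree sees a random fraction of the rows.
        rows = rng.sample(range(len(X)), int(subsample * len(X)))
        stump = fit_stump(X, residuals, rows)
        if stump is None:
            continue
        j, threshold, lv, rv = stump
        scaled = (j, threshold, learn_rate * lv, learn_rate * rv)
        model.append(scaled)
        scores = [s + stump_output(scaled, x) for s, x in zip(scores, X)]
    return model


def predict(model, x, threshold=0.5):
    """Classify as 1 when the predicted probability reaches the threshold."""
    score = sum(stump_output(stump, x) for stump in model)
    return int(1 / (1 + math.exp(-score)) >= threshold)


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[8.0, 1.2], [8.2, 0.4], [8.4, 0.9], [8.6, 0.1], [8.8, 0.7], [9.0, 0.5],
     [4.0, 0.8], [4.2, 0.3], [4.4, 1.1], [4.6, 0.6], [4.8, 0.2], [5.0, 1.0]]
y = [1] * 6 + [0] * 6
model = boost(X, y)
predictions = [predict(model, x) for x in X]
```

A smaller learn rate shrinks each tree's contribution, so more trees are needed to reach the same fit; this is consistent with pairing a learn rate of 0.001 with 10,000 trees.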
With the TreeNet ranked variable importance cut-off value of >3.0, a class prediction model was generated from the training dataset that contained 168 of the 422 genes. This model correctly identified all 37 melanomas and 35 out of 37 naevi. The performance of this class prediction model was evaluated with the test data of 39 melanomas and 89 naevi. All 39 melanomas in the test dataset were accurately identified and 78 out of 89 naevi were called correctly by the 168-gene classifier, indicative of 100% sensitivity and 88% specificity.
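The reported test-set figures follow directly from the counts given above. The short sketch below reproduces the arithmetic; the function name is an assumption.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Test dataset: all 39 melanomas detected; 78 of 89 naevi called correctly.
sens, spec = sensitivity_specificity(tp=39, fn=0, tn=78, fp=89 - 78)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
# prints "sensitivity = 100.0%, specificity = 87.6%"
```

The specificity of 78/89 is approximately 87.6%, which rounds to the 88% quoted in the text.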
We then sought to reduce the number of genes in this class prediction model while maintaining the model's high sensitivity and specificity. Starting with the 168 genes, the ranked variable importance cut-off value was set to >8.0. The resulting class prediction model contained 56 genes. Testing with the independent dataset revealed this 56-gene classifier to be 100% sensitive and 88% specific. This process was repeated using the 56-gene classifier, with the variable importance cut-off value set to >10.0. The resulting 42-gene classifier, when tested, was again found to have a sensitivity of 100% and a specificity of 88%.
To further reduce the number of predictors from the 42-gene model, we used a shaving method to generate progressively smaller models by means of TreeNet. At each step, shaving removed the lowest-ranked gene predictor from the existing TreeNet model, provided that the 100% sensitivity and 88% specificity were maintained upon testing. This shaving strategy resulted in a predictive model containing 17 genes that was found to have 100% sensitivity and 88% specificity upon testing.
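The shaving procedure can be sketched as a simple backward-elimination loop. This is a simplified illustration, not the authors' implementation. The `evaluate` callback is hypothetical and stands in for refitting a model on the given genes and testing it on the independent dataset. The gene ranking is held fixed here, whereas a real refit would re-rank the genes at each step.

```python
def shave(ranked_genes, evaluate, min_sensitivity=1.0, min_specificity=78 / 89):
    """Drop the lowest-ranked gene while test performance is maintained.

    ranked_genes : gene identifiers, most important first
    evaluate     : hypothetical callback, genes -> (sensitivity, specificity)
    The default specificity floor is 78/89, reported as 88% in the text.
    """
    genes = list(ranked_genes)
    while len(genes) > 1:
        candidate = genes[:-1]  # remove the lowest-ranked predictor
        sens, spec = evaluate(candidate)
        if sens < min_sensitivity or spec < min_specificity:
            break  # performance dropped: keep the previous model
        genes = candidate
    return genes


def toy_evaluate(genes):
    """Toy stand-in: performance holds only while G1, G2 and G3 are present."""
    if {"G1", "G2", "G3"} <= set(genes):
        return 1.0, 78 / 89
    return 0.9, 0.8


result = shave(["G1", "G2", "G3", "G4", "G5"], toy_evaluate)
print(result)
```

In this toy run, G5 and G4 are shaved off without loss of performance, and the loop stops when removing G3 would lower sensitivity and specificity.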