Machine learning applied to atopic dermatitis transcriptome reveals distinct therapy-dependent modification of the keratinocyte immunophenotype

Background: Atopic dermatitis (AD) arises from a complex interaction between an impaired epidermal barrier, environmental exposures, and the infiltration of Th1/Th2/Th17/Th22 T cells. Transcriptomic analysis has advanced understanding of gene expression in cells and tissues. However, molecular quantitation of cytokine transcripts does not predict the importance of a specific pathway in AD or cellular responses to different inflammatory stimuli. Objective: To understand changes in keratinocyte transcriptomic programmes in human cutaneous disease during development of inflammation and in response to treatment. Methods: We performed in silico deconvolution of the whole-skin transcriptome. Using co-expression clustering and machine learning tools, we resolved the gene expression of bulk skin (n=7 datasets, n=406 samples), firstly, into unsupervised keratinocyte immune response phenotypes and, secondly, into 19 cutaneous cell signatures of purified populations from publicly available datasets. Results: We identify three unique transcriptomic programmes in keratinocytes, KC1, KC2, KC17, characteristic to immune signalling from disease-associated helper T cells. We cross-validate those signatures across different skin inflammatory conditions and disease stages and demonstrate that the keratinocyte response during treatment is therapy dependent. Broad spectrum treatment with ciclosporin ameliorated the KC17 response in AD lesions to a non-lesional immunophenotype, without altering KC2. Conversely, the specific anti-Th2 therapy, dupilumab, reversed the KC2 immunophenotype. Conclusion: Our analysis of transcriptomic signatures in cutaneous disease biopsies reveals the complexity of keratinocyte programming in skin inflammation and suggests that the perturbation of a single axis of immune signal alone may be insufficient to resolve keratinocyte immunophenotype abnormalities.


Abstract Background
Atopic dermatitis (AD) arises from a complex interaction between an impaired epidermal barrier, environmental exposures, and the infiltration of Th1/Th2/Th17/Th22 T cells. Transcriptomic analysis has advanced understanding of gene expression in cells and tissues. However, molecular quantitation of cytokine transcripts does not predict the importance of a specific pathway in AD or cellular responses to different inflammatory stimuli.

Objective
To understand changes in keratinocyte transcriptomic programmes in human cutaneous disease during development of inflammation and in response to treatment.

Methods
We performed in silico deconvolution of the whole-skin transcriptome. Using coexpression clustering and machine learning tools, we resolved the gene expression of bulk skin (n=7 datasets, n=406 samples), firstly, into unsupervised keratinocyte immune response phenotypes and, secondly, into 19 cutaneous cell signatures of purified populations from publicly available datasets.
3 disease stages and demonstrate that the keratinocyte response during treatment is therapy dependent. Broad spectrum treatment with ciclosporin ameliorated the KC17 response in AD lesions to a non-lesional immunophenotype, without altering KC2.

Conclusion
Our analysis of transcriptomic signatures in cutaneous disease biopsies reveals the complexity of keratinocyte programming in skin inflammation and suggests that the perturbation of a single axis of immune signal alone may be insufficient to resolve keratinocyte immunophenotype abnormalities.

Ps-Non
Non-lesional psoriatic skin S100 S100 protein family gene Alongside the immune skin infiltrate, spongiosis and keratinocyte hyperplasia are the cardinal features of epidermal changes in AD. In addition to the gene mutation mediated reduction of filaggrin expression, type 2 inflammation also reduces keratinocyte filaggrin expression thereby further damaging the skin barrier 8,9 .

7
The role of IL-17 and IL22 cytokines in regulating antimicrobial peptides such as s100 proteins and b-defensins is well established [17][18][19][20][21][22] and the importance of these pathways in psoriasis has been validated by clinical demonstration of effectiveness of inhibitory monoclonal antibody therapy 23,24 ; their precise function in AD is less clear. To study the key pathways driving AD, where targeted intervention may prove most fruitful, direct quantitation of the immune signals (e.g. cytokines) can be undertaken. However, as molecular quantitation of the cytokine transcripts does not predict the importance of a specific pathway in AD, it is necessary to study the outcome of the epidermal responses to different inflammatory stimuli to properly define their role. Thus, to characterise the keratinocyte immunophenotype it is necessary to be able to investigate skin transcriptome to a cellular resolution.
Single-cell analysis can offer an approach to this question but is limited by the technical challenge of achieving adequate encapsulation of enough cells of interest with minimal transcriptomic disturbance. Here we show that it is possible to employ machine learning to resolve the keratinocyte transcriptomic signal from the nonkeratinocyte skin transcriptome, revealing important insights to the pathogenesis of AD.
All rights reserved. No reuse allowed without permission.
the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Microarray data analysis
Microarray datasets were obtained from the Gene Expression Omnibus (GEO, NCBI) and were analysed from raw data in R and normalised according to platform specifics. For Affymetrix microarray platforms, classic microarray quality control was performed using the Bioconductor arrayQualityMetrics tool and were normalised to obtain expression values by GCRMA methods within the Affymetrix package.
Illumina platforms and other technologies were quantile normalised using the lumi or limma Bioconductor packages.

Unsupervised network clustering
The inflammatory skin disease datasets GSE32924, GSE36842 and GSE34248 were processed as above and subsequently merged and batch corrected using the COMBAT tool within the SVA Bioconductor package. Differential gene expression analysis between healthy controls and lesional skin, within disease lesional and nonlesional, and across disease lesional comparisons was conducted using a filtering of Benjamini-Hochberg adjusted p-value >0.05, log(2)-fold difference x1 using the LIMMA Bioconductor package. The expression values of 4620 probset-IDs, corresponding to 3066 unique genes were input into MIRU (now, GraphiaPro) for network analysis 25,26 . A transcript-to-transcript correlation matrix using Pearson correlation coefficient of r ³0.7 was created. The resulting network graph was then clustered into groups of genes using the MCL algorithm at an inflation value of 3.1 and minimum cluster size of 10 genes, giving 50 clusters. The gene list for each cluster was interrogated for gene ontology using the web-based analysis tool ToppFun within the ToppGene suite 27 . The REVIGO online tool was used to provide All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint 9 a single biological process term for each cell-based cluster, selecting that with both the lowest B.H. p-value and a term dispensability of zero.

Reference cutaneous cell populations datasets
To curate cell profiles for input as CIBERSORT reference signatures, we collated seven datasets from GEO: GSE36287 (keratinocytes stimulated with IFNa, IFNg, IL-

Bulk skin datasets
Datasets of skin biopsies from inflammatory skin diseases were obtained from GEO.

Running CIBERSORT
Datasets of bulk skin samples were deconvoluted against the reference signature sets using the online version of the CIBERSORT 28 algorithm. Reference signature files were provided as the signature gene file, while normalised expression data of bulk samples were provided as the mixture file. All run settings were kept at default.
Output provided by CIBERSORT was downloaded as a .txt file, where relative abundance of each cellular signature was normalised as a percent of sample.
All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Gene co-expression analysis of inflammatory skin disease genes reveals immune and keratinocyte involvement.
Firstly, we set out to examine the transcriptomic signals from skin biopsies of lesional, regardless of chronicity, and non-lesional AD and psoriasis (AD-Les, AD-Non, Ps-Les, Ps-Non, respectively). In line with previous reports, unsupervised differential expression analysis identified 4620 genes 3,29 . Transcript-to-transcript (data not shown)). It is notable that genes in cluster 9 include KRT6A/B, KRT16, and All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint the S100 and SERPIN encoding proteins which are known to be involved in epidermal perturbation from inflammation and hyper-proliferative barrier breach 2,32,33 , indicating that the cross-talk between immune inflammation and keratinocyte function is important for lesion pathogenesis.

Machine learning resolution of whole skin samples into constituent cellular profiles.
To resolve from bulk expression data the transcriptomic signatures of keratinocyte responses we utilised a machine learning approach. We trained an algorithm (CIBERSORT 28,34 ) to resolve gene expression profiles of purified cellular populations and tested this on whole tissue transcriptomic data from split skin (epidermis and dermis) to identify the relative proportion of cells. We then utilised laser captured dermal and epidermal regions from both healthy and AD skin 35 to demonstrate that the algorithm could reliably separate relative proportions of the keratinocyte and fibroblast composition of whole skin in uninflamed and inflamed settings (Figure 2a, b). We expanded this approach by utilising a training set of transcriptomes for melanocytes, resident regulatory, CD4 and CD8 T cells, steady-state and activated Langerhans cells, and CD11c + dermal dendritic cells to increase the resolution of the cellular skin components (Figure 2c).

In silico sorting reveals prevalence of KC2 and progression of KC17 immunophenotypes during the course of the inflammation in AD.
To identify sub-populations of keratinocytes showing a molecular response to a specific inflammatory cytokine, we further trained the algorithm to resolve keratinocytes responding to IFN-α and IFN-γ (KC1), IL-4 and IL-13 (KC2), IL-17A All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint Compared to healthy controls, AD keratinocytes showed an IL-17 sensing signal in lesional as well as non-lesional skin, but this was only significant for chronic lesions (ANOVA p<0.0001) (Figure 3d). KC17 appeared to dominate in chronic AD lesions, suggesting an evolution in Th17 pathway over time.
All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint Using machine learning on published datasets, we trained an algorithm to identify All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Discussion
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint different skin cell types and then cellular responses to different inflammatory responses of interest. We validated this algorithm on transcriptomic studies of microdissected healthy skin and psoriasis which has a known immune pathway dependence, showing that our approach is a powerful method for investigating transcriptomic signatures in skin samples of complex disease.
Applying such optimised machine learning-based analysis to existing datasets of AD disease stages and during treatment has confirmed the constitutive atopic skin phenotype in AD patients. An altered transcriptomic programme in keratinocytes was evident in all samples (lesional and non-lesional) which reflected Th2 sensing by keratinocytes which we termed "KC2", and similar findings have been reported by others 3,5 . Further, we show the immunophenotype shift characteristic of lesion progression modifies keratinocyte profiles to an IL-17 (KC17) and interferon (KC1) sensing phenotype. Thus, we could demonstrate that although acute AD lesions show a strong Th2 signal, and chronic lesions have Th1 and Th17 signals, the Th2related processes is amplified rather than a switch away from a type 2 cytokine response in chronic lesions.
Effective treatment of psoriasis with ciclosporin showed a reversal of the dominant KC17 profile of lesional skin to that of unaffected skin. However, in AD, remarkably, and despite resolution of skin inflammation as measured by disease severity, ciclosporin did not modify the KC2 profile of lesional skin. Instead, AD disease remission with ciclosporin correlated with loss of the KC17 signal. In contrast to this, with dupilumab, striking loss of the KC2 signal associated with disease remission, whereas the KC17 response was not observed. This underscores the critical role of All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint the type 2 cytokines in AD and might suggest a strong role for IL-17 pathway in AD pathogenesis; it may also reflect the complex immunophenotype of the disease and potential immune mediator redundancy. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint 18 rebalances KC17 subpopulation comparable to normal skin but does not modify type 2 cytokine sensing. Whereas, dupilumab therapy reverses the KC2 dominance in lesional AD. Taken together, these observations suggest that whilst type 2 cytokines appear to drive the biology of AD, the efficacy of ciclosporin in AD is likely to lie beyond the targeting of T cells with resultant Th17 inhibition and that other pathways modified by this effective therapy should be explored as potential therapeutic targets.
All rights reserved. No reuse allowed without permission. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Co-expression clusters from
the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
See file: Supplementary figure S1.pdf

Supplementary Table 1.
Gene ontology annotation of the cell-based co-expression clusters from whole skin microarray data.
Seven cell-based clusters were annotated from gene co-expression analysis ( Figure   1A). The clusters were annotated for biological process gene ontology (GOBP) using the ToppFun/ToppGene; the top 200 annotations with FDR Benjamini-Hochbergcorrected p-values <0.05 were selected and collapsed for redundancy using REVIGO. The most significant collapsed annotation from REVIGO was selected as the cluster biological process annotation, with the ToppGene rank shown. the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.

Genes
The copyright holder for this preprint (which was not peer-reviewed) is  See file: Table S4.xlsx All rights reserved. No reuse allowed without permission.
the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint   the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
The copyright holder for this preprint (which was not peer-reviewed) is . https://doi.org/10.1101/2019.12.14.19014977 doi: medRxiv preprint  the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.