Deep Learning Identifies HAT1 as a Morphological Regulator in Esophageal Squamous Carcinoma Cells through Controlling Cell Senescence

Histopathology is a critical approach for diagnostic tasks and precision treatment. However, histopathological deep learning tools for auto‐identification remain poorly developed. Meanwhile, the interpretation of the computer vision attention into a cellular process is less efficient in a systematic way. Herein, it is identified that histone acetyltransferase 1 (HAT1) is an aging‐associated gene in the esophagus epithelium by machine learning. An interpretable deep learning model is developed to distinguish morphological changes with varied HAT1 expressions in esophageal squamous carcinoma cells (ESCC). The gradient‐weighted class activation mapping and prediction score analysis reveal that the computer's vision focuses on the nuclear sizes of ESCC. The hypothesized phenotype is verified in HAT1‐knockdown ESCCs. Finally, HAT1 regulating cell senescence by affecting the H3K27 acetylation and E2F transcription factor 7 (E2F7) expression is shown. Herein, the feasibility and benefits of applying histopathological deep learning assistance systems in routine practice scenarios and connecting phenotype and genotype for further genetic research are suggested.

or immunological checkpoints. [8] In addition, the subjective interpretation of histopathologic features is time-consuming and suffers from large variations. There is an urgent need to integrate histopathologic data, such as H&E images, with a deep learning approach.
Convolutional neural networks (CNNs) are extremely useful for a variety of computer vision problems. On a wide range of medical images, including X-rays, magnetic resonance imaging (MRI) scans, computed tomography (CT) scans, and tissue slides, CNNs have been used for pathological region segmentation and disease categorization. [9] U-net is commonly employed in medical image segmentation tasks, and DenseNet is a popular choice for image classification because it requires fewer parameters and allows for feature reuse. [10] However, how the computer vision is generated and which classification criteria are applied remain hidden, which makes CNNs untrustworthy for clinical use and prevents researchers from gaining useful insights from them. Currently, the most commonly used post hoc interpretation method presents example regions with pixel patterns that show the importance of a region for prediction. [11] These methods can only reveal "where" a network is looking to make predictions rather than "what" identities are important, and the information interpreted from them is considered to be fragile. Thus, understanding how CNNs work is critical for basic medical translation and clinical application. [12] Experimental validation is therefore crucial to verify any hypothesis derived from such interpretation. Recently, one study showed that the changes in nuclear morphology during cellular senescence could be recognized as reliable features for deep learning. [13]
Esophageal cancer is one of the most lethal malignancies in the world. [14] The two most common subtypes of esophageal cancer are squamous cell carcinoma and adenocarcinoma, which can be distinguished by marker gene expression, genomic mutations, and histopathology. Esophageal squamous-cell carcinoma (ESCC) accounts for over 90% of all esophageal cancer occurrences worldwide.
Esophageal squamous dysplasia is a precursor lesion for squamous cell carcinoma, whereas Barrett's esophagus is a precursor lesion for adenocarcinoma. [15] In this study, we identified histone acetyltransferase 1 (HAT1) as an aging marker in the human esophagus and a morphological regulator of ESCC by deep learning models. Our framework successfully predicts HAT1 expression in esophageal tumor tissues based on H&E images. The model attention visualization and prediction probability score analysis suggest that CNN models use nuclear sizes of esophageal tumor cells as human interpretable tumor features, which is verified by wet lab experiments.

HAT1 Is an Aging-Related Gene in both Normal and Cancerous Esophageal Tissue
The incidence of esophageal cancer increases with age. [16] Aging-related genes may function as controllers of tissue development or moderators of programmed cell death. However, the hub genes associated with aging in esophageal epithelial cells over a lifetime have received little attention. [17] Meanwhile, the regulation of aging-related gene expression is crucial for mammalian development and homeostasis, and the abnormal upregulation during aging of a normally silenced gene may promote carcinogenesis. Thus, we sought to identify age-related gene signatures by mining transcriptomics data from normal esophageal tissue and esophageal cancer. Raw expression data in FPKM format, accompanied by clinical information, were taken from The Cancer Genome Atlas (TCGA) repository and normalized by upper quantile and log transformation. Patients with either missing clinical information or missing transcriptomics data were excluded. LASSO regression analysis was applied to establish an age-related model using the expression profile of all the genes in the transcriptomics data (n = 160). Age was defined as the number of days from birth to diagnosis. A 19-gene signature (Table S3, Supporting Information) was identified based on the optimal value of λ (Figure 1A,B). The prediction accuracy was measured by a paired Wilcoxon test and a receiver operating characteristic (ROC) curve. No statistical difference was found between the predicted age and the actual age at diagnosis (paired Wilcoxon test, p value > 0.9). Within the 19-gene signature, HAT1 showed the strongest Spearman correlation with the age at diagnosis (Figure S1, Supporting Information). The signature can successfully distribute the patients into the HAT1 high group and the HAT1 low group based on the mean HAT1 expression value (Figure 1C, AUC = 0.8817). The age correlation of each gene in the 19-gene signature was assessed by the Pearson correlation test, with the data assumed to be normally distributed.
HAT1 expression was significantly negatively correlated with the age of patients with esophageal cancer (Figure 1D, cor = -0.4677, p value < 0.05).
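To make the age-model fitting step concrete, the following is a minimal Python sketch of LASSO regression solved by cyclic coordinate descent. It is an illustration only and not the analysis code of this study, which used the glmnet package in R. The function names, the synthetic toy matrix, and the fixed penalty `lam` are assumptions introduced for the example; in practice the penalty is typically selected by cross-validation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=100):
    """Minimise (1/2n)*||y - b0 - X b||^2 + lam*||b||_1.

    X: list of samples, each a list of log-transformed expression values.
    y: list of ages (days from birth to diagnosis).
    Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    # Centre features and response so the intercept drops out of the updates.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]

    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # feature j against the residual that excludes feature j
            norm = 0.0  # sum of squares of feature j
            for i in range(n):
                partial_fit = sum(Xc[i][k] * beta[k] for k in range(p) if k != j)
                rho += Xc[i][j] * (yc[i] - partial_fit)
                norm += Xc[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n) if norm > 0 else 0.0
    intercept = y_mean - sum(beta[j] * x_mean[j] for j in range(p))
    return intercept, beta


# Synthetic toy data (not from the study): only the first feature drives y.
X = [[1, 0.1], [2, -0.2], [3, 0.1], [4, -0.1], [5, 0.2], [6, -0.1]]
y = [3, 5, 7, 9, 11, 13]
intercept, beta = lasso_coordinate_descent(X, y, lam=0.5)
```

In this toy case the penalty sets the coefficient of the uninformative second feature to exactly zero and shrinks the first one slightly, which is the behavior that lets LASSO reduce a whole transcriptome to a short gene signature.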
HAT1 has been found to be upregulated in various cancers and is associated with poor prognosis. [23] However, the role of HAT1 in esophageal cancer remains to be elucidated, and whether its expression is age-related during normal esophageal tissue development is unclear. Therefore, we explored normal esophageal transcriptomics from GTEx, an ongoing project that stores transcriptomics from a wide range of normal organs and tissues from healthy humans. [18] Notably, HAT1 expression in normal esophageal tissue was gradually downregulated with increasing age (Figure 1E). PCA revealed that the patients were separated into two groups coincident with HAT1 expression levels, indicating distinct transcriptomics between the HAT1 high and low groups. Thus, HAT1 could be considered a biomarker to classify esophageal cancer (Figure 1F).
To further study HAT1 expression in the mammalian esophagus during aging, we collected esophagi from mice aged 1, 10, and 20 days and 6 and 18 months. Immunohistochemistry analysis revealed that HAT1 is highly expressed in the basal layer of the esophageal squamous epithelium in young mice but lost in aged mice (Figure 1G). p63 plays a critical role in the development of the normal esophagus, [19] and its expression declined in the aged esophageal squamous epithelium (Figure 1G). Furthermore, immunofluorescence staining showed that HAT1 is colocalized with p63 in the esophageal squamous epithelial cells, which indicates a potential role of HAT1 in esophageal epithelium renewal and development (Figure 1H). In addition, we found that KRT5, a basal cell marker, is still expressed in the aged esophageal epithelium (Figure 1H). Together, the above data suggest that HAT1 expression is negatively correlated with age or developmental state in both normal and cancerous esophageal epithelium.
www.advancedsciencenews.com www.advintellsyst.com

HAT1 Upregulation is a Poor Prognosis Factor for Patients with Esophageal Cancer
Two separate microarray profiling data sets were merged and compared to investigate the expression level of HAT1 in esophageal cancerous tissue versus normal tissue. In esophageal squamous carcinoma, HAT1 is significantly upregulated (Figure 2A, Wilcoxon, p = 4.8e-12). Patients (n = 23) in the Second Affiliated Hospital of Zhejiang University (SAHZU) cohort showed differential HAT1 expression levels in the cancerous region (Figure 2B), as did the samples in the tumor microarray. Importantly, HAT1 expression is significantly related to tumor pathological grade in the tissue microarray cohort (Figure 2C,D, Kruskal-Wallis rank sum test, p = 0.005345). Upregulation of HAT1 expression was also associated with a poorer prognosis-free survival rate, though no statistical significance was obtained (Figure 2E). Next, the patients in the TCGA cohort were stratified into a HAT1 high group (n = 80) or a HAT1 low group (n = 80) according to the median expression value. To elucidate the biological functions and pathways associated with HAT1 expression, the differentially expressed genes (DEGs) between the HAT1 high-expression and low-expression groups (Figure 2F, adjusted p < 0.05, FC > 1.5) were used to perform GO enrichment analyses. The DEG enrichment analysis revealed ontologies related to epithelium morphogenesis and the DNA metabolic process (Figure 2G, adjusted p < 0.05). Meanwhile, HAT1 expression is negatively correlated with CD4+ effector memory T cells and positively correlated with CD4+ Th2 T cells in ESCC (Figure 2H), which suggests that HAT1 may function as an immune suppressor. The ssGSEA enrichment analysis indicated that HAT1 expression is significantly related to DNA replication, the G2M checkpoint, and collagen formation, and negatively related to the p53 pathway (Figure 2I). Therefore, our analysis suggests that the upregulation of HAT1 is an unfavorable factor for the prognosis and immunotherapy of esophageal cancer.
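The median-based stratification of the TCGA cohort described above can be sketched in a few lines of Python. This is a hedged illustration rather than the code used in the study: the function name, the patient identifiers, and the expression values are hypothetical, and sending values equal to the median to the low group is an assumption.

```python
from statistics import median


def stratify_by_median(expression):
    """Split patients into high and low groups around the cohort median.

    expression: dict mapping a patient ID to a HAT1 expression value.
    Values equal to the median go to the low group (an assumption).
    """
    cutoff = median(expression.values())
    high = sorted(pid for pid, value in expression.items() if value > cutoff)
    low = sorted(pid for pid, value in expression.items() if value <= cutoff)
    return cutoff, high, low


# Hypothetical patient IDs and values, for illustration only.
toy = {"P1": 2.1, "P2": 5.4, "P3": 3.3, "P4": 4.8}
cutoff, high, low = stratify_by_median(toy)
```

With an even number of patients and no ties at the median, this split yields two groups of equal size, as in the 80 versus 80 division of the 160 TCGA patients.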

Deep Learning Segments Cancerous Regions on Whole-Slide Images
Since HAT1 is related to patient prognosis and treatment, the prediction of HAT1 expression is crucial. Meanwhile, the gene ontology analysis indicated potential morphological differences caused by HAT1. Therefore, we wanted to employ deep learning for HAT1 expression prediction. The transcriptomics data sets in the TCGA cohort were accompanied by whole-slide images (WSIs), which motivated us to explore whether the histopathology of esophageal cancer is correlated with the HAT1 expression level. A WSI is composed of cancerous tissue and noncancerous tissue. Both cancerous and normal tissues exhibit HAT1 expression at the transcript level (Figure 2A). The bulk-seq value denoting HAT1 expression cannot be separated into contributions from the cancerous and normal regions. Moreover, the tissue regions present on the H&E slides might differ from the tissues used for sequencing. Nevertheless, the cancerous region is at least the major contributor to the bulk HAT1 expression value. Therefore, we first sought to develop a segmentation tool to identify and quantify the pixels representing the cancerous tissue regions of interest. Then, we would test how the classification model deals with the noncancerous patches. The TCGA WSIs were sliced into 1024 × 1024 pixel tiles; slicing a WSI into smaller patches increases the processing speed. The color was normalized to remove the bias caused by differences in H&E staining (Figure 3A). Blank tiles were discarded.
Pathologists manually labeled the cancerous regions on the tiles (Figure 3B). A U-net-based segmentation model with five down-sampling steps and five up-sampling steps was created (Figure 3C). The model was trained until the loss plateaued and the segmented region matched the visual inspection. Muscular and connective tissues were effectively excluded from the esophageal squamous carcinoma region, which is identified in red (Figure 3D). Across all generated patches, the mean red-pixel ratio was larger than 80% (Figure 3E), which ensured the success of the subsequent classification task.
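The tiling, blank-tile filtering, and cancerous-pixel quantification steps described in this section can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the pipeline used in the study: the function names, the brightness threshold `background`, the tolerance `max_tissue_fraction`, and the choice to drop partial tiles at the slide edge are all hypothetical.

```python
def tile_origins(width, height, tile=1024):
    """Top-left (x, y) corners of non-overlapping tiles.

    Partial tiles at the right and bottom edges are dropped (an assumption).
    """
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


def is_blank(gray_tile, background=220, max_tissue_fraction=0.05):
    """True when almost every pixel is at least as bright as the background.

    gray_tile: 2D list of grayscale values in the range 0 to 255.
    """
    pixels = [value for row in gray_tile for value in row]
    tissue = sum(1 for value in pixels if value < background)
    return tissue / len(pixels) <= max_tissue_fraction


def cancer_pixel_ratio(mask):
    """Fraction of pixels that a binary segmentation mask marks as cancerous."""
    pixels = [value for row in mask for value in row]
    return sum(pixels) / len(pixels)


origins = tile_origins(2048, 3000)
ratio = cancer_pixel_ratio([[1, 1], [1, 0]])
```

A tile would then be kept for classification only if it is not blank, and its cancerous-pixel ratio gives the per-patch quantity that is averaged in Figure 3E.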

Deep Learning Indicates That HAT1 Instigates Esophageal Cancer Cell Morphology
Because esophageal adenocarcinoma and esophageal squamous carcinoma have different morphological identities, the models for HAT1 expression classification were trained separately for the two subtypes. Each 1024 × 1024 slide patch was divided and adjusted into 128 × 128 pixel tiles. Because the tiles are smaller, the algorithm is forced to focus on cell morphological differences rather than tissue structure morphological differences. The tiles were distributed into the HAT1 high group and the HAT1 low group based on patient transcriptomics and were then used for supervised deep learning with the Densenet121 architecture (Figure 3F); 70% of the tiles were used for training and 30% for testing. The model for HAT1 expression classification in esophageal squamous carcinoma was successfully obtained (Figure 3G). The model was trained for a total of 150 epochs, and the training was repeated five times with different randomized selections of the training cohort and testing cohort. After each training run, the loss had plateaued and the test accuracy was higher than 90%, whereas the validation cohort accuracy was higher than 80% (Figure 3H). After that, all tiles were retrospectively validated with the model with the best accuracy (Figure 4A, AUC = 0.98). Meanwhile, we visualized the HAT1 expression changes across whole images. Briefly, the color indicates the pseudo-HAT1 expression, defined as the probability of belonging to the HAT1 high group minus the probability of belonging to the HAT1 low group (Figure 4B).
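The pseudo-HAT1 expression defined above is a simple difference of the two class probabilities, and a slide-level map follows from placing each tile's score at its grid position. The sketch below illustrates this under assumptions: the function names, the two-logit classifier output, and the (row, column) tile indexing are hypothetical and are not taken from the study's code.

```python
import math


def softmax(logits):
    """Convert raw classifier outputs into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def pseudo_expression(p_high, p_low):
    """Pseudo-HAT1 expression: P(HAT1 high) minus P(HAT1 low), in [-1, 1]."""
    return p_high - p_low


def slide_heatmap(tile_probabilities, n_rows, n_cols):
    """Arrange per-tile scores by tile position; tiles without a score stay None.

    tile_probabilities: dict mapping (row, col) to (p_high, p_low).
    """
    grid = [[None] * n_cols for _ in range(n_rows)]
    for (row, col), (p_high, p_low) in tile_probabilities.items():
        grid[row][col] = pseudo_expression(p_high, p_low)
    return grid


# Hypothetical logits for two tiles of one slide.
probabilities = {(0, 0): tuple(softmax([2.0, 0.0])), (1, 1): tuple(softmax([0.0, 0.0]))}
heatmap = slide_heatmap(probabilities, n_rows=2, n_cols=2)
```

A score near 1 marks a tile confidently assigned to the HAT1 high group, a score near -1 marks a confident HAT1 low tile, and a score near 0 marks an ambiguous tile.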
Next, we validated the compatibility and accuracy of the above method with the TMA (tumor microarray) images obtained by two different microscopes (Figure 4C,D). The predicted HAT1 expression classification probabilities of each tile from a single slide were aggregated to create the heatmap, which was visualized according to the tiles' original positions on the slide. We found that our model can directly visualize the HAT1 expression on different pathological regions with morphological information (Figure S2, Supporting Information). Taken together, the model prediction was consistent with the immunohistochemistry (IHC) staining results.
The successful training indicated that there were morphological differences between the HAT1 high group and HAT1 low group tiles. However, whether those differences can be interpreted as human-understandable features still needed to be explored. Therefore, we asked where the computer attention lies and what identities the neural network is looking for. The gradient-weighted class activation mapping (Grad-CAM) method was used to determine which histopathological features were most important for our algorithm in identifying the HAT1 expression level. By examining the weight heatmap on various types of patches, it is possible to conclude that attention to the nucleus has risen (Figure 4E). However, Grad-CAM can only indicate the computer vision attention points but cannot tell what the human-understandable difference is. LIME indicated that both the nucleus and the extracellular matrix region were labeled as related to the prediction. Nagarajan et al. had already observed the tissue organization-level difference in HAT1-knockdown mice. [20] Therefore, the study of the nucleus changes falls into our current study scope. Nevertheless, the prediction probabilities may provide clues for interpreting the morphological difference.
Therefore, we randomly sampled 500 tiles from the HAT1 high group and 800 tiles from the HAT1 low group for exploration (Figure 4F). In both groups, false-positive predictions and true predictions are presented. By analyzing the prediction probability against the image changes, especially those of the nuclei, we found that true predictions of the HAT1 high group are characterized by clear and smaller nuclei, and true predictions of the HAT1 low group are characterized by vague and larger nuclei. The tSNE reduction of the identities of each tile demonstrates Densenet's ability to differentiate tiles with varying levels of cancerous regions and other tissue types (Figure 4G). The incorrectly predicted tiles are often tiles containing a high percentage of muscle or other noncancerous tissue.

HAT1 Regulates Nuclear Size in Esophageal Cancer Cells
To test whether the HAT1 expression level is correlated with nuclear size, we knocked down HAT1 in the esophageal cancer cell line KYSE30 using lentiviral shRNA. Immunoblotting and immunofluorescence were used to validate the HAT1 protein levels in HAT1-knockdown KYSE30 cells (Figure 5A). The nuclear area of KYSE30 cells grown on coverslips was measured with ImageJ according to the DAPI staining. As predicted, the HAT1-knockdown KYSE30 cells displayed significantly larger nuclei than control cells (Figure 5B). Furthermore, we generated a 3D tumor sphere system to mimic in vivo scenarios. [21] The tumor spheres formed after two days in a 3D culture assay (Figure 5C). H&E staining revealed that the cells in HAT1-knockdown KYSE30 tumor spheres have a larger nuclear size (Figure 5D), as did examination by confocal microscopy (Figure 5E), which is consistent with our deep learning findings and a previous study. [22] Thus, the HAT1-mediated nuclear size difference was validated with experimental approaches in esophageal cancer cells.
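The nuclear area measurement above was performed in ImageJ. As a rough illustration of what such a measurement computes, the sketch below counts the pixels of each labeled nucleus in a segmented DAPI image and converts the count to an area. It assumes that a label mask is already available; the function names, the toy mask, and the pixel size of 0.5 µm are hypothetical and do not come from the study.

```python
from collections import Counter


def nuclear_areas(label_mask, pixel_size_um):
    """Area of each labeled nucleus in square micrometres.

    label_mask: 2D list where 0 is background and each positive integer
    identifies one nucleus.
    pixel_size_um: edge length of one pixel in micrometres.
    """
    counts = Counter(label for row in label_mask for label in row if label != 0)
    pixel_area = pixel_size_um ** 2
    return {label: n_pixels * pixel_area for label, n_pixels in counts.items()}


def mean_area(areas):
    """Mean nuclear area over all nuclei in one image."""
    return sum(areas.values()) / len(areas)


# Toy label mask with two nuclei (synthetic, for illustration only).
mask = [
    [0, 1, 1],
    [2, 2, 2],
    [0, 0, 2],
]
areas = nuclear_areas(mask, pixel_size_um=0.5)
```

Per-nucleus areas obtained this way for control and knockdown cells can then be compared with a suitable statistical test.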

HAT1 Regulates Cell Senescence and H3K27 Acetylation
Cell senescence is widely recognized as a fundamental process in aging. [23] Importantly, the enlarged nuclei in HAT1-knockdown esophageal cancer cells prompted us to explore the relationship between HAT1 and cell senescence. Therefore, we performed an RNA-seq analysis using three different shRNAs targeting the HAT1 gene (Figure 6A-C). We used the cellular senescence markers (ID: hsa04218) obtained from the KEGG database to perform the gene set enrichment analysis. As predicted, the cell senescence ontology displayed a significant correlation with reduced HAT1 expression (Figure 6D). Consistently, P21, a senescence marker, was found to be upregulated in HAT1-knockdown KYSE30 cells (Figure 6C). Senescent cells are characterized by cell cycle arrest, a senescence-related secretory phenotype, and dysregulation of metabolism. P21 can inhibit the activation of the E2F family, leading to irreversible cell cycle arrest. [24] Intriguingly, we found that E2F7 was downregulated in HAT1-knockdown KYSE30 cells (Figure 6C, adjusted p < 0.05, FC > 1.5). Quantitative reverse transcription polymerase chain reaction (qRT-PCR) was used to confirm the differential expression of E2F7 (Figure 6E, Welch's t-test, p < 0.05). In addition, re-expression of an shRNA-resistant HAT1 rescued E2F7 expression in HAT1-knockdown KYSE30 cells (Figure S3A,B, Supporting Information). Interestingly, the motif enrichment analysis identified that E2F transcription factors prefer to interact with HAT1 (Figure 6F). Next, we used flow cytometry and a clone formation assay to validate the impact of HAT1 on the cell cycle and cell proliferation. The flow cytometry analysis revealed that HAT1-knockdown KYSE30 cells were arrested at the G2/M phase (Figure 6G). Furthermore, the clone formation assay showed reduced proliferation in HAT1-knockdown KYSE30 cells (Figure 6H).
In addition, gene set enrichment analysis (GSEA) was used to investigate the functional changes in the HAT1-knockdown KYSE30 cells. We found that the cell cycle pathway was enriched (Figure 6I,J). Immunoblots confirmed the differential expression of CyclinB1 in HAT1-knockdown cells (Figure 6K). HAT1 was identified as a histone acetyltransferase. [25] Consistent with the previous study, [23] we found that the histone H3 protein levels were downregulated in HAT1-knockdown KYSE30 cells (Figure 6K). Because it is unclear which residue of histone H3 is modified by HAT1, we analyzed a public ChIP-seq data set to elucidate the mechanism of HAT1-mediated acetylation changes on histones. The data set includes HAT1 ChIP-seq, H4K5ac ChIP-seq, and H4K12ac ChIP-seq in the control sample and the HAT1-knockdown sample. The colocalization analysis of HAT1 with H4K5ac or H4K12ac did not yield any significant enrichment (Figure 6L). The acetylation level on histone H3 in the human esophageal squamous epithelium was analyzed with data from the ENCODE project (H3K18ac: ENCSR834WQD, H3K27ac: ENCSR318ANI, H3K9ac: ENCSR900TZD). The peaks were visualized with the Integrative Genomics Viewer. The results indicated possible H3K9 and H3K27 acetylation on the E2F7 promoter (Figure 6M). Therefore, we analyzed the H3K27ac, H3K18ac, and H3K9ac levels in the control and HAT1-knockdown KYSE30 cells by immunoblots. We found that only the H3K27ac levels were reduced in HAT1-knockdown cells when normalized to total histone H3 protein (Figure 6N). Thus, these results indicate that HAT1 might regulate E2F7 expression by promoting the acetylation of H3K27 at the promoter region of E2F7.

Discussion
In our study, we used CNN models to quantify cancerous regions and predict HAT1 expression in esophageal cancer based on H&E images. Although pathological information derived from H&E-stained slides provides strong evidence for ESCC grading, it cannot link morphological features with genetic information. Using deep learning methods, our study revealed that HAT1 functions in regulating esophageal squamous cancer cell shape. The prediction accuracy of the CNN model was measured both by testing the match between predicted and actual labels and by the agreement of the pseudo-gene expression with IHC results. The CNN model attention was visualized, and human-interpretable tumor features on the nucleus were hypothesized. The CNN model attention visualization and prediction probability score analysis suggest that the CNN models use nuclear size as a human-interpretable tumor feature. Finally, the nuclear difference was validated with transcriptomics analysis and 3D tumor spheres by knocking down HAT1 in esophageal cancer cells. Thus, the HAT1 gene was identified as an aging marker in the human esophagus and a morphological regulator of esophageal squamous cell carcinoma. To the best of our knowledge, this is the first study that successfully validates a CNN model-oriented hypothesis by using biological evidence. We translate clinically obtained medical images and transcriptomics into potential scientific advances. The biomarkers we found may help with clinical usage. The CNN model is capable of connecting phenotype and genotype, and the methodology we developed may pave the way for further genetic research.
In this research, we present classic but robust segmentation and classification models to estimate HAT1 expression from H&E-stained slides, which is similar to previous medical CNN models.
It has been appreciated that CNNs can be used as a diagnostic assistant tool, [26] with the goal of increasing diagnostic accuracy while decreasing the doctor's workload. [27] However, the biological insights inspired by deep learning results are still limited. Previous studies have left the classification details of the CNN model unexplained, and an unexplained model cannot enter the clinical standardized operation procedure. Commonly, a CNN model recognized as highly interpretable is tested by comparing the human understanding of model decisions with the prediction results and the attention of the model. [28] However, the correctness of the model interpretation is then assessed based on human understanding of the disease. Meanwhile, there is no comprehensive way to describe the general logic of the CNN model accurately. The interpretation can only reach local fidelity, which means that the crucial features at the global level are not necessarily important for prediction in local patches or representative situations, and vice versa. Though the CNN model cannot currently be fully explained, computer vision attention can be explained in part using gradient-weighted tools and LIME. The Grad-CAM algorithm was designed to highlight the details that are critical to the prediction results, and it is a significant step forward in improving the interpretability of CNN models. [29] Meanwhile, LIME works by perturbing the input images with superpixel masking and identifies the features that have the greatest impact on the prediction results. Here, LIME should be better at analyzing images with repetitive patterns, whereas Grad-CAM tends to highlight a wide range of pixels. Furthermore, the internal logic of the CNN can be partially understood by analyzing the wrongly predicted results in addition to the correctly predicted ones.
The interpretation of Grad-CAM and LIME results, on the other hand, is generally based on visual inspection. As a result, the hypothesized results must be validated through biological experiments. Here, we hypothesized that cells with lower HAT1 expression would have larger nuclear sizes based on a visual inspection of the Grad-CAM-generated heatmap and the prediction scores. Larger nuclei were observed in the HAT1-knockdown tumor spheres, and the cellular mechanism responsible for the larger nuclei is likely cell senescence. TCGA samples have validated histological and molecular labels identified by many experts and clinicians. The TMA test set labels were also a ground truth validated by IHC. However, as tile labels were assigned according to their corresponding slide labels, within-slide heterogeneity would lead to divergence between label and ground truth. Therefore, a tile's label may not match its true status. The performance could be further improved if more detailed annotations existed on the slides. Bulk RNA-seq usually uses mixed samples of cancerous tissue and adjacent normal tissue, which may affect the interpretation of gene expression, but the classification integrity is preserved. From the visualization results, we noticed that our models were more likely to give nontumor tissue tiles ambiguous prediction scores. With the development of spatial profiling, spatial transcriptomics paired with images of pathological regions of interest may serve as an important means for relevant CNN model development.
CNNs have been used in various classification and segmentation tasks and have exhibited superior performance. However, interpretability has always been a great obstacle for CNN development. Currently, approaches that represent the chaotic convolutional layers with graphical or symbolic logic are emerging to interpret CNNs. Moreover, based on natural language processing, DALL·E 2 can create images from a text description. Therefore, combining natural language processing and histological image analysis might further advance the study of cancer pathology.
Our study indicates that nuclear size is a promising marker that can be used for deep learning model training. Apart from our results, deep learning has been used to study cell senescence-related nuclear size changes in fibroblasts. [13] That study found that deeply senescent human fibroblasts displayed diminished DNA damage foci but retained the checkpoint capacity to respond to oxidative stress. However, in esophageal cancer cells, it seems that the downregulation of senescence-related markers will lead to an increase in DNA damage and break the checkpoint capacity. [30] Excluding irrelevant nontumor tissue, such as muscular tissue, may significantly enhance the overall performance of CNN models for genomic mutation classification, since noncancerous tissue does not contribute to the mutation status. However, at the bulk-seq level, the expression of a single gene can be contributed by multiple factors, including normal tissue. Therefore, integrating single-cell transcriptomics from the same biopsy or region to develop a linear or nonlinear relationship between gene expression and the cancerous tissue ratio may improve the accuracy of our model.
In summary, we developed a deep learning-based system to predict gene expression status by integrating transcriptomics data and H&E-stained histopathological images. Our analysis using machine learning and biochemical studies identified HAT1 as an aging-associated gene and cell senescence regulator. We showed that knocking down HAT1 expression causes cell cycle arrest, nuclear morphological change, and cell senescence. Our study successfully validates a CNN model-oriented hypothesis by using biological evidence. Our workflows suggest the feasibility and benefits of using histopathological artificial intelligence assistance systems in routine practice scenarios and biological study.

Experimental Section
Patient and Mouse Esophagus Samples: The acquisition and analysis of patient samples were authorized by The Second Affiliated Hospital, School of Medicine, Zhejiang University (SAHZU). All 23 participating patients provided written informed consent for tumor collection and analysis. ESCC-diagnosed patients in the Department of Thoracic were queried from the clinical information database. A retrospective examination of their available clinical data, including patient age, clinical stage, treatment modalities, and patient responses, was performed. Immunotherapy-treated patients were excluded from the study. For pathological validation, the corresponding CT and endoscopic pictures were provided. All tumor samples were collected via surgical excision and embedded in paraffin. The preparation of tissue slides was performed as stated below. C57/BL6 mice were housed in specific pathogen-free conditions with a 12 h light/dark cycle and free access to food and water at the Zhejiang University Laboratory Animal Center. The esophagi were dissected from mice at the ages of 1, 10, and 20 days and 6 and 18 months. The mouse esophagi were rinsed with PBS, fixed with 4% paraformaldehyde (PFA) at 4°C for 8 h, and then embedded in paraffin. The paraffin blocks were sectioned at a thickness of 5 μm. All mouse experiments were approved by the Institutional Animal Care and Use Committee of Zhejiang University.
Tissue Microarray Information and Possession: The tissue microarray of esophageal cancer was purchased from Kede Biotech Co., Ltd (Guilin, China). The Kede Biotech Co., Ltd declared the tumor collection and microarray assembly to be ethically sound. On the microarray, 5 normal samples and 97 tumor samples from patients with ESCC were displayed. Each patient's pathological stage and TNM categorization were provided by Kede Biotech Co., Ltd.
Public Database Resource Acquisition and Age-Related Gene Selection: The RNA-seq data are in Fragments Per Kilobase of exon model per Million mapped fragments (FPKM) format and correspond to the whole tissue slide (WSI). To identify genes associated with aging, the least absolute shrinkage and selection operator (LASSO) was used. [31] Occurrence age was defined as the number of days between a patient's birth and diagnosis. Log transformation was applied to the gene expression value (log2(FPKM + 1)). LASSO regression was carried out using the glmnet package. All genes chosen by LASSO regression were tested for differential expression on two external expression data sets derived from ESCC patients (GSE45670, GSE20347). Normalized and preprocessed data were retrieved from the GEO database (http://www.ncbi.nih.gov/geo). To normalize the data sets, the preprocessCore package was utilized. The batch effect was eliminated using the limma package's removeBatchEffect function. A rank-sum test was used to evaluate statistical differences. Correlations were evaluated using Pearson's correlation coefficient. A p-value below 0.05 was regarded as statistically significant. Unless otherwise specified, all statistical analyses were performed using R software. Images of esophageal cancer patients were obtained from The Cancer Genome Atlas (TCGA) official website (https://portal.gdc.cancer.gov/) using its official downloading software GDC-client (https://gdc.cancer.gov/access-data/gdc-data).
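The gene selection step above can be illustrated with a minimal Python sketch. It applies the log2(FPKM + 1) transform and then fits a LASSO model by cyclic coordinate descent with soft thresholding. This is not the glmnet implementation used in the study: the gene names, FPKM values, ages, and penalty value are invented toy inputs, and all function names are hypothetical.

```python
import math

def log2_fpkm(value):
    """Apply the log2(FPKM + 1) transform to a single expression value."""
    return math.log2(value + 1.0)

def standardize(values):
    """Center to zero mean and scale to unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in values]

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_coordinate_descent(columns, y, lam, n_iter=100):
    """Minimize (1/2n)*||y - X b||^2 + lam*||b||_1 for standardized columns."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]      # the intercept absorbs the mean
    beta = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Correlation of gene j with the partial residual (gene j excluded).
            rho = sum(c * r for c, r in zip(col, residual)) / n + beta[j]
            new_beta = soft_threshold(rho, lam)
            delta = new_beta - beta[j]
            if delta != 0.0:
                residual = [r - c * delta for r, c in zip(residual, col)]
                beta[j] = new_beta
    return beta

# Toy input (assumed values, not study data): FPKM per gene across six samples.
fpkm = {
    "GENE_A": [1, 3, 7, 15, 31, 63],   # rises steadily with age
    "GENE_B": [1, 3, 7, 7, 3, 1],      # unrelated to age
}
age = [30, 40, 50, 60, 70, 80]         # toy ages; the study measured age in days

genes = sorted(fpkm)
columns = [standardize([log2_fpkm(v) for v in fpkm[g]]) for g in genes]
beta = lasso_coordinate_descent(columns, age, lam=1.0)
selected = [g for g, b in zip(genes, beta) if b != 0.0]
print(selected)
```

Genes whose coefficients are shrunk exactly to zero are dropped, which is what makes LASSO usable as a feature selection step. In practice the penalty would be chosen by cross-validation rather than fixed by hand.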
Segmentation Model Construction: The segmentation model is modified from U-net. The standard U-net is a state-of-the-art convolutional network for biomedical image segmentation. Its distinctive U-shaped structure and skip connections make it appropriate for medical image analysis, as its structure is generally stable and the majority of its properties contribute to prediction. Based on these benefits, we performed more down-sampling and up-sampling, which may enable the network to be more efficient and capture more features. In our model, we performed both down-sampling and up-sampling five times; other parameters were determined according to U-net. [32] The tissue slides were meticulously labeled and independently evaluated by two clinical specialists. The architecture was constructed utilizing the Torch backbone and Python 3.6. The model was trained at the High-Performance Computing Facility at the Zhejiang University Life Sciences Institute.
Classification Model Implementation: Densenet121 is a convolutional network architecture that requires fewer parameters and allows feature reuse. [33] In tissue slide analysis, the parameters and features may be limited compared with natural photos. Therefore, we implemented this compact model for tissue slide patch classification. The architecture was built based on the TensorFlow backbone and Python 3.6. The model was trained at the High-Performance Computing Facility at the Zhejiang University Life Sciences Institute. Densenet121's attention to representative slides was visualized with Grad-CAM. [34] Briefly, this method uses the gradients of any target concept flowing into the final convolutional layer to generate a heat map highlighting the regions of the image that contribute most to the prediction. To assess the weight of the features in our trained model, we employed Local Interpretable Model-agnostic Explanations (LIME). [35] Histological features in an image are generally composed of one or several contiguous pixel blocks. A so-called superpixel is an irregular pixel block with a certain visual significance that is composed of such histological features. LIME randomly retains some superpixels, hides the others, and finds the superpixels that have the greatest impact on the classification results.
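The Grad-CAM computation described above can be sketched in a few lines of Python. The sketch assumes that the feature maps of the final convolutional layer and the gradients of the class score with respect to them have already been extracted from the trained network; the function name and the 2 x 2 toy arrays are hypothetical and serve only to illustrate the arithmetic.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM heat map from K feature maps of size H x W.

    activations[k][i][j]: feature map k of the final convolutional layer.
    gradients[k][i][j]:   gradient of the class score w.r.t. that feature map.
    Returns an H x W heat map scaled to the range [0, 1].
    """
    k = len(activations)
    h = len(activations[0])
    w = len(activations[0][0])
    # Channel weights: global average pooling of the gradients.
    alphas = [sum(sum(row) for row in g) / (h * w) for g in gradients]
    # Weighted sum of the feature maps followed by ReLU.
    cam = [
        [max(0.0, sum(alphas[c] * activations[c][i][j] for c in range(k)))
         for j in range(w)]
        for i in range(h)
    ]
    # Scale so that the strongest location equals 1 (skip if all zeros).
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam

# Toy input (assumed values): two channels on a 2 x 2 grid.
activations = [
    [[1.0, 0.0], [0.0, 0.0]],   # channel 0 fires at the top-left location
    [[0.0, 0.0], [0.0, 2.0]],   # channel 1 fires at the bottom-right location
]
gradients = [
    [[1.0, 1.0], [1.0, 1.0]],       # channel 0 supports the class
    [[-0.5, -0.5], [-0.5, -0.5]],   # channel 1 argues against the class
]
heatmap = grad_cam(activations, gradients)
print(heatmap)
```

The ReLU keeps only locations with a positive influence on the class of interest, so the bottom-right location, which is driven by a negatively weighted channel, is suppressed. For overlay on a tissue patch, the coarse heat map would be upsampled to the input resolution.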
Cell Culture and shRNA-Mediated HAT1 Knockdown: KYSE-30, a human ESCC cell line, was obtained from MeissenCTCC Co., Ltd (Hangzhou, China) and cultured in RPMI 1640 medium supplemented with 10% fetal bovine serum (Gibco) and 1% penicillin/streptomycin. The cultures were kept at 37°C with 5% CO2. For 3D cell culture, a Tapered Stencil for Cluster Culture (TASCL, Cymss-bio Co., Ltd, Aichi, Japan) device was employed to create esophageal tumor spheres in vitro according to the manufacturer's instructions. [36] Oligos encoding shRNA targeting HAT1 were inserted into a GFP-tagged pLKO.5 lentiviral vector. All plasmids were validated by sequencing. Table S1, Supporting Information, contains the shRNA sequences.
Immunofluorescence Staining and Immunohistochemistry: The KYSE-30 cells were grown on sterile glass coverslips overnight for immunofluorescence staining. The coverslips were washed with PBS and fixed with 4% PFA. The cells were permeabilized with 0.3% Triton X-100 and blocked with 1% BSA in PBST. A citrate-based solution was used for antigen retrieval. Table S2, Supporting Information, lists the primary and secondary antibodies used for staining, as well as their dilutions.
SDS-PAGE and Immunoblot Analysis: Sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) was used to separate equal amounts of protein loaded in each well. After blocking in 5% milk or 3% BSA, the membrane was incubated with the primary antibody overnight at 4°C. After washing, the membrane was incubated for 2 h at room temperature with the HRP-conjugated secondary antibody. Table S2, Supporting Information, lists the primary and secondary antibodies used for immunoblotting, as well as their dilutions.
Microscopy Imaging and Whole Tissue Slides Screening: The H&E-stained tissue slides were scanned with a tissue section pathological scanner at the Zhejiang University Life Sciences Institute Core Facility. The fluorescent images of the cells were taken using an STED confocal microscope. A Nikon microscope (Nikon, Japan) was used to capture the tissue immunofluorescence images. ImageJ was used to process the images.
Colony-Formation Assay: The KYSE-30 cells were infected with lentiviral shRNAs targeting HAT1. Following two weeks of cell culture, the surviving colonies were fixed with methanol for 15 min and stained with 0.1% crystal violet for 20 min. ImageJ was used to process the images.
Flow Cytometric Analysis for Cell Cycle: KYSE-30 cells were infected with lentiviral shRNA and analyzed 5 days after infection. The successful HAT1 knockdown in KYSE-30 cells was verified by immunoblot. EdU incorporation for the cell proliferation assay was performed using an EdU kit (BeyoClick EdU Cell Proliferation Kit with Alexa Fluor 488, Beyotime, China). Briefly, cells were incubated with EdU for 3 h, fixed with 4% PFA for 15 min, and permeabilized with 0.3% Triton X-100 for 5 min. The cells were incubated with the Click Reaction Mixture for 30 min at room temperature in the dark and then incubated with DAPI for 10 min. Cells were quantified with a flow cytometer (Beckman CytoFlex S).
RNA Extraction and Quantitative RT-PCR: Total RNA was isolated from cultured cells using the TRIzol Reagent (Invitrogen, USA) according to the manufacturer's protocol. Two micrograms of total RNA were reverse transcribed using the Reverse Transcript Kit (Vazyme, China). Quantitative PCR was performed using SYBR Green 2X PCR Master Mix (Vazyme, China) on a Bio-Rad quantitative polymerase chain reaction (qPCR) system (Bio-Rad, USA). Threshold cycle (Ct) values for target genes were normalized to glyceraldehyde-3-phosphate dehydrogenase (GAPDH) as an endogenous control. The primer sequences are listed in Table S1, Supporting Information.
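As an illustration of normalizing Ct values to GAPDH, the following Python sketch computes relative expression with the widely used 2^(-ΔΔCt) calculation. The text above states only that Ct values were normalized to GAPDH, so the use of this particular formula is an assumption, as is the implied amplification efficiency of about 100%. The function name and all Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_gapdh_sample,
                        ct_target_control, ct_gapdh_control):
    """Relative expression of a target gene by the 2^(-ddCt) calculation.

    Each Ct is first normalized to GAPDH within its own condition (dCt),
    then the knockdown sample is compared with the control (ddCt).
    """
    delta_ct_sample = ct_target_sample - ct_gapdh_sample
    delta_ct_control = ct_target_control - ct_gapdh_control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)

# Toy Ct values (assumed, not measured data).
fold_change = relative_expression(
    ct_target_sample=26.0,   # target gene in knockdown cells
    ct_gapdh_sample=18.0,    # GAPDH in knockdown cells
    ct_target_control=24.0,  # target gene in control cells
    ct_gapdh_control=18.0,   # GAPDH in control cells
)
print(fold_change)  # 0.25
```

A target gene that crosses the threshold two cycles later than in the control, at unchanged GAPDH, corresponds to a fourfold reduction in relative expression.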
RNA-Sequencing and Data Analysis: FastQC-0.11.8 was used to assess the quality of the raw FASTQ sequence files. Trimmomatic-0.38 was used to trim the raw data sets because no predetermined adaptor information was provided. To cut the known Illumina adaptors, the ILLUMINACLIP function was used. Low-quality reads were filtered out using the SLIDINGWINDOW option, which examined each read in windows of five bases and removed bases with a quality below 20. Another two options were used to remove repeated "N" at the start or end of the sequence. The hisat2 program was used to align the reads to the reference genome exon by exon. The SAM files generated by hisat2 were then converted to sorted BAM files using samtools. Finally, the counts were output by htseq. To normalize the expression and identify differentially expressed genes, DESeq2 was used (fold change > 2, Padj < 0.05). For visualization, clusterProfiler [37] and ggplot2 were used. GSEA was carried out with GSEA 2-2.2.4 from the Broad Institute. [37] All processes were completed using CentOS 7 on the Zhejiang University Life Sciences Institute High-Performance Computing Facility and RStudio, unless otherwise stated.
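The sliding-window quality trimming described above can be sketched as follows. This is a simplified illustration in the spirit of Trimmomatic's SLIDINGWINDOW step with a five-base window and a quality threshold of 20, not a reimplementation of Trimmomatic, whose handling of edge cases differs. The Phred+33 encoding, the function names, and the toy read are assumptions.

```python
def phred33(quality_string):
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - 33 for ch in quality_string]

def sliding_window_trim(sequence, qualities, window=5, min_mean_quality=20):
    """Cut the read at the first window whose mean quality is too low.

    The read is scanned from the 5' end; once the mean quality of a window
    drops below the threshold, the read is truncated at the window start.
    Reads shorter than the window are returned unchanged (a simplification).
    """
    for start in range(len(sequence) - window + 1):
        window_quals = qualities[start:start + window]
        if sum(window_quals) / window < min_mean_quality:
            return sequence[:start], qualities[:start]
    return sequence, qualities

# Toy read (assumed): six bases at Q30 ('?') followed by four at Q10 ('+').
read = "ACGTACGTAC"
quals = phred33("??????++++")
trimmed_seq, trimmed_quals = sliding_window_trim(read, quals)
print(trimmed_seq)  # ACGT
```

Averaging over a window tolerates an isolated low-quality base while still removing the sustained quality drop that is typical of the 3' end of Illumina reads.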
ChIP-seq Analysis: The ChIP-seq data were obtained from Gruber et al. (GSE117472). [38] ChIP-seq analysis was performed as previously described. [39] MACS2 was used to call peaks in each file against the Input file to identify the true peaks, eliminating false positives caused by random antibody binding. The overlapping peaks were identified using bedtools-2.28.0. The data were then visualized using the R package ChIPseeker. The ChIPseeker package's annotatePeak function annotated the closest genes to peaks as well as the location of peaks relative to genes. The annotations for hg39/mm10 were obtained from UCSC by the "GenomicFeatures" package. The HOMER findMotifsGenome program was used to analyze all narrow peak files. All possible motifs and transcription factors were found in the union set of HOMER de novo motif results and known motif results. All possible false positives were eliminated.
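The identification of overlapping peaks can be illustrated with a small Python sketch that intersects two peak lists, assuming BED-style half-open coordinates (chromosome, start, end). It mimics the idea of an interval intersection and is not the bedtools implementation; the function names and peak coordinates are invented for illustration.

```python
def peaks_overlap(peak_a, peak_b):
    """True if two (chrom, start, end) half-open intervals share a base."""
    return (peak_a[0] == peak_b[0]
            and peak_a[1] < peak_b[2]
            and peak_b[1] < peak_a[2])

def intersect_peaks(peaks_a, peaks_b):
    """Return the shared portion of every overlapping pair of peaks."""
    shared = []
    for a in peaks_a:
        for b in peaks_b:
            if peaks_overlap(a, b):
                shared.append((a[0], max(a[1], b[1]), min(a[2], b[2])))
    return shared

# Toy peak sets (assumed coordinates, not study data).
replicate_1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
replicate_2 = [("chr1", 150, 250), ("chr1", 600, 700), ("chr2", 50, 120)]

common = intersect_peaks(replicate_1, replicate_2)
print(common)  # [('chr1', 150, 200), ('chr2', 100, 120)]
```

With half-open coordinates, the peaks chr1:500-600 and chr1:600-700 touch but do not share a base, so they are not reported. The pairwise loop is quadratic and only suitable for small examples; genome-scale peak sets call for sorted sweeps or interval trees.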
Statistics and Reproducibility: No statistical method was used to predetermine the sample size. All comparisons between groups of samples were stated in each experiment accordingly. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.