Biomarkers of connective tissue disease‐associated interstitial lung disease in bronchoalveolar lavage fluid: A label‐free mass spectrometry‐based relative quantification study

Abstract Background The pathogenesis of connective tissue disease‐associated interstitial lung disease (CTD‐ILD) is unclear. This study aims to identify differentially expressed proteins (DEPs) in CTD‐ILD, to explore the roles these DEPs may play in the pathogenesis of CTD‐ILD, and to identify potential therapeutic targets. Methods Bronchoalveolar lavage fluid (BALF) samples were collected from four patients with CTD‐ILD and four patients without CTD‐ILD. Label‐free mass spectrometry‐based relative quantification was used to identify the DEPs, and bioinformatics analyses were used to determine the biological processes and signaling pathways potentially associated with them. Results We found 65 upregulated DEPs, including SFTPD, CADM1, ACSL4, TSTD1, CD163, LUM, SIGLEC1, CPB2, TGFBI and HGD, and 67 downregulated DEPs, including SGSH, WIPF1, SIL1, RAB20, OAS3, GMPR2, PLBD1, DNAJC3, RNASET2 and OAS2. GO functional annotation showed that the DEPs were mainly enriched in the binding, cellular anatomical entity, cellular process, and biological regulation GO terms. KEGG analyses showed that the pathways most frequently annotated with the DEPs were complement and coagulation cascades, metabolic pathways, pathways in cancer, and the PPAR signaling pathway. COG analyses further characterized the functions of these DEPs, which were concentrated mainly in signal transduction mechanisms; posttranslational modification, protein turnover, chaperones; intracellular trafficking, secretion, and vesicular transport; amino acid transport and metabolism; and lipid transport and metabolism. Conclusions The DEPs identified between patients with vs. without CTD‐ILD may play important roles in the development of CTD‐ILD and are potential new biomarkers for its early diagnosis.


| INTRODUCTION
Interstitial lung disease (ILD) is the most common complication associated with connective tissue diseases (CTDs), and most affected patients have poor outcomes. Nearly 15% of patients with CTD develop secondary ILD, and approximately 15% of patients with ILD have an underlying CTD. 1 The CTDs that most commonly involve the lung and are associated with ILD include scleroderma, mixed CTD, systemic lupus erythematosus, dermatomyositis/polymyositis, Sjögren's syndrome, and rheumatoid arthritis. The median survival time for patients with CTD-associated ILD (CTD-ILD) is approximately 6.5 years, and the rate of death from ILD among individuals with CTD is 123.6 per 1000 person-years. 2,3 CTD-ILD usually presents with insidious, progressive dyspnea and intermittent cough, and its symptoms are often obscured by extrapulmonary manifestations such as arthritis or muscle weakness. Without effective treatment, patients can progress to respiratory failure.
The etiology of CTD-ILD is unclear. One current hypothesis is that environmental pathogens trigger pathogenic inflammation: they may cause inflammatory cells to infiltrate the interstitial and alveolar spaces, eventually damaging the alveolar epithelium. The extent to which lung structure and function eventually recover is likely determined by two factors: the intensity of this process and the degree of disruption of the normal lung extracellular matrix, especially the layers that define the alveolar structure. 4,5 Another hypothesis is that, in some CTD subtypes, lung injury causes local inflammation and induces autoantigen expression, which leads to the production of autoantibodies in the lung. The subsequent binding of disease-related autoantibodies to these autoantigens may then perpetuate the process, resulting in lung inflammation and fibrosis. 6 There is therefore an urgent need to determine the underlying pathological mechanisms and to find new therapeutic targets for CTD-ILD.
Next-generation transcriptome sequencing and highly sensitive mass spectrometry (MS) instrumentation have advanced in recent years, improving both genomic and proteomic technologies. As a result, studies are increasingly reevaluating correlations between disease states and gene or protein expression using newly generated data sets. In proteomics, analysis of protein mixtures by liquid chromatography-tandem mass spectrometry (LC-MS/MS), sometimes called shotgun proteomics, remains the first choice for large-scale protein identification.
For protein quantification, label-free MS-based relative quantification is becoming an increasingly favored alternative to label-based methods. 7 The two major acquisition approaches used for label-free relative quantification are data-dependent acquisition (DDA) and data-independent acquisition (DIA).
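This section does not say which label-free quantification algorithm was used. As a minimal illustration of the intensity-based idea only (not the authors' pipeline, and far simpler than published algorithms such as MaxLFQ), the Python sketch below normalizes each sample to its summed protein intensity and then takes the ratio of group means. The protein names and intensities are hypothetical.

```python
from statistics import mean


def normalize_total_intensity(sample):
    """Divide each protein intensity by the sample's summed intensity."""
    total = sum(sample.values())
    return {protein: value / total for protein, value in sample.items()}


def fold_change(case_samples, control_samples, protein):
    """Mean normalized intensity in cases divided by that in controls."""
    case_vals = [normalize_total_intensity(s)[protein] for s in case_samples]
    ctrl_vals = [normalize_total_intensity(s)[protein] for s in control_samples]
    return mean(case_vals) / mean(ctrl_vals)


# Hypothetical intensities; the second sample in each group has twice the loading.
case = [{"P1": 300.0, "P2": 100.0}, {"P1": 600.0, "P2": 200.0}]
control = [{"P1": 100.0, "P2": 100.0}, {"P1": 200.0, "P2": 200.0}]

fc_p1 = fold_change(case, control, "P1")
fc_p2 = fold_change(case, control, "P2")
```

Because each sample is scaled by its own total, the doubled loading in the second sample of each group does not change the result: P1 comes out 1.5-fold higher in cases and P2 2-fold lower.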
Bronchoalveolar lavage fluid (BALF) is obtained from the lungs by bronchoscopy. Its biochemical components are mainly proteins and phospholipids, followed by nucleic acids (e.g., mRNA, DNA, and miRNA). Because these components mirror the pathophysiological state of the patient, BALF is considered a rich source of biomarkers, some of which are already established in clinical applications. 8 In the present study, we used label-free LC-MS/MS analyses to identify the proteins in BALF specimens obtained from patients with CTD-ILD and from control patients with community-acquired pneumonia (CAP), and to compare their relative levels. Using bioinformatics analyses, we aimed to identify potential biomarkers of CTD-ILD and the key signaling pathways involved in its development.

| Experimental design
The study design is shown in the flowchart in Figure 1.

| Patients and diagnoses
From August to September 2020, BALF samples were collected from eight patients admitted to the Second Affiliated Hospital of Anhui Medical University, none of whom had received hormone or immunosuppressant treatment. Samples were collected from four patients with CTD-ILD (experimental group) after diagnosis but before hormone or immunosuppressant treatment, and from four patients with CAP but without immune system disease (control group; samples were obtained from the unaffected lung). The clinical diagnosis and selected sociodemographic characteristics of each of the eight patients are given in Table 1.
FIGURE 1 Flowchart of the experimental design. BALF, bronchoalveolar lavage fluid; CTD-ILD, connective tissue disease-associated interstitial lung disease; COG, Clusters of Orthologous Groups; GO, Gene Ontology; KEGG, Kyoto Encyclopedia of Genes and Genomes; LC, liquid chromatography; MS/MS, tandem mass spectrometry; SDS-PAGE, sodium dodecyl sulfate-polyacrylamide gel electrophoresis.
The ILD diagnosis was based on the guidelines described in chapter 92 of Goldman's Cecil Medicine, 24th Edition. 9 The diagnostic criteria for CTD-ILD were based on the recommendations of the American College of Rheumatology and the American Rheumatism Association. [10][11][12][13] In the present study, all patients were examined by respiratory specialists and rheumatologists.
The study was approved by the Medical Ethics Committee of the Second Affiliated Hospital of Anhui Medical University (approval number YX2020-106), and all participants provided written informed consent in a manner consistent with the Declaration of Helsinki.

| BAL procedure and immediate sample processing
Bronchoalveolar lavage (BAL) is part of the routine diagnostic examination. BAL was performed according to the European Respiratory Society guidelines. 14 In brief, patients fasted from food and fluids for 6 h before the procedure. Midazolam was used for sedation, and 2% lidocaine was used for local anesthesia of the upper airway.
Bronchoscopy was performed using an Olympus BF-H190 bronchoscope (Olympus). Sterile isotonic sodium chloride solution at 37°C was instilled into the right middle lobe or the lingula of the left lung in equal 20-ml aliquots through a sterile syringe, for a total lavage volume of 4 ml/kg of body weight. The fluid was aspirated by gentle suction immediately after each aliquot. A sample was considered representative if more than 50% of the instilled volume was recovered. Samples were processed immediately after BAL: the recovered fluid was centrifuged at 3000 × g for 10 min, and the supernatant was collected and stored at −80°C until use.
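The lavage volume and acceptance rule above reduce to simple arithmetic. The sketch below assumes (this is not stated in the source) that the 4 ml/kg total is rounded up to a whole number of 20-ml aliquots, and it treats exactly 50% recovery as not representative, because the criterion is "more than 50%". Function names are hypothetical.

```python
import math

ALIQUOT_ML = 20.0             # aliquot size stated in the protocol
VOLUME_PER_KG_ML = 4.0        # total lavage volume per kg body weight
MIN_RECOVERY_FRACTION = 0.5   # recovery must exceed this fraction


def lavage_plan(body_weight_kg):
    """Return (instilled volume in ml, number of 20-ml aliquots).

    Assumption: the weight-based total is rounded up to whole aliquots.
    """
    total_ml = VOLUME_PER_KG_ML * body_weight_kg
    n_aliquots = math.ceil(total_ml / ALIQUOT_ML)
    return n_aliquots * ALIQUOT_ML, n_aliquots


def is_representative(instilled_ml, recovered_ml):
    """True only if strictly more than 50% of the instilled volume came back."""
    return recovered_ml / instilled_ml > MIN_RECOVERY_FRACTION


volume_ml, n = lavage_plan(60)   # a hypothetical 60-kg patient
```

For a 60-kg patient this gives 240 ml in 12 aliquots, so at least 121 ml would need to be recovered for the sample to count as representative.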

| BALF protein extraction
Each BALF sample was diluted 1:4 with trichloroacetic acid solution, mixed well, and precipitated at 4°C overnight. The mixture was centrifuged at 15,000 × g at 4°C for 15 min, and the supernatant was discarded. Pre-cooled acetone was then added to precipitate the samples for 30 min, the samples were centrifuged at 15,000 × g at 4°C for 15 min, and the supernatant was removed; this acetone step was repeated once. After the resulting precipitate had dried for 5 min, protein lysis buffer (AP0601-50; Beijing Bangfei Biotechnology Company) was added, and the samples were incubated overnight at 4°C. The protein solution was obtained by centrifugation at 10,000 × g at 4°C for 3 min.
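The protocol gives centrifugation speeds as relative centrifugal force (× g), which does not depend on the rotor. If a centrifuge is set in rpm, the standard relation RCF = 1.118 × 10^-5 × r × N^2 (r = rotor radius in cm, N = rpm) converts between the two. The 8.0-cm radius below is a hypothetical value; the actual rotor used in the study is not reported.

```python
import math

RCF_CONSTANT = 1.118e-5   # from RCF = 1.118e-5 * r_cm * rpm**2


def rpm_for_rcf(rcf_g, rotor_radius_cm):
    """Rotor speed (rpm) needed to reach a target RCF at the given radius."""
    return math.sqrt(rcf_g / (RCF_CONSTANT * rotor_radius_cm))


def rcf_for_rpm(rpm, rotor_radius_cm):
    """RCF (x g) produced by a rotor speed at the given radius."""
    return RCF_CONSTANT * rotor_radius_cm * rpm ** 2


# Hypothetical rotor radius of 8.0 cm for the 15,000 x g steps.
speed = rpm_for_rcf(15000, 8.0)
```

With this assumed radius, 15,000 × g corresponds to roughly 13,000 rpm; a different rotor would need a different speed, which is why the RCF is the value worth reporting.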
The total protein in each BALF sample was resolved by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE); Coomassie blue staining showed clear, complete, and uniform protein bands with no evidence of degradation. […] and centrifuged, and the supernatant was discarded. Trypsin was added at a trypsin-to-protein ratio of 1:50, and digestion proceeded at 37°C for 12-16 h.
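The 1:50 ratio fixes how much trypsin each digest needs. The source does not state whether the ratio is by mass; the sketch below assumes a mass (w/w) ratio, the usual convention in proteomics. The function name is hypothetical.

```python
def trypsin_amount_ug(protein_ug, ratio=50):
    """Trypsin mass (ug) for a 1:ratio trypsin-to-protein ratio.

    Assumption: the ratio is by mass (w/w).
    """
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return protein_ug / ratio


dose = trypsin_amount_ug(100)   # hypothetical 100 ug of BALF protein
```

For 100 µg of protein this gives 2 µg of trypsin.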

| LC-MS/MS analysis
We used an LC-MS/MS system to identify the differentially expressed proteins (DEPs) between patients with and without CTD-ILD. 15 Each BALF sample was separated using a high-performance liquid chromatography system operating at nanoliter flow rates. The chromatographic column was equilibrated with 95% mobile phase A. Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1% formic acid in acetonitrile. The proportion of phase B was increased linearly from 6% to 9% over 0-2 min, to 13% at 10 min, to 26% at 50 min, to 38% at 70 min, and to 100% at 71 min, and was then held at 100% from 71 to 78 min. Each sample was separated by capillary high-performance liquid chromatography and analyzed by label-free MS on an Orbitrap Fusion Tribrid mass spectrometer (Thermo Scientific).
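The gradient is a piecewise-linear program, and writing it as a timetable makes the %B at any moment explicit. The breakpoints below are read from the text under one assumption: "0-2 min, 6%-9%" is taken to mean that B rises from 6% at 0 min to 9% at 2 min (the column itself is equilibrated at 5% B, that is, 95% A, before the run). The function name is hypothetical.

```python
from bisect import bisect_right

# (time in min, % mobile phase B) breakpoints read from the text.
# Assumption: B rises linearly from 6% at 0 min to 9% at 2 min.
GRADIENT = [(0, 6.0), (2, 9.0), (10, 13.0), (50, 26.0),
            (70, 38.0), (71, 100.0), (78, 100.0)]


def percent_b(t_min):
    """Linearly interpolate %B at time t_min within the gradient program."""
    times = [t for t, _ in GRADIENT]
    if not times[0] <= t_min <= times[-1]:
        raise ValueError("time outside the gradient program")
    i = bisect_right(times, t_min)
    if i == len(times):          # exactly at the final breakpoint
        return GRADIENT[-1][1]
    (t0, b0), (t1, b1) = GRADIENT[i - 1], GRADIENT[i]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


midpoint = percent_b(30)   # halfway through the 10-50 min segment
```

At 30 min, halfway through the 10-50 min segment, the program is at 19.5% B; the 1-min step from 38% to 100% at 70-71 min is the column wash.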

| Enzyme-linked immunosorbent assay (ELISA)
ELISA was performed according to the manufacturer's instructions (Elabscience Biotechnology Co., Ltd). After the BALF samples were processed, the optical density (OD) was measured at a wavelength of 450 nm using a microplate reader, and protein concentrations were calculated from a standard curve. All experiments were repeated three times.
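Reading a concentration off the standard curve can be sketched as follows. This is a simplification: many kit protocols recommend a four-parameter logistic fit, whereas this sketch interpolates linearly between adjacent standards. It also assumes a sandwich format (OD rises with concentration), blank-corrected ODs, and averaging of the triplicate ODs before conversion. None of these details is stated in the source, and the standard values are hypothetical.

```python
from statistics import mean


def concentration_from_od(od, standards):
    """Piecewise-linear interpolation of concentration from OD450.

    standards: (concentration, OD450) pairs; assumes OD rises with
    concentration (sandwich format) and that all ODs are blank-corrected.
    """
    pts = sorted(standards, key=lambda p: p[1])
    for (c0, od0), (c1, od1) in zip(pts, pts[1:]):
        if od0 <= od <= od1:
            return c0 + (c1 - c0) * (od - od0) / (od1 - od0)
    raise ValueError("OD outside the standard range; dilute and re-assay")


def sample_concentration(replicate_ods, standards):
    """Average replicate ODs, then read the concentration off the curve."""
    return concentration_from_od(mean(replicate_ods), standards)


# Hypothetical standards (pg/ml, OD450) and triplicate sample readings.
STANDARDS = [(0.0, 0.05), (62.5, 0.15), (125.0, 0.27),
             (250.0, 0.50), (500.0, 0.95), (1000.0, 1.80)]
conc = sample_concentration([0.38, 0.40, 0.39], STANDARDS)
```

The mean OD of 0.39 falls between the 125 and 250 pg/ml standards, giving about 190 pg/ml. Samples outside the standard range raise an error rather than being extrapolated.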

| Statistical analysis
GraphPad Prism software (version 5.0; GraphPad Software, San Diego, California) was used to perform two-tailed Mann-Whitney tests. Fisher's exact test was used to compare the only categorical variable (sex). Values are expressed as means ± SEM, and p < 0.05 was considered statistically significant.
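With only four samples per group, it helps to see how the exact Mann-Whitney test behaves. The sketch below enumerates all C(8,4) = 70 ways to split eight values into two groups of four. It computes a two-sided p as twice the smaller tail, which is one common convention; GraphPad Prism's own implementation, including its tie handling, may differ. The values are hypothetical, not study data. It also computes mean ± SEM as reported in the text.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def mann_whitney_u(x, y):
    """U for sample x: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


def exact_two_sided_p(x, y):
    """Exact two-sided p-value from all relabelings of the pooled values."""
    pooled = list(x) + list(y)
    u_obs = mann_whitney_u(x, y)
    null_u = []
    for idx in combinations(range(len(pooled)), len(x)):
        chosen = set(idx)
        gx = [pooled[i] for i in idx]
        gy = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        null_u.append(mann_whitney_u(gx, gy))
    lower = sum(u <= u_obs for u in null_u) / len(null_u)
    upper = sum(u >= u_obs for u in null_u) / len(null_u)
    return min(1.0, 2 * min(lower, upper))


def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical values for two groups of four with no overlap at all.
case = [5.1, 6.3, 7.0, 8.2]
control = [1.2, 2.0, 2.8, 3.5]
p = exact_two_sided_p(case, control)
```

Even with complete separation between the groups, the exact two-sided p is 2/70 ≈ 0.029. This is the smallest value attainable with n = 4 per group, so the test has little power at this sample size.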

| Participants
There was no significant difference between the experimental and control groups in age (mean ± SEM, 62.3 ± 6.9 vs. 44.8 ± 8.3 years; n = 8, p = 0.16) or sex (4 women in the experimental group vs. 3 men and 1 woman in the control group; p = 0.92).

| SDS-PAGE
BALF samples obtained from the four patients in the experimental group and the four patients in the control group were each separated by SDS-PAGE. In the molecular weight range of 15-220 kDa, total proteins from all eight samples were effectively separated without protein degradation, and protein levels were sufficient for subsequent experiments (Figure 2).

| LC-MS/MS analysis and identification of DEPs
Table 2 shows the top 10 upregulated and downregulated proteins. We plotted the magnitude of the fold change (FC; log10 FC) to obtain a volcano plot (Figure 3A). Cluster analysis of DEP expression showed that the expression patterns in patients with CTD-ILD differed from those in patients without CTD-ILD, and that the samples within each group clustered together (Figure 3B).
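This section does not state the criteria used to call a protein a DEP. The sketch below shows one common way to classify proteins and place them on a volcano plot. The cut-offs (FC ≥ 1.5 for up, FC ≤ 1/1.5 for down, p < 0.05) are hypothetical and are not the authors' thresholds. The x-axis uses log10 FC to match the text (many volcano plots use log2 FC instead), and the y-axis uses -log10 p.

```python
import math

# Hypothetical cut-offs; the study's DEP criteria are not given here.
FC_CUTOFF = 1.5
P_CUTOFF = 0.05


def classify(fc, p):
    """Label a protein as up, down, or not significant."""
    if p < P_CUTOFF and fc >= FC_CUTOFF:
        return "up"
    if p < P_CUTOFF and fc <= 1 / FC_CUTOFF:
        return "down"
    return "not significant"


def volcano_point(fc, p):
    """Plot coordinates: x = log10 FC (as in the text), y = -log10 p."""
    return math.log10(fc), -math.log10(p)


x, y = volcano_point(10.0, 0.01)
```

A protein with FC = 10 and p = 0.01 sits at about (1.0, 2.0). Upregulated proteins fall in the upper right of the plot and downregulated proteins in the upper left.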

| GO functional annotation and enrichment analysis
GO analysis provides information in three functional domains: the biological processes in which proteins participate, the cellular components in which they are located, and the molecular functions that the gene products perform. GO organizes these functional concepts as a directed acyclic graph. GO enrichment analysis not only narrows the set of candidate proteins for further screening but also shows which functions are over-represented in the foreground set (here, the DEPs), indicating the potential functions of these proteins. GO enrichment results can therefore help focus research on disease pathogenesis.
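The two ideas in this paragraph, the directed acyclic graph and enrichment of a foreground set, can be sketched together. First, annotations are propagated up the graph (the "true path rule": a protein annotated to a term is also annotated to all of that term's ancestors). Then over-representation is scored with a one-sided hypergeometric test, as many GO enrichment tools do. The source does not name the tool or test used. The term names and proteins are hypothetical placeholders, and the multiple-testing correction needed in practice is omitted.

```python
from math import comb

# Toy DAG, child -> parents. Names are hypothetical placeholders, not GO IDs.
PARENTS = {
    "term_child": {"term_mid"},
    "term_mid": {"term_root"},
    "term_root": set(),
}


def ancestors(term):
    """The term itself plus every term reachable by following parent edges."""
    seen, stack = set(), [term]
    while stack:
        t = stack.pop()
        if t not in seen:
            seen.add(t)
            stack.extend(PARENTS.get(t, ()))
    return seen


def propagate(direct):
    """True path rule: annotation to a term implies annotation to its ancestors."""
    return {prot: set().union(*(ancestors(t) for t in terms))
            for prot, terms in direct.items()}


def hypergeom_sf(k, N, K, n):
    """P(X >= k), X ~ hypergeometric: N proteins, K annotated, n drawn (DEPs)."""
    top = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1)) / comb(N, n)


def enrichment_p(term, deps, annotations):
    """One-sided over-representation p-value of `term` among the DEPs."""
    N = len(annotations)
    K = sum(term in terms for terms in annotations.values())
    n = len(deps)
    k = sum(term in annotations[p] for p in deps)
    return hypergeom_sf(k, N, K, n)


# Hypothetical background of four identified proteins, two of which are DEPs.
background = propagate({"P1": {"term_child"}, "P2": {"term_mid"},
                        "P3": set(), "P4": set()})
p = enrichment_p("term_mid", {"P1", "P2"}, background)
```

Here P1 inherits term_mid and term_root from its direct annotation. Both DEPs carry term_mid, and the chance of drawing exactly those two proteins is 1/6.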

| KEGG metabolic pathway analysis
KEGG is a database that integrates genomic, chemical, and systems-level functional information to provide a genetic and chemical blueprint of biological systems.
KEGG analysis can identify the main signal transduction and biochemical metabolic pathways associated with the genes encoding the DEPs. We used the KEGG database to analyze the DEPs in BALF samples from patients with and without CTD-ILD. The KEGG pathways annotated with the DEPs included complement and coagulation cascades, metabolic pathways, pathways in cancer, and the peroxisome proliferator-activated receptor (PPAR) signaling pathway (Figure 6).
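Ranking pathways by how many DEPs map to them (the "most annotated" pathways) needs a gene-to-pathway table. The KEGG REST API's `link` operation (for example, https://rest.kegg.jp/link/pathway/hsa) returns such a table, as we understand it, as two tab-separated columns: gene ID and pathway ID. Check the exact format against the KEGG API documentation. The sketch parses that layout and counts DEPs per pathway. The identifiers are placeholders, and mapping DEP gene symbols to KEGG gene IDs is a separate step that is not shown.

```python
from collections import Counter


def parse_kegg_link(text):
    """Parse tab-separated 'gene<TAB>pathway' lines into gene -> set of pathways."""
    gene_to_paths = {}
    for line in text.strip().splitlines():
        gene, pathway = line.split("\t")
        gene_to_paths.setdefault(gene, set()).add(pathway)
    return gene_to_paths


def pathways_by_dep_count(deps, gene_to_paths):
    """Rank pathways by how many DEPs map to them (most annotated first)."""
    counts = Counter()
    for gene in deps:
        counts.update(gene_to_paths.get(gene, set()))
    return counts.most_common()


# Placeholder identifiers only; they are not real KEGG gene or pathway IDs.
LINKS = "hsa:g1\tpath:hsaA\nhsa:g2\tpath:hsaA\nhsa:g2\tpath:hsaB\nhsa:g3\tpath:hsaC\n"
ranking = pathways_by_dep_count(["hsa:g1", "hsa:g2"], parse_kegg_link(LINKS))
```

A raw count like this favors large pathways such as "metabolic pathways". A hypergeometric enrichment test against all identified proteins is the usual way to correct for pathway size.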

| COG protein functional analysis
The COG protein database was designed to classify proteins from completely sequenced genomes on the basis of phylogenetic lineages. 19 It is useful for predicting the functions of individual proteins and of proteins encoded by newly sequenced genomes. Using the web-based COGNITOR program, a protein can be compared with the proteins in the COG database and assigned to the appropriate clusters. The COG database also supports retrieval and querying of COG classification data. Using COG database analyses, we predicted the potential functions of the DEPs between patients with and without CTD-ILD and tallied them by functional category (Figure 8). The functions of these DEPs were concentrated mainly in signal transduction mechanisms; posttranslational modification, protein turnover, chaperones; intracellular trafficking, secretion, and vesicular transport; amino acid transport and metabolism; and lipid transport and metabolism.
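COG functional categories are identified by one-letter codes. The five categories named above are T, O, U, E, and I. The sketch below tallies proteins per category, counting a protein with a multi-letter assignment (for example "OU") once under each letter. The protein assignments are hypothetical, and the multi-letter counting rule is an assumption about how such statistics are commonly compiled.

```python
from collections import Counter

# One-letter COG functional categories named in the text.
COG_CATEGORIES = {
    "T": "Signal transduction mechanisms",
    "O": "Posttranslational modification, protein turnover, chaperones",
    "U": "Intracellular trafficking, secretion, and vesicular transport",
    "E": "Amino acid transport and metabolism",
    "I": "Lipid transport and metabolism",
}


def tally_categories(assignments):
    """Count proteins per COG letter; multi-letter codes count once per letter."""
    counts = Counter()
    for letters in assignments.values():
        counts.update(set(letters))
    return counts


# Hypothetical category assignments for four proteins.
counts = tally_categories({"protA": "T", "protB": "OU", "protC": "T", "protD": "I"})
summary = [(COG_CATEGORIES.get(letter, letter), n) for letter, n in counts.most_common()]
```

Here "Signal transduction mechanisms" leads with two proteins. Because of the multi-letter counting rule, the category totals can exceed the number of proteins.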

| Concentration of top DEPs in BALF
To validate the LC-MS/MS results, we used ELISA to measure the top three upregulated and top three downregulated DEPs. SFTPD, CADM1, and ACSL4 levels were significantly higher, and SIL1, WIPF1, and SGSH levels significantly lower, in BALF samples from the CTD-ILD group than in those from the control group (Table 3 and …).
Early studies reported that GALNT1 is highly expressed in a variety of tumors, promotes tumor growth and metastasis, and is associated with poor prognosis. 39 However, later studies showed that GALNT1-deficient mice exhibit a bleeding disorder and lack B-cell maturation. 40
Phosphorylation of the IKK complex increases activation of the NF-κB signaling pathway, which inhibits apoptosis, and can also lead to T- and B-cell functional defects. 48 We therefore speculate that the upregulation of IKBKB detected in the BALF of patients with CTD-ILD may be related to immune hyperfunction.
PPARγ is a nuclear receptor that functions as a transcription factor and has broad tissue- and cell-protective effects. PPARγ ligands are known to inhibit TGFβ-induced myofibroblast differentiation by targeting the PI3K/Akt pathway, which makes them of interest in the treatment of fibrosis. 49 Overall, our findings are consistent with previous studies from other groups. Thus, the KEGG analyses point to several potential mechanisms underlying the development of CTD-ILD and suggest directions for future study. However, many findings in the present study are based on proteomic and bioinformatic analyses, and this breadth makes it difficult to investigate any single finding in depth. Future experimental studies should help clarify the molecular mechanisms underlying the onset and progression of CTD-ILD.

| CONCLUSIONS
We

CONFLICT OF INTEREST
The authors report no conflicts of interest in this work.

Dahai Zhao https://orcid.org/0000-0003-1500-5608