Identification and quantification of degradome components in human synovial fluid reveals an increased proteolytic activity in knee osteoarthritis patients vs controls

Synovial fluid (SF) may contain cleavage products of proteolytic activities. Our aim was to characterize the degradome through analysis of proteolytic activity and differential abundance of these components in a peptidomic analysis of SF in knee osteoarthritis (OA) patients versus controls (n = 23). SF samples from end‐stage knee osteoarthritis patients undergoing total knee replacement surgery and controls, that is, deceased donors without known knee disease, were previously run using liquid chromatography mass spectrometry (LC‐MS). These data were used to perform new database searches generating results for non‐tryptic and semi‐tryptic peptides for studies of degradomics in OA. We used linear mixed models to estimate differences in peptide‐level expression between the two groups. Known proteolytic events (from the MEROPS peptidase database) were mapped to the dataset, allowing the identification of potential proteases and the substrates they cleave. We also developed a peptide‐centric R tool, proteasy, which facilitates analyses that involve retrieval and mapping of proteolytic events. We identified 429 differentially abundant peptides. We found that the increased abundance of cleaved APOA1 peptides is likely a consequence of enzymatic degradation by metalloproteinases and chymase. We identified metalloproteinases, chymase, and cathepsins as the main proteolytic actors. The analysis indicated increased activity of these proteases irrespective of their abundance.


INTRODUCTION
Osteoarthritis (OA) is a degenerative joint disease and a common cause of joint damage. It is especially common in middle-aged and elderly people and is estimated to affect more than 25% of adults [1]. Currently there are no therapies that can cure or prevent the disease. The pathological changes that characterize knee OA are typically the gradual breakdown of articular cartilage, low-grade inflammation of the synovium, bone alterations, and meniscus degeneration [2]. Although many factors contribute to disease progression, proteases and inflammatory cytokines are considered main contributors to the catabolic processes of OA. These cytokines, including interleukin 1β (IL-1β), tumor necrosis factor-α (TNF-α), IL-6, IL-12, and IL-15, contribute to the increased expression of matrix metalloproteinases (MMPs), which in turn drive extracellular matrix (ECM) degradation [3][4][5][6].
Using targeted approaches, specific proteases have previously been reported to be involved in the progression of OA [4,7].
However, a screening approach that analyses all the active proteases has to the best of our knowledge not been reported. Thus, the relative abundance of peptides in OA and its associated protease activity remains understudied. Further research in this area is needed to better understand the role of proteases in OA disease progression.
Synovial fluid (SF) is in direct contact with cartilage, synovium, and meniscus in the joint cavities where it acts like a lubricant by reducing friction and has additional metabolic and regulatory functions [8].
Peptidomics can provide information regarding the proteolytic activity that generated the observed endogenously cleaved protein fragments [18]. Quantitative peptidomics allows relative peptide levels to be studied as a result of increased or decreased proteolytic activity. The standard peptidomics approach is to isolate peptides from the rest of the protein bulk, for example, by ethanol precipitation, where smaller peptides stay in the supernatant while larger peptides and proteins end up in the protein pellet, or by ultrafiltration, where the filtrate contains the peptidome. This usually results in very few stable peptides that can be used further in a quantitative manner [19,20]. The cleavage sites of relatively intact proteins may be seen as a lost treasure that is usually overlooked in proteomics because artificial cleavages, usually by trypsin, are needed to detect peptides with a bottom-up mass spectrometry (MS) method.
We hypothesized that endogenously cleaved protein fragments in SF could reflect the changed protease activity and provide important new insights about the OA disease process. Thus, our aim was to characterize the degradome of human knee SF in knee OA versus controls. Specifically, we aimed to (1) identify and quantify endogenously cleaved peptides potentially involved in the OA process, and (2) identify proteases whose proteolytic activity may impact the host proteins of these peptides.

Significance of the Study
This study provides a broad array of differentially abundant endogenously cleaved peptides and their potential cleaving actors in human SF. It demonstrates that the proteolytic activity of the predicted proteases extends beyond the extracellular matrix of the surrounding tissues and can also affect factors such as chylomicron assembly, potentially leading to hampered homeostasis.

Human synovial fluid samples
In this study we reused the raw MS data files from a previous study [14] to extract semi- and non-tryptic peptide data. Briefly, the raw files were generated from SF obtained from end-stage medial compartment knee OA patients undergoing knee arthroplasty in 2017 in the Skåne region, Sweden (n = 11, age range 55-80 years, eight women and three men), and from deceased human donors without known chronic knee disease (deceased between 2017 and 2018, from the same geographical area as the patients; n = 12, age range 19-79 years, five women and seven men) [14]. The latter group is hereafter referred to as controls. Informed consent was obtained for all samples included in this analysis. The sample collection and analysis were approved by the ethical review committee of Lund and were carried out in accordance with relevant guidelines and regulations and with the principles of the Declaration of Helsinki. Only SF samples in which we did not detect visible blood contamination were included. One of the original 13 control samples was discarded due to a random error during the identification search in Peaks Studio X.

Statistical (quantitative) analysis
The inclusion criterion for the quantitative analysis was for a peptide to be quantified in at least seven samples in each group. After filtering for missing values, 1008 peptide sequences remained and were included in the quantitative analysis (Figure 1). We conducted the statistical analysis using linear mixed regression models in R with the lme4 package, using base-2 log-transformed intensity as the response.
A separate model was fitted to each protein, including all peptides from this protein. Age, sex, disease status, and peptide were used as fixed effect terms, with interactions between disease status and peptide. The subject was included as a random effect term. Contrasts between groups (OA vs. controls) were specified using the emmeans package and are reported with 95% confidence intervals based on restricted maximum likelihood estimates, using the Kenward-Roger method for estimation of degrees of freedom. Peptides whose 95% confidence interval of the base-2 log fold-change did not span zero were considered differentially abundant. Although the comparison was made on the peptide level, much of the biological meaning is found on the protein level. Therefore, we mainly describe and emphasize the "host protein," that is, the annotated protein representing a peptide used in the analysis. Given the exploratory nature of the study and the use of mixed models that minimize the multiplicity problem [22], we did not apply any further corrections for multiplicity, but rather we report all derived estimates to inform future studies and meta-analyses.
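The inclusion criterion and the decision rule described above can be illustrated with a minimal Python sketch. It does not reproduce the mixed models, which were fitted in R with lme4 and emmeans; the function names and the toy intensities are hypothetical and serve only to illustrate the filtering step, the log transformation, and the confidence-interval rule.

```python
import math

MIN_QUANTIFIED_PER_GROUP = 7  # inclusion criterion stated in the text


def passes_inclusion(intensities_by_group, min_n=MIN_QUANTIFIED_PER_GROUP):
    """True if the peptide has a non-missing intensity in at least
    `min_n` samples in every group (missing values are None)."""
    return all(
        sum(value is not None for value in values) >= min_n
        for values in intensities_by_group.values()
    )


def log2_intensities(values):
    """Base-2 log-transform non-missing intensities; keep gaps as None."""
    return [math.log2(v) if v is not None else None for v in values]


def is_differentially_abundant(ci_lower, ci_upper):
    """A peptide is called differentially abundant when the 95% CI of the
    log2 fold-change (OA vs. controls) does not span zero."""
    return ci_lower > 0 or ci_upper < 0


# Toy example (made-up numbers): 11 OA samples and 12 control samples.
toy_peptide = {
    "OA": [1024.0] * 7 + [None] * 4,       # quantified in 7 of 11
    "control": [512.0] * 8 + [None] * 4,   # quantified in 8 of 12
}
included = passes_inclusion(toy_peptide)
```

In this sketch the confidence interval limits are taken as given; in the study they come from the fitted model contrasts.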

Qualitative analysis
In the set of peptide sequences with fewer than seven quantified samples in either group, we identified 683 peptides (115 proteins) of interest that were used in a qualitative analysis (Figure 1). The peptides were ranked based on the difference in the number of samples they had been quantified in, that is, having a non-missing value. For example, a sequence quantified in eleven OA samples and only in one control sample, or vice versa, would have a high rank. Host proteins of peptides with an absolute difference in the number of quantified samples greater than or equal to seven were defined as qualitatively differentially abundant.
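The ranking by missingness pattern can be sketched as follows. This is an illustrative Python outline under simple assumptions, not the code used in the study; the peptide labels and the data layout (one list of intensities per group, with None for a missing value) are hypothetical.

```python
QUALITATIVE_THRESHOLD = 7  # absolute difference in quantified samples


def quantified_count(values):
    """Number of samples with a non-missing value."""
    return sum(v is not None for v in values)


def rank_by_missingness(peptides):
    """Rank peptides by the absolute difference in the number of
    quantified samples between the OA and control groups.

    `peptides` maps a peptide label to {"OA": [...], "control": [...]}.
    Returns (label, n_oa, n_control, abs_difference) tuples, largest
    difference first.
    """
    rows = []
    for label, groups in peptides.items():
        n_oa = quantified_count(groups["OA"])
        n_control = quantified_count(groups["control"])
        rows.append((label, n_oa, n_control, abs(n_oa - n_control)))
    return sorted(rows, key=lambda row: row[3], reverse=True)


def qualitatively_different(rows, threshold=QUALITATIVE_THRESHOLD):
    """Keep peptides whose difference meets the threshold."""
    return [row[0] for row in rows if row[3] >= threshold]


# Toy data mirroring the example in the text: quantified in eleven OA
# samples but only one control sample.
toy = {
    "peptide_1": {"OA": [1.0] * 11, "control": [1.0] + [None] * 11},
    "peptide_2": {"OA": [1.0] * 5 + [None] * 6, "control": [1.0] * 4 + [None] * 8},
}
ranking = rank_by_missingness(toy)
selected = qualitatively_different(ranking)
```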

Protease mapping
We used the protease database MEROPS [23] to map known proteolytic events to the peptide ends in our dataset. The proteases extracted from the MEROPS database were further searched against the current dataset to identify which of them could actually be detected in this cohort. These proteases are hereafter referred to as "found in SF."
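The idea of mapping peptide ends to known proteolytic events can be sketched in Python as below. This is a simplified illustration and not the proteasy implementation or the MEROPS data format: the lookup table keyed by (accession, P1 position), the toy sequence, the accession "TOY001", and the protease labels are all hypothetical. The P1 position is taken here as the 1-based index of the residue immediately N-terminal to the cleaved bond.

```python
def locate_peptide(protein_seq, peptide):
    """Return the 1-based (start, end) of the first occurrence of
    `peptide` in `protein_seq`, or None if it is not found."""
    start0 = protein_seq.find(peptide)
    if start0 == -1:
        return None
    return start0 + 1, start0 + len(peptide)


def implied_cleavages(protein_seq, peptide):
    """Cleavage sites implied by the two ends of an endogenous peptide.

    Each site is (terminus, p1_position). Ends that coincide with the
    protein's own termini imply no cleavage.
    """
    location = locate_peptide(protein_seq, peptide)
    if location is None:
        return []
    start, end = location
    sites = []
    if start > 1:
        sites.append(("N", start - 1))  # bond before the peptide
    if end < len(protein_seq):
        sites.append(("C", end))        # bond after the peptide
    return sites


def map_proteases(accession, protein_seq, peptide, known_events):
    """Look up proteases annotated for each implied cleavage site."""
    hits = []
    for terminus, p1 in implied_cleavages(protein_seq, peptide):
        for protease in known_events.get((accession, p1), []):
            hits.append((terminus, p1, protease))
    return hits


# Hypothetical toy data, not real annotations.
TOY_PROTEIN = "MKTAYIAKQRQISFVK"
TOY_EVENTS = {("TOY001", 3): ["protease A"], ("TOY001", 9): ["protease B"]}
toy_hits = map_proteases("TOY001", TOY_PROTEIN, "AYIAKQ", TOY_EVENTS)
```

A match of this kind only indicates that a protease is a potential cleaving actor; it does not show that the cleavage took place in the sample.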

R package "proteasy"
To facilitate the workflow and reproducibility of this and future analyses, the first author (MR) developed an R-package, which is available on GitHub (https://github.com/martinry/proteasy) and was accepted to Bioconductor on June 28th, 2022. The main function, findProteases, takes a set of UniProt accessions, peptide sequences, and the name of the studied organism as input, and outputs the equivalent of Tables S1-S3 of this manuscript, detailing the proteases, substrates, and cleavages of potential relevance for the input. Full documentation of the package is available in the package manual, with examples provided in the accompanying vignette.

Pathway and network analysis
Functional analysis of pathways was conducted using a local database containing pathway data from REACTOME [24] and a custom script for pathway analysis from a previously published study [25]. The inclusion criterion for pathway analysis was host proteins whose peptides were differentially abundant, as previously defined. The "background proteome" was set to host proteins of all identified peptides in the dataset. Only sets of differentially abundant proteins larger than three were included.
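One way to read the pathway ranking is as an overlap ratio per pathway: the number of host proteins with differentially abundant peptides that belong to the pathway, divided by the number of background proteins that belong to it. The Python sketch below follows that reading, which is an assumption; the custom script from [25] is not reproduced, the pathway names and protein labels are hypothetical, and the minimum set size is left as a parameter.

```python
def pathway_ratios(pathways, differential, background, min_hits=3):
    """Overlap ratio per pathway.

    pathways:     dict mapping pathway name -> set of member proteins
    differential: set of host proteins with differentially abundant peptides
    background:   set of host proteins of all identified peptides
    min_hits:     minimum number of differential proteins in a pathway
                  (assumed cut-off; adjust to the definition in use)
    """
    ratios = {}
    for name, members in pathways.items():
        in_background = members & background
        hits = members & differential & background
        if len(hits) < min_hits or not in_background:
            continue
        ratios[name] = len(hits) / len(in_background)
    return ratios


# Hypothetical toy input.
toy_pathways = {
    "Pathway X": {"A", "B", "C", "D"},
    "Pathway Y": {"C", "E", "F"},
}
toy_background = {"A", "B", "C", "D", "E"}
toy_differential = {"A", "B", "C"}
toy_ratios = pathway_ratios(toy_pathways, toy_differential, toy_background)
```

A ratio of 1 under this reading corresponds to complete overlap between the differentially abundant proteins and the background set for that pathway.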
Proteases and substrates were also visualized as an interaction network using the igraph [26] library in R. Here, we included all proteases found in SF in the current dataset, the substrates they cleave, and annotated (where applicable) whether the protein's abundance increased or decreased, according to the definitions stated above.

Qualitative analysis
In the qualitative data analysis, the largest difference was observed for HRG, plasminogen (PLG), FN1, AHSG and inter-alpha-trypsin inhibitor (ITIH2) that were found in OA but not control samples, and decorin (DCN) and extracellular superoxide dismutase (SOD3) found in control samples but not in OA samples (Table S5).

Protease mapping
We identified entries corresponding to 954 proteolytic events (Table S3) based on our mapping of peptide ends to the MEROPS database.
A majority, 736, were cleaved at the N-terminal position. One hundred ninety-two proteases (57 reviewed) were identified as potential cleaving actors (Table S2). Out of those, 11 proteases were also found in SF (Figure 2), one of which, complement factor B (CFB), contains peptides increased in both the quantitative and qualitative analyses.
We found 33 reviewed proteases potentially acting on substrates of differentially abundant peptides (Table S7).

Pathway analysis
Pathway analysis of host proteins of peptides with increased levels in the OA group resulted in 65 pathways with three or more differentially abundant proteins in each set (Table 1). These results suggested OA protein activity predominantly in pathways related to the immune system (particularly complement activation), transport of small molecules, and hemostasis. The pathways with complete overlap between differentially abundant proteins and background sets were "Terminal pathway of complement" and "Alternative complement activation."

Comparison with tryptic data
We assessed whether increased levels of a peptide were due to elevated enzymatic degradation of its host protein in OA or to greater abundance of the intact host protein in OA, by contrasting our quantitative results with our previous study that compared the same late-stage OA patients and healthy controls [14], but in which protein abundances were calculated from intensities of tryptic peptides (Figures S1 and S2). We found that 49 host proteins of differentially abundant semi- and non-tryptic peptides in the current study were not differentially abundant in the previous study (Tables S4 and S5). For example, the proteins APOA1, ITIH1, and C3 were not differentially abundant in the previous study but are host proteins of highly increased peptides in the current analysis.
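The logic of this comparison can be written as a small decision rule. The Python sketch below is an illustration only: the function name and labels are hypothetical, the two inputs are assumed to be yes/no calls taken from the current analysis and from the tryptic analysis in [14], and "PROTEIN_X" is a made-up example of a protein increased in both.

```python
def interpret_increase(peptides_increased, protein_increased_tryptic):
    """Suggest an explanation for increased semi- or non-tryptic peptides.

    peptides_increased:        endogenously cleaved peptides of the host
                               protein are increased in OA (current study)
    protein_increased_tryptic: the host protein is increased in OA when
                               quantified from tryptic peptides [14]
    """
    if not peptides_increased:
        return "no increase at the peptide level"
    if protein_increased_tryptic:
        return "likely greater abundance of the intact protein"
    return "likely increased enzymatic degradation"


# APOA1 follows the pattern reported in the text; PROTEIN_X is hypothetical.
calls = {
    "APOA1": interpret_increase(True, False),
    "PROTEIN_X": interpret_increase(True, True),
}
```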

DISCUSSION
TABLE 1 Pathway results for host proteins of differentially abundant peptides.

Pathway Category Ratio
Terminal pathway of complement
Note: Ratio is the ratio of proteins in the set of host proteins of differentially abundant peptides overlapping with the background proteome set. All host proteins of peptides (non-tryptic, semi-tryptic, and tryptic) identified in SF were used as background proteome, and only pathways with more than three proteins in a set were included.

In this study of endogenously cleaved peptides in human end-stage knee OA versus knee-healthy controls, we identified 69 host proteins of 429 differentially abundant semi- or non-tryptic peptides, a majority of which were increased in OA, suggesting an increased proteolytic activity of the proteases that potentially cleave these peptides.
The strongest support for such cleavage events exists for those proteases that we could also identify in SF (Figure 3). The majority of these proteases were metalloproteinases, cathepsins, and carboxypeptidases.
To facilitate retrieval of cleavage sites, we developed an R-package, proteasy. In both the qualitative and quantitative comparisons, we found peptide levels of HRG to differ the most between the OA and the control group (Tables S5 and S6). HRG has been reported to be increased in OA in previous proteomics studies [27,28]. HRG is an abundant protein in plasma and has been referred to as "the Swiss army knife" of plasma due to its involvement in many biological processes and its ability to interact with multiple ligands simultaneously [29]. We also observed increased proteolytic cleavage of AHSG, another glycoprotein, known to influence the mineral phase of bone [30]. The extensive proteolysis these proteins undergo may negatively impact the regulatory functions they are involved in. These two proteins, together with KNG1, form the type 3 subgroup within the human cystatin superfamily of cysteine protease inhibitors [31]. This subgroup has been implicated in angiogenesis [32], the formation of new blood vessels, and may be useful in the treatment of diseases with extensive angiogenesis. In OA progression, vascular growth is increased in the synovium, osteophytes, and menisci, which contributes to the development of synovitis, osteochondral damage, osteophyte formation, and meniscal pathology [33].
We examined whether increased levels of a peptide were due to elevated enzymatic degradation of its host protein in OA, or whether the increase was a result of greater abundance of the intact protein in OA, by contrasting our results with a previous study conducted on the same samples, but where quantification was done solely on tryptic peptides. We hypothesized that an increase in OA versus controls seen in both studies is likely due to greater quantities of the intact protein, while an increase of semi- and non-tryptic peptides alone is likely a consequence of degradation. We found that one such protein, APOA1, was not upregulated in the study based on tryptic peptides, but the semi-tryptic peptides for which it was annotated in the current study were highly increased. Further, we found CMA1 as a possible cleaving protease matching APOA1 peptides (Table S7).
The two largest subnetworks from the 32 differentially cleaved host proteins belong to lipoprotein particle complexes, such as VLDL and LDL, and to the complement factor cascade. We found the proteases and substrates acting on the lipoprotein particle complexes and complement cascade to be important factors in OA development (Table 1).
Two pathways involved in transport of small molecules, "Chylomicron remodeling" and "Chylomicron assembly," ranked highly due to the high representation of apolipoproteins. The proteases that potentially cleaved these apolipoproteins were MMP7 and MMP12 (Table S6), which are commonly known to cleave ECM proteins. MMPs have previously been shown to be able to inhibit apolipoprotein functionality by cleaving them [34]. This suggests that the MMPs not only act as degrading enzymes of the surrounding tissues but could also inhibit lipid metabolism in that environment [34,35]. Chymase has also been identified as an important protease cleaving apolipoproteins [36]. The main producers of chymase are mast cells [37]. Mast cells have previously been detected at higher levels in the synovium in OA than in RA [38] and have been associated with radiographic damage in OA [38]. These findings suggest an increased proteolytic activity not just against the surrounding tissues but also against plasma proteins that are essential to maintain the homeostasis of the joints.
Most of the differentially cleaved proteins (Table S6) were proteins involved in extracellular matrix assembly: fibronectin, cartilage acidic protein 1 (CRTAC1), clusterin (CLUAP1), aggrecan, and actin (ACTB). The plasma proteins C3 and albumin (ALB) were also differentially cleaved. Proteases that were found to actively cleave some of these proteins were MMPs, chymase, caspase, elastase, and granzyme, but no known cleavage site was detected for CRTAC1 [39]. CRTAC1 is known to be abundant in cartilage and has been reported as a protein of interest detected in both early and late-stage OA, showing a trend of increase in late-stage OA [39]. The semi-tryptic peptides that were found to be differentially abundant in this study are fragments from the same part of CRTAC1, which suggests that multiple cleavage activities may occur in this specific region of the protein.
A study by Abji et al. examined the proteases present in SF using flow cytometry [40]. They identified 42 proteases in psoriatic arthritis, rheumatoid arthritis, and OA patients. Comparing these results to our findings, 11 of these proteases were also identified in the current study, a majority of which were MMPs. [...] [45].
These protocols enhance the detection of smaller peptides, generally enabling detection of a few hundred up to a thousand peptides. To cover more of the cleavage activity, multiple enrichment protocols usually need to be applied. Therefore, our strategy in this study was to use a protocol that avoids enrichment of a specific subgroup of peptides. By studying the fragments/peptides of the proteins that remain in the pellet after ethanol precipitation, we were able to identify a larger number of peptides that were not cleaved by trypsin. In our analysis, we identified 12,098 peptides with at least one non-tryptic end site. Studying a larger set of peptides could then better reflect the proteolytic activity. This strategy comes at a cost in that small protein fragments are eliminated during the precipitation stage. Moreover, endogenous proteases that target tryptic cleavage sites cannot be distinguished from trypsin.
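The distinction between tryptic, semi-tryptic, and non-tryptic peptides can be illustrated with a short Python sketch. It assumes a deliberately simplified trypsin rule (cleavage C-terminal to lysine or arginine, ignoring the proline exception and missed cleavages) and uses a hypothetical toy sequence; it is not the search-engine logic used in the study. It also makes the stated limitation visible: an end that matches the trypsin rule is counted as tryptic even if an endogenous protease produced it.

```python
TRYPTIC_RESIDUES = {"K", "R"}  # simplified trypsin specificity (assumption)


def classify_peptide(protein_seq, peptide):
    """Classify a peptide by how many of its ends match the trypsin rule.

    An N-terminal end is treated as tryptic if the preceding residue in
    the protein is K or R, or if the peptide starts at the protein
    N-terminus. A C-terminal end is treated as tryptic if the peptide
    ends in K or R, or at the protein C-terminus.
    """
    start = protein_seq.find(peptide)
    if start == -1:
        raise ValueError("peptide not found in protein sequence")
    end = start + len(peptide)
    n_tryptic = start == 0 or protein_seq[start - 1] in TRYPTIC_RESIDUES
    c_tryptic = end == len(protein_seq) or peptide[-1] in TRYPTIC_RESIDUES
    tryptic_ends = int(n_tryptic) + int(c_tryptic)
    return {2: "tryptic", 1: "semi-tryptic", 0: "non-tryptic"}[tryptic_ends]


# Hypothetical toy sequence and peptides.
TOY_SEQUENCE = "MKTAYIAKQRQISFVK"
labels = {
    pep: classify_peptide(TOY_SEQUENCE, pep)
    for pep in ("TAYIAK", "AYIAKQR", "AYIAKQ")
}
```

Peptides labelled semi-tryptic or non-tryptic in this scheme are the ones with at least one non-tryptic end site.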
Missing data on peptide abundance are a known issue in MS analyses. In order not to discard these data, we performed a qualitative analysis based on the patterns of missingness. We acknowledge that our definition of qualitatively differentially abundant peptides is arbitrary, but we believe it provides more information than discarding the missing data. For transparency, we include the information on the missing data pattern for all identified peptides, including those not suitable for statistical analysis (Table S8). Even with this drawback, the number of quantified semi-tryptic peptides found to differ between OA and controls is, to our knowledge, the highest reported [20].
One limitation of the pathway analysis is that ranking was done by the presence of differential peptides in relation to annotated peptides in the background set, a metric which does not necessarily take into account the complex relationships present in biological pathways.
Another limitation is the somewhat abstract concept of a pathway, which is not always representative of the underlying biology, and is subject to interpretation [46].
Finally, many cleaved peptides are not identifiable from existing databases. This may have several explanations. The quality of peptide identification relies on estimating a false discovery rate using a decoy-target approach, but the results may still include false positive identifications. Moreover, incomplete annotation is still a limitation for protease databases. For each proteolytic event, MEROPS provides a list of selected references, and it currently contains over 10,000 such references [23]. Inevitably, requiring peer-reviewed references to avoid annotation errors comes at the cost of limiting the coverage of the knowledgebase.
In summary, we have performed a discovery-based profiling of the SF peptidome in OA and healthy subjects, and studied the role of proteases and inflammatory cytokines, which are the potential cleaving actors. We developed an R-package to facilitate analyses of proteases and substrates. Our findings suggest that the increased proteolytic activity associated with OA catabolism is not restricted to the ECM of the surrounding tissues, but may also affect homeostasis through factors outside the ECM, via mechanisms such as chylomicron remodeling and assembly.

[...] University, Sweden. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

CONFLICT OF INTEREST STATEMENT
The authors have declared no conflict of interest.

ASSOCIATED DATA
The raw MS files have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository with the dataset identifier PXD023708.
