ER, Mitochondria, and ISR Regulation by mt‐HSP70 and ATF5 upon Procollagen Misfolding in Osteoblasts

Abstract Cellular response to protein misfolding underlies multiple diseases. Collagens are the most abundant vertebrate proteins, yet little is known about cellular response to misfolding of their procollagen precursors. Osteoblasts (OBs)—the cells that make bone—produce so much procollagen that it accounts for up to 40% of mRNAs in the cell, which is why bone bears the brunt of mutations causing procollagen misfolding in osteogenesis imperfecta (OI). The present study of a G610C mouse model of OI by multiple transcriptomic techniques provides first solid clues to how OBs respond to misfolded procollagen accumulation in the endoplasmic reticulum (ER) and how this response affects OB function. Surprisingly, misfolded procollagen escapes the quality control in the ER lumen and indirectly triggers the integrated stress response (ISR) through other cell compartments. In G610C OBs, the ISR is regulated by mitochondrial HSP70 (mt‐HSP70) and ATF5 instead of their BIP and ATF4 paralogues, which normally activate and regulate ISR to secretory protein misfolding in the ER. The involvement of mt‐HSP70 and ATF5 together with other transcriptomic findings suggest that mitochondria might initiate the ISR upon disruption of ER‐mitochondria connections or might respond to the ISR activated by a yet unknown sensor.

. Identification of ISR activation pathway(s) in G610C mouse OBs; schematic study layout. 1. Candidates are identified by scRNASeq as ISR-related pathways upregulated by increased synthesis of type I procollagen in G610C OBs isolated from mouse femurs and parietal bones. 2. Potential effects (artifacts) of cell isolation for scRNASeq are eliminated by validating the findings with scRNASeq-like sequencing of mRNA from thousands of regularly spaced 55 µm spots in fresh frozen tibia sections (srRNASeq). 3. Upregulation of key marker genes is further validated with subcellular resolution by mRNA-FISH in fixed femur and tibia sections. 4. The corresponding key proteins are analyzed by Western blotting in primary OB cultures after validating the cell culture approach by comparing bulk RNASeq of cultured OBs with sc-and sr-RNASeq.
Supplementary Figure 2. OB identification in scRNASeq data. A. Localization of OBs (defined as shown in Fig. 1 1) were tuned to simultaneously maximize the separation between the cells and the number of OBs at each differentiation stage. Importantly, some expression of Acan, Acta2, Clec3b, Pecam1, and to a much smaller extent Cd33/Cd68 in OBs may be expected (particularly at early differentiation stages). Figure 3. OB malfunction. The plots show correlations between expression of selected genes and Col1a1 transcription in E18.5 and P5 OBs. They represent 20-point running average for cells sorted in the order of increasing Col1a1 expression. The vertical dotted lines separate subpopulations of early (eOB), differentiating (dOB), and mature (mOB) osteoblasts based on Col1a1 transcription as described in Fig. 2. Small black dots on top of the plots mark significantly different gene expression (p < 0.01, **) in Hom vs. WT (E18.5) and Het vs. WT (P5). The significance was estimated by the Wilcoxon test for each pair of running average windows centered at the same or closely matching Col1a1 expression. Circles with error bars show mean transcription in mOBs and mean value of the standard deviation (SD) for the running average across the full range of Col1a1. The G610C mutation clearly affects Wnt (top row), IGF (Igfbp5/6), and other (Pdgfa, Vdr) signaling pathways crucial for OB differentiation and function. ECM synthesis (Col15a1, Dcn), cell migration (Pdpn, Avil), cell metabolism (Adm2, Shmt2), and cell cycle (Cdkn1a, Rcc2) are also significantly altered. The latter effects of the G610C mutation appear only at high collagen expression and coincide with upregulation of ISR markers (Fig. 3b), suggesting that they are caused by procollagen misfolding inside the cell rather than by signals from outside (ECM and other cells). Changes in the expression of Cthrc1 and Pdgfa seem to have the same etiology. In contrast, effects of the mutation on Wif1 and Vdr appear to be independent of the osteoblast differentiation stage and level of Col1a1 transcription, suggesting that they are caused by altered cellular environment rather than cell-intrinsic factors. The assay sensitivity is not sufficient for similarly distinguishing the etiology of changes in Lrp5, Cpz, and Igfbp5/6. Strong upregulation of Atf5 (Fig. 3B), Ddit3, Eif3c, Eif4ebp1, Nupr1, and Trib3 at high Col1a1 expression unequivocally demonstrates ISR activation by misfolding of G610C procollagen. Minimal change (≤ 1 SD) or downregulation of Hspa5 (Fig. 3B) and other UPR genes with increasing Col1a1 is consistent with the lack of UPR upstream of ISR in G610C osteoblasts, which we reported before. [2] Transcription of Hspa5 and other general ER chaperones (Hsp90b1, Canx, Calr, Pdia4, P4hb) is upregulated by all UPR pathways. [3] Increased transcription of Atf6, Xbp1, Ern1 (IRE1), and Eif2ak3 (PERK) has been reported in UPR as well. [4] Increased Atf6 transcription may maintain cellular homeostasis when ATF6 is translocated to Golgi and cleaved. Increased Xbp1 transcription may be a similar response to its alternative splicing. However, upregulation of Ern1 and Eif2ak3 may not be a common feature of the corresponding IRE1 and PERK pathways, although reported in HeLa cells treated with thapsigargin. [4] IRE1 and PERK functions do not involve cleavage or degradation of these proteins.

Supplementary
Increased transcription of Creb3l1 in Het P5 OBs occurs in the same range of Col1a1 expression as ISR activation, but no effect of the mutation on Creb3l1 in Het E18.5 and Hom E18.5 OBs indicates that CREB3L1 is not essential for activating ISR in G610C OBs. Creb3l1 encodes an ATF6-like ER membrane sensor of ER disruption (CREB3L1/OASIS). Nearly perfect correlation with Col1a1 points to its transcriptional regulation. It is one of OI genes, which is known to regulate Col1a1. [5] CREB3L1 is cleaved like ATF6 upon ER disruption, [5] yet Creb3l1 transcription is not affected by the G610C mutation in E18.5 OBs despite particularly severe ER disruption and ISR in Hom cells. Figure 5. Correlation between expression of Serpinh1, Atf5, and Hspa9 in different cells. UMAP plots of all cells extracted from E18.5 femurs and tibia (top row) as well as P5 parietal bones (bottom row) are shown. Each dot represents an individual cell. The intensity of the red is proportional to logarithm of the UMI count. Serpinh1 encodes an HSP47 chaperone, transcription of which parallels that of collagens. [6] Therefore, Serpinh1 transcription reveals most collagen-producing cells in bone and surrounding tissues. Figure 6. Expression of key mitochondrial UPR (mt-UPR) genes. Dependence of the expression of genes known to be transcriptionally upregulated in canonical mt-UPR on the expression of Col1a1 (for Hspa9 and Atf5 see Fig. 3B). Similar to Fig. 3B and Supp. Fig. 4, the curves show 20-point running averages, black dots show gene upregulation with p < 0.01 within each running average window, and circles with error bars show mean expression in mOBs and standard deviation averaged over the full range of Col1a1. Only Asns increases with Col1a1, is highly upregulated in mOBs, and is therefore affected by the ER disruption upon G610C procollagen misfolding. Much smaller upregulation of Hspd1 (mt-HSP60) and Hspe1 (mt-HSP10) is independent of Col1a1 expression and therefore likely associated with extracellular effects of the secreted mutant procollagen that alters the ECM and cell-ECM interactions. Dnaja3 (mt-DNAJ) and genes encoding mt-UPR proteases (Clpp and Lonp1) do not appear to be upregulated. The few groups of cells with p < 0.01 in Hom E18.5 (5-10 Col1a1 spots) are likely stochastic outliers since ~ 5 outlier spots are expected due to multiple comparison effects on the statistical analysis (see Methods). However, low expression of the latter genes precluded their accurate quantification by scRNASeq (no UMI counts in many OBs). Overall, the observed mitochondrial response does not appear to be consistent with canonical mt-UPR.

Supplementary Methods: Normalization of Relative Expression in different RNA sequencing approaches
The problem of RNASeq data normalization has been extensively discussed (see, e.g., Ref. [7] and references therein). One of the key confounding issues underlying this problem is that NGS cDNA libraries are prepared and amplified by PCR, the efficiency and quality of which vary with the nucleotide sequence. As a result, the number of reads per cDNA fragment -which is the starting point for all analysis -may depend not only on the abundance of the corresponding mRNA but also on its sequence. This and other RNASeq technicalities discussed below are expected to have particularly strong effects on transcripts from low expression genes. Some of the data normalization procedures (e.g., LogNormalize and SCTransform for scRNASeq or DESeq2 for bulk RNASeq) attempt to address the resulting uncertainties in quantifying transcription of low expression genes through different mathematical means. However, these models have been developed based on RNASeq data from cell populations other than OBs. Therefore, we examined whether they are applicable to our study or other approaches to OBs were needed.

scRNASeq
In 10X Genomics scRNASeq and srRNASeq assays used for the present study, the PCR problem is partially resolved by utilizing 3' sequencing with UMIs (see manufacturer's protocols). Specifically, each cDNA is reverse transcribed from the 3' end of mRNA by capturing the mRNA vis hybridization with oligonucleotides that contain poly(dT) for the hybridization and unique molecular identifier (UMI) barcodes. This allows each mRNA molecule to be counted only once based on the UMI regardless of the number of copies produced by the PCR, reducing PCR artifacts. This partially solves the problem, at least for genes that produce enough mRNA for reliable reverse transcription and PCR amplification.
Nonetheless, 10X Genomics and all other UMI-based scRNASeq assays have another crucial limitation affecting data analysis. It is related to a finite number of the UMI-tagged oligonucleotides available for reverse transcription of mRNA from each cell, N UMI. In the 10X Genomics assay, NUMI is the number of oligonucleotides on each gel bead that is captured within an emulsion droplet with its cell. Because of the technicalities of the bead manufacturing process, NUMI is inherently not the same for different beads (and therefore cells) and may be highly variable. It is one of the factors contributing to the large variation in the sequencing depth per cell.
Ideally, the goal of scRNASeq would be to accurately measure the actual number of transcripts for each gene i (mRNAi) in each cell, but the current technology does not allow it. Indeed, when the total number of transcripts in a cell, mRNAcell becomes comparable to or exceeds NUMI, the relationship between the number of reverse transcribed UMIs per gene i (UMIi) and the number of reverse transcribed UMIs per cell (UMIcell = Σi UMIi) become nonlinear and dependent on NUMI. Not only NUMI varies from bead to bead, but mRNAcell varies from cell to cell. The resulting cell-to-cell variation in UMIcell (further confounded by the sequencing depth) would be difficult to account for even if mRNA hybridization with the oligonucleotides followed an equilibrium binding isotherm. (The equilibrium binding is not likely either given high activation energy and therefore practically infinite time needed for equilibrating mRNA hybridization.) In other words, the general relationship between the measured relative count

=
(1) and the actual fraction of the transcript in the cell (mRNAi /mRNAcell) is: (a) not linear, (b) not trivial, and (c) not known. Even if we were to assume that the subsequent PCR and sequencing were ideal, there would be no statistically rigorous way to account for this effect without establishing the efficiency of hybridization of different mRNAs with the oligonucleotides, the statistics of NUMI, and the statistics of mRNAcell. Because of their large number, highly expressed genes get to "choose" the oligonucleotides once the cell is lysed. For these genes, the approximation RCi ≈ (mRNAi /mRNAcell) may be reasonable. For low and potentially even moderate expression genes, it may not be the case. To avoid overinterpreting and/or misinterpreting scRNASeq data, it is crucial to at least acknowledge this limitation. With this in mind, %UMI used in the present study, provides the simplest possible approach to data normalization that accounts for all sources of variations in the sequencing depth (including NUMI) without attempting to parametrize the unknown relationship between RCi and (mRNAi /mRNAcell). A scaled relative count, is essentially the same approach implemented in Seurat, except it utilizes a scaling factor sf that can be set by the user (Seurat's default is sf = 10,000). For instance, %UMIi can be used for estimating differential expression of the gene i between cells that have similar overall mRNA transcription (mRNAcell), e.g., G610C OBs vs. WT OBs with the same level of Col1a1 mRNA. In this case, the unknown relationship between the measured RCi and the fraction of the transcript in the cell would be the same for the two types of cells and would not alter the conclusions. Unfortunately, the same cannot be guaranteed for other normalization procedures that parametrize the data and incorporate additional model assumptions and/or nonlinear transformations, e.g., SCTransform and LogNormalize procedures in Seurat. Regardless of potential benefits of these procedure for some applications, we do not find them to be justified for differential gene expression analysis in OBs and potentially other highly secretory cells.
A key feature of mature OBs (mOBs) is that collagen I mRNA (Col1a1 + Col1a2) overwhelms all other transcripts, accounting for up to 40% of all mRNA molecules detected by scRNASeq in the cell (Supp. Figure 7A). To keep up with the massive collagen synthesis, the cell must selectively ramp up transcription of multiple other genes (e.g., ribosomal proteins), resulting in at least ~ 2-fold increase in the total mRNA/cell during OB maturation (Supp. Figure 7B-D). In the end, mOBs may produce ~ 5 times more mRNA/cell than non-secretory cells (e.g., macrophages, Supp. Figure 7B). As noted above, this feature of mRNA transcription in mOBs does not affects the analysis based on nonparametric normalization with %UMIi (or sRCi), yet it is incompatible with models based on nonlinear transformations or existing parametrized models of data dispersion.
For instance, consider the SCTransform data parametrization model implemented in Seurat, which is rapidly gaining popularity. [8] In SCTransform, the data are normalized by extracting the normalization parameters from negative binomial regression of UMIi vs. UMIcell (followed by subsequent regularization to avoid data overfitting). Regardless of the general merits of ad hoc assumptions built into this approach, such parametrization may severely distort differential expression results for mOBs because it is based on regression across all cells rather than just the cells with the same mRNA transcription patterns (e.g., mRNAcell). As discussed above, just a change in the mRNAcell should already be expected to alter the relationship between UMIi and the actual transcript fraction in the cell (as well as between UMIi and UMIcell). In our opinion, the ~ 5-fold or larger difference in mRNAcell between OBs and other cells and at least 2-fold difference in mRNAcell within the OB subpopulation (Supp. Figure 7B) preclude utilization of SCTransform for analyzing differential expression of low and moderately expressed genes in our case. Figure 7. Transcription of mRNA in OBs. A. Col1a1+Col1a2 fraction of total mRNA molecules in mOBs (defined as OBs with Col1a1 > 3.5% UMI, see Fig. 2A). B. Dependence of the total mRNA/cell on the level of collagen synthesis (eOBs = Col1a1 ≤ 0.75% UMI, dOBs = 0.75% < Col1a1 ≤ 3.5 % UMI, see Fig. 2A). Cd68+ cells are monocytes (mostly macrophages) with Cd68 > 3 (UMI count). C. Correlation between the amount Rpl41 mRNA (encoding ribosomal protein L41 essential for collagen translation) and Col1a1 mRNA in mOBs. D. Correlation between Actg1 mRNA (encoding actin G1) and Col1a1 mRNA. A strong correlation between Col1a1 and gene transcripts involved in collagen synthesis combined with the lack of such correlation for other transcripts indicates that OBs selectively ramp up transcription of the former genes to support collagen production.

Supplementary
Note that E18.5 cells were used for panels B-D because of inherent limitations on NUMI in the 10X Genomics scRNASeq assay we utilized. In this assay, mRNAcell was not only larger than NUMI for mOBs but also close to NUMI even in those P5 cells that produced little or no collagen (~ 30,000 transcripts per average macrophage). As a result, we could detect the change in mRNAcell only in E18.5 cells (<10,000 transcripts per average macrophage and <20,000 transcripts per average eOB).

_____________________________________________________________________________________
Another approach implemented in various scRNASeq data analysis packages is logarithmic normalization. For instance, the default Seurat procedure is LogNormalize, in which the log-normalized gene expression (LNEi) for gene i is calculated from the raw UMIi count for the gene as e.g., to avoid ln(0). For differential gene expression analysis in Seurat v4 (FindMarkers function), this normalization involves another adjustable parameter pc (pseudocount) that can be set by the user (Seurat's default is pc = 1), which effectively replaces 1 in a logarithm similar to Eq. (4) when calculating mean fold-change in gene expression. The choice of sf and pc allows the user to suppress variations in LNEi for low expression genes, but fold-change in gene expression then becomes dependent on both sf and pc. Whatever the benefits of LogNormalize might be, RCi in our data set varies from RCi ~ 2•10 -5 at UMIi = 1 and UMIcell ~ 50,000 to RCi ~ 0.1 at UMIi ~ 500 and UMIcell ~ 5,000, causing strongly nonlinear dependence of fold-change in expression of most genes on sf and pc. Not only the results become highly dependent on the choice of sf and pc, but the effects of this choice become very different for genes with different expression level. The dependence on sf and pc can be eliminated only by setting sf >> max(pc•UMIcell), but that is an arbitrary choice as well. Without a validated justification for the choice of sf and pc that is based on objective criteria grounded in experimental facts rather than on the desired appearance of the results, such an approach may not be advisable. Unfortunately, we are not aware of such a justification.
Therefore, we normalized and reported all data as %UMI to avoid biasing the outcome of our analysis.

Bulk RNASeq
Like SCTransform, the DESeq2 approach claims to address the data normalization problem in bulk RNASeq through advanced mathematical modeling and regression procedures based on ad hoc assumptions (e.g., the negative binomial model). [9] In our opinion, utilization of this approach for special cell populations like OBs may not be advisable for essentially the same reasons as SCTransform. When combined with PCR-related effects and inherent lack of a linear relationship between expression of some genes and mRNAcell, large systematic rather than random variations in total mRNAcell (Supp. Figure 7B) may introduce even bigger problems for data parametrization. The assumptions built into DESeq2 are inconsistent with our data set and may thereby lead to unreliable analysis outcome.
In other common data normalization procedures used for traditional bulk RNASeq, e.g., FPKM and TPM, the PCR problem is ignored for the lack of a better solution. [10] FPKM and TPM are not based on data parametrization, but these approaches may also be problematic in our case because of PCR bias. Indeed, as shown in Supp. Figure 7A, mOBs -our main cells of interest -have disproportionately high content of Col1a1 and Col1a2 mRNA. The corresponding cDNA is rich in GC base pairs and repetitive sequences, presenting a challenge for PCR. Under these conditions, reduced transcription of Col1a1 and Col1a2 by Het vs WT cells (Figure 2) may introduce a systematic bias into the analysis.
To verify whether this is indeed the case, we compared expression of several housekeeping genes in Het and WT cells using just the minimally necessary normalization for the sequencing depth. First, we utilized our scRNASeq dataset for P5 OBs to choose and validate housekeeping genes that are truly independent of the OB differentiation stage. We chose this dataset because the cultured primary cells were extracted from the same P5 parietal bones as used for the scRNASeq experiments. Figure 8. Housekeeping genes. The plots show dependence of housekeeping gene expression in Het (yellow) and WT (green) cells on Col1a1 expression determined by scRNASeq of P5 parietal bone cells (20-cell running average calculated as described in Fig. 3B). Geometric Mean is the geometric mean for the 4 genes.

Supplementary
After examining ~ 50 housekeeping gene candidates previously studied by others, [11] we found that Actg1, Actb, Mrfap1, and Sdha met all the criteria (Supp. Figure 8). (a) Their expression was the same in Het and WT OBs at the same level of Col1a1 expression. (b) Their %UMI remained constant at low Col1a1 expression and decreased at high Col1a1 expression in mOBs, which was expected for genes unrelated to collagen synthesis once the total mRNA/cell began to increase due to increasing Col1a1 transcription. (c) The relative dispersion of %UMI for these genes was within the average range for OBs.
We then confirmed the Het vs. WT bias in bulk RNASeq by observing that the geometric mean of the depth-normalized expression for Actg1, Actb, Mrfap1, and Sdha in bulk RNASeq was noticeably higher in Het vs WT cells (Supp. Figure 9, last panel). Like the genes themselves, the geometric mean %UMI for these genes exhibited no difference in %UMI between Het and WT at the same level of Col1a1 expression (Supp. Figure 8). We also observed the same bias for each of the 4 genes individually (Supp. Figure 9). As noted above, this bias could be caused by higher PCR quality and efficiency because of the lower average Col1a1 mRNA content in Het cells (Supp. Table 1). The bias could also be a simple arithmetic consequence of lower average mRNAcell in Het vs. WT and therefore higher normalized counts for all genes in Het OBs.
Supplementary Figure 9. Bias in housekeeping gene expression after sequencing depth normalization of bulk RNASeq. The plots show dependence of housekeeping gene expression in primary cells isolated from P5 parietal bones on days in culture determined by bulk RNASeq (expression was normalized to sequence depth and scaled). Circles, squares, and triangles show 3 Het (yellow) and 3 WT (green) replicates. Error bars in the geometric mean plot show standard deviation for the 3 replicates. The observed bias is caused by lower average Col1a1 expression in Het vs. WT cells. Unlike scRNASeq in Supp. Figure 8, this bias cannot be removed by data stratification based on Col1a1 expression in individual cells since such information is not available in bulk RNASeq.
To resolve this bias problem, we utilized a well-tested empirical solution known since the early days of qPCR, which is normalizing the data with the geometric mean of housekeeping gene expression. Specifically, we divided the sequencing-depth-normalized counts for the genes of interest by the geometric mean of the housekeeping gene counts, yielding expression values relative to the housekeeping genes. By utilizing the same, validated Actg1, Actb, Mrfap1, and Sdha genes, we could then compare the expression of various genes in Het and WT cells similar to the ΔΔCT approach in qPCR. Additional normalization for the gene length (akin to FPKM) could be utilized as well, but it was not necessary in the context of our study since we were interested only in comparing relative expression of the same gene in Het and WT rather than in comparing relative expression of different genes. Given that this normalization was not needed and that it would be based on a questionable assumption of similar PCR efficiency across the gene length and between different genes, we decided to avoid it.