Transcriptional characterization of human megakaryocyte polyploidization and lineage commitment

Megakaryocytes (MKs) originate from cells immuno‐phenotypically indistinguishable from hematopoietic stem cells (HSCs), bypassing intermediate progenitors. They mature within the adult bone marrow and release platelets into the circulation. Until now, there have been no transcriptional studies of primary human bone marrow MKs.


| INTRODUC TI ON
Megakaryocytes (MKs) comprise <0.01% of all nucleated cells in the human bone marrow. These are large, fragile, up to 150 μm in diameter, and highly polyploid (average 16 N) cells. 1 The current knowledge about their transcriptional landscape is largely based on gene expression array data of in vitro differentiated MKs cultured from CD34+ cells obtained from fetal liver, cord blood, and adult blood; all having an average ploidy of 2 N and with severely constrained capacity for platelet production. [2][3][4][5][6] It is therefore likely that some drivers of differentiation, ploidy change, and maturation of in vivo MKs are yet to be identified.
MK differentiation from hematopoietic stem cells (HSCs) via the classic route of intermediate progenitors has been challenged and been shown that they can originate directly from the multipotent HSC compartment as the first lineage bifurcation. [7][8][9][10][11] Lineage tracing experiments have established a direct HSC origin, independent from other lineages, for at least one-half of all MKs. 12 MK-primed HSCs have been identified in mouse by their expression of genes for the coagulation protein von Willebrand factor 13 and glycoprotein (GP) 2b (CD41) of the integrin receptor for fibrinogen (integrin alpha2-beta). 14 The primary physiological function of MKs is thrombopoiesis, where each cell produces up to 6000 platelets that play a pivotal role in hemostasis. During steady-state thrombopoiesis, the circulating platelet mass is maintained constant and an inverse relationship between platelet count and volume exists. 15 Platelet consumption results in the acute release of larger platelets to maintain overall circulating platelet mass and aberrations in platelet count and function result in thrombotic or bleeding disorders. 16 Evidence suggests that stemlike MK-committed progenitors are activated upon acute inflammatory stress. 17 Moreover, sepsis drives changes in transcription and translation in platelets and transcriptional changes in murine MKs. 18 Stress thrombopoiesis is also thought to occur in myocardial infarction, based on the observation of elevated mean platelet volume and reticulated platelet fraction in the acute setting suggesting an upregulation of the HSC-MK axis. 19 Here, we characterize MKs and HSCs from human bone marrow, down to single-cell resolution, by full-transcript RNA sequencing.
Primary MK transcriptome analysis showed that low ploidy states have a high expression of archetypal platelet genes, which are then downregulated as the ploidy increases in favor of genes implicated in translation and protein localization. The origin of MKs was also investigated by performing single-cell RNA sequencing of human bone marrow HSCs from the same individuals. Within the HSC compartment, we identified two subpopulations of cells representing MK-primed HSCs. Finally, by sequencing MKs from individuals undergoing coronary artery bypass grafting following myocardial infarction, we showed changes in primary MK gene expression supporting a role for stress thrombopoiesis in this pathological state.

| MK and HSC isolation
Bone marrow for HSC and MK sorting was obtained from individuals undergoing cardiac surgery at Barts Health NHS Trust, London, after informed consent and ethical approval from London -City & East, REC 13/LO/1760 (BAMI Platelet Sub-study). Inclusion criteria transcriptional signatures of a single HSC, we identify two MK-biased HSC subpopulations exhibiting unique differentiation kinetics. We show that human bone marrow MKs originate from these HSC subpopulations, supporting the notion that they display priming for MK differentiation. Finally, to investigate transcriptional changes in MKs associated with stress thrombopoiesis, we analyzed bone marrow MKs from individuals with recent myocardial infarction and found a specific gene expression signature.
Our data support the modulation of MK differentiation in this thrombotic state.
Conclusions: Here, we use single-cell sequencing for the first time to characterize the human bone marrow MK transcriptome at different levels of polyploidization and investigate their differentiation from the HSC.

K E Y W O R D S
megakaryocytes, hematopoietic stem cells, platelets, single cell RNA-seq, thrombosis

Essentials
• Full transcript single cell RNA sequencing of primary human megakaryocytes and hematopoietic stem cells.
• Megakaryocyte of different ploidy levels exhibit distinct transcriptional states.
• Two megakaryocyte-biased hematopoietic stem cell populations were identified.
• Megakaryocytes from individuals with myocardial infarction exhibit a specific gene signature.
were: (1) heart valve replacement with no evidence of coronary artery disease on coronary angiography or (2) coronary artery bypass grafting in the context of recent myocardial infarction (within 6 months). Myocardial infarction was defined as acute presentation with chest pain, deviation of ST segments on electrocardiogram (either ST elevation myocardial infarction or non-ST elevation myocardial infarction), and rise in levels of troponin T assay. Exclusion criteria were: presence or history of hematological malignancy and abnormal platelet count and hemoglobin levels <85 g/L. Bone marrow scrapings following median sternotomy were collected into an ethylenediaminetetraacetic acid Vacutainer tube containing Dulbecco's phosphate buffered saline containing 10% human serum albumin.

| RNA sequencing and computational analysis
Single-cell libraries were prepared using the G&T-seq protocol 20  Adapters were trimmed <32 bp using TrimGalore! Reads were mapped to the human reference genome (GRCh37) using STAR. 21 Quality metrics were assessed for each sample. Expression in all cases was normalized to library size and ERCC spike-ins.
For HSCs, low-quality cells were filtered using a previously described support vector machine (SVM) approach 22 and HSC cDNApositive for GAPDH. For MKs, filtering of low-quality cells was performed using the same features, but in a five-round training of random forest models. 23 The most highly variable genes were filtered above technical noise using the Scater package 24 using previously described methods. 25 ERCC spike-ins were used to model the trend in technical variability. 26 For both HSC and MK single cells, unsupervised clustering of single cells was performed in a principal component analysis (PCA) with robustness of clusters determined by the Silhouette index. 27 Using SC3, cluster marker genes were identified based on their predictive value to separate each cluster from the rest with a p value (<0.001) determined using the Wilcoxon signed-rank test. 28 Cells were ordered into differentiation trajectories using the Monocle 2 single-cell analysis toolset 29,30 using five clusters, a maximum of three dimensions, and otherwise default parameters.
To assess expression of known hematopoietic gene signatures within single-cell clusters, gene signatures for each cell type in the DMAP dataset 31 were created using the LIMMA package. 32 Genes that were considered to be part of the gene signature log2 fold change of >1 and false discovery rate (FDR) <0.05. DESeq2 33 was used to find genes that were differentially expressed between different datasets in single MKs and MK pools with comparisons made between 4 N and 32 N ploidy levels or between disease and non-CAD control. Here, the Wald test was used for hypothesis testing.

| Data-sharing statement
First, we inspected the highly expressed genes. Among those, we found enrichment for mitochondrial and cell metabolism genes such as MT-RNR1, MT-RNR2, and CYTB and the gene encoding cyclooxygenase-1 (PTGS1), the protein acetylated by aspirin (Table S2), as previously described in platelet transcriptome studies. 38 In the 100 highest expressed genes from MK pools sorted by ploidy level, we observed largely a bimodal distribution in expression, 15 genes were mostly expressed in 4 N and 8 N MKs and were enriched GO terms related to platelet function (Table S3- GO terms enrichment analysis found GO terms associated with platelet degranulation, coagulation, hemostasis, wound healing, and vesicle-mediated transport in the lower ploidy MK transcriptome, whereas, with increasing polyploidization, we found GO terms related to translational initiation, elongation and termination, protein localization, and cellular protein complex disassembly ( Figure 2B and Tables S13, S14). Comparable results were found in the single-cell differential gene expression/GO analysis (Tables S15, S16).
To investigate signals from the bone marrow niche that might contribute to MK maturation in vivo, we annotated genes upregulated with accumulating ploidy in terms of their subcellular localization using the Ensembl database 39 (Table 1)

| MK lineage priming in ex vivo human bone marrow HSCs
To investigate the origin of MKs and gain insight into megakaryopoiesis in vivo, we characterized the HSC transcriptional landscape and early fate commitment events. We sequenced 884 single phenotypic HSCs (CD34+ CD38-CD45RA-CD90+ CD49f+ [ Figure S2], the compartment most enriched in long-term repopulating HSCs 9 ) isolated from fresh bone marrow harvested from a further five individuals undergoing sternotomy for heart valve replacement (Table S17). We demonstrated that these phenotypic HSCs do contain cells able to confer long-term hematopoietic reconstitution (Supplementary Methods and Table S19). From the sequenced HSCs, raw reads were filtered to exclude cells that gave low-quality libraries ( Figure S3) with a SVM learning method trained on a subset of samples made from cDNA positive for GAPDH, as measured by quantitative reverse transcriptase polymerase chain reaction (Table S18). Nineteen single HSCs were taken forward for transcriptome analysis based on quality control measures.
To identify subpopulations of cells within the HSC compartment, we used unsupervised hierarchical clustering of the cells' Pearson correlation coefficients in the PCA space to generate clusters ( Figure 3A) where total silhouette score was used to select the number of clusters tested ( Figure S4A). The robustness of our clustering strategy was confirmed by inspecting the first four principal components ( Figure S4B,C). The cells within the HSC population formed five distinct clusters. Marker genes for each cluster were then identified based on their predictive value to separate that cluster from F I G U R E 2 Development and polyploidization of bone marrow residing MKs. (A) Number of genes expressed related to specific GO terms at each MK ploidy level; data shown based on 100 most abundantly expressed genes. GO terms: platelet activation, blood coagulation, response to wounding, translation, protein localization. The full GO analysis may be found in Table S10-S14. (B) Two distinct transcriptional states in MK differentiation. Upper panel: Overrepresented GO terms in downregulated and upregulated genes with increasing ploidy, data shown are based on differential expression analysis between 32 N and 4 N MK 20 cell pools. Full GO analysis may be found in Tables S13,S14. GO, gene ontology; MK, megakaryocyte

| Developmental trajectories of early MK lineage priming in HSCs
To investigate if HSCs exhibited lineage priming potential, we performed trajectory analysis, independent of the previous clustering, using Monocle2. 30 We obtained potential developmental trajectories and five clusters, very similar to the previous ones, were found. This analysis showed two distinct branching points, with clusters 1 and 4 stemming from the same developmental trajectory ( Figure 4A).
To determine whether these trajectories could delineate early fate priming in HSC, we derived gene signatures from the highly purified hematopoietic populations in a previously published study 31 and visualized these signatures onto the trajectory plot. The signature for MK was found enriched in cells localized around branching point 2 and belonging to clusters 1 and 4 ( Figure 4B), whereas the gene expression signature for the megakaryocyte erythroid progenitor (MEP) was found enriched only in cells belonging to cluster 1 after the branching point 2 ( Figure 4C). These signatures are largely not overlapping and no single gene appears to drive enrichment ( Figure S5).

| Functional potential of MK primed clusters
We then retrospectively analyzed FACS index data and found that
We also compared MK transcriptional signatures in myocardial infarction as a model for stress thrombopoiesis with non-CAD controls. Despite finding no significant differences in platelet parameters and MK ploidy in myocardial infarction, our data showed that a number of genes upregulated in MKs from individuals with severe coronary disease and myocardial infarction compared with non-CAD controls were related to thrombus formation, such as PPBP, the gene encoding chemokine CXCL7, which plays a critical role in leukocyte migration through thrombi by binding its receptor CXCR1/2 on neutrophils and other leukocytes. 59,60 Similar higher levels of thrombospondin 1 (THBS1) have been found to be associated with peripheral artery disease and RAP1B that encodes a small GTP-binding protein involved in outside-in signalling via the collagen receptor integrin alpha2-beta1 (CD49b/29) on platelets; it also plays an important role in the cross-talk between alpha2-beta1 and integrin alphaIIb/ beta3 (CD41/61), the receptor for fibrinogen on MKs and platelets. 61 Hence, RAP1B has been suggested as a potential therapeutic target for a novel class of platelet inhibitors in myocardial infarction. 62 There is now compelling evidence for enhanced megakaryopoiesis as a pathogenic driver for atherosclerosis and myocardial infarction. 63,64 The clinical trial CANTOS 65 demonstrated improved cardiovascular outcomes in patients with myocardial infarction with Canakinumab, a monoclonal antibody targeting IL1B, a known driver of megakaryopoiesis both in vitro and in vivo. 66 We found that the gene encoding the AMPA glutamate receptor, GRIA1, was upregulated in MKs in individuals with myocardial infarction. As glutamate serum levels are increased in thrombosis 67 and interruption of glutamate binding to the NMDA glutamate receptor in megakaryocyte cell lines resulted in impaired megakaryopoiesis 68 ; this observation raises the possibility of a positive feedback mechanism of glutamate signaling leading to increased platelet production perpetuating a prothrombotic state. Therefore, our data also support a pathological role of stress thrombopoiesis in acute coronary thrombosis.
Limitations of this transcriptional study should be acknowledged, including the baseline characteristics of the patient populations.

CO N FLI C T O F I NTE R E S T
There are no conflicts of interest to declare.