Novel gene variants in patients with platelet‐based bleeding using combined exome sequencing and RNAseq murine expression data

Essentials
• Identifying genetic variants in platelet disorders is challenging due to their heterogeneous nature.
• We combine WES, RNAseq, and Python-based bioinformatics to identify novel gene variants.
• We find novel candidates in patient data by cross-referencing against a murine RNAseq model of thrombopoiesis.
• This innovative combined bioinformatic approach provides novel data for future research in the field.


| INTRODUCTION
The UK Genotyping and Phenotyping of Platelets (GAPP) 1 study previously reported a cohort of 55 patients with a significant clinical history of bleeding, recruited from hemophilia centers nationwide, who underwent whole exome sequencing (WES). 2 In this group, candidate variants and a genetic diagnosis of platelet-based bleeding were provided in 40 patients, yielding a detection rate of 72.7% in this initial cohort using the "manual" filtering method of Johnson et al 2 (Figure 1A).
More recently, the GAPP study has expanded the number of patients with available WES data to 129 individuals with evidence of clinical bleeding. Of these, 55 individuals have had a definitive genetic diagnosis to identify novel disease-causing genes (eg, SLFN14). 3 Here, we assess these individuals through a series of iterative bioinformatic filtering approaches to identify novel and "known" candidate disease-causing variants.

| Patient recruitment and testing
Patients were consented and recruited to the GAPP study from multiple collaborating hemophilia centers across the United Kingdom and Ireland, as previously described. 1 The study was approved by the UK National Research Ethics Service through the Research Ethics Committee of West Midlands (06/MRE07/36). Peripheral blood was collected from patients, and platelet phenotyping using lumi-aggregometry or flow cytometry was performed on platelet-rich plasma, as previously described. 6,7 Our study cohort consisted of 129 patients with a strong history of bleeding who were suspected of having a platelet function disorder of unknown cause, as previously described. 1,2 The ISTH Bleeding Assessment Tool (BAT) showed a mean score of 9.825 overall (range 2-23). Platelet counts for the patient group ranged from 43 to 428 × 10^9/L, and the average platelet count was 232 × 10^9/L. The mean platelet volume in patients tested ranged between 8.3 fL and 15.1 fL.

| RNA Sequencing
RNA sequencing (RNAseq) data were provided by KRM and generated as reported in Machlus et al. 8 Briefly, two independent isolations of mouse megakaryocytes (MK; at the round MK, proplatelet, and preplatelet/releasate stages) were performed (C57BL/6, 1 male and 1 female pool, 4-8 mice). Sequencing and analysis were performed as described by Rowley et al, 9 using the USeq analysis package (applying DESeq's negative binomial test). 10,11 For analysis, a P value < .05 and a false discovery rate of 5% were used, to yield a total of 7094 (3235 up-regulated and 3859 down-reg-

| Analysis approaches
WES analysis was performed on patient genomic DNA, as previously reported. 2 To improve filtering of candidate genetic variants, further bioinformatic analysis was performed using Python in the PyCharm IDE. Exomes were first loaded into pandas data frames and then filtered to identify novel and rare nonsynonymous variants, as described by Johnson et al. As in previous work, "rare" was defined as a frequency below 0.0001 in the Exome Variant Server and the 1000 Genomes Project. These variants were cross-referenced against a list of genes previously implicated in inherited causes of bleeding. Relevant gene panels used were Inherited Bleeding WES to identify candidate variants forming the basis of future study in a significant number of undiagnosed patients.
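The filtering steps described above can be illustrated with a minimal sketch. It is not the study's code: it uses plain Python dictionaries rather than pandas so that it runs without dependencies, and the field names (`effect`, `evs_freq`, `kg_freq`), the set of effect labels treated as nonsynonymous, the placeholder gene symbols, and the example records are all hypothetical. Only the 0.0001 frequency cutoff comes from the text.

```python
# Frequency cutoff for "rare" variants, as stated in the text.
RARE_FREQ_CUTOFF = 0.0001

# ASSUMPTION: effect labels counted as nonsynonymous (hypothetical labels).
NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "splice_site"}


def is_rare(freq):
    """A variant absent from a database (None) is treated as novel, hence rare."""
    return freq is None or freq < RARE_FREQ_CUTOFF


def filter_variants(variants, bleeding_genes):
    """Keep rare or novel nonsynonymous variants and flag known bleeding genes."""
    kept = []
    for v in variants:
        if v["effect"] not in NONSYNONYMOUS:
            continue  # drop synonymous and other non-coding-change variants
        if not (is_rare(v["evs_freq"]) and is_rare(v["kg_freq"])):
            continue  # drop variants that are common in either database
        # Cross-reference against the panel of previously implicated genes.
        kept.append({**v, "known_bleeding_gene": v["gene"] in bleeding_genes})
    return kept


# Hypothetical example records; gene symbols are placeholders, not study data.
example_variants = [
    {"gene": "GENE_A", "effect": "missense", "evs_freq": None, "kg_freq": None},
    {"gene": "GENE_B", "effect": "missense", "evs_freq": 0.01, "kg_freq": 0.02},
    {"gene": "GENE_C", "effect": "frameshift", "evs_freq": 0.00005, "kg_freq": None},
    {"gene": "GENE_D", "effect": "synonymous", "evs_freq": None, "kg_freq": None},
]
example_panel = {"GENE_A"}  # hypothetical panel of known bleeding genes

candidates = filter_variants(example_variants, example_panel)
for c in candidates:
    print(c["gene"], c["effect"], c["known_bleeding_gene"])
```

In this sketch the panel lookup annotates variants rather than discarding them, so that variants in genes outside the panel remain available as potential novel candidates; whether the original pipeline annotated or split the variants at this step is not stated in the text.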

KEYWORDS
bleeding, platelets, megakaryocytes, genetics, RNAseq

and further details are available on request.

proplatelet-forming MK population (y-axis, Figure 2C) and the proplatelet-forming MK versus released proplatelets (x-axis, Figure 2C).

| RESULTS AND DISCUSSION
Therefore, these data represent an array of genes that are up- and down-regulated throughout thrombopoiesis.
We next filtered this expression data set against the array of

We use a murine MK model of thrombopoiesis as a model RNAseq data set against which we filter genomic data from our cohort of patients with bleeding of unknown cause. To our knowledge, this is the first attempt to interrogate a clinical cohort of patients with bleeding using a combined WES and RNAseq bioinformatics approach. Murine models of thrombopoiesis are well established and extensively used in the study of platelet-based bleeding because of the high degree of homology between the species; nevertheless, restricting our analysis to this data set is a likely limitation of the study. Future work will focus on applying this approach in parallel to human models of thrombopoiesis, including CD34- and iPSC-derived MKs.
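The cross-referencing of patient variants against the murine expression data can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the field names (`p_value`, `fdr`, `log2_fold_change`), the placeholder gene symbols, and the example values are hypothetical, and mouse genes are matched to human genes simply by upper-casing the symbol, which is a crude stand-in for a proper ortholog mapping. The P value and false discovery rate cutoffs are the ones given in the RNA sequencing section.

```python
# Cutoffs stated in the RNA sequencing section.
P_CUTOFF = 0.05
FDR_CUTOFF = 0.05


def differentially_expressed(rows):
    """Map each gene passing both cutoffs to 'up' or 'down'.

    ASSUMPTION: mouse symbols are upper-cased to approximate human symbols;
    a real analysis would use a curated ortholog table.
    """
    result = {}
    for row in rows:
        if row["p_value"] < P_CUTOFF and row["fdr"] < FDR_CUTOFF:
            direction = "up" if row["log2_fold_change"] > 0 else "down"
            result[row["gene"].upper()] = direction
    return result


def cross_reference(patient_variants, de_genes):
    """Keep patient variants whose gene changes expression in the MK data."""
    hits = []
    for v in patient_variants:
        direction = de_genes.get(v["gene"].upper())
        if direction is not None:
            hits.append({**v, "mk_expression_change": direction})
    return hits


# Hypothetical expression rows and patient variants (placeholders only).
mouse_rows = [
    {"gene": "Gene_a", "p_value": 0.001, "fdr": 0.01, "log2_fold_change": 2.1},
    {"gene": "Gene_b", "p_value": 0.2, "fdr": 0.4, "log2_fold_change": -0.3},
    {"gene": "Gene_c", "p_value": 0.01, "fdr": 0.03, "log2_fold_change": -1.5},
]
patient_variants = [
    {"patient": "P1", "gene": "GENE_A"},
    {"patient": "P2", "gene": "GENE_B"},
    {"patient": "P3", "gene": "GENE_C"},
]

de_genes = differentially_expressed(mouse_rows)
hits = cross_reference(patient_variants, de_genes)
for h in hits:
    print(h["patient"], h["gene"], h["mk_expression_change"])
```

Matching on gene symbol keeps the example short, but symbol-based matching between species will miss orthologs with different names, which is one concrete form of the species limitation noted above.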
By applying our methods, we generated a large pool of candidates that will become the focus of future studies as we attempt to dissect a well-studied but poorly understood cohort of patients.
Further study is needed to definitively prove the involvement of candidate genes and as such, future work will focus on the re-recruitment of patients for mechanistic investigation.

CONFLICT OF INTEREST
The authors report no conflicts of interest.

Note: Individuals with established platelet defects (secretion and thrombocytopenia) and significant BAT scores are reported with newly uncovered candidate mutations in genes expressed in megakaryocytes, but previously unstudied. Frequencies reported according to the latest data on gnomAD (https://gnomad.broadinstitute.org/).