A comprehensive targeted next‐generation sequencing panel for genetic diagnosis of patients with suspected inherited thrombocytopenia

Abstract
Background: Inherited thrombocytopenias (ITs) are a heterogeneous group of disorders characterized by low platelet counts and often disproportionate bleeding, with over 30 genes currently implicated. Previously, the UK-GAPP study, using whole exome sequencing (WES), identified a pathogenic variant in 19 of 47 (40%) patients, of whom 71% had variants in genes known to cause IT. Aims: To employ a targeted next-generation sequencing platform to improve the efficiency of diagnostic testing and reduce overall costs. Methods: We have developed an IT-specific gene panel as a pre-screen for patients prior to WES using the Agilent SureSelect QXT transposon-based enrichment system. Results: Thirty-one patients were analyzed using panel-based sequencing, of whom 10% (3/31) had a classified pathogenic variant, 16% (5/31) a likely pathogenic variant, 51% (16/31) variants of unknown significance, and 23% (7/31) either no variant or a benign variant. Discussion and Conclusion: Although the impact of the genetic variants requires further clarification, an IT-specific next-generation sequencing panel is a viable method of pre-screening patients for variants in known IT-causing genes prior to WES. With the added benefit of distinguishing IT from idiopathic thrombocytopenic purpura (ITP), and the potential to identify variants in genes known to predispose to hematological malignancies, it could become a critical step in improving patient clinical management.

KEYWORDS
bleeding, gene mutations, targeted panel sequencing, thrombocytopenia

Essentials
• Inherited thrombocytopenias are a heterogeneous group of disorders with over 30 causative genes identified to date.
• We have developed an IT-specific gene panel to screen patients using the rapid Agilent SureSelect QXT transposon-based enrichment system.
• Candidate variants in genes previously implicated in IT were observed in 77% of individuals; 10% of patients had a classified pathogenic variant, 16% a likely pathogenic variant, 51% a variant of unknown significance, and 23% no variant or a benign variant.
• Accurate genetic diagnosis could improve the clinical outcome for this group of patients, who bleed disproportionately for their reduced platelet count.


| INTRODUCTION
Inherited thrombocytopenias (ITs) are a heterogeneous group of disorders characterized by a sustained reduction in platelet count, often manifesting as a bleeding diathesis. Since the inheritance patterns of disorders such as Bernard-Soulier syndrome (BSS) were described, genetic studies of thrombocytopenia have been a vital tool for understanding megakaryocyte and platelet physiology. 1 As a result of parallel whole exome and whole genome sequencing over the past 5-10 years, increasing numbers of novel genes and variants with a critical role in platelet production, physiology, and function are being discovered. [2][3][4][5] To date, 30 genes are suspected to cause 26 separate forms of inherited thrombocytopenia, making genetic diagnosis complex. 6 Until recently, however, IT remained underdiagnosed, with previous studies providing a genetic diagnosis in just over 50% of individuals. [7][8][9] A genetic diagnosis provides clinical benefits for patients. Some patients with a reduced platelet count have received unnecessary treatments and procedures, such as immunosuppression and splenectomy; establishing that their disease has an inherited component would prevent this. Suspected ITP, for example, may be treated with steroids or immunosuppressive drugs that have many side effects, and these treatments are unnecessary if the patient is proven to have an inherited thrombocytopenia. Variants in some genes, e.g., RUNX1, predispose patients to hematological malignancies; once a genetic defect is proven, this information can be used to monitor the patient's hematological parameters more closely. These considerations all highlight the need for a definitive genetic diagnosis, and a targeted gene-specific sequencing platform will provide quick and cost-effective screening for patients with IT.
As new sequencing library-capture methods are developed, sample preparation time is vastly reduced. The recently released capture methods Illumina Nextera Rapid Custom Capture Enrichment and Agilent SureSelect QXT both claim to improve sample preparation without compromising sequence depth, coverage, or accuracy. 10 When applied to small-scale custom gene panels, the preparation time can be reduced to one day. In addition, the required DNA input is reduced, allowing amplification from <50 ng of DNA. 11 Given the high percentage of variants within known IT genes identified by whole exome sequencing (WES) in a previous study, 12 and the continuing advances in custom-panel next-generation sequencing, an IT-specific next-generation sequencing (NGS) panel was designed and included within the UK-GAPP patient workflow.
Incorporating a small custom panel prior to WES has the potential to filter out patients whose disease has a genetic etiology within known IT-causing genes. Coupled with the Agilent SureSelect QXT transposon-based system of sample preparation, this could increase the efficiency of genetic diagnosis and reduce the overall cost.
Therefore, in this study we aimed to implement an NGS panel in the UK-GAPP patient workflow. The panel was designed to incorporate all genes previously associated with IT, effectively pre-screening patients before WES. The targeted panel also takes advantage of a rapid sample preparation technique, allowing quick genetic diagnosis following patient phenotyping and improving the overall diagnosis of recruited patients.

| Patients
Patients were recruited from participating UK hematology centers.
All patients had a bleeding history taken at the point of examination and inclusion into the study. Most patients had mild bleeding symptoms, including cutaneous bruising, bleeding, and epistaxis, and some also had more severe bleeding symptoms. Available details of each patient's bleeding-related clinical symptoms are shown in Table 1.

| Platelet counts and morphology
Platelet counts and morphology were measured in whole blood from all patients (n = 31) using the Sysmex XN-1000. The PLT-F channel was used to determine the whole-blood platelet count and the immature platelet fraction (IPF). Mean platelet volume (MPV) was determined from the impedance PLT-I channel. All samples were processed alongside travel controls.
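As a rough illustration of how these analyzer read-outs relate to the macro/micro annotations used for the patient data, the sketch below flags a result against the reference intervals quoted with that table (platelet count 147-327 × 10⁹/L, MPV 7.8-12.69 fL, IPF 1.3-10.8%, each mean ± 2 SD from 40 controls). It is a hypothetical helper: the function and field names are invented, and treating the ± 2 SD limits as hard cut-offs is an assumption rather than the study's stated criterion.

```python
from dataclasses import dataclass
from typing import Optional

# Reference intervals (mean +/- 2 SD, n = 40 controls) quoted with the patient data.
PLT_RANGE = (147.0, 327.0)  # platelet count, x10^9/L
MPV_RANGE = (7.8, 12.69)    # mean platelet volume, fL
IPF_RANGE = (1.3, 10.8)     # immature platelet fraction, %


@dataclass
class PlateletResult:
    plt: float                   # PLT-F platelet count, x10^9/L
    mpv: float                   # PLT-I mean platelet volume, fL
    ipf: Optional[float] = None  # IPF %, None when not available


def flag(value: float, low: float, high: float) -> str:
    """'-' below, '+' above, '' within the reference interval."""
    if value < low:
        return "-"
    if value > high:
        return "+"
    return ""


def summarize(r: PlateletResult) -> dict:
    """Hypothetical summary: low count plus MPV/IPF flags relative to controls."""
    thrombocytopenic = r.plt < PLT_RANGE[0]
    return {
        "thrombocytopenic": thrombocytopenic,
        # With a low count, '+' suggests macro- and '-' microthrombocytopenia
        "mpv_flag": flag(r.mpv, *MPV_RANGE),
        "ipf_flag": None if r.ipf is None else flag(r.ipf, *IPF_RANGE),
    }
```

For example, a count of 88 × 10⁹/L with a toy MPV of 13.1 fL and IPF of 20% is flagged as thrombocytopenic, with "+" for both MPV and IPF.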

| Platelet preparation and platelet function testing
This study focuses on a subset of patients with a reduced platelet count. Previous studies by the UK-GAPP study group have demonstrated the applicability of light transmission aggregometry (LTA), including lumiaggregometry, for investigating platelet-rich plasma (PRP) samples with platelet counts exceeding 1 × 10⁸/mL, 13 and of an in-house flow cytometry assay for assessing platelet function in patients with PRP platelet counts below 1 × 10⁸/mL. 12
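The threshold of 1 × 10⁸/mL is equivalent to 100 × 10⁹/L, the unit used for platelet counts elsewhere in this paper. A minimal sketch of the resulting assay choice, assuming the value supplied is the PRP platelet count and that a count exactly at the threshold goes to flow cytometry (the text does not specify this case); the function names are hypothetical.

```python
# PRP platelet-count threshold quoted in the text for choosing an assay.
LTA_MIN_PER_ML = 1e8


def per_ml(count_e9_per_l: float) -> float:
    """Convert a count in 10^9/L into platelets per mL.

    10^9 per litre = 10^9 per 1000 mL = 10^6 per mL.
    """
    return count_e9_per_l * 1e6


def choose_function_assay(prp_count_e9_per_l: float) -> str:
    """Pick a platelet function assay from the PRP platelet count (10^9/L)."""
    if per_ml(prp_count_e9_per_l) > LTA_MIN_PER_ML:
        return "LTA with lumiaggregometry"
    # At or below the threshold: assumed to go to the flow cytometry assay
    return "flow cytometry"
```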

| Thrombocytopenia-specific panel sequencing
A thrombocytopenia panel was designed for use as an initial NGS pre-screen before whole exome sequencing, in collaboration with the Regional Genetics laboratory at Birmingham Women's Hospital. Sequencing baits were designed at 2× density so that each target region was covered by at least two overlapping probes. Baits were also designed with the strictest masking stringency possible. SureDesign masks repetitive sequences using three masking tools: RepeatMasker, WindowMasker, and the Uniqueness 35 track. The design software combines all three tools to create three masking stringencies, which differ in how many repeat regions they include. Where baits could not be designed at the highest stringency, stringency was decreased until they could be.
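The stepwise relaxation of masking stringency described above is a simple fallback loop. The sketch below is hypothetical: `find_baits` stands in for the design software's bait search (SureDesign is used interactively and is not assumed to expose such a function), and the level names are illustrative. A gene whose profile contains only the strictest level corresponds to the group covered entirely at the highest stringency.

```python
from typing import Callable, Iterable, List, Optional, Set, Tuple

# Masking levels, strictest first. Labels are illustrative, not a SureDesign API.
STRINGENCIES = ("most stringent", "moderately stringent", "least stringent")

# Hypothetical bait finder: returns baits for (region, level), or None/[] if none fit.
BaitFinder = Callable[[str, str], Optional[List[str]]]


def design_region(region: str, find_baits: BaitFinder) -> Tuple[Optional[str], List[str]]:
    """Use the strictest masking that yields baits; relax only when none are found."""
    for level in STRINGENCIES:
        baits = find_baits(region, level)
        if baits:
            return level, list(baits)
    return None, []  # region could not be covered at any stringency


def gene_stringency_profile(regions: Iterable[str], find_baits: BaitFinder) -> Set[str]:
    """Set of stringency levels needed to cover all target regions of one gene."""
    used = set()
    for region in regions:
        level, _ = design_region(region, find_baits)
        if level is not None:
            used.add(level)
    return used
```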
Eighteen genes were covered entirely using the highest stringency setting, eight genes by a combination of high and moderate stringency settings, and the remaining four genes by baits using a combination of all three stringency settings.
Dynabeads MyOne Streptavidin T1 magnetic beads were used for hybrid capture (ThermoFisher, #65601). Index tags were added using the SureSelect QXT P7 and P5 dual indexing primers. All thermocycling steps were performed using a Bio-Rad DNA Engine Tetrad 2 Thermal Cycler (Bio-Rad, UK). Magnetic separation was achieved using a DynaMag-96 Side magnet (ThermoFisher).
Samples were then pooled for multiplexed sequencing so that each index-tagged sample was in equimolar amounts in the pool.
For each sample, the following formula was used to determine the volume of indexed sample to add to the pool, where
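Equimolar pooling is commonly calculated as V_i = V_f × C_f / (n × C_i), where V_f and C_f are the desired final pool volume and concentration, n is the number of indexed samples, and C_i is the measured concentration of sample i. The sketch below uses this standard form as an assumption; it may differ in detail from the formula applied in the study, and the sample names and units (µL, nM) are hypothetical.

```python
from typing import Dict, Tuple


def pooling_volumes(conc_nM: Dict[str, float],
                    final_volume_ul: float,
                    final_conc_nM: float) -> Tuple[Dict[str, float], float]:
    """Volume of each indexed library giving an equimolar pool, plus buffer volume.

    V_i = V_f * C_f / (n * C_i), so each of the n libraries contributes
    an equal share (V_f * C_f / n) of the total molar amount.
    """
    n = len(conc_nM)
    volumes = {s: final_volume_ul * final_conc_nM / (n * c) for s, c in conc_nM.items()}
    buffer_ul = final_volume_ul - sum(volumes.values())
    if buffer_ul < 0:
        raise ValueError("libraries too dilute for the requested pool volume/concentration")
    return volumes, buffer_ul
```

For example, two libraries at 10 nM and 20 nM pooled to 20 µL at 4 nM need 4 µL and 2 µL respectively, topped up with 14 µL of buffer, so each contributes 40 fmol.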

| Bioinformatics pipeline to determine candidate variants
Sequence data generated using the IT-specific NGS panel were analyzed using an adaptation of the pipeline developed for the analysis of WES data. 12 Variants were first filtered on frequency, excluding variants with a MAF ≥0.01 in the 1000 Genomes (1000-G) database. Synonymous variants, which do not change the amino acid sequence of protein-coding transcripts, were then excluded. As the panel was designed to include the 5′ and 3′ UTRs, variants were additionally

Average platelet count = 88 × 10⁹/L (normal range, mean ± 2 SD, 147-327 × 10⁹/L, n = 40). Average MPV = 11.1 fL (normal range, mean ± 2 SD, 7.8-12.69 fL, n = 40). IPF was available for 20 patients and varied between 1.8% and 59.4% (normal range 1.3-10.8%, n = 40). Patients with an observed macro- or microthrombocytopenia are denoted by a + or -, respectively, following their most recently analyzed MPV. Secondary qualitative defects are abbreviated as follows: (CD41) reduction in resting cell surface levels of CD41; (CD42b) reduction in resting cell surface levels of CD42b; (ADP) reduced response to ADP stimulation, indicating a possible defect in the Gi pathway; (AA) reduction (cyclooxygenase pathway defect); (Adr) reduction (thromboxane receptor pathway defect); (GPVI) reduction in surface GPVI quantity; (P-selectin) reduction (platelet α-granule/secretion defect); (fibrinogen) reduction in fibrinogen binding to activated platelets; (ATP secretion) reduction in ATP secretion upon stimulation with 100 μmol/L PAR-1 peptide. The bleeding diathesis of each individual is summarized under bleeding phenotype. AA, arachidonic acid; ADP, adenosine diphosphate; Adr, adrenaline; ATP, adenosine triphosphate; GPVI, glycoprotein VI; IPF, immature platelet fraction; LTA, light transmission aggregometry; MPV, mean platelet volume.
+ denotes an elevated MPV/IPF; NA indicates the parameter was tested but results were inconclusive; NT indicates the parameter was not tested due to degraded or limited sample; UNK indicates the parameter was not known.
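The first two filtering steps of the pipeline can be sketched as follows. This is a minimal illustration, not the UK-GAPP pipeline: the record layout and consequence labels are hypothetical, "rare" is taken to mean a 1000 Genomes MAF below 0.01 (consistent with every reported variant having a frequency under 0.01), and the additional UTR step is not modelled, so UTR variants simply pass through.

```python
from dataclasses import dataclass
from typing import List, Optional

MAF_CUTOFF = 0.01  # variants at or above this 1000 Genomes MAF are treated as common


@dataclass
class Variant:
    gene: str
    variant_id: str              # toy identifier, not an HGVS description
    consequence: str             # hypothetical labels, e.g. "missense", "synonymous"
    maf_1000g: Optional[float]   # None when the variant is absent from 1000 Genomes


def is_candidate(v: Variant) -> bool:
    """Apply the frequency and synonymous filters in order."""
    # Step 1: frequency filter, keep rare or unreported variants only
    if v.maf_1000g is not None and v.maf_1000g >= MAF_CUTOFF:
        return False
    # Step 2: drop synonymous changes in protein-coding transcripts
    if v.consequence == "synonymous":
        return False
    return True


def filter_candidates(variants: List[Variant]) -> List[Variant]:
    return [v for v in variants if is_candidate(v)]
```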

| Validation of IT-specific NGS panel
Validation of the IT-specific NGS panel was performed by assessing its sensitivity in detecting eight variants previously identified by WES and confirmed by Sanger sequencing. At the time of validation, these were likely candidate variants, and they included variants in genes not previously known to cause IT (see Table 2). All variants were successfully identified except a frameshift-causing insertion in TUBB1 (c.1080_1081insG, p.Leu361Alafs*19), previously identified by WES in patient 31, 12 which was presumably missed because of the sequence context around this genomic region.
In each patient, the known candidate variant tested was the only candidate variant remaining after bioinformatic analysis of the panel sequencing results.
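The validation amounts to comparing the panel's calls against the eight known variants: recovering seven of eight gives a detection rate of 7/8 = 87.5%. A minimal sketch; apart from the TUBB1 insertion, the variant labels are placeholders, and matching by label assumes calls have been normalized to the same notation.

```python
from typing import Iterable, List, Tuple


def validate_panel(expected: Iterable[str], called: Iterable[str]) -> Tuple[float, List[str]]:
    """Fraction of known (WES + Sanger confirmed) variants recovered, plus those missed."""
    expected_set, called_set = set(expected), set(called)
    if not expected_set:
        raise ValueError("no expected variants supplied")
    detected = expected_set & called_set
    missed = sorted(expected_set - called_set)
    return len(detected) / len(expected_set), missed
```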

| Candidate variants observed and variant prevalence in 31 new patients
In total, DNA samples from 31 new patients were analyzed using the IT-specific NGS panel. All patients except patient 64 were single affected cases; patient 64 is part of a pedigree of four affected family members, which is discussed in more detail in the Discussion. Following bioinformatic analysis of the sequencing data, candidate variants in genes previously implicated in IT were observed in 77% of individuals (Table 3). In total, 37 variants were noted across all patients. Of these, 11 (30%) were novel and had not been reported in any of the databases scrutinized. The remaining 26 variants had been observed previously; their prevalence in the ExAC database, unless otherwise stated, is displayed in Table 3.
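The percentages reported for the 31 patients follow directly from the per-category counts given in the abstract. The sketch below recomputes them to one decimal place, together with the share of novel variants (11 of 37); the category labels are illustrative.

```python
from collections import Counter

# Per-patient classification counts reported in the abstract (n = 31).
counts = Counter({"pathogenic": 3, "likely pathogenic": 5,
                  "VUS": 16, "none or benign": 7})


def pct(k: int, n: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * k / n, 1)


total = sum(counts.values())                                  # 31 patients
shares = {cls: pct(k, total) for cls, k in counts.items()}    # per-category %
with_candidate = total - counts["none or benign"]             # patients with a candidate variant
candidate_pct = pct(with_candidate, total)
novel_pct = pct(11, 37)                                       # novel among 37 variants
```

This gives 9.7%, 16.1%, 51.6%, and 22.6% for the four categories, 77.4% (24/31) of patients with a candidate variant, and 29.7% novel variants.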
Across all previously observed variants, the average MAF was 0.00256. All variants were observed at a frequency of less than 0.01 and all previously identified variants, with the exception of rs111527738 which was present within the latest build of dbSNP. Four pathogenic or likely pathogenic variants previously known to cause IT were identified. These were found in pa- proband who suffers from easy bruising. 16 The patient's platelet counts varied between 86 and 94 × 10⁹/L at different times of testing, and no other differences in hematological cell numbers were noted.
The patient was initially sequenced due to the presence of a rare X-

| Conservation, pathogenicity prediction, and variant classification
Conservation at each variant site was assessed with the PhyloP and PhastCons in silico tools. Conservation scores for all variants occurring within known IT-causing genes in the 31 patients are shown in Table 3. Average scores of 3.32887 and 0.829571 were observed across all variants for PhyloP and PhastCons, respectively. Most variants occurred at highly conserved sites, and the two methods agreed in all instances.
Pathogenicity was predicted using in silico prediction software, as displayed in Table 3. Predictions for the same variant often differed between tools, suggesting that some of the observed variants may be benign.
Of the 37 variants noted across all patients investigated, three were classified as "pathogenic" and five "likely
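The averaged scores and the agreement between PhyloP and PhastCons can be computed as sketched below. The cut-offs used to call a site "conserved" (PhyloP > 2, PhastCons > 0.5) are illustrative assumptions, not thresholds stated in the study, and any scores passed in for testing are toy values.

```python
from statistics import mean
from typing import List, Tuple

# Illustrative cut-offs only; the study does not state the thresholds it used.
PHYLOP_CONSERVED = 2.0
PHASTCONS_CONSERVED = 0.5


def conservation_summary(scores: List[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Return mean PhyloP, mean PhastCons, and the fraction of sites where both agree.

    `scores` holds one (phyloP, phastCons) pair per variant site.
    """
    phylop = [p for p, _ in scores]
    phastcons = [c for _, c in scores]
    # Each method makes its own conserved / not-conserved call per site
    calls = [(p > PHYLOP_CONSERVED, c > PHASTCONS_CONSERVED) for p, c in scores]
    agreement = sum(a == b for a, b in calls) / len(calls)
    return mean(phylop), mean(phastcons), agreement
```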

| DISCUSSION
An IT gene-specific NGS panel was developed to pre-screen patients prior to WES. The aim was to filter out patients with variants in known IT-causing genes, allowing WES to focus subsequently on patients who may harbor variants in novel genes. Cost was also an important consideration, as WES was more than four times as expensive as targeted panel sequencing. With a GC content of 73%, GP1BB often suffers from reduced coverage, which is why in WES analysis the gene was manually

TABLE 3 Variants identified by analysis of the IT-specific next-generation sequencing panel

Interestingly, a reduction in cell surface expression of CD42b, encoded by GP1BA, was noted in patient 50, who harbors a potentially deleterious large deletion in GP1BB that spans two previously reported disease-causing variants. 23,24 Although the deletion does not lie in the gene encoding CD42b, a frameshift-causing deletion in GP1BB may disrupt the stability of the GPIb-IX-V receptor complex, leading to reduced cell surface expression.
As with variants identified by WES, the variants observed with the IT-specific NGS panel require further confirmatory work before they can be considered disease-causing.
Further work will focus mainly on this point, using biomarkers of disease associated with variants in particular genes and recruiting affected relatives of previously analyzed patients. Segregation analysis in these families will strengthen the evidence for candidate variants, and recruitment also has the potential to raise awareness of an under-recognized and underdiagnosed genetic disorder.
The possible lack of genotype-phenotype correlation in patients harboring variants in ITGA2B, GP1BA, and MYH9 in particular is an interesting observation; however, further work is needed to validate it. Whether these variants are disease-causing depends on functional confirmation of their effects. If they are causative, however, these patients represent a distinct subset of each disease that lacks the typical phenotypic presentation of previous cases, and it would therefore be relatively likely that other patients exist without the secondary symptoms and qualitative platelet function defects attributed to these disorders.
In total, seven patients had no variants in the genes of the IT-specific panel. The sequencing panel employed did not assess copy number variations (CNVs), which could be present in these patients. Given the absence of variants within the 30 panel genes, there is a high chance that the genetic etiology of disease in these patients lies in novel genes not previously implicated in IT. Analysis of these patients in particular may advance our current knowledge of IT through the identification of novel causative genes. 25
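GC content, quoted above as 73% for GP1BB, is the fraction of G and C among the called bases; GC-rich targets are often under-represented after hybrid capture and PCR, which is one reason for reduced coverage. A minimal sketch with a sliding window for locating GC-rich stretches; ambiguous bases (N) are ignored, and any sequences used with it here are toy examples rather than GP1BB.

```python
from typing import List, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G/C among A/C/G/T bases; other symbols (e.g. N) are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return (s.count("G") + s.count("C")) / acgt


def gc_windows(seq: str, size: int, step: int) -> List[Tuple[int, float]]:
    """GC fraction of each full-length window, keyed by 0-based start position."""
    return [(i, gc_fraction(seq[i:i + size]))
            for i in range(0, len(seq) - size + 1, step)]
```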