A comprehensive bioinformatic analysis of 126 patients with an inherited platelet disorder to identify both sequence and copy number genetic variants

Inherited bleeding disorders (IBDs) comprise an extremely heterogeneous group of diseases that reflect abnormalities of blood vessels, coagulation proteins, and platelets. Previously, the UK‐GAPP study used whole‐exome sequencing in combination with deep platelet phenotyping to identify pathogenic genetic variants in both known and novel genes in approximately 40% of the patients. To interrogate the remaining “unknown” cohort and improve this detection rate, we employed an IBD‐specific gene panel of 119 genes using the Congenica Clinical Interpretation Platform to detect both single‐nucleotide variants and copy number variants in 126 patients. In total, 135 different heterozygous variants in genes implicated in bleeding disorders were identified. Of these, 22 were classified as pathogenic, 26 as likely pathogenic, and the remainder as of uncertain significance. There were marked differences in the number of reported variants in individuals between the four patient groups: platelet count (35), platelet function (43), combined platelet count and function (59), and normal count (17). Additionally, we report three novel copy number variations (CNVs) not previously detected. We show that a combined single‐nucleotide variation (SNV)/CNV analysis using the Congenica platform improves detection rates for IBDs, suggesting that such an approach can be applied to other genetic disorders where there is a high degree of heterogeneity.

To investigate the molecular mechanisms of this group of disorders, it is often best to first address the gene(s) already implicated in these bleeding disorders, and then to investigate specifically how the genetic variants disrupt gene function (Nurden et al., 2012; Peyvandi et al., 2006). An increasing number of new genes and variants implicated in megakaryocyte differentiation and/or platelet production and function have been discovered.
The UK Genotyping and Phenotyping of Platelets study (UK-GAPP; https://www.birmingham.ac.uk/research/cardiovascularsciences/research/platelet-group/platelet-gapp/index.aspx) has recruited over 1000 patients based on a history of suspected bleeding disorders of unknown cause from over 25 collaborating hemophilia care centers across the United Kingdom. Recruited patients underwent a combination of platelet phenotyping and genotyping to determine the likely causative genes attributable to their specific defects (Jones et al., 2012;Watson et al., 2013). Gross hematological analysis and light transmission aggregometry and/or flow cytometry were used to identify thrombocytopenia (low platelet counts), platelet function, and cell signaling defects. Following this, targeted genetic analysis was employed and revealed variants, both novel and known, to be causative of bleeding in patients.
High-throughput sequencing technologies including whole-exome sequencing (WES) and whole-genome sequencing are valuable tools used to uncover novel variants in platelet-specific genes.
Over the past 10 years, such techniques have revealed many causative variants, thereby helping to provide a clear diagnosis for some patients with severe bleeding disorders (Bastida et al., 2018; Daly et al., 2014; Downes et al., 2019; Leinøe et al., 2017). In addition, targeted next-generation sequencing (NGS) panels can be used to highlight platelet-specific genes that have been previously implicated in bleeding disorders. NGS panels can be employed in a clinical diagnostic setting or used for prescreening, filtering out patients with variants in known genes, and subsequently employing WES for those who may harbor variants in novel genes (Johnson et al., 2018; Simeoni et al., 2016). This approach was applied in the UK-GAPP study, where patients with known mutations in hemophilia A and B or coagulation-mediated genes known to cause bleeding were eliminated. However, many of these panels do not search for copy number variations (CNVs), and indeed we and others have not found definitively causative variants in approximately 40%-50% of patients despite a strongly indicative inherited component for their bleeding (Bastida et al., 2018; Johnson et al., 2018; Johnson, Lowe, et al., 2016; Leinøe et al., 2017; Lentaigne et al., 2016). In this study, we address this by applying a newly established, comprehensive genetic analysis software that detects both single-nucleotide variations (SNVs) and CNVs. Congenica (https://www.congenica.com) is an automated clinical decision support platform that was used to analyze and rapidly interpret the WES data of 126 patients recruited to the UK-GAPP study. Users are able to prioritize and review genetic variants, as well as assign pathogenicity, after which the software calculates overall pathogenicity based on the American College of Medical Genetics and Genomics (ACMG) guidelines (Richards et al., 2015). It collates the information needed to make an informed and robust decision when identifying causal genetic variants.
The Congenica platform is primarily applied for genetic diagnostics and is routinely used in clinical laboratories for variant validation and reporting. For the first time, we show its utility in interrogating a large cohort of patients recruited to the UK-GAPP research study. Using this approach, we perform a robust and comprehensive analysis to find both known and novel genetic variants with plausible association with disease, including rare CNVs not previously detected. Combined with extensive patient phenotypic studies, this provides a potent tool for the dissection of the genetic causes of bleeding in a cohort which, thus far, remains genetically unresolved despite an extensive clinical presentation of familial bleeding.

| Hematological evaluation and platelet phenotyping of patients
To classify patients as having a platelet defect and to determine their platelet defect subtypes, they underwent an initial hematological workup followed by an extensive platelet function testing workflow. These methods are described in detail in the Supporting Information Methods section.

| WES
WES was performed on the genomic DNA of 117 patients in this study as previously described (Johnson, Lowe, et al., 2016). [...] then be used for sequence alignment and variant calling of SNVs, small insertions/deletions (indels), CNVs (Figure 1), and coverage (Table S3).
The analytical pipeline for the detection of CNVs in genes involved in the IBDs panel was employed using the ExomeDepth coverage approach. The exome read depth of the target patient's sample was compared against the read depth of a reference panel (up to 10 WES samples of each gender) to detect regions with different coverage which could represent a CNV event.
Within the Congenica software, the ExomeDepth calling software applies a lower limit of 20 sequence reads for CNV calling. This ensures that ExomeDepth does not consider low-quality reads when comparing the reference samples to the target patient.
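The read-depth comparison described above can be illustrated with a minimal sketch. It is not ExomeDepth itself, which fits a statistical model to the read counts; here a plain ratio of target depth to mean reference depth per exon stands in for that model. The function names, the dictionary layout, the example counts, and the 0.75/1.25 ratio cutoffs are all assumptions made for illustration. Only the 20-read lower limit comes from the text, and applying it to the mean reference depth is itself an assumption. Samples are assumed to be already normalised to the same total depth.

```python
from statistics import mean

MIN_READS = 20  # lower limit for CNV calling stated in the text


def depth_ratios(target, references, min_reads=MIN_READS):
    """Return {exon: target depth / mean reference depth}.

    Exons whose mean reference depth is below min_reads are skipped
    (assumption: the limit is applied to the reference panel).
    """
    ratios = {}
    for exon, target_count in target.items():
        ref_mean = mean(ref[exon] for ref in references)
        if ref_mean < min_reads:
            continue  # too few reads to compare reliably
        ratios[exon] = target_count / ref_mean
    return ratios


def label_ratio(ratio, loss_below=0.75, gain_above=1.25):
    """Label a read ratio; the cutoffs are illustrative, not ExomeDepth's."""
    if ratio < loss_below:
        return "loss"
    if ratio > gain_above:
        return "gain"
    return "normal"


# Hypothetical read counts per exon for two reference samples and one patient.
reference_panel = [
    {"exon1": 100, "exon2": 80, "exon3": 10},
    {"exon1": 120, "exon2": 80, "exon3": 14},
]
target_sample = {"exon1": 55, "exon2": 82, "exon3": 6}

ratios = depth_ratios(target_sample, reference_panel)
calls = {exon: label_ratio(r) for exon, r in ratios.items()}
print(calls)
```

In this toy input, exon1 has a ratio of 0.5, which is the value expected for a heterozygous loss, while exon3 is dropped because the reference panel has too few reads there.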

| Platelet phenotyping
Recruited patients were subjected to an initial hematological analysis and extended deep platelet phenotyping using the previously published workflow (Johnson, Lowe, et al., 2016). Phenotyping outcomes can be seen in detail in the Supporting Information Results section (Figures S1 and S2; Table S2).

| Validation of WES analysis with known variants
Validation of the WES analysis in the GAPP study was performed using Congenica software. Five different known genetic variants were identified previously by WES in nine patients with a suspected IBD (Fletcher et al., 2015; Johnson, Lowe, et al., 2016; Table 1). We used two trios (one parent and two affected children) and three unrelated individuals, all with known or likely pathogenic variants in platelet- or megakaryocyte-related genes. This analysis was conducted in a blind manner to assess the reliability and robustness of the software in correctly highlighting all known genetic variants in these patients. Using panels of genes implicated in IBDs, the first trio [...] patient 8, a missense variant c.659T>A p.(Val220Asp) in SLFN14 and finally in patient 9, a stop gain mutation c.1611C>A p.(Cys537Ter) located within THBD was identified. All known variants found in the patients were successfully verified by Congenica software against our previously analyzed WES data (Table 1).
FIGURE 1 Congenica pipeline overview for processing of whole-exome sequencing (WES) data. Adapted from https://www.congenica.com/. The informatic strategy shown is used to incorporate both single-nucleotide variation (SNV) and copy number variation (CNV) analysis. The raw WES data are inputted as either FASTQ or BAM files, followed by alignment to the reference genome. SNV calling is then performed to generate VCF files, and subsequent in silico tools are used to determine the pathogenicity of variants. Simultaneously, CNV analysis is performed using a predefined reference and sex-matched WES panel and fed into ExomeDepth for CNV calling in the WES samples
[...] (Table S3). The ExomeDepth integrated tool was used to determine CNV based on read coverage (Table 3). First, WES data of the 117 patients were analyzed by filtering using an IBDs gene panel (Table S1). Variants were then filtered within the software based on the exclusion criteria initially stated in the GAPP study.
A rare variant cutoff of minor allele frequency (MAF) <0.01 in each data set was used, and synonymous variants and intronic variants more than 5 base pairs from the exon-intron boundaries were excluded. Variants not shared between affected members of the same family were also eliminated.
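The exclusion criteria above can be sketched as a small filter. This is an illustration under stated assumptions, not the Congenica implementation: the `Variant` record, its field names, and the example variants (genes, HGVS strings, and frequencies) are hypothetical, and the ±5 base pair rule is read here as keeping intronic variants within 5 bp of an exon-intron boundary.

```python
from dataclasses import dataclass

MAF_CUTOFF = 0.01      # rare variant cutoff from the text
SPLICE_WINDOW_BP = 5   # intronic variants farther than this are excluded


@dataclass(frozen=True)
class Variant:
    """Hypothetical minimal variant record used only for this sketch."""
    gene: str
    hgvs: str
    maf: float
    consequence: str        # e.g. "missense", "synonymous", "intronic"
    intron_offset: int = 0  # bp from nearest exon-intron boundary; 0 if exonic


def passes_filters(variant):
    """Apply the MAF, synonymous, and intronic-distance exclusions."""
    if variant.maf >= MAF_CUTOFF:
        return False
    if variant.consequence == "synonymous":
        return False
    if (variant.consequence == "intronic"
            and abs(variant.intron_offset) > SPLICE_WINDOW_BP):
        return False
    return True


def shared_candidates(affected_members):
    """Keep filtered variants present in every affected family member.

    affected_members is a list of variant lists, one per affected relative;
    an unrelated individual is a family of one.
    """
    kept = [{v for v in member if passes_filters(v)}
            for member in affected_members]
    return set.intersection(*kept) if kept else set()


# Hypothetical variants for two affected siblings.
rare_missense = Variant("GENE_A", "c.100A>G", 0.0005, "missense")
common = Variant("GENE_B", "c.200C>T", 0.05, "missense")
deep_intronic = Variant("GENE_C", "c.300+20G>A", 0.0001, "intronic", 20)
splice_region = Variant("GENE_D", "c.400+2T>C", 0.0001, "intronic", 2)
silent = Variant("GENE_E", "c.500G>A", 0.0001, "synonymous")

sibling_1 = [rare_missense, common, deep_intronic, splice_region]
sibling_2 = [rare_missense, silent]

candidates = shared_candidates([sibling_1, sibling_2])
print(sorted(v.hgvs for v in candidates))
```

The splice-region variant passes the per-variant filters but is dropped at the family step because only one sibling carries it.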

| WES analysis to identify new SNVs and CNVs using the Congenica platform
Following exclusion of variants based on these criteria, between two and six variants (SNVs, small indels, and splice site variants) were noted per patient (Table 2). In silico pathogenicity prediction tools that have been integrated into the Congenica software were employed for further analysis (Table 2) [...] which was subsequently classified as pathogenic.

| CNVs found in the patient cohort
Overall, the CNV analysis using the integrated ExomeDepth tool revealed an average of four CNVs per exome (n = 15; Table 3). There were three rare structural variants covering large regions on chromosomes 11 and 17 and encompassing numerous genes, including [...] Following ExomeDepth alignment with a panel of controls, the reads ratio was around 0.5, which indicates heterozygosity, as observed in Table 3. A rare CNV gain was noted in patient 45 within TBXA2R on chromosome 19p13.3 and containing four genes in total (Figure 3c). The CNV reads ratio was 2.72, which is indicative of a heterozygous insertion. [...] (Nowakowska, 2017; Valsesia et al., 2012).
TABLE 1 Nine patients and the five known candidate variants used for validation of the Congenica software

| Oligogenic findings in patient cohort
Paris-Trousseau syndrome, a well-documented bleeding defect characterized by large α-granules and abnormal megakaryocyte morphology, is caused by a dominantly inherited q23 deletion on chromosome 11 (Stevenson et al., 2015). Patients with this disorder have chromosomal deletions of variable size, which are associated with different components of the syndrome. A hemizygous deletion of FLI1 was attributed to the platelet defect in two individuals of our cohort. These CNVs, noted in patients 35 and 71, cover the deletion region in FLI1 and are also surrounded by several flanking genes.
Both patients presented with thrombocytopenia and a secretion defect, which suggests that the CNVs in FLI1 are associated with their platelet phenotype and disorders. Thromboxane receptor deficiency is an autosomal recessive or dominant disorder characterized by bleeding symptoms associated with quantitative or qualitative defects within the thromboxane receptor (Mundell & Mumford, 2018).
Although we did not find any plausible candidate SNVs in the 119 candidate genes or the thromboxane receptor in patient 45, we did note a rare CNV duplication in the TBXA2R gene and deduce that either alone or in combination with variants in GP6 and SLFN14 [...]
In summary, we show the validation and practical application of a robust diagnostic platform that can be employed for WES analysis. In this study, we use data from a cohort of patients with suspected IBDs, a broad category of diseases well acknowledged in the hematology field as difficult to classify and to associate with single causative genetic abnormalities. This study has shown the ability of the software to detect CNVs with high efficiency using targeted gene panels, as a replacement for traditional methods for detecting CNVs.
To conclude, our data demonstrate the use of a highly sensitive and valuable tool for detecting SNVs and CNVs based on WES data. To our knowledge, this is one of the first studies, although in a research setting, to implement this software for both SNV and CNV analysis. We see this as a leap forward in the ability of the wider scientific community to classify highly complex disorders with a high degree of heterogeneity, providing concise and definitive diagnoses for patients.

ACKNOWLEDGMENTS
We thank the Congenica software team, particularly Maria Valencia