Diagnostic application of a capture-based NGS test for the concurrent detection of sequence and copy number variants as well as LOH

Whole exome sequencing (WES) has made the identification of causative SNVs/InDels associated with rare Mendelian conditions increasingly accessible. Incorporating CNV detection software into WES bioinformatics pipelines may increase the diagnostic yield. However, no standard protocols for this analysis are yet available, and CNVs in non-coding regions are entirely missed by WES, despite their possible role in regulating the expression of flanking genes. Consequently, in many cases the diagnostic workflow involves an initial investigation by genomic arrays followed, in negative cases, by WES. The reverse workflow may also be applied, depending on the familial segregation of the disease.


| BACKGROUND
The identification of the causative DNA lesion is central to genomic medicine, although in most cases it is complicated by genetic heterogeneity, variable expressivity, and incomplete penetrance. The emerging literature on digenic inheritance 1,2 makes the identification of disease-causing genes and variants even more challenging.
The expectations created by precision medicine, and the considerable funding dedicated to it worldwide, have made it urgent to obtain a molecular diagnosis in genetic diseases, in order to select the most appropriate therapeutic strategy for the patient and to enable prevention in relatives and fetuses at risk.
Over the last 10 years, genomic arrays have represented the first-tier analysis in different clinical conditions, both sporadic and familial, [3][4][5] allowing the detection of Copy Number Variations (CNVs) or, when SNP-specific probes are included in the platform, also of regions of copy-neutral loss of heterozygosity (cnLOH). The latter may cause a clinical condition if the affected region contains either imprinted genes or at least one disease-associated variant that is heterozygous in one parent and has been reduced to homozygosity. 6,7 More recently, Next Generation Sequencing (NGS) technologies have made the identification of both single nucleotide variants (SNVs) and small insertions/deletions (InDels) increasingly accessible, largely overcoming the problem of genetic and phenotypic heterogeneity. 8 However, of the 141 000 genomic lesions reported in the Human Gene Mutation Database (HGMD), 10% consist of total or partial deletions or duplications of disease-associated genes. 9 Moreover, the combination of a CNV on one allele with a loss-of-function or hypomorphic SNV on the other accounts for a proportion of autosomal-recessive diseases. 10,11 Although whole-genome sequencing (WGS) can provide information on both point mutations and structural abnormalities, its cost and the practical difficulty of interpreting the numerous variants it yields hinder its application on a routine basis.
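To illustrate how regions reduced to homozygosity can be flagged from SNP genotypes, the sketch below scans sorted genotype calls on one chromosome and reports runs of consecutive homozygous calls. It is a naive illustration, not the method used in this study: the function names, the toy positions and the `min_snps`/`min_length` thresholds are hypothetical, and production tools (for example `bcftools roh`, which uses a hidden Markov model) tolerate occasional genotyping errors that would break a run here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HomozygousRun:
    """A stretch of consecutive homozygous SNP calls on one chromosome."""
    chrom: str
    start: int   # position of the first homozygous SNP in the run
    end: int     # position of the last homozygous SNP in the run
    n_snps: int


def is_homozygous(genotype):
    """True for genotypes such as '0/0', '1/1' or '1|1' (both alleles identical)."""
    alleles = genotype.replace("|", "/").split("/")
    return len(set(alleles)) == 1


def find_homozygous_runs(chrom, snps, min_snps=3, min_length=1_000_000):
    """Naive scan for runs of homozygosity.

    snps: list of (position, genotype) tuples sorted by position.
    A run is reported if it has at least `min_snps` homozygous calls and
    spans at least `min_length` bp; any heterozygous call ends the run.
    Both thresholds are illustrative assumptions, not validated cut-offs.
    """
    runs = []
    current = []

    def close_run():
        # Keep the run only if it passes both thresholds, then reset it
        if len(current) >= min_snps and current[-1] - current[0] >= min_length:
            runs.append(HomozygousRun(chrom, current[0], current[-1], len(current)))
        current.clear()

    for pos, genotype in snps:
        if "." in genotype:      # skip missing calls such as './.'
            continue
        if is_homozygous(genotype):
            current.append(pos)
        else:
            close_run()
    close_run()
    return runs


# Toy genotypes (hypothetical positions on a hypothetical chromosome)
snps = [
    (100_000, "0/1"),
    (200_000, "1/1"),
    (700_000, "0/0"),
    (1_500_000, "1/1"),
    (2_600_000, "1/1"),
    (3_000_000, "0/1"),
    (3_100_000, "1/1"),
]
print(find_homozygous_runs("chr11", snps))
```

The toy data yield a single run from 200 000 to 2 600 000 supported by four SNPs. Note that a homozygous run alone does not distinguish cnLOH from a hemizygous deletion; read depth or array intensity is needed to tell the two apart.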
In contrast, the easier handling of whole exome sequencing (WES) data has made it part of the diagnostic routine in many genetics laboratories. Incorporation of CNV detection into WES bioinformatics pipelines is increasingly common, although no standard protocols are yet available, and the accuracy of CNV calls is influenced by several factors, including the panel design, the sequencing technology, the read length, and the local sequence context. [12][13][14] Obviously, CNVs in non-coding regions are entirely missed by WES, despite their possible role in regulating the expression of flanking genes. 15 Therefore, in a number of laboratories an initial investigation by genomic arrays is followed, in negative cases, by WES or by sequencing of specific gene panels matching the proband's phenotype.
Thus, achieving a molecular diagnosis often involves multiple investigations planned case by case according to the most probable hypothesis. A single test able to identify most of the molecular causes of genetic disorders at once would be of great advantage, reducing the number of tests, the final cost, and the reporting time.
We conducted a validation study on a cohort of cases with previously identified genomic alterations, which were re-analyzed using a commercial target-enrichment kit that allows the concurrent detection of CNVs, SNVs/InDels, and LOH events. In most of these cases, several tests had previously been applied to reach a diagnosis, whereas with this approach we would have been able to identify the molecular etiology of the disease in a single experiment.

| Clinical details and previous molecular investigations
The details for the reported cases are summarized in Table 1.
Kidney and brain ultrasound examinations were normal, whereas echocardiography showed a patent foramen ovale.
Two angiomas were noted: a frontal one extending to the eyelids and a nuchal one. Karyotype analysis of peripheral blood lymphocytes showed, in all metaphases, a supernumerary marker chromosome (SMC) that was classified, after array-CGH (180 K, Figure 1) and FISH analyses, as a neocentric inv dup(17)(p13.3): 47,XY,+mar.arr[hg19] 17p13.3(51 885-1 879 066)x4 dn. As the average log2 ratio of the amplified region was 0.83, we could not rule out that the marker was present in mosaic form in the patient's blood, in at least 80% of the cells (Figure S1, Supporting information).
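The link between the observed log2 ratio and the proportion of cells carrying the marker can be made explicit with a simple linear-mixture model. The sketch below is an assumption-laden illustration, not the authors' calculation: it assumes a diploid baseline, that the mean copy number is (1 - f) x 2 + f x 4 for a mosaic fraction f, and that the array reports this ratio without compression. The function name is hypothetical.

```python
import math


def mosaic_fraction_from_log2(log2_ratio, abnormal_copies, normal_copies=2):
    """Estimate the fraction of cells carrying an abnormal copy number.

    Assumes a linear mixture: mean copies = (1 - f) * normal + f * abnormal,
    and an observed ratio equal to mean copies / normal (no ratio compression).
    """
    if abnormal_copies == normal_copies:
        raise ValueError("abnormal and normal copy numbers must differ")
    ratio = 2 ** log2_ratio
    return (ratio - 1) * normal_copies / (abnormal_copies - normal_copies)


# Region present in four copies (x4) with an average log2 ratio of 0.83
full_log2 = math.log2(4 / 2)   # expected if every cell carried the marker
fraction = mosaic_fraction_from_log2(0.83, abnormal_copies=4)
print(f"expected log2 for non-mosaic x4: {full_log2:.2f}")
print(f"naive mosaic estimate: {fraction:.0%}")
```

Under these assumptions a non-mosaic x4 region would give a log2 ratio of 1.0, and 0.83 corresponds to roughly 78% of cells. Because array log2 ratios are often compressed relative to theory, such a naive estimate tends to understate the abnormal fraction, which is consistent with the "at least 80%" stated above.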

| Case 9
The patient was ascertained at the age of 12 years because of severe scoliosis. Her growth parameters had always been below the third centile, and facial and body asymmetry with right hemibody hyperplasia was evident. She showed facial dysmorphisms, renal fusion with a discoid kidney, and an ostium secundum atrial septal defect, surgically treated at 7 years of age. She also showed hyper- and hypopigmented skin lines.
Cytogenetic investigations were performed on both peripheral blood lymphocytes and cultured skin fibroblasts. The blood karyotype was 46,XX, whereas trisomy 22 was found in two-thirds of the fibroblast metaphases: 46,XX[15]/47,XX,+22[31]. Array-CGH (180 K) on skin fibroblasts showed the trisomy 22 with an average log2 ratio of +0.48 (Figure S1).
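The metaphase counts and the array ratio can be compared by predicting the log2 ratio that a given mosaic fraction should produce. The sketch below assumes an idealized linear mixture of disomic and trisomic cells and no array ratio compression; the function name is hypothetical.

```python
import math


def expected_log2_ratio(fraction_abnormal, abnormal_copies, normal_copies=2):
    """Expected log2 ratio for a mosaic copy-number change (linear mixture, no compression)."""
    mean_copies = (1 - fraction_abnormal) * normal_copies + fraction_abnormal * abnormal_copies
    return math.log2(mean_copies / normal_copies)


# Fibroblast karyotype: 46,XX[15]/47,XX,+22[31]
trisomic, normal = 31, 15
fraction = trisomic / (trisomic + normal)
print(f"trisomic metaphases: {fraction:.2f}")                                     # 0.67
print(f"expected log2 at that fraction: {expected_log2_ratio(fraction, 3):.2f}")  # 0.42
print(f"expected log2 for full trisomy: {expected_log2_ratio(1.0, 3):.2f}")       # 0.58
```

Under this model, 31 of 46 trisomic metaphases predict a log2 ratio of about 0.42, and a complete trisomy about 0.58; the observed +0.48 falls between the two. The estimates need not coincide exactly, since only 46 metaphases were scored and the relative proportions of cell lines may differ between the cells karyotyped and the DNA analyzed.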

| Case 10
A 7-year-old boy was referred to our unit to re-examine a rearrangement detected at prenatal cytogenetic investigation. He presented with normal auxological parameters but a mild language delay requiring the support of a speech therapist. The aberration was not detected using standard parameters but was identified by a dedicated analysis.

| Bioinformatics analysis and variants interpretation
We used an in-house pipeline that integrates EXCAVATOR for CNV analysis. 25 Reads were aligned to the human reference genome (GRCh37/hg19). Data processing and variant annotation were performed as previously reported. 26
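Read-depth CNV callers such as EXCAVATOR rest on comparing per-target read counts between a test sample and a reference. The sketch below shows only that core idea in a deliberately simplified form; it is not EXCAVATOR's algorithm, which additionally normalizes for GC content, mappability and target size and uses a dedicated segmentation model. Here counts are turned into median-centred log2 ratios, each target is labeled with illustrative cut-offs, and consecutive targets with the same label are merged. All names, counts and thresholds are hypothetical.

```python
import math
from statistics import median


def centred_log2_ratios(test_counts, reference_counts, pseudocount=0.5):
    """Per-target log2(test/reference) read-depth ratios, median-centred.

    Median centring assumes most targets are copy-number neutral.
    """
    if len(test_counts) != len(reference_counts):
        raise ValueError("count vectors must have the same length")
    raw = [math.log2((t + pseudocount) / (r + pseudocount))
           for t, r in zip(test_counts, reference_counts)]
    centre = median(raw)
    return [x - centre for x in raw]


def call_states(ratios, loss_cutoff=-0.5, gain_cutoff=0.4):
    """Label each target; the cut-offs are illustrative, not validated values."""
    return ["loss" if x <= loss_cutoff else "gain" if x >= gain_cutoff else "neutral"
            for x in ratios]


def merge_segments(states):
    """Collapse consecutive targets with the same state into (first, last, state)."""
    segments = []
    for i, state in enumerate(states):
        if segments and segments[-1][2] == state:
            first, _, _ = segments[-1]
            segments[-1] = (first, i, state)
        else:
            segments.append((i, i, state))
    return segments


# Hypothetical per-target read counts for a test sample and a pooled reference
test = [100, 98, 52, 49, 51, 101, 150, 148, 99, 100]
reference = [100] * 10
print(merge_segments(call_states(centred_log2_ratios(test, reference))))
```

With these toy counts, targets 2 to 4 (about half the reference depth) are called as a loss and targets 6 and 7 (about 1.5 times) as a gain. The theoretical values for a heterozygous deletion and a single-copy duplication are log2(1/2) = -1 and log2(3/2) = 0.58, which is why the cut-offs sit between zero and those values.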

| Disease-associated CNVs
In samples 1 to 4, 6 and 7, we correctly detected all the CNVs previ- mately 10% to 90% of cells for loss and LOH events, and in approximately 20% to 80% of cells for gains. 30,31 In our cases, the mosaic duplication was much larger than 2 Mb and was present in more than 50% of cells, as estimated by conventional cytogenetics.
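The narrower detectable range for mosaic gains can be illustrated by the size of the signals each event type produces. The sketch below uses an idealized, noise-free model (diploid baseline, no ratio compression, a SNP heterozygous A/B in normal cells) to compute the expected log2 ratio and the allelic imbalance |BAF - 0.5| for a mosaic loss, cnLOH or gain in a fraction f of cells. It only illustrates the signals; it does not reproduce the detection limits cited above, which also depend on depth, SNP density and event size. The function name is hypothetical.

```python
import math


def mosaic_signals(event, fraction):
    """Expected (log2 ratio, allelic imbalance) for a mosaic event in `fraction` of cells.

    Allelic imbalance is |BAF - 0.5| at a SNP heterozygous (A/B) in normal cells.
    Idealized model: no noise, no ratio compression, diploid baseline.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if event == "loss":        # allele B lost in affected cells
        copies, b_copies = 2 - fraction, 1 - fraction
    elif event == "gain":      # allele B duplicated in affected cells
        copies, b_copies = 2 + fraction, 1 + fraction
    elif event == "cnloh":     # allele A replaced by a copy of B in affected cells
        copies, b_copies = 2.0, 1 + fraction
    else:
        raise ValueError(f"unknown event: {event}")
    log2_ratio = math.log2(copies / 2)
    imbalance = abs(b_copies / copies - 0.5)
    return log2_ratio, imbalance


for event in ("loss", "cnloh", "gain"):
    log2_ratio, imbalance = mosaic_signals(event, 0.2)
    print(f"{event:>5}: log2 {log2_ratio:+.3f}, allelic imbalance {imbalance:.3f}")
```

At 20% mosaicism the model gives an allelic imbalance of f/2 = 0.10 for cnLOH, f/(2(2 - f)) of about 0.056 for a loss, and f/(2(2 + f)) of about 0.045 for a gain, so gains produce the weakest allelic signal at a given fraction.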

| Critical cases
The deletions in cases 5 and 16 were not detected by EXCAVATOR under standard parameters, but we were nevertheless able to pick them up and characterize them at base-pair level by SureCall (case 5) and/or manual  Figure 4E). In this case, no reads mapping to the breakpoints were available.
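Base-pair characterization of a deletion usually relies on reads that span the junction and are therefore soft-clipped at the breakpoint. The sketch below shows one simple way to nominate candidate breakpoints: it parses CIGAR strings, records the reference position of each leading or trailing soft clip, and reports positions supported by several reads. It is a hypothetical illustration, not the SureCall procedure; the function names, toy reads and `min_support` value are assumptions. Inputs could come, for example, from pysam's `reference_start` and `cigarstring` attributes, and the computed end position matches the convention of `reference_end` (one past the last aligned base). When no reads span the junction, as in the case noted above, this approach returns nothing and the breakpoints can only be bounded by read depth.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")


def soft_clip_positions(ref_start, cigar):
    """Reference positions (0-based) at which a read is soft-clipped.

    A leading clip marks a junction at the alignment start; a trailing clip
    marks a junction at the alignment end (start + reference-consuming length).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    positions = []
    if len(ops) < 2:          # fully aligned (or unparseable) read: no clip
        return positions
    if ops[0][1] == "S":
        positions.append(ref_start)
    if ops[-1][1] == "S":
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        positions.append(ref_start + ref_len)
    return positions


def candidate_breakpoints(alignments, min_support=3):
    """Positions supported by at least `min_support` soft-clipped reads."""
    counts = Counter()
    for ref_start, cigar in alignments:
        counts.update(soft_clip_positions(ref_start, cigar))
    return sorted((pos, n) for pos, n in counts.items() if n >= min_support)


# Hypothetical reads flanking a heterozygous deletion between 10 000 and 12 000
reads = [
    (9_900, "100M20S"), (9_950, "50M30S"), (9_920, "80M40S"),
    (12_000, "25S75M"), (12_000, "40S60M"), (12_000, "10S90M"),
    (11_000, "100M"),
]
print(candidate_breakpoints(reads))
```

The toy reads produce two candidate junctions, at 10 000 and 12 000, each supported by three clipped reads.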

| CONCLUSIONS
The purpose of our study was to investigate the ability of a single test, based on target enrichment and NGS, to identify different types of genomic lesions, including SNVs/InDels, LOH, and large deletions/duplications not necessarily containing protein-coding genes.
Different approaches for CNV detection from WES data have already been reported, demonstrating an increase in diagnostic yield of up to 6% (2% on average), obtainable without additional direct laboratory costs, simply by optimizing data analysis. 35 In our cohort, 3 cases (cases 1, 11 and 12) were fetuses with severe ultrasound abnormalities requiring invasive sampling and cytogenetic and molecular investigations. Moreover, the condition affecting 7 other cases might have been suspected prenatally (cases 2, 3, 7, 9, and 13 to 15). In all these cases, a targeted sequencing approach integrating SNV/InDel, CNV and LOH analysis in a single NGS experiment would have offered the clear advantage of a rapid diagnosis. These findings further support the view that our approach is particularly useful in a prenatal setting, where the time-frame for a genetic diagnosis is short.
In our cohort, the turnaround time from DNA extraction to data analysis and reporting was 15 days on average, which could possibly be reduced by novel approaches for faster library preparation. Since the platform we used contains only genes whose variants are associated with known pathological conditions, the need for extended segregation studies in the family or in other families with similar pathologies is negligible. This substantially shortens investigation times compared with whole-exome platforms, especially if parental and sibling DNA is already available for Sanger analysis.
Genomic arrays are able to diagnose up to 10% of fetal malformations with a normal karyotype, 36,37 whereas WES, with or without CNV detection, has shown a variable diagnostic rate in small cohorts of fetuses with ultrasound abnormalities. 38,39 This study demonstrates that OneSeq was able to detect a wide range of genomic events, largely overcoming the limitations of CNV detection by WES, which are mainly due to the uneven distribution of reads, restricted to exons. We obtained high concordance between the two pipelines we used, and between the sizes of the CNVs identified by the array and the OneSeq approach (Table 1).
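Concordance between CNV calls from two platforms is commonly assessed by reciprocal overlap: two calls are treated as the same event if their intersection covers at least a given fraction of each. The sketch below assumes the frequently used 50% threshold and inclusive coordinates; the source does not state which criterion was applied here. The array interval is the 17p13.3 gain reported above, while the NGS coordinates are hypothetical.

```python
def interval_length(interval):
    start, end = interval
    return end - start + 1   # inclusive coordinates, as in ISCN-style notation


def reciprocal_overlap(a, b):
    """Smallest fraction of either interval covered by their intersection.

    a, b: (start, end) inclusive coordinates on the same chromosome.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if overlap <= 0:
        return 0.0
    return min(overlap / interval_length(a), overlap / interval_length(b))


def concordant(a, b, threshold=0.5):
    """Treat two calls as the same event if reciprocal overlap >= threshold (assumed cut-off)."""
    return reciprocal_overlap(a, b) >= threshold


array_call = (51_885, 1_879_066)     # 17p13.3 interval from the array result above
ngs_call = (60_000, 1_870_000)       # hypothetical NGS coordinates
print(interval_length(array_call))   # 1827182
print(f"{reciprocal_overlap(array_call, ngs_call):.3f}")
print(concordant(array_call, ngs_call))
```

Using the minimum of the two fractions, rather than the overlap relative to one call only, prevents a small call nested inside a much larger one from being counted as concordant.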
Data analysis remains the major bottleneck of NGS in general and of CNV detection in particular.
As for the minimal resolution for CNV analysis in diagnostics, the panel we used, with a backbone resolution of 300 kb, is able to detect CNVs of 400 kb or larger, as recommended by the ACMG Standards and Guidelines for constitutional cytogenomic microarray analysis. 40 This approach is a good compromise until WGS becomes available in diagnostics, thanks to lower costs and proper interpretation algorithms. Software development and standardization, as well as large prospective cohort studies, are required to confirm the benefit of any panel allowing the combined detection of CNVs, SNVs and cnLOH.