Reduction of heterozygosity (ROH) as a method to detect mosaic structural variation

Dipartimento di Scienze agroalimentari, ambientali e animali, Università di Udine, Udine, Italy; Istituto di Genomica Applicata (IGA), Udine, Italy; IGA Technology Services, Udine, Italy; Parco Tecnologico Padano, Lodi, Italy; Institute of Biosciences and Bioresources, National Research Council, Sesto Fiorentino (Firenze), Italy; Department of Mathematics and Computer Science, University of Udine, Udine, Italy

Structural variation (SV) results from genome rearrangements that affect large (>1 kb) segments of DNA. Examples are inversions and balanced translocations (balanced SV), and insertions, deletions and duplications (unbalanced SV) (Feuk et al., 2006). SV that arises in somatic cells descending from the same fertilized egg, and therefore affects only a portion of the cells of an organism, gives rise to mosaic structural variation (King et al., 2015). Mosaicism is relevant in fruit tree breeding, where valuable seedlings are immortalized by vegetative propagation. Grapevine (Vitis vinifera) is a vegetatively propagated crop in which mosaicism contributes to phenotypic differences within cultivars. One example is Pinot noir, a black-berried cultivar that gave rise to several clones harbouring somatic and/or mosaic variation, such as Pinot blanc, Pinot gris and Pinot Meunier.
The absence of pigmentation in Pinot blanc has been attributed to a heterozygous deletion affecting a region of chromosome 2 involved in anthocyanin synthesis (Yakushiji et al., 2006). Pinot gris has grayish berries due to a localized synthesis of anthocyanins in epidermal cells. The causal mutation was originally identified as a deletion in chromosome 2, present only in the L2 cell layer, which removed the functional alleles of two adjacent MybA genes (Vezzulli et al., 2012). Recent experiments confirmed the mosaic nature of the event. However, they suggested that the causal mutation is a large balanced SV consisting of the replacement of the chromosomal region carrying the red haplotype with the corresponding region carrying the white haplotype (Pelsy et al., 2015).
Currently, next-generation sequencing (NGS) is the technology of choice for the detection of SV. Available methods are based on Paired End Mapping (PEM) or Depth of Coverage (DOC) (Marroni et al., 2014; Medvedev et al., 2009), but these approaches are not ideal for the detection of mosaic SV. PEM signatures reside in the regions flanking the breakpoints and consist of read pairs mapping at distances that exceed the library insert size distribution. When only a small proportion of molecules carry the deletion, the probability of distinguishing true signals from noise is low. In addition, PEM is best suited for relatively small variants (usually <50 kb). DOC methods can identify larger variants, but they also suffer from a high noise-to-signal ratio unless only very large deviations in coverage are considered. Small fluctuations can be confounded by short, strong coverage deviations caused by repetitive elements or other sequencing biases. Moreover, DOC methods cannot detect balanced SV.
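The following sketch gives a sense of scale for the DOC limitation described above. It computes the coverage change expected over a heterozygous deletion carried by a fraction of cells. This is a simplified model assumed for illustration. It assumes a diploid genome, coverage strictly proportional to copy number, and no sampling noise or mapping bias. The function name `expected_coverage_ratio` is hypothetical and is not part of any tool discussed here.

```python
import math

def expected_coverage_ratio(cell_fraction: float) -> float:
    """Expected depth-of-coverage ratio (mosaic sample / unaffected sample)
    over a heterozygous deletion carried by `cell_fraction` of the cells.
    Assumes a diploid genome and coverage proportional to copy number."""
    if not 0.0 <= cell_fraction <= 1.0:
        raise ValueError("cell_fraction must be in [0, 1]")
    # Unaffected cells contribute 2 copies, affected cells only 1.
    return (2 * (1 - cell_fraction) + 1 * cell_fraction) / 2

for c in (1.0, 0.5, 0.2, 0.1):
    r = expected_coverage_ratio(c)
    print(f"cells affected={c:.0%}  coverage ratio={r:.3f}  log2={math.log2(r):+.3f}")
```

Under these assumptions, a deletion in 10% of cells lowers coverage by only 5% (a log2 ratio of about -0.07). Such a shift is easily masked by the coverage fluctuations that repeats and sequencing biases produce.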
Single cell sequencing has been used to detect mosaic SV (McConnell et al., 2013), but this strategy requires individually sequencing a large number of cells. We reasoned that mosaic SV should result in a reduction of heterozygosity (ROH) that can be detected using heterozygous SNPs in NGS data. At a heterozygous genomic position, the two alleles are expected to be equally frequent (p = q = 0.5). Heterozygosity, 2pq = 0.5, is then at its maximum theoretical value. When a mosaic SV occurs, the relative frequencies of the two alleles shift away from 0.5, which reduces heterozygosity. We present χ-scan, a software tool developed specifically for mosaic SV detection. It is more accurate and sensitive than standard SV detection approaches and yet does not require single cell sequencing.
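The reasoning above can be made concrete with a simplified model. This model is assumed here for illustration and is not taken from Methods S1. Consider a site heterozygous for alleles A and B, and a mosaic SV present in a fraction c of cells that affects the B-carrying chromosome. The SV may be one of three scenarios:

- a heterozygous deletion of B;
- a copy-neutral replacement of the B region by its A homologue;
- a duplication of A.

The sketch computes the expected B-allele frequency q and the heterozygosity 2pq (with p = 1 - q) in each case. The function and scenario names are hypothetical.

```python
def b_allele_frequency(scenario: str, c: float) -> float:
    """Expected frequency of allele B when a fraction c of diploid A/B cells
    carries the mosaic SV (simplified model, no sequencing noise)."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must be in [0, 1]")
    if scenario == "deletion":      # affected cells keep only A (1 copy)
        return (1 - c) / (2 - c)
    if scenario == "replacement":   # affected cells become A/A (copy-neutral)
        return (1 - c) / 2
    if scenario == "duplication":   # affected cells become A/A/B (3 copies)
        return 1 / (2 + c)
    raise ValueError(f"unknown scenario: {scenario}")

def heterozygosity(q: float) -> float:
    """Expected heterozygosity 2pq of a biallelic site, with p = 1 - q."""
    return 2 * (1 - q) * q

for scenario in ("deletion", "replacement", "duplication"):
    for c in (0.0, 0.1, 0.5, 1.0):
        q = b_allele_frequency(scenario, c)
        print(f"{scenario:<12} c={c:.1f}  q={q:.3f}  2pq={heterozygosity(q):.3f}")
```

All three scenarios push 2pq below its maximum of 0.5 as c grows, though by different amounts. Only the deletion and the duplication also change total copy number. This is why, in this model, the replacement scenario leaves coverage unchanged.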
A schematic representation of deviations in relative allele frequency caused by different mosaic SV is provided in Figure 1. χ-scan uses information on SNP heterozygosity from two samples of somatic cells originating from a single zygote. These could be tumour and normal tissue from a patient, two plant clones obtained by vegetative propagation, or cells from two different parts of the same plant. Details on the implementation of χ-scan can be found in Methods S1. A schematic representation and the pseudocode of the algorithm are presented in Figure S1 and Table S1, respectively. Briefly, the distortion of the relative allele frequency of heterozygous SNPs in one sample compared with the other is used as a proxy to detect ROH.
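A toy window scan can illustrate the idea of comparing allele-frequency distortion between two samples. This is not the published algorithm; its actual steps are given in Methods S1, Figure S1 and Table S1. It is a minimal sketch that assumes per-SNP read counts for alleles A and B in a reference sample and a test sample. The names `SnpCounts` and `scan_windows` are hypothetical, as are the parameters `window_size`, `het_tolerance` and `min_snps`.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, NamedTuple, Tuple

class SnpCounts(NamedTuple):
    pos: int     # position on the chromosome (bp)
    ref_a: int   # reads supporting allele A in the reference sample
    ref_b: int   # reads supporting allele B in the reference sample
    test_a: int  # reads supporting allele A in the test sample
    test_b: int  # reads supporting allele B in the test sample

def distortion(a: int, b: int) -> float:
    """Absolute deviation of the B-allele fraction from 0.5."""
    return abs(b / (a + b) - 0.5)

def scan_windows(snps: List[SnpCounts], window_size: int = 100_000,
                 het_tolerance: float = 0.15,
                 min_snps: int = 5) -> List[Tuple[int, float]]:
    """Return (window_start, score) pairs; a higher score means the test sample
    is more distorted than the reference, i.e. a candidate ROH."""
    per_window: Dict[int, List[float]] = defaultdict(list)
    for s in snps:
        if s.ref_a + s.ref_b == 0 or s.test_a + s.test_b == 0:
            continue  # no coverage in one of the samples
        ref_d = distortion(s.ref_a, s.ref_b)
        if ref_d > het_tolerance:
            continue  # keep only SNPs that look heterozygous in the reference
        per_window[s.pos // window_size].append(distortion(s.test_a, s.test_b) - ref_d)
    return [(w * window_size, mean(scores))
            for w, scores in sorted(per_window.items())
            if len(scores) >= min_snps]

# Toy data: balanced in both samples below 1 kb, distorted in the test sample above.
snps = [SnpCounts(p, 20, 20, 20, 20) for p in range(0, 1_000, 100)]
snps += [SnpCounts(p, 20, 20, 32, 8) for p in range(1_000, 2_000, 100)]
for start, score in scan_windows(snps, window_size=1_000):
    print(f"window starting at {start} bp: score {score:.2f}")
```

In this toy example, the test sample shows a B-allele fraction of 0.2 at every SNP in the second window, which therefore scores about 0.3. The balanced first window scores 0. A real implementation would also need to account for sampling noise at low read depth.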
We tested the ability of χ-scan to identify simulated deletions present as mosaics in varying proportions of cells. We computed F1 scores at decreasing relative abundances of reads carrying the variant allele and at different detection thresholds (Methods S1). The performance of χ-scan was compared with that of other SV detection tools: DNAcopy, BDmax, Control-FREEC and DELLY. χ-scan showed a significantly higher median F1 score than all other tools in almost all simulated scenarios. Averaged across all simulated scenarios, χ-scan achieved an F1 score of 50%, compared with 29% for DNAcopy, 41% for BDmax, 42% for DELLY and 16% for Control-FREEC (Figure S2). Our results show that χ-scan can detect even SV observed in less than 10% of the reads (chromosomes) (Figure S2).
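F1 is the harmonic mean of precision and recall. Methods S1 describes exactly how predicted and simulated SV were matched. The sketch below instead assumes one common choice: base-pair overlap between predicted and true intervals. Its function names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp

def _merge(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge those that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def _overlap_bp(a: List[Interval], b: List[Interval]) -> int:
    """Total overlap (bp) between two sorted, non-overlapping interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    return total

def f1_score(predicted: List[Interval], truth: List[Interval]) -> float:
    """Base-pair level F1 between predicted and simulated SV intervals."""
    pred, true = _merge(predicted), _merge(truth)
    tp = _overlap_bp(pred, true)
    if tp == 0:
        return 0.0
    precision = tp / sum(e - s for s, e in pred)
    recall = tp / sum(e - s for s, e in true)
    return 2 * precision * recall / (precision + recall)

print(f1_score(predicted=[(1_500, 2_500)], truth=[(1_000, 2_000)]))  # 0.5
```

Counting bases rather than events rewards calls whose boundaries match the simulated SV, not just calls that touch it. An event-level definition would give different numbers.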
Previous analyses have shown the mosaic nature of the mutation in Pinot gris, which involves the L2 cell layer but not the L1 layer (Pelsy et al., 2015; Vezzulli et al., 2012). Figure S3 shows the SV identified in chromosome 2 of Pinot gris (left) and Pinot blanc (right) by several tools leveraging NGS data. All the methods identified the SV in Pinot blanc. However, χ-scan was the only tested algorithm able to detect the SV in Pinot gris, where it identified a strong ROH signal. Our results pinpointed the subcentromeric boundary of the SV at 13.6 Mbp and the telomeric boundary at 18.78 Mbp, at the end of chromosome 2.
To fully understand the nature of the identified SV, we integrated χ-scan results with DOC information obtained using DNAcopy (Figure S4). We estimated the proportion of reads carrying the SV in Pinot gris to be 0.17. According to equation 1 in Methods S1, this means that 66% of Pinot gris cells carry the variant allele. Despite the high proportion of mutated chromosomes, the coverage ratio of Pinot gris to either Pinot blanc or Pinot noir (top and centre plots, respectively) shows no evidence of alteration in the region affected by the Pinot gris SV (blue box in Figure S4). This suggests that the SV is balanced. By contrast, the coverage of Pinot blanc decreased in the region affected by the Pinot blanc deletion, relative to Pinot gris and Pinot noir (red box, centre and bottom plots, respectively), confirming the unbalanced nature of that SV. These results challenge the earlier interpretation that the large SV observed in Pinot gris is a deletion (Vezzulli et al., 2012). They confirm the more recent hypothesis that the variation is a chromosomal replacement (Pelsy et al., 2015). This replacement might result from a double-strand break repaired via break-induced replication (BIR) (Llorente et al., 2008) or from mitotic crossing over (Lee et al., 2009). Replacement of one …
The major limitation of χ-scan in the detection of unbalanced SV is that it gives no information on the direction of the variation (i.e. whether sequence was gained or lost). However, this information can be obtained by integrating coverage data with χ-scan results. χ-scan is the first software developed specifically for the detection of mosaic SV. It can detect SV that do not involve copy number variation, such as those arising from mitotic crossing over or break-induced replication.
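The Pinot gris analysis involves two quantitative steps. The first converts an allele frequency into a fraction of affected cells. The second uses coverage to tell balanced from unbalanced SV. Both can be sketched under simple assumptions, without reproducing equation 1 of Methods S1. We assume that 0.17 is the frequency f of the depleted allele in Pinot gris. We also assume that a copy-neutral replacement in a fraction c of diploid cells gives f = (1 - c)/2, so c = 1 - 2f, which reproduces the reported 66%. For comparison, a heterozygous deletion would give f = (1 - c)/(2 - c), so c = (1 - 2f)/(1 - f), with an expected coverage ratio of 1 - c/2. All function names and the `tolerance` threshold are hypothetical.

```python
from typing import Optional, Tuple

def cell_fraction_replacement(f: float) -> float:
    """Copy-neutral replacement model: f = (1 - c) / 2  =>  c = 1 - 2f."""
    return 1 - 2 * f

def cell_fraction_deletion(f: float) -> float:
    """Heterozygous deletion model: f = (1 - c) / (2 - c)  =>  c = (1 - 2f) / (1 - f)."""
    return (1 - 2 * f) / (1 - f)

def interpret(f: float, observed_coverage_ratio: float,
              tolerance: float = 0.1) -> Tuple[str, Optional[float]]:
    """Pick the model whose expected coverage ratio matches the observed one."""
    if abs(observed_coverage_ratio - 1.0) <= tolerance:
        # Coverage unchanged: consistent with a balanced (copy-neutral) event.
        return "balanced (copy-neutral replacement)", cell_fraction_replacement(f)
    c_del = cell_fraction_deletion(f)
    if abs(observed_coverage_ratio - (1 - c_del / 2)) <= tolerance:
        return "unbalanced (deletion)", c_del
    return "unresolved", None

f = 0.17  # assumed frequency of the depleted allele in Pinot gris
print(f"replacement model: {cell_fraction_replacement(f):.2f} of cells")
c_del = cell_fraction_deletion(f)
print(f"deletion model: {c_del:.2f} of cells, expected coverage ratio {1 - c_del / 2:.2f}")
print(interpret(f, observed_coverage_ratio=1.0))
```

Under the deletion model, the same allele frequency would imply roughly 80% of cells affected and a coverage drop of about 40%. That drop is inconsistent with the unchanged coverage reported for Pinot gris, which mirrors the reasoning in the text. When c is small, the two models predict similar coverage, and this check cannot separate them.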
χ-scan will help identify mosaic structural variation associated with novel phenotypic variation in several crop species, thus enhancing marker-assisted breeding. It will also aid the identification of genetic variation in clonally propagated crops. χ-scan is freely available at https://bitbucket.org/dscaglione/xscan/overview.

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article:
Figure S1 Representation of the algorithm implemented in χ-scan.
Figure S2 Performance of several tools in the detection of mosaic SV.
Figure S3 Structural variation detected in V. vinifera chromosome 2 by different methods.
Figure S4 Depth of Coverage analysis.
Table S1 Simplified pseudocode summarizing the main steps of the algorithm.
Methods S1 Supplementary methods.