Whole-genome profiling and shotgun sequencing delivers an anchored, gene-decorated, physical map assembly of bread wheat chromosome 6A

Bread wheat (Triticum aestivum L.) is the most important staple food crop for 35% of the world's population. International efforts are underway to facilitate an increase in wheat production, in which the International Wheat Genome Sequencing Consortium (IWGSC) plays an important role. As part of this effort, we have developed a sequence-based physical map of wheat chromosome 6A using whole-genome profiling (WGP™). The bacterial artificial chromosome (BAC) contig assembly tools fingerprinted contig (fpc) and linear topological contig (ltc) were used and their contig assemblies were compared. A detailed investigation of the contig structure revealed that ltc created a more robust assembly than fpc. The ltc assemblies contained 1217 contigs for the short arm and 1113 contigs for the long arm, with an L50 of 1 Mb. To facilitate in silico anchoring, WGP™ tags underlying BAC contigs were extended by wheat and wheat progenitor genome sequence information. Sequence data were used for in silico anchoring against genetic markers with known sequences, by which almost 79% of the physical map could be anchored. Moreover, the assigned sequence information led to the ‘decoration’ of the respective physical map with 3359 anchored genes. Thus, this robust and genetically anchored physical map will serve as a framework for the sequencing of wheat chromosome 6A, and is of immediate use for map-based isolation of agronomically important genes/quantitative trait loci located on this chromosome.


INTRODUCTION
Bread wheat (Triticum aestivum L.) has been a constant staple food and major crop that has provided energy and protein for humankind for millennia. Today, it represents approximately 20% of all calories consumed by humans (http://www.fao.org). As the world population grows and the climate continues to change, current food supplies may come under pressure, which could increase the demand for wheat (Ortiz et al., 2008; Sommer et al., 2013). Therefore, accelerating wheat breeding and production by understanding the molecular basis of phenotypic variation and exploiting genetic diversity to improve performance is crucial. In this respect, the sequencing of the entire bread wheat genome has been considered to be a critical step for achieving these goals (Eversole et al., 2009).
Accessing complete chromosomal sequences and the gene repertoire of hexaploid wheat (2n = 6x = 42) is quickly becoming a necessary but daunting task because of its large genome size (~17 Gb/1C), which has resulted from successive hybridization events of three diploid grasses with large and structurally similar genomes (A, B and D), populated with more than 80% of repetitive elements (Flavell et al., 1977). To overcome these challenges, a strategy has been identified that aims to break down genomic analyses into manageable-sized tasks (i.e. individual chromosomes and/or chromosome arms). This strategy relies on flow-sorting of individual chromosome arms used for construction of BAC libraries and also for next-generation sequencing (Doležel et al., 2007, 2014). Moreover, together with the advent of physical map building using whole-genome profiling (WGP™; van Oeveren et al., 2011), chromosome flow cytometry represents an important technological development that provides easier access to large and polyploid genomes (Doležel et al., 2014).
Whole-genome profiling (WGP™) produces sequence tags at terminal ends of enzymatic restriction fragments from individual bacterial artificial chromosome (BAC) clones using a short-read 'Next Generation' or 'High Throughput' sequencing device. Identification of BAC overlaps by pairwise WGP™ tag comparisons allows for the assembly of BAC clones into contigs using the FINGERPRINTED CONTIGS (FPC; Soderlund et al., 1997) or LINEAR TOPOLOGICAL CONTIG (LTC) programs. WGP™ is considered more robust, less laborious and more efficient at building physical contigs than conventional high-information content fingerprinting (HICF; Luo et al., 2003; Philippe et al., 2012). The WGP™ approach has been successfully used on a number of genomes (TTGC, 2012; Sierro et al., 2013), and its application has also been demonstrated for BAC clone contig formation originating from only a small fraction of wheat chromosome 3B (Philippe et al., 2012). Moreover, WGP™ short sequence tags connected to the sequence contigs obtained from whole-chromosome shotgun sequencing (WCS; or chromosome shotgun sequencing, CSS) may further expand the possibilities for using WGP™.
Substantially longer WCS contigs connected to BAC-associated WGP™ tags facilitate sequence homology searches against respective genetic markers with known sequences, thereby providing an in silico-anchored physical map. A physical map, aided by shotgun sequencing, can provide clear insight into the physically positioned and ordered gene repertoire before complete genomic sequences of the wheat chromosomes become available. Without a link to physical maps, individual wheat chromosome shotgun sequence data sets were previously used to estimate virtual gene order, composition and evolutionary chromosomal rearrangements (Vitulo et al., 2011; Hernandez et al., 2012; Akhunov et al., 2013; Tanaka et al., 2014).
Such studies were mainly performed by an analysis of extensive long-range conserved synteny with reference grass genomes [Oryza sativa (rice), Sorghum bicolor and Brachypodium distachyon]. Similarly, a fivefold genome coverage-based sequence assembly of the entire bread wheat genome shotgun sequence was previously completed and analyzed (Brenchley et al., 2012) using a similar comparative genomics-based approach that involved comparison with sequences of diploid ancestral and progenitor genomes. Although these reports represent significant achievements in wheat genome biology, a more accurate wheat reference genome that avoids assumptions made by comparative genomics requires the establishment of genetically anchored physical maps.
Genes assigned to a physical map significantly facilitate positional gene cloning efforts and detection of regulatory elements. Herein, we constructed a physical map of wheat chromosome 6A linked to the annotated gene sequence information. We report a high-resolution gene map of chromosome 6A based on DNA sequences obtained from flow-sorted chromosome arms and by using the WGP™-based physical map approach. BAC assembly was performed using LTC software, and the assembly robustness was compared with FPC. WCS contigs from hexaploid wheat chromosome 6A [International Wheat Genome Sequencing Consortium (IWGSC), http://www.wheatgenome.org], together with available sequences from wheat ancestral diploid genomes (Jia et al., 2013; Ling et al., 2013), were aligned to the 6A physical map. Overall, we describe the development of a powerful resource for 6A that facilitates the study of its assigned genes and quantitative trait loci (QTLs).

BAC libraries of the 6A chromosome arms
Bacterial artificial chromosome libraries were constructed separately from short (6AS) and long (6AL) arms of chromosome 6A, which were purified separately by flow-cytometric sorting from two telosomic lines of wheat in which the arms originating from 6A of cv. Chinese Spring are stably maintained as telocentric chromosomes. Analysis of random samples of sorted chromosome fractions by fluorescence in situ hybridization (FISH) revealed an average purity of 89 and 86% for 6AS and 6AL, respectively. A total of 7.15 million and 6 million 6AS and 6AL arms were collected, which represented 4.91 and 4.53 µg DNA, respectively. The DNA was used to construct chromosome arm-specific BAC libraries TaaCsp6AShA (6AS; 49 (Table 1).

BAC contig assembly of the 6A chromosome arms
Whole-genome profiling (WGP™) data were produced for both chromosome arms 6AS and 6AL (Table 2; Appendix S1). Only BAC clones containing 6-68 tags entered the BAC assembly pipelines. A total of 18 820 BAC clones and 109 570 unique WGP™ tags were used as input for BAC assembly of 6AS (Table 2). For 6AL, 17 309 BAC clones containing 108 700 unique WGP™ tags were used to build the physical map (Table 2). This delivered an average of 29.7 WGP™ tags per BAC for 6AS and 28.6 WGP™ tags per BAC for 6AL, by considering all BACs for a given WGP™ tag (Table 2). We used LTC and FPC tools for assembly. With HICF-based BAC fingerprints as input, LTC has been shown to outperform FPC because it can build longer, better ordered and more robust contigs (Breen et al., 2013; Lucas et al., 2013; Philippe et al., 2013; Raats et al., 2013). The same set of BACs and WGP™ tags was used for both FPC and LTC analyses.
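The tag-count filter and the tags-per-BAC average described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the dictionary layout and the names `filter_bacs` and `mean_tags_per_bac` are hypothetical, and only the 6-68 tag window comes from the text.

```python
from typing import Dict, Set

# Tag-count window stated in the text: only BACs with 6-68 tags enter assembly.
MIN_TAGS, MAX_TAGS = 6, 68


def filter_bacs(bac_tags: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep BACs whose number of WGP tags lies within [MIN_TAGS, MAX_TAGS].

    `bac_tags` maps a BAC identifier to its set of tag sequences
    (a hypothetical data layout chosen for this sketch).
    """
    return {bac: tags for bac, tags in bac_tags.items()
            if MIN_TAGS <= len(tags) <= MAX_TAGS}


def mean_tags_per_bac(bac_tags: Dict[str, Set[str]]) -> float:
    """Average number of tags per BAC; tags shared between BACs count once per BAC."""
    if not bac_tags:
        return 0.0
    return sum(len(tags) for tags in bac_tags.values()) / len(bac_tags)


# Toy input (invented identifiers, not data from the study).
toy = {
    "bac_a": {f"t{i}" for i in range(5)},      # 5 tags: below the window
    "bac_b": {f"t{i}" for i in range(10)},     # 10 tags: kept
    "bac_c": {f"t{i}" for i in range(5, 35)},  # 30 tags: kept
    "bac_d": {f"t{i}" for i in range(100)},    # 100 tags: above the window
}
kept = filter_bacs(toy)
print(sorted(kept), mean_tags_per_bac(kept))
```

Because a tag shared by overlapping BACs is counted once for each BAC, the per-BAC average can be much larger than the number of unique tags divided by the number of BACs.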
Automated LTC delivered 1217 contigs and 3136 singletons for 6AS and 1113 contigs and 2581 singletons for 6AL (Table 3; Appendices S2 and S3). All contigs contained at least two BAC clones. The 6AS average contig size was estimated to be 0.429 Mb with an L50 contig size of 1 Mb, whereas the 6AL physical map had an average of 0.488 Mb and had an L50 contig size of 0.945 Mb (Table 3). All contigs were validated for linear topology and represented as the net width value. Contigs with a width >1, which indicates the presence of clone(s) having no significant overlaps with clones from the selected minimal tiling path (MTP) of the contig, were considered questionable because their topological network representation contradicted the linear chromosomal structure. Almost 95% of the cumulative 6AS and 6AL contigs were initially linear, whereas the rest, having a width ≥2, were checked manually for linearity. In the latter group, nonlinearity was presumably caused by missing WGP™ tags in a particular single BAC. Thus, these contigs were kept as such and further breakdown was not performed. Using LTC, a total of 5139 (6AS) and 5621 (6AL) clones were selected for the MTP (Appendices S2 and S3). Another physical map was made by FPC (1e−11 final cut-off; Appendix S4), in which a total of 640 and 620 contigs for 6AS and 6AL were formed, respectively. A total of 5045 and 3560 BACs were sorted out as singletons for 6AS and 6AL, respectively (Table S1; Appendix S5).
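The 'L50 contig size' reported above is read here as the contig length at which contigs of that length or longer hold at least half of the total assembly length (a statistic often called N50 elsewhere). That reading is an assumption, and the function below is a minimal sketch with invented toy sizes, not the calculation used for Table 3.

```python
from typing import Iterable


def l50_contig_size(contig_sizes: Iterable[int]) -> int:
    """Return the size S such that contigs of size >= S hold >= 50% of total length.

    Assumption: this matches the 'L50 contig size' used in the text.
    Sizes can be in any consistent unit (kb are used in the toy example).
    """
    sizes = sorted(contig_sizes, reverse=True)
    if not sizes:
        raise ValueError("no contigs given")
    total = sum(sizes)
    running = 0
    for size in sizes:
        running += size
        # Integer comparison avoids floating-point rounding at the 50% mark.
        if 2 * running >= total:
            return size
    raise AssertionError("unreachable for non-empty input")


# Toy assembly in kb (invented values): total 5000 kb.
toy_sizes_kb = [500, 2000, 1000, 500, 1000]
print(l50_contig_size(toy_sizes_kb))
```

In the toy case the two largest contigs (2000 kb and 1000 kb) already hold 3000 of 5000 kb, so the statistic equals the size of the second contig.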

BAC WGP™ tags facilitated in silico genetic anchoring of the 6A physical map
To facilitate efficient in silico anchoring, the short WGP™ tags underlying 6A physical contigs were extended by connecting them to the available wheat sequence information in three ways (Figures S1 and S2; Appendix S6). Firstly, WGP™ tags were connected to the available shotgun sequence contigs (obtained from IWGSC) of the bread wheat 6A chromosome arms (Appendix S6). This resulted in assigning 165 Mb of sequences (out of 433.6 Mb) from WCS contigs to the wheat 6A physical map. Secondly, the previously assigned WCS contigs were connected to the progenitor genome sequences of Triticum urartu (an A-genome progenitor; Ling et al., 2013), which resulted in assignments of 375 Mb of sequences along the physical map. Thirdly, in order to enrich the physical map with further sequence information, an additional layer of sequences was added to sequences already assigned to 6A WCS and T. urartu using Aegilops tauschii (a D-genome progenitor; Jia et al., 2013; Figures S1 and S2; Appendix S6). This provided a total of 157 Mb of sequences from Ae. tauschii assigned to the 6A physical map. Considering the more distant relation of Ae. tauschii to bread wheat chromosome 6A, the effect of the Ae. tauschii sequence inclusion on our anchoring analysis was checked. We found that the incorporation of this sequence data set had a limited contribution to the overall anchoring of the physical map, with no negative effect on the accuracy of the genetic anchoring (Appendix S7). Altogether, the average cumulative sequence information per physical contig increased from 2144 nt using WGP™ tags to 11 067 nt, simply by adding the available wheat sequence resources (Figure S2; Appendix S8).
This number of sequences directly assigned to the physical map enabled sequence homology searches against all available genetic markers with known sequences, and thus provided the basis for the integration of genetic and physical maps in silico (Appendix S6). Genetic anchoring was performed for both LTC and FPC physical map assemblies; however, only the result of genetic anchoring performed for the LTC assembly is shown here. This assembly was selected as the final 6A assembly because we observed that LTC provided a more robust assembly than FPC (see the following section). First, LTC contigs were anchored to two highly dense wheat genetic maps (Poland et al., 2012; Cavanagh et al., 2013), which allowed us to genetically anchor 298 6AS and 384 6AL LTC-derived contigs. This genetically anchored a total of 661 Mb out of the 1048 Mb, which represents the cumulative contig map length (Figure 1; Appendix S9). By considering only genetically anchored LTC-built contigs, we were able to genetically anchor 132 Mb of WCS contigs, 303 Mb of sequence data from T. urartu and 129 Mb from Ae. tauschii to the 6A genetic maps. The remaining unanchored 6A physical contigs were subjected to a second round of anchoring using the publicly available barley genomic resource. This was performed to provide researchers with an additional layer of anchoring information for the respective physical map. The barley-based anchoring was kept independent (Appendix S10) of the wheat-based anchoring. Therefore, in our analysis, we used 15 719 high-confidence barley genes from the barley genome (IBSC, 2012), together with 723 499 anchored barley WCS contigs from barley POPSEQ data (Mascher et al., 2013). Using high-confidence barley genes we were only able to anchor 26 additional 6A physical contigs (8 Mb), whereas 98 physical contigs from 6A (37 Mb) were exclusively anchored via barley population sequencing (POPSEQ) data (Appendix S10). Overall, 831 Mb (i.e. 79% of 6A physical contigs) were genetically anchored (Figure 1).
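The anchored proportions quoted above are length-weighted: they compare the cumulative size of anchored contigs with the cumulative contig map length. The sketch below shows that calculation on toy data and then re-derives the two percentages from the totals given in the text; the function name and data layout are hypothetical.

```python
from typing import Dict, Set


def anchored_fraction(contig_size_mb: Dict[str, float], anchored: Set[str]) -> float:
    """Length-weighted share of the contig map that is genetically anchored."""
    total = sum(contig_size_mb.values())
    hit = sum(size for cid, size in contig_size_mb.items() if cid in anchored)
    return hit / total


# Toy contigs (invented): 1.5 Mb of 2.0 Mb is anchored.
toy_sizes = {"ctg1": 1.0, "ctg2": 0.5, "ctg3": 0.5}
toy_fraction = anchored_fraction(toy_sizes, {"ctg1", "ctg3"})

# Totals taken from the text (Mb).
TOTAL_MAP_MB = 1048
WHEAT_MAPS_MB = 661   # anchored via the two wheat genetic maps
OVERALL_MB = 831      # overall anchored length reported

print(f"toy: {toy_fraction:.0%}")
print(f"wheat maps only: {WHEAT_MAPS_MB / TOTAL_MAP_MB:.1%}")
print(f"overall: {OVERALL_MB / TOTAL_MAP_MB:.1%}")
```

With the reported totals, 831 Mb of 1048 Mb corresponds to roughly 79%, in line with the figure stated above.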
The large portion of the physical map anchored to the respective wheat genetic maps allowed for the analysis of recombination frequencies along the entire wheat 6A chromosome. For this purpose, an integrated wheat genetic map was constructed from the two aforementioned wheat genetic maps (Appendix S6). We then calculated the physical distance per 10-cM bin (Appendix S11). As expected for plants with large genomes [i.e. Zea mays (maize; Anderson et al., 2003), Hordeum vulgare (barley; Künzel et al., 2000) and wheat (Lukaszewski and Curtis, 1993)], we were able to reconfirm that recombination greatly increased from the centromere towards the telomeres (Figure 2). These detailed estimates related to recombination frequencies along 6A will be most useful for future map-based cloning and gene identification projects.
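The physical distance per 10-cM bin can be illustrated with a small binning routine. This is a sketch under assumptions: each anchored contig is represented by one genetic position (cM) and one size (Mb), which is a simplification introduced here, and the function name and toy values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

BIN_CM = 10  # bin width stated in the text


def physical_mb_per_bin(anchored: Iterable[Tuple[float, float]]) -> Dict[int, float]:
    """Sum contig sizes (Mb) by 10-cM bin of their genetic position.

    `anchored` yields (position_cM, contig_size_Mb) pairs; bin k covers
    the half-open interval [k * BIN_CM, (k + 1) * BIN_CM).
    """
    bins: Dict[int, float] = defaultdict(float)
    for position_cm, size_mb in anchored:
        bins[int(position_cm // BIN_CM)] += size_mb
    return dict(sorted(bins.items()))


# Toy anchored contigs (invented positions and sizes).
toy_anchored = [(1.0, 0.5), (4.0, 1.0), (12.0, 2.0), (19.9, 3.0), (25.0, 0.25)]
mb_per_bin = physical_mb_per_bin(toy_anchored)
for k, mb in mb_per_bin.items():
    # A larger Mb/cM value means less recombination per unit of physical length.
    print(f"{k * BIN_CM}-{(k + 1) * BIN_CM} cM: {mb} Mb ({mb / BIN_CM} Mb/cM)")
```

Bins with many megabases per centimorgan would correspond to low-recombination regions such as those around the centromere.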

Comparison between LTC and FPC assembly at different cutoff values
To date, physical maps of four arms of wheat chromosomes (1AS, 1AL, 1BS and 1BL), as well as of the entire chromosome 3B, have been reported (Paux et al., 2008; Breen et al., 2013; Lucas et al., 2013; Philippe et al., 2013; Raats et al., 2013). With the exception of chromosome 3B, both LTC and FPC tools have been used to assemble wheat chromosome physical maps. A comparison was performed between the two tools for wheat chromosome arm 1AL and showed that the LTC assembly provided significantly higher accuracy (Breen et al., 2013); however, the previous study only described the advantages of LTC by comparing the LTC assembly with a single FPC assembly constructed at 1e−45 and by using HICF data (and not WGP™) as the input (Breen et al., 2013). In our study, we compared contigs obtained by LTC as the reference assembly with individual FPC assemblies (Table S1) generated at different stringencies to assess the differences between the LTC and the FPC assemblies employing WGP™. This allowed for the identification of the most robust physical map for 6A. Our comparison distributed an assortment of LTC-made contigs into five groups (Figure 3; Tables S2 and S3; Appendix S12). The LTC ≥ 2FPC group (LTC-derived contigs for which BACs were assembled into two or more different contigs via FPC; cases of conflict; Figure 4) was considered the most suitable group for comparing the two assembly platforms.
Figure 1 (caption, partial). (a) The number of LTC contigs that were anchored using one or a combination of different map resources: e.g. 493 LTC contigs were anchored by all three resources. (b) The corresponding cumulative size of the LTC contigs that were anchored using one or a combination of different map resources: e.g. 558 Mb was anchored by all three resources. The integrated genetic map of 6A (constructed in the current study; Appendix S6) refers to the combined genetic map derived from two highly dense wheat genetic maps, as previously described (Poland et al., 2012; Cavanagh et al., 2013). IBSC (2012) [...]
Figure 3 (caption, partial). Different classes of LTC-derived contigs were identified when compared with FPC assemblies. The first class comprises contigs that were exclusively made by LTC (not found in FPC). The number of such LTC-specific contigs was relatively constant for both of the 6A arms when compared with FPC assemblies at various stringencies. The second class contained contigs that were identical between assemblies made by both tools (LTC = FPC, i.e. BAC composition and order were the same in corresponding LTC and FPC contigs). This class showed a slight increase when the FPC stringency was lowered. The third class includes LTC > FPC (i.e. LTC contigs longer than FPC contigs), which refers to contigs with the same backbone, whereas more BACs were added to the end of corresponding contigs via LTC. This class showed a slight decrease in number when the FPC stringency was lowered. The two remaining classes include LTC < FPC (i.e. LTC contigs that are shorter than FPC contigs) and LTC ≥ 2FPC (i.e. LTC-made contigs, the BACs of which were assembled into two or more different contigs via FPC). Both classes showed higher differences in number across the FPC assemblies that are mainly explainable by end-to-end merging during FPC assembly. For example, a decrease in the number of LTC ≥ 2FPC from 1e−30 to 1e−25 is likely to have resulted from the merging of the two corresponding FPC contigs at 1e−25. Therefore, contigs of 1LTC = 2FPC at a higher level of FPC stringency later became 1LTC < 1FPC (or 1LTC = 1FPC) at a lower level of stringency. In both chromosome arms, identical classes show a relatively similar trend of variation across FPC assemblies, although they may contain different numbers of contigs in each of the arms.
The number of such LTC-built contigs (LTC ≥ 2FPC) was found to decrease with declining FPC assembly cut-off values. This was explained by end-to-end merging during FPC assembly. To check whether contig merges were accurately formed via FPC while decreasing assembly cut-off, all LTC ≥ 2FPC cases were visualized and inspected (Figure 4). This inspection was only performed for cases of conflict between LTC and FPC assemblies at a 1e−11 FPC cut-off for each chromosome arm (Figure 4; Appendices S13 and S14). This cut-off was selected because it was closest to the LTC cut-off value (1e−10), and thus was considered to represent the most comparable cut-off value. In the case of 6AS, 72 conflicts were found for which a particular set of overlapping BACs were always shared between LTC and corresponding FPC counterparts. These BACs were also similarly ordered along the respective contigs (Figure 4, red-colored BACs; Table 4). We then analyzed whether the conflict could be resolved by consulting genetic markers anchored to corresponding BAC clones. For 22 of the 72 6AS cases of conflict, the corresponding affected LTC- and FPC-made contigs were provided with a sufficient number of anchored genetic markers (informative markers). Such markers allowed us to ascertain whether the different parts of affected contigs had been correctly overlapped/merged and, if so, whether the conflict could be resolved.
In the majority of cases (17 out of 22), genetic markers revealed a false assembly of different FPC-derived contig parts (e.g. a chimerical FPC contig was built; Figure 4, status 1). For the remaining five cases, either the LTC-built contigs were chimeric (four cases; Figure 4, status 2) or the respective BACs were correctly assembled by both FPC and LTC tools (one case), and thus corresponding conflicts were resolved (Figure 4, status 3). We achieved a similar result for 143 conflicts identified for 6AL (Table 4). Therefore, we concluded that the ability of FPC to form longer and consequently lower numbers of contigs, while decreasing the assembly cut-off value, could potentially produce higher numbers of chimerical contigs as a result of false BAC contig end-to-end merges.
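The five comparison classes used for the LTC versus FPC comparison can be approximated by set operations on BAC membership. The sketch below is an assumption-laden simplification: it ignores BAC order (which the LTC = FPC class also requires to match), treats BACs absent from every FPC contig as FPC singletons, and adds an "unclassified" fallback for partial overlaps that the five classes do not describe. All names are hypothetical.

```python
from typing import Dict, Set


def classify_ltc_contig(ltc_bacs: Set[str], fpc_contigs: Dict[str, Set[str]]) -> str:
    """Assign one LTC contig to a comparison class based on BAC membership only."""
    # FPC contigs that share at least one BAC with this LTC contig.
    hits = [bacs for bacs in fpc_contigs.values() if ltc_bacs & bacs]
    if not hits:
        return "LTC-specific"          # all BACs are FPC singletons
    if len(hits) >= 2:
        return "LTC>=2FPC"             # case of conflict
    fpc_bacs = hits[0]
    if ltc_bacs == fpc_bacs:
        return "LTC=FPC"               # order is not checked in this sketch
    if ltc_bacs > fpc_bacs:
        return "LTC>FPC"               # LTC added BACs to the same backbone
    if ltc_bacs < fpc_bacs:
        return "LTC<FPC"
    return "unclassified"              # partial overlap; fallback added here


# Toy FPC assembly (invented BAC names).
toy_fpc = {"F1": {"a", "b", "c"}, "F2": {"d", "e"}, "F3": {"f", "g", "h", "i"}}

examples = {
    "specific": classify_ltc_contig({"x", "y"}, toy_fpc),
    "equal": classify_ltc_contig({"a", "b", "c"}, toy_fpc),
    "longer": classify_ltc_contig({"d", "e", "z"}, toy_fpc),
    "shorter": classify_ltc_contig({"f", "g"}, toy_fpc),
    "conflict": classify_ltc_contig({"c", "d"}, toy_fpc),
}
print(examples)
```

Re-running such a classification against FPC assemblies built at different cut-offs would show how end-to-end merging moves contigs from the conflict class into the LTC < FPC or LTC = FPC classes.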
We also observed a highly consistent number of LTC-specific contigs (small contigs, average BAC/contig = 2.6-2.9) compared with FPC assemblies. These LTC-derived contigs (539 for 6AS and 368 for 6AL, compared with FPC at 1e−11) were composed of BACs that were left out as singletons in the initial FPC build at 1e−75 (Table S1), where BACs were assembled into contigs only if they shared >70% of WGP™ tags (Paux et al., 2008). BAC clones containing fewer tags (an average of 15 tags/BAC, compared with 34 tags/BAC for the rest of the assembly) were incorporated into contigs via LTC because the initial assembly was performed at a lower stringency (1e−2 cut-off). Therefore, such LTC-specific small contigs resulted in a higher number of LTC-made contigs, and consequently had higher chromosomal arm coverage, as they most likely cover the same regions as larger contigs. Although LTC had an artificially generated larger number of small contigs, the robustness of BAC order and overlap identification for the remaining contigs was significantly higher in LTC versus FPC assemblies (see cases of conflict above). For this reason, we considered the entire LTC-based assembly to be a more reliable physical map for subsequent analyses, including synteny analysis, MTP selection and future BAC-based sequencing of wheat chromosome 6A.

Efficacy of FPC at improving LTC-made assembly
To test whether gaps between LTC-derived contigs could be closed via FPC-assembled BACs, LTC-derived contigs were aligned against FPC assemblies using a 1e−50 cut-off as a reference. This was in contrast to the aforementioned comparisons, where LTC was used as a reference. We selected this stringently formed FPC assembly to ensure the robustness of overlap among BACs of the corresponding contigs. Using this approach, we identified 25 FPC-built contigs for 6AS and 45 FPC-built contigs for 6AL in which BAC clones were represented in more than two LTC-made contigs (Table 5). These LTC-derived contigs were thus considered potentially mergeable or potential scaffolds because the gap between two LTC-made contigs could be bridged using a robustly formed FPC-built contig. All of these scaffolds were depicted as individual images according to BAC position to enable a visual inspection of their structure (Figure S3). Only scaffolds for which the corresponding FPC-made contig was composed of BACs from the ends of two different LTC-made contigs were flagged as potentially true scaffolds (10 for 6AS; 17 for 6AL); however, those FPC-derived contigs that, for example, contained BACs from the end of one LTC-made contig and BACs from the middle of another were rejected (almost 60% for each arm; Figure S3). These rejected scaffolds most likely represent falsely assembled BACs via FPC (chimerical contigs).
Table footnote: Neither the LTC contig nor the respective FPC contigs could be flagged as chimerical on the basis of anchored genetic markers.
Using these scaffolds we then tested whether any anchoring information could support scaffold accuracy. For 6AS, genetic markers were available for four such scaffolds, of which one was confirmed genetically; for 6AL, we found three cases out of six (Table 5). This result indicated that the efficacy of FPC assemblies (at 1e−50) at improving LTC-derived assemblies was relatively low. In general, this might indicate that the complexity of the wheat sequence restricted the ability of FPC, using WGP™ data as input, to construct robust contigs, even at the initial higher stringency (in this case 1e−50).
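The acceptance rule for FPC-bridged scaffolds (BACs drawn from the ends of two different LTC contigs, not from the middle of either) can be sketched as a positional check. The `end_window` parameter, which defines how many terminal BACs count as a contig 'end', is a hypothetical choice made for this illustration; the study relied on visual inspection, and requiring exactly two LTC contigs is likewise an assumption of the sketch.

```python
from typing import Dict, List, Set


def is_candidate_scaffold(fpc_bacs: Set[str],
                          ltc_contigs: Dict[str, List[str]],
                          end_window: int = 2) -> bool:
    """True if an FPC contig joins the ends of exactly two LTC contigs.

    `ltc_contigs` maps an LTC contig name to its ordered list of BACs.
    `end_window` (hypothetical) is the number of terminal BACs treated as an end.
    """
    touched = []
    for order in ltc_contigs.values():
        shared = [i for i, bac in enumerate(order) if bac in fpc_bacs]
        if shared:
            touched.append((shared, len(order)))
    if len(touched) != 2:
        return False
    for shared, n in touched:
        at_left_end = all(i < end_window for i in shared)
        at_right_end = all(i >= n - end_window for i in shared)
        if not (at_left_end or at_right_end):
            return False  # BACs come from the middle of an LTC contig
    return True


# Toy LTC contigs with ordered BACs (invented names).
toy_ltc = {
    "L1": ["a", "b", "c", "d"],
    "L2": ["e", "f", "g", "h", "i", "j"],
}
end_to_end = is_candidate_scaffold({"d", "e"}, toy_ltc)   # end of L1 + end of L2
end_to_mid = is_candidate_scaffold({"d", "g"}, toy_ltc)   # end of L1 + middle of L2
print(end_to_end, end_to_mid)
```

A rejected end-to-middle case like the second one corresponds to the FPC contigs that were interpreted above as likely chimerical.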

Scaffolding the LTC assembly using WGP™ tags and shotgun sequence contigs
Publicly available T. urartu sequence contigs (Tu contigs) were used to determine whether they allow for the bridging of LTC-made contigs and building of LTC scaffolds. We checked whether a single Tu contig exclusively matched terminal parts of two different LTC-derived contigs (Appendix S15). Using Tu contigs as a proxy, we identified 84 6AS and 65 6AL potential scaffolds (Table 6; Appendix S16). Those LTC-made contigs engaged in scaffolding were relatively small (average BAC/contig = 5 for 6AS and 17 for 6AL), and therefore could not have a significantly positive effect on the L50 of the overall BAC assembly; however, we asked whether LTC-derived contigs involved in such scaffolds were supported by genetically anchored markers to validate our approach (with a similar approach as described in Figure 4). For 6AS, we found only four of these scaffolds, of which two could be genetically validated; for 6AL, 12 scaffolds with genetic markers were provided, of which eight were genetically confirmed (Table 6). Although genetic markers were not available for structural confirmation of all the scaffolds constructed, we kept and reported all formed scaffolds (Appendix S16) because they could potentially support and/or guide future BAC-based sequencing and sequence assembly of the respective physical contigs.

'Gene decoration' of the newly formed anchored physical map of wheat chromosome 6A
Chromosome 6A shotgun sequences (6A WCS contigs) have already been annotated for genes (IWGSC, In press), which established 5024 genes grouped into four confidence classes (HC1-HC4). Of these, 2531 genes were assigned to the HC1 category because ≥70% of coding sequences overlapped with a reference gene in B. distachyon, rice or S. bicolor. The remaining confidence classes (HC2-HC4) had less overlap with reference genes (IWGSC, In press); however, in our analysis of the total 5024 HC1-HC4 genes, 3359 genes (1667 6AS; 1692 6AL) were assigned to the anchored portion of the 6A physical map (i.e. genetically positioned genes). This was accomplished by assigning the corresponding WCS contigs to the WGP™ tags (Appendix S6). We then calculated the gene density on this newly formed physical map along 6A by dividing the chromosome length into bins of 5 Mb (Figures 5 and S4). In more telomeric bins we found a maximum of 84 genes, whereas in the centromeric region with low recombination the number of genes decreased dramatically to less than 20 (Figure 5). Moreover, this analysis revealed a general correlation between the recombination rate pattern and distribution of genes along the chromosome (Figures 2 and 5).

Synteny-based approach addressed 6A evolutionary relations with model grass genomes
To analyze the completeness of the anchored physical map and to gain insight into the evolutionary origin of chromosome 6A, physical contig-associated sequences were compared against reference genomes (Appendix S6). Associated sequence information, including WGP™ tags, WCS, T. urartu and Ae. tauschii contigs, was compared with coding sequences from other grass genomes of Hordeum vulgare (Hv), Brachypodium distachyon (Bd), Oryza sativa (Os), and Sorghum bicolor (Sb). Anchored sequences showed sequence homology to 2799 Hv, 2455 Bd, 2539 Os and 2465 Sb genes. Of these genes, 40.0-48.5% matched syntenic chromosomes of the three genomes, including Bd3, Os2 and Sb4. These results are comparable with those reported previously (40.2-59.7%) for syntenic genes of 6B (Tanaka et al., 2014). Figure 6 depicts the 6A gene distribution among model grass genomes. We observed that the number of wheat coding sequences shared with at least three species (1994 genes) was higher than the number present in two (873 genes) or a single species (1713 genes). Proteins with significant similarity (seed length of 20, identity ≥75%; http://www.vmatch.de) to wheat sequence information were plotted along the chromosomes (Figure 7). The average syntenic gene content was calculated as the number of matching wheat proteins in a window of 1 Mb, without overlap. In this analysis, 6A sequence information showed homology with genes/proteins located on chromosome 6H of Hv (Hv6H), Bd3, Os2 and Sb4, but not with any other chromosomes of these genomes (Figure 7), which is consistent with previous reports (TIBI, 2010; Brenchley et al., 2012; IBSC, 2012).
Table footnotes: Scaffolding was allowed if the corresponding sequence contigs hit at least three WGP™ tags from the ends of only two LTC contigs. (c) The corresponding LTC contigs were provided with genetic anchoring information.

DISCUSSION
The current study is the German contribution to the IWGSC. This international consortium has aimed to sequence wheat chromosomes and/or chromosome arms individually, with the construction of the respective BAC-based physical maps as a necessary intermediate step. In this context, we report here the successful application of WGP™ together with the contig assembly tool LTC to robustly assemble BAC clones into 1113 contigs for the long arm and 1217 contigs for the short arm, representing the wheat chromosome 6A physical map.
Figure 6. Venn diagram of wheat chromosome 6A physical map-associated gene distribution with significant similarity to Brachypodium distachyon, Oryza sativa (rice) and Sorghum bicolor. Chromosome 6A physical map-associated sequence information, including whole chromosome survey sequencing, Triticum urartu and Aegilops tauschii contigs, was compared with coding sequences from model grass genomes. In Hordeum vulgare, all genes including genetically anchored and unanchored genes were considered (IBSC, 2012).
To date, physical maps of four arms of wheat chromosomes, including 1AS, 1AL, 1BS and 1BL, as well as the entire chromosome of 3B, have been reported (Paux et al., 2008; Breen et al., 2013; Lucas et al., 2013; Philippe et al., 2013; Raats et al., 2013). The 6AS chromosome arm (336 Mb) has the closest estimated size to that of 1BS (314 Mb; Šafář et al., 2010), for which an initial assembly (before manual end-to-end merging) of 254 LTC-derived contigs (with six or more clones) was reported. Excluding small contigs (with five or fewer clones) from our 6A assemblies would result in 293 (368 Mb) and 545 (459 Mb) contigs for the short and long arms, respectively.
Therefore, the number of contigs obtained for 6AS is comparable with the total number of contigs in the initial assembly of 1BS; however, in contrast to the final assembly (after manual end-to-end merging) of 1BS that resulted in 57 scaffolds, no further contig merging or scaffold construction was considered in the final 6A assemblies, as this must be guided by highly reliable and robust genetic maps. Otherwise, the complexity and repeat content of the wheat genome could potentially hamper any manual contig merging or super-contig construction. Excluding small contigs, we obtained more than 100% coverage for 6AS (109%) and 6AL (124%) assemblies. This is most likely because the original chromosomal arm sizes have been underestimated.
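The coverage value quoted for 6AS can be checked directly from numbers given in this section: a cumulative length of 368 Mb for contigs with six or more clones, against an estimated arm size of 336 Mb. The helper below is a minimal sketch with a hypothetical function name; the 6AL arm size is not stated in this section, so only 6AS is computed.

```python
def coverage_percent(assembly_mb: float, arm_size_mb: float) -> float:
    """Assembly length expressed as a percentage of the estimated arm size."""
    return 100.0 * assembly_mb / arm_size_mb


# Values for 6AS taken from the text (Mb).
ASSEMBLY_6AS_MB = 368   # contigs with six or more clones
ARM_SIZE_6AS_MB = 336   # estimated arm size

cov_6as = coverage_percent(ASSEMBLY_6AS_MB, ARM_SIZE_6AS_MB)
print(f"6AS coverage: {cov_6as:.1f}%")
```

A value above 100% is what one would expect if the arm size estimate is too low, or if some contigs cover overlapping regions without having been merged.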
Overall, we were able to anchor 79% of the physical contigs into corresponding genetic maps, which is greater than the aforementioned wheat chromosomal physical maps, including 1BL (74%), 1AS (74%), 1AL (~75%) and the first version of 3B with a 56% anchored physical map. Nevertheless, the 1BS physical map contained 83% of the contigs anchored to the respective genetic maps because of the low number of contigs (Raats et al., 2013). The availability of extended WGP TM tags allowed for the direct placement of 3843 genes into a physical map, which was comparable with the recent physical map for barley, where an average of~3700 genes could be assigned to each chromosome (IBSC, 2012). This high level of genetically anchored physical maps and their respective genes provides a more efficient way to clone agronomically important genes/QTLs located on 6A. Fine-mapping and the identification of genes underlying these important QTLs have been inhibited in wheat, mainly by technical constraints linked to its genome size (17 Gb), repeat content (>80%) and genomic redundancy (presence of three highly homologous genomes: A, B and D). These limitations may explain why very few wheat genes have been cloned (Krattinger et al., 2009). Therefore, chromosomal BAC-based physical maps are of utmost importance to promote and simplify positional cloning in this large genome.
In this study, the comprehensive and integrated 6A physical map localized some genetic determinants to the corresponding physical map, and provided information required for the development of tightly linked genetic markers (Figure 8). Such loci include one resistance QTL that is important against the wheat disease Fusarium head blight (Schmolke et al., 2005; Holzapfel et al., 2008), the stem rust resistance gene Sr13 (Simons et al., 2011), an anti-xenosis gene against a new aphid biotype (Castro et al., 2005) and QTLs involved in adult plant resistance to powdery mildew (Muranty et al., 2009), as well as greater seedling vigor (Spielmeyer et al., 2007). All of these QTLs/genes had already been genetically mapped to 6A; however, in the current study, only two corresponding gene intervals were successfully localized to the 6A physical map. For the remaining genes, either the corresponding primer/marker sequence information was not publicly available or the marker sequence could not be detected in our 6A-connected sequence data set. For the identified intervals, the respective information, including the number of genes and physical contigs assigned to a respective region, was identified (Figure 8). This analysis shows the usefulness of our physical map and represents an unprecedented opportunity to accelerate detailed gene studies, including positional cloning, and ultimately wheat breeding programs.
In the absence of the aforementioned wheat physical maps, comparative genomics and collinearity between wheat and related grass genomes have been the method of choice for map-based cloning in wheat. Previous strategies have been shown to be very arduous, costly and to require a variety of genomic resources. For example, initial attempts at cloning the wheat Lr34/Yr18 locus failed because of a lack of sufficient collinearity between wheat and the small rice and B. distachyon genomes (Spielmeyer et al., 2008). Although conserved synteny was of great support in narrowing down respective gene-containing intervals, the region carrying Lr34/Yr18 is absent in both rice and B. distachyon syntenic segments, validating the hypothesis that grass resistance genes have less conserved micro-colinearity (Leister et al., 1998). Access to robustly assembled hexaploid wheat genome physical maps is the key to expediting future wheat genome sequencing efforts.
Figure 8. Physical map intervals containing agronomically important genes on wheat chromosome 6A. Primers or corresponding PCR-amplified sequences were used to define the corresponding interval. An in silico sequence homology search was performed to connect primer sequences or corresponding PCR-amplified sequences to sequence information underlying the 6A physical map; PM, powdery mildew-resistant gene (Muranty et al., 2009); SR, stem rust resistance (Simons et al., 2011). Bi-directional arrows indicate gene-containing intervals; corresponding genetic and physical information of each interval have been highlighted with the same color.
Together with the 6A physical map, we were also able to estimate the rate of recombination along 6A. A general agreement between the pattern of recombination rate and gene distribution along chromosomes was observed, similar to an earlier report (Erayman et al., 2004). This report revealed a correlation between recombination gradient and gene distribution by physically mapping 3025 genes/QTLs to 334 deletion break points that spanned all seven wheat chromosomal groups. This eventually enabled the co-localization of gene- and recombination-rich regions along wheat chromosomes. In our study, the highest density of genes was found for more distal regions of 6A, where recombination rates reached their maximum. In the centromeric area of 6A, however, where recombination rates were the lowest, the number of genes per physical unit also declined dramatically. Such suppression of recombination limits our genetic anchoring resolution for the (peri)centromeric area in which a high number of physical contigs are anchored with unclear order. Therefore, different mapping approaches (e.g. radiation hybrid mapping that is independent of recombination; Kalavacharla et al., 2006) should be used in the future to precisely order contigs in this region. In any case, the recombination estimates along 6A provided are imperative for accomplishing a more efficient isolation of important genes/QTLs via map-based cloning, as genes located in regions with high recombination rates are more accessible to map-based isolation (Jander et al., 2002).
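The relationship between genetic and physical distance discussed above can be illustrated with a minimal sketch. It assumes that each anchored marker has a physical position (Mb) and a genetic position (cM), and it reports the local recombination rate as cM per Mb between neighbouring markers. The function name, the input layout and the example coordinates are hypothetical and are not taken from the 6A data set; the study does not describe its estimation procedure at this level of detail.

```python
def recombination_rates(markers):
    """Return [((start_mb, end_mb), cM_per_Mb), ...] for consecutive markers.

    `markers` is an iterable of (physical_mb, genetic_cm) pairs; this layout
    is an assumption made for illustration only.
    """
    ordered = sorted(markers)  # order markers by physical position
    rates = []
    for (p0, g0), (p1, g1) in zip(ordered, ordered[1:]):
        span_mb = p1 - p0
        if span_mb <= 0:
            # Markers at the same physical position give no usable interval.
            continue
        rates.append(((p0, p1), (g1 - g0) / span_mb))
    return rates


# Hypothetical markers: a distal, recombination-rich interval followed by a
# long pericentromeric interval with almost no recombination.
example_markers = [(0.0, 0.0), (10.0, 20.0), (110.0, 22.0)]
for interval, rate in recombination_rates(example_markers):
    print(interval, round(rate, 3))
```

In such a toy example the distal interval yields a much higher cM/Mb value than the pericentromeric one, which mirrors why contigs in low-recombination regions are difficult to order genetically.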

The efficiency of WGP™ technology and the advantage of the LTC assembly tool over FPC
This study reports the application of WGP™ on individual wheat chromosome arms and effectively confirms how well WGP™ technology works on arms with highly repetitive sequences (>80%) to accurately form a physical map. The advantage of WGP™ over conventional HICF-BAC fingerprinting assembly in wheat has previously been discussed (Philippe et al., 2012).
In this respect, the robustness and quality of a BAC assembly may be justified with the quality and quantity of information (e.g. tag length and density in WGP™ or band size and similarity in HICF) provided for pairwise comparison of BACs and the subsequent establishment of respective contigs. In our study, both tag length and density were significantly improved by 64% compared with the values obtained in the WGP™-based pilot study performed on wheat chromosome 3B (Philippe et al., 2012). Our improved WGP™ results were achieved by applying a different combination of restriction enzymes with higher cut-site frequency (HindIII/MseI), and by increasing the initial sequence read length to 100 nt, as previously suggested (Philippe et al., 2012). Therefore, by using additional sequence information per BAC, together with higher stringency (tolerance value = 0) during assembly and a more efficient contig formation tool (LTC), we propose that a highly accurate BAC-based physical map of 6A has been developed.
The superior performance of LTC using HICF data compared with FPC was recently illustrated in wheat (Breen et al., 2013; Raats et al., 2013). Here, similar conclusions were derived while applying WGP™ tags for BAC assembly of both 6A arms. LTC-derived contigs could be classified into five groups when compared with FPC-made contigs obtained at a given cut-off value. The striking difference between LTC and FPC is reflected in LTC-built contigs (i.e. LTC ≥ 2FPC; cases of conflicts) in which underlying BAC clones were assembled in more than one contig when compared with FPC assemblies. In a randomly selected sample (LTC ≥ 2FPC, FPC at 1e-11), genetic anchoring revealed that in 71% of such cases, the corresponding FPC-derived contigs were incorrectly merged, whereas for LTC, this was only 13%. Moreover, by visually inspecting mis-assembled FPC-made contigs, we often observed that inconsistent cases had highly degenerate BAC coverage compared with other contig parts (Figure 4, red arrowheads). Low-coverage regions were rarely detectable in corresponding LTC-derived contigs for which multiple FPC contigs were available. Low-coverage regions in FPC-built contigs are most likely the result of false end-merging during the stepwise reduction of assembly stringency in FPC. Our observations further demonstrate the advantages of the LTC-based BAC assembly for large genomes containing large numbers of repetitive elements. This is in agreement with previous reports that showed LTC was more efficient at forming BAC contigs using HICF (Breen et al., 2013; Raats et al., 2013). In addition, we showed that by applying a more reliable technology (i.e. WGP™), FPC performance is still considerably lower than that of LTC in physical map assembly. Therefore, we highly recommend combining WGP™ fingerprint methodology together with LTC assembly software for future physical mapping efforts in wheat.

Chromosome sorting and construction of BAC libraries
The 6AS arm was flow-sorted from a double ditelosomic line of wheat carrying both arms of chromosome 6A as telosomes (2n = 40 + 2t6AS + 2t6AL), whereas the 6AL arm was purified from a ditelosomic line carrying only the 6AL arm as a telocentric chromosome (2n = 40 + 2t6AL), according to Vrana et al. (2000). Both flow-sorted telosomes were derived from cv. Chinese Spring. The identity and purity of the sorted fractions were checked by fluorescence in situ hybridization using probes for the telomeric repeat and for the GAA repeat (Janda et al., 2006). Chromosome arm-specific BAC libraries were constructed as described by Šimková et al. (2011). In order to estimate the average insert size, 160 BAC clones were randomly selected from each of the libraries and analyzed as described in Janda et al. (2006).

WGP™ data production
A 3D format of BAC pools was made for each BAC library (Appendix S17). High-concentration BAC DNA was subsequently isolated from pooled BACs, followed by WGP™ sample preparation, as previously described. Briefly, restriction ligation templates were prepared from pooled BAC DNA by digestion using HindIII and MseI, followed by ligation of adaptor sequences containing sample identification tags (barcodes), PCR amplification and the pooling of respective PCR products. Sequencing of the resulting amplified cluster was performed using the Illumina HiSeq with a 100-nt read length. Sequence reads were used for WGP™ tag production (see Appendix S1), which included barcode and restriction site identification, deconvolution of reads as WGP™ tags to the individual BACs and the filtering of WGP™ tags using various quality controls. This filtering pipeline was used to eliminate tags matching vectors, E. coli or chloroplast sequences. Tags containing homopolymer sequences ≥5 nt were considered uninformative (i.e. with a high chance of being present in more than one BAC in a particular plate). Moreover, tags potentially introducing ambiguities were also eliminated (i.e. those present in >12 BACs).
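The tag-filtering rules described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function name, the dictionary layout (tag sequence mapped to the set of BACs it was deconvoluted to) and the example tags are hypothetical, and contaminant screening is reduced here to exact membership in a supplied set, whereas a real pipeline would match tags against vector, E. coli and chloroplast sequences.

```python
import re

# Five or more identical consecutive bases (the >=5 nt homopolymer rule).
HOMOPOLYMER = re.compile(r"([ACGT])\1{4,}")


def filter_tags(tag_to_bacs, contaminant_tags=frozenset(), max_bacs=12):
    """Keep only tags that pass the three filters described in the text.

    tag_to_bacs      -- dict: tag sequence -> iterable of BAC identifiers
    contaminant_tags -- tags assumed to match vector/E. coli/chloroplast
    max_bacs         -- tags present in more than this many BACs are dropped
    """
    kept = {}
    for tag, bacs in tag_to_bacs.items():
        bac_set = set(bacs)
        if tag in contaminant_tags:
            continue  # matches a contaminant sequence
        if HOMOPOLYMER.search(tag):
            continue  # uninformative homopolymer stretch
        if len(bac_set) > max_bacs:
            continue  # ambiguous: present in too many BACs
        kept[tag] = bac_set
    return kept


# Hypothetical example data.
example = {
    "ACGTTGCAAGCT": {"BAC001", "BAC002"},
    "ACGAAAAATGCA": {"BAC003"},                           # homopolymer AAAAA
    "GGATCCGATTAC": {f"BAC{i:03d}" for i in range(13)},   # present in 13 BACs
    "TTGACCGTAGCA": {"BAC004"},                           # flagged contaminant
}
kept = filter_tags(example, contaminant_tags={"TTGACCGTAGCA"})
print(sorted(kept))
```

Only the first tag survives in this toy example; the other three each fail exactly one of the filters.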

BAC contig assembly
To operate the BAC assembly, we used LTC and FPC 9.4 (http://www.agcol.arizona.edu/software/fpc/). Both employ the same metric or so-called Sulston score. In the case of LTC, as the tolerance could be set at the best stringency (tolerance = 0), the initial net of significant clone overlaps obtained at a 1e-2 cut-off was considered. Corresponding subnets were then obtained at a 1e-10 cut-off and used for contig formation. All contigs with at least two clones were exported into FPC format and checked for linear topology. All contigs with a width ≥2 were checked and split manually to obtain linear contigs. If only one clone explained the nonlinearity, the contigs were left as such because this nonlinearity was likely to be caused by a lack of WGP™ tags in the corresponding clone (Philippe et al., 2013). Further parameters required to establish the LTC physical map were as follows: a tolerance of 0; gel length of 111 000; N_bands_Sulston (number of bands for Sulston score calculation) equal to gel length; and a minimum contig size of two clones. Adaptive clustering was performed using the following criteria: a 1e-3 cut-off (while the initial value to make the net of significant clone overlaps was set to 1e-2) and a step size of 1, with seven steps. MTP clones were selected applying the aforementioned parameters in the LTC program. Because of the low number of genetic markers and the lack of adequate anchoring, no further contig merging or supercontig construction was performed.
In addition to LTC, contig assembly was also performed via FPC to compare the performance of each tool in physical map construction. Briefly, the initial FPC assembly was performed with a 1e-75 cut-off. This was subsequently run through single-to-end and end-to-end merging (Match, 1; From End, 13) at 13 sequentially higher cut-offs (thus, lower stringency) that ended up at 1e-11, as was suggested for a WGP™-based strategy in wheat (Appendix S4; Philippe et al., 2012).
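The Sulston score on which both tools rely can be sketched as a binomial tail probability: the chance that two clones share at least the observed number of bands (here, WGP™ tags) purely by coincidence. The sketch below is a simplified illustration and not code from FPC or LTC. The form used is the one commonly given for FPC, where the per-band chance-match probability is derived from the tolerance and gel length; because the text does not state how that probability is obtained at tolerance 0, it is passed in directly as the hypothetical parameter `band_match_prob`. All function names are likewise assumptions.

```python
from math import comb


def sulston_score(n_low, n_high, shared, band_match_prob):
    """Probability that >= `shared` bands match by chance (simplified sketch).

    n_low, n_high   -- band (tag) counts of the two clones, n_low <= n_high
    shared          -- number of bands observed in common
    band_match_prob -- assumed chance that one band matches one other band
    """
    if not 0 <= shared <= n_low <= n_high:
        raise ValueError("require 0 <= shared <= n_low <= n_high")
    # Chance that a given band of the smaller clone matches none of the
    # n_high bands of the larger clone.
    p_none = (1.0 - band_match_prob) ** n_high
    p_hit = 1.0 - p_none
    # Binomial tail: at least `shared` of the n_low bands match by chance.
    return sum(
        comb(n_low, m) * p_hit ** m * p_none ** (n_low - m)
        for m in range(shared, n_low + 1)
    )


def is_significant_overlap(score, cutoff=1e-10):
    """Two clones are treated as overlapping when the score is <= cutoff."""
    return score <= cutoff


# Hypothetical clones with 30 and 35 tags, compared at two levels of sharing.
weak = sulston_score(30, 35, 3, 1e-4)
strong = sulston_score(30, 35, 20, 1e-4)
print(is_significant_overlap(weak), is_significant_overlap(strong))
```

A lower cut-off (for example 1e-75) therefore demands more shared tags before two clones are joined, which is why raising the cut-off stepwise towards 1e-11 corresponds to progressively lower stringency.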

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article.
Figure S1. Steps for the elongation of WGP™ tags by connecting them to the available 6A related sequence information.
Figure S2. Different quantities of publicly available sequence information were connected with the physical contigs using the underlying WGP™ tags.
Figure S3. An example of homology between LTC and FPC contigs (as reference). LTC contigs were aligned against FPC contigs.
Figure S4. Different gene classes assigned to the physical contigs.
Table S1. Reduction in number of contigs and singletons assembled using FPC as a result of decreasing cut-off value.
Table S2. Comparison of 1214 LTC-assembled physical contigs of 6AS with FPC at different stringencies.
Table S3. Comparison of 1108 LTC-assembled physical contigs of 6AL with FPC at different stringencies.
Appendix S1. Whole-genome profiling (WGP™) of 6A chromosome arms.
Appendix S2. 6AS LTC-derived physical map.
Appendix S3. 6AL LTC-derived physical map.
Appendix S4. The FPC-based assembly of the WGP™-based BAC fingerprints.
Appendix S5. The FPC-based BAC assembly.