Chordoma is a rare bone tumor originating from notochordal remnants.1 It occurs almost exclusively in the axial skeleton, where it is distributed nearly equally among cranial, vertebral and sacral sites. The tumor occurs more frequently in males than in females (1.67:1) and in Caucasians than in African-Americans (4:1) and is diagnosed at a median age of 58.5 years with a range from infancy to >90.2 Families with 2 or more individuals with chordoma have been described worldwide, suggesting a genetic predisposition to the tumor. The pattern of inheritance in some of these families is compatible with an autosomal dominant trait.3 Previously, we performed linkage analysis using microsatellite (STR) markers on 3 unrelated chordoma kindreds (16 patients with chordoma) and reported significant evidence for linkage to chromosome 7q33 (Zmax = 4.78) with a minimal disease gene region of 11 cM.4 Recently, we clinically evaluated a fourth multiplex chordoma family that had been previously described.5 This family was reported to be linked to 1p36 based on a combined loss of heterozygosity (LOH) and linkage analysis (Zmax = 1.2).6
Recent technological advances in high-throughput single nucleotide polymorphism (SNP) genotyping have made it feasible to use SNPs for genome-wide linkage analysis. Although SNPs are less polymorphic than STRs, their great abundance and high potential for automated high-throughput genotyping analysis at moderate cost have prompted the consideration of using SNPs as an alternative to STRs for linkage analysis. Studies using SNPs in whole genome scans of both simulated and real data indicated that, when compared to commonly used STR markers, high-density SNPs might have increased power for detection of significant linkage signals and allow loci to be defined more precisely.7, 8, 9, 10 Improvement of analytical methods, such as the development of new algorithms and linkage programs,11 and the construction of high-resolution SNP maps12, 13 have made it more practical to use SNPs in linkage analysis. However, challenges still exist in the use of SNP-based linkage mapping, particularly, the computational burden associated with high-density marker maps, especially in complex pedigrees. To further investigate the evidence for linkage of a chordoma gene to 7q33 and genetic heterogeneity and to evaluate the use of SNP markers in linkage mapping of complex pedigrees, we genotyped the 4 chordoma families for a set of dense SNP markers on chromosomes 7 and 1p36 and carried out linkage analyses.
Material and methods
Subjects and genotyping
Families 1–3 have been described previously (Fig. 1a–c).4 We clinically evaluated a fourth chordoma family that had been reported by Dalpra et al.5 In family 4, the father had nasopharyngeal chordoma at age 9 that was treated with radiation therapy. The tumor recurred at age 42 and he died at age 47 from tumor progression. At the time of the study,5 his eldest daughter was considered to be unaffected. Subsequently, she was diagnosed with multicentric chordoma at age 25. The second daughter had cerebellar astrocytoma diagnosed at age 12, and the third daughter was diagnosed with skull base chordoma at age 7 (Fig. 1d).
Previously, we classified some subjects in families 1–3 as being affected with chordoma on the basis of MRI results alone.4 In the current linkage study, we used a more stringent case definition and required chordoma in all cases to be confirmed by histopathology. Thus, we changed the affection status of 2 individuals from family 1 and 1 individual from family 3 who were previously coded as affected to unknown. Juvenile astrocytoma occurred in an offspring of a patient with chordoma in both families 2 and 4. We classified these patients as affected, as we did previously.4 Thus, our present study included 16 affected individuals with either chordoma or astrocytoma from the 4 families (Fig. 1).
DNA from 50 relatives in the 4 families (including 16 affected cases) was genotyped for 76 SNP markers on 7q using the Illumina SNP linkage panel (n = 2,250 SNP version). The average distance between adjacent SNP markers was 1.54 cM. Information content of each marker set was measured using the entropy function in MERLIN.11 We also genotyped these individuals for 42 SNP markers with 1.8 cM average spacing on 7p to serve as a control region. In addition, we genotyped the 3 daughters and their mother in family 4 for 16 microsatellite (STR) markers in the chordoma locus region on chromosome 7q. Finally, to clarify the evidence for linkage to 1p36, we also genotyped the 4 chordoma families for 34 1p36 SNPs spanning 68.26 cM. The families were evaluated, and blood samples were collected under institutional review board-approved protocols.
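As an illustration of the entropy-based information content mentioned above, the following minimal sketch computes the measure at a single map position from a posterior distribution over inheritance vectors. It assumes a uniform prior over a hypothetical, explicitly enumerated list of inheritance vectors; it is not MERLIN's implementation, and the function name and inputs are illustrative only.

```python
import math
from typing import Sequence


def information_content(posterior: Sequence[float]) -> float:
    """Entropy-based information content at one map position.

    `posterior` holds the probabilities of each possible inheritance
    vector given the marker data (hypothetical input; a linkage program
    would compute these internally). Returns 1 - E / E0, where E is the
    Shannon entropy of the posterior and E0 is the entropy of a uniform
    prior over the same vectors (an assumption of this sketch).
    """
    n = len(posterior)
    if n < 2:
        raise ValueError("need at least two inheritance vectors")
    if abs(sum(posterior) - 1.0) > 1e-9:
        raise ValueError("posterior probabilities must sum to 1")
    # Entropy of the posterior in bits; terms with p == 0 contribute 0.
    entropy = -sum(p * math.log2(p) for p in posterior if p > 0.0)
    # Entropy with no genotype data: all vectors equally likely.
    prior_entropy = math.log2(n)
    return 1.0 - entropy / prior_entropy
```

Under these assumptions, a posterior that is still uniform yields 0 (the markers resolved nothing), and a posterior concentrated on a single inheritance vector yields 1 (inheritance fully resolved).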
The genetic map locations of the SNPs were assigned based on the interpolation method using the TSC map12 with chromosome coordinates of the SNPs based on the annotated sequence assembly that was described by Scherer et al.14 Allele frequencies of the SNP markers were calculated from the observed frequencies among nonbloodline unaffected family members and 4 CEPH individuals. RECODE (version 1.00, http://watson.hgen.pitt.edu/register/docs/recode.html) and MEGA2 (version 2.5)15, 16 were used to create and convert linkage-format files for different linkage analysis programs. Mendelian genotype incompatibilities (<0.1%) and additional genotyping errors that did not lead to Mendelian incompatibilities (0.3%) were identified by PEDCHECK17 and MERLIN,11 respectively, and were eliminated from the linkage analyses. Because each linkage program has different limitations in analyzing high-density marker sets in complex pedigrees, we used a number of different linkage programs and compared the results. We performed 2-point LOD-score linkage analysis using the MLINK program from the LINKAGE package,18 version FASTLINK 4.1P,19, 20 and multipoint parametric linkage analysis using GENEHUNTER 2.1_r5 beta21 and SIMWALK2 (version 2.86),22 under the assumption of autosomal dominant inheritance of a rare allele with population frequency 0.001. We also performed nonparametric multipoint linkage analyses using the programs MERLIN, GENEHUNTER and SIMWALK2. To be conservative, we conducted an affecteds-only analysis in all linkage analyses; thus, we coded the affection status of clinically unaffected individuals as unknown. Haplotype reconstruction was performed by using the haplotype functions of SIMWALK2 and MERLIN and then confirmed manually for the critical region. Formal tests for genetic homogeneity were conducted using the HOMOG program.23 Both 2-point (from MLINK) and multipoint LOD-scores (from LINKMAP) were evaluated using markers in the 7q linked region.
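To make the 2-point LOD score concrete, the sketch below computes it for the simplest textbook case of phase-known, fully informative meioses. This is only an illustration of the statistic: MLINK and FASTLINK evaluate full pedigree likelihoods with unknown phase, missing genotypes and a disease model, none of which is attempted here. The function names and the grid step are hypothetical.

```python
import math
from typing import Tuple


def two_point_lod(recombinants: int, meioses: int, theta: float) -> float:
    """LOD score at recombination fraction `theta`.

    Assumes phase-known, fully informative meioses, so the likelihood is
    theta**r * (1 - theta)**(n - r), compared with theta = 0.5 (no linkage).
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must lie in [0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie in [0, meioses]")
    nonrecombinants = meioses - recombinants
    if theta == 0.0 and recombinants > 0:
        # Any observed recombinant is impossible at theta = 0.
        return float("-inf")
    log_linked = 0.0
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    if nonrecombinants:
        log_linked += nonrecombinants * math.log10(1.0 - theta)
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


def max_two_point_lod(recombinants: int, meioses: int,
                      step: float = 0.01) -> Tuple[float, float]:
    """Grid search over theta in [0, 0.5]; returns (theta_hat, Zmax)."""
    best_theta, best_lod = 0.5, 0.0  # LOD is exactly 0 at theta = 0.5
    n_steps = int(round(0.5 / step))
    for i in range(n_steps + 1):
        theta = min(i * step, 0.5)
        lod = two_point_lod(recombinants, meioses, theta)
        if lod > best_lod:
            best_theta, best_lod = theta, lod
    return best_theta, best_lod
```

For example, `max_two_point_lod(1, 10)` searches the grid for the recombination fraction that maximizes the LOD score given 1 recombinant in 10 hypothetical meioses.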
Results
Linkage on 7q region
Haplotype analysis of STR markers from the 7q chordoma locus region in family 4 demonstrated that the 2 daughters with chordoma (the eldest and the youngest) did not share the same paternal haplotype (Fig. 1d). In addition, both parametric and nonparametric linkage analyses with 7q SNP markers indicated that this family was not linked to the 7q telomeric region (Fig. 2). The results from tests for genetic heterogeneity by the HOMOG program were suggestive of genetic heterogeneity; however, they were not statistically significant. Similarly, the multipoint HLOD scores from GENEHUNTER, which allow for genetic heterogeneity (Zmax = 2.9), were much higher than the total LOD scores assuming genetic homogeneity (Zmax = 1.7) in the 4 families, also suggesting the presence of genetic heterogeneity. Therefore, this family was excluded from the subsequent linkage analysis of the 7q region.
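The contrast between a heterogeneity LOD (HLOD) and a summed LOD can be sketched as follows, using the standard admixture formulation in which a proportion alpha of families is assumed to be linked. The per-family LOD values used in the usage note are hypothetical and are not the values from this study; the code is a simplified single-position illustration, not the GENEHUNTER or HOMOG implementation.

```python
import math
from typing import Sequence, Tuple


def hlod(family_lods: Sequence[float], alpha: float) -> float:
    """Admixture HLOD at one map position.

    `family_lods` are finite per-family LOD scores at that position and
    `alpha` is the assumed proportion of linked families. With alpha = 1
    this reduces to the ordinary summed LOD under homogeneity.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return sum(math.log10(alpha * 10.0 ** z + (1.0 - alpha))
               for z in family_lods)


def max_hlod(family_lods: Sequence[float],
             steps: int = 1000) -> Tuple[float, float]:
    """Grid search over alpha in [0, 1]; returns (alpha_hat, max HLOD)."""
    best_alpha, best_value = 0.0, 0.0  # HLOD is exactly 0 at alpha = 0
    for i in range(steps + 1):
        alpha = i / steps
        value = hlod(family_lods, alpha)
        if value > best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value
```

With hypothetical per-family scores such as `[1.5, 1.0, 0.8, -1.6]`, the summed LOD (`hlod(lods, 1.0)`) is pulled down by the one unlinked family, whereas `max_hlod(lods)` returns a larger value at an estimated alpha below 1. This is the pattern described above, in which the HLOD exceeds the total LOD under homogeneity.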
Figure 2 shows the nonparametric multipoint results (−Log p-values) for the entire 7q region using MERLIN. In families 1–3, the peak evidence for linkage on 7q occurred in the telomeric region, with p < 0.01 (−Log p > 2) observed at 8 consecutive markers, including 4 consecutive markers with p < 0.001 (−Log p > 3). Similar results were obtained from parametric multipoint linkage analyses (Zmax = 2.77 by SIMWALK2); 8 markers in this 25 cM region yielded LOD scores > 2. We compared the linked regions identified in these 3 families with SNP vs. STR markers, and our results showed they were the same, with the linkage peaks being located at approx. 150 cM. Assuming locus homogeneity, haplotype construction using MERLIN and SIMWALK2 provided similar results and revealed a common disease gene haplotype of 15.9 cM from marker rs890406 (140.2 cM) to rs727830 (156.1 cM). This is consistent with the minimal disease-gene region of 11.2 cM from marker D7S512 (139.5 cM) to marker D7S684 (150.7 cM) identified previously using STR markers4 (Table I).
Table I. 7q Haplotypes of Affected Individuals in the 4 Families
The haplotype regions on 7q shared within each of families 1–3 are indicated by the wide white box (identified by single nucleotide polymorphism [SNP] markers) and thin shaded box (identified by STR markers). An x indicates that the same allele was not shared by all affected people in that family. A box across families 1–3 shows the minimal disease gene region shared by affected people and obligate gene carriers shaded in dark (identified by SNPs) and light (identified by STRs), respectively. In family 4, the 2 daughters with chordoma did not share the same 7q paternal haplotype (as indicated by italics and bold). ND, the marker was not studied in this family.
To increase the power to detect linkage and to try to refine the boundaries of the minimal disease gene region in these 3 families, we combined 15 SNP markers typed for our study with 25 STR markers used previously in the 7q telomeric region of 42.3 cM. The combined marker set had higher information content (IC = 0.82) compared to either STR (IC = 0.76) or SNP markers (IC = 0.75) alone. Multipoint LOD scores obtained by using GENEHUNTER and SIMWALK2 with combined SNP and STR markers were similar and revealed a maximum LOD score of 3.8 (Fig. 3), which is close to what was obtained from the STR-only linkage analyses using the more stringent case definition. Consistent results were obtained from the nonparametric analyses with the most significant p-value of 0.0004 obtained using GENEHUNTER. The disease gene haplotype block was shared by all affected people in the 3 families and revealed a boundary similar to what we had defined previously (Table I).
As a control for the linkage finding on 7q, we evaluated 42 SNP markers on chromosome 7p. As expected, there was no evidence for linkage on 7p in either the 3 original families or in all 4 families combined (data not shown).
Linkage on 1p36
To clarify whether 1p36 harbored a susceptibility locus for chordoma, we performed linkage analyses in the 4 chordoma families with 34 1p36 SNPs. Our results indicated that families 1, 3 and 4 did not show evidence of linkage to this region (Fig. 4). A small positive linkage signal was observed in family 2 (Fig. 4); parametric linkage analysis yielded a maximum multipoint LOD score of 0.8 in this family.
Discussion
Previously, we identified a chordoma locus on chromosome 7q in 3 unrelated chordoma kindreds. In this study, we clinically evaluated a new multiplex chordoma family (family 4) and investigated the evidence for linkage on 7q33 in all 4 chordoma families using SNP markers. Our results showed that, in the 3 original families, linkage to 7q33 was consistently identified with chromosome 7 SNPs. The integration of STR and SNP markers in this region increased the information content; however, there was no further increase in linkage signals compared to those obtained with STRs alone. There are 2 possible explanations for this finding. First, a disease haplotype shared by all affected individuals had already been established by STR markers. Adding more markers of low informativeness within this haplotype would provide little additional information for linkage analysis. Second, both SNP and STR maps had high information content. It is possible that, beyond a certain point, increasing information content may not provide further help. We observed similar results in a study using GAW14 simulated data. Specifically, extremely dense SNP maps did not provide significant improvement in linkage signals compared to STRs with lower information content when phase information was easily reconstructed (little missing genotypic data, extended pedigree structures, etc.) and when there was no significant genetic heterogeneity.24 Thus, additional families, rather than a denser marker set, will be critical to further fine map the chordoma locus region.
The minimal disease gene region on chromosome 7q encompasses approximately 16 megabases with 128 known or putative genes, 21 pseudogenes and the large T-cell-receptor beta locus. Among the annotated genes in the region, those related to known or suspected cancer-associated genes include growth factors (pleiotrophin), transmembrane receptors (ephrins A1 and B6, plexin A4), signaling proteins (b-raf and Rho guanine nucleotide exchange factor 5), transcription factors (TIF1, CNOT4, CREB3L2, EZH2) and apoptotic pathway proteins (caspase 2). In addition, the region contains genes whose products are involved in cancer-implicated pathways such as cell-cell adhesions (zyxin and C-type lectin superfamily member 5) and a regulator of NF-κB (myotrophin). Less likely candidates for the familial chordoma gene in the region are several olfactory and taste receptors as well as genes involved in metabolism (e.g., isotypes of aldo-keto reductase). Finally, there is incomplete annotation of the region, suggesting that the familial chordoma gene may require complete characterization after it is identified, as has occurred with genes associated with other familial tumor syndromes.
Our data indicated that family 4 was not linked to 7q33. The results from haplotype analysis with combined SNPs and STRs showed that the 2 daughters with chordoma did not share the same paternal haplotype in the minimal disease gene region identified in the 3 original families (Table I). However, these 2 daughters may have shared the same haplotype that was centromeric to the reported linked region (Table I). We cannot exclude the occurrence of sporadic chordoma in one of the affected daughters; however, because chordoma is such a rare tumor, this is highly unlikely. The results from tests for genetic heterogeneity in the 4 families by the HOMOG program were suggestive of heterogeneity; however, they were not statistically significant, primarily due to the small size of family 4.
A tumor suppressor locus for chordoma has been suggested to map to 1p36 based on a combined loss of heterozygosity (LOH) and linkage analysis of family 4 (Zmax = 1.2)6 and a recent LOH study of 27 sporadic chordomas.25 Using 17 CA-repeat markers spanning 1p36.32-1p36.11 typed in family 4, Miozzo et al. identified a common haplotype of 11 consecutive markers shared by the affected father and 2 younger daughters (1 with chordoma and 1 with astrocytoma)6 (Fig. 1d). The eldest daughter had not been diagnosed with chordoma at the time of that study; therefore, she was classified as unaffected. She did not share the same paternal 1p36 haplotype as her 2 affected sisters. After she was diagnosed with chordoma, we questioned the evidence for linkage to the 1p36 region. To clarify the evidence of linkage in this region, we performed linkage analysis using 1p36 SNPs in all 4 chordoma families. Our results indicated that families 1, 3 and 4 were not linked to this region. A small positive linkage signal was observed in family 2, but this finding needs to be interpreted with caution due to the family's small size. Given the small magnitude of the linkage signal in family 2 and negative evidence for linkage in the 3 other families, our data do not support a major susceptibility locus for chordoma on 1p36.
In this linkage study, the use of a dense marker set in a group of families that included a large pedigree (family 1) resulted in a number of analytical complexities. Programs such as FASTLINK can analyze only a small number of markers and so are not well suited to SNP maps. Conversely, programs such as GENEHUNTER and MERLIN cannot handle a large number of study subjects, in particular, a pedigree the size of family 1. To minimize the problem associated with pedigree size, we also analyzed the data using SIMWALK2, which uses Markov chain Monte Carlo (MCMC) and simulated annealing algorithms in multipoint analyses.22 However, results from SIMWALK2 yield estimated rather than exact statistics, in contrast to GENEHUNTER and MERLIN. In addition, it may be difficult to guarantee the adequate convergence of the program; it also requires extensive computer time to find good approximations. Thus, current linkage programs need to be improved or new programs need to be developed to accommodate the utilization of SNP markers in complex pedigrees. Finally, multipoint linkage analyses implemented in all of these programs assume marker-marker equilibrium and therefore may not be well suited for densely selected SNP markers. In our study, there was no evidence for strong linkage disequilibrium (LD) among the measured SNPs in the 7q linked region, with r² < 0.4 observed in all of the marker pairs examined by Haploview.26 In addition, since establishing phase was not problematic because of the extended pedigree structures in our study and a disease haplotype was observed to cosegregate with all affected individuals, we would not expect a significant impact from LD among SNPs on our linkage findings. However, as indicated by some recent studies, the presence of LD on a dense SNP map might cause inflated LOD scores in affected sib pairs or nuclear families.27, 28
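For reference, the pairwise LD statistic cited above can be sketched for two biallelic SNPs as the squared correlation between alleles on the same haplotype. The sketch assumes phased haplotypes are already available, which is a simplification: Haploview estimates haplotype frequencies from genotype data, and that step is not reproduced here. The function name and the 0/1 allele coding are hypothetical.

```python
from typing import Sequence, Tuple


def ld_r_squared(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Pairwise r^2 between two biallelic SNPs.

    `haplotypes` is a list of phased haplotypes, each a pair
    (allele_at_snp1, allele_at_snp2) with alleles coded 0 or 1
    (assumed input format for this sketch).
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("no haplotypes supplied")
    # Frequencies of allele 1 at each SNP and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    # D is the deviation of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denominator
```

Two SNPs whose alleles always travel together give r² = 1, and SNPs whose four haplotypes occur at the product of the allele frequencies give r² = 0; the threshold of 0.4 reported above lies between these extremes.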
In summary, the results of our linkage analysis with SNP markers corroborated a chordoma locus on 7q and demonstrated evidence of genetic heterogeneity in familial chordoma. In addition, our data did not support a previously reported major susceptibility locus for chordoma on 1p36. We are continuing to sequence candidate genes in the minimal disease gene region on 7q and are seeking additional chordoma kindreds to narrow this region and to identify other chordoma susceptibility loci.
We thank Ms. D. Zametkin for her outstanding skills as a research nurse; Ms. K. Haque for technical assistance; and, especially, the patients and their families for their participation. This work was supported by the Cancer Epidemiology and Biostatistics Fellowship Program, Division of Cancer Epidemiology and Genetics, National Cancer Institute (X.R.Y.) and a VA Merit Award (M.J.K.).