To the Editor:
We read with interest the recent report by Meenagh et al (1) that their study had been unable to independently replicate our previously reported linkage for osteoarthritis (OA) on chromosome 6 (2). Our linkage was to females with hip OA. Independent replication of linkage and association is the ambition for all genetic studies, and for multifactorial diseases in particular it has proven to be very difficult. As such, although their inability to replicate our findings disappointed us, we were not surprised. However, we were surprised by a number of errors in their report. The title is inaccurate since we had mapped the original linkage interval to chromosome 6 (an 11.4-cM interval at 6p12.3-q13) and not to chromosome 6p. They note our original logarithm of odds (LOD) score of 2.9 and cite Chapman et al (3) in their introduction when the article they are referring to is by Loughlin et al (4). They state that our later LOD score of 4.6 was found in a subset of 166 female-only pedigrees. The actual number was 146 pedigrees. We give details of the family structures in Table 1 of our report in 2002 (2) and show that 10 of those families were trios. The number of sibling pairs available for analysis was 166, but these were not independent. Meenagh et al refer to the 2 candidate genes that reside in this region, COL9A1 and BMP5, and state that COL9A1 encodes a minor cartilage collagen and BMP5 encodes a protease. Again, this is incorrect. COL9A1 encodes 1 of the 3 polypeptide chains of type IX collagen, and BMP5 encodes a signaling factor that binds to a cell surface receptor.
Meenagh et al state that their study has 80% power at a significance level of 0.05 if a genotype relative risk (GRR) of 4.0 is assumed, thereby arguing that a Type II error in their study is an unlikely explanation for their failure to replicate our results. We strongly disagree with their assertion. There is not enough information provided in their article to allow us to replicate their power calculation exactly. When parameterized in terms of the GRR, power is also highly dependent on the risk allele frequency and the mode of inheritance (dominant, recessive, additive, multiplicative) (5). However, we can calculate power using our linkage results and the Meenagh et al sample sizes.
Using the computer package ASPEX (online at http://aspex.sourceforge.net/), which performs linkage analysis using affected sibling pairs, we obtained a maximum multipoint LOD score (MLS) of 4.0. For 166 independent sibling pairs, an MLS of 4.0 corresponds to a locus-specific λs of 2.0, and for 146 independent sibling pairs, it corresponds to a locus-specific λs of 2.25 (6). Thus, for 166 sibling pairs, of which only 146 were independent, the corresponding λs is between 2.0 and 2.25. Using Risch's binomial method (6) and these values for λs, we calculated the power to reject the null hypothesis (at a significance level of 0.05) with 288, 54, and 32 independent sibling pairs. These sample sizes correspond to Meenagh et al's total number of hip OA sibling pairs (containing both males and females), their total number of female-only hip OA sibling pairs, and their number of families with ≥2 female siblings, respectively. With 288 independent sibling pairs, Meenagh et al would certainly have sufficient power to reject the null hypothesis if the true λs was ≥2.0. However, these sibling pairs include males. As clearly noted in our previous reports, our linkage is restricted to females. The relevant sample available to Meenagh et al is actually 54 female-only sibling pairs from 32 families. Meenagh et al had between 42% and 52% power to reject the null hypothesis using 32 independent sibling pairs and 53–75% power to reject the null hypothesis using 54 independent sibling pairs (an overestimate of their effective sample size, since the 54 pairs come from only 32 families). Thus, one quite probable reason that Meenagh et al failed to replicate our results is that their study was underpowered.
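A power calculation of this kind can be sketched as follows. This is an illustrative reconstruction, not the calculation actually run for this letter. It assumes that the test is a one-sided exact binomial test on the number of affected sibling pairs sharing zero alleles identical by descent, that the null probability of zero sharing is 0.25, and that the probability under linkage is 0.25/λs. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def zero_sharing_power(n_pairs, lambda_s, alpha=0.05):
    """Power of a one-sided exact binomial test on zero-IBD sharing.

    Assumption: under no linkage a sibling pair shares 0 alleles IBD with
    probability 0.25; under linkage this probability is 0.25 / lambda_s.
    The null is rejected when the count of zero-sharing pairs is small.
    """
    p_null = 0.25
    p_alt = 0.25 / lambda_s

    # Largest critical count c with P(X <= c | null) <= alpha.
    c = -1
    while c + 1 <= n_pairs and binom_cdf(c + 1, n_pairs, p_null) <= alpha:
        c += 1
    if c < 0:
        return 0.0  # no rejection region exists at this alpha

    # Power is the probability of landing in the rejection region
    # when the linkage hypothesis is true.
    return binom_cdf(c, n_pairs, p_alt)


if __name__ == "__main__":
    for lam in (2.0, 2.25):
        print(lam, round(zero_sharing_power(32, lam), 3))
```

Under these assumptions, 32 independent pairs give a power of roughly 0.42 for λs = 2.0 and roughly 0.52 for λs = 2.25. Because the exact test is discrete, its true significance level is below the nominal 0.05, so results for other sample sizes are sensitive to how the critical count is chosen.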
Meenagh et al state that they compared allele frequencies between a control group (n = 10) and 40 affected individuals (Table 3 in their report). This is an extraordinarily small sample size, and it has been established many times that findings based on these sample sizes can lead to spurious results (7). They state that there was no significant difference between the distributions of case and control groups, and they therefore conclude that there was no evidence for linkage to this region. With these very small sample sizes, however, the difference between the case and control risk allele frequencies would have to exceed 0.30 to have an 80% chance of rejecting the null hypothesis at a significance level of 0.05.
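The sensitivity of such a comparison to sample size can be illustrated with a normal-approximation power calculation for a difference in allele frequencies. This is a rough sketch under stated assumptions, not the calculation behind the figure quoted above. It treats the marker as biallelic, counts 2 alleles per individual, and uses hypothetical allele frequencies chosen only for illustration. The function name is also hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def allele_freq_power(p_case, p_control, n_cases, n_controls, alpha=0.05):
    """Approximate power of a two-sided z-test on two allele frequencies.

    Assumptions: biallelic marker, two independent alleles per person,
    normal approximation (crude for samples this small), and the
    negligible opposite-tail rejection probability is ignored.
    """
    a_case, a_ctrl = 2 * n_cases, 2 * n_controls  # allele counts
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)

    # Pooled standard error under the null of equal frequencies.
    p_bar = (a_case * p_case + a_ctrl * p_control) / (a_case + a_ctrl)
    se_null = sqrt(p_bar * (1 - p_bar) * (1 / a_case + 1 / a_ctrl))

    # Unpooled standard error under the alternative.
    se_alt = sqrt(
        p_case * (1 - p_case) / a_case
        + p_control * (1 - p_control) / a_ctrl
    )

    diff = abs(p_case - p_control)
    return z.cdf((diff - z_crit * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical frequencies: control 0.20, case 0.30 / 0.50 / 0.70.
    for p_case in (0.30, 0.50, 0.70):
        print(p_case, round(allele_freq_power(p_case, 0.20, 40, 10), 2))
```

With 40 cases and 10 controls and a hypothetical control frequency of 0.20, a difference of 0.10 gives very low power, and even a difference of 0.30 falls short of 80% in this approximation. The exact threshold depends on the baseline allele frequency assumed.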
In their conclusions as to why their results do not support ours, Meenagh et al make several allegations against us, which we are compelled to refute. First, they assert that the Oxford cohort was gathered in a manner less systematic than theirs. This is untrue. Our collection strategy was both methodical and organized. In our study, only patients with symptomatic end-stage primary OA were recruited. All patients had undergone joint replacement surgery. In addition, detailed assessment of condition onset and progression, including a review of preoperative radiographs, was performed by full-time clinical research nurses. The assessment was made using questionnaires and, if necessary, examination at home visits. Any cases of secondary OA were excluded. All patients had severe pain, including night pain, and radiographic evidence of severe OA of grade 3 or grade 4 on the Kellgren/Lawrence scale (8). In any event, if there were any etiologic heterogeneity in our sample, it would reduce our power to detect linkage rather than lead to spurious findings of linkage.
Second, Meenagh et al state that because of the importance of using correct allele frequencies, they assessed their allele frequencies with care, implying that our allele frequencies were less rigorously determined and in error. In fact, we were very careful in estimating our allele frequencies. We stated in our article that the allele frequencies produced by the GAS program (online at http://users.ox.ac.uk/~ayoung/gas.html) were used. As one of the many checks that we undertake in our analysis, these frequencies were compared with those produced by a separate program called DOWNFREQ (online at ftp://ftp.ebi.ac.uk/pub/software/linkage_and_mapping/linkage_cpmc_columbia/analyze/). Both programs produced identical allele frequencies. We run a number of checks on all our data. These include tests for Mendelian inconsistencies by use of the PEDCHECK program (online at http://watson.hgen.pitt.edu/register/soft_doc.html) and tests for preferential or missed amplification of one allele for each marker by the use of RECODE version 1.4 (online at http://watson.hgen.pitt.edu/register/soft_doc.html). We also checked the full-sib status of all our families by genotyping them for 50 unlinked markers and testing whether the proportion of alleles shared identically by descent was consistent with the expected proportion for each relative pair (online at ftp://linkage.rockefeller.edu/software/relative). Finally, Meenagh et al argue that multipoint analyses take no account of linkage disequilibrium, which can inflate slightly positive data. This is correct, and thus we presented both 2-point and multipoint data in our article.
In summary, we were very disappointed to hear that Meenagh et al were unable to replicate our findings. However, since we can demonstrate that their study did not have adequate power to do so, we consider it more likely that their result is an example of a Type II error than that we had reported a Type I error.