Screening the genome for rheumatoid arthritis susceptibility genes: A replication study and combined analysis of 512 multicase families

Abstract

Objective

A number of non-HLA loci that have shown evidence (P < 0.05) for linkage with rheumatoid arthritis (RA) have been previously identified. The present study attempts to confirm these findings.

Methods

We performed a second genome-wide screen of 256 new multicase RA families recruited from across the United States by the North American Rheumatoid Arthritis Consortium. Affected sibling pair analysis on the new data set was performed using SIBPAL. We subsequently combined our first and second data sets in an attempt to enhance the evidence for linkages in a larger sample size. We also evaluated the impact of covariates on the support for linkage, using LODPAL.

Results

Evidence of linkage at 1p13 (D1S1631), 6p21.3 (the HLA complex), and 18q21 (D18S858) (P < 0.05) was replicated in this independent data set. In addition, there was new evidence for linkage at 9p22 (D9S1121 [P = 0.001]) and 10q21 (D10S1221 [P = 0.0002] and D10S1225 [P = 0.0038]) in the current data set. The combined analysis of both data sets (512 families) showed evidence for linkage at the level of P < 0.005 at 1p13 (D1S1631), 1q43 (D1S235), 6q21 (D6S2410), 10q21 (D10S1221), 12q12 (D12S398), 17p13 (D17S1298), and 18q21 (D18S858). Linkage at HLA was also confirmed (P < 5 × 10^−12). Inclusion of DRB1∗04 as a covariate significantly increased the probability of linkage on chromosome 6. In addition, some linkages on chromosome 1 showed improved significance when modeling DRB1∗04 or rheumatoid factor positivity as covariates.

Conclusion

These results provide a rational basis for pursuing high-density linkage and association studies of RA in several regions outside of the HLA region, particularly on chromosomes 1p, 1q, and 18q.

Rheumatoid arthritis (RA) is a chronic, systemic, inflammatory disease with a complex genetic component. It is the most common cause of inflammatory polyarthritis in adults. The aggregation of RA in families with several autoimmune disorders (1) and the frequent presence of autoantibodies in the serum of RA patients (2) suggest that the disease has an autoimmune pathogenesis, although the precise etiology is unknown. An HLA (human leukocyte antigen) association with RA has been known for over two decades (3) and has been confirmed in numerous population studies (4). As shown by 3 separate genome-wide screens (5–7), genes in the HLA region provide the largest genetic contribution to RA. However, twin and family studies indicate that non-HLA genes are also involved in disease susceptibility (8).

In an attempt to identify the non-HLA loci involved in RA susceptibility, 4 genome-wide scans have been performed by different groups (5–7, 9) in studies involving RA multiplex families with affected sibling pairs. These studies have identified a number of non-HLA loci that demonstrate modest evidence of linkage with RA. However, despite a certain degree of overlap between these results, there is no locus outside of the HLA region that has yet been unambiguously identified. This may be due to the fact that the sample sizes have not been large enough to provide statistical power to detect loci that have only modest effects (10).

In our initial genome-wide screen of 257 RA families (6), we identified a number of non-HLA loci (on chromosomes 1, 4, 12, 16, and 17) showing evidence for linkage with RA (P < 5 × 10^−3). We have now performed a second independent genome-wide screen on a new panel of 256 RA multicase families. The results of this second analysis have allowed us to replicate evidence for linkage in several regions outside of the HLA locus. In addition, as suggested by Lander and Kruglyak (11), we have combined our current data set with that from our initial genome screen in an attempt to further enhance the evidence for linkage. The combined data set contains 677 affected sibling pairs (581 independent sibling pairs, using the Hodge correction for multiple affected siblings). Thus, our study constitutes the largest collection of RA sibling pairs reported to date.

SUBJECTS AND METHODS

North American Rheumatoid Arthritis Consortium (NARAC) family collection.

Multiplex RA families were recruited nationwide in the United States through the 12 participating recruitment centers of NARAC. Informed consent was obtained from every subject, including all participating family members, and the local institutional review board's approval was secured at every recruitment site prior to the start of enrollment. The criteria for entering a family into the study were 1) ≥2 siblings satisfied the 1987 American College of Rheumatology (formerly, the American Rheumatism Association) criteria for RA (12), 2) at least 1 of the siblings had documented erosions on hand radiographs, and 3) at least 1 of the siblings had disease onset between the ages of 18 and 60 years. The presence of psoriasis, inflammatory bowel disease, or systemic lupus erythematosus (SLE) was an exclusionary criterion in affected individuals.

Radiographs of both hands were obtained from all affected siblings at the time of recruitment into the study, unless films obtained within the prior 2 years were available. All of the radiographs were scored by a single radiologist (DF) who utilized a preliminary, locally developed severity score on a scale of 0–5. The following criteria were utilized for this scale: 0 = no erosion, 1 = subtle erosion, 2 = erosion without joint space loss, 3 = erosion with joint space loss, 4 = complete joint space loss without marked bone destruction or joint deformity, and 5 = marked destruction with or without joint deformity. All patients were examined by trained NARAC personnel and/or consulting rheumatologists at the time of entry and were scored for the presence of joint swelling and tenderness, joint alignment, and range of motion according to standard published methods (13).

Laboratory procedures.

Blood samples were collected from the affected siblings and, if available, from their parents, for analysis of DNA and serum. DNA was isolated from the peripheral blood mononuclear cells using a salting-out kit from BIO-101 (Carlsbad, CA). Rheumatoid factor was measured in each affected sibling at the University of Washington, Department of Laboratory Medicine Immunology Division, by latex-enhanced nephelometric assay (Behring Diagnostics, San Jose, CA), which uses human and rabbit IgG coated on latex beads as antigen. This assay was calibrated to the World Health Organization international standard for rheumatoid factor (14). Broad-level HLA–DRB1 typing and high-resolution DRB1∗04 typing were accomplished by initial polymerase chain reaction (PCR) amplification of groups of alleles (all DRB1 alleles for broad-level typing, and group-specific amplification for DRB1∗04 alleles) using biotinylated PCR primers, followed by hybridization to immobilized sequence-specific oligonucleotide probes in a linear array format. Positive hybridization reactions were detected using a streptavidin–horseradish peroxidase conjugate and a soluble colorless substrate, 3,3′,5,5′-tetramethylbenzidine (15). A computer algorithm based on the sequence-specific oligonucleotide probe hybridization pattern and the Anthony Nolan 1999 HLA sequence database (available at the World Wide Web site http://www.ebi.ac.uk/imgt/hla/nomen.html) was used to assign genotypes to each sample.

Of the 512 multiplex RA families used in this study, 256 families had been analyzed previously in our first genome-wide screen, which we will refer to as “screen 1” (6). (Although we originally reported on the analysis of 257 families in our first genome screen, reevaluation of hand radiographs from 1 family led us to exclude that family from the current combined analysis due to the absence of clear-cut erosions in either affected sibling.) DNA samples from affected sibling pairs and parents (where available) from the 256 new RA families were genotyped in the present study using markers spaced at ∼10-cM intervals throughout the entire genome; we will refer to this second genome-wide screen as “screen 2.” All markers used were the same as those analyzed in screen 1. These were from the set8A combo list from the Marshfield World Wide Web site (available at http://www.marshmed.org/genetics/). Some additional markers were added to the set at certain chromosomal locations, such as the HLA complex. The entire marker set consisted of 47 panels (379 markers), with each panel containing markers pooled together according to size and fluorescent label (6-FAM, HEX, and NED). Reaction conditions were standard for all markers used, as described in the Marshfield PCR protocol (available at the Marshfield Web site). Each panel of pooled markers was electrophoresed on a 3700 DNA Analyzer (PE Biosystems, Foster City, CA).

Data analysis.

Semi-automated sizing of alleles was carried out using GeneScan 2.1 software (PE Biosystems), and individual genotypes were assigned with the help of the Genotyper version 2.1 software (PE Biosystems). To ensure accuracy of the genotypes, 2 individuals manually checked each genotype. The alleles were then “binned” using a program implemented by us, and consistency in the inheritance of alleles within families was verified using the PedCheck program (16). After correction of any inconsistencies in the data, all allele sizes were downcoded for analysis. We also screened the data set for monozygotic twin pairs by searching the database for siblings with identical genotypes; three such twin pairs were identified. We verified the genetic relationships among individuals by applying the program Relative (17) to all of the marker data, to determine whether the data set contained any half-sibling pairs or unrelated pairs that were originally ascertained as full-sibling pairs. We identified 16 half-sibling pairs and 2 unrelated pairs in this manner. We used Simwalk (18) to perform marker-to-marker analysis and to check marker order and genetic distances against published maps. This check identified one region of chromosome 18 for which the earlier map applied in our first genome scan (6) was incorrect. The order and distances among markers have been corrected in the current analysis.
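The two genotype checks described above (Mendelian consistency and the search for siblings with identical genotypes) can be illustrated with a minimal sketch. This is not the PedCheck algorithm or the authors' database search; the function names, the data layout, and the `min_markers` threshold are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def normalize(genotype):
    """Return an unordered genotype as a sorted tuple, or None if untyped."""
    return None if genotype is None else tuple(sorted(genotype))


def mendelian_consistent(child, father, mother):
    """True if, at one marker, one child allele can come from each parent."""
    c1, c2 = child
    return (c1 in father and c2 in mother) or (c2 in father and c1 in mother)


def identical_sib_pairs(sibs, min_markers=3):
    """List sibling pairs whose genotypes match at every marker typed in both.

    sibs maps a sibling ID to {marker: (allele1, allele2) or None}.
    min_markers (hypothetical threshold) avoids flagging pairs that were
    compared at too few markers to be informative.
    """
    flagged = []
    for id_a, id_b in combinations(sorted(sibs), 2):
        geno_a, geno_b = sibs[id_a], sibs[id_b]
        shared = [m for m in geno_a
                  if geno_a[m] is not None and geno_b.get(m) is not None]
        if len(shared) >= min_markers and all(
            normalize(geno_a[m]) == normalize(geno_b[m]) for m in shared
        ):
            flagged.append((id_a, id_b))
    return flagged


# Toy family with made-up allele sizes (not study data).
family = {
    "sib1": {"M1": (120, 124), "M2": (200, 200), "M3": (98, 102)},
    "sib2": {"M1": (124, 120), "M2": (200, 200), "M3": (102, 98)},
    "sib3": {"M1": (120, 128), "M2": (200, 204), "M3": None},
}
possible_twins = identical_sib_pairs(family)
trio_ok = mendelian_consistent((120, 124), (120, 128), (124, 130))
trio_bad = mendelian_consistent((120, 120), (120, 128), (124, 130))
```

In practice a genome-wide marker set would call for a much larger `min_markers` value and some tolerance for genotyping error; both are left out of this sketch.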

Analysis of the data was performed in 2 stages: 1) The data from the 256 new RA families were analyzed as an independent genome-wide screen (screen 2) for the current study. 2) The data from screen 1 and screen 2 were pooled and analyzed as a combined screen of 512 multicase RA families. Descriptive statistics for demographic and clinical parameters were produced using the SAS package (Chicago, IL). A model-free linkage analysis was carried out using SIBPAL from the SAGE version 3.1 (1997) package (19) for all autosomal marker data. We included all data from affected sibling pairs to implement 1-sided statistical t-tests for evaluating whether the identical by descent (ibd) sharing departs from expectation, assuming Mendelian inheritance of the marker. The pairs were analyzed as though they were independent, but a correction was made to the degrees of freedom to allow for correlation among pairs of siblings from larger families (20).
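The mean-sharing test described above can be sketched as a one-sided, one-sample test of whether the average proportion of alleles shared ibd exceeds the null expectation of 0.5. The sketch below is not SIBPAL: it uses a normal approximation to the t distribution (reasonable only for hundreds of pairs), omits the degrees-of-freedom correction for sibships with more than 2 affected siblings, and runs on made-up sharing values. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_sharing_test(ibd_proportions):
    """One-sided test of H0: mean ibd sharing = 0.5 vs. H1: mean > 0.5.

    ibd_proportions holds one estimated proportion of alleles shared ibd
    per affected sibling pair (values between 0 and 1).
    Returns (mean sharing, t statistic, approximate one-sided P value).
    """
    n = len(ibd_proportions)
    m = mean(ibd_proportions)
    standard_error = stdev(ibd_proportions) / math.sqrt(n)
    t = (m - 0.5) / standard_error
    # Normal approximation to the t distribution; assumes n is large.
    p_one_sided = 1.0 - NormalDist().cdf(t)
    return m, t, p_one_sided


# Hypothetical data: 300 pairs sharing 0, 1, or 2 alleles ibd.
toy_sharing = [0.0] * 60 + [0.5] * 150 + [1.0] * 90
sharing_mean, t_stat, p_value = mean_sharing_test(toy_sharing)
```

Because real ibd estimates are often fractional when parents are untyped, their variance is typically smaller than in this fully informative toy example.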

We used the program ASPEX (available at ftp://lahmed.stanford.edu/pub/aspex/index.html) to perform genetic linkage analysis of the X chromosome and to provide estimates of the average probability that sibling pairs share 0 alleles ibd (the ASPEX software was slightly modified to obtain more significant digits for the probability of sharing 0 alleles ibd). The GeneHunter Plus program was used to perform multipoint analysis of the data for regions that showed significant evidence for linkage using SIBPAL. For the multipoint analysis, we used the Kong and Cox (21) modified version of the nonparametric linkage score to identify regions showing evidence for genetic linkage between marker alleles and the presence of disease in families. The test that we performed evaluates whether there is excess sharing of alleles ibd among pairs of affected relatives. The Kong and Cox approach gives the most weight to relative pairs for which the inheritance of marker alleles can be inferred without error, and gives decreasing weight to marker data according to how poorly the ibd sharing among the relatives at a particular locus can be inferred from the available marker data.
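An estimate of the probability that an affected sibling pair shares 0 alleles ibd is commonly converted into a locus-specific sibling recurrence risk ratio. The sketch below assumes the standard relationship in which that probability equals 0.25 divided by the risk ratio; whether the Lambda column of Table 5 was derived in exactly this way is an assumption, and the function name is hypothetical.

```python
def lambda_s_from_z0(z0):
    """Locus-specific sibling recurrence risk ratio from z0.

    z0 is the estimated probability that an affected sibling pair shares
    0 alleles ibd at the locus. Under no linkage z0 = 0.25, which gives a
    risk ratio of 1; smaller z0 values give ratios above 1.
    """
    if not 0.0 < z0 <= 0.25:
        raise ValueError("z0 is expected to lie in (0, 0.25] for this sketch")
    return 0.25 / z0


null_ratio = lambda_s_from_z0(0.25)    # no excess sharing
linked_ratio = lambda_s_from_z0(0.20)  # hypothetical z0, not study data
```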

For regions that showed significant evidence for linkage using SIBPAL, we also evaluated the impact of covariates on the support for linkage (22). The program LODPAL evaluates the evidence for linkage using a model-free approach that allows the genotypic risk for disease to vary according to pair-specific covariates. As implemented by us (and using the defaults), the program fitted a single regression per covariate, thus assuming that the genotypic effects are approximately additive. Standard output from LODPAL indicates the support for linkage at each marker location by forming a likelihood ratio, in which the numerator models the covariate effects and evidence for linkage, while the denominator only models covariate effects. The distribution of 4.6 times this log10 likelihood ratio is conservatively approximated by a chi-square distribution with 2 degrees of freedom (22). A likelihood ratio statistic to evaluate for significant covariate effects can be obtained by multiplying by 4.6 the difference between the log likelihood ratio for a model including the covariate and one excluding it. In the presence of linkage, this likelihood ratio statistic is asymptotically distributed as a chi-square variate with 1 degree of freedom. The covariates that were examined included DRB1∗04, presence of erosions on hand radiographs, rheumatoid factor >100 IU, presence of nodules, and presence of a male affected sibling in the family.
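The conversions described above can be written out as a short sketch. The factor 4.6 is 2 × ln(10) rounded, which turns a log10 likelihood ratio (LOD score) into a likelihood ratio statistic on the natural-log scale. The closed-form tail probabilities used here hold only for 1 and 2 degrees of freedom; the function names are hypothetical, and this is not LODPAL output.

```python
import math

TWO_LN_10 = 2.0 * math.log(10.0)  # about 4.605; rounded to 4.6 in the text


def lod_to_chi_square(lod):
    """Convert a log10 likelihood ratio into a likelihood ratio statistic."""
    return TWO_LN_10 * lod


def chi_square_sf(x, df):
    """Upper-tail probability of a chi-square variate with 1 or 2 df."""
    if x < 0:
        raise ValueError("the statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("only df = 1 or df = 2 is implemented in this sketch")


def linkage_p_value(lod):
    """Approximate P value for linkage, using the 2-df approximation."""
    return chi_square_sf(lod_to_chi_square(lod), df=2)


def covariate_p_value(lod_with, lod_without):
    """Approximate P value for a covariate effect (1 df), given linkage."""
    return chi_square_sf(lod_to_chi_square(lod_with - lod_without), df=1)


# LOD scores reported in the Results for the region around D1S551,
# with and without rheumatoid factor >100 IU as a covariate.
p_covariate = covariate_p_value(2.71, 0.13)
```

With 2 degrees of freedom the tail probability reduces to 10 raised to the power of minus the LOD score. Because the published LOD scores are rounded, P values recomputed this way only approximate those reported.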

RESULTS

RA families.

The structure of the 256 new RA families analyzed for the current genome-wide screen (screen 2), in terms of the number of parents and affected siblings for whom DNA was available and who were therefore genotyped, is shown in Table 1. Unaffected siblings were not typed, except in the case of 2 of the families in which parental DNA was not available. At least 1 parent was available for analysis in 46.1% of the families. Table 2 shows the distribution of families after combining the screen 2 data set with that from screen 1.

Table 1. Structure of the genotyping of families in screen 2*

| Parents available | Overall (n = 256) | With 2 affected sibs (n = 226) | With 3 affected sibs (n = 27) | With 4 affected sibs (n = 2) | With 6 affected sibs (n = 1) |
| --- | --- | --- | --- | --- | --- |
| Both | 31 | 29 | 2 | 0 | 0 |
| One | 87 | 78 | 9 | 0 | 0 |
| None | 138 | 119 | 16 | 2 | 1 |

* Values are the number of families genotyped. sibs = siblings.

Table 2. Structure of the genotyping of families in the combined screen*

| Parents available | Overall (n = 512) | With 2 affected sibs (n = 453) | With 3 affected sibs (n = 53) | With 4 affected sibs (n = 5) | With 6 affected sibs (n = 1) |
| --- | --- | --- | --- | --- | --- |
| Both | 66 | 60 | 6 | 0 | 0 |
| One | 180 | 162 | 18 | 0 | 0 |
| None | 266 | 231§ | 29 | 5 | 1 |

* Values are the number of families genotyped. Unaffected siblings were not typed, except in the case of 2 families for whom parental DNA was not available. sibs = siblings.
One family had half-siblings.
Nine families had half-siblings.
§ Thirteen families had half-siblings.
Four families had half-siblings.

The clinical and demographic features of the affected siblings who participated in the study are summarized in Table 3. A more detailed analysis of the demographic and clinical features of the patient population is described elsewhere (23). The data from screen 1 differed slightly from the previously published results (6), since the results now included data from a few more affected individuals, and 1 family was dropped for the current analysis (see Subjects and Methods). Overall, the demographics and clinical features of the 551 affected siblings in screen 2 were very similar to those in screen 1. In screen 2, the affected siblings were mostly Caucasian (92.9%), had severe RA with high rates of erosions in the hand (94.8%), had a high frequency of seropositivity for rheumatoid factor (80.7%), and had a mean radiographic severity score of 3.3 on a scale of 0–5. These patients also had a relatively young age at disease onset (39 years) and long disease duration (14.6 years), and 70.9% of them were positive for HLA–DRB1∗04. The ratio of female patients to male patients in this data set was 3:1 (76% female), as expected for this disease.

Table 3. Clinical and demographic features of affected individuals in the North American Rheumatoid Arthritis Consortium collection*

| Clinical/demographic features | Screen 1 (n = 546) | Screen 2 (n = 551) | Combined screens 1 and 2 (n = 1,097) |
| --- | --- | --- | --- |
| Female, % | 77.4 | 76.0 | 76.8 |
| Caucasian, % | 90.1 | 92.9 | 91.2 |
| RF positive, % | 82.8 | 80.7 | 81.1 |
| HLA–DRB1∗04 positive, % | 70.4 | 70.9 | 70.7 |
| Hand erosions, % | 95.7 | 94.8 | 95.3 |
| Mean age at disease onset, years | 39.1 | 39.0 | 39.1 |
| Mean disease duration, years | 13.8 | 14.6 | 14.3 |

* RF = rheumatoid factor.

Genome-wide screen.

Screen 2.

The results of a nonparametric analysis (SIBPAL) performed on the genotypic data from the screen 2 data set are shown in Figure 1. Linkage at the HLA complex was confirmed (P < 0.00005). Evidence for linkage at the level of P < 0.05 was replicated at 1p13 (D1S1631), 6p21.3 (the HLA complex), and 18q21 (D18S858) in this independent data set. There was new evidence for linkage in this new data set at 9p22 (D9S1121; P = 0.001) and 10q21 (D10S1221 [P = 0.0002] and D10S1225 [P = 0.0038]). All markers that showed evidence for linkage at P < 0.05 in this second screen are listed in Table 4.

Figure 1.

Summary of the sibling pair analyses implemented by SIBPAL, showing evidence for linkage with rheumatoid arthritis. For each chromosome, the negative logarithm of the P values (−logP) was plotted against the markers by chromosomal position. For the X chromosome, the analysis was performed using the MLOD method from ASPEX, and the negative logarithm of the P values was plotted against the chromosomal position of the markers.
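The transformation used for the vertical axis of Figure 1 can be sketched as follows. The P values are taken from the screen 2 SIBPAL results (Table 4); the function name is hypothetical.

```python
import math


def neg_log10_p(p):
    """Return -log10(P); larger values mean stronger evidence for linkage."""
    if not 0.0 < p <= 1.0:
        raise ValueError("P values must lie in (0, 1]")
    return -math.log10(p)


# Screen 2 P values for three markers listed in Table 4.
screen2_p = {"D9S1121": 0.001, "D10S1221": 0.0002, "D6S1629": 2.76e-7}
screen2_scores = {marker: neg_log10_p(p) for marker, p in screen2_p.items()}
```

On this scale the P < 0.05 threshold corresponds to a value of about 1.3, and P < 0.005 to about 2.3.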

Table 4. Chromosomal regions with a significance level of P less than 0.05 in screen 2 for the SIBPAL analysis

| Chromosome | Locus | Distance (cM) | No. of sibling pairs | Mean sharing | P |
| --- | --- | --- | --- | --- | --- |
| 1 | D1S1631 | 136.9 | 290 | 0.5348 | 0.0175 |
| 1 | D1S549 | 239.7 | 294 | 0.5315 | 0.0264 |
| 2 | D2S1353 | 164.5 | 289 | 0.5310 | 0.026 |
| 2 | GATA30E06 | 210.4 | 268 | 0.5277 | 0.0463 |
| 3 | D3S2387 | 5.5 | 300 | 0.5278 | 0.0451 |
| 6 | D6S1959 | 34.2 | 278 | 0.5404 | 0.0029 |
| 6 | D6S1641 | 37 | 244 | 0.5629 | 0.0001 |
| 6 | D6S265 | 44.4 | 242 | 0.5766 | 5.68 × 10^−6 |
| 6 | TNFa | 44.7 | 262 | 0.5772 | 2.72 × 10^−6 |
| 6 | D6S1629 | 44.9 | 254 | 0.5805 | 2.76 × 10^−7 |
| 6 | D6S273 | 47.7 | 249 | 0.5781 | 3.66 × 10^−7 |
| 6 | D6S291 | 49.5 | 283 | 0.5634 | 6.93 × 10^−6 |
| 6 | D6S389 | 53.8 | 202 | 0.5927 | 1.43 × 10^−5 |
| 6 | D6S2427 | 53.8 | 293 | 0.5555 | 0.0007 |
| 6 | D6S1017 | 63.3 | 244 | 0.5365 | 0.0067 |
| 6 | D6S2410 | 73.1 | 287 | 0.5489 | 0.0004 |
| 6 | D6S1053 | 80.5 | 282 | 0.5387 | 0.0108 |
| 6 | D6S1031 | 88.6 | 280 | 0.5356 | 0.0134 |
| 6 | D6S1043 | 100.9 | 301 | 0.5406 | 0.0098 |
| 6 | D6S1056 | 102.8 | 263 | 0.5354 | 0.0226 |
| 7 | TCRB | 153 | 292 | 0.5301 | 0.0446 |
| 8 | D8S264 | 0.73 | 255 | 0.5316 | 0.0329 |
| 9 | D9S1121 | 44.3 | 298 | 0.5476 | 0.001 |
| 10 | D10S1221 | 75.6 | 290 | 0.5594 | 0.0002 |
| 10 | D10S1225 | 80.8 | 293 | 0.5434 | 0.0038 |
| 10 | GATA121A08 | 88.4 | 281 | 0.5273 | 0.0443 |
| 11 | D11S1984 | 2.1 | 296 | 0.5296 | 0.0318 |
| 11 | ATA34E08 | 33.0 | 280 | 0.5310 | 0.0205 |
| 12 | D12S1042 | 48.7 | 254 | 0.5300 | 0.0454 |
| 12 | D12S1294 | 78.1 | 265 | 0.5313 | 0.0435 |
| 16 | D16S402 | 113.5 | 299 | 0.5295 | 0.0367 |
| 18 | D18S858 | 80.4 | 293 | 0.5366 | 0.0098 |
| 18 | D18S1357 | 88.6 | 299 | 0.5274 | 0.0472 |

Combined screen.

A similar nonparametric analysis of the combined data sets from screen 1 and screen 2, using SIBPAL, showed evidence for linkage at the level of P < 0.005 at 1p13 (D1S1631), 1q43 (D1S235), 6q21 (D6S2410), 10q21 (D10S1221), 12q12 (D12S398), 17p13 (D17S1298), and 18q21 (D18S858). Linkage at HLA was again confirmed, and the significance level was increased from P < 5 × 10^−6 (in screen 1 or screen 2) to P < 5 × 10^−12. A list of all markers that showed evidence for linkage at P < 0.05 is shown in Table 5, and the results of the SIBPAL analysis are also shown in Figure 1. The results of the SIBPAL analysis performed on the initial data set of 256 families (screen 1) have also been included in Figure 1 for comparison. Analysis of the data from screen 2 and from the combined screen using the GeneHunter program showed evidence for linkage at 1p13, 1q43, 6q21, 16p13.1, and 18q21 (Figure 2).

Table 5. Chromosomal regions with a significance level of P less than 0.05 in the combined screen for the SIBPAL analysis

| Chromosome | Locus | Distance (cM) | P value, screen 1 | P value, screen 2 | P value, combined screen | Lambda |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | D1S1631 | 136.9 | 0.0141 | 0.0175 | 0.0011 | 1.220 |
| 1 | D1S2141 | 233.4 | 0.1905 | 0.0701 | 0.0487 | 1.096 |
| 1 | D1S549 | 239.7 | 0.2477 | 0.0264 | 0.0318 | 1.133 |
| 1 | D1S235 | 254.6 | 0.0048 | 0.1256 | 0.003 | 1.151 |
| 2 | D2S1353 | 164.5 | 0.3287 | 0.026 | 0.0453 | 1.071 |
| 4 | D4S2361 | 93.5 | 0.0639 | 0.0565 | 0.0141 | 1.050 |
| 4 | D4S1647 | 104.9 | 0.0001 | 0.9198 | 0.0436 | 1.056 |
| 5 | D5S807 | 19.0 | 0.0789 | 0.1227 | 0.0336 | 1.078 |
| 5 | D5S817 | 22.9 | 0.0663 | 0.174 | 0.0421 | 1.073 |
| 5 | D5S1462 | 105.3 | 0.0079 | 0.2887 | 0.0176 | 1.175 |
| 5 | D5S2501 | 116.9 | 0.0559 | 0.1895 | 0.0382 | 1.069 |
| 6 | D6S1959 | 34.2 | 0.0135 | 0.0029 | 1.99 × 10^−4 | 1.699 |
| 6 | D6S265 | 44.4 | 2.39 × 10^−6 | 5.68 × 10^−6 | 5.33 × 10^−11 | 1.807 |
| 6 | D6S1629 | 44.9 | 2.97 × 10^−5 | 2.76 × 10^−7 | 5.01 × 10^−11 | 1.811 |
| 6 | D6S273 | 47.7 | 3.36 × 10^−6 | 3.66 × 10^−7 | 4.19 × 10^−12 | 1.830 |
| 6 | D6S291 | 49.5 | 0.0013 | 6.93 × 10^−6 | 9.90 × 10^−8 | 1.822 |
| 6 | D6S389 | 53.8 | 0.0126 | 1.43 × 10^−5 | 7.99 × 10^−7 | 1.406 |
| 6 | D6S2427 | 53.8 | 0.031 | 0.0007 | 1.46 × 10^−4 | 1.396 |
| 6 | D6S1017 | 63.3 | 0.2764 | 0.0067 | 0.0197 | 1.503 |
| 6 | D6S2410 | 73.1 | 0.2844 | 0.0004 | 0.0028 | 1.347 |
| 6 | D6S1021 | 112.2 | 0.0075 | 0.1892 | 0.0083 | 1.136 |
| 8 | D8S264 | 0.7 | 0.0804 | 0.0329 | 0.0115 | 1.225 |
| 8 | D8S277 | 8.3 | 0.0088 | 0.3713 | 0.026 | 1.242 |
| 8 | D8S1110 | 67.3 | 0.0187 | 0.0855 | 0.0067 | 1.088 |
| 8 | D8S373 | 164.5 | 0.0084 | 0.3234 | 0.0224 | 1.101 |
| 9 | D9S1121 | 44.3 | 0.4972 | 0.001 | 0.01303 | 1.193 |
| 10 | D10S1221 | 75.6 | 0.1784 | 0.0002 | 0.0006 | 1.176 |
| 10 | D10S1225 | 80.8 | 0.6662 | 0.0038 | 0.0461 | 1.149 |
| 11 | ATA34E08 | 33.0 | 0.2418 | 0.0205 | 0.0292 | 1.087 |
| 12 | D12S373 | 36.1 | 0.0031 | 0.6866 | 0.0499 | 1.129 |
| 12 | D12S1042 | 48.7 | 0.1247 | 0.0454 | 0.0216 | 1.183 |
| 12 | D12S398 | 68.2 | 0.0051 | 0.1429 | 0.0048 | 1.110 |
| 12 | D12S1052 | 83.2 | 0.0227 | 0.2192 | 0.0266 | 1.041 |
| 14 | D14S742 | 12.5 | 0.1943 | 0.0524 | 0.0433 | 1.024 |
| 16 | D16S403 | 43.9 | 0.0042 | 0.2083 | 0.0076 | 1.192 |
| 17 | D17S1298 | 10.7 | 0.0053 | 0.0933 | 0.0031 | 1.000 |
| 18 | D18S877 | 54.4 | 0.0635 | 0.0962 | 0.0228 | 1.128 |
| 18 | D18S535 | 64.5 | 0.1172 | 0.0748 | 0.0309 | 1.180 |
| 18 | D18S858 | 80.4 | 0.0433 | 0.0098 | 0.002 | 1.233 |
| 18 | D18S1357 | 88.6 | 0.2631 | 0.0472 | 0.0494 | 1.120 |

Figure 2.

Analysis of the data using the GeneHunter program, showing evidence for linkage at 1p13, 1q43, 6q21, 16p13.1, and 18q21. −log p = negative logarithm of the P values.

Analysis of covariates.

The results of the covariate analysis are summarized in Figure 3. On chromosome 6p, inclusion of DRB1∗04 genotypes increased the logarithm of odds (LOD) score, with the most significant finding observed at D6S1629, where the LOD score increased from 10.99 to 14.28 (P < 0.00009). On chromosome 1, significant increases in the evidence for linkage were observed when covariates were modeled jointly. Inclusion of DRB1∗04 significantly increased the probability of linkage (P < 0.003) at D1S1589 (192 cM), yielding an LOD score of 1.91 compared with 0 without the covariate, and inclusion of rheumatoid factor >100 IU as a covariate produced significant increases indicative of linkage (P < 0.0005) in the region around D1S551 (114 cM), with the LOD score going from 0.13 to 2.71. However, no significant increase indicating evidence for linkage was observed for chromosome 18 or for any of the other chromosomes. The “nodules” and “erosive” covariates did not significantly contribute to a difference in the level of linkage (data not shown).

Figure 3.

Covariate analyses of the contribution of DRB1∗04, male affected siblings (sib), and rheumatoid factor (RF) seropositivity (>100 IU) for selected chromosomal regions, showing evidence for linkage in sibling pair tests. Evidence of a significantly increased likelihood of linkage on chromosome 1 was observed when jointly modeling covariates, including DRB1∗04 and RF seropositivity. On chromosome 6p, inclusion of DRB1∗04 genotypes increased the logarithm of odds (LOD) score. However, no significant increases that would indicate evidence for linkage were observed for chromosome 18 or for any of the other chromosomes.

DISCUSSION

In this report, we have presented the results of our second genome-wide screen in 256 new multicase RA families from the NARAC collection. In addition, we have performed a combined analysis of the data from our 2 independent genome-wide screens, in a total of 677 RA sibling pairs (corresponding to 581 independent pairs, using the Hodge correction [24]). Analysis of the demographic and clinical features of the affected siblings from the screen 1 and screen 2 data sets showed that the 2 patient populations were very similar to each other in terms of disease features. This was reassuring, since homogeneity between the 2 populations is clearly desirable if they are going to be used to compare the results of 2 independent genome-wide screens.

The nonparametric analysis of the data from screen 2 revealed several regions that showed evidence for linkage at the level of P < 0.05. Of these, 3 regions were replicated from our initial screen, namely, 1p13 (D1S1631), 6p21.3 (the HLA complex), and 18q21 (D18S858). Although a number of other regions had shown some modest evidence for linkage (P < 0.005) in screen 1, these were not replicated in screen 2. Similarly, some new loci showed evidence of linkage at P < 0.05 in the current data set (screen 2), and not in screen 1. These results may be explained by the limitations of the 2 individual screens in terms of power. Both data sets consisted of similar numbers of affected sibling pairs, and it is likely that these sample sizes do not provide adequate power to unambiguously replicate linkage at loci that have a modest effect in RA. In fact, several thousand affected sibling pairs may be required to demonstrate “definite” linkage for genes of modest effect (11). In the case of RA, it is becoming increasingly clear that pooling results across different genome screens (10) will be required to increase the power to detect true linkage for loci with modest effects. The sample size of 581 independent sibling pairs is adequate to detect locus-specific increased recurrence risks (25) of 1.21, 1.31, and 1.40 at the 0.05, 0.005, and 0.0005 significance levels, respectively.

As a start, we combined the data sets from our 2 independent genome-wide screens, and repeated the analyses that we carried out on each screen. The increased power of this combined data set to detect linkage is reflected in the significance level of linkage at the HLA markers, for which the P value decreased by a factor of ∼10^6. Thus, it may be extrapolated from this observation that this larger data set has improved power to detect other loci having more modest effects than HLA. Among the non-HLA loci, evidence for linkage at P < 0.005 was observed at 1p13 (D1S1631), 1q43 (D1S235), 6q21 (D6S2410), 10q21 (D10S1221), 12q12 (D12S398), 17p13 (D17S1298), and 18q21 (D18S858). Analysis of the data using the GeneHunter software showed evidence of linkage (LOD score >2) at 1p13, 1q43, 6q21, and 18q21 (Figure 2). The GeneHunter results also demonstrated linkage at 16p13.1 (LOD score >2), although this locus only reached a P = 0.007 level of significance in the SIBPAL analysis.

The analysis of covariates, including DRB1∗04, presence of erosions on hand radiographs, rheumatoid factor >100 IU, presence of nodules, presence of a male affected sibling in the family, and ethnicity, did not show a significant increase in the evidence for linkage, except on chromosomes 1 and 6. On chromosome 6p, the increase in LOD score observed with the DRB1∗04 covariate showed that DRB1∗04 or a closely linked locus in disequilibrium with DRB1∗04 modifies the genetic risk for RA. Further stratified analysis showed that the presence of DRB1∗04 in both siblings increased the evidence for linkage; however, among sibling pairs who were both negative for DRB1∗04, there remained evidence for linkage (P = 0.0002). These data are consistent with our recent observations that the DRB1 locus does not explain all of the HLA associations with RA (26). The results for chromosome 1 showed that there may be multiple genetic factors on chromosome 1, and that the genetic effects may be most pronounced in subsets of RA patients. Because there was no evidence for linkage at D1S1589 when covariate effects were not modeled, the distribution of the test for a significant covariate effect may not be very well approximated by a chi-square distribution (22).

Further studies of additional data will help to evaluate whether there is linkage heterogeneity according to DRB1∗04 status for D1S1589. Stratifying the data set in terms of presence of rheumatoid factor >100 IU or presence of DRB1∗04 may be particularly useful for further fine mapping at the 2 loci that showed an increase in LOD scores when these covariates were included in the analysis.

To date, 3 other genome-wide screens have been performed in RA multicase families (5, 7, 9). Our combined screen in the present study represents the largest data set among all of these genome screens in RA. The strongest evidence for linkage was observed at the HLA locus in all screens, except one (9). In addition, some of the non-HLA loci that provided modest evidence of linkage (P < 0.005) in our combined screen overlap with regions that have been identified in other RA screens. The 1q43 locus has been linked to RA in both the European (5) and the British (7) data sets. The 6q21 region, however, showed evidence of linkage to RA in the British data set only, whereas linkage at the 18q21 locus was observed in the European study. Studies of candidate regions have implicated the 17q22 locus (27), which is syntenic to the Cia5 locus in the rat, and the 8q13 locus in the vicinity of the CRH gene in RA susceptibility. In our first genome-wide screen (screen 1), the 17q22 locus showed encouraging evidence for linkage (P < 0.0017). However, this result was not replicated in our second independent screen, nor was it present in our combined data set of 512 families. The 8q13 locus, on the other hand, was close to the P < 0.005 level of significance for linkage in our combined screen (D8S1110; P = 0.0067). Further analyses will be required to ascertain whether this modest level of linkage with RA is due to CRH or to some other gene in the region.

The non-HLA regions showing linkage with RA have also been implicated in a number of other autoimmune diseases. The 1p13 region has been linked to SLE (28). The 1q43 region containing the D1S235 marker was first linked to susceptibility to SLE using a candidate approach (29), and this linkage was subsequently confirmed in 3 separate genome-wide screens (28–30). The 18q21 locus has shown evidence for linkage with type I insulin-dependent diabetes mellitus (31–33), SLE (34), and Graves' disease (35) in humans, and the orthologous region in rodents has been implicated in animal models of SLE (36), multiple sclerosis (37), and experimental allergic encephalomyelitis (38).

The linkage data described herein provide only a rough guide as to the location of RA susceptibility genes. A further analysis of linkage with a dense set of markers may allow us to narrow down regions of interest to within 5–10 cM. This is still a large region of the genome, and it is premature to aggressively evaluate a large number of candidate genes in these regions. Given that the etiology of RA is still unknown, the spectrum of reasonable candidate genes can be very broad, even within a particular linked region. Nevertheless, it is interesting to examine these regions for particularly provocative candidate genes. For example, one possible candidate gene on chromosome 18q21 is the TNFRSF11A gene, which encodes receptor activator of nuclear factor κB (RANK). RANK has been shown to be critically involved in the differentiation of osteoclasts in inflamed synovium and is clearly important for the development of bone resorption in inflammatory arthritis (39). Because the NARAC collection is specifically selected for erosive RA, our patient population is well suited to detect genes involved in the pathogenesis of bony erosions. Therefore, we intend to pursue family-based association studies of RANK in our family collection. Association studies on multiplex family data sets can offer more statistical power than that of standard case–control analyses, particularly when association is specifically examined in families with evidence of linkage to the region of interest (40). Nevertheless, a candidate-gene strategy is risky, and it is not an efficient way to thoroughly evaluate the candidate regions of linkage that we have identified.

For RA and other complex autoimmune disorders, the challenge is to move to gene identification in these regions of modest linkage. Further expansion of multiplex family collections will enhance confidence in these linkage results. However, it is likely that only dense haplotype-association mapping will provide sufficiently precise localization to justify a comprehensive analysis of all candidate genes in a region. A combined approach based on positional cloning and candidate-gene analysis has led to gene identification in Crohn's disease (41, 42). We are hopeful that an analysis of candidate-linkage regions in RA will provide similar success in the future.

Acknowledgements

We would like to thank the following people for contributing to this work: Nancy Postiglione and Nina Kohn for data management, Dorothy Guzowski for synthesis of oligonucleotides, Dakai Zhu and Wenfu Wang for enabling us to perform error-checking of the genotyping data on the Web, Jianfang Chen and Qingsong Liu for the binning and error-handling programs used, and Sally Kaplan, Mary Noelle Holly, Clare Cleveland, Peggy Rasmussen, Sarah Kupfer, Dana DePew, Carol Blalock, Karen Rodin, Linda Ingles, Diana Amox, Kay Randall, Donna McGregor, Marianna Crane, Pat Cummins, Phyllis Daum, and Jennifer Pearce for outstanding field work in recruiting RA families. We would also like to extend our thanks to all of the patients and their families for participating in the NARAC study and making this research possible.
