Association analysis of the R620W polymorphism of protein tyrosine phosphatase PTPN22 in systemic lupus erythematosus families: Increased T allele frequency in systemic lupus erythematosus patients with autoimmune thyroid disease




Recent case–control studies show associations of the minor T allele (of the C1858T single-nucleotide polymorphism corresponding to the R620W amino acid substitution) of PTPN22 with multiple autoimmune diseases, including systemic lupus erythematosus (SLE). We performed family-based association studies of this polymorphism in 4 independent cohorts containing SLE patients and their parents and/or other family members.


A total of 2,689 individuals from 902 independent Caucasian families with SLE were genotyped using polymerase chain reaction pyrosequencing (cohorts 1 and 2) and the Sequenom MassArray system (cohorts 3 and 4). The transmission disequilibrium test (TDT) and the pedigree disequilibrium test (PDT) were conducted to assess the evidence of association.


The 1858 C > T allele frequencies of the parents showed no deviation from Hardy-Weinberg equilibrium within each cohort. No evidence of preferential transmission of the T allele from heterozygous parents to their affected offspring was observed in any of the 4 cohorts or in the combined sample. Consistent with the TDT result, the PDT analysis revealed no significant association between the T allele and SLE. Among the 661 SLE patients in cohorts 1 and 3, the T allele frequency was higher in the 54 patients with documented autoimmune thyroid disease than in those with SLE alone (16.7% versus 8.5%; P = 0.008, odds ratio 2.16 [95% confidence interval 1.25–3.72]).


The R620W polymorphism of the PTPN22 gene is not a major risk allele for SLE susceptibility in our sample of Caucasian individuals from North America, the UK, or Finland, but it appears to be a risk factor for concurrent autoimmune thyroid disease and SLE.

Systemic lupus erythematosus (SLE) is a systemic autoimmune disease in which the deposition of autoantibodies and their immune complexes in many organs and tissues causes inflammation and tissue injury. Multiple abnormalities of T and B lymphocytes, which are central to the pathogenesis of the disease, are frequently found in patients with SLE. The PTPN22 gene is expressed primarily in cells of hematopoietic lineage, and its encoded protein, the lymphoid tyrosine phosphatase (Lyp), regulates tyrosine kinases that participate in T cell activation (1).

Findings of epidemiologic, linkage, and association studies imply that susceptibility to SLE is influenced to a great extent by genetic factors (2). The minor T allele of the C1858T polymorphism, which causes the R620W amino acid substitution and disrupts the formation of the Lyp–Csk complex (3), has recently been associated with multiple autoimmune diseases (4, 5), including type 1 diabetes mellitus (3, 6–8), autoimmune thyroid disease (7, 9), rheumatoid arthritis (RA) (10–12), and SLE (12, 13), but not multiple sclerosis (14). In 1 SLE study (13), the reported T allele frequency estimates were 12.67% in 525 independent North American Caucasian SLE cases and 8.64% in 1,961 Caucasian control individuals (P = 0.00009). In another SLE study (12), the T allele frequency estimates were 9.8% in 338 Spanish Caucasian SLE cases and 7% in 512 ethnically matched controls (P = 0.03).

Most of the previous association studies have used a population-based case–control design, which is vulnerable to the effects of poor matching of case and control samples, as well as population stratification. We report herein the findings of a family-based association analysis of 2,689 Caucasian individuals from 4 independent cohorts. Results of our analyses do not support the reported association of the PTPN22 R620W polymorphism with SLE, but provide evidence of an association within a subset of SLE patients with autoimmune thyroid disease.


SLE families

Four cohorts containing simplex and multiplex SLE families of Caucasian origin were analyzed (Table 1). Samples of subjects of European ancestry were collected in Los Angeles (cohort 1; 29% western, 22% eastern, 7% northern, 16% southern, and 26% central European origin), London (cohort 3; northern and western European origin), and Finland (cohort 4; Finnish origin [15]). Subjects of cohort 2 were recruited in Columbus, OH, and were of Caucasian origin; however, further information on their ancestry is not available. All 981 SLE patients met the American College of Rheumatology criteria for the classification of SLE (16). SLE patients were enrolled into the study after providing written informed consent, following protocols approved by the appropriate institutional review boards.

Table 1. The 4 cohorts studied in the family-based association study

| Cohort | Total no. of families | No. of complete trios* | No. of other simplex families | No. of multiplex families | Country of recruitment | Total no. of individuals |
|---|---|---|---|---|---|---|
| 1 | 219 | 172 | 47 | 0§ | US (UCLA) | 659 |
| 2 | 64 | 64 | 0 | 0 | US (OSU) | 192 |

  • * Complete trios are families in which both parents and 1 affected offspring were available for study.

  • See Subjects and Methods for recruitment details. UCLA = University of California, Los Angeles; OSU = Ohio State University, Columbus.

  • In these families, 1 of the parents, 1 child with systemic lupus erythematosus (SLE), and 1 unaffected child were available for study.

  • § Although some trios from cohort 1 were derived from multiplex families, only 1 affected offspring in each family was included in the family-based association study.

  • In these families, neither parent, 1 child with SLE, and 1 unaffected child were available for study.

  • # These families are missing one or both parents and are therefore not informative for transmission disequilibrium testing but could be used for pedigree disequilibrium testing.

DNA preparations

Genomic DNA from SLE patients and their families was isolated from peripheral blood mononuclear cells using a standard protocol.

Genotyping PTPN22 R620W polymorphism.


The polymerase chain reaction (PCR) was performed using a 15-μl mixture containing 20–40 ng of genomic DNA template, standard PCR buffer, 200 μM dNTPs, 2 mM MgCl2, 100 nM primers, and 1 unit of AmpliTaq Gold Polymerase (Applied Biosystems, Foster City, CA). The PCR began with an initial denaturation at 95°C for 5 minutes, followed by 50 cycles of denaturing at 95°C for 15 seconds, annealing at 65°C for 30 seconds, plus extension at 72°C for 15 seconds, and it ended with a final elongation step at 72°C for 5 minutes. Using the forward primer 5′-TGC-CCA-TCC-CAC-ACT-TTA-TT-3′ and the reverse primer 5′-GGA-TAG-CAA-CTG-CTC-CAA-GG-3′, a 133-bp amplicon was generated and evaluated for size and purity by agarose gel electrophoresis. The forward primer was purified by high-performance liquid chromatography and was biotinylated at the 5′ end to allow immobilization onto streptavidin-coated Dynabeads M-280 (Dynal Biotech, Oslo, Norway).

Pyrosequencing (17) of 10 μl of the PCR product immobilized onto the beads was performed using the sequencing primer 5′-CCC-CTC-CAC-TTC-CTG-3′ and single-nucleotide polymorphism (SNP) reagent kits (Pyrosequencing, Uppsala, Sweden) according to the manufacturer's instructions. The SNP genotype analysis was performed using the SNP software in a PSQ 96MA system (Pyrosequencing). The allelic calling rate of 851 DNA samples from cohorts 1 and 2 was 98%, and the assay reproducibility was 100% in >100 repeated samples. No Mendelian errors were detected in the SNP genotyping data of 725 families.

Sequenom MassArray system

Genotyping was carried out with a MassArray technique (Sequenom, San Diego, CA) (18) using a chip-based matrix-assisted laser desorption ionization−time-of-flight (MALDI-TOF) mass spectrometer (19). Multiplex SNP assays were designed using SpectroDesigner software (Sequenom); 384-well plates containing 2.5 ng of DNA in each well were amplified by PCR following the specifications of Sequenom. After PCR, shrimp alkaline phosphatase (Sequenom) was added to samples to prevent future incorporation of unused dNTPs that could interfere with the primer extension assay. Allele discrimination reactions were conducted by adding the extension primer(s), DNA polymerase, and a cocktail mixture of dNTPs and ddNTPs to each well. MassExtend clean resin (Sequenom) was added to the mixture to remove extraneous salts that could interfere with MALDI-TOF analysis.

Genotypes were determined by spotting an aliquot of each sample onto a 384 SpectroChip (Sequenom), which was subsequently read by the MALDI-TOF mass spectrometer. Assay conditions and primer sequences are available on request. The Sequenom MassArray system had a 95.8% genotyping success rate for PTPN22 R620W. Four Mendelian errors were detected and excluded from data analysis in cohort 3.

Statistical analysis

The transmission disequilibrium test (TDT) (20) was used to investigate whether the T allele is preferentially transmitted from heterozygous parents to their SLE-affected offspring in each cohort and in the combined sample, using the extended TDT software (21) for complete trios (both parents and 1 child). The pedigree disequilibrium test (PDT) (22) was also used so that incomplete trios and other family structures could be included. The genotype and allele distributions of the PTPN22 SNP in SLE patients with and without autoimmune thyroid disease were compared using a contingency table and Fisher's 2-tailed exact test (Prism 3.02 software; GraphPad Software, San Diego, CA). To explore the association of this polymorphism with the age at diagnosis of SLE, the age at diagnosis was compared between SLE patients with CT/TT genotypes and those with the CC genotype using 2-sample, 2-sided t-tests (Prism 3.02). P values less than 0.05 were considered significant.
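As an illustration of the TDT described above, the following is a minimal sketch of the basic McNemar-type statistic for a biallelic marker. It is not the extended TDT software used in this study; the function name `tdt` is hypothetical, and the P value relies on the 1-degree-of-freedom chi-square approximation. The counts are the transmitted:nontransmitted numbers reported in this article (8:4 in offspring with both SLE and autoimmune thyroid disease, and 105:112 in the combined sample).

```python
import math


def tdt(transmitted, nontransmitted):
    """Basic TDT for one allele: McNemar-type chi-square with 1 df.

    transmitted / nontransmitted are counts of the allele passed or not
    passed from heterozygous parents to affected offspring.
    """
    b, c = transmitted, nontransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# SLE plus autoimmune thyroid disease subset (8 transmitted, 4 not)
chi2_atd, p_atd = tdt(8, 4)
# Combined sample of complete trios (105 transmitted, 112 not)
chi2_all, p_all = tdt(105, 112)

print(f"ATD subset: chi2 = {chi2_atd:.3f}, P = {p_atd:.3f}")
print(f"Combined:   chi2 = {chi2_all:.3f}, P = {p_all:.3f}")
```

With so few informative transmissions in the thyroid disease subset, an exact binomial test would be a reasonable alternative to the chi-square approximation.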


No skewed transmission of the 1858T allele. The 1858 C > T allele frequencies of the parents from independent families, analyzed separately in each cohort, showed no significant deviation from Hardy-Weinberg equilibrium. Using the TDT, skewed transmission of the T allele from heterozygous parents to their affected offspring was not observed within any cohort or in the combined sample of 611 independent Caucasian family trios with SLE (Table 2). There was no consistent increase or decrease in transmission of the T allele among the 4 SLE cohorts.
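The Hardy-Weinberg check can be sketched as a 1-degree-of-freedom chi-square goodness-of-fit test. This is an illustration only: the article does not state which test was used, and the function name `hwe_chi2` is hypothetical. The example uses the cohort 1 parents, for whom the Table 3 footnote gives 3 TT and 48 CT parents among 688 parental chromosomes (344 parents). The CC count of 293 is an assumption derived by subtraction, supposing that all 344 parents were genotyped.

```python
import math


def hwe_chi2(n_cc, n_ct, n_tt):
    """Chi-square goodness-of-fit test for Hardy-Weinberg equilibrium."""
    n = n_cc + n_ct + n_tt
    q = (2 * n_tt + n_ct) / (2 * n)  # T allele frequency
    p = 1 - q                        # C allele frequency
    observed = (n_cc, n_ct, n_tt)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return q, chi2, p_value


# Cohort 1 parents: TT and CT from the Table 3 footnote; CC = 344 - 3 - 48
# is an assumption (all parents genotyped), not a reported value.
t_freq, chi2, p_value = hwe_chi2(n_cc=293, n_ct=48, n_tt=3)
print(f"T allele frequency = {t_freq:.4f}, chi2 = {chi2:.3f}, P = {p_value:.2f}")
```

Because the expected number of TT parents is small (about 2), the chi-square approximation is rough here, and an exact test would be preferable in practice.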

Table 2. Transmission of the minor 1858T allele of PTPN22 in complete trios from 4 SLE family cohorts*

| Cohort | No. of trios | No. of CC parents | No. of TT parents | No. of CT parents | No. of transmissions of T allele | No. of nontransmissions of T allele | P |

  • * Complete trios are families in which both parents and 1 affected offspring were available for study. SLE = systemic lupus erythematosus.


Consistent with the TDT results, no significant association between the minor T allele and SLE was observed in the combined family sample by PDT (P = 0.8).

No difference in allele frequency between family-based cases and controls. Table 3 shows the allele frequency estimates in the 611 independent and complete SLE trios. The allele frequencies for the cases were estimated from the parental transmitted alleles to the affected children, and the control allele frequencies were estimated from the parental nontransmitted genotypes. We believe that this Affected Family–Based Controls (AFBAC) approach (23), which uses complete trios and creates controls by subtracting the alleles in the affected SLE children from the parental alleles, provides an appropriate method with which to estimate control allele frequencies that are matched for age, sex, and ethnicity when no formal control group is available. This analysis allows data from the homozygous parents to be included in the statistical test.

Table 3. AFBAC case–control PTPN22 1858T allele frequency estimates from complete trios in 4 independent cohorts from the current analysis and previously reported case–control studies of SLE, RA, type 1 DM, and Graves' disease*

| Cohort | PTPN22 1858T allele frequency, no. (%) of chromosomes: parental transmitted alleles, case | PTPN22 1858T allele frequency, no. (%) of chromosomes: parental nontransmitted alleles, control | P |
|---|---|---|---|
| Present cohorts | | | |
| 1 | 21/344 (6.10) | 33/344 (9.59) | NS |
| 2 | 15/128 (11.72) | 10/128 (7.81) | NS |
| 3 | 62/602 (10.30) | 70/602 (11.63) | NS |
| 4 | 24/148 (16.22) | 16/148 (10.81) | NS |
| 1 + 2 + 3 + 4 | 122/1,222 (9.98) | 129/1,222 (10.56) | NS |
| Previously reported cohorts (ref.) | | | |
| SLE 1 (13) | 133/1,050 (12.67) | 339/3,922 (8.64) | 0.00009 |
| SLE 2 (12) | 66/676 (9.76) | 71/1,024 (6.93) | 0.03 |
| RA 1 (10) | 146/926 (15.77) | 161/1,852 (8.69) | 5.6 × 10^−8 |
| RA 2 (12) | 171/1,652 (10.35) | 153/2,072 (7.38) | 0.001 |
| RA 3 (11) | 96/604 (15.89) | 63/748 (8.42) | 0.00003 |
| Type 1 DM (3) | 129/936 (13.78) | 111/1,218 (9.11) | 0.0006 |
| Graves' disease (9) | 151/1,098 (13.75) | 67/858 (7.81) | 0.000034 |

  • * As with the Affected Family–Based Controls (AFBAC) method, the case frequencies are derived from the alleles transmitted from the parents to the affected children in these trios, and the control allele frequencies are derived from the nontransmitted parental alleles (23). Using cohort 1 as an example, among 21 T alleles in the cases, 18 were preferentially transmitted from heterozygous parents to affected offspring, and 3 (21 − 18 = 3) were directly transmitted from TT homozygous parents and did not provide informative transmission data for transmission disequilibrium testing. To calculate the numbers of T alleles in the control sample that were not passed to the children with systemic lupus erythematosus (SLE), take the total number of parental chromosomes, which is 688. The parents have a total of 54 T alleles (2 × 3 TT + 48 TC). Their children have 21 T alleles. Thus, the controls have 54 − 21 = 33 T alleles in the untransmitted parental chromosomes. RA = rheumatoid arthritis; DM = diabetes mellitus; NS = not significant.
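The AFBAC arithmetic worked through in the Table 3 footnote for cohort 1 can be reproduced with a short sketch. The function and key names are hypothetical; the inputs (172 complete trios, 3 TT parents, 48 CT parents, and 21 T alleles transmitted to the affected children) are the cohort 1 values given in this article.

```python
def afbac_counts(n_trios, n_tt_parents, n_ct_parents, t_transmitted):
    """Derive AFBAC case and control T allele counts from complete trios."""
    # Each trio contributes 2 parents x 2 chromosomes
    parental_chromosomes = 4 * n_trios
    # Half of the parental chromosomes are transmitted, half are not
    chromosomes_per_group = parental_chromosomes // 2
    parental_t = 2 * n_tt_parents + n_ct_parents
    # T alleles not passed to the affected child form the control sample
    t_nontransmitted = parental_t - t_transmitted
    return {
        "parental_chromosomes": parental_chromosomes,
        "parental_t": parental_t,
        "case_t": t_transmitted,
        "control_t": t_nontransmitted,
        "chromosomes_per_group": chromosomes_per_group,
    }


cohort1 = afbac_counts(n_trios=172, n_tt_parents=3, n_ct_parents=48,
                       t_transmitted=21)
case_pct = 100 * cohort1["case_t"] / cohort1["chromosomes_per_group"]
control_pct = 100 * cohort1["control_t"] / cohort1["chromosomes_per_group"]
print(cohort1)
print(f"case {case_pct:.2f}%, control {control_pct:.2f}%")
```

The resulting counts of 21/344 and 33/344 correspond to the cohort 1 row of Table 3.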

As expected from the TDT results, tests of differences in T allele frequencies within the individual cohorts showed no significant differences. In the combined sample of all 4 cohorts, the minor (T) allele frequency of 9.98% (122/1,222) in the SLE cases was not significantly different from the frequency of 10.56% (129/1,222) observed in the family-based controls and was not significantly different from the frequency of 8.64% (339/3,922) of the previously reported controls (P = 0.15) (13).

The minor T allele is associated with autoimmune thyroid disease in SLE patients. Since it has previously been reported that the minor T allele is associated with Graves' disease (7, 9) and Hashimoto thyroiditis (5), we also tested its association with autoimmune thyroid disease in SLE patients. Among all 661 SLE patients in cohorts 1 and 3, the genotype distribution of the C1858T polymorphism was found to be different in the 54 SLE patients who had autoimmune thyroid disease compared with the 607 SLE patients who did not (P = 0.0003) (Table 4). (No clinical data for autoimmune thyroid disease were available in cohorts 2 and 4.) A higher minor T allele frequency was observed in the autoimmune thyroid disease group (16.7% versus 8.5%; P = 0.008, odds ratio 2.16 [95% confidence interval 1.25–3.72]) (Table 4). In this very small sample, the T allele also showed a trend toward preferential transmission (transmitted:nontransmitted 8:4; P = 0.25) from heterozygous parents to the offspring affected with both SLE and autoimmune thyroid disease. We considered performing a similar analysis of the co-occurrence of SLE and RA, but the sample in cohort 1 was even smaller (n = 3).

Table 4. Genotype and allele distributions of the R620W PTPN22 SNP in SLE probands with and without autoimmune thyroid disease in the combination of cohorts 1 and 3*

| R620W PTPN22 | No. (%) of SLE patients with autoimmune thyroid disease | No. (%) of SLE patients without autoimmune thyroid disease | P |
|---|---|---|---|
| CC | 39 (72.2) | 507 (83.5) | 0.0003 |
| CT | 12 (22.2) | 97 (16.0) | |
| TT | 3 (5.6) | 3 (0.5) | |
| C | 90 (83.3) | 1,111 (91.5) | |
| T | 18 (16.7) | 103 (8.5) | 0.008 |

  • * A total of 661 systemic lupus erythematosus (SLE) patients comprise cohorts 1 (n = 219) and 3 (n = 442). No clinical data for autoimmune thyroid disease were available in cohorts 2 and 4. P values were determined by Fisher's exact test (2-tailed). SNP = single-nucleotide polymorphism.

  • Odds ratio 2.16 (95% confidence interval 1.25–3.72).
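The allelic odds ratio in Table 4 can be recomputed from the C and T allele counts. The sketch below assumes a Woolf (logit) 95% confidence interval; the article does not state which interval method was used, and the function name is hypothetical, but this method agrees with the reported 2.16 (1.25–3.72).

```python
import math


def odds_ratio_woolf(a, b, c, d, z=1.959964):
    """Odds ratio (a*d)/(b*c) with a Woolf (logit) confidence interval.

    a, b = T and C allele counts in patients with autoimmune thyroid disease
    c, d = T and C allele counts in patients without it
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Allele counts from Table 4: T = 18, C = 90 versus T = 103, C = 1,111
or_t, ci_low, ci_high = odds_ratio_woolf(18, 90, 103, 1111)
print(f"OR = {or_t:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```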

Association of the minor T allele with younger age at SLE diagnosis in cohort 1. Genetic influences may manifest when the disease has an earlier age at onset. We tested this in patients from cohorts 1 and 3 in whom the ages at SLE diagnosis were documented. When stratifying by CT/TT versus CC, a younger age at diagnosis was observed among carriers of the T allele in the 175 patients from cohort 1. In this cohort, there were 15 individuals with CT/TT genotypes whose mean ± SD age at diagnosis was 20.7 ± 10.1 years and 160 individuals with CC genotypes whose mean ± SD age at diagnosis was 26.3 ± 9.7 years (P = 0.04). However, this difference was not observed in 104 cases from cohort 3, where 27 individuals with CT/TT genotypes had a mean ± SD age at diagnosis of 30.6 ± 11.1 years and 77 individuals with CC genotypes had a mean ± SD age at diagnosis of 30.4 ± 11.6 years (P = 0.95).

Higher frequency of the minor T allele in familial, and not sporadic, cases in cohorts 1 and 4. Since 185 independent cases from the 525 previously studied SLE patients (13) were members of affected sibpair families, we considered the potential difference of this genetic association between familial SLE (1 or more other family members have the disease) and sporadic SLE (no other family member has the disease), using the multiplex families from cohorts 1 and 4. Cohort 1 contains 84 trios derived from SLE multiplex families and 88 trios classified as sporadic cases. Among the 74 Finnish complete trios in cohort 4, 56 are sporadic and 18 are from SLE multiplex families. While the T allele frequency was higher in familial cases than in sporadic cases (7.74% versus 4.55% in cohort 1 and 19.44% versus 15.18% in cohort 4), these differences were not significant within the individual samples.


Compared with population-based case–control designs, the TDT method, although it may provide less power, is less vulnerable to Type I statistical errors that are due to population substructure. Our combined-sample TDT result of 105 transmissions and 112 nontransmissions of the T allele is consistent with the recently published study by Kyogoku et al (13), which showed 70 transmissions and 57 nontransmissions from heterozygous founders to SLE-affected offspring (P = 0.22). Those authors stated that their sample size was “underpowered to detect the effect by TDT” and that 229 trios would be required for 80% power at P < 0.05 (13). By that estimate, the trios studied within the US (cohorts 1 plus 2 [the US cohort], 172 + 64 = 236 trios) and within the UK (301 trios) each provide adequate power to detect an association between the PTPN22 R620W SNP and SLE. Our combined sample of 611 complete trios provides 80% power to detect an association of an allele with a genotype relative risk of ∼1.35 or greater, with a level of significance of 0.05 (24), which is appropriate for a replication analysis of a single SNP.

Using the family-based TDT (Table 2) and AFBAC case–control analyses (Table 3), no evidence to support the reported association between the R620W polymorphism and SLE was observed in any of the 4 independent cohorts or in the combined sample. The 1858T allele frequency in independent SLE cases in the combined data was 9.98%, which is not significantly different from the 10.56% frequency observed in the family-based controls or from the previously reported 8.64% frequency in the largest population-based control sample published thus far (P = 0.15) (13). Recognizing the possible bias that may result from including only complete trios, we compared the allele frequency between the 172 complete and 47 incomplete trios in cohort 1, which had the lowest T allele frequency in SLE patients among the 4 cohorts. The frequency estimate was lower in cases from the incomplete trios (3.19%) than in cases from the complete trios (6.10%). Thus, if the incomplete trios were included in our analyses, they would not provide evidence for an SLE association with the T allele. Although the allele frequency of cases in cohort 1 was quite low (6.10%) (Table 3), the higher control frequency of 9.59% indicated that this sample was not different from the others.

Interestingly, the allele frequencies of the 1858T polymorphism of PTPN22 differed significantly among the cases in our 4 cohorts (ranging from 6.10% to 16.22%; P = 0.005 by the Fisher-Freeman-Halton exact test), whereas the control frequencies in the respective families exhibited a smaller range and did not differ significantly from each other (ranging from 7.81% to 11.63%; P = 0.58). In the previously reported findings (13), the T allele frequency showed less variation among the 4 cohorts: 13.5%, 12.5%, 12.2%, and 12.2%. In a second SLE study (12), the T allele frequency estimates were 9.8% in 338 Spanish Caucasian SLE cases and 7% in 512 ethnically matched controls (P = 0.03). All the subjects were from Spanish populations, which may be genetically different from other Caucasian populations because of admixture with Arab and African populations. An RA study (10) used 926 controls that were also used in the SLE study, and the controls used in the study of type 1 diabetes mellitus comprised 189 students attending college in California, 160 parents from the Minnesota Twin Study, 46 healthy individuals from the Barbara Davis Center for Autoimmune Diabetes in Denver, CO, and 214 healthy Italians recruited in Italy (3).

The R620W polymorphism of the PTPN22 gene does not appear to be a major risk allele for SLE susceptibility among Caucasian individuals from North America, the UK, or Finland. Possible reasons for the conflicting inferences between other studies and this one may include disease and genetic heterogeneity, which often confound studies of multigenic complex diseases. These discrepancies highlight the utility and relevance of the family-based association design. In an examination of the 4 cohorts reported here and the data from 2 published cohorts (SLE 1 and SLE 2 in Table 3), the estimated odds ratio of the combined results was 1.33, with a 95% confidence interval of 1.15–1.53, indicating that in this combined sample, the impact of PTPN22 R620W on SLE is not as strong as in the original sample.

The frequency of the 1858T allele varied considerably among the SLE cases in these 4 cohorts, presumably reflecting stochastic fluctuations, although the variation among the control frequencies derived from the untransmitted alleles was smaller than that among the cases. The heterogeneity among the 4 case cohorts supports the notion that the most appropriate study design is family-based, where appropriate matching is not an issue. This lack of replication highlights the importance of suitably powered large studies in assessing the role of genetic polymorphisms in common complex disorders.

It is also possible that other polymorphisms in linkage disequilibrium with the C1858T SNP may be causative alleles of PTPN22, which would account for the previously observed association with type 1 diabetes mellitus (3, 6–8), autoimmune thyroid disease (7, 9), RA (10–12), and SLE (12, 13). Sequencing the PTPN22 gene in 48 RA patients identified 13 tagged SNPs, among which only the T allele of the C1858T SNP (and a SNP haplotype carrying the T allele) is strongly associated with RA (25). The associated T allele of the 1858 SNP appears to be the most likely risk allele for susceptibility to autoimmune diseases.

It is interesting that the association between the minor T allele and autoimmune thyroid disease was observed in SLE probands from cohorts 1 and 3, which supports previous reports of an association between this allele and Graves' disease (7, 9) as well as Hashimoto thyroiditis (5). However, the number of patients with both conditions is small, which limits the statistical power of this analysis. Individuals with a co-occurrence of autoimmune thyroid disease and SLE may be enriched for genetic factors that predispose to autoimmune diseases. These factors may include the T allele encoding the 620W variant of PTPN22, and thus, the T allele may be a risk factor for developing concurrent autoimmune diseases.


We thank all the participating patients and their family members, as well as the many physicians for referring patients and verifying their diagnoses. We thank Naoko Kono for help in the preparation of the manuscript.