Identification of novel susceptibility genes in childhood-onset systemic lupus erythematosus using a uniquely designed candidate gene pathway platform

Authors


Abstract

Objective

Childhood-onset systemic lupus erythematosus (SLE) presents a unique subgroup of patients for genetic study. The present study was undertaken to identify susceptibility genes contributing to SLE, using a novel candidate gene pathway microarray platform to investigate gene expression in patients with childhood-onset SLE and both of their parents.

Methods

Utilizing bioinformatic tools, a platform of 9,412 single-nucleotide polymorphisms (SNPs) from 1,204 genes was designed and validated. Molecular inversion probes and high-throughput SNP technologies were used for assay development. Seven hundred fifty-three subjects, corresponding to 251 full trios of childhood-onset SLE families, were genotyped and analyzed using transmission disequilibrium testing (TDT) and multitest corrections.

Results

Family-based TDT showed a significant association of SLE with a N673S polymorphism in the P-selectin gene (SELP) (P = 5.74 × 10⁻⁶) and a C203S polymorphism in the interleukin-1 receptor–associated kinase 1 gene (IRAK1) (P = 9.58 × 10⁻⁶). These 2 SNPs had a false discovery rate for multitest correction of <0.05, and were therefore considered proven with >95% probability. Furthermore, 7 additional SNPs showed q values of <0.5, suggesting association with SLE and providing a direction for followup studies. These additional genes notably included TNFRSF6 (Fas) and IRF5, supporting previous findings of their association with SLE pathogenesis.

Conclusion

SELP and IRAK1 were identified as novel SLE-associated genes with a high degree of significance, suggesting new directions in understanding the pathogenesis of SLE. The overall design and results of this study demonstrate that the candidate gene pathway microarray platform used provides a novel and powerful approach that is generally applicable in identifying genetic foundations of complex diseases.

Systemic lupus erythematosus (SLE) is a debilitating multisystem autoimmune disorder affecting ∼0.1% of the North American population (predominantly females). It is characterized by chronic inflammation in various organ systems such as the skin, joints, kidneys, lungs, and brain and the production of autoantibodies to multiple self antigens (1). Genome-wide linkage studies have been performed in small to medium-sized collections of families with 2 or more affected members, and several genetic intervals have been identified (2–7), some of them corroborated in 2 or more independent studies (8–11). Taken together, the findings of these studies suggest that multiple genes contribute to the pathogenesis of SLE, each providing quite modest genetic effects. Furthermore, these studies have shown that the genetics of SLE are not dominated by a single major genetic effect (such as the effect of HLA in type 1 diabetes mellitus or rheumatoid arthritis, both autoimmune diseases).

While the linkage analysis methods used to date have been quite successful in identifying rare variants with strong genetic effects, this approach has limited power to detect common variants with more modest effects. Although some rare alleles with strong genetic effects (such as C1q deficiency) can contribute to SLE genetics, it is probable that common alleles with modest genetic effects play a more important role in disease susceptibility. Thus, we hypothesized that many genetic alleles important to the SLE phenotype will not be identified through genome-wide linkage studies. As a case in point, it was recently shown that an allele of PTPN22 (the gene for protein tyrosine phosphatase N22, a lymphocytic phosphatase that is capable of decreasing T cell activation) is a risk factor for SLE, with an odds ratio of 4 (12). PTPN22 is encoded on chromosome 1 at 1p13, a region that was not identified in any of the SLE linkage studies.

Since association studies are more powerful than linkage studies when the predisposing variant is more frequent and when the genes have a moderate association with the disease (13, 14), a better strategy would be to perform a series of candidate gene single-nucleotide polymorphism (SNP) screens in a study population that is most conducive to expression of these susceptibility genes. However, it is becoming increasingly clear that association studies performed with a wide, random selection of candidate genes are unlikely to yield reproducible results; indeed, it has been estimated that the rate of false-positive results in such studies is near 95% (15). Accordingly, it has been suggested that a Bayesian methodology (wherein instead of selecting candidate genes at random, the investigators select the candidate genes based on prior available information) is one way to increase the reliability of association studies and to increase the likelihood of finding genes actually associated with a disease (15, 16).

In this report we describe a novel strategy using a combination of state-of-the-art hardware and analysis methods to investigate genetics of complex diseases, whereby the investigation is initiated with a bioinformatics-driven design of a custom-made chip that incorporates close to 10,000 SNPs derived from ∼1,000 selected genes. This chip was used to genotype families with childhood-onset SLE, and data were analyzed using rigorous statistical methods including multicomparison correction.

PATIENTS AND METHODS

The University of Southern California Institutional Review Board for research on human subjects approved this study. The study was also approved by the Human Subject Institutional Review Boards at each institution from which subjects were recruited, and informed consent was obtained from all subjects (parents provided consent on behalf of children who were under the legal age of consent).

Inclusion criteria and data collection.

For the purposes of this study we considered a subject to have childhood-onset SLE if the American College of Rheumatology (ACR) criteria for SLE (17, 18) were fulfilled and the diagnosis of SLE was made before the subject was 13 years old, by at least 1 pediatric rheumatologist participating in the study. Each SLE patient and his/her parents were interviewed, and a family history was obtained. We collected data describing a fixed family structure (proband's grandparents, parents, and siblings). In addition to self-declared ethnicity, information on the birthplace of the subject, the parents, and the grandparents was collected, for accurate ethnic characterization of families. For each case, information regarding sex, date of birth, date of first symptoms, and date of diagnosis was collected. For all cases, medical records documenting SLE diagnosis and disease progression, including all treatments and results of all serologic and chemical blood tests, biopsies, and radiologic studies, were reviewed by at least 1 pediatric rheumatologist. When possible, disease severity was evaluated, based on the number of organs involved and severity of involvement, using the Systemic Lupus International Collaborative Clinics/ACR Damage Index (19). All of this information was collected and imported into our database. Blood was collected and genomic DNA and plasma prepared and stored according to standard procedures.

Subjects.

The 753 subjects in the present study (representing 251 complete trio families) were a subsample of those in the University of Southern California Childhood-Onset SLE Genetics Study database, projected to reach 850 childhood-onset SLE cases by the end of 2008. In parallel, 536 adult-onset SLE patients and their families have been recruited from the same populations and geographic areas. Demographic and clinical information on the patients in the study sample is summarized in Table 1.

Table 1. Clinical and demographic features of the subjects with childhood-onset SLE and those with adult-onset SLE*

|  | Childhood-onset SLE (n = 251) | Adult-onset SLE (n = 536) |
| --- | --- | --- |
| Male/female | 96 (38)/155 (62) | 48 (9)/488 (91) |
| Ethnicity |  |  |
| Caucasian | 89 (35) | 176 (33) |
| Hispanic | 98 (39) | 241 (45) |
| Asian | 35 (14) | 69 (13) |
| African American | 19 (8) | 47 (9) |
| Mixed | 10 (4) | 3 (0.56) |
| ACR SLE criteria met |  |  |
| Malar rash | 131 (52) | 273 (51) |
| Discoid rash | 35 (14) | 80 (15) |
| Photosensitivity | 182 (72.5) | 380 (71) |
| Oral/nasal ulcers | 156 (62) | 337 (63) |
| Joint inflammation | 191 (76) | 279 (52) |
| Pleurisy or pericarditis | 133 (53) | 105 (20) |
| Renal disorder | 150 (60) | 166 (31) |
| Neurologic disorder | 73 (29) | 51 (9.5) |
| Hematologic disorder | 205 (82) | 424 (79) |
| Immunologic disorder | 216 (86) | 429 (80) |
| Positive ANA | 250 (99.6) | 536 (100) |

* Values are the number (%). SLE = systemic lupus erythematosus; ACR = American College of Rheumatology; ANA = antinuclear antibody.

DNA preparation.

Blood samples were collected from all participants, and genomic DNA was extracted from peripheral blood mononuclear cells by standard procedures. Resultant DNA, resuspended in Tris–EDTA buffer, was quantified initially using an ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE). Before genotyping, DNA was requantified using PicoGreen reagent. Samples were normalized to a concentration of 150 ng/μl and interdigitated into 96-well plates.

Genotyping.

Molecular inversion probes were designed and produced at ParAllele Biosciences (Palo Alto, CA) and printed on Affymetrix GeneChip Tag arrays. Genotyping reactions were carried out according to the manufacturer's recommendations, using previously described protocols (20, 21). Molecular inversion assays were performed with the MegAllele genotyping kit (ParAllele Biosciences), with 96-well plates and samples from 24 subjects per plate for each of 4 allele channels. Genotypes were scored at ParAllele Biosciences, using Euclidean clustering analysis of the “contrast” measures derived from the normalized signal intensities. Relative intensities of 2 expected allele bases and 2 background bases indicate genotype and probe performance.

Statistical analysis.

The transmission disequilibrium test (TDT) (22) was used to evaluate transmission disequilibrium between the 2 alleles at each of the SNPs. The TDT statistic is calculated from the transmissions of each allele from heterozygous parents to affected children, as TDT = (b − c)²/(b + c), where b is the number of transmissions of the first allele to affected children from a heterozygous parent and c is the number of transmissions of the second allele to affected children from a heterozygous parent. To calculate the TDT statistics, we used a custom Perl program that computed the TDT for each SNP while using less memory than equivalent programs that read the entire data file; we tested the output of our custom TDT program at selected SNPs, and found it to be identical to the output generated with Spielman's TDT/S-TDT suite (http://genomics.med.upenn.edu/spielman/TDT.htm). The TDT statistic has a chi-square distribution with 1 df; we used R (http://www.R-project.org) to calculate P values from the TDT statistics. To correct for multiple hypothesis testing, we applied the q value correction, derived from the false discovery rate (FDR) (23), to the resultant P values, using the qvalue package for R (http://cran.r-project.org/src/contrib/Descriptions/qvalue.html).
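The calculation described above can be illustrated with a minimal Python sketch. This is not the authors' Perl program; the function names and the transmission counts (b = 60, c = 20) are hypothetical and chosen only for illustration. The P value uses the identity that, for a chi-square variable with 1 df, the upper-tail probability equals erfc(√(x/2)).

```python
import math


def tdt_statistic(b: int, c: int) -> float:
    """TDT statistic (b - c)^2 / (b + c).

    b, c: transmissions of the first and second allele from
    heterozygous parents to affected children.
    """
    if b + c == 0:
        raise ValueError("no informative transmissions from heterozygous parents")
    return (b - c) ** 2 / (b + c)


def chi2_1df_pvalue(stat: float) -> float:
    """Upper-tail P value for a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical counts, for illustration only: (60 - 20)^2 / 80 = 20.0
stat = tdt_statistic(60, 20)
print(stat, chi2_1df_pvalue(stat))

# Statistics reported for SELP and IRAK1 in this study
for gene, chi2 in [("SELP", 20.571), ("IRAK1", 19.593)]:
    print(gene, chi2_1df_pvalue(chi2))
```

Applied to the reported statistics for SELP (20.571) and IRAK1 (19.593), the conversion should give P values close to those reported in the Results.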

False-positive report probability (FPRP) was estimated as described by Wacholder et al (24), i.e., FPRP = 1/{1 + [π/(1 − π)][(1 − β)/p]}, where π is the prior Bayesian probability of the alternative hypothesis being true, (1 − β) is the power of the TDT, calculated using the significance level α = 0.05 (corresponding to Xα = 1.6), and p is taken from the TDT P values. This is a slightly different representation (although the same formalism) from the original approach, whereby we were asking the question, “If we set as a significant probability alpha the P value from the TDT test, what corresponding limiting value of FPRP will be obtained?”
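As a rough illustration of the FPRP formula above, the following Python sketch evaluates it for a given prior, power, and P value. The function name and the example inputs are hypothetical; in particular, the power of 0.6 is an assumed value for illustration and is not a figure reported in this study.

```python
def fprp(prior: float, power: float, p_value: float) -> float:
    """FPRP = 1 / {1 + [prior / (1 - prior)] * [power / p_value]}.

    prior:   prior probability that the alternative hypothesis is true (pi)
    power:   statistical power of the test (1 - beta)
    p_value: P value from the TDT, used in place of alpha
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    if not 0.0 < power <= 1.0:
        raise ValueError("power must lie in (0, 1]")
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    prior_odds = prior / (1.0 - prior)
    return 1.0 / (1.0 + prior_odds * (power / p_value))


# Illustrative inputs only: prior of 0.001, assumed power of 0.6, P = 9.58e-6
print(fprp(prior=0.001, power=0.6, p_value=9.58e-6))
```

For a fixed P value and power, a smaller prior probability gives a larger FPRP, which is why the choice of priors matters for this analysis.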

Transcription factor binding site analysis.

The search for transcription factor binding sites on allele variants with the SNPs located in the promoter regions was performed using the TRANSFAC database, accessed via Match interface (http://www.gene-regulation.com/cgi-bin/pub/programs/match/) (25).

RESULTS

Platform design, microchip production, and validation.

In the present study we adopted an essentially Bayesian approach, but rather than concentrating on specific candidate genes, we developed a collection of candidate pathways. To this end, we took advantage of the accumulated data from genome-wide scans of adult SLE families, candidate gene investigations, genetic information gained from studies of mouse models of lupus, and gene expression profiling data on human SLE. Based on our examination of the literature, we selected a list of candidate functional pathways judged to be relevant to the pathogenesis of SLE. Three databases (NCBI [http://www.ncbi.nlm.nih.gov/SNP/], GeneCards [http://www.genecards.org/], and Harvester [http://harvester.embl.de/]) were searched using a set of key words representing these functional pathways (for the list of the key words, see supplemental Table 1, available on the Arthritis & Rheumatism Web site at http://www.mrw.interscience.wiley.com/suppmat/0004-3591/suppmat/).

This initial inclusive search resulted in the selection of 6,384 genes. Subsequent analyses were conducted to identify the following: 1) genes that could be excluded based on their expression pattern or function in unrelated processes (e.g., believed to be involved only in embryogenesis or expressed in tissues deemed irrelevant), 2) genes initially included solely on the basis of their homology to a relevant gene, and 3) genes without any known or predicted function. Genes in the latter 2 groups were included for further analysis only if they resided within established linkage peaks. Finally, the number of times a gene was picked up using distinct key words was scored and utilized in prioritization (pathway score [see below]). Using the above criteria, a final list of 1,204 genes was selected (see supplemental Table 2, http://www.mrw.interscience.wiley.com/suppmat/0004-3591/suppmat/). The algorithm used in the study design is depicted in Figure 1.

Figure 1.

Schematic diagram of the gene pathway platform design. Rectangles represent computer-generated data; ovals represent investigators' input. A list of key words was selected to reflect the collection of candidate pathways developed. These key words were then submitted as search terms to NCBI, GeneCards, and Harvester by get_<database>_results. The results for each term were saved to a disk, and later parsed by parse_<database>_results to generate a table of genes (with symbol, aliases, accession number, and location for each gene) from the html (or xml) returned by the databases. These tables were then assembled by combine_results into a complete table. Identical genes that were obtained multiple times under different aliases were deleted. The program also tracked how many times each gene was picked up with a distinct key word (a measure dubbed by us as “pathway score”). The final gene table was then used to select the desired genes and subsequently the corresponding single-nucleotide polymorphisms (SNPs), averaging ∼10 SNPs per gene.
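The combining step in Figure 1 (merging per-key-word results, collapsing aliases, and counting distinct key words per gene) might look roughly like the Python sketch below. It is not the authors' combine_results program; the function signature, the key words, and the alias table are hypothetical and serve only to show how a pathway score could be tallied.

```python
from collections import defaultdict


def combine_results(hits_by_keyword, alias_to_symbol):
    """Return {gene symbol: pathway score}.

    hits_by_keyword: {key word: iterable of gene names or aliases returned}
    alias_to_symbol: {alias: canonical gene symbol}; unknown names are kept as-is
    The pathway score is the number of distinct key words that retrieved a gene.
    """
    keywords_per_gene = defaultdict(set)
    for keyword, names in hits_by_keyword.items():
        for name in names:
            symbol = alias_to_symbol.get(name, name)  # collapse aliases
            keywords_per_gene[symbol].add(keyword)    # a set avoids double counting
    return {symbol: len(kws) for symbol, kws in keywords_per_gene.items()}


# Hypothetical search results, for illustration only
hits = {
    "cell adhesion": ["SELP", "IRAK1"],
    "platelet": ["CD62P"],
    "inflammation": ["CD62P", "SELP"],
}
aliases = {"CD62P": "SELP"}
print(combine_results(hits, aliases))
```

Using a set of key words per gene means that a gene returned twice for the same key word under different aliases is counted once.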

The choice of SNPs within the selected genes was based on available information from databases and accumulating information from the Human Haplotype Mapping Project (HapMap). Priority was given to SNPs demonstrating high heterozygosity, those that were informative in 2 or more relevant ethnicities, and those representing amino acid coding variants. The list of SNPs was then cross-checked against the accumulated SNP validation test results available through ParAllele Biosciences, an active participant in the International HapMap project. A final list of 9,412 SNPs was selected for genotyping assay development. To this end, ParAllele molecular inversion probe technology on the Affymetrix TAG3 platform was used (21). The molecular inversion probe assay relies on enzymatic specificity, rather than the hybridization specificity of other chip-based approaches. Enzymatic specificity is sensitive to single base changes, thereby reducing false-positive signal. In addition, the insensitivity of these inversion probes to intermolecular interactions allows the probes to be multiplexed so that all 9,412 SNPs could be genotyped in a single assay. The genotyping platform was validated on 18 control samples and 5 complete HapMap trios. Of the 9,412 SNPs, robust genotyping data were generated for 9,375 (99.6%). Several control samples were genotyped up to 8 times, resulting in 99.98% reproducibility of these genotypes.

Characteristics of the SLE patients.

The candidate pathway genotyping platform developed as described above was applied to a sample of 753 subjects, corresponding to 251 childhood-onset SLE trios (patients and both of their parents). Since kidney involvement is one of the most devastating complications of SLE, it is noteworthy that 60% of our patients with childhood-onset SLE had kidney disease, whereas kidney disease occurred in fewer than one-third of the adult-onset SLE patients (Table 1). Similarly, while ∼20% of the adults with SLE had cardiac or pulmonary involvement, this complication was present in >50% of our childhood-onset patients. Childhood SLE is more often a multisystem disease than is adult-onset SLE, as was further exemplified by the fact that 29% of the childhood-onset SLE patients manifested a neurologic disorder, while it appeared in <10% of the adults with SLE. Nevertheless, childhood- and adult-onset SLE exhibited similar sets of manifestations, albeit at different frequencies, and responded to similar therapies, supporting the notion that they are the same disease. Because sex hormones are less likely to play an important role in the onset of disease in children, a much higher frequency of males was found in our childhood-onset cohort compared with the adult-onset cohort (38% versus 9%), and the female:male ratio was reduced from 9:1 in the adult-onset group to ∼3:2 in the childhood-onset group (Table 1).

Genes identified by family-based TDT.

TDT (22) was used to calculate the significance of SNP association with SLE. A confounding effect due to population stratification was avoided by using the family-based TDT, in which the preferential transmission of the test allele from parents to affected offspring provides evidence of association of the test allele with disease.

The standards of statistical proof that are commonly used in biomedical literature have been questioned when applied to large SNP-based genetic association studies. The problem of multiple testing pervades the discipline, without a clear consensus on how it should be solved (26). The classic Bonferroni correction is both too strict and inappropriate in the case of genetic studies because it assumes that each test is independent, whereas in actuality a complex and unknown mutual dependence is present among genes, and even more prominently among SNPs of the same gene. The FDR approach (27) is currently widely used in genetic microarray and association studies. We adapted a variation of the FDR (23) for the multitest correction in our study.
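To make the idea of an FDR-based correction concrete, the Python sketch below computes Benjamini-Hochberg adjusted P values. This is a simplified stand-in and not the q value procedure (23) used in this study, which additionally estimates the proportion of true null hypotheses; values produced by this sketch should therefore not be expected to match the q values in Table 2. The function name and example P values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical P values, for illustration only
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Unlike a Bonferroni correction, which multiplies every P value by the number of tests, this adjustment scales each P value by the number of tests divided by its rank, so it is less strict for all but the smallest P value.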

We decided on 2 levels of FDR as representing significant outcomes in this study: SNPs with q values of <0.05 would be considered as “proven” with >95% probability, and those with q values of <0.5 as “noteworthy” and requiring followup studies for verification. Table 2 shows that 2 genes, SELP (gene for P-selectin) and IRAK1 (gene for interleukin-1 receptor–associated kinase 1 [IRAK-1]) fell into the first category. Indeed, the most significant associations found in the present study were with a polymorphism at amino acid position N673S in SELP (χ² = 20.571, P = 5.74 × 10⁻⁶) and with a polymorphism at amino acid position C203S in IRAK1 (χ² = 19.593, P = 9.58 × 10⁻⁶). The N673S polymorphism in P-selectin is located in the eighth Sushi domain of the protein. Sushi domains (complement control protein modules) are characteristic of a variety of complement and adhesion molecules, and form domain interactions with other proteins (28). Thus, the polymorphism in this domain is likely to affect important protein–protein interactions responsible for SELP-associated signal transduction processes.

Table 2. Genes shown to be associated with systemic lupus erythematosus by TDT and q values analysis*

| SNP ID | Gene name | Gene location | SNP location | Alleles | Amino acid change | Amino acid position | P | TDT, χ² | q | Accession no. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs3917815 | SELP | 1q24.2 | Coding exon | A/G | N/S | 673 | 5.74 × 10⁻⁶ | 20.571 | 0.025 | NP_002996 |
| rs10127175 | IRAK1 | Xq28 | Coding exon | A/T | S/C | 203 | 9.58 × 10⁻⁶ | 19.593 | 0.028 | NP_001020413 |
| rs1805749 | KLRG1 | 12p13.31 | Coding exon | A/G | W/R | 58 | 8.77 × 10⁻⁵ | 15.385 | 0.153 | NP_005801 |
| rs2274065 | NCF2 | 1q25.3 | 3′-UTR | A/C |  |  | 7.28 × 10⁻⁵ | 15.736 | 0.153 | NP_000424 |
| rs1234314 | TNFSF4 | 1q25.1 | Promoter | C/G |  |  | 1.14 × 10⁻⁴ | 14.885 | 0.166 | NP_003317 |
| rs4728142 | IRF5 | 7q32.1 | Promoter | A/G |  |  | 1.92 × 10⁻⁴ | 13.909 | 0.239 | NP_002191 |
| rs6072794 | PTPRT | 20q12 | Intron | C/T |  |  | 2.65 × 10⁻⁴ | 13.302 | 0.257 | NP_008981 |
| rs10406301 | KIR2DS4 | 19q13.42 | Coding exon | C/G | S/C | 103 | 2.56 × 10⁻⁴ | 13.370 | 0.257 | NP_036446 |
| rs4406737 | TNFRSF6 | 10q23.31 | Intron | A/G |  |  | 3.78 × 10⁻⁴ | 12.636 | 0.330 | NP_000034 |

* TDT = transmission disequilibrium test; SNP = single-nucleotide polymorphism; 3′-UTR = 3′-untranslated region.

A splicing isoform (accession no. NP_839942) has the substitution at position 96.

Seven additional SNPs fell into the second category. Among this group of SNPs, it is noteworthy that 2 additional SNPs were found to cause amino acid changes in their respective proteins: the W58R polymorphism in KLRG1 (killer cell lectin–like receptor subfamily G, member 1 [KLRG-1] gene) and the S103C polymorphism in KIR2DS4 (killer cell Ig-like receptor, 2 domains, short cytoplasmic tail 4 gene). Moreover, the C to G polymorphism in the promoter of TNFSF4 (gene for tumor necrosis factor superfamily 4, encoding the OX40 ligand) (Table 2) is predicted to alter the binding site for the c-Myc/Max transcription factor, as indicated in the TRANSFAC database (25). The Bayesian design of the microarray and the rigorous multitest correction analysis allowed the study to yield high-confidence findings with a relatively modest number of samples.

FPRP.

Although there are a variety of study and analysis designs currently used in linkage and association studies, only a few involve rigorous statistical methods (15, 16). It is also quite common that replication studies or even reanalysis of the published data do not confirm the original conclusions. We therefore thought it important to analyze our present results using a different statistical approach.

Because of the Bayesian methodology applied in this study at the outset (during the gene selection for the chip design), we used FPRP, a recently described method of Bayesian data analysis (24), for comparison with the FDR q values analysis. We established 4 categories for ranking the SNPs and estimating Bayesian prior probability values.

In the first category (pathway score), the ranking was done according to the number of times the gene was picked up by the gene searching programs from the databases (see Figure 1). This ranking ranged from 0 to 9, with 9 being the maximum-scoring SNP; SNPs scored 0 were included because of chromosome locations (see supplemental Table 1). For example, rs3917815 (SELP) was picked up in 4 different key word searches, giving it a pathway score of 4 and a normalized pathway score of 0.44 (Table 3). In the second category (gene location score), the ranking was done on the basis of established linkage with SLE (2–11). This ranking ranged from 0 to 5. High scores were assigned to genes based on the distance from the center of a linkage peak confirmed in at least 2 studies (e.g., SELP); genes that were further away received lower scores, and genes that were outside established linkage peaks received a score of 0. However, specific genes that were outside linkage peaks but confirmed to be involved in the genetic predisposition to SLE (e.g., PTPN22) received high scores. Next, each of the 1,204 genes was ranked by the investigators, using the description and Gene Ontology function, with respect to its likelihood of being associated with SLE based on available evidence in the literature (gene function score, range 0–10). Last, each SNP was ranked (SNP rank, range 0–5) according to its correspondence to a functionally identifiable region (e.g., coding exon, promoter).

Table 3. Genes shown to be associated with SLE by TDT and FPRP analysis*

| SNP ID | Gene name | Pathway score | Gene location score | Gene function score | SNP rank | Total score | Bayesian prior probability | FPRP |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| rs3917815 | SELP | 0.44 | 1 | 0.3 | 1 | 2.74 | 0.02 | 0.0005 |
| rs10127175 | IRAK1 | 0.11 | 0 | 0.1 | 1 | 1.21 | 0.001 | 0.0153 |
| rs1805749 | KLRG1 | 0.33 | 0.6 | 0.1 | 1 | 2.03 | 0.005 | 0.029 |
| rs1234314 | TNFSF4 | 0.56 | 1 | 1 | 0.6 | 3.16 | 0.02 | 0.0401 |
| rs4728142 | IRF5 | 0.11 | 0.6 | 0.6 | 0.6 | 1.91 | 0.005 | 0.0906 |
| rs9267522 | BAT2 | 0.11 | 1 | 0.5 | 0.6 | 2.21 | 0.005 | 0.2119 |
| rs2274065 | NCF2 | 0.11 | 1 | 0.3 | 0 | 1.41 | 0.001 | 0.2444 |
| rs1270942 | BF | 0.33 | 1 | 0.4 | 0.2 | 1.93 | 0.005 | 0.3495 |
| rs6072794 | PTPRT | 0.22 | 0.2 | 0.8 | 0 | 1.22 | 0.001 | 0.4519 |
| rs2234978 | TNFRSF6 | 0.44 | 0.4 | 1 | 0.6 | 2.44 | 0.02 | 0.4557 |
| rs2476601 | PTPN22 | 0 | 0.4 | 1 | 1 | 2.4 | 0.02 | 0.4661 |
| rs10406301 | KIR2DS4 | 0 | 0.2 | 0.1 | 1 | 1.3 | 0.001 | 0.4879 |

* Bayesian prior probability was assigned taking into account the total score and the number of single-nucleotide polymorphisms (SNPs) in the group, thus effectively adjusting for multitest comparisons. To establish Bayesian prior probability, all SNPs were ranked in 4 categories. The “Pathway score” column shows the ranking of each SNP based on the number of times the respective gene was picked up with the original gene searching program. The “Gene location score” column depicts the value given to each SNP based on its closeness to an established linkage peak or being within an established systemic lupus erythematosus (SLE)–associated gene even if outside a linkage peak. The “Gene function score” represents the value of each SNP ranked based on the likelihood of the respective gene to have a function deemed important to the pathogenesis of SLE. The “SNP rank” column depicts the value given to each SNP depending on its location within a gene, following an order of priority (coding region, promoter, 3′-untranslated, intron). Because we could not assign relative importance to the ranking categories a priori, scores were normalized to range from 0 to 1 and were then added to provide a total score. SNPs were ranked from 1 to 9,412 based on their total score, and divided into 4 groups for assignment of Bayesian prior probabilities: top 1% (94 SNPs), top 5%, top 25%, and the rest. For the top 1% SNPs we assigned the prior probability of 0.02, for the top 5% the prior probability of 0.005, for the remaining top 25% the prior probability of 0.001, and for the rest the prior probability of 0.0003. The column “Bayesian prior probability” depicts these prior probability rankings. False-positive report probability (FPRP) was calculated as described in Patients and Methods. TDT = transmission disequilibrium test.

Because no distinct relative importance could be assigned a priori to the 4 categories, each score was normalized to a range of 0–1; all scores were then summed to yield the total score (Table 3). Thus, theoretically, the maximum possible value for the total score is 4; in practice, the highest scoring SNP had a total score of 3.16 (TNFSF4 [rs1234314]). Next, all of the SNPs were ranked from 1 to 9,412 based on their total score and divided into 4 groups for assignment of Bayesian prior probabilities: top 1% (94 SNPs), top 5%, top 25%, and the rest, following a published algorithm (24). Wacholder et al (24) have suggested that when considering one or a few candidate polymorphisms, 0.1 should be viewed as the highest value of prior probability that any given polymorphism is true, and 0.01 as a modest value. In order to take into account multiple testing, we adopted a more conservative approach, in which prior estimates corresponded to a likelihood that ∼2 SNPs from each group would be described by alternative hypotheses. Thus, for the top 94 SNPs (top 1%) we assigned the prior probability of 0.02, for the top 5% the prior probability of 0.005, for the remaining top 25% the prior probability of 0.001, and for the rest the prior probability of 0.0003.
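The scoring and prior assignment described above can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code. It assumes that each raw score is normalized by dividing by the maximum of its stated range (9, 5, 10, and 5), which is consistent with the SELP pathway score of 4 becoming 0.44; the other raw SELP values used below (5, 3, 5) are inferred from the normalized values in Table 3. How ties in total score were ranked is not stated, so ties here fall in arbitrary order.

```python
# Maximum raw score per category, taken from the ranges stated in the text
MAX_PATHWAY, MAX_LOCATION, MAX_FUNCTION, MAX_SNP_RANK = 9, 5, 10, 5


def total_score(pathway, location, function, snp_rank):
    """Sum of the 4 category scores, each normalized to the range 0-1."""
    return (pathway / MAX_PATHWAY + location / MAX_LOCATION
            + function / MAX_FUNCTION + snp_rank / MAX_SNP_RANK)


def assign_priors(total_scores):
    """Map {snp_id: total score} to {snp_id: Bayesian prior probability}.

    SNPs are ranked by total score (highest first) and split into tiers:
    top 1%, rest of top 5%, rest of top 25%, and all remaining SNPs.
    """
    n = len(total_scores)
    ranked = sorted(total_scores, key=total_scores.get, reverse=True)
    priors = {}
    for position, snp in enumerate(ranked, start=1):
        if position * 100 <= n:          # top 1%
            priors[snp] = 0.02
        elif position * 100 <= 5 * n:    # rest of top 5%
            priors[snp] = 0.005
        elif position * 100 <= 25 * n:   # rest of top 25%
            priors[snp] = 0.001
        else:
            priors[snp] = 0.0003
    return priors


# SELP (rs3917815): raw scores inferred from Table 3
print(round(total_score(4, 5, 3, 5), 2))
```

With 9,412 SNPs, the first tier under this rule contains 94 SNPs, so a prior of 0.02 corresponds to roughly 2 expected true associations in that tier, in line with the reasoning given above.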

The powers for the TDTs were calculated according to method 4 described by Iles (29), using the frequency of the genotypes among affected children versus the expected frequency given Mendelian segregation to estimate genotype relative risk when possible, and conservatively assuming a genotype relative risk of 2:1:1 when it was not possible to estimate. Allele frequencies were estimated using the observed frequency of the major and minor SNP allele in the parents (see supplemental Table 3, http://www.mrw.interscience.wiley.com/suppmat/0004-3591/suppmat/).

Wacholder et al (24) recommended designating genes with an FPRP of <0.5 as “noteworthy,” and such genes are listed in Table 3. Of note, every gene that was selected using a different analysis method (q values) (Table 2) is also included in Table 3, demonstrating the robustness of our findings. The 2 top genes (SELP and IRAK1) were the same in both analyses and had an FPRP or a q value of <0.05 in both cases. The FPRP procedure indicated as “noteworthy” 3 additional genes not identified in the q values analysis: BAT2 (gene for HLA–B–associated transcript 2), BF (gene for B-factor, properdin), and PTPN22; however, because we decided a priori to use q values to determine noteworthy genes, we do not argue for their significance here. It is also of interest that the average scores for SNP rank, gene location, and gene function for the genes shown in Table 3 were relatively high and similar (0.63, 0.62, and 0.52, respectively), emphasizing that each of these factors contributed significantly to the gene selection process, whereas the average pathway score (0.23) was considerably lower, reflecting the more general automated approach in the original gene selection.

It has been suggested that Bayesian analysis can be viewed in terms of the data from a study moving the field from the initial amount of information (Bayesian prior probabilities) to an increase in knowledge as reflected in the posterior probabilities (30), which in the present study were paralleled by FPRP values. Thus, in the case of IRAK1, for example, the prior probability that the null hypothesis (noninvolvement of IRAK1 in SLE) is true was 99.9%. This prior judgment was modified to the posterior probability of IRAK1 involvement in SLE being at least 98.47% (comparison of Bayesian prior probability and FPRP) (Table 3). Finally, regarding the problem of multiple hypothesis testing in association studies, Colhoun et al suggested that in the presence of prior evidence of association, P values of 5 × 10⁻⁴ or smaller can be considered significant (15). Applying this simple criterion to our TDT results yielded exactly the same 9 genes listed in Table 2.
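The arithmetic in this paragraph can be checked with a short Python sketch using only values reported in Tables 2 and 3. The variable names are hypothetical. Note that the check covers only the 9 SNPs listed in Table 2; it cannot confirm that no other SNP on the platform met the threshold, since the full set of TDT P values is not reproduced here.

```python
# TDT P values as reported in Table 2
table2_p = {
    "SELP": 5.74e-6, "IRAK1": 9.58e-6, "KLRG1": 8.77e-5,
    "NCF2": 7.28e-5, "TNFSF4": 1.14e-4, "IRF5": 1.92e-4,
    "PTPRT": 2.65e-4, "KIR2DS4": 2.56e-4, "TNFRSF6": 3.78e-4,
}

THRESHOLD = 5e-4  # criterion suggested by Colhoun et al (15)
significant = [gene for gene, p in table2_p.items() if p <= THRESHOLD]
print(len(significant), "of", len(table2_p), "Table 2 genes meet the threshold")

# IRAK1: prior probability of the null hypothesis and the FPRP-based posterior
irak1_prior, irak1_fprp = 0.001, 0.0153   # values from Table 3
print("prior probability of null:", 1 - irak1_prior)
print("posterior probability of involvement:", round(1 - irak1_fprp, 4))
```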

The results for other SNPs located within the genes shown in Table 2 are presented in supplemental Table 4 (http://www.mrw.interscience.wiley.com/suppmat/0004-3591/suppmat/). Since this study was not designed for fine mapping, meaning that in most cases relatively few SNPs were chosen per gene, no patterns can be discerned from these results.

DISCUSSION

Childhood-onset SLE presents a unique subgroup of patients for genetic study, because earlier disease onset, a more severe disease course, a greater frequency of family history of SLE, and a lesser effect of sex hormones in disease development (31, 32) may imply an increased likelihood of expressing the genetic etiology. Most previous genetic studies were performed in patients with adult-onset disease. To our knowledge, this is the first study to use childhood-onset SLE cases and their parents.

We present herein a novel strategy using a combination of state-of-the-art hardware and analysis methods to investigate the genetics of a complex disease. The investigation is initiated by a bioinformatics-driven design of a custom-made chip that incorporates close to 10,000 SNPs derived from ∼1,000 selected genes. A variety of statistical data analysis methods have been used in studies reported in the current literature, with an all-too-common inability to replicate results of a different study or even a similar study using a different analysis method. In the present investigation, we used 2 fundamentally different methods for data analysis and obtained similar results. Overall, the study identified 2 new genes that were highly significantly associated with SLE, as well as 7 additional genes as candidates for followup investigation. The design of the microarray and rigorous multitest correction analysis assured that with a relatively modest number of samples, the study would yield high-confidence findings.

The most significant associations found in the present study were with polymorphisms at Asn/Ser amino acid 673 in SELP and Cys/Ser amino acid 203 in IRAK1. Seven additional SNPs showed evidence of association, although not at a level sufficient for the association to be considered proven. These SNPs and the respective genes in which they are found are prime candidates for further confirmation studies.

Although genetic association between SELP or IRAK1 and SLE has not been reported previously, both are attractive candidates. Indeed, P-selectin, a transmembrane protein expressed on activated platelets and endothelial cells, is an adhesion receptor for neutrophils, monocytes, and T lymphocytes (33). The interaction between P-selectin on endothelial cells and its ligands on T lymphocytes is responsible for the migration of these cells into inflamed tissue (33). Levels of platelet–leukocyte complexes as well as soluble P-selectin have been found to be significantly elevated in SLE patients (34). Since kidney involvement is one of the most devastating complications of SLE, it is notable that expression of both glomerular and interstitial P-selectin was up-regulated in various forms of proliferative glomerulonephritis including lupus nephritis (35). A recent study by He et al (36) showed that P-selectin–deficient MRL/lpr mice had accelerated development of glomerulonephritis and early mortality, and expression of monocyte chemotactic protein 1 (MCP-1) was increased in the kidneys and in supernatants of lipopolysaccharide-stimulated renal endothelial cells from these mice. These observations raise the possibility that expression of P-selectin is important for modulating the progression of glomerulonephritis, perhaps by down-regulating endothelial MCP-1 expression.

IRAK-1 is a serine/threonine protein kinase involved in the signaling cascade of the Toll/interleukin-1 receptor (TIR) family (37). The TIR family comprises the interleukin-1 (IL-1) receptor subfamily, recognizing the endogenous proinflammatory cytokines IL-1 and IL-18, and the members of the Toll-like receptor subfamily, recognizing pathogen-associated molecular patterns. A hallmark of the TIR family is the cytoplasmic TIR domain, which serves as a scaffold for a series of protein–protein interactions that result in the activation of a unique and exclusive signaling module consisting of myeloid differentiation factor 88, IRAK family members, and Toll-interacting protein. Subsequently, several central signaling pathways of the innate and adaptive immune system are activated in parallel, the activation of NF-κB being the most prominent event of the inflammatory response (37). IRAK-1 is considered to serve as the “on-switch” of the signaling complex by linking the receptor complex to the central adapter/activator protein tumor necrosis factor receptor–associated factor 6, and also as the “off-switch” of the complex by its autoinduced removal from the complex (38).

The C203S (Cys→Ser) polymorphism in the IRAK1 gene does not lie within any currently known functional domain of the protein. However, the marked change in physicochemical properties caused by the amino acid substitution may suggest an associated functional change. The extensive involvement of IRAK-1 in regulation of the immune response makes the association of its gene with SLE potentially important and makes IRAK1 a prime candidate for followup genetic and functional studies.

The W58R polymorphism in KLRG1 and the S103C polymorphism in KIR2DS4 suggest involvement of natural killer (NK) cells in the genetic predisposition to SLE. Both KLRG-1 and KIR2DS4 are expressed on NK cells and subsets of activated T lymphocytes. KIR2DS4 is an activating NK receptor molecule that enhances lysis by NK cells expressing KIR2DS4 (39), while KLRG-1–expressing NK cells show decreased proliferative activity (40). SLE patients, including those with childhood-onset disease, exhibit quantitative and qualitative alterations in NK cells (41, 42). The genetic association of SLE with KLRG1 and KIR2DS4 in the present study, together with previous findings that first-degree relatives of SLE patients (43) and healthy monozygotic cotwins of SLE patients (44) show reduced numbers and activity of NK cells, suggests that this phenotype might be involved in disease causation rather than being a consequence of the disease process.

Neutrophil cytosolic factor 2 (NCF-2) is an essential component of the NADPH oxidase enzyme complex in phagocytic leukocytes. Its importance in host innate immunity is demonstrated by the finding of recurrent infections in individuals with chronic granulomatous disease resulting from genetic defects in components of the NADPH complex, including NCF2 (the gene for NCF-2) (45). However, phagocyte-generated reactive oxidants can also contribute to host injury associated with inflammation. Furthermore, the association between SLE and the NCF2 gene suggested by the present results may be related to the overexpression pattern of various neutrophil genes observed in gene expression profiles of patients with childhood-onset SLE (46).

Although several members of the tumor necrosis factor and tumor necrosis factor receptor families have been implicated in the pathogenesis of SLE (including the TNFRSF6 gene suggested in the present study), the data presented herein provide the first direct evidence of genetic association between SLE and TNFSF4, encoding the OX40 ligand. Interaction between OX40 and its ligand is involved in costimulation of T and B lymphocyte activation and in T cell adhesion to endothelium. Immunohistologic study of renal biopsy specimens from patients with lupus nephritis demonstrated an abundant presence of OX40 ligand in all cases of proliferative lupus nephritis, in a unique granular distribution and colocalized with subepithelial immune deposits (47).

It is also noteworthy that 3 of the 5 highest-scoring genes in the present study are closely colocalized (1q24.2–1q25.3, within a stretch of 14 Mb) (Table 2), suggesting a strong association of this region with SLE and making it a prime candidate for followup fine mapping studies. The linkage of SLE with this chromosomal region has been reported previously (4, 9, 45).

Our results also corroborate the previously reported association between SLE and IRF5 (gene for interferon regulatory factor 5) (48, 49) and emphasize the importance of the interferon-α pathway in SLE (50). Finally, the association of SLE with PTPRT (gene for protein tyrosine phosphatase receptor type T) is a novel addition to the known connection between SLE and PTPN22 (12) and underscores the importance of lymphocyte tyrosine phosphatase regulation.

The present results demonstrate the powerful potential of this novel combination of up-to-date biotechnology and bioinformatics methods in the search for genetic origins of common complex diseases. Furthermore, the discovery of new SLE-associated genes opens promising new directions for understanding the genetic foundations of this relatively common and devastating disease and, ultimately, for treating it.

AUTHOR CONTRIBUTIONS

Dr. Jacob had full access to all of the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Study design. Jacob, Reiff, Armstrong, Zidovetzki.

Acquisition of data. Jacob, Reiff, Myones, Silverman, Klein-Gitelman, McCurdy, Wagner-Weiner, Nocton.

Analysis and interpretation of data. Jacob, Armstrong, Solomon, Zidovetzki.

Manuscript preparation. Jacob, Reiff, Armstrong, Zidovetzki.

Statistical analysis. Armstrong, Zidovetzki.

ACKNOWLEDGMENTS

The cooperation of the patients and families involved in this study is gratefully acknowledged. We thank L. Li, Y. X. Wu, and N. Jacob for technical assistance and genomic DNA preparation, V. Ciobanu for database support, V. Carlton for genotyping, D. Conti, D. Thomas, C. D. Langefeld, and X. Cui for useful discussions, and D. Thomas and D. Conti for critical reading and comments on the manuscript.
