The Role of Epistasis in the Etiology of Polycystic Ovary Syndrome among Indian Women: SNP-SNP and SNP-Environment Interactions


Corresponding author: Dr. B. Mohan Reddy, Professor, Molecular Anthropology Group, Biological Anthropology Unit, Indian Statistical Institute, Street No. 8, Habsiguda, Hyderabad 500007, Andhra Pradesh, India. Tel: +91(40)27171906; Fax: +91(40)27173602.


Polycystic Ovary Syndrome (PCOS) is the most common endocrinopathy in women of reproductive age. It is a heterogeneous androgen excess disorder determined by the interaction of multiple genetic and environmental factors. Our earlier analysis of a panel of six candidate genes (Androgen receptor CAG repeats, Follistatin, Luteinizing hormone β subunit, Calpain10, Insulin receptor substrate-1 and PPARγ) based on 250 PCOS cases and 299 controls revealed significant association patterns with PCOS among South-Indian women. We report here, for the first time, the SNP-SNP and SNP-environment interactions of these genes in the same cohort. Both multivariate logistic regression and epistasis analysis (using Multifactor dimensionality reduction software) yielded significant results (P < 0.05). All CAPN10 SNPs showed association (either risk-conferring or protective) in the obese group, highlighting the importance of this gene in PCOS pathophysiology. LHP7 (LHβ) and UCSNP-44 (CAPN10) emerged as the prominent SNPs in the SNP-SNP interaction analysis. The best SNP-SNP interaction model was obtained between CAPN10 UCSNP-44 and PPARγ His447His, implying a significant metabolic component in PCOS pathology. Replication of our findings in BMI-specific cohorts from different ethnic populations is warranted to identify the physiological networks in PCOS.


Polycystic Ovary Syndrome (PCOS) is the most common endocrinopathy in women of reproductive age with typical features of anovulation, hyperandrogenism, hirsutism, cystic ovarian morphology, and infertility. It is a complex multigenic disease with epigenetic and environmental factors playing a significant role in the phenotypic expression of disease symptoms. Such patients are also at increased risk for obesity, insulin resistance, type 2 diabetes mellitus, and premature arteriosclerosis (Homburg, 2002; Ehrmann, 2005). A number of genes that are implicated in different pathways and affect steroidogenesis, insulin resistance, gonadotropin function, and obesity have been investigated. However, there was a paucity of knowledge on the status of association of the candidate genes with PCOS in the Indian populations among which the majority of the PCOS studies have been confined to the clinical dimensions. No comprehensive molecular genetic study has hitherto been attempted on Indian populations to understand the genetic etiology underlying this complex disorder. We focused on the candidate gene analysis of PCOS in the Indian context by analyzing a panel of six genes (Androgen receptor, Follistatin, Luteinizing hormone β subunit gene, Calpain10, Insulin receptor substrate-1, and PPARγ) involved in different pathological pathways ranging from steroid hormone effects, gonadotropin action and regulation, insulin action and secretion, as well as energy homeostasis. We initiated the analysis for testing the association pattern of each of the candidate genes individually with PCOS susceptibility (Dasgupta et al., 2010; Dasgupta et al., 2012a, b, c, d). While the Androgen receptor gene was analyzed for CAG repeat polymorphism (Dasgupta et al., 2010), all the other five genes were studied for the presence of SNPs and their respective association with PCOS (Dasgupta et al., 2012a, b, c, d). 
In the Follistatin gene, we could not identify any mutation or polymorphism in either the cases or the controls (Dasgupta et al., 2012c).

This study design has been followed by most genetic studies to date whereby a single-locus strategy is adopted for testing association of each variant with the disease phenotype. However, in the case of complex genetic disorders, it is usually observed that in spite of a small effect of an individual SNP, the genetic effects of combinations of functionally relevant SNPs may synergistically contribute to increased disease risk. Epistasis or gene-gene interaction is, therefore, likely to be a ubiquitous component of the genetic etiology of complex diseases such as PCOS. These epistasis effects could in turn regulate the independent effects of any one susceptibility gene. Therefore, it is imperative to study the pattern of interaction between genes and the cumulative role in conferring susceptibility toward PCOS. In view of this, to capture the epistatic phenomenon in the etiology of PCOS, we structured our analysis of genetic polymorphisms in three stages:

  • Stage I: Multivariate logistic regression analysis for all the SNPs in all the candidate genes.
  • Stage II: SNP-SNP interaction analysis using different combinations across the candidate gene panel.
  • Stage III: SNP-environment interaction analysis between each genetic polymorphic locus and environmental factors considered as categorical variables.

Each of the six candidate genes involved in different etiological pathways was analyzed individually in our PCOS cohort. Since the Androgen Receptor gene was analyzed from the perspective of an STR marker (CAG repeats), and no mutation or polymorphism could be identified in the Follistatin gene in either the cases or controls, the remaining four genes, that is, Luteinizing hormone β subunit gene, Calpain10, Insulin receptor substrate-1, and PPARγ, were utilized for the subsequent SNP-SNP and SNP-environment interaction analysis. A total of 16 SNPs from four candidate genes were included in the epistasis analysis (Table 1).

Table 1. Nomenclature used during analysis for the panel of 16 SNPs
Nomenclature  rs ID                   Gene         Chromosome location  SNP location  Major allele/minor allele
LHP1          rs1800447               LHβ subunit  19q13.32             Exon 2        T/C
LHP2          rs34349826              LHβ subunit  19q13.32             Exon 2        T/C
LHP3          rs6521                  LHβ subunit  19q13.32             Exon 2        C/G
LHP4          rs1056914               LHβ subunit  19q13.32             Exon 2        A/C
LHP5          rs2387588               LHβ subunit  19q13.32             Intron 2      T/C
LHP6          rs4287687               LHβ subunit  19q13.32             Intron 2      CAG/C-G
LHP7          rs1056917               LHβ subunit  19q13.32             Exon 3        T/C
UCSNP-44      rs2975760               Calpain-10   2q37.3               Intron 3      T/C
UCSNP-43      rs3792267               Calpain-10   2q37.3               Intron 3      G/A
UCSNP-56      rs2975762               Calpain-10   2q37.3               Intron 4      G/A
UCSNP-19      rs3842570               Calpain-10   2q37.3               Intron 6      Del/Ins
IRSP1         rs1801278 (Gly972Arg)   IRS-1        2q36                 Exon 1        G/A
IRSP2         rs2234931               IRS-1        2q36                 Exon 1        G/A
PPG1          rs1801282 (Pro12Ala)    PPAR-γ       3p25                 Exon 2        C/G
PPG2          rs3856806 (His447His)   PPAR-γ       3p25                 Exon 6        C/T

Materials and Methods

Ethics Statement

The study protocol was approved by the Indian Statistical Institute Review Committee for Protection of Research Risks to Humans.

Sampling Criteria: Case and Control Groups

A total of 549 women consisting of 250 PCOS cases (aged 14–40 years) and 299 controls (aged 14–47 years) were enrolled for this study from July 2008 to April 2009. Patients were recruited from the Gynecology clinic of the Osmania General Hospital, Hyderabad as well as from an infertility clinic (Anu's Test Tube Baby Centre, Hyderabad) as per the Rotterdam Criteria, 2003 (The Rotterdam ESHRE/ASRM-sponsored PCOS Consensus Workshop Group, 2004) according to which any two of the following three conditions need to be fulfilled for inclusion:

  1. Presence of clinical and/or biochemical signs of hyperandrogenism.
  2. Infrequent periods with intermenstrual interval of more than 35 days.
  3. Polycystic ovaries; an ovary with the ultrasound appearance of more than 10 subcapsular follicles (<10 mm in diameter) in the presence of prominent ovarian stroma was considered polycystic.

Patients with hyperprolactinemia, thyroid and adrenal diseases, 21-hydroxylase deficiency, and androgen-secreting tumors were excluded. The weight and height of the subjects were recorded. Hirsutism was defined as a Ferriman-Gallwey score >5. Normal controls with no history of treatment for fertility, no evidence of clinical hyperandrogenism (hirsutism/acne/alopecia), and with normal menstrual cycles every 25–32 days were recruited from the Family Planning Centre of the Osmania Hospital and from the general population, representing broadly similar age ranges and ethnic backgrounds.

Collection of Blood Samples, DNA Isolation, and SNP Analysis Using Direct DNA Sequencing

Intravenous blood samples (∼5 ml) were collected in K3-Sodium EDTA-coated vacutainers from both the patients and controls after obtaining their informed written consent. DNA was isolated from the above samples following the protocol of Sambrook et al. (1989). We carried out PCR amplification and direct sequencing to screen the exons of the Follistatin gene (FST), the Luteinizing hormone β subunit gene (LHβ), the Calpain10 gene (CAPN10) for a panel of five SNPs, the IRS-1 gene (Exon 1), and PPARγ (Exons 2 and 6), in order to validate the previously identified SNPs in those regions as well as to identify any novel variant that may be specific to the Indian population. The IRS-1 exon was amplified and screened using seven overlapping primers. The primer sequences along with the amplification conditions are listed in Table S1.

Statistical Analysis

Allele frequencies were determined by the gene-counting method. All the statistical analyses were performed using SPSS statistical software (version 19.0, IBM SPSS, NY, USA). The power of the study was calculated using G*Power software (version 3.1; Faul et al., 2009). Hardy-Weinberg equilibrium was tested by the χ² test using PyPop software. A χ² test was carried out to test for differences between the case and control groups in terms of the anthropometric characteristics. The nucleotide positions mentioned for the individual LHβ SNPs are according to Takahashi et al. (2003). Multifactor dimensionality reduction (MDR) software (Moore et al., 2006) was used for carrying out SNP-SNP interaction analysis. Significance was considered at both the 5% and 10% levels for multivariate and MDR analysis. Multivariate regression analysis as well as gene-environment interaction analysis was carried out using R software. For gene-environment interaction analysis, we considered five environmental factors:

  1. Age: The age of subject at the time of sample recruitment.
  2. BMI: Body mass index (weight in kilograms/[height in meters]²).
  3. Parental consanguinity status categorized into:
    1. Mother's brother's daughter (MBD).
    2. Father's sister's daughter (FSD).
    3. Uncle-niece (UN).
    4. Unrelated.
  4. Socioeconomic status: Broad classification considering the caste hierarchy, household income, occupation, and lifestyle pattern; categorized into three strata (high, medium, and low).
  5. Age at menarche: Age at which subject attained first menstruation.
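The gene-counting and Hardy-Weinberg steps named under Statistical Analysis can be illustrated with a minimal sketch. It is not the PyPop procedure used in the study: the function names and the genotype counts are hypothetical, and a biallelic locus with a 1-degree-of-freedom χ² goodness-of-fit test is assumed.

```python
import math


def allele_frequencies(n_aa, n_ab, n_bb):
    """Gene counting: homozygotes carry two copies of an allele, heterozygotes one."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    p = (2 * n_aa + n_ab) / total_alleles
    return p, 1.0 - p


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square goodness-of-fit test against Hardy-Weinberg proportions (1 df)."""
    n = n_aa + n_ab + n_bb
    p, q = allele_frequencies(n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical genotype counts for one SNP (not taken from the study).
p, q = allele_frequencies(50, 40, 10)
chi2, p_value = hwe_chi_square(50, 40, 10)
print(f"p = {p:.2f}, q = {q:.2f}, chi2 = {chi2:.3f}, P = {p_value:.2f}")
```

With these illustrative counts the expected genotype numbers are 49, 42 and 9, so the statistic is small and the locus would not be flagged as departing from equilibrium.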


PCOS Cohort Overview and Characterization

The composition and baseline characteristics of PCOS cases and controls are presented in Table 2. PCOS subjects had a significantly higher mean BMI. The age at menarche was, however, significantly lower in the PCOS group than in the controls. The proportion of obese women (BMI ≥ 25) within the PCOS group was significantly higher than in the control group (55.1% vs. 15.1%, respectively, P < 0.001).

Table 2. (A) Age-wise composition of the PCOS cases and controls. (B) Comparison of baseline characteristics of PCOS cases and controls
                         PCOS cases       Controls
Characteristic           N      %         N      %       P-value
Parental consanguinity
Socioeconomic status
Physical activity
Age at menarche
  Age <13                146    58.4      195    65.2    0.852
  Age >13                76     30.4      98     32.8

For the majority of the PCOS cases, data on clinical and biochemical parameters were obtained, and the characteristics relevant to the PCOS phenotype are presented in Table 3. The total number of cases “N” denotes the number of cases for which the respective data could be obtained. Since over 90% of the PCOS cases were aged below 30 and only one woman was aged above 35, none of them had been diagnosed with type-2 diabetes mellitus (T2DM), which is consistent with the general observation that T2DM mostly manifests above 35–40 years of age. All the cases with ultrasound data had polycystic ovary morphology along with the clinical presentation of irregular menstrual cycles. Nearly 79% of the PCOS cases in this study were reported to be infertile due to lack of ovulation. Among the other clinical features of PCOS, hirsutism and acanthosis nigricans were significantly more frequent in the obese PCOS cases than in the lean PCOS cases (P < 0.001). A significantly greater proportion of obese cases had elevated cholesterol levels (>200 mg/dl) than did the lean PCOS cases (P = 0.002). Comparison of the biochemical parameters between the lean and obese PCOS cohorts revealed that although the mean levels of LH and FSH were not significantly different between the lean and obese PCOS cases, a higher LH:FSH ratio (a characteristic feature of PCOS) was evident among the obese group. Moreover, obese PCOS cases had a significantly higher mean level of cholesterol and triglycerides than the lean PCOS cases (Table 4). Unfortunately, the biochemical parameters could not be obtained for the control group, which would have enabled a comparison between the cases and controls.

Table 3. Clinical profile of the PCOS cases under study
                                    All patients      Lean PCOS (BMI <25)   Obese PCOS (BMI ≥25)**
Phenotypic features                 N         %       N         %           N         %
Menstrual irregularities            210/210   100     56/56     100         104/104   100
Acanthosis nigricans*               110/220   50.0    19/54     35.2        68/101    67.3
Elevated cholesterol (>200 mg/dl)*  38/150    25.3    1/25      4.0         25/80     31.2

  1. *Significantly different between lean PCOS and obese PCOS cases (P ≤ 0.002).
  2. **Given the lean body mass of Indian women (Asian Indian phenotype), BMI ≥25 is generally considered as cut-off distinguishing the lean women from overweight/obese women albeit the WHO criteria specifies BMI >30 as obese.

Table 4. Hormonal profile of the lean and obese PCOS cases
                       Lean PCOS            Obese PCOS
                       N    Mean ± SE       N    Mean ± SE       P-value   Normal range
Testosterone (ng/ml)   79   1.47 ± 0.54     95   0.82 ±                    –0.9 ng/ml
LH (mIU/ml)*           64   15.9 ± 1.86     76   15.2 ± 3.60     0.87      4.6–12.4 mIU/ml
FSH (mIU/ml)*          70   8.4 ± 1.45      83   5.9 ± 0.40      0.10      6.9–12.5 mIU/ml
RBS (mg/dl)            16   71.1 ± 4.4      26   87.7 ± 9.7      0.21      <70–170 mg/dl
Cholesterol (mg/dl)    72   120 ± 10.4      88   145 ± 9.4       0.08      <200 mg/dl
HDL (mg/dl)            72   28 ± 2.4        88   31 ± 2.0        0.37      40–60 mg/dl
LDL (mg/dl)            72   71 ± 6.3        88   84 ± 5.9        0.13      <100 mg/dl
Triglyceride (mg/dl)   72   106 ± 11.7      88   152 ± 14.9      0.02      <150 mg/dl

  1. SE, standard error; t-test degree of freedom (df) = 1.
  2. *Values represent follicular phase levels.

From our individual SNP association analysis for each of the genes (Dasgupta et al., 2012a, b, c, d; Tables S2 and S3), we observed that many of the SNPs follow Hardy-Weinberg equilibrium, although a few of them show departures to varying degrees. At the genotype level, only two SNPs, one each from the LHβ subunit gene and the CAPN10 gene, were found to be significantly associated with the PCOS phenotype, namely LHP7 (rs1056917) and UCSNP-44 (rs2975760), respectively. At the allele level, two more SNPs (both from the PPARγ gene), namely Pro12Ala (rs1801282) and His447His (rs3856806), showed a significant but protective pattern, alongside LHP7 and UCSNP-44 as risk factors (Table 5). Our analysis of Androgen receptor CAG repeats revealed that in the obese PCOS women, this microsatellite variation may account for the hyperandrogenicity to a larger extent than in the lean PCOS women, prompting future investigations on specific PCOS cohorts characterized by certain BMI ranges (e.g., the obese group) to gauge its effect on the hyperandrogenic traits in particular and on the overall PCOS phenotype in general (Dasgupta et al., 2010). Post hoc power analysis revealed that our study had sufficiently high power [(1-β error probability) >0.90] to detect even a minimal effect size of 0.1 at the 5% significance level.

Table 5. Summary table for significant observation in genotype frequency and allele frequency analysis for individual genes
Gene                           SNP       rs ID      Location  Genotype/allele  PCOS (N = 250)  Controls (N = 299)  Odds ratio (95% C.I)  P-value
Genotype frequency analysis
Luteinizing hormone β subunit  LHP7      rs1056917  Exon 3    TT               0.328           0.422
                                                              TC               0.369           0.329               1.54 (1.01–2.37)      0.047
                                                              CC               0.303           0.249               1.45 (0.97–2.16)      0.069
Calpain-10                     UCSNP-44  rs2975760  Intron 3  TT               0.618           0.64
                                                              TC               0.207           0.271               0.79 (0.52–1.19)      0.265
                                                              CC               0.174           0.089               2.02 (1.18–3.44)      0.01
Allele frequency analysis
Luteinizing hormone β subunit  LHP7      rs1056917  Exon 3    T                0.512           0.587
Calpain-10                     UCSNP-44  rs2975760  Intron 3  T                0.722           0.775
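
As a worked illustration of how the genotype odds ratios in Table 5 relate to the reported frequencies, the sketch below recomputes the UCSNP-44 point estimates. It assumes that TT is the reference genotype; the function name is hypothetical. Because the group size cancels within each group, frequencies are enough for the point estimate, whereas the confidence intervals would require the genotype counts, which are not reproduced here.

```python
def odds_ratio(case_variant, case_reference, control_variant, control_reference):
    """Odds of carrying the variant genotype (vs. reference) in cases over controls."""
    return (case_variant / case_reference) / (control_variant / control_reference)


# UCSNP-44 genotype frequencies from Table 5 (TT assumed to be the reference).
cases = {"TT": 0.618, "TC": 0.207, "CC": 0.174}
controls = {"TT": 0.64, "TC": 0.271, "CC": 0.089}

or_tc = odds_ratio(cases["TC"], cases["TT"], controls["TC"], controls["TT"])
or_cc = odds_ratio(cases["CC"], cases["TT"], controls["CC"], controls["TT"])
print(f"TC vs TT: {or_tc:.2f}")  # prints 0.79
print(f"CC vs TT: {or_cc:.2f}")  # prints 2.02
```

Both values agree with the odds ratios listed for UCSNP-44 in the table.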

Stage I: Multivariate Logistic Regression Analysis

We performed a multivariate logistic regression taking into account all the 16 SNPs together as categorical indicators against a binary response variable (presence or absence of PCOS) and estimated the odds ratio in each case. The analysis was performed separately for genotypic and allelic data.
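
For orientation, a model of this kind is conventionally written as below. The exact parameterization fitted in R is not stated in the text, so the indicator coding x_j (one indicator per non-reference genotype or allele) and the Wald-type confidence interval are assumptions.

```latex
\operatorname{logit} P(\mathrm{PCOS}=1 \mid x) = \beta_0 + \sum_{j} \beta_j x_j,
\qquad
\mathrm{OR}_j = e^{\beta_j},
\qquad
95\%\ \mathrm{CI}_j = \exp\!\left(\hat\beta_j \pm 1.96\,\mathrm{SE}(\hat\beta_j)\right)
```

Under that assumption, a reported interval can be checked against its P-value. For OR = 0.28 (0.09–0.90), SE ≈ ln(0.90/0.09)/(2 × 1.96) ≈ 0.59 and z ≈ ln(0.28)/0.59 ≈ −2.17, which corresponds to a two-sided P of about 0.03.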

From the genotype regression analysis, we observed that only the UCSNP-63 heterozygote genotype (CT) showed a significant odds ratio, suggesting a protective role [OR = 0.28 (0.09–0.90), P = 0.032; Table S4]. At the 10% significance level, a few more genotypes indicated association with PCOS. While the LHP3 CG heterozygote, UCSNP-44 TC heterozygote, and IRSP7 GA heterozygote appeared protective, the LHP6 deletion homozygote and the UCSNP-19 insertion homozygote turned out to be risk-conferring for PCOS. In contrast to their individual analysis, where there was no evidence of association, the LHβ SNPs (LHP3 and LHP6) and the CAPN10 SNP UCSNP-19 exhibited a weak association pattern in this multivariate genotype regression model. Additionally, while UCSNP-44 showed a different pattern of association in the multivariate context, IRSP7 displayed results consistent with the individual SNP-based analysis. The allele-based multivariate regression analysis yielded more SNPs with significant association patterns (Table S5). The analysis was repeated in subcategories according to ethnicity (Hindus and Muslims) as well as BMI (lean and obese). A summary grid displays the overall result of this regression analysis (Table 6). The SNP association pattern is somewhat different for the Hindu and Muslim samples except for two genetic loci (LHP3 and LHP7). The association pattern in Muslims reflects the pattern observed in the pooled cohort more closely. Similarly, the pattern of association is quite distinct for the lean and obese PCOS cases. In contrast to the lean group, the obese cohort exhibits a much larger contribution of different genetic loci toward PCOS in the multivariate logistic regression model. Overall, the results of the multivariate regression analysis are more informative than those of the individual gene analysis (where only a few polymorphisms showed an association pattern), reiterating the need to analyze the multiple SNPs involved in complex phenotypes like PCOS together in an interactive statistical model.

Table 6. Summary grid depicting the multivariate regression analysis (allele-wise) in different categories and significance levels (P—protective, R—risk-conferring)
Alleles(P < 0.10)(P < 0.05)(P < 0.10)(P < 0.05)(P < 0.10)(P < 0.05)(P < 0.10)(P < 0.05)(P < 0.10)(P < 0.05)
ALL_LHP5T R   P    
ALL_LHP6DelR R       
ALL_UCSNP19Ins R      R 
ALL_UCSNP43A         P
ALL_UCSNP44C  R     P 
ALL_PPG2T         P

Stage II: SNP-SNP Interaction Analysis

We explored SNP-SNP interactions through the MDR method. The MDR method (Moore et al., 2006) is a constructive induction algorithm that proceeds as follows: the observed data are divided into 10 equal parts; a model is fit to each nine-tenths of the data (the training data), and the remaining one-tenth (the test data) is used to assess model fit, thus using 10-fold cross-validation (CV). Within each nine-tenths of the data, a set of N genetic factors is selected and their possible multifactor classes or cells are represented in N-dimensional space. The ratio of the number of cases to the number of controls is estimated in each cell, and the cell is labeled as either high-risk or low-risk. The procedure is repeated for each possible N-factor combination, and the combination that maximizes the case–control ratio of the high-risk group is selected. The testing accuracy (which is equal to 1 – prediction error) of this best N-locus model is then estimated using the remaining test portion of the data. The whole procedure is repeated for each of the 10 training/test partitions of the data, and the final best N-locus model is the model that maximizes the testing accuracy or, equivalently, minimizes the prediction error. In the MDR analysis for SNP-SNP interactions, we considered combinations of up to six loci, that is, N ≤ 6. Subsequently, we used the MDR permutation tool, whereby the entire analysis was rerun for 1000 iterations, that is, on 1000 independent random samples, to assess the significance of the SNP interactions in each locus combination category. Table 7 lists all the significant interactions under each locus combination category obtained through the MDR program. The detailed MDR output with the accuracy values and CV consistencies is provided in Table S6. Except for the six-loci category, all other categories yielded significant results. Strong interaction patterns (P < 0.05) were observed only in the two- and five-loci combination categories. Of the 16 polymorphisms analyzed, LHP7 and UCSNP44 emerged as the prominent SNPs, being present in most of the significant interactions. The best SNP-SNP interaction model obtained across all the categories, however, was between UCSNP44 and PPARγ His447His.
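
The cross-validated search described above can be sketched in a few lines. This is a simplified illustration on synthetic genotypes, not the MDR software used in the study: the function names, the 0/1/2 genotype coding, the use of balanced accuracy as the score, and the overall case:control ratio as the high-risk threshold are all assumptions made for the example.

```python
import itertools
import random
from collections import Counter, defaultdict


def high_risk_cells(rows, labels, loci, threshold):
    """Multilocus genotype cells whose case:control ratio exceeds the threshold."""
    cases, controls = Counter(), Counter()
    for row, y in zip(rows, labels):
        cell = tuple(row[i] for i in loci)
        if y == 1:
            cases[cell] += 1
        else:
            controls[cell] += 1
    return {c for c in set(cases) | set(controls) if cases[c] > threshold * controls[c]}


def balanced_accuracy(rows, labels, loci, risky):
    """Mean of sensitivity and specificity when high-risk cells predict 'case'."""
    tp = fn = tn = fp = 0
    for row, y in zip(rows, labels):
        predicted_case = tuple(row[i] for i in loci) in risky
        if y == 1 and predicted_case:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    return 0.5 * (tp / max(1, tp + fn) + tn / max(1, tn + fp))


def mdr(rows, labels, n_loci, n_folds=10, seed=0):
    """Return (best loci, CV consistency, mean testing accuracy) for n_loci-way models."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    combos = list(itertools.combinations(range(len(rows[0])), n_loci))
    wins, test_acc = Counter(), defaultdict(list)
    for fold in folds:
        held_out = set(fold)
        tr_rows = [rows[i] for i in idx if i not in held_out]
        tr_y = [labels[i] for i in idx if i not in held_out]
        te_rows = [rows[i] for i in fold]
        te_y = [labels[i] for i in fold]
        threshold = sum(tr_y) / (len(tr_y) - sum(tr_y))  # overall case:control ratio
        scored = []
        for loci in combos:
            risky = high_risk_cells(tr_rows, tr_y, loci, threshold)
            scored.append((balanced_accuracy(tr_rows, tr_y, loci, risky), loci, risky))
        _, best, risky = max(scored, key=lambda s: s[0])  # best training model in this fold
        wins[best] += 1
        test_acc[best].append(balanced_accuracy(te_rows, te_y, best, risky))
    model, consistency = wins.most_common(1)[0]
    return model, consistency, sum(test_acc[model]) / len(test_acc[model])


# Synthetic data: four hypothetical SNPs coded 0/1/2; risk depends jointly on SNP 0 and SNP 2.
rng = random.Random(42)
rows, labels = [], []
for _ in range(600):
    g = [rng.choice((0, 1, 2)) for _ in range(4)]
    risk = 0.8 if (g[0] == 2) != (g[2] == 2) else 0.2
    rows.append(g)
    labels.append(1 if rng.random() < risk else 0)

model, consistency, accuracy = mdr(rows, labels, n_loci=2)
print(model, consistency, round(accuracy, 3))
```

The CV consistency returned here is the number of folds (out of 10) in which the same locus combination was selected, which is how the consistency column of an MDR summary table is usually read.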

Table 7. Significant SNP-SNP interactions
Locus combinations  SNP combinations                  Accuracy values  CV consistency  P-value  Genes involved
Two loci            UCSNP44,PPG2                      0.581            10              0.003    CAPN10, PPAR-γ
                    LHP7,PPG1                         0.577            7               0.06     LHβ, PPAR-γ
                    LHP7,PPG2                         0.587            10              0.09     LHβ, PPAR-γ
Three loci          LHP3,UCSNP44,PPG2                 0.616            9               0.08     LHβ, CAPN10, PPAR-γ
                    LHP5,UCSNP56,PPG2                 0.615            8               0.08     LHβ, CAPN10, PPAR-γ
Four loci           LHP4,LHP7,UCSNP44,UCSNP56         0.674            10              0.06     LHβ, CAPN10
Five loci           LHP4,LHP7,UCSNP44,UCSNP56,PPG1    0.727            10              0.04     LHβ, CAPN10, PPAR-γ
                    LHP4,LHP7,UCSNP19,UCSNP44,PPG1    0.728            10              0.08     LHβ, CAPN10, PPAR-γ
                    LHP3,UCSNP44,UCSNP56,IRSP2,PPG1   0.729            8               0.09     LHβ, CAPN10, IRS-1, PPAR-γ
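
The P-values in Table 7 come from permutation testing. The general idea can be sketched as follows; in MDR the permuted statistic would be the testing accuracy of the best model, whereas this example substitutes a simpler statistic (the gap in carrier frequency between cases and controls). The function names and the data are hypothetical.

```python
import random


def permutation_p_value(statistic, data, labels, n_permutations=1000, seed=0):
    """Share of label permutations whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)
    observed = statistic(data, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any link between genotype and case status
        if statistic(data, shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from an impossible P = 0.
    return (hits + 1) / (n_permutations + 1)


def carrier_frequency_gap(carriers, labels):
    """Absolute difference in carrier frequency between cases (1) and controls (0)."""
    in_cases = [c for c, y in zip(carriers, labels) if y == 1]
    in_controls = [c for c, y in zip(carriers, labels) if y == 0]
    return abs(sum(in_cases) / len(in_cases) - sum(in_controls) / len(in_controls))


# Hypothetical data: 40 of 100 cases and 15 of 100 controls carry the variant.
carriers = [1] * 40 + [0] * 60 + [1] * 15 + [0] * 85
labels = [1] * 100 + [0] * 100
p_perm = permutation_p_value(carrier_frequency_gap, carriers, labels)
print(f"permutation P = {p_perm:.4f}")
```

With 1000 permutations the smallest attainable P-value is about 0.001, which is worth keeping in mind when reading values such as 0.003.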

Stage III: SNP-Environment Interaction Analysis

To understand the nature of association of these genetic polymorphisms under different environmental factors, we screened for possible SNP-environment interaction between all the 16 SNPs and five environmental factors: Age, BMI, parental consanguinity, socioeconomic status, and age at menarche. The environmental factors were transformed into categorical variables and then tested for interaction with each SNP individually (Table S7). Each interaction analysis for a particular environmental factor was adjusted for the remaining four factors.
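
A minimal sketch of what a SNP-environment interaction measures is given below. It is an unadjusted simplification: the study adjusted each interaction for the remaining four factors in R, which this example does not attempt. For a binary carrier status and a binary environmental factor, the exponentiated interaction coefficient of a saturated logistic model equals the ratio of the stratum-specific odds ratios. The function names and all counts are hypothetical.

```python
import math


def stratum_odds_ratio(case_carrier, case_noncarrier, control_carrier, control_noncarrier):
    """Odds ratio for carrying the variant within one environmental stratum."""
    return (case_carrier * control_noncarrier) / (case_noncarrier * control_carrier)


def interaction_ratio(stratum_a, stratum_b):
    """Ratio of stratum-specific odds ratios (stratum_b relative to stratum_a)."""
    return stratum_odds_ratio(*stratum_b) / stratum_odds_ratio(*stratum_a)


# Hypothetical counts: (case carriers, case non-carriers, control carriers, control non-carriers)
lean = (20, 80, 25, 100)   # odds ratio 1.0
obese = (30, 70, 10, 70)   # odds ratio 3.0

ratio = interaction_ratio(lean, obese)
beta_interaction = math.log(ratio)  # coefficient of the SNP x environment product term
print(f"ratio of odds ratios = {ratio:.1f}, interaction coefficient = {beta_interaction:.3f}")
```

A ratio of 1 (coefficient 0) would mean the SNP effect is the same in both strata, that is, no interaction on the odds scale.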

Four SNPs exhibited significant interaction with different environmental factors, namely UCSNP-43, UCSNP-44, IRSP1 and PPG2 (Table 8). All these loci were significant at the 5% level, whereas one SNP (LHP1) showed significant interaction with parental consanguinity status at the 10% level. While LHP1 and UCSNP-43 showed interaction with parental consanguinity status, UCSNP-44 and IRSP1 genotypes were associated with the socioeconomic status of the PCOS cases and controls. Additionally, the PPG2 (PPARγHis447His) polymorphism was found to be associated with BMI. Analysis of each of these interactions in relation to the genotype distribution under each environmental factor yielded interesting results.

Table 8. P-values of gene-environment interaction


  1. Italicized text represents interactions with P < 0.10.


Both LHP1 and UCSNP-43 showed interaction with parental consanguinity; their genotype distribution pattern was distinct in each of the two consanguinity categories. In the PCOS cases with parental consanguinity, the LHP1 wild-type homozygotes were more frequent than in the controls, and the UCSNP-43 heterozygotes were also more frequent among them, which is somewhat inconsistent with the theoretical expectation. For socioeconomic status, we divided the cohort into three categories: low, middle, and upper. Two of the 16 polymorphisms, UCSNP44 and IRSP1, showed significant interaction with these categories. The frequency of the UCSNP44 CC genotype was consistently higher among the PCOS cases than the controls across the three socioeconomic categories. However, this difference in frequency was highest among the upper-class women, followed by the lower and middle socioeconomic groups, respectively. The last significant gene-environment interaction was observed between the PPG2 (PPARγ His447His) polymorphism and BMI. We categorized BMI into three groups: lean (<25 kg/m²), overweight (25–30 kg/m²), and obese (>30 kg/m²). Consistent with the conclusion of our earlier analysis that the PPG2 variant has a probable protective role against PCOS, the heterozygote genotype frequency was significantly higher among the controls than among the PCOS cases.
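
The conversion of BMI into the three categories used for this interaction can be written as a short sketch. The function names are hypothetical, and assigning the boundary values 25 and 30 to the overweight group is an assumption read off the stated ranges.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight in kilograms divided by height in meters squared."""
    return weight_kg / height_m ** 2


def bmi_group(value):
    """Three-level grouping: lean (<25), overweight (25-30), obese (>30)."""
    if value < 25:
        return "lean"
    if value <= 30:
        return "overweight"
    return "obese"


# Hypothetical subject: 70 kg, 1.75 m.
example_bmi = bmi(70, 1.75)
print(f"{example_bmi:.1f} -> {bmi_group(example_bmi)}")
```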


Although the candidate genes and the underlying biological pathways analyzed in this study have been previously implicated in the etiology of PCOS, their genetic interactions (detected through SNP-SNP interactions) as well as gene-environment interactions have not been described before. Therefore, to identify the complex biological relationships between the pathological pathways leading to PCOS, we attempted to understand the epistatic phenomenon involved in PCOS etiology through SNP-SNP and SNP-environment interaction analysis. In the individual gene analysis, three out of four genes indicated significant patterns of association with PCOS; while the LHβ subunit gene and CAPN10 gene SNPs turned out to be risk-conferring, the PPARγ SNPs were found to be protective in nature. The association of the LHβ subunit gene and the CAPN10 gene points to the underlying biological pathways of neuroendocrine dysfunction and of insulin secretion and action, which could be involved in PCOS susceptibility among South-Indian women. On the other hand, the PPARγ polymorphisms have been shown in several prior studies to be genetic modifying factors rather than susceptibility factors for PCOS, which is consistent with our findings as well. PCOS women carrying the mutations at both the polymorphic loci of PPARγ showed a reduced frequency of hyperandrogenic and hyperinsulinemic traits, whereas they were more likely to be obese and to show pronounced lipid abnormalities compared to the PCOS women carrying a mutation at only one of the loci or at neither (the wild type). Our results concur with the earlier observations emphasizing a probable protective role of PPARγ polymorphisms against PCOS (Dasgupta et al., 2012d).

We obtained some significant results from the multivariate logistic regression as well as the epistasis analysis, including both SNP-SNP and SNP-environment interactions. In the multivariate logistic regression analysis, the LHP4, LHP5, and UCSNP-19 alleles showed significant odds for risk toward PCOS, whereas LHP3 and UCSNP-56 turned out to be protective in nature with significant P-values, although the significance was not retained after correction for multiple testing. When the cohort was categorized according to religion, a different pattern of association was observed due to contrasting socioeconomic status; in our sample, the Hindus represented a relatively higher socioeconomic status than the Muslims. The association pattern was also found to be distinctly different for the lean and obese PCOS cases, in contrast to the pattern observed in the individual gene analysis. Interestingly, all CAPN10 SNPs showed association (either risk-conferring or protective) in the obese group in this multivariate model, highlighting the importance of this gene in the complex pathophysiology of PCOS, given that insulin resistance and the associated metabolic abnormalities are pronounced in obese PCOS cases. Apart from CAPN10, LHβ and PPARγ also seem to play a significant role in PCOS pathology among the obese cases. It is well known that obesity influences the phenotypic expression of PCOS and plays a significant role in the pathophysiology of hyperandrogenism and chronic anovulation. Increased adiposity is associated with several abnormalities of sex steroid metabolism and results in increased androgen production (Pasquali, 2006), as also illustrated in Figure 1 (The Rotterdam ESHRE/ASRM-sponsored PCOS Consensus Workshop Group, 2004). Therefore, we could infer from our observations that, particularly in obese cases, the interplay of genes related to neuroendocrine and metabolic pathways plays a significant role in PCOS manifestation. The symptoms among the lean PCOS women are relatively milder than those among the obese cases, probably because of a less extensive genetic interplay, as reflected in the results of association.

Figure 1.

The schematic representation of the participation of multiple pathophysiological mechanisms in the manifestation of heterogeneity of PCOS (Yarak et al., 2005).

According to the SNP-SNP interaction analysis, LHP7 and UCSNP44 emerged as the prominent SNPs, being present in most of the significant interactions. The best SNP-SNP interaction model obtained across all the categories, however, was between UCSNP44 and PPARγ His447His. These SNPs belong to genes involved in insulin secretion and action (CAPN10) and in adipocyte differentiation (PPARγ), which in turn implies a more significant metabolic component in PCOS pathology. Apart from these two SNPs, the LHβ polymorphisms LHP4 and LHP7 also exhibit significant epistasis with CAPN10 and PPARγ. Our SNP-SNP interaction analysis shows that the CAPN10, IRS-1, and PPARγ genes seem to be important candidates for susceptibility to PCOS in the Indian context, particularly from the perspective of the metabolic anomalies involved in PCOS manifestation.

For a complex disorder like PCOS, it is important to obtain an estimate of the genetic and environmental risk factors by accounting for their joint interactions. Such an analysis would help in dissecting disease mechanisms by using information on susceptibility (and resistance) genes to focus on the biological pathways that are most relevant to that disease, and the environmental factors that are most relevant to the pathways. Both UCSNP-44 and PPARγ His447His continued to exhibit significant results in the gene-environment interaction analysis as well, by way of correlation of the respective genotypes with socioeconomic status and BMI. Consistent with our earlier observation in the individual gene analysis, where the UCSNP-44 CC genotype showed significant odds for the PCOS phenotype, the women with the CC genotype from the upper socioeconomic stratum show a distinct pattern of association with PCOS, whereas the control women from this stratum lack this genotype. This would underscore the highly significant role of UCSNP44 in PCOS susceptibility, particularly in the upper socioeconomic strata characterized by high-calorie food intake and sedentary lifestyles. For the PPARγ His447His polymorphism, the heterozygote frequency was greater among the controls than the cases, and this difference was observed in both the overweight and obese categories, although it was more pronounced in the latter group. Conversely, the frequency of the wild-type genotype (CC) was significantly higher in the PCOS cases than in the controls in the obese group, suggesting its role in conferring susceptibility to PCOS. Overall, our data support the view that the CAPN10 and PPARγ genes, which are important candidates for metabolic anomalies in PCOS, are triggered by the shift toward urban lifestyles as reflected through socioeconomic status and BMI.

At this point, it is important to note that our results for the multivariate and epistasis analysis need to be interpreted with caution given the relatively small sample size for the higher-order interactions; in all likelihood, the statistical significance (as reflected through uncorrected P-values) would be lost after correction for multiple hypothesis testing. Nevertheless, the outcome of this exploratory study, which attempts for the first time to understand the nature of epistasis in PCOS etiology, would be pertinent in providing significant leads toward obtaining further biological insight into the high-dimensional etiology of PCOS with the help of much larger cohorts. Some recent genome-wide association studies (GWAS) have also reported and addressed the issue of low statistical power as well as lack of statistical significance for SNP-SNP interactions underlying complex traits and diseases like cholesterol levels and prostate cancer (Ma et al., 2012; Tao et al., 2012). Furthermore, we have earlier attempted to validate in our cohort of Indian women (which is by far the largest hitherto studied) the previously identified polymorphisms (studied in other populations) and have also sequenced the entire exons containing the SNPs in some genes including Follistatin, LHβ, PPARγ, and IRS-1 with the objective of identifying novel SNPs specific to our population. Contrary to our expectations, none of the exonic regions of the genes harbored any novel mutation. However, we could provide an understanding of the nature of association of genes with PCOS susceptibility in the Indian context. All the cases and the controls were recruited from the same geographic region (Hyderabad, Andhra Pradesh), which is considered to be linguistically and genetically homogeneous (Reddy et al., 2005), hence ruling out the possibility of confounding association due to population stratification.

To the best of our knowledge, no earlier study has attempted to assess gene interactions incorporating the major etiological pathways leading to PCOS. From our small pool of SNPs, we have shown SNP-SNP interactions suggesting biological cross-talk among genes/SNPs from the neuroendocrine, insulin action, and adipocyte differentiation pathways. Applying this strategy to BMI-specific cohorts in different ethnic populations, with the aim of replicating our findings, is therefore warranted in order to identify the physiological networks in complex phenotypes such as PCOS. Functional analysis in a subset of samples harboring these interacting SNPs would also help establish the importance of these genetic variants in the etiology of PCOS. Given the genetic heterogeneity and complexity of this syndrome, it would be pertinent to opt for whole-genome sequencing, rather than studying candidate genes in isolation, which may help in identifying significant loci specific to a cohort and/or an ethnic group. The first genome-wide association study on PCOS was very recently conducted in the Han-Chinese population and led to the identification of more specific genomic regions that may contain candidate genes specific to that population (Chen et al., 2011). A few more GWAS focusing on obesity-related conditions in PCOS have been reported (Ewens et al., 2011; Hwang et al., 2012). A similar study utilizing whole-genome sequencing is needed among PCOS subjects representing the ethnic heterogeneity of India; this would ultimately enable us to decipher the molecular mechanisms underlying the syndrome in this specific region or ethnicity, and may assist in developing better prognostic markers for effective clinical management and treatment of this syndrome in India.


The authors thank the Director of the Indian Statistical Institute for logistical support and the Director of the Centre for Cellular and Molecular Biology, Hyderabad, for providing access to the DNA-sequencing facility to run the plates processed at ISI. We thank Ms. P.V.S. Sirisha for her help in the laboratory analysis. We also thank our clinical collaborators, Dr. K. Neelaveni and Dr. K. Anuradha, for recruiting the PCOS cases for the study.


Indian Statistical Institute.

Conflict of Interest

The authors declare no conflict of interest.