A multigene test for the risk of sporadic breast carcinoma
Article first published online: 17 APR 2003
Copyright © 2003 American Cancer Society
Volume 97, Issue 9, pages 2160–2170, 1 May 2003
How to Cite
Comings, D. E., Gade-Andavolu, R., Cone, L. A., Muhleman, D. and MacMurray, J. P. (2003), A multigene test for the risk of sporadic breast carcinoma. Cancer, 97: 2160–2170. doi: 10.1002/cncr.11340
- Issue published online: 17 APR 2003
- Manuscript Accepted: 30 DEC 2002
- Manuscript Received: 14 NOV 2002
Although the identification of the BRCA1 and BRCA2 genes has been of great interest, these genes account for less than 5% of all breast carcinoma cases. The remaining cases are sporadic. Reanalysis of a large twin study suggested that genetic factors may play a significant role in sporadic breast and other carcinomas. Sporadic breast carcinoma is likely to be polygenically inherited. Multiple genes are likely to have an additive effect, each gene accounting for a fraction of the variance. One factor that may have an impact on the development of hormonally responsive breast tumors is the duration of exposure of the breast to estrogen. Therefore, one of the demographic risk factors for breast carcinoma is an early age of onset of menarche. The current study was based on the hypothesis that genes that play a role in demographic risk factors may be breast carcinoma risk genes in their own right. The authors hypothesized that six genes relevant to the timing of the onset of menarche and related risk factors might be candidate genes for breast carcinoma. These were the leptin gene (LEP), the leptin receptor gene (LEPR), the catechol-O-methyltransferase gene (COMT), the dopamine D2 receptor gene (DRD2), the estrogen receptor 1 gene (ESR1), and the androgen receptor gene (AR).
The authors examined 67 women with postmenopausal sporadic breast carcinoma and 145 gender- and race-matched controls.
Five of these genes accounted for a significant percent of the variance (r2) of breast carcinoma. The following r2 and P values were calculated: LEP: 0.073, P ≤ 0.0001; LEPR: 0.064, P ≤ 0.0002; COMT: 0.073, P ≤ 0.0001; AR: 0.040, P ≤ 0.0035; and DRD2: 0.018, P ≤ 0.05. When evaluated in a multivariate regression analysis, they accounted collectively for 24% of the variance of breast carcinoma (P ≤ 0.0001). These genes accounted for 40% of the variance (P ≤ 0.00001) in a subset of age-matched cases. Individual gene scores were added to form a breast carcinoma risk score (BCRS) that ranged from 0 to 17. When the BCRS was evaluated in a receiver operator characteristic plot, the area under the curve was 0.80 for the full set and 0.869 for the age-matched set. The relative breast carcinoma risk for the different BCRS scores ranged from 0.10 to 11.9.
These results demonstrate a potentially powerful method of evaluating the additive effect of multiple breast carcinoma risk genes to form a potentially clinically useful assessment of women's risk for sporadic breast carcinoma. Cancer 2003;97:2160–70. © 2003 American Cancer Society.
Although the identification of the BRCA1 and BRCA2 genes has contributed greatly to understanding the cause of familial forms of breast carcinoma, these genes account for less than 5% of the breast carcinoma risk among white women1, 2 and pose even less of a risk among African-American women.1 The remaining 95% of breast carcinoma cases are sporadic with only a modest or no family history of breast or ovarian carcinoma. A reanalysis3 of a large twin study4 has shown that genetic factors play a significant role in sporadic breast and other carcinomas. Sporadic breast carcinoma is likely to be a polygenic disorder involving multiple genes, each of which accounts for a fraction of the variance with considerable genetic heterogeneity. Although a number of genes have been shown in at least one study to be associated significantly with sporadic breast carcinoma, as is typical of polygenic disorders, there is considerable variability from study to study. The most powerful method of studying genetic disorders that are caused by the additive effect of multiple genes is to evaluate the additive effect of multiple candidate genes.5–7 It is possible to compensate for the small effect size of each gene by evaluating the additive variance of multiple candidate genes. This also compensates for genetic heterogeneity because a number of combinations of candidate genes can contribute to the total variance. When genes are scored such that the three genotypes are assigned a value of 0–2, depending on their relative strength as risk factors, a total risk score can be calculated by adding the scores for each gene. This risk score can be evaluated in a receiver operator characteristic (ROC) plot of specificity versus sensitivity to determine whether it has sufficient power to be useful clinically.8, 9
CHOICE OF CANDIDATE GENES
A number of demographic risk factors have been identified for sporadic breast carcinoma. The most important of these is the duration of exposure of the breast to estrogens. Estrogens increase the risk of breast carcinoma through various mechanisms and at various phases of life.10, 11 The overexpression of estrogen receptors or the overexposure of receptors to estrogen in normal breast epithelium augments the risk of breast carcinoma.11 Factors that increase or prolong the exposure to estrogens may be risk factors for breast carcinoma. These include age,12 early versus late age at menarche,13–16 nulliparity,17 late age of first full-term pregnancy,17, 18 breast-feeding,19 increased interval between menarche and birth of first child,20, 21 years of education (which delays childbirth),17 later age at menopause,22, 23 number of years since last birth,17 oral contraceptive use,17 and postmenopausal hormone therapy.24 The protection afforded by an early full-term pregnancy may be the result of the higher degree of differentiation of the mammary gland at the time in which an etiologic agent or agents act. Cell proliferation is important for cancer initiation, whereas differentiation is a powerful inhibitor.25
In general, the odds ratios (OR) for these risk factors range from 1.1 to 2.5. The factors most likely to have a direct effect on duration of estrogen exposure (i.e., age, age of onset of menarche, and years between the onset of menarche and the birth of the first child) were associated with the highest OR. The OR could be as high as 5.0 or greater.17, 21 These risk factors were also most likely to account for the breast carcinoma risk among Hispanic and African-American women.26–28 A young age at menarche is associated with elevated estradiol levels, which persist into early adulthood.29
We hypothesized that genes that regulate these risk factors might be risk factors in their own right. Because the early age of onset of menarche is one of the prominent risk factors for breast carcinoma, we chose six candidate genes that regulate the age of onset of menarche or other factors related to the duration of exposure to estrogen. These are the leptin gene (LEP), the leptin receptor gene (LEPR), the catechol-O-methyltransferase gene (COMT), the dopamine D2 receptor gene (DRD2), the estrogen receptor 1 gene (ESR1), and the androgen receptor gene (AR).
The LEP and LEPR Genes
Twin studies have indicated a strong genetic component to the age of menarche.30–32 Many animal studies have shown that leptin plays an important role in initiating puberty.33, 34 These observations suggest that leptin is the signal that informs the brain that energy stores are sufficient to support the high energy demands of reproduction and may be a major determinant of the timing of puberty.34 There is also much evidence for a major role for leptin in puberty in humans.35–37 Leptin increases with the initiation of puberty in both genders. In the later stages of puberty, it continues to increase in females but declines in males.38–40 This decline is secondary to the suppression of leptin by increasing androgen levels in males.35 Missense mutations of the LEP41 and LEPR42 genes are associated with hypogonadism and obesity. The effect of increased leptin levels on the initiation of puberty is secondary to the suppression of neuropeptide Y by leptin,43 releasing its inhibition of the pituitary-gonadotropin axis. An alternative possibility is that increased leptin levels are permissive of the development of puberty rather than the trigger for the timing of puberty.44
Several dinucleotide repeat polymorphisms in or near the human LEP gene have been identified.45 In 1996, Comings et al.46 reported an association between the D7S1875 polymorphism of the LEP gene and obesity in young females. The distribution of the alleles at the D7S1875 dinucleotide repeat showed two major peaks with the shorter alleles (S) ranging from 199 to 207 bp in length, and the longer alleles (L) ranging from 208 to 225 bp in length. Another study indicated that the different lengths of microsatellite polymorphisms play a role in gene regulation.47 These studies led us to evaluate the association of the human LEP gene with the age of onset of menarche in females. Our evaluation showed a significant three-way interaction among LEP genotypes, the age of menarche, and maternal age (age of the mothers when the probands were born).48 The S/S LEP genotype was associated with an early age of onset of menarche in women with a maternal age of 30 years or older, whereas the L/L LEP genotype was associated with a young age of menarche in women with a maternal age younger than 30 years old. These results suggested that the LEP gene might also be a breast carcinoma risk gene. If variants of the LEP gene were breast carcinoma risk factors, then it would be likely that the same would be true for variants of the LEPR gene.
A number of studies have suggested that instead of simply regulating a risk factor for breast carcinoma, leptin may have a direct effect on tumorigenesis. Leptin is present in normal breast tissue and in breast milk.49 Recently, it was added to a short list of mammary hormones.50 Both the short and long isoforms of the leptin receptors are present and functional in human breast carcinoma tissue and cell lines.51 The combination of leptin and a high glucose concentration stimulated cell proliferation in MCF-7 human breast carcinoma cells,52 whereas leptin alone stimulated the proliferation of the human breast carcinoma T47-D cell line where it produced a time-dependent activation of mitogen-activated kinases (MAPKinase 1 and 2).51 A specific MAPK inhibitor blocked cell proliferation. These observations suggest that variants of both the LEP and LEPR genes might be risk factors for breast carcinoma independent of the role of leptin in regulating the onset of menarche.
The COMT Gene
The DRD2 Gene
There is a significant association between the DRD2 gene and the early onset of sexual activity.59 In addition, the dopamine D2 receptor plays an important role in the regulation of prolactin, a hormone that has been implicated in breast carcinoma risk.60–62 Finally, dopamine is required for the action of leptin.63 Dopamine D2 receptors are present on the surface of breast carcinoma cells.64
The ESR1 Gene
The estrogen receptor is a ligand-mediated transcription factor. The ESR1 gene was chosen as a candidate gene because estrogen plays a central role in many of the hypotheses about the cause of breast carcinoma. Racial variations in the expression of ESR1 in normal breast tissue have been proposed to explain some of the racial differences in breast carcinoma risk.65
The AR Gene
The androgen receptor is also a ligand-mediated transcription factor. The AR gene is located on the X chromosome at Xq11-12.66 Two sets of polymorphic trinucleotide repeat sequences, CAG67 and GGC (GGN),68 which result in polyamino acid tracts in the protein, are present in the first exon of the AR gene. The shorter repeat alleles are associated with increased expression of the AR gene, whereas the longer repeats are associated with decreased expression.69–72 The S alleles of the AR gene are associated with an earlier age of onset of menarche.73 Rebbeck et al.74 reported that the AR gene helped to modify the effect of the BRCA1 gene on the risk for breast carcinoma.
MATERIALS AND METHODS
Breast Carcinoma Cases
To obtain a population-based rather than a referral-based sample of breast carcinoma, the breast carcinoma cases were ascertained from the private practice of oncologists in the Rancho Mirage, CA, area. The majority of the women with breast carcinoma were postmenopausal and did not have a strong family history of the disease. DNA was isolated using standard techniques at the Genetic Research Institute of the Desert (Rancho Mirage, CA) for polymerase chain reaction analysis on certain genes. Aliquots of DNA were transferred to the Department of Medical Genetics, City of Hope Medical Center (Duarte, CA), for more genotyping.
The controls were 145 gender-, geographic area-, and race-matched subjects from the Loma Linda University public health clinics.
The dinucleotide repeat, D7S1875,45 at the LEP locus was used. In 1996, an association was reported between this polymorphism of the LEP gene and obesity in young females.46 The distribution of the alleles showed a distinct separation between the S and L alleles. The SS, SL, and LL genotypes were tested. A tetranucleotide (CTTT) repeat polymorphism, comprising one to nine repeats, was used for the LEPR gene.75, 76 The alleles were divided into those with four or fewer repeats (S) and those with five or more repeats (L). The division was based on optimizing the similarity in the size of the two groups. The Taq I A1/A2 polymorphism77 was used for the DRD2 gene. The 1 (A1) allele is associated with a decreased expression of the dopamine D2 receptor.78 The G/A 474 Val 108 Met Nla III polymorphism was used for the COMT gene.79 The G allele shows a fourfold increase in expression compared with the A allele. As described above for the AR gene, the exon 1 GGN trinucleotide repeat68 was used. The alleles were divided into 16 or fewer repeats (S) and 17 or more repeats (L). The SS, SL, and LL genotypes were evaluated. The Xba I polymorphism80 was used for the ESR1 gene. Studies of osteoporosis show an association with the 12 heterozygotes, i.e., positive heterosis.81
The ROC plots provide a pure index of the accuracy of a given test by demonstrating its ability to discriminate between alternative states of health or disease over the complete spectrum of operating conditions.8, 9 The ROC plot depicts the overlap between two distributions by plotting the sensitivity versus 1 − specificity for the complete range of decision thresholds. Sensitivity or the true-positive fraction, defined as (the number of true-positive test results)/(the number of true-positive + the number of false-negative test results), is plotted on the y-axis. This has also been referred to as positivity in the presence of a disease and is calculated from the affected group. The x-axis shows the false-positive fraction or 1 − specificity, defined as (the number of false-positive results)/(the number of true-negative + the number of false-positive results). This is an index of specificity and is calculated from the unaffected group.8 Therefore, sensitivity = true-positive results/total subjects with the disease and specificity = true-negative results/total subjects without the disease.82
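The definitions above can be sketched in a few lines of Python. This is only an illustration: the function name `roc_points`, the convention that a subject tests positive when her score is at or above the threshold, and the sample scores are all assumptions made for the example and are not taken from the study.

```python
def roc_points(case_scores, control_scores):
    """Return (threshold, sensitivity, false_positive_fraction) per cutoff.

    Assumption: a subject tests positive when score >= threshold.
    """
    thresholds = sorted(set(case_scores) | set(control_scores))
    points = []
    for t in thresholds:
        tp = sum(1 for s in case_scores if s >= t)      # affected, test positive
        fn = len(case_scores) - tp                      # affected, test negative
        fp = sum(1 for s in control_scores if s >= t)   # unaffected, test positive
        tn = len(control_scores) - fp                   # unaffected, test negative
        sensitivity = tp / (tp + fn)                    # y-axis of the ROC plot
        false_positive_fraction = fp / (fp + tn)        # x-axis, 1 - specificity
        points.append((t, sensitivity, false_positive_fraction))
    return points


# Made-up scores, for illustration only.
example_cases = [3, 5, 6, 8, 9]
example_controls = [1, 2, 3, 4, 6]

for threshold, sens, fpf in roc_points(example_cases, example_controls):
    print(threshold, round(sens, 2), round(fpf, 2))
```

Each printed row is one point of the ROC plot; lowering the threshold moves the point toward the upper right corner.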
Computer programs considerably enhance the ease of use of ROC curves.8 These programs allow the determination of the positive and negative likelihood ratios for the presence of disease for each of the sensitivity-specificity pairs. For the positive likelihood ratios, women with a lower risk receive a score of 1 and those with a higher risk receive progressively higher values with higher values of the test score. For the negative likelihood ratio, women with the highest risk receive a score of 1 and those with progressively lower values of the test score receive lower values. The product of the two, termed here the likelihood risk, is useful because women who have a neutral risk receive scores of approximately 1, those with a diminished risk receive scores less than 1, and those with a higher risk receive scores greater than 1. The program also calculates the area under the curve (AUC), a further measure of the effectiveness of the test.83
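As a rough sketch of these quantities, the Python below uses the standard definitions of the positive and negative likelihood ratios and the rank-based (Mann-Whitney) form of the AUC. How MedCalc pairs the two ratios for each score value is not described in the text, so `likelihood_risk` is only an assumed reading of "the product of the two"; all function names and sample scores are hypothetical.

```python
def likelihood_ratios(sensitivity, specificity):
    """Standard LR+ and LR-; None where the ratio would divide by zero."""
    lr_pos = sensitivity / (1 - specificity) if specificity < 1 else None
    lr_neg = (1 - sensitivity) / specificity if specificity > 0 else None
    return lr_pos, lr_neg


def likelihood_risk(sensitivity, specificity):
    """Assumed reading of the text: product of LR+ and LR- at one cutoff."""
    lr_pos, lr_neg = likelihood_ratios(sensitivity, specificity)
    if lr_pos is None or lr_neg is None:
        return None  # undefined at the extremes of the score range
    return lr_pos * lr_neg


def auc(case_scores, control_scores):
    """Area under the empirical ROC curve (Mann-Whitney form).

    Equals the probability that a random case outscores a random control,
    with ties counted as one half.
    """
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Made-up scores, for illustration only.
example_cases = [3, 5, 6, 8, 9]
example_controls = [1, 2, 3, 4, 6]
example_auc = auc(example_cases, example_controls)
print(likelihood_ratios(0.8, 0.8), example_auc)
```

An AUC of 0.5 corresponds to a test that does not discriminate at all, and 1.0 to perfect separation of cases and controls.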
The chi-square test, analysis of variance (ANOVA), and regression analyses were performed using SPSS statistical software (SPSS, Chicago, IL). The logistic regression analyses were performed using SAS software (SAS Institute, Cary, NC). The ROC plots were performed using MedCalc (Mariakerke, Belgium).
RESULTS
The mean age of the breast carcinoma subjects was 69.0 years (standard deviation [SD] 12.54) and their age range was 30–96 years. The mean age of the controls was 43.0 years (SD 12.96) and their age range was 23–66 years. To evaluate whether the difference in age between the breast carcinoma cases and controls was a factor, we selected a subset from each group. Cases were divided to produce an equal number of cases in each group, to still have adequate power, and to utilize the older of the controls and the younger of the breast carcinoma cases. For this division, only the controls who were 51 years of age or older and only the breast carcinoma patients who were 75 years of age or younger were included. This produced a set of 45 controls with an average age of 58.3 years (SD 4.7) and 43 breast carcinoma patients with an average age of 63.4 years (SD 10.6). We termed this the age-adjusted subset.
The results for the chi-square analyses of each of the genes for breast carcinoma subjects and controls are shown in Table 1. For the LEP gene, there was a significant (P ≤ 0.00013) increase in the frequency of the LL genotype and a decrease in the frequency of SL heterozygotes in subjects with breast carcinoma compared with controls. For the LEPR gene, there was a significant (P ≤ 0.0005) increase in the frequency of the SS genotype, a decrease in the frequency of the SL genotype, and an increase in the frequency of the LL genotype, indicating negative heterosis.
|Gene||Group||No.||SS or 11||SL or 12||LL or 22||χ2||P|
|LEP||Breast carcinoma||67||21 (31.3)||20 (29.9)||26 (38.8)|
|LEP||Controls||145||52 (35.9)||73 (50.3)||20 (13.8)||17.87||0.00013|
|LEPR||Breast carcinoma||67||34 (50.7)||21 (31.3)||12 (17.9)|
|LEPR||Controls||145||50 (34.2)||85 (58.2)||10 (6.9)||15.23||0.0005|
|COMT||Breast carcinoma||67||31 (46.3)||24 (35.8)||12 (17.9)|
|COMT||Controls||145||29 (20.0)||78 (53.8)||38 (26.2)||15.59||0.0004|
|DRD2||Breast carcinoma||67||5 (7.5)||26 (38.8)||36 (53.7)|
|DRD2||Controls||145||4 (2.8)||46 (31.7)||95 (65.5)||4.09||0.130|
|AR||Breast carcinoma||67||41 (61.2)||22 (32.8)||4 (6.0)|
|AR||Controls||145||61 (42.1)||59 (40.7)||25 (17.2)||8.47||0.014|
|ESR1||Breast carcinoma||67||10 (14.9)||35 (52.2)||22 (32.8)|
|ESR1||Controls||145||19 (13.1)||64 (44.1)||62 (42.8)||1.84||0.39|
For the COMT gene, there was a significant (P ≤ 0.0004) increase in the frequency of the more highly expressed 1 or G allele, which indicates an increase in the frequency of the 11 genotype and a decrease in the frequency of the 12 and 22 genotypes in breast carcinoma subjects.
There was a modest, but not significant, increase in the frequency of the 11 and 12 genotypes of the DRD2 gene in breast carcinoma subjects. For the AR gene, there was a significant (P ≤ 0.014) increase in the SS genotype and a decrease in the frequency of the remaining two genotypes of the GGN repeat polymorphism in breast carcinoma subjects. This genotype is associated with increased activity of the AR gene. By contrast, there was no significant association of the ESR1 gene with breast carcinoma.
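The chi-square values in Table 1 can be checked from the genotype counts alone. The sketch below does this for the first pair of rows (the LEP gene, P ≤ 0.00013 in the text) with a plain Pearson chi-square on a 2 × 3 table. The function names are hypothetical, and the study itself used SPSS rather than this code.

```python
import math


def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic for a 2 x k contingency table."""
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        col_total = a + b
        expected_a = col_total * n_a / n   # expected count under independence
        expected_b = col_total * n_b / n
        chi2 += (a - expected_a) ** 2 / expected_a
        chi2 += (b - expected_b) ** 2 / expected_b
    return chi2


def p_value_df2(chi2):
    """Upper-tail P for 2 degrees of freedom, where it reduces to exp(-x/2)."""
    return math.exp(-chi2 / 2)


# LEP genotype counts (SS, SL, LL) from Table 1.
lep_cases = [21, 20, 26]
lep_controls = [52, 73, 20]

lep_chi2 = chi_square_2xk(lep_cases, lep_controls)
lep_p = p_value_df2(lep_chi2)
print(round(lep_chi2, 2), round(lep_p, 5))
```

With three genotypes and two groups the test has 2 degrees of freedom, which is why the closed-form P value applies here; other table shapes would need a general chi-square distribution function.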
ANOVA, Gene Coding, and Regression Analysis
For the following analyses, a CONTBRCA variable was made in which the controls were scored as 0 and the breast carcinoma cases were scored as 1. CONTBRCA was used as the dependent variable in ANOVA to determine the mean score for each genotype. Therefore, the higher the score, the more the genotype is associated with breast carcinoma. Based on these results, each genotype of each gene was scored as 0, 1, or 2 depending on its relative CONTBRCA score. Subjects with the lowest mean CONTBRCA score were scored 0, those with the highest mean score were scored 2, and the remaining genotype was scored 0 or 2 depending on whether the score was closer to the 0 or the 2 mean, and 1 if it was clearly intermediate. These were called the gene scores. By our convention, the 11 genotype is listed first, the 12 genotype second, and the 22 genotype third. If the highest mean CONTBRCA scores were associated with the 11 genotype, intermediate scores with the 12 genotype, and the lowest scores with the 22 genotype, the gene score would be 210.
To determine the percentage of the variance of breast carcinoma attributable to a given gene, univariate regression analysis was used with CONTBRCA as the dependent variable and the gene score as the independent variable. This produced r, the correlation coefficient, r2, the fraction of the variance attributable to that gene, and P, the significance level. Table 2 shows the ANOVA, gene scores, and r2 for each gene. The r2 and P value results are also shown for the age-adjusted subset (Table 2). Because of the smaller number of cases, this subset has considerably less power than the full set. Therefore, the critical result was the r2 value, rather than the P value, to determine if the fraction of the variance was decreased dramatically in this sample.
The ANOVA for the LEP gene was significant (P ≤ 0.0001). The means were highest for the LL genotype and intermediate for the SS genotype, providing a gene score of 102. The LEP gene accounted for 7.3% of the variance for the full set (P ≤ 0.0001), which decreased slightly to 6.1% for the age-adjusted subset. The ANOVA for the LEPR gene was significant (P ≤ 0.0004). The 12 heterozygotes had the lowest mean CONTBRCA scores, whereas the mean scores for the 11 and 22 genotypes were similar, providing a gene score of 202. The LEPR gene accounted for 6.4% of the variance in the full set (P ≤ 0.0002), which increased to 10.3% in the age-adjusted subset. The ANOVA for the COMT gene was significant (P ≤ 0.0003), with the highest mean for the 11 genotype and the lowest means for the 12 and 22 genotypes, providing a gene score of 200. The COMT gene accounted for 7.3% of the variance, which increased to 18.8% in the age-adjusted subset (P ≤ 0.0001). As with the chi-square test, the ANOVA was not significant for the DRD2 gene (P ≤ 0.13). The association with the genotypes was 1 allele codominant, producing a gene score of 210. The DRD2 gene accounted for 1.8% of the variance and this was significant (P ≤ 0.05). The percentage of the variance increased to 2.8% in the age-adjusted subset. The ANOVA for the AR gene was significant (P ≤ 0.014). The association was S allele codominant, producing a gene score of 210. The AR gene accounted for 4.0% of the variance (P ≤ 0.0035), which increased to 9.9% in the age-adjusted subset. The ANOVA for the ESR1 gene was not significant (P ≤ 0.39). The mean CONTBRCA scores were highest for the 11 and 12 genotypes, producing a gene score of 220. The ESR1 gene accounted for only 0.9% of the variance (P ≤ 0.17), which decreased to 0.4% for the age-adjusted subset (P ≤ 0.55).
These results indicated that the difference in mean ages between the controls and breast carcinoma subjects in the full set did not explain the positive results. Except for the LEP and ESR1 genes, the percentage of the variance increased for each gene in the age-adjusted subset.
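A minimal sketch of this calculation, assuming that r2 is the squared Pearson correlation between the 0/1 CONTBRCA variable and the gene score (which is what a one-predictor linear regression yields). It uses the LEP genotype counts from Table 1 and the LEP gene score of 102 reported in this section (SS = 1, SL = 0, LL = 2). The function and variable names are hypothetical.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)


# LEP gene score 102: the digits give the scores for SS, SL, LL in that order.
lep_gene_score = {"SS": 1, "SL": 0, "LL": 2}
lep_case_counts = {"SS": 21, "SL": 20, "LL": 26}      # Table 1, breast carcinoma
lep_control_counts = {"SS": 52, "SL": 73, "LL": 20}   # Table 1, controls

gene_scores = []  # independent variable, one entry per subject
contbrca = []     # dependent variable: 1 = breast carcinoma, 0 = control
for genotype, score in lep_gene_score.items():
    gene_scores += [score] * lep_case_counts[genotype]
    contbrca += [1] * lep_case_counts[genotype]
    gene_scores += [score] * lep_control_counts[genotype]
    contbrca += [0] * lep_control_counts[genotype]

lep_r2 = r_squared(gene_scores, contbrca)
print(round(lep_r2, 3))
```

The result is close to the 7.3% of the variance reported for the LEP gene in the full set.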
Multivariate Regression Analysis
Using the CONTBRCA variable as the dependent variable and each gene score as the independent variables, a multivariate regression analysis was performed using SPSS statistical software. Table 3 shows the results for both the full set and the age-adjusted subset. For the full set, all the genes except the ESR1 gene were included in the equation. The genes are sorted by T value. All were significant at P ≤ 0.014. The percentage of the variance ranged from 7.8% for the LEP gene to 2.3% for the DRD2 gene. The total explained variance was 24%. The adjusted value was 22%.
For the age-adjusted subset, the genes are sorted by T value. As for the full set, the ESR1 gene was excluded from the equation for the age-adjusted subset. Even though it was significant when evaluated individually, the AR gene was also excluded from the multivariate analysis. The variances for the remaining genes were 17.0% for the COMT gene, 13.4% for the LEPR gene, 10.1% for the LEP gene, and 2.1% for the DRD2 gene. The total variance explained by all four genes was 40.1% and the adjusted r2 was 0.372.
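The adjusted values quoted for both analyses are consistent with the usual small-sample correction for the number of predictors. The text does not state the formula, so the sketch below assumes the standard one, adjusted r2 = 1 − (1 − r2)(n − 1)/(n − k − 1), with n subjects and k genes in the equation; the function name is hypothetical.

```python
def adjusted_r2(r2, n_subjects, n_predictors):
    """Standard adjusted r-squared (assumed; not stated in the text)."""
    return 1 - (1 - r2) * (n_subjects - 1) / (n_subjects - n_predictors - 1)


# Full set: 67 cases + 145 controls, five genes in the equation.
full_set = adjusted_r2(0.24, 67 + 145, 5)

# Age-adjusted subset: 43 cases + 45 controls, four genes in the equation.
age_subset = adjusted_r2(0.401, 43 + 45, 4)

print(round(full_set, 2), round(age_subset, 3))
```

The penalty is larger for the subset because four predictors are being fitted to only 88 subjects.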
The statistical program used for these studies was multivariate regression analysis (SPSS). However, because the dependent variable was dichotomous, we also performed multivariate logistic regression analysis (SAS). The results were essentially the same. The multivariate regression analysis (SPSS) program produced beta or r values, and thus r2 values, for each gene. When the multivariate logistic regression program (SAS) was used, it was necessary to progressively calculate each r2 value by sequential subtraction.
Breast Carcinoma Risk Score (BCRS) and ROC Plots
The BCRS for each individual consisted of the sum of the gene scores for each gene. For the full set, only the ESR1 gene was excluded. The BCRS varied from 0 (two cases) to 17 (two cases). The mean BCRS was 7.13 (SD 3.90) and the median BCRS was 6.0. It approximated a normal distribution with a skewness of 0.31 and a kurtosis of −0.43. The BCRS for the total set was evaluated in a ROC plot (Fig. 1A). The AUC was 0.80. The likelihood risks ranged from 0.13 for those with a BCRS score of 1 to 11.9 for those with a score of 16. Because calculation of the likelihood risks for BCRS scores of 0 and 17 would involve multiplying or dividing by 0, these could not be calculated.
For the age-adjusted subset, the BCRS varied from 1 (seven cases) to 17 (two cases). The ESR1 gene was excluded. The mean BCRS was 7.80 (SD 4.1) and the median score was 8.0. Again, it approximated a normal distribution with a skewness of 0.14 and a kurtosis of −0.68. Figure 1B shows the BCRS for the age-adjusted subset. The AUC was 0.869. The likelihood risk ranged from 0.10 for those with a BCRS of 1–4, to 11.5 for those with a BCRS of 10–12. It then decreased for scores of 13–17. This was likely to be due to the fact that these scores were fairly rare and subject to fluctuation.
DISCUSSION
This study incorporates a number of unique aspects and features for the investigation of the molecular genetics of breast carcinoma.
Emphasis on Sporadic Breast Carcinoma
Although the identification of single genes is an exciting development in breast carcinoma research, the majority of breast carcinoma cases are sporadic with a minimal or negative family history and are polygenically inherited rather than caused by single genes.
Emphasis on Genes for Known Breast Carcinoma Risk Factors
Because other studies of sporadic breast carcinoma have also evaluated genes for estrogen metabolism, our emphasis on the study of genes related to demographic risk factors is not unique. However, the emphasis on genes especially associated with the age of menarche and related variables is unique.
Evaluation of the Additive Effect of Multiple Genes
The major characteristic of polygenic disorders is that each gene contributes only a small percentage of the total variance. As a result, variation from study to study is the expected outcome.84 Because polygenic disorders are due to the additive effect of multiple genes, they are best studied by evaluating the additive effect of multiple genes.5 This is the approach we used. It provides added power85 and helps to diminish some of the variability among studies. The total explained variance for the five genes was 24% for the full set and 40% for the age-adjusted set. Both were highly significant. It is likely that, in additional studies, one or more of the five genes may not contribute significantly to the risk of breast carcinoma. When evaluated individually, this would be considered a nonreplication. However, if all five genes are included, even if the total variance varies considerably among studies, it is likely that the total variance would be significant for all studies.
Test for r2 Values rather than for Significance
Although significance levels are used commonly in the studies of the genetics of complex disease, a far more important parameter is effect size, which can be measured by the calculation of the r2 score.86 For most of our own studies, r2 values ranged from 0.005 to 0.03.5–7 In the full set analyzed by multivariate regression analysis, r2 values of 0.078 and 0.056, respectively, for the LEP and LEPR genes were high. When only the age-adjusted set was evaluated, r2 values of 0.170–0.101 for these two genes and the COMT gene were even higher.
Production of the BCRS
The individual gene scores can be added to produce a composite summary gene score for the entire set. This has the advantage of providing a single variable whose magnitude is a measure of the number of risk alleles that each woman has inherited. In general, the higher the score, the higher the risk.
ROC Plot of the BCRS
The knowledge that, in general, the higher the BCRS, the greater a woman's risk for breast carcinoma is of limited use by itself. Rather, the estimate of the relative likelihood risk for breast carcinoma for each value of the BCRS, the assessment of the sensitivity and specificity of each value, and the total area under the ROC curve provided more information. The results of these evaluations suggested that these five genes could be of considerable clinical usefulness in assessing a woman's risk for sporadic breast carcinoma. The likelihood risks varied from 0.13 to 11.9 for the full set and from 0.10 to 11.5 for the age-adjusted subset. The risk of developing breast carcinoma among women therefore varied over a 90–115-fold range. If these results are replicated, these five genes could have a considerable impact on clinical diagnosis, prognosis, and treatment. The breast carcinoma risk assessment technique of Gail et al.87, 88 used age at menarche, age at first live birth, number of previous biopsies, and number of first-degree relatives with breast carcinoma. It did not include molecular genetic variables.
Increased Variance for Age-Adjusted Subset
Despite the smaller number of subjects in the age-adjusted subset, the r2 values were higher for the LEP, LEPR, and COMT genes. In addition, the total variance of 0.401 was considerably higher than the total variance of 0.24 for the full set. We propose that this is a reflection of the elimination of the women with breast carcinoma who were older than 75 years of age. This would eliminate the women for whom nongenetic factors such as age were the major risk factors and increase the relative proportion of younger but still primarily postmenopausal women in whom the genetic factors we identified may be more important as risk factors.
This study illustrates how the genotypes of several breast carcinoma risk genes can be combined into a BCRS and how this score can be evaluated in a ROC plot to produce a potentially clinically useful guide to determine a woman's risk for breast carcinoma. Related genes may also be important and can be included in future studies. The identification of a number of polymorphisms on a single chip is now possible. If these results are replicated in larger studies, the use of ROC curves could provide a simple and low cost test to identify a woman's risk for postmenopausal sporadic breast carcinoma. This would be of considerable benefit in allowing the most efficient use of screening and preventive procedures.
- 48. Maternal age as a confounding variable in association studies. Am J Med Genet Neuropsychiatric Genet. 2001; 105: 564.
- 54. Molecular epidemiology of genetic polymorphisms in estrogen metabolizing enzymes in human breast cancer. J Natl Cancer Inst Monogr. 2000; 88: 125–134.
- 82. Selection and interpretation of diagnostic tests and procedures. Ann Intern Med. 1981; 94: 555–600.
- 86. Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Erlbaum, 1988.