Key words: Spermatogenesis; experimental design; testicular composition


ABSTRACT: Sperm production is an important variable affecting the reproductive capacity of men and other male mammals. Because spermatogenesis is highly susceptible to disruption, volume density techniques that enable the composition of testicular tissue to be characterized or sperm production rates to be quantified are used extensively to assess the potential impacts of known or suspected reproductive toxins and the safety of proposed human or animal drugs, as well as in basic studies of spermatogenesis in normal individuals. The number of subjects used per treatment group in such studies has varied. However, the power and sensitivity of any experiment depend on the inherent variability associated with the end point(s) of interest and on the number of replicates (ie, animals or men) employed per treatment group. Because the reliability of experimental outcomes should be of utmost concern, it is important to characterize the typical levels of inherent variability associated with one's chosen end point(s) and to answer the question: how many subjects are required per treatment group to provide an experiment with a given power and sensitivity for detecting actual treatment effects? This study was undertaken to 1) characterize the inherent variability associated with the volume density of several testicular components in rodents, rabbits, and humans and with cell numbers derived from volume density data and 2) identify the approximate number of replicates that would be required to give future studies based on the volume density approach a predictable power and sensitivity. Replication requirements differed, sometimes by several orders of magnitude, among these species and among end points within a single species. In addition, for many of these species and end points, the number of replicates necessary to ensure modest power and sensitivity for detecting treatment differences exceeded that used in most investigations. These findings are discussed in relation to the design and interpretation of future investigations with these species and end points.

Volume density approaches are often applied to characterize the composition of testicular tissues. Volume density is simply the percentage of a tissue occupied by a particular component, and it is usually determined by a method of random hits (Chalkley, 1943; Eschenbrenner et al, 1948). With a sufficient number of “hits,” the frequency at which a given structure is hit will approximate its volume density (eg, if the nuclei of a particular germ cell occupy 5% of the testicular tissue, these nuclei should be hit about 5% of the time).
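The random-hit principle can be illustrated with a small simulation. This is a sketch under stated assumptions, not a stereology protocol: the "section" is a hypothetical unit square in which the component of interest is a strip occupying 5% of the area, points are placed independently at random, and the standard error uses the binomial approximation, which ignores the clustering of structures in real tissue.

```python
import math
import random


def estimate_volume_density(is_hit, n_points, rng):
    """Estimate volume density (as a fraction) by counting random point hits.

    is_hit(x, y) returns True when the point (x, y) lands on the component.
    Returns the estimated fraction and its binomial standard error.
    """
    hits = sum(1 for _ in range(n_points) if is_hit(rng.random(), rng.random()))
    p = hits / n_points
    se = math.sqrt(p * (1 - p) / n_points)
    return p, se


def in_strip(x, y):
    # Hypothetical component: a strip covering 5% of the unit-square section.
    return x < 0.05


rng = random.Random(1)
for n in (100, 1_000, 100_000):
    p, se = estimate_volume_density(in_strip, n, rng)
    print(f"{n:>7} points: Vv = {100 * p:.2f}% (SE {100 * se:.2f}%)")
```

As the number of points grows, the estimate settles near the true 5% and its standard error shrinks roughly in proportion to 1/√n.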

Volume density data may be used to estimate the numeric size of various cell populations or to estimate daily sperm production (DSP; Berndtson, 1977; Russell et al, 1990). To assess cell numbers, one determines the volume density of the nuclei of the cells of interest. The total volume of these nuclei is calculated as the product of their volume density and testicular parenchymal volume. One also determines the mean volume of an individual nucleus for each type of cell. Dividing the total nuclear volume by the volume of an individual nucleus yields an estimate of the total number of cells within the tissue. DSP per gram of tissue or per testis is often estimated by dividing the number of spermatids by a time divisor equivalent to the number of days of sperm production represented by these cells (Amann and Almquist, 1962; Robb et al, 1978; Johnson et al, 1980a, 1981). DSP may also be estimated from the number of younger germ cells (eg, primary spermatocytes), but this requires an adjustment for the yield of sperm cells expected from the divisions of these younger cells. Such an adjustment can introduce inaccuracies because the actual yield can differ from theoretical expectations (Johnson et al, 1984) and because cell yields may be altered by experimental treatments.
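The arithmetic described above can be written out as a short function. This is a minimal sketch: the inputs in the usage example (a 5% nuclear volume density, 1.0 cm³ of parenchyma, a 100-µm³ nucleus, and a 5-day time divisor) are hypothetical round numbers chosen for illustration, not published values, and no correction for tissue shrinkage or section thickness is applied.

```python
UM3_PER_CM3 = 1e12  # 1 cm = 10^4 µm, so 1 cm^3 = 10^12 µm^3


def cells_from_volume_density(nuclear_vv, parenchymal_volume_cm3, nuclear_volume_um3):
    """Estimate cell number from the volume density of their nuclei.

    nuclear_vv: volume density of the nuclei as a fraction (0.05 means 5%).
    parenchymal_volume_cm3: testicular parenchymal volume.
    nuclear_volume_um3: mean volume of one nucleus of this cell type.
    """
    total_nuclear_volume_um3 = nuclear_vv * parenchymal_volume_cm3 * UM3_PER_CM3
    return total_nuclear_volume_um3 / nuclear_volume_um3


def daily_sperm_production(spermatids, time_divisor_days):
    """DSP = spermatid count / days of sperm production those cells represent."""
    return spermatids / time_divisor_days


# Hypothetical inputs for illustration only.
spermatids = cells_from_volume_density(0.05, 1.0, 100.0)
dsp = daily_sperm_production(spermatids, 5.0)
print(f"spermatids: {spermatids:.3g}, DSP: {dsp:.3g} per day")
```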

Volume density data have also been used to calculate the total length of the seminiferous tubules per gram of testis or per testis. To do so, one determines the total volume of the seminiferous tubules as the product of their volume density and total testicular volume. The mean diameter of these tubules is also determined. The volume, cross-sectional area, and length of cylindrical structures such as seminiferous tubules are related by $V = \pi r^{2} l$, for which $V$, $r$, and $l$ equal volume, radius, and length, respectively. Thus, the length of the seminiferous tubules can be calculated by entering the values for seminiferous tubular volume and radius (ie, one-half of the diameter) into this equation, ie, $l = V / (\pi r^{2})$.
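Solving the cylinder equation for length is a one-line calculation once units are made consistent. The sketch below assumes hypothetical inputs (tubules occupying 80% of a 1.0-cm³ testis, with a mean diameter of 250 µm), treats the tubules as ideal cylinders of uniform diameter (the assumption built into the equation), and ignores tissue shrinkage.

```python
import math

UM3_PER_CM3 = 1e12  # 1 cm^3 = 10^12 µm^3


def tubule_length_m(tubule_vv, testis_volume_cm3, tubule_diameter_um):
    """Length of seminiferous tubules from V = pi * r^2 * l, solved for l.

    tubule_vv: volume density of seminiferous tubules as a fraction.
    testis_volume_cm3: testicular volume the volume density applies to.
    tubule_diameter_um: mean tubule diameter in micrometres.
    """
    tubule_volume_um3 = tubule_vv * testis_volume_cm3 * UM3_PER_CM3
    radius_um = tubule_diameter_um / 2  # radius is one-half of the diameter
    length_um = tubule_volume_um3 / (math.pi * radius_um ** 2)
    return length_um / 1e6  # micrometres to metres


# Hypothetical values: tubules occupy 80% of a 1.0-cm^3 testis, 250 µm diameter.
print(f"{tubule_length_m(0.80, 1.0, 250.0):.1f} m of tubule")
```

Because length scales with 1/r², a small bias in measured diameter produces roughly twice that relative bias in the calculated length.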

Conventional practices, cost, and other considerations have probably served as the predominant basis for selecting the number of replicates used in most studies. However, the inherent variability among normal, untreated individuals can differ markedly as a function of the end point being examined, the species and/or age of the experimental subjects, or other variables (Berndtson, 1989, 1990, 1991, 2008; Berndtson et al, 1989; Berndtson and Thompson, 1990). The number of replicates needed to provide a chosen power and sensitivity will vary as a function of these same variables.

Because reliability of experimental data is of utmost importance, investigators should consider the number of replicates (eg, animals/treatment group) necessary to provide an experiment of adequate power and sensitivity. Components-of-variance approaches permit estimation of the replication needed to provide experiments with predictable power and sensitivity (Tang, 1938; Steel and Torrie, 1960; Berndtson, 1991), and these approaches have been used to determine the replication requirements associated with several other methods for quantifying sperm production (Berndtson, 1989, 1990, 2008; Berndtson et al, 1989; Berndtson and Thompson, 1990). Because comparable information for many end points assessed via volume density approaches is limited, this study was undertaken to 1) characterize the inherent variability associated with the volume density of several testicular components and with estimates of cell numbers derived from volume density data and 2) characterize the estimated number of replicates needed to provide experiments of predictable power and sensitivity based on data derived via the volume density approach.

Materials and Methods

The coefficients of variability (CVs) associated with each testicular end point were assessed by surveying the published literature. Because investigators seldom report actual CVs, it was usually necessary to calculate these from the data within each article. This was only possible when either the mean and its SD or the mean, the SE of the mean, and the number of observations in the mean were presented. In the latter case, the SD was obtained from the relationship $\mathrm{SE} = \sqrt{s^{2}/n}$ (ie, $s = \mathrm{SE} \times \sqrt{n}$), for which $s$ equals the SD and $n$ equals the number of observations in the mean. Only data for normal, untreated males were used, and a CV was calculated for each such population whenever the data permitted. Although some CVs were based on populations consisting of very young, very old, or mixed-age individuals, every CV that could be calculated is presented within Supplemental Tables 1 through 11 (available online at …). Their inclusion avoided the introduction of any personal bias that might result from presenting only selected data, and these CVs may be of value to future investigators wishing to undertake studies with such populations. However, the characterization of breeding-age males was considered most appropriate for this study. Because some CVs were based on a larger number of individuals than others, the CV considered to be typical for each end point, as summarized in Table 1, was calculated as the weighted mean of all of the available CVs identified. The only exceptions to the use of weighted-mean CVs are identified in the Results and involved either the exclusion of data derived from prepubertal or senescent subjects or the fact that a CV could be identified from only a single study.
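Recovering the SD from a reported SE is easy to get backwards, so a small helper may be useful when extracting CVs from published summary statistics. This is a minimal sketch; the function name is hypothetical and the numbers in the example are invented to show that the two kinds of report describe the same CV.

```python
import math


def cv_percent(mean, sd=None, se=None, n=None):
    """Coefficient of variability (%) from published summary statistics.

    Supply either the SD, or the SE of the mean together with n.
    SE = sqrt(s^2 / n), so the SD is recovered as s = SE * sqrt(n).
    """
    if sd is None:
        if se is None or n is None:
            raise ValueError("supply sd, or se together with n")
        sd = se * math.sqrt(n)
    return 100.0 * sd / mean


# Hypothetical reports: mean 20 with SD 2, or mean 20 with SE 1 from 4 males.
print(cv_percent(20.0, sd=2.0))         # both describe the same 10% CV
print(cv_percent(20.0, se=1.0, n=4))
```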

Table 1. Typical coefficients of variability (%) associated with selected end points in normal, sexually mature males

| End Point | Rodent | Rabbit | Human |
|---|---|---|---|
| Volume density of seminiferous tubules | 1.6 | 2.5 | 7.6 |
| Volume density of interstitial tissue | 7.4 | 12.7 | 20.3 |
| Seminiferous tubular diameter | 5.8 | 5.1 | 5.3 |
| Seminiferous tubule length | 6.9 | | 16.4 |
| Volume density of type B spermatogonia | 43.5 | | |
| Volume density of preleptotene spermatocytes | 25.7 | | |
| Volume density of pachytene spermatocytes | 12.1 | | |
| Volume density of round spermatids | 12.3 | 14.0 | 40.5 |
| Nuclear diameter of type B spermatogonia | 9.8 | | |
| Nuclear diameter of preleptotene spermatocytes | 3.5 | | |
| Nuclear diameter of pachytene spermatocytes | 1.9 | | 2.8 |
| Nuclear diameter of round spermatids | 2.9 | | 2.3 |
| Nuclear volume of type B spermatogonia | 27.3 | | |
| Nuclear volume of preleptotene spermatocytes | 11.5 | | |
| Nuclear volume of pachytene spermatocytes | 5.6 | | 8.4 |
| Nuclear volume of round spermatids | 9.3 | | 6.4 |
| Volume density of Sertoli cells | 14.2 | | 12.6 |
| Volume density of Sertoli cell nuclei | 21.7 | | 21.3 |
| Volume density of Sertoli nucleoli | | | 22.7 |
| Volume density of Leydig cells | 7.6 | | 41.9 |
| Volume density of Leydig cell nuclei | 13.4 | | 65.5 |
| Volume density of Leydig cell cytoplasm | 9.1 | | 37.4 |
| Volume density of connective tissue cells | 25.8 | | |
| Nuclear area of Sertoli cell | 6.2 | | |
| Nuclear volume of Sertoli cell | 11.9 | | 10.4 |
| Nucleolar volume of Sertoli cell | 6.0 | | 11.7 |
| Volume of Leydig cell | 12.8 | | |
| Nuclear volume of Leydig cell | 11.7 | | |
| Volume of cytoplasm/Leydig cell | 13.3 | | |
| Type B spermatogonia/g | 43.2 | | |
| Preleptotene spermatocytes/g | 27.9 | | |
| Pachytene spermatocytes/g | 12.7 | | |
| Round spermatids/g | 9.4 | | 47.2 |
| Round spermatids per testis | 14.5 | | 53.5 |
| Sertoli cells/g | 11.6 | | 23.7 |
| Sertoli cells per testis | 12.6 | | |
| Leydig cells/g or cm³ | 12.9 | | 30.7 |
| Leydig cells per testis | 17.1 | | |

The number of replicates needed to provide experiments of the chosen power and sensitivity was determined by applying the “typical” CVs (Table 1) in conjunction with tables from which replication can be read directly or determined via extrapolation (Berndtson, 1991). To permit relative comparisons, the number of replicates needed to provide experiments with 90% power to detect 10%, 20%, or 30% changes from the control was identified for each species and end point of interest, and these numbers were summarized in tabular form (Table 2). Investigators wishing to identify replication requirements for an experiment of different power or for a sensitivity outside of this range may readily do so by consulting the published reference tables (Berndtson, 1991) or Supplemental Tables 12 and 13 and entering the appropriate CV.
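The published tables (Berndtson, 1991) and Supplemental Tables 12 and 13 remain the reference for planning. For a quick check, the sketch below swaps in a standard normal-approximation formula for a 2-group, 2-tailed comparison of means, with a commonly used small-sample correction (adding z²/4 for the 2-tailed critical value), and expresses both the SD and the detectable difference as percentages of the control mean. It is not the procedure used to build Table 2, and its results can fall one or two males below the tabulated values, particularly when few males are required.

```python
import math
from statistics import NormalDist


def males_per_group(cv_pct, difference_pct, alpha=0.05, power=0.90):
    """Approximate males per group for a 2-group, 2-tailed comparison.

    cv_pct: coefficient of variability (%), used as the SD in % of the mean.
    difference_pct: smallest change from the control mean (%) to detect.
    Uses n = 2 * (z_{1-alpha/2} + z_{power})^2 * (CV / difference)^2,
    plus z_{1-alpha/2}^2 / 4 as a small-sample correction for using t tests.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)
    z_beta = z(power)
    n = 2 * (z_alpha + z_beta) ** 2 * (cv_pct / difference_pct) ** 2
    n += z_alpha ** 2 / 4
    return max(2, math.ceil(n))


# Typical CVs from Table 1 for the volume density of interstitial tissue.
for species, cv in (("rodent", 7.4), ("rabbit", 12.7), ("human", 20.3)):
    print(species, [males_per_group(cv, d) for d in (10, 20, 30)])
```

For the interstitial-tissue row, this approximation lands on or within about one male below the Table 2 values (eg, 13 rodents and 88 men to detect a 10% difference).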


Results

In general, CVs tend to be greater for testicular components that occupy a smaller portion of the testis, for cells that are less numerous, and so on. Such a tendency was evident among the CVs associated with the volume density of the seminiferous tubules vs interstitial tissue. For example, the CVs associated with the seminiferous tubules (see Supplemental Table 1) for rodents and rabbits were quite small, with weighted means of 1.6% and 2.5%, respectively, whereas the corresponding CV for humans equaled 7.6% (Table 1). In contrast, the weighted-mean CVs associated with the less voluminous interstitial tissue equaled 7.4%, 12.7%, and 20.3% for rodents, rabbits, and humans, respectively (Table 1). These tendencies were expected. The volume densities of the seminiferous tubules and interstitial tissue must total 100%. Because the volume density of the seminiferous tubules is much greater than that of the interstitial tissue in normal postpubertal males, a given absolute deviation from the mean has a smaller impact on the CV for the former than for the latter. For example, if on average the seminiferous tubules occupied 80% of the parenchyma and interstitial tissue occupied 20%, an individual in which 84% and 16% of the testis was occupied by seminiferous tubules and interstitial tissue, respectively, would deviate from that average by having 5% more tubular tissue but 20% less interstitial tissue. Accordingly, it is not surprising that the CVs associated with the volume density of the interstitial tissue were larger than those for the seminiferous tubules (see Supplemental Table 1).
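The arithmetic behind this argument can be checked directly. Because the two volume densities sum to 100% in every male, their SDs are identical, so the ratio of their CVs is simply the inverse ratio of their means. The sketch reproduces the 80%/84% example above; in the second function, the 1.6% input is the rodent tubule CV from Table 1, but the 80% mean is hypothetical, so the resulting 6.4% is illustrative rather than a prediction of the observed 7.4%.

```python
def relative_deviation_pct(value, mean):
    """Deviation of one male's value from the group mean, as % of the mean."""
    return 100 * (value - mean) / mean


# The worked example from the text: tubules 80% on average, 84% in one male.
mean_tubule, male_tubule = 80.0, 84.0
mean_interstitial, male_interstitial = 100 - mean_tubule, 100 - male_tubule
print(relative_deviation_pct(male_tubule, mean_tubule))              # 5.0
print(relative_deviation_pct(male_interstitial, mean_interstitial))  # -20.0


def interstitial_cv_from_tubule_cv(tubule_cv_pct, mean_tubule_pct):
    """Tubule + interstitial = 100% in every male, so both share one SD;
    their CVs therefore differ only by the ratio of their means."""
    sd = tubule_cv_pct * mean_tubule_pct / 100
    return 100 * sd / (100 - mean_tubule_pct)


# Hypothetical 80% tubule mean combined with the rodent tubule CV of 1.6%.
print(interstitial_cv_from_tubule_cv(1.6, 80.0))
```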

CVs associated with seminiferous tubular diameter (see Supplemental Table 2) were relatively small and similar among the different species examined. One of these studies (Iczkowski et al, 1991) provided data for rabbits that were only 1 to 9 weeks old. When data for those animals were excluded, the weighted-mean CVs equaled 5.8%, 5.1%, and 5.3% for rodents, rabbits, and humans, respectively (Table 1).

Investigations from which the variability in the length of the seminiferous tubules could be calculated (see Supplemental Table 3) were quite limited. Excluding data from 21- and 39-day-old rats, the weighted-mean CV for rodents in 3 different studies was 6.9% (Table 1). Only 1 data set was identified for the human, for which the CV equaled 16.4% (Table 1). Although CVs for rabbits appeared greater than those for rodents or men (see Supplemental Table 3), the only data located were for rabbits 1 to 7 weeks of age. Those values may be of interest to some readers, but it cannot be assumed that they would be similar to those for males of reproductive age.

Although the seminiferous tubules and interstitial tissue represent the 2 main components of the testicular parenchyma, each is composed of many other components. Unfortunately, many of the reports characterizing the volume density of specific testicular components did not contain all of the information required to calculate CVs. Although only limited data can be presented for some components (see Supplemental Table 4), these data are informative, and they should serve as a useful resource for investigators wishing to evaluate one or more of these lesser components in future studies of estimated power and sensitivity.

From data presented by Johnson et al (1980b), a CV of 2.4% was calculated for the volume density of the seminiferous epithelium in the rat. This CV was similar to those for the seminiferous tubules in rodents (see Supplemental Table 1), as expected, because the seminiferous epithelium is the major constituent of the seminiferous tubules. Each of the other components listed in Supplemental Table 4 occupies a much smaller volume of the testicular parenchyma. Accordingly, the CVs associated with these components tended to be much greater than those for any of the more prominent components described previously. Because the data are limited, the author has not attempted to identify a typical CV or to characterize the replication that would be needed to assess these end points with a given level of power and sensitivity. Others interested in making such assessments may readily do so by consulting the reference tables described herein (Berndtson, 1991) or Supplemental Tables 12 and 13 and using the reported CVs for the end point(s) of interest.

CVs associated with the volume density of germ cell nuclei are summarized in Supplemental Table 5. Considerable difficulty was encountered in locating information on this variable. Although many publications report germ cell numbers determined via the volume density approach, the volume density data from which these cell numbers were determined have been reported infrequently. Nonetheless, the data within Supplemental Table 5 are useful and informative.

The CVs associated with the volume density of the nuclei of type B spermatogonia, preleptotene primary spermatocytes, and pachytene primary spermatocytes were determined from the data of a single investigation in the rat (Johnson et al, 1984a) and averaged 43.5%, 25.7%, and 12.1%, respectively (Table 1). The corresponding weighted-mean CV associated with the volume density of round spermatid nuclei was 12.3% (Table 1). The relative magnitudes of the values for these different cell populations are consistent with the general tendency, cited previously, for larger CVs to be associated with tissue components that occupy a smaller proportion of the tissue. Similarly, Amann (1970) reported the volume densities of round spermatids at a single stage (stage I) and for all stages during which these cells are present (stages V to I) in the rabbit, for which the CVs were 24.4% and 14.0%, respectively (see Supplemental Table 5). The difference between the CVs for the larger (ie, pooled stages) vs more restricted (ie, single stage) population of round spermatids is noteworthy; it reinforces how a given absolute deviation produces a larger CV for components with lower volume densities. It should be noted that the CVs associated with round spermatids in the rat (see Supplemental Table 5) were based on the total numbers of such cells, irrespective of the stage of the cycle of the seminiferous epithelium. The weighted-mean value of 12.3% for that species is comparable to the value of 14.0% for the equivalent population of round spermatids in the rabbit (Table 1). However, the corresponding CVs in 2 different investigations with humans were considerably larger (see Supplemental Table 5), with a weighted mean of 40.5%. Supplemental Table 5 also contains several CVs that were determined for specific combinations of cells (eg, pachytene plus diplotene spermatocytes, abnormal germ cells, total germ cells). These data may be useful to some readers but are not considered further herein.

The CVs for the diameter or volume of germ cell nuclei are summarized in Supplemental Table 6. Because of the relationship between the diameter and volume of a sphere, $V = \pi D^{3}/6$, for which $D$ is the diameter, a small relative change in diameter produces an approximately 3-fold relative change in volume; accordingly, the CVs for nuclear volume were approximately 3 times as large as those for nuclear diameter of the same cells within the same sample populations. The author was unable to identify comparable data for the rabbit, and only a single value could be identified for each of the types of germ cells recorded for the human. For each of the germ cell populations in rats other than spermatogonia, the CVs associated with nuclear diameter and volume were relatively small, ranging from 1.9% to 3.5% and from 5.6% to 11.5%, respectively (Table 1).
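The 3-fold relationship follows from first-order error propagation: since $V \propto D^{3}$, $\Delta V / V \approx 3\,\Delta D / D$, so $\mathrm{CV}_V \approx 3\,\mathrm{CV}_D$ when the CV is small. The simulation below checks this under hypothetical assumptions (normally distributed diameters with a mean of 10 µm and a 3% CV); it is an illustration, not a reanalysis of any published data.

```python
import math
import random
import statistics


def cv(values):
    """Sample coefficient of variability as a fraction."""
    return statistics.stdev(values) / statistics.fmean(values)


rng = random.Random(42)
# Hypothetical nuclei: mean diameter 10 µm with a 3% CV (normal distribution).
diameters = [rng.gauss(10.0, 0.3) for _ in range(20_000)]
volumes = [math.pi * d ** 3 / 6 for d in diameters]  # V = pi * D^3 / 6

print(f"CV of diameter: {100 * cv(diameters):.2f}%")
print(f"CV of volume:   {100 * cv(volumes):.2f}%")
print(f"ratio:          {cv(volumes) / cv(diameters):.2f}")
```

The ratio drifts slightly above 3 as the diameter CV grows, because the cubic relationship is no longer close to linear over the range of diameters.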

The CVs associated with the percentage of the testis occupied by somatic cells or their components are summarized in Supplemental Table 7, and the CVs associated with various dimensions of these cells or their components are given in Supplemental Table 8. As with several other variables described previously, determination of representative CVs for somatic cell dimensions proved challenging because the data needed to calculate these were included infrequently within the literature and such data could not be identified for rabbits. However, the CVs for the volume density of Sertoli cells and Sertoli nuclei in one population of rats were determined to equal 14.2% and 21.7%, respectively (Table 1). The corresponding CVs for humans were similar, with weighted means of 12.6% and 21.3%, respectively, and the CV associated with the volume density of human Sertoli nucleoli was 22.7% (Table 1). The mean CVs associated with the volume density of Leydig cells differed substantially between rodents and the human, for which the weighted-mean values equaled 7.6% and 41.9%, respectively (Table 1). A difference owing to species was also detected for the volume density of Leydig cell nuclei and cytoplasm; the CVs associated with these nuclei in rodents and humans equaled 13.4% and 65.5%, whereas the corresponding values for the cytoplasm equaled 9.1% and 37.4%, respectively (Table 1 and Supplemental Table 7). Only 1 study could be identified from which it was possible to calculate the CV associated with the volume density of connective tissue cells. For that investigation, the CV among hamsters equaled 25.8%.

Several investigators have estimated the volumes of irregularly shaped nuclei by determining their area in serial sections of confirmed thickness. This approach has been especially useful for Sertoli nuclei, for which the CVs for nuclear area in rats of different ages ranged from 5.0% to 8.9% (see Supplemental Table 8). Excluding values for rats aged 22 days, the weighted-mean CV for this end point was 6.2% (Table 1). As expected, the CVs for Sertoli nuclear volume were larger, with a value of 11.9% calculated for the rat and a weighted-mean value of 10.4% for 18- to 71-year-old men (Table 1). The corresponding CV for Sertoli nucleolar volume in the human was 11.7% (Table 1). Because of the geometric relationship between the diameter and volume of spherical structures cited previously, one would anticipate an approximately 3-fold greater CV for volume than for diameter measurements. On that basis, the CV of 15.3% calculated for Sertoli nucleolar diameter in 45-day-old rats seemed quite large (ie, the corresponding CV for volume would likely approach 45%, whereas the CV for nucleolar volume was only 11.7% for humans). For those reasons, and because the unexpectedly large CV was based on only 5 subjects, the author calculated the CVs for Sertoli nucleolar diameter from 2 larger data sets in his possession that originated from previously published investigations. One of these involved 72 untreated rats aged 60, 150, or 240 days (Berndtson and Thompson, 1990), for which the CV for Sertoli nucleolar diameter was only 5.7%. The second involved 51 hamsters that were either maintained under a 14:10-hour light/dark cycle or allowed to undergo testicular degeneration and subsequent recrudescence in response to altered photoperiod (Berndtson and Desjardins, 1974). The CVs for Sertoli nucleolar diameter in that study equaled 6.9% and 5.6% for the 5 untreated controls and for the 51 pooled treated and untreated subjects, respectively. Photoperiod is a natural factor controlling testicular function in the golden hamster, and the CVs based on the 5 subjects maintained under the 14:10-hour light/dark photoperiod vs all 51 hamsters in the study were similar. For both of those reasons, it seemed appropriate to use the data for all 51 subjects for this investigation. Based on all 3 studies (see Supplemental Table 8), the weighted mean was determined to equal 6.0% (Table 1).
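Materials and Methods describes each typical CV as a weighted mean. Assuming each study's CV is weighted by the number of subjects it is based on (an inference from the text), the sketch below reproduces the 6.0% value from the 3 studies just described: 15.3% from 5 rats, 5.7% from 72 rats, and 5.6% from 51 hamsters. The function name is hypothetical.

```python
def weighted_mean_cv(studies):
    """Weighted mean of CVs, weighting each study by its number of subjects.

    studies: iterable of (cv_percent, n_subjects) pairs.
    """
    studies = list(studies)
    total_n = sum(n for _, n in studies)
    return sum(cv * n for cv, n in studies) / total_n


# The 3 Sertoli nucleolar CVs described above: 45-day-old rats,
# 60- to 240-day-old rats, and the pooled hamsters.
sertoli_nucleolar = [(15.3, 5), (5.7, 72), (5.6, 51)]
print(f"{weighted_mean_cv(sertoli_nucleolar):.1f}%")  # 6.0%
```

Weighting by subject count keeps the unexpectedly large CV from the 5-rat study from dominating the typical value.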

Data pertaining to the volume of individual Leydig cells and to their nuclei or cytoplasm could only be identified for the hamster (see Supplemental Table 8). For that species, the CVs associated with these 3 variables equaled 12.8%, 11.7%, and 13.3%, respectively (Table 1).

CVs associated with numbers of germ cells per unit of testicular tissue are presented in Supplemental Table 9. For rodents, the CVs associated with round spermatids per gram ranged from 9.2% to 10.1% (mean, 9.4%; Table 1), whereas the CV for round spermatids per testis equaled 14.5% (Table 1). The corresponding CVs were much larger for humans, with values of 47.2% and 53.5% for the numbers of these cells per gram and per testis, respectively (Table 1). Among rats, the CVs associated with the number of germ cells per gram decreased markedly as these cells progressed to more developmentally advanced types. For example, the CV for type B spermatogonia was 43.2%, vs only 9.2% for round spermatids within the same study. Such a general trend should be expected: spermatogenesis involves a series of cell divisions by which a small number of spermatogonia occupying only a small portion of the testis yield large numbers of spermatids with a substantially greater volume density, and, as noted previously for other variables, CVs tend to be greater for testicular components that occupy a smaller proportion of the testis. Although data equivalent to those for the rat were not identified for other species, this general trend would be expected among the various types of germ cells within all species. The remaining CVs presented in Supplemental Table 9 were based on specific combinations of cells rather than a single cell type. These may be useful to some investigators but are not considered further herein.

The CVs associated with numbers of Sertoli cells per unit of testicular tissue are presented in Supplemental Table 10. Because the specific gravity of testicular tissue is very close to 1.0 (eg, human: Johnson et al, 1981; bull: Swierstra, 1966; stallion: Gebauer et al, 1974; Johnson and Neaves, 1981; mouse: Mori et al, 1982; rat: de Jong and Sharpe, 1977), it seemed reasonable to consider values expressed per gram or per cm³ as being equivalent. When data for 21- or 22-day-old rodents were excluded, the CVs associated with the number of Sertoli cells per gram (or per cm³) ranged from 6.3% to 16.2% in rodents, with a weighted-mean value of 11.6% (Table 1). This value was approximately one-half of the weighted-mean CV of 23.7% for the human (Table 1).

Expressing Sertoli cell numbers in rodents on a per-testis basis tended to yield CVs similar to those obtained when they were expressed per gram of tissue. Excluding data for 21- or 22-day-old animals, the CVs associated with Sertoli cell number per testis ranged from 6.6% to 16.2%, with a weighted-mean value of 12.6% (Table 1). Supplemental Table 11 contains the CVs associated with the numbers of Leydig cells per unit of testicular tissue. For rodents, the CVs associated with the number of Leydig cells per gram or per cm³ of tissue ranged from 9.3% to 17.2% (weighted mean, 12.9%), and the corresponding CV for this variable expressed per testis equaled 17.1% (Table 1). Greater variability might be anticipated for the numbers of most cells when expressed per testis than per gram because the former reflects variability both in the number of cells per unit of tissue and in testicular size. A single value of 30.7% for Leydig cell number per cm³ was identified from one study with men (Table 1). Because this was based on subjects ranging from 65 to 87 years of age, it is not known whether this CV would apply to a younger population of normal reproductive age.
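Why per-testis values tend to vary more can be seen from the CV of a product: cells per testis = (cells per gram) × (testis weight in grams). If the two factors were independent, then, with CVs written as fractions, $\mathrm{CV}^{2}_{XY} = \mathrm{CV}^{2}_{X} + \mathrm{CV}^{2}_{Y} + \mathrm{CV}^{2}_{X}\mathrm{CV}^{2}_{Y}$. The sketch assumes independence; in real testes the two factors may be correlated, in which case a covariance term is needed and the per-testis CV can be smaller or larger than this estimate. The 10% CV for testis weight in the example is hypothetical.

```python
import math


def cv_of_product_pct(cv_a_pct, cv_b_pct):
    """CV (%) of X*Y for independent X and Y with the given CVs (%).

    Exact for independent factors: CV^2 = a^2 + b^2 + a^2 * b^2 (fractions).
    """
    a, b = cv_a_pct / 100, cv_b_pct / 100
    return 100 * math.sqrt(a ** 2 + b ** 2 + (a * b) ** 2)


# Rodent Leydig cells/g CV from Table 1 (12.9%) combined with a hypothetical
# 10% CV for testis weight, assuming the two are independent.
print(f"{cv_of_product_pct(12.9, 10.0):.1f}%")
```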


Discussion

As evident from the data in Supplemental Tables 1 through 11, large differences exist among the CVs associated with specific end points and among different species. Variability could arise from a number of sources, including technician error, inadequate or unrepresentative sampling within a single testis or subject, and actual differences among individuals and species. Because the raw data were not available, it was not possible to partition the variability among these potential sources. Nonetheless, because most CVs presented herein are based on data from a number of published studies, they do characterize the differences that are being encountered among individual males when studies are conducted with the level of attention to technical precision and the sampling protocols employed within this discipline. The CVs are presented with that understanding. However, because inherent variability is a major factor affecting the power and sensitivity of an experiment, it is appropriate to consider whether the replication requirements presented herein might be reduced by better control of technical and sampling errors. Because these issues have been examined quite extensively in other studies (Berndtson, 1989, 1991, 2008), they will be discussed only briefly herein.

Needless to say, measurement error should be minimized within practical limits. Numerous potential sources of such error associated with volume density and other approaches for assessing sperm production have been described (Berndtson, in press); these include tissue shrinkage, differences between the microtome setting and actual section thickness, inaccurate estimation of the volume of nonspherical nuclei, and others. These require constant attention, and the increased efforts of many researchers to accurately assess and correct for tissue shrinkage, improve the precision of nuclear volume measurements, and so on are commendable. However, one frequent shortcoming is the tendency for these same researchers to limit precise measurements to a small number of specimens (eg, to measure the volume of only a few nuclei) and to assume that the precision of these limited measurements ensures a high degree of precision or accuracy for the experiment. Clearly, one must take an adequate number of measurements per subject to quantify a given end point accurately for that individual. Unfortunately, efforts to determine the number of measurements needed to accurately characterize specific end points within a single individual appear quite limited. However, the relationship between the number of observations per animal and the replication needed for detection of a treatment effect has been examined in other studies (Berndtson et al, 1989; Berndtson and Thompson, 1990), and the statistical approaches used in those studies might prove useful to others wishing to make similar determinations for other end points.

Differences have also been recorded between the 2 testes of individual males (Berndtson, 1989; Berndtson and Thompson, 1990), and such differences are an ever-present possibility within any study. Unfortunately, attempts to validate reliance on evaluations of a single testis or to establish the uniformity of testicular tissue from multiple loci within a single testis are often inadequate. For example, some investigators have examined left vs right testes and concluded that these do not differ within individual males based on the absence of a statistically significant difference between the means for each side. Such an interpretation is clearly flawed. Although such analyses are appropriate to assess whether the means for the left vs right testes differ, they do not provide a basis for judging differences within a single male. For example, within-animal differences in testis weight could remain undetected with such analyses if the left testis was larger than the right testis in some individuals and vice versa in others. Assessments intended to confirm the uniformity of tissue among different loci within a single testis or to validate the use of a small sample of tissue (eg, biopsy specimens) are also frequently inadequate. This view is based on the misuse of statistical analyses described previously (eg, concluding that testicular tissues from dorsal, ventral, or other loci are similar within a testis based on analyses designed to compare group means for these loci) and on numerous instances in which the uniformity of tissues within a testis has been judged via small, preliminary studies lacking sufficient power and sensitivity to identify meaningful differences that might exist.
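A small hypothetical data set makes the point concrete. The testis weights below are invented for illustration: the right-testis values are the left-testis values rearranged among males, so the side means are identical even though every male has a sizable left/right difference. Any analysis limited to comparing the 2 side means is blind to this; within-male differences have to be examined directly, for example from each male's paired values.

```python
import statistics

# Hypothetical testis weights (g) for 6 males; left is heavier in 3, right in 3.
left = [1.60, 1.30, 1.75, 1.40, 1.55, 1.20]
right = [1.30, 1.60, 1.40, 1.75, 1.20, 1.55]

# A comparison of group means sees no left/right difference at all...
mean_diff = statistics.fmean(left) - statistics.fmean(right)

# ...yet every male differs by more than 20% between his 2 testes.
within_male = [abs(lt - rt) / ((lt + rt) / 2) for lt, rt in zip(left, right)]

print(f"difference between side means: {mean_diff:.3f} g")
print("within-male differences:", [f"{100 * d:.0f}%" for d in within_male])
```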

Although measurement and sampling errors merit consideration, their importance must be kept in perspective. Inherent variability is a characteristic of all living things, and it is apparent among the various end points used to assess the testis. In the author's experience, among-animal variability will usually be much greater than any variability associated with measurement error or inadequate sampling within animals. For example, Foote et al (1986) reported the testicular weights for each of 5 to 6 untreated rabbits used in one study. The largest testes were nearly twice as large as the smallest testes within this sample population. Given the accuracy and precision of laboratory balances and the simplicity of measuring organ weights, such differences cannot be attributed to measurement error. Rather, they must be attributed to actual among-animal differences that will not be reduced by choosing other methods for assessing testicular size or by adopting extreme measures for obtaining accurate testicular weights. Moreover, because testicular weight is highly correlated with sperm production (Amann and Almquist, 1962; Berndtson, 2008) and many other testicular end points, the inherent among-animal variability for many testicular end points will be substantial. For such reasons, it is unlikely that the CVs presented herein have been greatly affected by the precision of individual measurements, the evaluation of one vs both testes per male, or the sampling intensity used for each tissue specimen. Those sources of variability are likely to be minimal compared with the actual variability among males and the limited numbers of males in most studies (see Supplemental Tables 1 through 11).
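The reasoning above can be expressed with a minimal 2-level components-of-variance model, in the spirit of the components-of-variance approaches cited in the introduction but not a reanalysis of any data here. The sketch assumes that among-male and within-male (measurement plus sampling) components are independent and additive, with both expressed as CVs relative to the same overall mean; the 10% and 15% inputs are hypothetical.

```python
import math


def cv_of_male_means(among_cv_pct, within_cv_pct, measurements_per_male):
    """CV (%) of per-male means under a 2-level variance-components model.

    Var(male mean) = among-male variance + within-male variance / m,
    with both components expressed as CVs relative to the same overall mean.
    """
    var = among_cv_pct ** 2 + within_cv_pct ** 2 / measurements_per_male
    return math.sqrt(var)


# Hypothetical: 10% true among-male CV, 15% CV for a single measurement.
for m in (1, 5, 25, 100):
    print(f"{m:>3} measurements/male -> CV of male means {cv_of_male_means(10, 15, m):.1f}%")
```

However many measurements are taken per male, the CV of the male means cannot fall below the among-male component, which is why added within-male effort yields diminishing returns once that component dominates.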

Larger CVs were usually noted for humans than for rodents and rabbits. Laboratory animals have a high degree of genetic homogeneity, and most studies with them include excellent control of variables such as environmental conditions and age. In contrast, because tissues from men of reproductive age are difficult to obtain, investigators have often evaluated postmortem samples from men of variable age and uncontrolled environmental backgrounds. The greater among-male variability in such populations is not surprising.

The typical CVs associated with a large number of potential end points are summarized in Table 1, and replication requirements based on these CVs are presented in Table 2. Because the number of end points is so large (Table 1) and the power and sensitivity that one might desire is variable, only selected end points will be used subsequently to demonstrate the application of these data and to illustrate the relative impact of differences in the magnitude of individual CVs on replication requirements and/or experimental power and sensitivity. The end points selected for this purpose were ones with CVs that encompassed the broad range of such values identified in this review (Table 1).

Table 2. Number of males needed per treatment group to provide an experiment of 90% power for detecting a 10%, 20%, or 30% difference (P < .05) from the control mean^a

| End point | Rat 10% | Rat 20% | Rat 30% | 10% | 20% | 30% | Human 10% | Human 20% | Human 30% |
|---|---|---|---|---|---|---|---|---|---|
| Volume density of seminiferous tubules | 3 | 2 | 2 | 4 | 2 | 2 | 14 | 5 | 3 |
| Volume density of interstitial tissue | 13 | 5 | 3 | 36 | 11 | 6 | 88 | 24 | 12 |
| Volume density of type B spermatogonia | 400 | 101 | 46 | | | | | | |
| Volume density of preleptotene spermatocytes | 141 | 36 | 17 | | | | | | |
| Volume density of pachytene spermatocytes | 33 | 10 | 5 | | | | | | |
| Volume density of round spermatids | 33 | 10 | 5 | | | | | | |
| Round spermatids/g | 21 | 7 | 4 | 43 | 12 | 6 | 607 | 153 | 69 |
| Round spermatids per testis | 46 | 13 | 7 | | | | | | |
| Sertoli cells/g | 31 | 9 | 5 | | | | 120 | 32 | 15 |
| Leydig cells/g | 37 | 11 | 6 | | | | | | |

^a For 2-tailed tests with 2-treatment experiments. For experiments with a 1-tailed test, the replication shown would provide an experiment of 95% power at P < .025.

The number of replicates needed to detect a treatment response of only 10% in an experiment of 90% power ranged from a low of 3 for the volume density of the seminiferous tubules in the rat to a high of 607 for assessments based on the numbers of round spermatids per gram in the human. The tremendous difference noted in this comparison is directly attributable to the inherent variability associated with these 2 end points and species, for which the typical CVs were 1.6% and 47%, respectively. The inherent variability associated with many end points would appear to be of a magnitude that would permit relatively powerful and sensitive experiments to be achieved with levels of replication that may be substantial but not entirely prohibitive (Table 2). In that regard, the quantification of germ cell numbers is one of the most important end points for assessing possible effects of known or suspected reproductive toxins, proposed new human or animal drugs, or other factors affecting sperm production. An experiment of 90% power for detecting a 10% treatment response via the numbers of round spermatids per testis would require approximately 46 rats per treatment group, whereas the use of 13 rats per group should permit detection of a corresponding response of 20% (Table 2). Unfortunately, the inherent variability associated with many important end points appears quite substantial for the human, and robust experiments with this species will remain especially challenging. Indeed, because the CV for the number of round spermatids per testis was 53.5% in the human, one would expect an experiment of only 80% power for detecting a 20%, 40%, or 60% treatment response to require approximately 114, 30, and 20 subjects, respectively.
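The dependence of replication on the CV can be approximated in a few lines of code. The sketch below uses the normal-approximation sample-size formula for a 2-tailed, 2-group comparison, n = 2(z₁₋α/₂ + z₁₋β)²(CV/Δ)², plus a commonly used small-sample correction of z₁₋α/₂²/4 per group, often attributed to Guenther. This is an approximation chosen for illustration. It is not the method behind Table 2, which draws on the reference tables cited in this paper (Berndtson, 1991). Its results can therefore differ from those tables, particularly when only a few replicates are needed. For the human CV of 53.5% at 80% power, it gives approximately 114 and 30 subjects for 20% and 40% responses, matching the values quoted above.

```python
import math
from statistics import NormalDist

def replicates_per_group(cv: float, difference: float, power: float = 0.90,
                         alpha: float = 0.05) -> int:
    """Approximate replicates per group for a 2-tailed, 2-group comparison.

    cv and difference are fractions of the control mean (eg, 0.535 and 0.20).
    Normal-approximation formula plus a small-sample correction of
    z_{alpha/2}**2 / 4; an approximation, not exact power tables.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # 2-tailed critical value
    z_beta = z(power)           # standard-normal quantile for the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * (cv / difference) ** 2 + z_alpha ** 2 / 4
    return math.ceil(n)

# CV of 53.5% for round spermatids per testis in the human (from the text), 80% power.
for diff in (0.20, 0.40):
    n = replicates_per_group(0.535, diff, power=0.80)
    print(f"{diff:.0%} response: about {n} men per group")
```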

Note that the replication requirements presented in Table 2 are intended for use with 2-tailed statistical tests (ie, tests in which a treatment could cause either an increase or a decrease in a treatment mean; see Table 2 footnote). For some investigations, a 1-tailed test might be justified. For example, it might be reasonable to anticipate that a known or suspected toxin could depress sperm production but would be unlikely to enhance it. Accordingly, reference tables for using the CV to determine replication requirements for 1-tailed tests were prepared and are presented in Supplemental Tables 12 and 13. Fewer replicates are required for 1-tailed tests, although the numbers are still substantially greater than those used in most experiments. In general, the replication requirements estimated for experiments of 90% or 95% power will be approximately two-thirds as great when 1-tailed tests are applicable as for corresponding studies with 2-tailed tests. Readers are cautioned, however, that a 1-tailed test is not justified for every end point in a study of a potential toxin, because the direction of response of many variables may be unpredictable. For example, if a toxin reduced the volume density of one component, the volume density of one or more other components would necessarily increase, simply because the volume densities of all components must total 100%.
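The closing point about volume densities follows from simple arithmetic. Volume densities are fractions of a fixed total, so a loss in one component raises the densities of all the others, even when their absolute volumes are unchanged. The sketch below uses hypothetical component volumes (arbitrary units, not data from this paper) in which a toxin halves germ-cell volume.

```python
from typing import Dict

def volume_densities(volumes: Dict[str, float]) -> Dict[str, float]:
    """Convert absolute component volumes into volume densities (fractions of the total)."""
    total = sum(volumes.values())
    return {name: volume / total for name, volume in volumes.items()}

# Hypothetical absolute volumes (arbitrary units) of four testicular components.
control = {"germ cells": 60.0, "Sertoli cells": 10.0,
           "tubular lumen": 10.0, "interstitial tissue": 20.0}
# Suppose a toxin halves germ-cell volume and leaves every other absolute volume unchanged.
treated = {**control, "germ cells": 30.0}

before, after = volume_densities(control), volume_densities(treated)
for name in control:
    print(f"{name:20s} {before[name]:6.1%} -> {after[name]:6.1%}")
```

Here the volume density of interstitial tissue rises from 20% to about 29% without any change in its absolute volume. A 1-tailed test that anticipated only decreases would therefore be inappropriate for that end point.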

The present effort required an extensive review of the scientific literature. For most of those investigations from which CVs could be determined (see Supplemental Tables 1 through 11), the number of replicates was fewer than 10; for many, it was 5 or fewer. The use of a limited number of replicates increases the potential for the disproportionate assignment of animals with high or low values to individual treatment groups. The impact of this would be of lesser consequence when the end point of interest is quite homogeneous within the sample population (ie, if the end point of interest had a small CV) than for those end points of greater variability. It is for that reason that replication requirements differ as a function of the CV.
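The link between chance imbalance, group size, and the CV can be quantified under simple assumptions. Suppose animals are randomly assigned, values are approximately normal, and there is no treatment effect. The difference between 2 group means then has an SD of CV·√(2/n), expressed as a fraction of the mean. The sketch below uses that normal approximation to estimate how often random assignment alone would produce an apparent 10% difference. It uses the 1.6% and 47% CVs quoted in the text as the two extremes.

```python
import math
from statistics import NormalDist

def chance_difference_sd(cv: float, n: int) -> float:
    """SD of the difference between 2 group means (fraction of the mean) when
    n animals per group are randomly assigned and there is no treatment effect."""
    return cv * math.sqrt(2 / n)

def prob_chance_difference(cv: float, n: int, threshold: float) -> float:
    """Normal-approximation probability that random assignment alone yields
    group means differing by at least `threshold` (fraction of the mean)."""
    sd = chance_difference_sd(cv, n)
    return 2 * (1 - NormalDist().cdf(threshold / sd))

# CVs of 1.6% and 47% are the extremes quoted in the text.
for cv in (0.016, 0.47):
    for n in (5, 10):
        p = prob_chance_difference(cv, n, 0.10)
        print(f"CV={cv:.1%}, n={n:2d}: P(chance difference of 10% or more) = {p:.3f}")
```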

The replication requirements presented in Table 2 are intended to serve as a representative sample for illustration. Those wishing to estimate the replication needed to provide experiments of a different power or range of sensitivities may do so by using the power and sensitivity reference tables (Berndtson, 1991; see Supplemental Tables 12 and 13) and either the CVs summarized herein or ones generated from other data sets. When using such data, investigators are cautioned not to judge the value of specific end points or evaluation methods based on a simple comparison of the number of replicates required with each. Imagine, for example, that an investigator wished to assess treatment effects on sperm production in the rat by quantifying the number of round spermatids per gram of tissue or per testis. Imagine further that a treatment caused an actual 30% decrease in sperm production. Such a decrease should be detectable in an experiment of 90% power by assessing the number of round spermatids per testis with 7 rats per treatment group (Table 2). Although a corresponding 30% decrease in the number of round spermatids per gram should be detectable with only 4 rats per treatment group (Table 2), decreases in sperm production are often accompanied by decreases in testis size. The hypothetical 30% decrease in sperm production would only produce a 30% decrease in the number of round spermatids per gram if testicular weight remained unchanged. If, for example, the treatment caused a corresponding 30% decrease in testis weight, the number of spermatids per gram would be unaltered. Unfortunately, the response of individual end points to an identical treatment is usually variable and may be difficult to predict in advance. 
Accordingly, predicting which method(s) might be the most powerful or sensitive for detecting a treatment response is a somewhat challenging and potentially imprecise process that cannot be based on a simple comparison of relative replication requirements alone. In another study, several methods by which one might detect changes in sperm production were ranked on the basis of their likely ability to detect changes in sperm production (Berndtson, 2008). The approach used to develop that ranking may serve as a model for similar comparisons among other end points.
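The per-gram vs per-testis example above rests on one identity: spermatids per gram equal spermatids per testis divided by testis weight. The sketch below applies the actual 30% decrease in sperm production from the example and combines it with hypothetical changes in testis weight. It shows how the per-gram response shrinks, and can disappear entirely, as testis weight falls.

```python
def per_gram_change(production_factor: float, weight_factor: float) -> float:
    """Relative change in spermatids per gram when spermatids per testis are
    multiplied by production_factor and testis weight by weight_factor."""
    return production_factor / weight_factor - 1

# An actual 30% decrease in sperm production (factor 0.70), as in the example,
# combined with hypothetical changes in testis weight.
for weight_factor in (1.00, 0.85, 0.70):
    change = per_gram_change(0.70, weight_factor)
    print(f"testis weight x{weight_factor:.2f}: per testis -30%, per gram {change:+.1%}")
```

The smaller replication requirement for spermatids per gram is therefore an advantage only when the per-gram measure actually responds to the treatment.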

It would be an unfortunate and unintended consequence for readers to regard the volume density approach as being of limited value for assessing some end points because a large number of replicates is needed to provide highly sensitive experiments of great power. In that regard, it is important to remain aware that variability is an inherent characteristic of all living things. As illustrated previously with data for testicular weights of rabbits, that variability, and not insensitivity or inaccuracy of the evaluation methods per se, is the primary factor impacting the CVs for each end point in most investigations. Readers should also avoid concluding that the inherent variability associated with reproductive end points is greater than that for many other body systems. The author's examination of data within a wide array of disciplines suggests that the among-animal variability associated with the reproductive system is not unusual or exceptionally large.

Despite pressures to reduce animal numbers, studies by regulatory agencies must be designed to permit reliable conclusions. The data provided herein should be useful in designing such studies. At the same time, the data in Table 2 are not intended as recommended levels of replication for all future investigations. Factors such as the cost of animals, the time required to perform histometric evaluations, and whether one is conducting a preliminary experiment or nearing a final stage of testing will continue to merit consideration. Each investigator will need to determine what level of replication is possible and/or what level of power and sensitivity is acceptable for the intended purpose(s) of the study. However, the author strongly recommends that investigators routinely determine and report the estimated power and sensitivity of each study. In support of that recommendation, note that investigators customarily perform statistical analyses of their data. These analyses typically focus only on the type I error probability, which is the probability of declaring a treatment effect when no actual effect exists. By convention, most investigators will declare that a treatment has had an effect only if they can do so with a probability of error of 5% or less. If the existence of a treatment effect cannot be declared at P ≤ .05, the response to treatment is declared nonsignificant (P > .05). Unfortunately, a nonsignificant result is often interpreted as evidence that the treatment was without effect, although the probability that an actual treatment response was missed (ie, the type II error probability) is seldom assessed. Imagine, for example, that one had conducted a study with a suspected reproductive toxin and did not detect a statistically significant treatment effect at P ≤ .05.
To conclude from that finding that the treatment was without effect would be analogous to stating that because one cannot be 95% certain that the agent is toxic, it can be considered to be without effect and therefore to be safe. Clearly, knowledge that an experiment had sufficient power to detect a meaningful treatment response would be useful in judging an apparent lack of effect, and the reporting of such information is highly encouraged.
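Reporting power and sensitivity need not be burdensome. For a completed 2-group study, the sketch below estimates 2 quantities: the approximate power to detect a given response, and the smallest response detectable with 90% power. Both use the same normal approximation as the standard sample-size formula. That approximation overstates power somewhat when n is small, and exact noncentral t calculations or published reference tables would be more accurate. The study in the example (5 animals per group, an end point with a 20% CV) is hypothetical.

```python
import math
from statistics import NormalDist

def approximate_power(cv: float, difference: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power of a 2-tailed, 2-group comparison with n per group.
    Normal approximation; ignores the negligible chance of rejecting in the
    wrong direction and overstates power somewhat when n is small."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    se = cv * math.sqrt(2 / n)  # SD of the difference in group means, as a fraction of the mean
    return nd.cdf(difference / se - z_alpha)

def detectable_difference(cv: float, n: int, power: float = 0.90, alpha: float = 0.05) -> float:
    """Smallest difference (fraction of the control mean) detectable with the given power."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)) * cv * math.sqrt(2 / n)

# Hypothetical completed study: 5 animals per group, end point with a 20% CV.
print(f"Approximate power to detect a 20% response: {approximate_power(0.20, 0.20, 5):.0%}")
print(f"Response detectable with 90% power: {detectable_difference(0.20, 5):.0%}")
```

A nonsignificant result from such a study would say little about whether a 20% response had occurred.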

The reliability of one's findings should be an important consideration in every experiment. It is hoped that the information presented herein will serve as a useful resource for those investigators wishing to determine the power and sensitivity of published investigations, will facilitate the design of future studies that will provide the experimental power and sensitivity that is desired, and will stimulate greater awareness of the importance of power and sensitivity considerations when the outcome of any experiment is interpreted.


  • Amann RP. The male rabbit. IV. Quantitative testicular histology and comparisons between daily sperm production as determined histologically and daily sperm output. Fertil Steril. 1970; 21: 662–672.
  • Amann RP, Almquist JO. Reproductive capacity of dairy bulls. VIII. Direct and indirect measurement of testicular sperm production. J Dairy Sci. 1962; 45: 774–781.
  • Berndtson WE. Methods for quantifying mammalian spermatogenesis: a review. J Anim Sci. 1977; 44: 818–833.
  • Berndtson WE. Sampling intensities and replication requirements for detection of treatment effects on testicular function in bulls and stallions: a statistical assessment. J Anim Sci. 1989; 67: 213–225.
  • Berndtson WE. Replication requirements and number of ejaculates needed for assessing treatment effects on sperm output and seminal characteristics of electroejaculated Holstein bulls. J Anim Sci. 1990; 68: 709–718.
  • Berndtson WE. A simple, rapid and reliable method for selecting or assessing the number of replicates for animal experiments. J Anim Sci. 1991; 69: 67–76.
  • Berndtson WE. Comparative reliability and sensitivity of different methods for assessing treatment effects on sperm production. Anim Reprod Sci. 2008; 105: 5–22.
  • Berndtson WE. The importance and validity of technical assumptions required for quantifying sperm production rates: a review [published online ahead of print 29 July 2010]. J Androl. In press. doi: 10.2164/jandrol.109.008870.
  • Berndtson WE, Desjardins C. Circulating LH and FSH levels and testicular function in hamsters during light deprivation and subsequent photoperiodic stimulation. Endocrinology. 1974; 95: 195–205.
  • Berndtson WE, Neefus C, Foote RH, Amann RP. Optimal replication for histometric analyses of testicular function in rats and rabbits. Fundam Appl Toxicol. 1989; 12: 291–302.
  • Berndtson WE, Thompson TL. Age as a factor influencing the power and sensitivity of experiments for assessing body weight, testis size, and spermatogenesis in rats. J Androl. 1990; 11: 325–335.
  • Chalkley HW. Method for the quantitative morphologic analysis of tissues. J Natl Cancer Inst. 1943; 4: 47–53.
  • de Jong FH, Sharpe RM. The onset and establishment of spermatogenesis in rats in relation to gonadotrophin and testosterone levels. J Endocr. 1977; 75: 197–207.
  • Eschenbrenner AB, Miller E, Lorenz E. Quantitative histologic analysis of the effect of chronic whole-body irradiation with gamma rays on the spermatogenic elements and the interstitial tissue of the testes of mice. J Natl Cancer Inst. 1948; 9: 133–147.
  • Foote RH, Berndtson WE, Rounsaville TR. Use of quantitative testicular histology to assess the effect of dibromochloropropane (DBCP) on reproduction in rabbits. Fundam Appl Toxicol. 1986; 6: 638–647.
  • Gebauer MR, Pickett BW, Swierstra EE. Reproductive physiology of the stallion. II. Daily production and output of sperm. J Anim Sci. 1974; 39: 732–736.
  • Iczkowski KA, Sun EL, Gondos B. Morphometric study of the prepubertal rabbit testis: germ cell numbers and seminiferous tubule dimensions. Am J Anat. 1991; 190: 266–272.
  • Johnson L, Lebovitz RM, Samson WK. Germ cell degeneration in normal and microwave-irradiated rats: potential sperm production rates at different developmental steps in spermatogenesis. Anat Rec. 1984a; 209: 501–507.
  • Johnson L, Neaves WB. Age-related changes in the Leydig cell population, seminiferous tubules, and sperm production in stallions. Biol Reprod. 1981; 24: 703–712.
  • Johnson L, Petty CS, Neaves WB. The relationship of biopsy evaluations and testicular measurements to over-all daily sperm production in human testes. Fertil Steril. 1980a; 34: 36–40.
  • Johnson L, Petty CS, Neaves WB. A comparative study of daily sperm production and testicular composition in humans and rats. Biol Reprod. 1980b; 22: 1233–1243.
  • Johnson L, Petty CS, Neaves WB. A new approach to quantification of spermatogenesis and its application to germinal cell attrition during human spermatogenesis. Biol Reprod. 1981; 25: 217–226.
  • Johnson L, Petty CS, Porter JC, Neaves WB. Germ cell degeneration during postprophase of meiosis and serum concentrations of gonadotropins in young adult and older adult men. Biol Reprod. 1984b; 31: 779–784.
  • Johnson L, Zane RS, Petty CS, Neaves WB. Quantification of the human Sertoli cell population: its distribution, relation to germ cell numbers, and age-related decline. Biol Reprod. 1984c; 31: 785–795.
  • Mori H, Shimizu D, Fukunishi R, Christensen AK. Morphometric analysis of testicular Leydig cells in normal adult mice. Anat Rec. 1982; 204: 333–339.
  • Robb GW, Amann RP, Killian GJ. Daily sperm production and epididymal sperm reserves of pubertal and adult rats. J Reprod Fertil. 1978; 54: 103–107.
  • Russell LD, Ettlin RA, Sinha Hikim AP, Clegg ED. Histological and Histopathological Evaluation of the Testis. Clearwater, FL: Cache River Press; 1990.
  • Steel RGD, Torrie JH. Principles and Procedures of Statistics. New York, NY: McGraw-Hill Book Co; 1960.
  • Swierstra EE. Structural composition of Shorthorn bull testes and daily spermatozoa production as determined by quantitative testicular histology. Can J Anim Sci. 1966; 46: 107–119.
  • Tang PC. The power function of the analysis of variance tests with tables and illustrations of their use. Stat Res Mem. 1938; 2: 126–149.