Method-Based Differences in the Automated Analysis of the Three-Dimensional Morphology of Trabecular Bone
Craig A. Simmons,
Orthopedic Biomechanics Laboratory, Department of Orthopedic Surgery, Charles A. Dana Research Institute, Harvard Thorndike Laboratories, Beth Israel Deaconess Medical Center, Boston, Massachusetts, U.S.A.
Centre for Biomaterials, University of Toronto, Toronto, Ontario, Canada
Orthopedic Biomechanics Laboratory, Department of Orthopedic Surgery, Charles A. Dana Research Institute, Harvard Thorndike Laboratories, Beth Israel Deaconess Medical Center, Boston, Massachusetts, U.S.A.
Institute for Spinal Disorders, Houston, Texas, U.S.A.
John A. Hipp, Ph.D. Institute for Spinal Disorders Suite 1900 6560 Fannin Houston, TX 77030 U.S.A.
The three-dimensional (3D) morphology of trabecular bone is frequently quantified using computer programs. However, there are no standardized implementations of morphology programs and many variations are possible. Even though programs may use the same basic method, results can be significantly different because of differences in implementation. Morphology data from different laboratories therefore may not be comparable. The method of directed secants, with parallel plate assumptions, is commonly used to quantify 3D morphology. We examined the effect of several variations in the implementation of this method on measurements of trabecular plate number (Tb.N), trabecular thickness, and trabecular spacing. Three-dimensional micromagnetic resonance images of 10 bovine trabecular bone specimens were analyzed using several variations of the directed secant method. An analysis of covariance with repeated measures suggested that variations in the algorithm used to count test line intersections, variations in the criteria used to classify a test coordinate as bone or marrow, and variations in the number of test grid rotations had significant effects on Tb.N (p < 0.0001). The largest difference in Tb.N (52%) was due to the method used to count test line intersections with the bone–marrow interface. Variations in the classification algorithm and variations in the number of test line grid rotations resulted in a 6% difference in Tb.N. The spacing of the test line grids did not affect Tb.N (p = 0.28), and all differences were independent of volume fraction (p = 0.67). These data show that there can be significant differences in trabecular bone morphology measurements due only to the method used for the measurements. To facilitate comparisons between laboratories, we have made validated computer programs to measure trabecular bone morphology available over the internet.
INTRODUCTION
Microstructural organization plays an important role in defining the mechanical properties of trabecular bone.(1–4) Moreover, trabecular morphology adapts to changes in mechanical environment and to aging and disease.(5–8) Understanding this adaptive behavior as well as the influences of aging and disease on trabecular properties requires characterization of the three-dimensional (3D) architecture of trabecular bone.
The traditional method for imaging trabecular bone is two-dimensional (2D) optical microscopy.(9) From these images, morphologic parameters can be determined using a technique such as the secant method.(10) This method is equivalent to scanning a 2D matrix of bone and marrow with an array of parallel test lines with a uniform spacing. The number of intersections of the test lines with the bone–marrow interface is counted for several different test line orientations and averaged. Based on the area fraction, the number of intersections, and stereology principles and models, 3D structural parameters can be estimated from the planar sections.(9) These techniques are well established and allow estimation of basic morphometric measures such as bone volume fraction (BV/TV), trabecular number (Tb.N), thickness (Tb.Th), and separation (Tb.Sp).
More recent imaging techniques, such as microcomputed tomography,(11) micromagnetic resonance imaging,(12,13) optical serial reconstruction,(14) and X-ray tomographic microscopy(15) produce 3D images of trabecular bone. These images allow 3D morphology measurements using a 3D version of the directed secant method. These methods can be automated and have been applied routinely to describe the microstructure of trabecular bone. However, there are no standardized implementations of an automated 3D directed secant method. Furthermore, the directed secant method requires the user to define certain analysis parameters such as the test grid spacing, the number of test grid rotations, and how the bone–marrow interface is identified. The effect of varying these parameters on basic morphometric measures has not been described previously. Variations between laboratories in the implementation of morphological analysis algorithms or the selection of analysis parameters could result in significant differences that would preclude accurate comparison of morphometry data.
Thus, the objective of this study was to examine the relative effect of varying the implementation of the directed secant method on measurements of trabecular bone microstructure. Specifically, we asked whether basic morphologic parameters (Tb.N, Tb.Th, and Tb.Sp) were significantly affected by variations in: (1) the algorithm to determine when a test line crossed the bone–marrow interface; (2) the algorithm to classify a particular point as bone or marrow; (3) the number of test grid rotations; and (4) the spacing of the test grid lines.
MATERIALS AND METHODS
Ten micromagnetic resonance imaging (micro-MRI) images of bovine trabecular bone were obtained for analysis. Cylindrical trabecular bone specimens (6 mm long, 6 mm in diameter) were cored from homogeneous regions of the proximal tibiae of 10 skeletally mature cows. All marrow was removed from the specimens by ultrasonically agitating the specimens in a dilute solution of bleach. The specimens were immersed in a 0.2 mg/ml solution of gadopentetate dimeglumine per 100 ml of distilled water, then vacuum degassed to remove air bubbles. Specimens were imaged using a Bruker AM spectrometer with a microimaging accessory and a 25 mm H1 coil (Bruker Instruments, Billerica, MA, U.S.A.). A 3D proton spin-echo imaging sequence was used.(12) The final images were stored in 3D arrays with a nominal isotropic resolution of 61³ μm³/voxel. The images were thresholded to binary images by matching the bone–marrow interfaces in the binary image with the bone–marrow interface in the original image.(12) The average threshold for nine transverse sections (three in each of three orthogonal planes) was used.
The basis for our morphological analysis is a 3D version of the directed secant method. The program code for our implementation is publicly available (Appendix A). Our code analyzes a spherical test region from a 3D image or array, with the size and placement of the analysis sphere defined by the user. For this study, a 3.9 mm sphere centered in the middle of the image was used. The directed secant algorithm is applied using a 3D test line grid, with the user defining the test line spacing and the number of random test grid rotations. Each test line is systematically scanned, and the number of intersections between bone and marrow space is recorded.
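The scanning procedure described above can be illustrated with a minimal Python sketch. It is not the authors' C program: the function names, the image layout (`image[z][y][x]`, 1 = bone, 0 = marrow), the sampling step of one voxel, and the use of a random line direction to stand in for a random grid rotation are all assumptions made for illustration. Points are classified by truncating their coordinates, and an intersection is recorded whenever two consecutive samples differ. Lengths are in voxel units, so the result would need to be divided by the voxel size to obtain a value per unit physical length.

```python
import math
import random


def random_unit_vector(rng):
    """Draw a direction uniformly distributed over the unit sphere."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)


def cross(p, q):
    """Cross product of two 3-vectors."""
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


def perpendicular_basis(d):
    """Return two unit vectors that form an orthonormal basis with d."""
    # Use the coordinate axis least aligned with d so the cross product
    # is well conditioned.
    k = min(range(3), key=lambda i: abs(d[i]))
    axis = tuple(1.0 if i == k else 0.0 for i in range(3))
    u = cross(d, axis)
    norm = math.sqrt(sum(c * c for c in u))
    u = tuple(c / norm for c in u)
    return u, cross(d, u)


def intersections_per_length(image, center, radius, spacing, n_rotations,
                             step=1.0, seed=0):
    """Intersections per unit test-line length (voxel units) in a sphere.

    center is (x, y, z); the sphere must lie entirely inside the image.
    """
    rng = random.Random(seed)
    n_off = int(radius // spacing)
    intersections = 0
    total_length = 0.0
    for _ in range(n_rotations):
        d = random_unit_vector(rng)        # test-line direction
        u, v = perpendicular_basis(d)      # plane holding the line grid
        for i in range(-n_off, n_off + 1):
            for j in range(-n_off, n_off + 1):
                a, b = i * spacing, j * spacing
                half_sq = radius * radius - a * a - b * b
                if half_sq <= 0.0:
                    continue               # this line misses the sphere
                half = math.sqrt(half_sq)
                n_steps = int(2.0 * half / step)
                previous = None
                for k in range(n_steps + 1):
                    t = -half + k * step
                    p = [center[m] + a * u[m] + b * v[m] + t * d[m]
                         for m in range(3)]
                    # Truncate the real coordinate to find the voxel.
                    value = image[int(p[2])][int(p[1])][int(p[0])]
                    if previous is not None and value != previous:
                        intersections += 1
                    previous = value
                total_length += n_steps * step
    return intersections / total_length
```

For an idealized image of parallel plates repeating every 10 voxels, averaging over many random directions should give a value near 0.1 intersections per voxel length, that is, about one plate per period.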
Our program calculates the morphology parameters based on the parallel plate model(9) and uses the recommended histomorphometric nomenclature.(16) Using the parallel plate model, bone volume fraction and trabecular plate number are used to derive the remaining morphologic parameters. The bone volume fraction is determined by dividing the number of bone voxels in the analysis region by the total number of voxels in the analysis region. The trabecular plate number is determined by dividing the total number of bone–marrow intersections by the total length of the test lines applied to the analysis region for all rotations. Tb.Th and Tb.Sp are derived from BV/TV and Tb.N.(9) The resulting parameters are therefore the average values obtained from several orientations of the 3D test grid.
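The derivation described in this paragraph can be sketched as follows. The text states only that Tb.Th and Tb.Sp are derived from BV/TV and Tb.N; the two formulas used here, Tb.Th = (BV/TV)/Tb.N and Tb.Sp = (1 − BV/TV)/Tb.N, are the standard parallel plate relations and are assumed, not confirmed, to match the authors' program. The function name and argument names are hypothetical.

```python
def parallel_plate_parameters(bone_voxels, total_voxels,
                              intersections, total_line_length):
    """Derive BV/TV, Tb.N, Tb.Th and Tb.Sp under the parallel plate model.

    total_line_length is the summed test-line length over all rotations,
    in physical units (e.g. mm), so Tb.N is per mm and Tb.Th and Tb.Sp
    are in mm.
    """
    if total_voxels <= 0 or total_line_length <= 0 or intersections <= 0:
        raise ValueError("counts and line length must be positive")
    bv_tv = bone_voxels / total_voxels            # bone volume fraction
    tb_n = intersections / total_line_length      # plate number
    tb_th = bv_tv / tb_n                          # assumed plate-model relation
    tb_sp = (1.0 - bv_tv) / tb_n                  # assumed plate-model relation
    return {"BV/TV": bv_tv, "Tb.N": tb_n, "Tb.Th": tb_th, "Tb.Sp": tb_sp}
```

With these relations, Tb.Th + Tb.Sp equals 1/Tb.N, so one plate and one marrow space together span one period of the idealized structure.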
Algorithm and parameter variations
To explore the effect of method-based differences on the determination of morphology parameters, we first examined two algorithm variations. These algorithms are required to analyze the discretized representation of the trabecular bone (Fig. 1). The first algorithm variation was the method by which an intersection between bone and marrow was determined. Two alternatives were explored. The first method recorded an intersection between bone and marrow if, while advancing along a test line, the binary value of the current voxel differed from the binary value of the previous voxel. This was called the two-voxel method. The other method was a three-voxel algorithm, which examined three voxels in a row along a test line. This method eliminated isolated voxels by recording intersections only if the test line, after leaving one material, encountered two voxels in a row of the new material. Eliminating isolated voxels may be desirable for noisy images. We also varied the algorithm for determining whether a particular point along a test line was classified as bone or not. This algorithm is required since the test line coordinates are described in real coordinates, whereas the voxels in the image array are described in integer coordinates. Again, two alternatives were explored. The first method was to truncate the real value of the coordinate in question and look up the resulting integer coordinate in the previously thresholded image array. This was called the integer method. The second method (the weighted average method) classified a point as bone or not based on whether the weighted average of the binary values of the nine voxels neighboring the point in question was closer to bone or marrow.
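The four alternatives described above can be sketched in Python as follows. All function names are hypothetical, and several details are assumptions, not taken from the authors' code. The first sample on a test line is taken as the starting material, and a change at the last sample cannot be confirmed by the three-voxel rule and is not counted. The text does not specify the weights of the weighted average method, so the sketch uses trilinear weights over the eight voxel centres surrounding the point as a stand-in.

```python
import math


def classify_integer(image, x, y, z):
    """Integer method: truncate the real coordinate and look up the voxel."""
    return image[int(z)][int(y)][int(x)]


def classify_weighted(image, x, y, z):
    """Weighted average method, sketched with assumed trilinear weights."""
    nz, ny, nx = len(image), len(image[0]), len(image[0][0])
    # Voxel i is assumed to span [i, i + 1), so its centre is at i + 0.5.
    fx, fy, fz = x - 0.5, y - 0.5, z - 0.5
    x0, y0, z0 = math.floor(fx), math.floor(fy), math.floor(fz)
    tx, ty, tz = fx - x0, fy - y0, fz - z0
    total = 0.0
    for dz, wz in ((0, 1.0 - tz), (1, tz)):
        zi = min(max(z0 + dz, 0), nz - 1)         # clamp at the image border
        for dy, wy in ((0, 1.0 - ty), (1, ty)):
            yi = min(max(y0 + dy, 0), ny - 1)
            for dx, wx in ((0, 1.0 - tx), (1, tx)):
                xi = min(max(x0 + dx, 0), nx - 1)
                total += wz * wy * wx * image[zi][yi][xi]
    # Bone (1) if the weighted average is at least halfway to bone.
    return 1 if total >= 0.5 else 0


def count_two_voxel(samples):
    """Two-voxel method: count every change between consecutive samples."""
    return sum(1 for i in range(1, len(samples))
               if samples[i] != samples[i - 1])


def count_three_voxel(samples):
    """Three-voxel method: count a change only if the new material
    persists for two samples in a row."""
    if not samples:
        return 0
    count = 0
    current = samples[0]
    for i in range(1, len(samples) - 1):
        if samples[i] != current and samples[i + 1] == samples[i]:
            count += 1
            current = samples[i]
    return count
```

For the sample sequence `[0, 0, 1, 0, 0, 1, 1, 0, 0]`, the two-voxel method records four intersections, whereas the three-voxel method records two because it ignores the isolated bone sample.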
We also determined the effect of analysis parameter selection on morphological analyses. Specifically, we examined whether variations in the number of test grid rotations and in the spacing of the test grid lines had significant effects on the calculation of morphology parameters. The number of test grid rotations was varied from 16 to 256, and the spacing between test lines was varied from 100 to 400 μm.
Each of the 10 images was analyzed using every possible combination of the parameter variations and the two options for each of the two algorithm variations. This resulted in 80 analyses for each image. A multiway analysis of covariance with repeated measures was performed to determine whether each of the algorithm and parameter variations had a significant effect on the determination of the trabecular plate number. Trabecular plate number was selected as the dependent variable because, along with volume fraction, it is a primary parameter from which other stereology parameters are derived. The covariate was bone volume fraction.
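The full factorial design can be enumerated as in the sketch below. The text gives only the ranges of the two analysis parameters and the total of 80 analyses per image. The specific levels used here (five rotation counts and four spacings) are an assumption chosen to be consistent with that total, and other splits giving 20 parameter combinations are possible.

```python
from itertools import product

INTERSECTION_METHODS = ("two-voxel", "three-voxel")
CLASSIFICATION_METHODS = ("integer", "weighted average")
# Assumed levels: only the ranges 16-256 and 100-400 um are stated.
ROTATION_LEVELS = (16, 32, 64, 128, 256)
SPACING_LEVELS_UM = (100, 200, 300, 400)


def analysis_conditions():
    """Return every combination of algorithm and parameter variations."""
    return list(product(INTERSECTION_METHODS, CLASSIFICATION_METHODS,
                        ROTATION_LEVELS, SPACING_LEVELS_UM))


conditions = analysis_conditions()
print(len(conditions))  # 2 x 2 x 5 x 4 combinations per image
```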
RESULTS
The 10 trabecular bone specimens had a mean bone volume fraction of 39.8% (standard deviation, 10.5%; range, 26.7–60.9%). Volume fraction was not a significant covariate (p = 0.67), indicating that the differences due to algorithm and parameter variations were independent of BV/TV.
Both of the algorithm variations had a significant effect on the calculation of Tb.N (p < 0.0001). The largest difference was between the two methods for recording an intersection of a test line with the bone–marrow interface. The Tb.N determined using the three-voxel method was, on average, 52% less than that determined using the two-voxel method (Fig. 2). The difference in Tb.N between the two methods for classifying a point as bone or marrow was smaller but still significant. The Tb.N determined using the weighted average (volume averaging) method was, on average, 6% less than that determined by the integer method (Fig. 2). The analysis using the volume averaging method, however, took 7.6 times longer than with the integer method. There was no difference in the solution time between the two methods for determining an intersection.
The analysis parameter variations did not have as large an impact on the calculation of Tb.N as did the algorithm variations. However, the variation in the number of test grid rotations did have a significant effect on Tb.N (p < 0.0001). The largest relative difference was 5.8% between 16 and 128 rotations (Fig. 3). The solution time increased linearly with the number of rotations (Fig. 3). The variation in test grid spacing from 100 to 400 μm did not significantly affect the Tb.N for these images (p = 0.28; Fig. 3). The solution times decreased exponentially as the spacing was increased (Fig. 3).
DISCUSSION
The objective of this study was to determine whether the calculation of basic morphologic parameters was significantly affected by variations in the automated methods used to analyze 3D images of trabecular bone. We found that Tb.N was significantly affected by variations in the algorithm to determine whether a test line had intersected bone and the algorithm to classify a particular point as bone or marrow. As well, Tb.N was significantly affected by the number of test grid rotations. Variations in the spacing of the test grid lines did not affect Tb.N. These data, based on typical trabecular bone images and a statistically powerful experimental design, demonstrate that there can be significant differences in morphological parameters that are method-based and may therefore confound comparisons of morphologic data from different laboratories.
Unfortunately, it is impossible to determine the correct algorithms or parameters because there is no gold standard with which to validate these techniques. As a result, our automated analysis program was developed based on a combination of validation experiments, practical considerations, and an appropriate compromise between accuracy and computational effort.
The most influential of the variations considered was the method by which intersections were counted. The Tb.N determined using the three-voxel method was an average of 52% less than the Tb.N determined using the two-voxel method. Based on the average BV/TV of 39.8%, this would result in a 13.6% increase in Tb.Th and a 20.6% increase in Tb.Sp. The resolutions used in this study are typical for microcomputed tomography (micro-CT)(11,17) and micro-MR(12,13) images of trabecular bone and are considered adequate for most human applications.(15) At these resolutions, however, some osteopenic specimens may have trabecular widths on the order of a single voxel. In those cases, the three-voxel method (which filters “noise” by eliminating isolated voxels along the test line path) would significantly underestimate the true number of intersections. Kuo and Carter(18) suggested using test lines that were 3 pixels wide to reduce noise when analyzing digital images. As with the three-voxel method, this technique sacrifices resolution and could result in artificially low intersection counts. Eliminating noise is more effectively and more appropriately handled using image processing techniques before performing morphologic analyses. For instance, Engelke et al. used a six-voxel neighborhood criterion to improve the signal-to-noise ratio of micro-CT images,(19) Chung et al. applied an m-sampling scheme to smooth trabecular edges in micro-MR images,(13) and Odgaard and Gundersen used a purification scheme to remove isolated voxels.(20) The typical resolution of our micro-MR images is 60³ to 90³ μm³/voxel, and, therefore, we use image processing techniques to eliminate noise and the two-voxel method to count intersections.
The determination of Tb.N was also influenced by the algorithm used to classify a point as bone or marrow and by the number of test grid rotations. The differences introduced by these variations were approximately 6% for Tb.N, which is equivalent to a 2–3% difference in Tb.Th and Tb.Sp for the average volume fraction of these specimens. These differences are smaller than those introduced by variations in the intersection counting method but are significant and may confound morphologic comparisons. Our code implements the integer method to determine whether a point is bone because, compared with the volume averaging method, the difference in morphology parameters is small but the saving in computational effort is substantial (over seven times faster). We use 128 random rotations of the test grid to be conservative and because it requires only slightly longer to analyze a typical image compared with 32 or 64 rotations. Random rotations were used instead of systematic rotations, because there is the danger with systematic sampling that the periodicity of the structure may coincide with that of the sampling lattice.(21) For a sufficiently large number of rotations, however, there should be a negligible difference between random and systematic rotations. The spacing of test grid lines did not influence the Tb.N calculation over the range of 100–400 μm. The Tb.Th of these specimens was on the order of 200 μm, so the spacings examined ranged from approximately 0.5 to 2 times Tb.Th. We have implemented a 200 μm spacing in our code as an appropriate compromise between the number of intersections required to ensure accuracy and computational effort. It is also generally consistent with the recommendations of Weibel,(21) as applied to 2D analysis of trabecular bone, and based on these data and scaling arguments, should be sufficient to analyze human bone.
Although the method-based differences we observed were statistically significant, the relative importance of these differences is dependent on their magnitude when compared with differences resulting from other sources. For instance, varying the intersection counting algorithm resulted in a 52% difference in Tb.N. This difference is greater than the difference in Tb.N that results from varying the threshold values of micro-MR images (14.5%)(22) or the change in Tb.N seen with osteoporosis (28%).(23) Variations in the classification algorithm and the number of test grid rotations resulted in a 6% difference in Tb.N, which is comparable to the difference in Tb.N introduced by variations in micro-MR imaging sequences (1.3%)(22) and the change in Tb.N observed with aging (6%).(24) Thus, method-based differences can be as important as differences introduced by variations in image acquisition and processing or changes that occur with aging and disease.
The foremost limitation of this study is that we have analyzed bone only from bovine proximal tibiae. Stereological analyses are often conducted on human bone from sites such as the vertebrae or iliac crest; these sites can have very different volume fractions and microstructural organization from the bovine proximal tibia. In this study, however, the differences in Tb.N due to algorithm and parameter variations were independent of bone volume fraction. While similar method-based differences would be expected for bone specimens with microstructural organization different from the bovine proximal tibia, it is likely that the magnitude of the differences would be influenced by the specimen species and location. Additional validation studies are required to determine the precise magnitude of the changes that may result from variations in the methods implemented to analyze bone specimens from human biopsy sites or commonly examined sites in other animals. These studies are also necessary to confirm the utility of our computer program for analysis of human bone, since the methods implemented in the program have been selected based on validation studies with computer-generated grids (of known dimensions and designed to represent a range of trabecular structures) and bovine bone, but not human biopsy specimens.
Our analyses assumed a parallel plate model that was adequate for the bovine proximal tibia but may not be appropriate for all bone sites, particularly the iliac crest. We examined the effect of variations in the implementation of the directed secant method on the number of intersections of the test grid with the bone–marrow interface. This parameter was selected because model-based structural indices, such as Tb.N, Tb.Th, Tb.Sp, and trabecular diameter, and measures of anisotropy, such as mean intercept length, are derived from intersection counts. It follows that if intersection counts are affected by the algorithm and parameter variations, then morphologic measurements derived from intersection counts will be affected, regardless of the stereological model. It is also likely that differences in the implementation of other quantification techniques such as Euler connectivity, star volume, and fractal analysis (see Compston(25) for a review) will cause significant differences in the results.
Another limitation is that we have considered only four algorithm and parameter variations. The algorithms and parameters we chose to vary were selected for two reasons. First, in developing our code, these algorithms and parameters appeared to be the most likely sources of discrepancy between our method and the methods applied at other laboratories. Second, the trabecular bone stereology literature is deficient in providing methodology details for the algorithms examined in this study, particularly as applied to automated analysis of 3D images. Although we have examined only four specific variations, the basic conclusion that morphologic analyses can be confounded by method-based differences should also hold for other variations in algorithms or parameters. For instance, the absolute volume of trabecular bone analyzed can significantly affect the results.(19)
The effects of spatial resolution and segmentation methods were intentionally not addressed because they are dependent on the imaging modality. For instance, micro-CT images require unique thresholding strategies(11,17,19) that are very different from those required for micro-MRI(12,13,22,26) or serial optical images.(14) Differences introduced by imaging modalities, segmentation techniques, and spatial resolution can be substantial, as recently demonstrated by Majumdar et al.(22) and Engelke et al.(19) However, the purpose of this study was to examine morphological analysis methods that are applicable to a 3D binary image obtained by any means. Future comparisons of morphological data obtained from different laboratories could use a standard reference, such as the digital 3D model developed by Engelke et al.,(19) to identify discrepancies that may be solely due to the implementation of the morphologic analyses.
In conclusion, we have demonstrated that differences in automated morphologic analysis techniques can result in sizeable and statistically significant differences in basic parameters such as Tb.N, Tb.Th, and Tb.Sp. Method-based differences could confound the comparison of data between independent laboratories and, therefore, it is important for researchers to consider this potential source of error when analyzing morphologic data. To facilitate comparison between laboratories, we have developed a computer program to perform morphologic analyses of trabecular bone and have made it publicly available.
We acknowledge the support of a Medical Research Council of Canada Studentship (C.A.S.). We also thank Robert Goulet and Steven Goldstein of the Orthopaedic Research Laboratories, University of Michigan, for their discussions and collaborations.
APPENDIX A
The version of the software developed at the Orthopaedic Biomechanics Laboratory at the Beth Israel Hospital and Harvard Medical School is available through anonymous ftp. The program is written in C and implements the 3D morphological analysis methods described in this paper. The code can be retrieved by connecting by ftp to obl20.bih.harvard.edu (184.108.40.206) and logging in as anonymous. The code is contained in the directory pub/MORPH. The code is also available from the International Society for Biomechanics world wide web site (http://www.kin.ucalgary.ca/isb).