• Open Access

An RNA-seq protocol to identify mRNA expression changes in mouse diaphyseal bone: Applications in mice with bone property altering Lrp5 mutations

ABSTRACT

Loss-of-function and certain missense mutations in the Wnt coreceptor low-density lipoprotein receptor-related protein 5 (LRP5) significantly decrease or increase bone mass, respectively. These human skeletal phenotypes have been recapitulated in mice harboring Lrp5 knockout and knock-in mutations. We hypothesized that measuring mRNA expression in diaphyseal bone from mice with Lrp5 wild-type (Lrp5+/+), knockout (Lrp5–/–), and high bone mass (HBM)-causing (Lrp5p.A214V/+) knock-in alleles could identify genes and pathways that regulate or are regulated by LRP5 activity. We performed RNA-seq on pairs of tibial diaphyseal bones from four 16-week-old mice with each of the aforementioned genotypes. We then evaluated different methods for controlling for contaminating nonskeletal tissue (ie, blood, bone marrow, and skeletal muscle) in our data. These methods included predigestion of diaphyseal bone with collagenase and separate transcriptional profiling of blood, skeletal muscle, and bone marrow. We found that collagenase digestion reduced contamination, but also altered gene expression in the remaining cells. In contrast, in silico filtering of the diaphyseal bone RNA-seq data for highly expressed blood, skeletal muscle, and bone marrow transcripts significantly increased the correlation between RNA-seq data from an animal's right and left tibias and from animals with the same Lrp5 genotype. We conclude that reliable and reproducible RNA-seq data can be obtained from mouse diaphyseal bone and that lack of LRP5 has a more pronounced effect on gene expression than the HBM-causing LRP5 missense mutation. We identified 84 differentially expressed protein-coding transcripts between LRP5 “sufficient” (ie, Lrp5+/+ and Lrp5p.A214V/+) and “insufficient” (Lrp5–/–) diaphyseal bone, and far fewer differentially expressed genes between Lrp5p.A214V/+ and Lrp5+/+ diaphyseal bone. © 2013 American Society for Bone and Mineral Research.

Introduction

Low-density lipoprotein receptor-related protein 5 (LRP5) functions as an important coreceptor in the canonical Wnt signaling pathway. LRP5 loss-of-function and certain missense mutations significantly reduce or increase bone mass, respectively, in humans.[1-3] These skeletal phenotypes have been recapitulated in mice harboring knockout and knock-in Lrp5 mutations.[4, 5] Studies in these mouse models indicate that LRP5 signaling is involved in the ability of bone to respond to changes in mechanical load.[6-8]

Two endogenous inhibitors of canonical Wnt signaling, Sclerostin (SOST) and Dickkopf homolog 1 (DKK1), likely exert their inhibitory effects by binding to LRP5 and/or LRP6.[9-12] Missense mutations in LRP5 that cause increased bone mass appear to do so by impairing the binding between LRP5 and its inhibitors, without impairing the binding between LRP5 and its agonists (ie, Wnt ligands).[13-15] The pathways that function upstream and downstream of LRP5-mediated signaling are incompletely understood. We hypothesized that measuring mRNA expression in cortical bone from mice with Lrp5 wild-type (Lrp5+/+), knockout (Lrp5–/–), and high bone mass (HBM)-causing (Lrp5p.A214V/+) alleles could identify genes and pathways that regulate or are regulated by LRP5 activity.

Massively parallel sequencing (also referred to as next generation sequencing) of mRNA (RNA-seq) has quickly become a powerful alternative to measuring mRNA expression by microarray and quantitative RT-PCR.[16, 17] The advantages of RNA-seq are that it can cover a broad range of transcript abundance and can identify mRNA transcripts at a single nucleotide resolution.[18, 19] To date, the application of RNA-seq in the study of mineralized tissues is limited.[20-22] Therefore, despite its potential, RNA-seq has been underutilized to study skeletal biology in a small animal model, such as the mouse. Herein, we describe methods for extracting mRNA from mouse diaphyseal bone, constructing cDNA libraries for use in massively parallel sequencing, and analyzing resultant sequence data in order to identify differentially expressed transcripts among Lrp5+/+, Lrp5–/–, and Lrp5p.A214V/+ mice. Additionally, because it is not possible to completely remove nonskeletal tissue (e.g., skeletal muscle, bone marrow, blood) from a freshly excised long bone, we also provide an analytic strategy for robustly identifying and removing nonskeletal “contaminating” transcripts from RNA-seq data. These methods should aid other investigators who use mouse models to study skeletal biology and disease.

Subjects and Methods

mRNA library preparation

The Boston Children's Hospital Institutional Animal Care and Use Committee approved these studies. The mice used in this study have been described.[4, 5] Male, 16-week-old Lrp5+/+ (n = 4), Lrp5–/– (n = 4), and Lrp5p.A214V/+ (n = 4) mice were used. One animal at a time was euthanized by a 1-minute exposure to CO2. Cervical dislocation was then performed to assure death. Within 10 minutes of the animal's death, the pair of tibias was prepared for RNA extraction: Muscle, tendon, and ligament were removed with a scalpel. The distal and proximal epiphyses were excised and the diaphyseal bone marrow was removed by centrifugation at >15,000g for 1 minute at room temperature. The resultant hollow bone shafts were individually flash frozen in liquid nitrogen and stored at –80°C until further use.

RNA was recovered from each tibia sample by pulverizing the bone inside 2-mL RNase-free microtubes with metal beads and 1 mL TRIzol (Life Technologies, Grand Island, NY, USA) using a benchtop tissue homogenizer (FastPrep24; MP Biomedicals, Solon, OH, USA). Each tibia was subjected to four cycles of 30-second pulverization in the tissue homogenizer. Total RNA was then separated from bone mineral, DNA, and protein by phenol-chloroform extraction. The RNA containing fraction (400 µL per sample) was then purified (Purelink RNA Mini Kit; Life Technologies, Grand Island, NY, USA) and treated with DNase I (RNase-Free DNase Set; Qiagen, Valencia, CA, USA) for 15 minutes. The amount of recovered total RNA (eluted in 50 µL RNase-free water) was measured with a spectrophotometer (Nanodrop 1000; Thermo Scientific, Wilmington, DE, USA) and the quality of the total RNA was assessed with a Bioanalyzer (Model 2100; Agilent Technologies, Santa Clara, CA, USA). Each fresh frozen bone RNA sample had an RNA integrity number (RIN) > 6, indicating they were of sufficient quality to prepare sequencing libraries.[23]

An mRNA library for each individual tibia was prepared using the TruSeq RNA Sample Preparation Kit, v2 (Illumina, San Diego, CA, USA). Briefly, mRNA was enriched from the total RNA (starting material >250 ng) by performing polyA+ purification twice with magnetic beads. The recovered mRNA was then chemically fragmented to yield segments from 120 to 200 bp in length, reverse transcribed using random hexamers, and ligated to bar-coded adapters following the manufacturer's recommendations. The resultant cDNA fragments were amplified for 15 PCR cycles. Finally, the cDNA libraries were washed with AMPure XP beads to remove primer-dimers (Agencourt AMPure XP; Beckman Coulter, Brea, CA, USA) and 1-µL aliquots were run on a 4–20% TBE gel (Life Technologies, Grand Island, NY, USA) for verification. Equal amounts of DNA (∼15 ng per sample) from eight to 10 separately bar-coded cDNA libraries (each generated from a single tibia sample) were pooled and then sent for 50-bp paired-end sequencing on one lane of an Illumina HiSeq 2000 to generate a minimum of 10 million reads/library. For quantities not specified here, such as buffer volumes or time and temperature settings, the manufacturers' recommendations were followed exactly at every step of the protocol.

Raw read processing

Paired reads from each library were mapped to the mouse genome (mm9) using RNA-Seq unified mapper (RUM).[24] The expression level of each gene was quantified by counting the number of reads that uniquely aligned to the respective set of exonic sequences. Data were then normalized with respect to the total number of mapped reads in each library and the dispersion in distribution of the reads to the genome.[25]

Assessing the reproducibility and reliability of RNA-seq data

We measured the reproducibility of the cDNA sequencing and data analysis platforms by splitting a cDNA library in half, sequencing each half, and determining Pearson's correlation coefficient (R2) in a comparison of the number of mapped reads to each gene. We measured the reproducibility in preparing bone samples, extracting mRNA, and generating cDNA sequencing libraries by making a separate library for each animal's right and left tibia and determining R2 for the resultant RNA-seq data. Due to the logarithmic distribution of reads in all libraries, a small fluctuation in a gene with a very high level of expression (e.g., left tibia versus right tibia fold change <1.5) could be much more influential than large fluctuations in a moderately expressed gene (e.g., left tibia versus right tibia fold change >5) on R2. In order to remove this dispersion bias, we recalculated R2 using a trimmed mean method; ie, the top and bottom 1% of genes were excluded in the assessment of all correlation values.
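The trimmed correlation described above can be illustrated with a minimal Python sketch. It is not the code used in this study (our calculations were made in R). The function names are hypothetical, and the sketch assumes that genes are ranked by their mean read count across the two libraries before the top and bottom 1% are dropped; the ranking criterion is an assumption made here for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def trimmed_r2(left, right, trim=0.01):
    """R2 between two libraries after dropping the top and bottom `trim`
    fraction of genes.

    `left` and `right` map gene symbol -> mapped read count.
    Assumption: genes are ranked by mean read count across both libraries.
    """
    genes = sorted(left, key=lambda g: (left[g] + right[g]) / 2)
    k = int(len(genes) * trim)          # number of genes trimmed at each end
    kept = genes[k:len(genes) - k]
    r = pearson_r([left[g] for g in kept], [right[g] for g in kept])
    return r * r
```

Calling trimmed_r2 with trim=0.0 returns the untrimmed value, so the influence of the most abundant transcripts can be checked directly.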

We also calculated reads per kilobase of exon model per million mapped reads (RPKM)[18] values for each gene in order to determine a detectability threshold: The number of reads mapped to the coding region of each gene was normalized by the total number of reads mapped within the respective library and the length of the associated coding region. The elimination of potential biases due to gene length allowed us to compare the expression levels of different genes to each other, and rank the transcript abundances within a single library. Subsequently, we used RPKM = 5 as a minimum limit for detectability.[18] This meant that for a library containing 10 million mapped 50-bp paired-end reads we limited our analyses to genes averaging five or more reads/base of cDNA. However, we did not use RPKM values in our statistical comparisons (ie, when calculating fold-change and p values) between different sample groups (see the last two paragraphs in the Methods section for our differential expression analysis protocol).
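As an illustration of the RPKM normalization and the detectability threshold, the following minimal Python sketch may be helpful. The function and argument names are hypothetical and are not taken from our analysis scripts. For example, a gene with a 2000-bp coding region and 100 mapped reads in a library of 10 million mapped reads sits exactly at the threshold (RPKM = 5).

```python
def rpkm(read_count, coding_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count * 1e9 / (coding_length_bp * total_mapped_reads)


def detectable_genes(counts, lengths, total_mapped_reads, threshold=5.0):
    """Return the set of genes whose RPKM meets the detectability threshold.

    `counts` maps gene symbol -> reads mapped to the coding region.
    `lengths` maps gene symbol -> coding region length in bp.
    """
    return {
        gene
        for gene, n in counts.items()
        if rpkm(n, lengths[gene], total_mapped_reads) >= threshold
    }
```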

Identification and filtering of transcripts representing RNA contamination from nonmineralized tissue

We used three strategies to identify transcripts that likely represented RNA contamination from sources other than bone.

Strategy 1

Using the same methods for RNA extraction and cDNA library preparation, we generated separate bar-coded libraries from blood (n = 6), skeletal muscle (n = 6), bone marrow (n = 6), and tibial cortical bone (n = 6) from three 12-week-old male mice. The source material for all libraries was extracted from the right and left hind limbs of each mouse, except for blood, which was drawn from the heart and split in two. Following RNA-seq we calculated differential expression in order to identify the transcripts with at least twofold greater abundance in blood, muscle, or bone marrow compared to cortical bone (p < 0.05).

Strategy 2

We removed the femoral bones from these same mice, excised the ends, and centrifuged the marrow from the diaphyseal bone. We then cut the diaphyseal bones into smaller pieces and digested the fragments in a collagenase solution (Collagenase I&II; Worthington Biochemical Corp, Lakewood, NJ, USA) for 45 minutes (3 × 15 minutes at 37°C) in order to enrich for native bone cells.[26, 27] We then prepared a separate RNA-seq library for each bone. To control for gene expression changes that could result from collagenase digestion and/or from ex vivo tissue culture, we also prepared individual RNA-seq libraries from six diaphyseal femur samples (extracted from 16-week-old Lrp5+/+ male mice) that had been immediately frozen rather than collagenase-digested. Following RNA-seq, we identified the transcripts whose abundances were twofold lower in the collagenase-digested samples compared to the fresh-frozen samples (p < 0.05).

Strategy 3

We compared RNA-seq data from a single animal's right and left tibia and calculated R2 values. We then used an algorithm that calculated the effect on R2 of removing each RNA transcript species individually, and identified the transcripts whose removal had the greatest effect.
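A leave-one-transcript-out ranking of this kind could be sketched as follows. This is an illustrative Python version under stated assumptions, not the algorithm as we implemented it: the function names are hypothetical, R2 is computed from running sums so that each transcript can be removed in constant time, and the untrimmed R2 is used for simplicity.

```python
def r2_from_sums(n, sx, sy, sxx, syy, sxy):
    """Squared Pearson correlation computed from running sums."""
    cov = n * sxy - sx * sy
    var_x = n * sxx - sx * sx
    var_y = n * syy - sy * sy
    return cov * cov / (var_x * var_y)


def rank_transcripts_by_r2_gain(left, right, top_n=10):
    """Rank transcripts by how much their individual removal raises R2.

    `left` and `right` map gene symbol -> mapped read count for the left
    and right tibia libraries. Returns (baseline R2, [(gene, gain), ...]).
    """
    genes = list(left)
    n = len(genes)
    sx = sum(left[g] for g in genes)
    sy = sum(right[g] for g in genes)
    sxx = sum(left[g] ** 2 for g in genes)
    syy = sum(right[g] ** 2 for g in genes)
    sxy = sum(left[g] * right[g] for g in genes)
    baseline = r2_from_sums(n, sx, sy, sxx, syy, sxy)

    gains = {}
    for g in genes:
        x, y = left[g], right[g]
        # Subtract this transcript's contribution from every running sum.
        without = r2_from_sums(
            n - 1, sx - x, sy - y, sxx - x * x, syy - y * y, sxy - x * y
        )
        gains[g] = without - baseline

    ranked = sorted(gains, key=gains.get, reverse=True)
    return baseline, [(g, gains[g]) for g in ranked[:top_n]]
```

Repeating the ranking after each removal would give a sequential filter; the single-pass version shown here only scores each transcript against the unfiltered data.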

In comparing these three strategies, we noted that most identified transcripts with the largest effect on R2 using strategy 3 were also identified using strategies 1 and 2. Furthermore, transcripts identified as representing the intersection of strategies 1 and 2 accounted for most of the reduction in R2 between paired bones from the same animal. Therefore, we filtered the diaphyseal bone RNA-seq data for transcripts representing the intersection of strategies 1 and 2 before we compared expression across mice with the different Lrp5 genotypes.

Identification of differentially expressed genes

Comparisons of RNA-seq data from different sample groups, including filtered bone libraries with different Lrp5 genotypes, were made with a Fisher's exact test, and the resultant p values were corrected for multiple hypothesis testing (false discovery rate < 0.05).[28] All calculations were made in R, using the Rsamtools,[29] GenomicFeatures,[30] and edgeR[25] subroutine packages.
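The multiple-testing correction step can be illustrated with a short Python sketch. It assumes the Benjamini-Hochberg procedure, which is one common way to control the false discovery rate; the method is not named explicitly above, so this choice is an assumption. The sketch takes p values as given and does not reproduce the exact test computed in edgeR.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value to the smallest so that adjusted
    # values never decrease as the raw p value increases.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_genes(genes, p_values, fdr=0.05):
    """Genes whose adjusted p value falls below the chosen FDR."""
    adjusted = benjamini_hochberg(p_values)
    return [g for g, q in zip(genes, adjusted) if q < fdr]
```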

After correcting for multiple hypothesis testing, we performed a series of leave-one-out cross-validation tests: In each comparison between RNA-seq data representing two genotypes (e.g., Lrp5+/+ mice n = 8 versus Lrp5–/– mice n = 8), one RNA-seq sample was removed and the test was repeated (n = 7 versus n = 8) in order to ensure that the excluded library was not solely responsible for the significance of the outcome. This procedure was repeated until the effect of every sample was elucidated. Then, the same protocol was repeated by removing paired tibia libraries from the same mouse at once (n = 6 versus n = 8), in order to prevent any bias due to a single animal. All genes that lost their significance at any point (p > 0.05) were removed from the list of differentially expressed genes.
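The leave-one-out logic for a single gene can be sketched in Python as follows. The function names are hypothetical, and the statistical test is passed in as a function because the sketch does not reimplement the test we used; toy_p_value is a deliberately crude stand-in included only so that the example runs.

```python
from statistics import mean


def loses_significance(group_a, group_b, p_value_fn, alpha=0.05):
    """True if the gene stops being significant when any one library,
    or both libraries from any one animal, are left out.

    Each group is a list of (animal_id, read_count) tuples, one per library.
    """
    def reduced_sets(group):
        # Leave one library out at a time.
        for i in range(len(group)):
            yield group[:i] + group[i + 1:]
        # Leave out all libraries from one animal at a time.
        for animal in sorted({animal_id for animal_id, _ in group}):
            yield [lib for lib in group if lib[0] != animal]

    def values(group):
        return [count for _, count in group]

    for reduced in reduced_sets(group_a):
        if p_value_fn(values(reduced), values(group_b)) > alpha:
            return True
    for reduced in reduced_sets(group_b):
        if p_value_fn(values(group_a), values(reduced)) > alpha:
            return True
    return False


def toy_p_value(a, b):
    """Hypothetical stand-in for a real test: 'significant' only when the
    group means differ by more than 5 reads."""
    return 0.01 if abs(mean(a) - mean(b)) > 5 else 0.5
```

With this stand-in, a gene whose apparent difference is driven by one outlier library is flagged, whereas a gene with consistent counts across libraries is retained.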

Results

Reproducibility of RNA-seq data generation

Our overall strategy for generating RNA-seq libraries is depicted in Fig. 1. On average we obtained 19 million reads/library, with 18% of reads representing PCR duplicates, and 98% of reads successfully mapping (84% uniquely) to the annotated genome (Fig. 2A). The distribution of reads within individual genes was similar across libraries (Fig. 2B). When we split an individual RNA-seq library in half and sequenced each half separately we obtained R2 values >0.99. When we prepared duplicate separately bar-coded RNA-seq libraries from paired sources within the same animal we obtained R2 values >0.97 for bone marrow and R2 values ranging between 0.83 and 0.99 for unfiltered paired cortical bone samples (Fig. 2C, D). We assumed that differences in the extent of residual skeletal muscle, blood, and bone marrow within paired bone samples accounted for their reduced R2 values. Therefore, we tested this hypothesis by (1) identifying transcripts that are more highly expressed in skeletal muscle, blood, and bone marrow compared to cortical bone; (2) identifying transcripts that are lower in collagenase-digested bone compared to fresh-frozen bone; and (3) identifying transcripts whose in silico removal has a substantial effect on R2 values when the right and left tibias from the animal with the lowest correlation (R2 = 0.83) are compared (Fig. 2E).

Figure 1.

Strategy for generating tibia diaphyseal bone cDNA libraries for RNA-seq. Both tibias were prepared within 10 minutes of euthanasia. Skeletal muscle and the metaphyseal/epiphyseal regions were removed with a scalpel and the marrow was removed by centrifugation. Bone shafts were flash frozen in liquid nitrogen and subjected to pulverization in TRIzol for RNA extraction. mRNA was enriched using oligo-dT coated magnetic beads and then chemically fragmented into 120- to 200-nucleotide-long segments prior to random hexamer primed reverse transcription. Bar-coded sequencing adapters were ligated to the cDNA fragments and the indexed cDNA was subjected to 15 cycles of PCR amplification. Each library was analyzed by gel electrophoresis to confirm that the correct fragment size range had been achieved (note the >200-bp fragments include the sequencing adaptors). Libraries constructed from 10 different mouse tibias are depicted here. Eight to 10 separately bar-coded libraries were pooled and loaded onto a single lane of an Illumina HiSeq2000 machine and 50-bp paired-end sequencing was performed.

Figure 2.

RNA-seq metrics and reproducibility. (A) Table containing average values for tibia diaphyseal bone RNA-seq data from mice with the three different Lrp5 genotypes (n = 8 libraries/genotype). Columns indicate the mouse genotype from which the RNA-seq data were obtained, the number of paired-end reads/library, the percentage of reads that mapped to the mouse genome, the percentage of reads that mapped to unique regions within the mouse genome, and the percentage of reads that likely represent PCR duplicates created during library preparation. (B) Graphs depicting the distribution of RNA-seq data across two representative genes (Mepe and Dmp1). The read depth relative to the maximum read depth is graphed in alignment to each gene's genomic DNA sequence. Thick bars below the plots indicate the positions of the exons, and the thin lines indicate the positions of the introns. Peaks overlap the exon-containing regions of each gene, consistent with this being RNA-seq data. Note the high degree of similarity with respect to RNA-seq coverage across the genes independent of the animals' Lrp5 genotypes. (C) Scatter-plot comparing the total number of uniquely mapped reads for RNA-seq data obtained from the right and left tibias of a mouse. Each dot represents an individual gene. Higher-abundance transcripts are closer to the upper right and lower-abundance transcripts are closer to the lower left of this plot. Examples of bone-expressed genes are identified by unique symbols and arrows. The Pearson correlation coefficient (R2) between the right and left tibia RNA-seq data is 0.99 for this sample pair. (D) Scatter-plot comparing total number of uniquely mapped reads for RNA-seq data obtained from the right and left tibias of another mouse. Note that the Pearson correlation coefficient (R2) between the right and left tibia RNA-seq data is 0.83 for this sample pair, although the bone-expressed genes appear to follow the y = x line. 
(E) Graph depicting the effect of sequentially filtering transcripts whose removal causes the greatest increases in correlation between the right and left tibia. The inset indicates the top 10 genes whose removal from the RNA-seq data depicted in D had the greatest positive effect on R2. Note that these genes are highly expressed in muscle and blood, suggesting the reduced correlation in the tibia pair depicted in D resulted from non-bone tissue contamination.

RNA-seq data obtained from blood, skeletal muscle, bone marrow, and diaphyseal bone

We observed large differences in the transcriptomes between tissues. Using an RPKM value of 5 as the lower boundary for reliable detection of gene expression, ∼5000 genes exceeded this threshold for blood and skeletal muscle cDNA libraries, whereas ∼9000 genes exceeded this threshold for bone marrow and diaphyseal bone cDNA libraries (Supplementary Table 1) (Note: Supplementary Tables 1 to 8 are available online for this article; a spreadsheet editor, e.g., Microsoft Excel, would be the ideal tool for visualization and review of these files). The distribution of expressed transcripts within the libraries was least diverse in blood, where 52% of sequence reads corresponded to the four hemoglobin genes (Hba-a1, Hba-a2, Hbb-b1, and Hbb-b2). These four genes represented only ∼0.2% of the RNA-seq reads from diaphyseal bone. Similarly, highly expressed skeletal muscle transcripts Acta1, Tnnc2, and Mylpf together accounted for ∼5% of mapped sequence reads in the skeletal muscle libraries, and only ∼0.3% of reads in the diaphyseal bone libraries; these data suggest that less than 10% of RNA-seq data in diaphyseal bone represent tissue contamination. Conversely, transcripts that are highly expressed by osteoblasts and osteocytes, such as Col1a1, Col1a2, and Spp1, represented ∼1.4%, ∼2.7%, and ∼0.9% of the diaphyseal bone RNA-seq data, respectively. When we compared transcript abundance between the libraries, we identified ∼6000 genes that were expressed at least twice as abundantly in blood, skeletal muscle, and/or bone marrow compared to diaphyseal bone (Supplementary Table 1). Of these ∼6000 genes, 2381 genes were only in the skeletal muscle libraries and 609 genes were in both the blood and bone marrow libraries (Fig. 3A,B). 
Because we estimated that less than 10% of diaphyseal bone RNA-seq data likely results from tissue contamination, we assumed that filtering all transcripts with twofold higher expression in the contaminating tissues compared to bone would be too stringent. This was indeed the case: removing all reads representing these ∼6000 genes reduced the diaphyseal bone RNA-seq data by 43%.

Figure 3.

Identification of contaminating transcripts in diaphyseal bone RNA-seq data. (A) Venn diagrams indicating the intersection of genes that have twofold or greater expression (p < 0.05) in skeletal muscle, bone marrow, or blood compared to tibia diaphyseal bone and that have demonstrated a significant reduction in transcript abundance in collagenase-digested diaphyseal bone. The percentages of tissue-specific RNA-seq data accounted for by these genes are also indicated. Color-coding represents in heat-map format (see scales on right) the average number of mapped reads/gene in each sector. This is determined by dividing the total number of reads that mapped to genes in this sector by the number of genes in this sector. For example, 2914 (2139 + 775) genes are expressed at least twice as abundantly in muscle compared to bone and 1000 (775 + 225) genes had significant reductions in transcript abundance following collagenase digestion. The intersection of these two data sets comprises 775 genes. These 775 genes account for 53% of all mapped reads in the skeletal muscle RNA-seq data. The average gene in this group of 775 has ∼17,000 reads (∼13,000,000/775) mapping to it in the muscle RNA-seq libraries. (B) Venn diagram indicating the distribution of ∼6000 genes that were twice as abundantly expressed in skeletal muscle, bone marrow, and/or blood compared to tibia diaphyseal bone. (C) Scatter-plot in log10 scale indicating the fold changes in transcript abundance when RNA-seq data from collagenase-digested bone is compared to RNA-seq data from fresh-frozen bone. Circles represent individual genes and genes having statistically significant changes in abundance (p < 0.05) are colored blue. Note genes that are highly expressed in skeletal muscle (red symbols) decrease significantly (p < 0.001), whereas genes that are highly expressed in bone (green symbols) have less pronounced increases. 
The large increases in transcript abundance for other genes (orange symbols) likely represent collagenase-induced changes in gene expression. (D) Box-plots indicating the ranges of trimmed-mean-normalized Pearson's correlation coefficients (R2) between paired tibias from individual animals (contralateral pairs), and between all pairs of tibias representing mice with the same Lrp5 genotype. Correlations before and after in silico transcript filtering of ∼900 genes are shown; note that filtering increases R2 for all data comparisons (p < 0.01).

Enzymatic removal of contaminating tissue from diaphyseal bone

There is a tradeoff between processing bone specimens quickly in order to preserve RNA integrity and dissecting the specimens thoroughly in order to remove contaminating tissues, such as muscle, blood, and bone marrow. One approach for removing contaminating tissues relies on ex vivo tissue culture in medium containing collagenase.[26, 27] We compared RNA-seq data obtained from fresh-frozen diaphyseal bone samples to RNA-seq data obtained from samples that had been treated with collagenase. We found that the transcript abundance of ∼1000 genes was at least twofold lower in collagenase-digested bone compared to fresh-frozen bone (Fig. 3C; Supplementary Table 2). Most of these genes (∼900) were more highly expressed in the skeletal muscle, blood, and/or bone marrow libraries compared to the diaphyseal bone libraries (Supplementary Table 1). As would be expected when contaminating tissues are enzymatically removed, the relative abundance of bone cell (e.g., osteoblast, osteocyte, and osteoclast) transcripts increased (Fig. 3C; Supplementary Table 3). However, more than 700 genes exhibited differences in abundance, ranging from twofold to 80,000-fold, when collagenase-digested bone and fresh-frozen bone were compared. The increased abundance of these transcripts far exceeds the increase expected from simply removing contaminating tissue. For example, transcripts for osteoinductive factors, such as Egr1,[31, 32] Il-6,[33] and Cxcl2[34] were 522-fold, 6277-fold, and 11,799-fold more abundant, respectively, and components of the transcription factor Ap-1 complex,[35, 36] such as Fos, Fosb, Jun, and Atf3, were 82,070-fold, 4394-fold, 96-fold, and 1775-fold more abundant, respectively, in collagenase-digested bone compared to fresh-frozen bone (Fig. 3C; Supplementary Table 3). These data suggest that short-term ex vivo culture and/or collagenase digestion, in addition to removing contaminating tissues, also alters gene expression in the remaining cells.

Improved in silico filtering of contaminating transcripts from diaphyseal bone RNA-seq data

All the transcripts whose abundance was lower in collagenase-digested bone are expressed in skeletal muscle, blood, and bone marrow. Therefore, we used these genes to generate one set of transcripts that could be considered “contaminating” and filtered in silico from fresh-frozen diaphyseal bone RNA-seq data. We then intersected this set with another set of transcripts that represent genes whose abundances are at least twofold higher in skeletal muscle, blood, and bone marrow, compared to diaphyseal bone. This intersecting set accounted for ∼53% of the entire set of mapped reads in the skeletal muscle libraries, ∼2% of the mapped reads in blood libraries, ∼4% of the mapped reads in bone marrow libraries, and ∼9% of mapped reads in diaphyseal bone libraries (Supplementary Tables 1 and 4), leading us to conclude that this set is highly enriched for contaminating transcripts in the diaphyseal bone libraries. Further support for this conclusion derives from the significant increases in R2 between paired tibias from individual mice, and from randomly paired tibias from mice with the same Lrp5 genotype, when the RNA-seq reads representing this set of genes were removed from the analyses (Fig. 3D). Therefore, this intersecting set of genes (Supplementary Table 4) was computationally filtered from the diaphyseal bone RNA-seq data before data were compared between mice with different Lrp5 genotypes.
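The intersection and filtering steps described above amount to simple set operations, as the following minimal Python sketch shows. The function names are hypothetical, and the gene lists would come from the tissue-profiling and collagenase comparisons rather than being hard-coded.

```python
def contaminant_set(higher_in_nonbone, lower_after_collagenase):
    """Genes flagged by both strategies (the intersecting set)."""
    return set(higher_in_nonbone) & set(lower_after_collagenase)


def filter_counts(counts, contaminants):
    """Drop contaminating genes from a gene -> read count mapping."""
    return {g: c for g, c in counts.items() if g not in contaminants}


def fraction_removed(counts, contaminants):
    """Fraction of mapped reads assigned to the contaminating genes."""
    total = sum(counts.values())
    removed = sum(c for g, c in counts.items() if g in contaminants)
    return removed / total
```

Reporting the fraction of reads removed from each tissue's libraries, as done above, is a useful check that the filter is not discarding a large share of the bone-derived signal.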

Gene expression differences between Lrp5+/+, Lrp5–/–, and Lrp5p.A214V/+ mice

We compared in silico–filtered RNA-seq data obtained from pairs of tibia diaphyseal bone mRNA representing Lrp5+/+, Lrp5–/–, and Lrp5p.A214V/+ mice (n = 4 pairs of tibias/genotype). We controlled for differences in gene expression that could result from one outlier bone sample (ie, leave-one-bone out) and one outlier mouse (ie, leave-both-bones-out). We excluded genes whose expression levels would be too low to yield meaningful differences (ie, RPKM values <5 in both genotypes being compared) and adjusted our significance thresholds to account for multiple hypothesis testing (false discovery rate < 0.05) (Fig. 4). Using this strategy we identified 302 differentially expressed genes between Lrp5–/– and Lrp5+/+ (Supplementary Table 5), 166 differentially expressed genes between Lrp5–/– and Lrp5p.A214V/+ (Supplementary Table 6), and 28 differentially expressed genes between Lrp5p.A214V/+ and Lrp5+/+ (Supplementary Table 7) mouse diaphyseal bone.

Figure 4.

Overview of data filtering and analysis pipeline. A list of contaminating transcripts (n = ∼900) was compiled based upon tissue-specific expression profiling and transcript abundances in collagenase-digested versus fresh-frozen bone samples. Transcripts representing the genes on this list were in silico–filtered from the diaphyseal bone RNA-seq data prior to downstream analyses. In each statistical comparison, p values were computed with Fisher's exact test (with significance set at p < 0.05) and corrected for multiple hypothesis testing (false discovery rate < 0.05), grouped samples were subjected to leave-one-out cross validation, and genes expressed below the detectability threshold (RPKM < 5) were eliminated. The numbers of differentially expressed (DE) genes remaining after each step are noted. Last, predicted genes, noncoding RNAs, and microRNAs that overlapped with the introns of highly expressed protein coding genes were eliminated in order to ensure that the reads mapped to these regions due to incomplete transcription events were not registered in differential expression calculations. A comparison between RNA-seq data from the tibia diaphyseal bones of LRP5 sufficient (Lrp5p.A214V/+ and Lrp5+/+) and insufficient (Lrp5–/–) mice yielded 84 differentially expressed protein-coding genes.

Interestingly, there was substantial overlap between genes that were differentially expressed in Lrp5–/– mice compared to either Lrp5+/+ or Lrp5p.A214V/+ mice. This overlapping list contains 84 protein-coding genes (Table 1; Supplementary Table 8), which likely represents the consequence of “insufficient” versus “sufficient” LRP5-mediated Wnt signaling. Lrp5 serves as an important positive control because it is knocked out in the Lrp5–/– mice. As expected, Lrp5 was the most significantly reduced transcript when Lrp5–/– bone was compared to Lrp5+/+ or Lrp5p.A214V/+ bone (p < 1 × 10⁻⁸⁸). Also demonstrating significantly reduced expression (p < 0.001) in diaphyseal bone from Lrp5–/– mice were several collagens (Col1a1, Col1a2, and Col11a2), osteocalcin (Bglap and Bglap2), and the zinc binding proteins (Mt1 and Mt2). Two genes predicted to increase canonical Wnt signaling, Wnt10b (a Wnt ligand) and Fzd4 (a Wnt coreceptor), exhibited significantly increased expression (p < 0.001), whereas an inhibitor of the canonical Wnt pathway Apcdd1 exhibited significantly decreased expression (p < 0.001). Interestingly, we detected no significant changes in the expression levels of other Wnt pathway components, including Sost, Dkk1, Lrp4, and Lrp6, each of which can significantly affect bone mass in humans and/or mice.[10, 37, 38]

Table 1. The Top 30 Genes (Based on p Value) That Exhibited Significant Changes in Gene Expression Between LRP5-Insufficient (ie, Lrp5–/–) and LRP5-Sufficient (ie, Lrp5p.A214V/+ and Lrp5+/+) Diaphyseal Bone

Gene symbol      Fold change   p          WT RPKM   WT rank   References
Lrp5             0.02          1.24E–89   26.5      1921      [1-4]
Mt2              0.13          4.67E–29   35.9      1340      [46, 47]
Zfp445           6.71          7.57E–26   2.8       11878     [48]
Cpz              0.16          1.08E–24   11.6      4755      [49, 50]
Slc13a5          0.17          9.76E–24   64.1      634       [51]
Cyp2f2           0.19          3.25E–21   22.3      2334      [52]
Col1a1           0.23          1.48E–17   1245.3    18        [53]
Prss35           0.24          1.16E–15   34.6      1403      [54]
Col11a2          0.25          2.14E–15   94.4      402       [55-57]
Dapk2            3.91          1.83E–14   11.9      4611      [58]
Cfhr2            3.87          4.23E–14   2.6       12272     [59]
1810055G02Rik    3.80          1.47E–13   15.1      3623
Gm14434          3.76          1.92E–13   4.1       10141
Mt1              0.28          8.59E–13   23.2      2234      [46, 47]
Cfd              0.31          8.85E–12   103.1     353       [60]
Bglap2           0.31          3.94E–11   339.0     89        [61]
Cd5l             0.29          4.33E–11   9.7       5599      [62]
Mdm4             3.31          8.07E–11   5.7       8429      [63, 64]
Phf20l1          3.21          1.83E–10   4.1       10084     [65]
Il34             0.32          6.31E–10   11.4      4807      [66, 67]
Epor             0.34          2.54E–09   17.1      3148      [68]
Col1a2           0.35          3.18E–09   981.3     25        [53]
Cyr61            0.34          3.51E–09   72.1      538       [69, 70]
Angptl4          0.34          4.33E–09   10.2      5323      [71]
Npy              0.36          1.63E–08   127.1     283       [72, 73]
Plin1            0.36          1.88E–08   14.5      3774      [74]
Eya4             2.83          3.51E–08   2.4       12598     [75]
Ighg3            0.37          6.30E–08   51.6      836       [76]
Bglap            0.38          8.22E–08   261.7     123       [61]
Bcr              0.38          1.02E–07   8.7       6154      [77]

  1. Each gene's RPKM value and abundance relative to other genes in the Lrp5+/+ diaphyseal bone transcriptome are also indicated.
  2. LRP5 = low-density lipoprotein receptor-related protein 5; RPKM = reads per kilobase of exon model per million mapped reads; WT = wild-type.

Lrp5 also serves as an important positive control for the RNA-seq data involving Lrp5p.A214V/+ mice. This mouse strain was intended to contain a true knock-in mutation, so the expression of the mutant allele should be equal to the expression of the wild-type allele in heterozygous mice. Because RNA-seq provides resolution at the single-nucleotide level, we were able to show that there was no significant difference in expression between the Lrp5p.A214V and Lrp5+ alleles (83 reads versus 107 reads; p = 0.18) in the RNA-seq data from Lrp5p.A214V/+ mouse bone. We found only 28 genes whose expression differed significantly between Lrp5p.A214V/+ and Lrp5+/+ mice (Supplementary Table 7) and would not have predicted any of these genes to have had altered expression a priori.
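One way to test for allelic imbalance from read counts such as these is an exact two-sided binomial test against an expected 50:50 ratio. The text does not state which test produced p = 0.18, so the sketch below is an assumption for illustration, and its result need not match the reported value; the function name is hypothetical. With the reported counts, this simple test likewise does not reach the conventional 0.05 threshold.

```python
from math import comb

def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p value.

    Sums the probabilities of every outcome that is no more likely
    than the observed count k under the null proportion p.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)

# Read counts reported in the text for the two Lrp5 alleles.
mutant_reads, wild_type_reads = 83, 107
p_value = binomial_two_sided_p(mutant_reads, mutant_reads + wild_type_reads)
print(f"two-sided p = {p_value:.3f}")
```

A test that pools reads in this way ignores animal-to-animal variation, which is one reason a replicate-aware analysis could yield a different p value.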

Discussion

Massively parallel sequencing of RNA from cells and tissues is increasingly being used to identify the repertoire of expressed mRNAs, their various splice-forms, and their changes in expression following genetic, pharmacologic, and environmental manipulation. RNA-seq has several advantages and disadvantages compared to other expression profiling strategies. Advantages of RNA-seq over array-based hybridization technologies include its ability to (1) detect a larger range of transcripts and transcript abundances without the need for a priori target specification, (2) identify alternative transcription start sites and splice-forms, and (3) quantify allele-specific expression at single-nucleotide resolution. For example, in our RNA-seq data from diaphyseal bone, we detected ∼9000 mRNA species, observed known alternative transcription start sites and splice-forms for genes such as Dcn and Bglap (data not shown), and were able to show that Lrp5p.A214V/+ mice express the knock-in and wild-type alleles equivalently in their diaphyseal bone cells.

Both RNA-seq and array-based hybridization can be limited in the accuracy with which they quantify changes in gene expression. One reason for inaccuracy derives from the need to perform a PCR amplification step during the cDNA sample preparation process. A transcript's DNA sequence (e.g., G/C content) and initial abundance can affect the ability of the PCR reaction to increase that transcript's abundance in proportion to all other transcripts in the sample. For example, high G/C content transcripts typically amplify inefficiently whereas highly abundant transcripts tend to amplify more efficiently.[39-41] Therefore, PCR amplification-induced inconsistency could lead to false-positive differences in gene expression. Expression profiling technologies that do not rely on amplification steps are available,[42, 43] but they cannot yet simultaneously quantify expression levels for large numbers of transcripts. Nevertheless, although imperfect, our RNA-seq strategy does appear capable of quantifying mRNA expression levels with a high degree of accuracy. For example, when we made two independent libraries from paired bones from the same animal, they exhibited high correlations in their rank order of gene expression (range, 0.96–0.97).

Additional challenges exist when RNA-seq is applied to organs and tissues in contrast to its application on homogeneous populations of cells in culture. In diaphyseal bone, osteocytes have a low cell/matrix ratio because they are surrounded by abundant extracellular matrix. Therefore, small amounts of contamination by tissues whose cell/matrix ratios are high, such as muscle and bone marrow, can significantly affect the RNA-seq data. For example, when RNA-seq data were compared between paired bones from the same animal, the transcripts whose expression levels were most discrepant between the specimens predominantly represented genes that are abundantly expressed in muscle (e.g., Atp2a1 and Myh4) and blood and bone marrow (e.g., Hbb-b1). Preparing diaphyseal bone samples more carefully would lessen the problem of contamination, but could itself induce changes in gene expression due to the extra time the specimen is kept ex vivo. One proposed strategy for reducing sample contamination by muscle, blood, and bone marrow involves a short-term (∼1 hour) ex vivo digestion of bone specimens with collagenase.[26, 27] We confirmed that this approach reduces the abundance of many muscle, blood, and bone marrow–derived transcripts in diaphyseal bone RNA-seq libraries, presumably by enzymatically removing the contaminating tissues. However, we also found that this approach increased the expression of many transcripts, some by as much as 80,000-fold. Consequently, whereas collagenase digestion may remove contaminating tissues, it can also alter gene expression in the remaining cells.

As an alternative to collagenase treatment, we developed a numerical filtering strategy for eliminating transcripts that likely originate from contaminating tissue. We identified genes (Supplementary Table 4) whose (1) expression is higher in skeletal muscle, blood, and bone marrow than in bone and (2) transcript abundance dropped significantly following collagenase digestion. By filtering the diaphyseal bone RNA-seq data in silico for these genes, we significantly improved correlation between paired samples from the same animal (Fig. 3). This method of numerically controlling for tissue contamination during the specimen preparation process also increased the consistency of correlations between animals with the same genotype. Therefore, this numerical filtering method should be useful to all investigators who perform RNA-seq on freshly processed diaphyseal bone specimens.
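The filtering step can be illustrated with a small sketch: remove the contaminant-associated genes from both samples, then compute a rank correlation on the genes that remain. This is not the authors' code. It assumes a Spearman rank correlation (the text reports correlations in "rank order of gene expression" without naming the statistic), the function names are hypothetical, and the RPKM values are invented toy numbers.

```python
def ranks(values):
    """Average 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def rank_correlation(left, right, excluded=frozenset()):
    """Spearman correlation over shared genes, skipping excluded genes."""
    genes = [g for g in left if g in right and g not in excluded]
    return pearson(ranks([left[g] for g in genes]),
                   ranks([right[g] for g in genes]))

# Hypothetical RPKM values for paired tibias; contaminants differ most.
left = {"Col1a1": 1245.0, "Col1a2": 980.0, "Bglap2": 340.0, "Bglap": 260.0,
        "Npy": 127.0, "Myh4": 900.0, "Atp2a1": 50.0, "Hbb-b1": 2000.0}
right = {"Col1a1": 1200.0, "Col1a2": 1000.0, "Bglap2": 330.0, "Bglap": 270.0,
         "Npy": 120.0, "Myh4": 40.0, "Atp2a1": 600.0, "Hbb-b1": 150.0}
contaminants = {"Myh4", "Atp2a1", "Hbb-b1"}  # muscle and blood/marrow genes

unfiltered = rank_correlation(left, right)
filtered = rank_correlation(left, right, contaminants)
print(f"unfiltered = {unfiltered:.2f}, filtered = {filtered:.2f}")
```

In this toy example the contaminant genes are the only ones whose rank order differs between the two bones, so excluding them raises the correlation.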

RNA-seq can detect a large range of transcript abundances. However, detecting significant differences in transcript abundance depends upon the frequency with which a transcript is detected in RNA-seq data; as a consequence, RNA-seq is less sensitive at detecting differences in low-abundance transcripts. Obtaining larger total numbers of RNA-seq reads will increase sensitivity for low-abundance transcripts and for detecting more subtle fold-changes in gene expression among adequately expressed transcripts. We pooled eight to 10 separately bar-coded RNA-seq libraries in order to obtain >10 million uniquely mapping reads/library/sequencing run. At current pricing, the material cost of generating these RNA-seq data was ∼$400/library. With this amount of data, we did not expect to reliably detect differences in gene expression for transcripts whose RPKM values were less than five. Therefore, for genes like Axin2 (RPKM < 1), a transcriptional target of Wnt signaling whose expression level is sensitive to mutations in Lrp5,[5] we were unable to use RNA-seq to independently confirm this result. In addition, 10 million reads/library is not sensitive enough to accurately detect less than a 1.7-fold change in gene expression. Increasing the number of reads/library would improve the sensitivity of RNA-seq at detecting low-abundance transcripts and lower-fold changes in gene expression across samples, but this would also significantly increase cost.
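The relationship between sequencing depth and sensitivity follows from the definition of RPKM (reads per kilobase of exon model per million mapped reads). The sketch below converts between read counts and RPKM. The function names are hypothetical, and the 2,000-bp exon model length is an assumed value chosen for illustration, not a measurement from this study. Under that assumption, an RPKM of 5 at 10 million mapped reads corresponds to 100 reads.

```python
def rpkm(read_count, exon_length_bp, total_mapped_reads):
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)

def reads_for_rpkm(target_rpkm, exon_length_bp, total_mapped_reads):
    """Expected read count for a transcript at a given RPKM."""
    return target_rpkm * exon_length_bp * total_mapped_reads / 1e9

LIBRARY_DEPTH = 10_000_000   # ~10 million uniquely mapping reads (from text)
EXON_LENGTH_BP = 2_000       # hypothetical transcript length

for level in (1, 5):
    n = reads_for_rpkm(level, EXON_LENGTH_BP, LIBRARY_DEPTH)
    print(f"RPKM {level}: about {n:.0f} reads")
```

Because the expected read count scales linearly with depth, doubling the reads per library doubles the counts available for a low-abundance transcript, which is the basis for the cost and sensitivity trade-off described above.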

Signaling via LRP5 clearly affects bone mass.[4-7, 44] Several missense mutations, including LRP5p.A214V, cause an HBM phenotype[1, 45] and LRP5 loss-of-function mutations cause very low bone mass.[2] We performed RNA-seq on diaphyseal bone samples from mice with three different Lrp5 genotypes (ie, Lrp5+/+, Lrp5p.A214V/+, and Lrp5–/–) to identify changes in gene expression that could inform us about pathways that function upstream and downstream of the LRP5 receptor. Despite the profound effect the Lrp5p.A214V allele has on mouse bone mass (ie, a fourfold increase in trabecular bone volume/total volume and a 1.5-fold increase in mid-diaphyseal cortical bone area), we observed few greater than twofold differences in gene expression between Lrp5p.A214V/+ and Lrp5+/+ mice (Supplementary Table 7) and no changes in expression that appear to represent “smoking guns.” It is possible that we would have found more differences in gene expression had we studied younger mice when they are more actively increasing bone mass. Alternatively, enhanced LRP5 signaling from the Lrp5p.A214V allele may only cause changes in gene expression that are below the threshold of detection in our experiment. For example, we cannot detect a ∼1.01-fold increase or decrease in gene expression. However, a ∼1.01-fold daily increase in bone mass accrual would be sufficient to explain the fourfold increase in trabecular bone volume in Lrp5p.A214V/+ mice compared to wild-type mice by 16 weeks of age.
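The compounding argument can be checked with a short calculation. The sketch assumes that the difference between genotypes accumulates multiplicatively once per day over all 112 days from birth to 16 weeks of age; that simplification is made here for illustration and is not stated in the text, and the function names are hypothetical. Under it, a daily factor of exactly 1.01 compounds to roughly threefold, and a fourfold difference requires a daily factor of about 1.012, consistent with the approximate "∼1.01" figure.

```python
DAYS = 16 * 7  # 16 weeks of age, assuming accrual compounds daily from birth

def cumulative_fold(daily_fold, days=DAYS):
    """Total fold difference after compounding a daily fold change."""
    return daily_fold ** days

def daily_fold_needed(total_fold, days=DAYS):
    """Daily fold change that compounds to the given total difference."""
    return total_fold ** (1 / days)

print(f"1.01 per day for {DAYS} days: {cumulative_fold(1.01):.2f}-fold")
print(f"daily factor for fourfold: {daily_fold_needed(4.0):.4f}")
```

Either way, the required daily difference is far smaller than the 1.7-fold change that the experiment could reliably detect.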

Interestingly, we observed 84 differentially expressed protein-coding genes when RNA-seq data from either Lrp5p.A214V/+ or Lrp5+/+ mice were compared to data from Lrp5–/– mice. Because the Lrp5p.A214V allele is able to transduce canonical Wnt signaling as efficiently as the Lrp5+ allele, differences in gene expression likely result from deficient LRP5-mediated Wnt signaling. Genes whose protein products contribute to the synthesis of bone's extracellular matrix (e.g., Col1a1, Col1a2, Bglap, Bglap2, Mt1, and Mt2) had reduced abundance in Lrp5–/– mice, as did the Wnt pathway inhibitor Apcdd1. Lrp5–/– mice also had increased expression of Wnt10b, a Wnt ligand that is anabolic in bone, and the Wnt coreceptor Fzd4, which can transduce Wnt signal via LRP5 or LRP6. These data suggest that in the absence of LRP5, bone cells are less able to produce critical matrix components. In addition, the cells alter their expression of specific Wnt pathway components in an attempt to increase Wnt signaling. Also, within the list of genes whose expression levels differ between LRP5-sufficient and LRP5-insufficient mice are genes with unknown function (e.g., Zfp445). These genes may now be considered to be targets of LRP5-mediated signaling and candidates for affecting bone properties.

Disclosures

All authors state that they have no conflicts of interest.

Acknowledgments

This work was supported by NIAMS/NIH R01AR053237 (to AGR), NIAMS/NIH R21AR062326 (to MLW), the Osteogenesis Imperfecta Foundation (to CMJ), and the Howard Hughes Medical Institute (to MLW and CES). We thank Joshua Levin at the Broad Institute and Victor Johnson at the Boston Children's Hospital for thoughtful discussions.

Authors' roles: UMA contributed to experimental design, data collection, data analysis, and writing the first draft of the manuscript. CMJ contributed to data collection. DCC, JG, JGS, CES, and AGR contributed to experimental design. MLW contributed to experimental design, data analysis, and writing the first draft of the manuscript. All authors reviewed and approved the final version of the manuscript. MLW accepts responsibility for the integrity of the data analysis.
