• gene expression;
  • non-model organism;
  • reference gene;
  • salinity;
  • spermatogenesis;
  • tilapia


Abstract

The black-chinned tilapia Sarotherodon melanotheron heudelotii is an ecologically appealing model as it shows exceptional adaptive capacities, especially with regard to salinity. In spite of this, this species is devoid of genomic resources, which impedes the understanding of such remarkable features. De novo assembly of transcript sequences produced by next-generation sequencing technologies offers a rapid approach to obtain expressed gene sequences for non-model organisms. It also facilitates the development of quantitative real-time PCR (qPCR) assays for analysing gene expression under different environmental conditions. Nevertheless, obtaining accurate and reliable qPCR results from such data requires a number of validations prior to interpretation. The transcriptome of S. melanotheron was sequenced to discover transcripts potentially involved in the plasticity of male reproduction in response to salinity variations. A set of 54 candidate and reference genes was selected through a digital gene expression (DGE) approach, and a de novo qPCR assay using these genes was validated for further detailed expression analyses. A user-friendly web interface was created for easy handling of the sequence data. This sequence collection represents a major transcriptomic resource for S. melanotheron and will provide a useful tool for functional genomics and genetics studies.


Introduction

Inland water ecosystems are subjected to natural, seasonal and between-year variations in climate. Depending on their nature, duration and magnitude, these variations have contributed to the evolution of physiological adaptations in fish species. These adaptations consist of modifications of life-history traits such as growth, age at sexual maturity (Stearns & Crandall 1984; Stewart 1988; Duponchelle & Panfili 1998), fecundity (Legendre & Ecoutin 1989; Duponchelle et al. 2000) or trophic demand (Ogari & Dadzie 1988). In this context, a better delimitation of the adaptive capacities of species and a deeper understanding of their underlying mechanisms are urgently needed to determine the threats to these species. This issue is particularly pressing in the field of reproductive biology, which has a more direct impact on fitness than any other biological function.

Species that already perform well under, or are tolerant to, a broad range of environmental conditions are thus excellent templates for investigating the responses to such fluctuations. In this regard, the black-chinned tilapia Sarotherodon melanotheron heudelotii Rüppell 1852 (Teleostei, Cichlidae) is reportedly one of the record holders, as it has been reported to reproduce at salinities ranging from 0 to 120 psu (Panfili et al. 2004, 2006). This tilapia is a mouthbrooding fish in which the males pick up the fertilized eggs and incubate them until they are released as free-swimming fry. In addition, the black-chinned tilapia S. melanotheron is an excellent model for studying the plasticity of reproductive traits as (i) it shows a remarkable adaptation to salinity and is, to our knowledge, the most plastic fish in this respect; (ii) natural populations occur in many different habitats from freshwater to hypersaline waters; (iii) in culture conditions, it is capable of spawning spontaneously and has brief and frequent reproduction cycles all year round; (iv) it has a relatively small size, and it is thus easy to maintain adult fish in different controlled conditions.

Using suppressive subtractive hybridization from gills of S. melanotheron, Tine et al. demonstrated how salinity affected the expression of a small number of genes involved in osmotic homoeostasis and energy metabolism (Tine et al. 2008, 2012), thereby highlighting a plastic regulation of gene expression in the gills. Although the salinity-induced plasticity of the reproductive traits of S. melanotheron is now acknowledged (Panfili et al. 2006; Legendre et al. 2008), the underlying biological processes are still poorly understood, mainly because of the lack of genomic resources available for this species. This study aimed at filling this gap by generating a large transcript sequence collection. In non-model organisms for which there are no or limited genomic resources, next-generation sequencing (NGS) represents a valuable tool for characterizing genes involved in particular biological functions or traits (Wang et al. 2009; Fraser et al. 2011). Once a reference transcriptome is available, tag-based sequencing, or digital gene expression (DGE), represents a sensitive and cost-effective alternative for gene expression profiling of specific phenotypes or adaptive traits ('t Hoen et al. 2008; Hong et al. 2011). Real-time PCR (qPCR) then remains the simplest and probably the most accurate method to substantiate quantitative data derived from NGS. Yet obtaining, analysing and interpreting qPCR data is not trivial and requires a thorough validation of every step of any de novo assay design (Bustin et al. 2010). Therefore, the present article describes not only the development of an important transcriptomic resource for S. melanotheron, but also the detailed validation of a set of candidate and reference genes that will enable in-depth expression studies on both wild and experimental fish populations, complying with the MIQE (Minimum Information for publication of Quantitative real-time PCR Experiments) guidelines (Bustin et al. 2009).

Material and methods


Fish sampling

Natural populations of S. melanotheron heudelotii were sampled in Senegal during the dry season (May 2010) at three locations along the salinity gradient of the Sine Saloum estuary, namely Missirah (40 psu), Foundiougne (53 psu) and Kaolack (95 psu). Adult fish were caught using a cast net and then anaesthetized in icy water. Size (fork length) and weight were measured and the sex determined for each fish. Animals were dissected and a portion of both liver and gonad was immediately immersed in a tube containing 10–20 volumes of RNAlater (Ambion) and placed on ice. Tubes were maintained at 4 °C throughout the field campaign (3 days) and stored at −20 °C upon arrival at the laboratory. The stage of sexual maturity was determined macroscopically according to Legendre & Ecoutin (1989). A total of 10 males and 10 females were collected from each station.

RNA extraction

RNA was extracted with the Nucleospin-8 total RNA isolation kit (Macherey-Nagel). Fifteen to twenty mg of tissue preserved in RNAlater (Ambion) was weighed and transferred into 2-mL tubes containing a 5-mm steel bead (Qiagen) as well as 360 μL lysis buffer supplemented with 1% β-mercaptoethanol (Sigma-Aldrich). Tissues were homogenized with a TissueLyser (Qiagen) for 2 min at 50 Hz. Tubes were then centrifuged for 5 min at 20 000 g, and the supernatants were transferred to new tubes and kept at −20 °C overnight. RNA was extracted the following day according to the manufacturer's instructions, using a Janus automated workstation (Perkin Elmer), and eluted in 70 μL RNase-free H2O. To remove any trace of contaminating genomic DNA, RNA eluates were subjected to a second DNase treatment. Briefly, a mix of 0.2 μL RNase-free DNase and 2 μL of reaction buffer (Macherey-Nagel) was added to 20 μL of each RNA eluate, and digestion was carried out for 15 min at 37 °C. RNA quantity was measured by UV spectrophotometry (Nanodrop 1000, Thermo Scientific), and its integrity was verified by capillary electrophoresis (Agilent Bioanalyzer 2100). Only samples displaying an RNA integrity number (RIN) ≥8 were used for subsequent analyses. Each RNA sample was diluted to a concentration of 50 ng/μL in H2O and stored at −80 °C.

RNA-seq library

Construction and sequencing

A large transcript library was generated from both liver and gonads of wild fish sampled in the Sine Saloum estuary. RNA from liver and gonads of 18 males and 16 females (~5–6 fish per salinity location) was mixed in equimolar amounts. Five μg of this RNA mixture was used as template for the construction of a cDNA library, using the Illumina mRNA-Seq Paired-End kit with several modifications. In brief, polyA-containing mRNA molecules were fragmented for 5 min to yield fragments of ~250 bp. Second-strand cDNA was synthesized and further subjected to end repair, A-tailing and adapter ligation in accordance with the manufacturer-supplied protocols. Purified cDNA templates were enriched by 15 cycles of PCR (10 s at 98 °C, 30 s at 65 °C and 30 s at 72 °C) using PE1.0 and PE2.0 primers and FastStart Taq DNA polymerase (Roche). The samples were cleaned using QIAquick PCR purification columns and eluted in 30 μL elution buffer. The purified cDNA library was quantified using Bioanalyzer DNA 100 Chips (Agilent Technology 2100 Bioanalyzer). Cluster generation was performed by applying 4 pM of cDNA to an Illumina 1G flowcell. Hybridization of the sequencing primer, base incorporation, image analysis and base calling were carried out using the Illumina Pipeline.

Contig assembly and functional annotation

Analysis and assembly of the RNA-seq library, which consisted of 50-bp paired-end sequences, were performed by Skuldtech company. A first assembly was carried out using Velvet 1.0.09 (k-mer = 41). Sequences were then assembled into clusters using MIRA version 3.1. Overlapping identity percentage and minimum overlapping length parameters were set to 90% and 60 bp, respectively, to obtain highly reliable consensus sequences. Sequences that could not be assembled at this stage were referred to as singletons and were not taken into consideration in the following steps. In contrast, the resulting contigs were translated into six reading frames and used as queries to search the non-redundant protein databases available at the National Center for Biotechnology Information (NCBI) using the BlastX algorithm with an E-value ≤10⁻³ (version 2.2.15, GenBank release 166). Sequences with BlastX hits were assigned to the following five sequence categories: known, uncharacterized, predicted, unknown or unnamed, and hypothetical proteins. These terms correspond to the ‘definition’ category of available protein sequences deposited in GenBank. All unique sequences with BlastX hits (E-value ≤10⁻³) were functionally annotated using Blast2GO by mapping against gene ontology (GO) resources.
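The E-value filter described above can be illustrated with a minimal Python sketch. It assumes that the BlastX results are available as 12-column tab-separated text (query identifier in column 1, subject identifier in column 2, E-value in column 11); this layout, the function name `best_hits` and the example identifiers are assumptions made for illustration and do not describe the pipeline actually used for this study.

```python
import csv
import io

EVALUE_CUTOFF = 1e-3  # threshold quoted in the text (E-value <= 10^-3)

def best_hits(tabular_text, cutoff=EVALUE_CUTOFF):
    """Return {query_id: (subject_id, evalue)}, keeping the lowest E-value hit per query."""
    hits = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank, truncated or comment lines (assumed 12-column layout).
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > cutoff:
            continue  # hit too weak to support an annotation
        if query not in hits or evalue < hits[query][1]:
            hits[query] = (subject, evalue)
    return hits

# Hypothetical example: two hits for contig_1, one weak hit for contig_2.
example = "\n".join([
    "contig_1\tprotA\t95.0\t60\t3\t0\t1\t180\t1\t60\t1e-20\t100",
    "contig_1\tprotB\t80.0\t60\t12\t0\t1\t180\t1\t60\t1e-05\t60",
    "contig_2\tprotC\t40.0\t30\t18\t0\t1\t90\t1\t30\t0.5\t25",
])
print(best_hits(example))
```

In this toy input, contig_2 is left without annotation because its only hit exceeds the cutoff.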

Construction and sequencing of digital gene expression (DGE) libraries

Two DGE libraries were constructed with testis RNA obtained from five males collected at Missirah (salinity 40 psu) and five males collected at Kaolack (salinity 95 psu). Sequence tag preparation was achieved with Illumina's Digital Gene Expression Tag Profiling Kit according to the manufacturer's protocol (version 2.1B). For each library, 5 μg of an equimolar mix of the five total RNA samples was incubated with oligo(dT) beads. Synthesis of first- and second-strand cDNA was performed using the SuperScript II reverse transcription kit according to the manufacturer's instructions (Invitrogen). The cDNAs were cleaved using the NlaIII anchoring enzyme. Subsequently, digested cDNAs were ligated with the GEX adapter 1, which contains an MmeI restriction site. A second digestion with MmeI was then performed, which cuts 17 bp downstream of the CATG site. At this point, the fragments detached from the beads. Then, the GEX adapter 2 was ligated to the 3′ end of the tags. To enrich the samples in the desired fragments, a PCR amplification of 12 cycles using Phusion polymerase (Finnzymes) was performed with primers complementary to the adapter sequences. The resulting fragments of 85 bp were purified by excision from a 6% polyacrylamide TBE gel. DNA was eluted from the gel debris with NEBuffer 2 by gentle rotation for 2 h at room temperature. Gel debris was removed using a Spin-X Cellulose Acetate Filter (2 mL, 0.45 μm), and DNA was precipitated by adding 10 μL of 3 M sodium acetate (pH 5.2) and 325 μL of cold ethanol, followed by centrifugation at 13 000 g for 20 min. After washing the pellet with 70% ethanol, the DNA was resuspended in 10 μL of 10 mM Tris-HCl (pH 8.5) and quantified using a Nanodrop 1000 spectrophotometer. Cluster generation was performed by applying 4 pM of each sample to individual lanes of an Illumina 1G flowcell. After hybridization of the sequencing primer to the single-stranded products, 35 cycles of base incorporation were carried out on the 1G analyzer according to the manufacturer's instructions. Image analysis and base calling were performed using the Illumina Pipeline, where sequence tags were obtained after purity filtering. This was followed by sorting and counting the unique tags.
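The tag structure produced by this protocol can be sketched in a few lines of Python. Because the cDNA is anchored to the beads by its 3′ end, the tag is expected to originate from the 3′-most NlaIII site (CATG) of each transcript and to comprise the site plus the 17 downstream bases. The function below derives such a "virtual" tag from a transcript sequence; the function name and the example sequence are hypothetical, and the sketch is not the software used to process the libraries.

```python
ANCHOR = "CATG"   # NlaIII recognition site
TAG_LENGTH = 17   # bases retained downstream of the site after MmeI digestion

def virtual_tag(transcript):
    """Return CATG plus the 17 following bases at the 3'-most NlaIII site, or None.

    None is returned when the sequence has no CATG site or when fewer than
    17 bases follow the 3'-most site (the tag cannot be predicted from it).
    """
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)  # 3'-most occurrence of the anchoring site
    if pos == -1:
        return None
    tag = seq[pos:pos + len(ANCHOR) + TAG_LENGTH]
    return tag if len(tag) == len(ANCHOR) + TAG_LENGTH else None

# Hypothetical transcript with two NlaIII sites; only the 3'-most one is used.
example = "AAACATGTTTTTTTTTTTTTTTTTGGCATG" + "ACGTACGTACGTACGTA" + "CC"
print(virtual_tag(example))
```

Matching such virtual tags against the observed tags is one way to assign tags to assembled transcripts, under the assumption that the contig covers the 3′ end of the mRNA.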

Tag comparison between DGE libraries, gene selection and primer design

The sequence files of each DGE library were analysed by Skuldtech company (Montpellier, France). Comparisons of DGE libraries were performed using the exact number of tags in each library and assumed that each tag has an equal chance of being detected (Piquemal et al. 2002). The associated statistical values were obtained from Pearson correlations between tag counts and expressed as P-values (Appendix S1). To identify potentially differentially expressed genes, the two DGE libraries were screened for tags that showed the most differential counts, using a P-value <0.001. Only tags showing a minimum of 10 occurrences in at least one of the two libraries were considered. This resulted in 2214 distinct tags that showed different counts (Fig. 1). Among them, 711 could be assigned to the EST library. Of these 711 tags, 60 were randomly chosen among those over-represented in one of the two salinity conditions, with a two-fold count difference threshold. Conversely, a P-value >0.1 was applied to identify tags that showed conserved counts between the two salinities. Under such conditions, a total of 2959 distinct tags were identified (Fig. 1), among which 785 could be assigned to the EST library. Twelve of them were selected according to their apparent highest stability. The selected tags were locally blasted against the RNA-seq library, and all the sequences corresponding to the selected tags (100% identity) were aligned with ClustalX version 2.1 software using standard settings (Larkin et al. 2007). Primers were designed from each resulting consensus sequence with the online RealTime PCR software tool from Integrated DNA Technologies, using the following settings: optimal Tm of 62 °C, optimal length of 22 nt and optimal GC content of 50%.
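The selection thresholds quoted above can be summarized in a short Python sketch. The P-value of each tag is taken as a given input, since the statistical procedure itself is described in Appendix S1 and is not reproduced here. Two points are assumptions of this sketch rather than statements from the text: the two-fold threshold is applied to counts normalized per 10 000 reads, and the minimum of 10 occurrences is also required for stable tags. The function names are hypothetical.

```python
def per_10k(count, library_total):
    """Normalize a raw tag count to tags per 10 000 reads (as in the Fig. 1 caption)."""
    return count / library_total * 10_000

def classify_tag(count_a, count_b, p_value, total_a, total_b,
                 min_count=10, fold=2.0, p_diff=0.001, p_stable=0.1):
    """Label a tag 'differential', 'stable' or None using the thresholds from the text."""
    # At least 10 occurrences in at least one of the two libraries.
    if max(count_a, count_b) < min_count:
        return None
    norm_a = per_10k(count_a, total_a)
    norm_b = per_10k(count_b, total_b)
    if p_value < p_diff:
        low, high = sorted((norm_a, norm_b))
        # Assumption: the two-fold threshold is checked on normalized counts.
        if low == 0 or high / low >= fold:
            return "differential"
        return None
    if p_value > p_stable:
        return "stable"
    return None

# Library sizes reported in the Results (367 813 and 537 303 tags).
TOTAL_40_PSU, TOTAL_95_PSU = 367_813, 537_303
print(classify_tag(50, 10, 0.0001, TOTAL_40_PSU, TOTAL_95_PSU))
print(classify_tag(100, 146, 0.5, TOTAL_40_PSU, TOTAL_95_PSU))
```

Tags with intermediate P-values (between 0.001 and 0.1) fall into neither class and are returned as None.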


Figure 1. Comparison of the tag counts in the two DGE libraries constructed from fish at two salinity extremes (40 and 95 psu). Comparison was obtained from Pearson correlations between actual tag counts, and results are expressed as P-values. A low P-value indicates a high level of statistical significance. For easier visualization, values were normalized to the number of total tags of each library (count/total number of reads × 10 000), and each tag was colour-coded according to its representation in each library (reflected by its P-value).


cDNA synthesis and real-time PCR

Reverse transcription of male RNA extracts was performed with oligo(dT) primers on 250 ng RNA, using the Transcriptor first-strand cDNA synthesis kit (Roche). A template-primer mixture consisting of 250 ng RNA and 2.5 μM oligo(dT) was denatured at 65 °C for 10 min and immediately cooled on ice. The reaction (20 μL final volume) was supplemented with reaction buffer (1X), dNTPs (1 mM each), RNase inhibitor (20 U) and reverse transcriptase (10 U), incubated for 1 h at 50 °C, then heated for 5 min at 85 °C and immediately cooled on ice. The resulting cDNAs were diluted 10 times with 180 μL H2O and stored at −20 °C until use.

PCR amplifications were carried out in 384-well plates with a LightCycler 480 (Roche) in a final volume of 6 μL containing 3 μL of SYBR Green I Master mix (Roche), 2 μL of cDNA and 0.5 μM of each primer. Amplifications were performed in duplicate or in triplicate with an initial denaturation step of 10 min at 95 °C followed by 40 cycles of denaturation at 95 °C for 10 s, annealing at 60 °C for 10 s and elongation at 72 °C for 10 s. Amplifications were followed by a melting procedure, consisting of a brief denaturation at 95 °C for 5 s, a cooling step at 65 °C for 1 min and a slow denaturation to 97 °C. Amplification efficiency of each primer pair was calculated from dilution curves generated using serial dilutions (1:1, 1:2, 1:5, 1:10, 1:20, 1:50, 1:100) of a unique cDNA pool, consisting of a mix of 12 cDNAs (4 cDNAs per salinity: 0, 35 and 70 psu). A linear regression was applied to the resulting dilution curves, and the regression coefficient (R²) as well as the slope was calculated. Primer pairs were validated only when their corresponding R² was higher than 0.99. Amplification products were also verified by analysing the shape of their corresponding melting curve and by measuring their size by agarose gel electrophoresis. Only the primers yielding a single product, without any primer-dimers, were validated. Each qPCR run contained a no-template control for every primer pair. Cycle of quantification (Cq) values were calculated with the LightCycler software, using the second derivative method. Results were expressed as changes in relative expression according to the $\Delta\Delta Cq$ method (Pfaffl 2001). Cq values were first corrected with the amplification efficiency of each primer pair according to the following equation: $Cq_{E=100\%} = Cq_{E} \times \log(1+E)/\log(2)$, where E is the efficiency and $Cq_{E}$ the uncorrected Cq values. Then, the corrected Cqs of each gene of interest were normalized with the mean Cq of reference genes (ΔCq), and ΔCq values were related to the average ΔCq value of all samples. All qPCR results were analysed with the GenEx Pro package (MultiD Analyses, Sweden).
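The normalization chain described above (efficiency correction, normalization to reference genes, scaling to the average of all samples) can be sketched as follows. This is a minimal illustration, not the GenEx implementation: it assumes the commonly used correction Cq(E=100%) = Cq × log2(1 + E), with E expressed as a fraction (1.0 for 100%), and it converts the final values to fold changes with a base of 2. The function names and the example values are hypothetical.

```python
import math
from statistics import mean

def correct_cq(cq, efficiency):
    """Rescale a Cq measured at a given efficiency (1.0 = 100%) to its 100%-efficiency equivalent.

    Assumed formula: Cq_corrected = Cq * log2(1 + E).
    """
    return cq * math.log2(1.0 + efficiency)

def relative_expression(target_cq, reference_cqs):
    """Return relative expression per sample.

    target_cq: efficiency-corrected Cq values of the gene of interest, one per sample.
    reference_cqs: one list of corrected Cq values per reference gene (same sample order).
    """
    # Mean Cq of the reference genes for each sample.
    ref_mean = [mean(sample) for sample in zip(*reference_cqs)]
    # Delta-Cq: gene of interest minus reference mean.
    delta = [t - r for t, r in zip(target_cq, ref_mean)]
    # Relate each delta-Cq to the average delta-Cq of all samples.
    avg_delta = mean(delta)
    return [2 ** -(d - avg_delta) for d in delta]

# Hypothetical example: two samples, two reference genes.
print(correct_cq(20.0, 1.0))
print(relative_expression([22, 21], [[20, 20], [20, 20]]))
```

With a perfect efficiency the correction leaves Cq unchanged, whereas an efficiency below 1 lowers the corrected Cq, reflecting the fact that each cycle then amplifies less than two-fold.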


Results

Main features of the sequence data

Considering the scarcity of tilapia sequences in public databases, the first step of this study consisted of establishing a large collection of expressed sequences. It was generated from fish collected in the Sine Saloum estuary. To make this transcript collection as comprehensive as possible, individuals from the three locations (with salinities of 40, 53 and 95 psu) and at all stages of sexual maturity were represented. RNA-seq generated a total of 28 981 363 bp of assembled sequence (contigs and singletons combined). Sequence assembly resulted in 30 022 contigs and 86 291 singletons, and contig length ranged from 150 to >3000 bp. Nearly 60% of the contigs could be annotated from public databases. The main features of the sequence data are displayed in Table 1.

Table 1. Summary statistics of the RNA-seq data

Statistics for contigs
Number of bases in all contigs: 11 368 093
Number of contigs: 30 022
Number of contigs in N50: 8880
Minimum contig length: 150
Maximum contig length: 3099
Median contig length: 423
GC content of contigs: 53.13%

Statistics for singletons
Number of bases in all singletons: 17 613 270
Number of singletons: 86 291
Number of singletons in N50: 31 086
Median singleton length: 202
GC content of singletons: 52.63%
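For readers unfamiliar with the N50 statistic, the following Python sketch shows one common way to compute it from a list of sequence lengths. It assumes that "number of contigs in N50" denotes the smallest number of longest sequences whose cumulative length reaches half of the total number of bases; this interpretation and the function name `n50_stats` are assumptions, as the text does not define the statistic.

```python
def n50_stats(lengths):
    """Return (N50 length, number of sequences in N50) for a list of sequence lengths."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half = sum(ordered) / 2
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # 'length' is the N50; 'count' sequences were needed to reach half the bases.
            return length, count
    raise ValueError("empty length list")

# Hypothetical toy assembly of eight sequences totalling 32 bases.
print(n50_stats([2, 2, 2, 3, 3, 4, 8, 8]))
```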

As a starting point to investigate differential gene expression in the testis of fish reproducing under different salinities, two DGE libraries were also constructed, each from five males collected at one of the two locations displaying the most extreme salinities: Missirah (salinity 40 psu) and Kaolack (salinity 95 psu). Their sequencing resulted in a total of 367 813 and 537 303 tags, respectively, which represented 39 687 and 69 499 unique tags. Among these unique tags, 7119 and 11 850 could be assigned to the transcript library, respectively.

All these sequence data were organized into an interactive navigation system. This platform includes a sequence viewer that enables exploration of consensus sequences, gene families, putative associated proteins, SNPs or allelic mutations, as well as a local BLAST alignment tool to search for peptide and nucleotide motifs in the database. It also allows comparisons of DGE libraries under various stringency conditions. In addition, it gives access to raw sequence data and allows export of sequences in FASTA format. This platform has been made publicly available online. Details about the functions of this platform may be provided upon request.

Primer validation

The 72 novel primer pairs designed in this study were first verified for their ability to amplify one single product with an acceptable efficiency. Under the conditions tested, 15 primer sets gave rise to either a lack of amplification or secondary products, as revealed by melting curve analysis and agarose gel electrophoresis. Furthermore, three additional pairs yielded poor amplification efficiencies, with linear regression coefficients <0.99. For these reasons, 18 primer sets were excluded from the analyses. Amplification efficiencies of the 54 remaining primer pairs (43 potential genes of interest and 11 potential reference genes) ranged between 0.8 and 1.1. The sequence of these primers, together with the amplicon length and the amplification efficiency, is displayed in Table 2.

Table 2. List of the genes validated by qPCR
Sequence name (a) | Primer sequence | Amplicon length | Amplification efficiency
Potential candidate genes
Potential reference genes
(a) Names of the sequences as they appear on the web sequence viewer.

Because genomic information regarding intron-exon boundaries was not available for Sarotherodon melanotheron, it was not possible to design primers spanning different exonic regions. For this reason, two DNase treatments were applied to each RNA sample: one directly on the columns during the extraction procedure and a second one in solution on the RNA eluates. Relatively high levels of background genomic DNA were detected in RNA extracts treated only once with DNase (Cq ranged from 23.9 to 31.5). This signal was not detected in samples treated twice (Cq > 35), indicating that two DNase treatments are necessary to eliminate genomic DNA from cDNA samples.

The optimal primer concentration was also assessed. For each primer pair, four concentrations (0.25, 0.5, 0.75 and 1 μM) were tested. Comparison of amplification plots showed that Cq values were steady for the three highest concentrations, whereas they were in most cases higher for 250 nM. Besides, melting profiles indicated the absence of primer-dimer or secondary peak for all tested concentrations. For these reasons, primers were used at a final concentration of 500 nM in all subsequent experiments.

Estimation of experimental reproducibility

As for any new assay, evaluation of the experimental biases that may impair quantitative results is also essential. To address this critical issue, experimental reproducibility was first assessed through the following nested protocol: RNA was extracted in duplicate, and reverse transcription and qPCR were both performed in triplicate, which resulted in 18 Cq measurements per sample and per gene. This protocol was applied with two different genes (transcript_AVA2_10563 and transcript_AVA3_453) that produced similar mean Cq values (~20), on three individual fish samples originating from three distinct salinities, and repeated twice independently. Results revealed that the highest source of variation (expressed as SD of Cqs), after that originating from samples, could be attributed to the reverse transcription reaction (SD ranged from 0.095 to 0.409); conversely, RNA extraction produced the lowest variation (SD ranged from 0 to 0.166), while the SD of qPCR repeats varied from 0.076 to 0.130. When the same experiment was repeated using 1 μL of template cDNA instead of 2 μL, the SD of qPCR replicates increased markedly, varying from 0.283 to 0.369. For this reason, the amount of cDNA used was always 2 μL, as stated in the Material and methods section.
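The nested design above (2 extractions × 3 reverse transcriptions × 3 qPCR replicates = 18 Cq values) lends itself to a simple level-by-level summary, sketched below in Python. This is a simplified illustration based on standard deviations of replicate means at each level; it is not a nested ANOVA and is not claimed to be the procedure used to obtain the SD values reported in the text. The data layout, function name and example values are hypothetical.

```python
from statistics import mean, stdev

def nested_sds(cq):
    """Summarize variability at each level of a nested qPCR design.

    cq[extraction][rt] is the list of qPCR replicate Cq values obtained for one
    sample and one gene, from one reverse transcription of one RNA extraction.
    """
    # qPCR level: SD of replicates within each reverse transcription, averaged.
    qpcr_sds = [stdev(reps) for extraction in cq for reps in extraction]
    # RT level: SD of the RT means within each extraction, averaged.
    rt_means = [[mean(reps) for reps in extraction] for extraction in cq]
    rt_sds = [stdev(means) for means in rt_means]
    # Extraction level: SD of the extraction means.
    extraction_means = [mean(means) for means in rt_means]
    return {
        "qpcr": mean(qpcr_sds),
        "reverse_transcription": mean(rt_sds),
        "extraction": stdev(extraction_means),
    }

# Hypothetical data: variation is introduced at the reverse transcription level only.
example = [
    [[20.0, 20.0, 20.0], [20.2, 20.2, 20.2], [20.4, 20.4, 20.4]],
    [[20.0, 20.0, 20.0], [20.2, 20.2, 20.2], [20.4, 20.4, 20.4]],
]
print(nested_sds(example))
```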

As reverse transcription was the main source of variability, we also evaluated its reproducibility across a range of RNA concentrations. For this purpose, serial dilutions (1:1, 1:2, 1:5, 1:10, 1:20, 1:50, 1:100) were prepared from a pool of RNAs (50 ng/μL), and each dilution (50–0.5 ng/μL) was reverse-transcribed. The corresponding cDNAs were amplified with two primer pairs (transcript_AVA2_10563 and transcript_AVA3_453), and Cq values were plotted against the logarithm of the initial RNA concentration. This experiment was repeated twice independently. In each case, it revealed a good linearity with an R² ≥ 0.99, and amplification efficiencies ranged between 0.63 and 0.86. Results obtained with the primer pair transcript_AVA3_453 are displayed in Fig. 2.
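The dilution-curve analysis described above can be reproduced with a short least-squares fit. The Python sketch below regresses Cq on log10(concentration) and derives the efficiency from the slope with the standard relation E = 10^(−1/slope) − 1 (E = 1 corresponds to a doubling at each cycle). The function name and the simulated Cq values are hypothetical; the dilution factors and the 50 ng/μL starting concentration are those given in the text.

```python
import math

def dilution_curve(concentrations, cqs):
    """Fit Cq = slope * log10(concentration) + intercept.

    Returns (slope, R squared, efficiency), with efficiency = 10 ** (-1 / slope) - 1.
    """
    xs = [math.log10(c) for c in concentrations]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in cqs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, r_squared, efficiency

# Dilution series from the text, starting from a 50 ng/uL RNA pool.
dilution_factors = [1, 2, 5, 10, 20, 50, 100]
concentrations = [50 / d for d in dilution_factors]
# Simulated Cq values for an ideal assay (one extra cycle per two-fold dilution).
ideal_cqs = [20 + math.log2(d) for d in dilution_factors]
print(dilution_curve(concentrations, ideal_cqs))
```

For the ideal series the slope is close to −3.32 and the efficiency close to 1; shallower or steeper slopes translate into efficiencies above or below 1, respectively.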


Figure 2. Reproducibility of the reverse transcription step. Serial dilutions were prepared from a pool of RNAs, and each dilution (50–0.5 ng/μL) was reverse-transcribed. The corresponding cDNAs were amplified with the primer pair transcript_AVA3_453 (a), and Cq values were plotted against the logarithm of the initial RNA concentration (b).


Taken together, these results suggest that our experimental workflow is reliable and should not introduce substantial experimental variability into the biological results.

Validation of reference genes

Another crucial point consisted of validating genes that could be used as appropriate reference genes for subsequent relative quantifications. The twelve genes previously identified as potential references were assayed with the geNorm (Vandesompele et al. 2002) and NormFinder (Andersen et al. 2004) algorithms. To this end, their Cq values were measured in a set of 12 fish samples collected from three salinities (0, 35 and 70 psu, 4 fish/salinity) and their stability was examined. Only the genes displaying an M-value <0.55 (with geNorm software) and an SD <0.5 (with NormFinder software) were retained, which resulted in the selection of five of them. Then, these five genes were amplified in a set of 22 fish originating from the three salinity locations. Analysis of their expression stability across samples with geNorm and NormFinder (both ignoring and taking salinity groups into account) gave highly congruent results. It led to the validation of four of these genes, showing an M-value <0.42 and an SD <0.36 (Table 3). According to NormFinder, using these four genes as reference instead of only one would decrease the accumulated SD by a factor of nearly 2 (Table 3).

Table 3. Expression stability of the four selected reference genes
Sequence name | geNorm M-value | NormFinder SD | NormFinder Acc. SD
A low M-value or standard deviation indicates high expression stability. The combined use of the four genes as reference decreases the accumulated standard deviation.
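As an illustration of the stability measure used above, the following Python sketch computes a geNorm-style M-value from Cq data. Following the definition of Vandesompele et al. (2002), M for a given gene is the average, over all other candidate genes, of the standard deviation across samples of the log2 expression ratio between the two genes. The sketch assumes efficiency-corrected Cq values, so that the log2 ratio reduces to a Cq difference; it is not the geNorm software itself, and the gene names and values are hypothetical.

```python
from statistics import mean, stdev

def genorm_m(cq_by_gene):
    """Return {gene: M-value} from {gene: [Cq per sample]} (same sample order for all genes)."""
    genes = list(cq_by_gene)
    m_values = {}
    for gene in genes:
        pairwise = []
        for other in genes:
            if other == gene:
                continue
            # Assuming ~100% efficiency, log2(expr_gene / expr_other) = Cq_other - Cq_gene.
            log_ratios = [b - a for a, b in zip(cq_by_gene[gene], cq_by_gene[other])]
            pairwise.append(stdev(log_ratios))
        # M is the mean pairwise variation with all other candidate genes.
        m_values[gene] = mean(pairwise)
    return m_values

# Hypothetical candidates: ref_A and ref_B co-vary perfectly, ref_C does not.
example = {
    "ref_A": [20, 21, 22],
    "ref_B": [22, 23, 24],
    "ref_C": [20, 22, 21],
}
print(genorm_m(example))
```

Genes with the lowest M-values are the most stable; in the text, candidates were retained when M was below 0.55 in the first screen.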



Discussion

The present study is the first large-scale analysis of the transcriptome of the black-chinned tilapia, an ecologically fascinating fish species with exceptional adaptive capacities, especially with regard to its reproductive behaviour. As Sarotherodon melanotheron is a non-model species, the procedure was divided into two separate stages, to provide a resource that will be valuable in any study dealing with the reproduction of this fish. The first one consisted of building a large transcript collection from two major organs involved in reproduction, that is liver and gonads (Mommsen & Walsh 1988; Wiegand 1996), collected from fish at all stages of sexual maturity and under different salinities. The purpose of this collection was neither to evaluate the total number of transcripts expressed in these organs nor to estimate transcriptome coverage, as is now the case for most RNA-seq projects in model animals (Wang et al. 2009), but rather to provide a first genomic resource for this species, for which only a very limited number of sequences were available so far (Tine et al. 2008). It is worth mentioning that when these newly obtained sequences were annotated, the genome sequence of the Nile tilapia, Oreochromis niloticus, had not yet been released, which is the reason why almost none of the annotations that can be found on the tilapia database website refer to O. niloticus. However, a new annotation round performed on a selected set of sequences revealed no major changes in the protein prediction (data not shown), probably because the genome annotation of O. niloticus was performed automatically. Moreover, the ‘export’ function of the sequence viewer enables easy updates of alignments and annotations.

In contrast, the second stage of this project aimed at addressing a more specific question on the reproductive biology of S. melanotheron, that is, the identification of genes in male gonads whose expression changes according to salinity. As demonstrated by several groups, DGE is particularly suited for quantification of transcript abundance (Asmann et al. 2009), especially in non-model organisms for which no reference genome is available (Hong et al. 2011). For this reason, DGE libraries were compared between fish living in the most extreme salinity environments of the Sine Saloum estuary. The library comparison enabled identification of hundreds of genes potentially differentially expressed between the two environments. This list of genes is likely to serve as a rich basis for a deeper understanding of the molecular mechanisms that allow S. melanotheron heudelotii to reproduce in such a wide range of salinities. Furthermore, a set of 43 genes of interest has been validated in the present work. Even though analysis of their putative role in the adaptation of male spermatogenesis to salinity is beyond the scope of this article and will be the focus of a complementary study, a first look at their predicted function indicates that several of them have already been described as playing a key role in spermatogenesis or in homoeostasis. For instance, contig_Tilapia_90_27008 matches a MORC family CW-type zinc finger 2 protein, whose absence was shown to arrest spermatogenesis in mice (Perry & Zhao 2003); contig_Tilapia_90_21432 corresponds to a seminal plasma glycoprotein, which is able to immobilize sperm cells, also in mice (Mochida et al. 2002); contig_Tilapia_90_947 corresponds to a Na+/K+-transporting ATPase subunit alpha-1, which is involved in active ion excretion and uptake for maintaining the intracellular ionic balance (Lorin-Nebel et al. 2012). Finally, of these 43 genes, six did not match any known protein.

Although often overlooked, validation of candidate genes identified from NGS data by routine bench methods, such as real-time PCR, requires a number of prior evaluations. This is especially true for the accurate selection of reference genes in relative expression, as it has been extensively demonstrated that the stability of housekeeping genes greatly depends on the species, tissue, developmental stage and experimental conditions (McCurley & Callard 2008; Tang et al. 2012). Here, the use of geNorm and NormFinder algorithms led to highly congruent results and identified four genes as the most stably expressed across fish individuals and environmental salinities. It also indicated that using four genes simultaneously would result in lower standard deviations. Those four genes, which could be attributed a putative function with good confidence, all belong to the list of potential housekeeping genes described in humans (Eisenberg & Levanon 2003).

It is acknowledged that the reverse transcription step accounts for a large part of the variability in a qPCR assay (Bishop et al. 1997). To limit this bias, all the primers were selected in the most 3′ region of the transcripts; this was made easy by the 3′ tag approach that was used for DGE. Combined with the use of oligo-dTs, this dramatically reduced the probability of obtaining cDNAs that could not be amplified by the designed primers because of incomplete reverse transcription. Although reverse transcription was the main source of variability in the present case, its contribution remained very limited, as illustrated by the RNA dilution curve, which showed good linearity. Likewise, cDNA dilution curves showed excellent linearity over two logs for the 54 primer pairs. The Cq values measured with all the primer pairs on 22 individual cDNAs from fish collected at different salinities all fell within this range (not shown). This indicates that the range of dilutions used to measure the amplification efficiencies was sufficient to cover most, if not all, RNA concentrations of the targeted genes that can be found in individual samples. This was expected, as the sample used for dilutions consisted of a mix of 12 different cDNAs and as such was supposed to comprise most expressed RNAs at highly variable concentrations.
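For readers unfamiliar with how an amplification efficiency is derived from a dilution curve, the sketch below shows the standard calculation: Cq is regressed on the log10 of the dilution, and the efficiency is taken as E = 10^(-1/slope) - 1, so that a slope close to -3.32 corresponds to an efficiency close to 100%. This is an illustrative sketch, not the analysis pipeline of this study; the function name `amplification_efficiency` and the Cq values are hypothetical.

```python
def amplification_efficiency(log10_dilutions, cq_values):
    """Fit Cq = slope * log10(dilution) + intercept by least squares.

    Returns (slope, efficiency, r_squared), where
    efficiency = 10 ** (-1 / slope) - 1 (1.0 means 100%).
    """
    n = len(cq_values)
    mean_x = sum(log10_dilutions) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_dilutions)
    syy = sum((y - mean_y) ** 2 for y in cq_values)
    sxy = sum(
        (x - mean_x) * (y - mean_y)
        for x, y in zip(log10_dilutions, cq_values)
    )
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)  # linearity of the dilution curve
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency, r_squared


# Hypothetical dilution series spanning two logs (undiluted, 1:10, 1:100).
log10_dilutions = [0.0, -1.0, -2.0]
cq_values = [20.0, 23.32, 26.64]
slope, efficiency, r_squared = amplification_efficiency(log10_dilutions, cq_values)
```

The coefficient of determination returned alongside the slope gives a simple numerical check of the linearity mentioned above; individual sample Cq values can then be compared with the lowest and highest Cq of the series to confirm that they fall within the validated range.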

In conclusion, the present study has generated a large transcriptomic resource that will be valuable for a great number of studies focusing on the functional genomics of this fish and, more broadly, of any species presenting salinity-related plasticity. It also identified and validated a large set of genes that will provide a significant tool for a deeper understanding of the molecular mechanisms that allow S. melanotheron heudelotii to reproduce over a wide range of salinities. Finally, this resource will provide useful tools for population genetics studies on S. melanotheron (Consortium et al. 2013), as well as on many other phylogenetically related species.


Acknowledgements

This study was supported by an INSU-EC2CO grant (IPREP, 2010–2012). We are thankful to Dr Bruno Guinand for critical reading of this manuscript. We are also grateful to Laurent Manchon and Fabien Pierrat for their input regarding mathematical treatment of the sequence data.


References

  • Andersen CL, Jensen JL, Ørntoft TF (2004) Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Research, 64, 5245–5250.
  • Asmann YW, Klee EW, Thompson EA et al. (2009) 3′ tag digital gene expression profiling of human brain and universal reference RNA using Illumina Genome Analyzer. BMC Genomics, 10, 11.
  • Bishop GA, Rokahr KL, Lowes M et al. (1997) Quantitative reverse transcriptase-PCR amplification of cytokine mRNA in liver biopsy specimens using a non-competitive method. Immunology and Cell Biology, 75, 142–147.
  • Bustin SA, Benes V, Garson JA et al. (2009) The MIQE Guidelines: minimum information for publication of quantitative real-time PCR experiments. Clinical Chemistry, 55, 611–622.
  • Bustin SA, Beaulieu JF, Huggett J et al. (2010) MIQE precis: practical implementation of minimum standard guidelines for fluorescence-based quantitative real-time PCR experiments. BMC Molecular Biology, 11, 5.
  • Consortium MERPD, Arranz SE, Avarre JC et al. (2013) Permanent genetic resources added to molecular ecology resources database 1 December 2012–31 January 2013. Molecular Ecology Resources, 13, 546–549.
  • Duponchelle F, Panfili J (1998) Variations in age and size at maturity of female Nile tilapia, Oreochromis niloticus, populations from man-made lakes of Cote d'Ivoire. Environmental Biology of Fishes, 52, 453–465.
  • Duponchelle F, Cecchi P, Corbin D, Nunez J, Legendre M (2000) Variations in fecundity and egg size of female Nile tilapia, Oreochromis niloticus, from man-made lakes of Cote d'Ivoire. Environmental Biology of Fishes, 57, 155–170.
  • Eisenberg E, Levanon EY (2003) Human housekeeping genes are compact. Trends in Genetics, 19, 362–365.
  • Fraser BA, Weadick CJ, Janowitz I, Rodd FH, Hughes KA (2011) Sequencing and characterization of the guppy (Poecilia reticulata) transcriptome. BMC Genomics, 12, 14.
  • 't Hoen PAC, Ariyurek Y, Thygesen HH et al. (2008) Deep sequencing-based expression analysis shows major advances in robustness, resolution and inter-lab portability over five microarray platforms. Nucleic Acids Research, 36, 11.
  • Hong LZ, Li J, Schmidt-Kuntzel A, Warren WC, Barsh GS (2011) Digital gene expression for non-model organisms. Genome Research, 21, 1905–1915.
  • Larkin MA, Blackshields G, Brown NP et al. (2007) Clustal W and clustal X version 2.0. Bioinformatics, 23, 2947–2948.
  • Legendre M, Ecoutin JM (1989) Suitability of brackish water tilapia species from the Ivory Coast for lagoon aquaculture. I – Reproduction. Aquatic Living Resources, 2, 71–79.
  • Legendre M, Cosson J, Hadi Alavi SM, Linhart O (2008) Activation of sperm motility in the euryhaline tilapia Sarotherodon melanotheron heudelotii (Dumeril, 1859) acclimatized to fresh, sea and hypersaline waters. Cybium, 2, 181–182.
  • Lorin-Nebel C, Avarre JC, Faivre N et al. (2012) Osmoregulatory strategies in natural populations of the black-chinned tilapia Sarotherodon melanotheron exposed to extreme salinities in West African estuaries. Journal of Comparative Physiology B-Biochemical Systemic and Environmental Physiology, 182, 771–780.
  • McCurley A, Callard G (2008) Characterization of housekeeping genes in zebrafish: male-female differences and effects of tissue type, developmental stage and chemical treatment. BMC Molecular Biology, 9, 102.
  • Mochida K, Matsubara T, Andoh T et al. (2002) A novel seminal plasma glycoprotein of a teleost, the Nile tilapia (Oreochromis niloticus), contains a partial von Willebrand factor type D domain and a zona pellucida-like domain. Molecular Reproduction and Development, 62, 57–68.
  • Mommsen TP, Walsh PJ (1988) Vitellogenesis and oocyte assembly. In: Fish Physiology, Vol XI, The Physiology of Developing Fish, Part A. Eggs and Larvae (eds Hoar WS & Randall DJ), pp. 347–406. Academic Press, San Diego.
  • Ogari J, Dadzie S (1988) The food of the Nile perch, Lates niloticus (L.), after the disappearance of the haplochromine cichlids in the Nyanza Gulf of lake Victoria (Kenya). Journal of Fish Biology, 32, 571–577.
  • Panfili J, Mbow A, Durand JD et al. (2004) Influence of salinity on the life-history traits of the West African black-chinned tilapia (Sarotherodon melanotheron): comparison between the Gambia and Saloum estuaries. Aquatic Living Resources, 17, 65–74.
  • Panfili J, Thior D, Ecoutin JM, Ndiaye P, Albaret JJ (2006) Influence of salinity on the size at maturity for fish species reproducing in contrasting West African estuaries. Journal of Fish Biology, 69, 95–113.
  • Perry J, Zhao Y (2003) The CW domain, a structural module shared amongst vertebrates, vertebrate-infecting parasites and higher plants. Trends in Biochemical Sciences, 28, 576–580.
  • Pfaffl MW (2001) A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Research, 29, 6.
  • Piquemal D, Commes T, Manchon L et al. (2002) Transcriptome analysis of monocytic leukemia cell differentiation. Genomics, 80, 361–371.
  • Stearns SC, Crandall RE (1984) Plasticity for age and size at sexual maturity: a life-history response to unavoidable stress. In: Fish Reproduction: Strategies and Tactics (eds Potts GW, Wootton RJ), pp. 13–33. Academic Press, London.
  • Stewart KM (1988) Changes in condition and maturation of the Oreochromis niloticus L. population at Ferguson's Gulf, Lake Turkana, Kenya. Journal of Fish Biology, 33, 181–188.
  • Tang Y-K, Yu J-H, Xu P et al. (2012) Identification of housekeeping genes suitable for gene expression analysis in Jian carp (Cyprinus carpio var. jian). Fish & Shellfish Immunology, 33, 775–779.
  • Tine M, de Lorgeril J, D'Cotta H et al. (2008) Transcriptional responses of the black-chinned tilapia Sarotherodon melanotheron to salinity extremes. Marine Genomics, 1, 37–46.
  • Tine M, Guinand B, Durand JD (2012) Variation in gene expression along a salinity gradient in wild populations of the euryhaline black-chinned tilapia Sarotherodon melanotheron. Journal of Fish Biology, 80, 785–801.
  • Vandesompele J, De Preter K, Pattyn F et al. (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biology, 3, research0034.1–research0034.11.
  • Wang Z, Gerstein M, Snyder M (2009) RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics, 10, 57–63.
  • Wiegand MD (1996) Composition, accumulation and utilization of yolk lipids in teleost fish. Reviews in Fish Biology and Fisheries, 6, 259–286.

J.C.A. and J.D.D. designed the project. R.D., P.A., A.D., C.J. and N.F. contributed to the experiments. R.D. and C.C. were in charge of fish care. J.C.A., P.A., A.D. and C.J. analysed the data. J.C.A. and J.D.D. wrote the manuscript.

Data accessibility

RNA-seq and DGE libraries have been organized into an interactive database that is freely accessible. Moreover, the whole project, including raw DNA sequences, can be found under the SRA study accession SRP022935.

Supporting Information

men12148-sup-0001-AppendixS1.pdf (application/PDF, 13K): Appendix S1 Mathematical approach: analysis of differential expression.
