Volume 95, Issue 2
Original Article

Differential expression networks and inheritance patterns of long non‐coding RNAs in castor bean seeds

Wei Xu

Department of Economic Plants and Biotechnology, and Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, 132 Lanhei Road, Kunming, 650201 China

Tianquan Yang

Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201 China

Bin Wang

Department of Economic Plants and Biotechnology, and Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, 132 Lanhei Road, Kunming, 650201 China

Graduate University of the Chinese Academy of Sciences, Beijing, 100049 China

Bing Han

Department of Economic Plants and Biotechnology, and Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, 132 Lanhei Road, Kunming, 650201 China

Graduate University of the Chinese Academy of Sciences, Beijing, 100049 China

Huangkai Zhou

Guangzhou Gene denovo Biotechnology, Guangzhou, 510006 China

Yue Wang

Department of Economic Plants and Biotechnology, and Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, 132 Lanhei Road, Kunming, 650201 China

De‐Zhu Li

Germplasm Bank of Wild Species, Kunming Institute of Botany, Chinese Academy of Sciences, Kunming, 650201 China

Aizhong Liu

Corresponding Author

E-mail address: liuaizhong@mail.kib.ac.cn

Department of Economic Plants and Biotechnology, and Yunnan Key Laboratory for Wild Plant Resources, Kunming Institute of Botany, Chinese Academy of Sciences, 132 Lanhei Road, Kunming, 650201 China

Key Laboratory for Forest Resources Conservation and Utilization in the Southwest Mountains of China, Ministry of Education, Southwest Forestry University, Kunming, 650224 China

For correspondence (e‐mail liuaizhong@mail.kib.ac.cn).
First published: 08 May 2018
Citations: 7

Summary

Long non‐coding RNAs (lncRNAs) serve as versatile regulators of plant growth and development. The potential functions and inheritance patterns of lncRNAs, as well as the epigenetic regulation of lncRNA itself, remain largely uncharacterized in plant seeds, especially in the persistent endosperm of the dicotyledons. In this study, we investigated diverse RNA‐seq data and catalogued 5356 lncRNAs in castor bean seeds. A small fraction of lncRNAs were transcribed from the same direction as the promoters of protein‐coding genes (PCgenes) and exhibited strongly coordinated expression with the nearby PCgene. Co‐expression analysis with weighted gene co‐expression network analysis (WGCNA) showed these lncRNAs to be involved in differential transcription networks between the embryo and endosperm in the early developing seed. Genomic DNA methylation analyses revealed that the expression level of lncRNAs was tightly linked to DNA methylation and that endosperm hypomethylation could promote the expression of linked lncRNAs. Intriguingly, upon hybridization, most lncRNAs with divergent genome sequences between two parents could be reconciled and were expressed according to their parental genome contribution; however, some deviation in the expression of allelic lncRNAs was observed and found to be partially dependent on parental effects. In triploid endosperm, the expression of most lncRNAs was not dosage sensitive, as only 20 lncRNAs had balanced dosage. Our findings not only demonstrate that lncRNAs play potential roles in regulating the development of castor bean endosperm and embryo, but also provide novel insights into the parental effects, allelic expression and epigenetic regulation of lncRNAs in dicotyledonous seeds.

Introduction

Long non‐coding RNAs (lncRNAs) are transcripts longer than 200 nucleotides that lack coding potential (Guttman et al., 2009; Rinn and Chang, 2012; Zhu and Wang, 2012). They are mainly transcribed by RNA polymerase II, and are then capped, spliced and polyadenylated (Struhl, 2007; Koch et al., 2008). In plants, lncRNAs can also be transcribed by RNA polymerases IV and V (Böhmdorfer et al., 2014). In general, lncRNAs are grouped into long intergenic non‐coding RNAs (lincRNAs), long non‐coding natural antisense transcripts (lncNATs) and intronic lncRNAs according to their positions in the genome (Derrien et al., 2012; Rinn and Chang, 2012). Compared with protein‐coding genes (PCgenes), most lncRNAs exhibit low conservation among species (Marques and Ponting, 2009; Li et al., 2014; Necsulea et al., 2014), low expression levels and strong tissue‐specific patterns of expression (Cabili et al., 2011; Liu et al., 2012; Li et al., 2014; Wang et al., 2015). In animals, lncRNAs are involved in diverse biological processes, such as dosage compensation of X chromosomes, genomic imprinting and brain development (Penny et al., 1996; Mehler and Mattick, 2007; Geisler and Coller, 2013). Intricate mechanisms by which lncRNAs control gene expression at the transcriptional and post‐transcriptional levels have been described (Geisler and Coller, 2013).

With the rapid development of deep RNA‐seq approaches, thousands of lncRNAs have been identified and characterized in several model plants, including Arabidopsis (Liu et al., 2012; Wang et al., 2014), rice (Oryza sativa; Zhou et al., 2009; Zhang et al., 2014), maize (Zea mays; Li et al., 2014) and cotton (Gossypium spp; Wang et al., 2015). Although only a few lncRNAs have been functionally characterized in plants to date, the functions and regulatory mechanisms of lncRNAs appear to be diverse. For example, the vernalization‐induced lncRNAs (COOLAIR and COLDAIR) can suppress the expression of FLOWERING LOCUS C (FLC) via an epigenetic silencing mechanism, resulting in flowering in Arabidopsis (Heo and Sung, 2011). An endogenous lncRNA, INDUCED BY PHOSPHATE STARVATION 1 (IPS1), works as a ‘decoy’ of miR399 in a target mimicry mechanism to increase the expression of the miR399 target PHOSPHATE2 (PHO2), consequently altering the phosphate content in the plant shoot (Franco‐Zorrilla et al., 2007). A long day‐specific male fertility‐associated lincRNA (LDMAR) is required for normal pollen development in rice (Ding et al., 2012). Indeed, several reproduction‐related lncRNAs have been functionally characterized in rice (Zhang et al., 2014). LncRNAs transcribed by Pol IV and V can work as RNA scaffold molecules to modulate DNA methylation through RNA‐directed DNA methylation (RdDM; Böhmdorfer et al., 2014; He et al., 2014; Matzke and Mosher, 2014). In addition, some lncRNAs play important roles in plant responses to biotic and abiotic stresses (Navarro et al., 2012; Shuai et al., 2014; Wang et al., 2017). In short, lncRNAs might have important biological roles in plant growth and development.

In flowering plants, seed development is a key, carefully regulated process subject to both genetic and epigenetic control (Sreenivasulu and Wobus, 2013), but lncRNAs have not yet been thoroughly interrogated in seeds, especially in the endosperm tissue. Studies of the lncRNAs in plant seeds would facilitate the further elucidation of the molecular or expression network differences between the development of embryo and endosperm, two seed compartments that contain identical genetic material with different maternal and paternal genome dosages. Importantly, the endosperm might be an excellent system to discover novel features of lncRNAs in a way that would be difficult in other tissues. For example, the endosperm is a unique triploid tissue containing two copies of the maternal genome and one copy of the paternal genome, allowing us to study how unequal contributions of the paternal and maternal genomes affect the expression of lncRNAs and whether the expression of lncRNAs is dosage sensitive. Also, the endosperm is the most epigenetically diverse plant tissue, exhibiting extensive demethylation and genomic imprinting; this provides a unique chance to uncover the effect of these epigenetic factors (including DNA methylation and parent‐of‐origin effects) on the expression of lncRNAs, which has not yet been investigated.

The seed of castor bean (Ricinus communis, Euphorbiaceae) has a relatively large and persistent endosperm (Greenwood and Bewley, 1982), unlike other dicotyledons such as Arabidopsis that have an extremely small and transient endosperm (Sreenivasulu and Wobus, 2013). Thus, castor bean is used as a model plant for studying endosperm and seed development in dicotyledons (Houston et al., 2009). In addition, castor bean endosperm accumulates abundant and unique ricinoleic acid, which is widely used in aviation oil and lubricants, and as feedstock for biodiesel production (Ogunniyi, 2006). Currently, the castor bean genome has been sequenced and is publicly available (Chan et al., 2010). In this study, we obtained strand‐specific RNA sequencing (ssRNA‐seq) and DNA methylation data, and comprehensively characterized the genomic expression, DNA methylation and inheritance patterns of lncRNAs in endosperm and embryo tissues of castor bean seeds. Our results further our understanding of the potential functions, parental effects, allelic expression and epigenetic regulation of lncRNAs in dicotyledonous seeds.

Results

Identification and characterization of lncRNAs in castor bean seeds

After high‐depth ssRNA‐seq, about 88 million reads were generated from each sample, and over 80% of the reads were successfully aligned to the castor bean genome (Table S1). Combining these data with our previous mRNA‐seq data (Xu et al., 2014), we created de novo transcript assemblies and generated 80 237 transcripts (Figure 1a). By removing the known PCgenes, we obtained an initial pool of 47 650 transcripts that had no overlap with the PCgenes in the sense strand. These transcripts were then filtered by sequence length and expression level: transcripts shorter than 200 nucleotides or with fragments per kilobase of transcript per million mapped reads (FPKM) < 0.5 were removed, and 42 842 transcripts were retained. Furthermore, the remaining transcripts with known protein domains were excluded by BLASTX searches against the Pfam database, and transcripts derived from the structural non‐coding RNAs were removed. Finally, the coding potential of the transcripts was assessed with the Coding Potential Calculator (CPC) and Coding‐Non‐Coding Index (CNCI). After stringent filtration, 5356 transcripts were identified as lncRNAs in castor bean seeds, including 4640 lincRNAs and 716 lncNATs (Figure 1a; Table S2). To validate these lncRNAs, 10 highly expressed lncRNAs in castor bean seeds were selected for polymerase chain reaction (PCR) and sequencing. All 10 lncRNAs exhibited the expected sequences and exon–intron splicing patterns (Figure S1).
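The filtering cascade described above can be sketched as a chain of simple predicates. This is only an illustration under stated assumptions, not the pipeline used in the study: the `Transcript` record, its field names and the boolean flags that stand in for the Pfam, structural RNA, CPC and CNCI results are hypothetical, and only the two thresholds (200 nucleotides and FPKM 0.5) are taken from the text.

```python
from dataclasses import dataclass

MIN_LENGTH_NT = 200   # length threshold stated in the text
MIN_FPKM = 0.5        # expression threshold stated in the text


@dataclass
class Transcript:
    # Hypothetical record: each flag stands in for the result of an external tool.
    tid: str
    length_nt: int
    fpkm: float
    overlaps_pcgene_sense: bool = False  # overlaps a known PCgene on the sense strand
    has_pfam_domain: bool = False        # matches a known protein domain
    is_structural_rna: bool = False      # derived from a structural non-coding RNA
    cpc_coding: bool = False             # CPC predicts coding potential
    cnci_coding: bool = False            # CNCI predicts coding potential


def is_candidate_lncrna(t: Transcript) -> bool:
    """Apply the filters in the order given in the text; True if all are passed."""
    if t.overlaps_pcgene_sense:
        return False
    if t.length_nt < MIN_LENGTH_NT or t.fpkm < MIN_FPKM:
        return False
    if t.has_pfam_domain or t.is_structural_rna:
        return False
    return not (t.cpc_coding or t.cnci_coding)


def filter_lncrnas(transcripts):
    """Return the transcripts that survive every filter, in input order."""
    return [t for t in transcripts if is_candidate_lncrna(t)]
```

Expressing each criterion as a separate test makes it straightforward to count how many transcripts are lost at each step, as in the schematic of Figure 1(a).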

Figure 1. Identification and characterization of long non‐coding RNAs (lncRNAs) in castor bean seeds.

(a) Schematic pipeline for the identification of lncRNAs in castor bean seeds.

(b) Length density distributions of long intergenic non‐coding RNAs (lincRNAs), long non‐coding natural antisense transcripts (lncNATs) and protein‐coding genes (PCgenes).

(c) Distribution of exon numbers in lincRNAs, lncNATs and PCgenes.

(d) Cumulative frequency of A/U content in lincRNAs, lncNATs and PCgenes, including their 5′‐UTR, 3′‐UTR and body region.

(e) Numbers of castor bean lncRNAs conserved in Arabidopsis, Jatropha curcas, rice (Oryza sativa) and maize (Zea mays).

(f) Percentage of different transposable elements (TEs) within lncRNAs and the genome.

To characterize the lncRNAs in castor bean seeds more clearly, we compared them, including the lincRNAs and lncNATs, with PCgenes. As shown in Figure 1(b), the lncRNA transcripts (average length of 803 nt) were shorter than the PCgene transcripts (average length of 990 nt), but the average length of the exons in the lncRNAs (658 nt) was substantially longer than that of the PCgenes (251 nt). As for exon numbers, approximately 88% of the lincRNAs and 77% of the lncNATs contained only a single exon, which are significantly higher proportions than for PCgenes (35% having a single exon; Figure 1c). Interestingly, the castor bean lncRNAs showed more A/U‐rich regions than the PCgenes and were similar to 5′‐UTRs, but had fewer A/U‐rich regions than the 3′‐UTRs of PCgenes (Figure 1d). When we compared castor bean lncRNAs with genomic sequences from Arabidopsis, Jatropha curcas, rice (O. sativa) and maize (Z. mays), we found, as expected, that only a small portion of lncRNAs (from 14% between castor bean and rice to 17% between castor bean and J. curcas) had significant hits (BLASTN, E ≤ 10⁻⁵), suggesting substantially lower conservation than for PCgenes (Fisher exact test, P = 2.2 × 10⁻¹⁶; Figure 1e). A comparison of the castor bean lncRNAs with known lncRNAs of other species revealed that only 11 castor bean lncRNAs were conserved in Arabidopsis, two in rice and 19 in maize (Table S3). When we assessed the positional conservation of the lncRNAs, 178, 33 and 65 castor bean lncRNAs were detected in their corresponding collinearity regions in Arabidopsis, rice and maize, respectively (Table S3). In addition, we found that 16.3% of lncRNAs (874) overlapped with transposable elements (named TE‐lncRNAs) and were more often derived from retrotransposons than from DNA transposons (Figure 1f). Among these TEs, long terminal repeat retrotransposons of the Gypsy (45.2%) and Copia (28.8%) families were the most abundant. The relative percentage of TE families within lncRNAs was in good agreement with that of the castor bean genome (Figure 1f).

Association of the expression of lncRNAs and protein‐coding RNAs

The overall expression levels of both lincRNAs and lncNATs were significantly lower than those of PCgenes (both P < 0.01, t‐test) in the embryo and endosperm tissues of castor bean (Figure 2a; Table S4). To validate their expression patterns, we selected 40 lncRNAs, eight that were highly expressed in embryo tissues and 32 that were highly transcribed in endosperm tissues. Quantitative (q) real‐time‐polymerase chain reaction (RT‐PCR) was performed on root, stem, leaf, pollen and ovule tissues, as well as embryo and endosperm tissues of seeds at 25 days after pollination (DAP), and we found that the lncRNAs exhibited strong tissue‐specific expression patterns (Figure 2b). Overall, the qRT‐PCR results were largely in agreement with the RNA‐seq data (37 out of the 40 lncRNAs were expressed in the expected tissue), except for TCONS_00015695, TCONS_00075540 and TCONS_00069950 (marked with asterisks in Figure 2b). A Pearson correlation analysis of the log2 expression level differences between the ZB107 endosperm and embryo revealed a strong correlation between the results of the qRT‐PCR and ssRNA‐seq analyses (rp = 0.82).

Figure 2. Expression of long non‐coding RNAs (lncRNAs) and protein‐coding genes (PCgenes).

(a) Expression levels of long intergenic non‐coding RNAs (lincRNAs), long non‐coding natural antisense transcripts (lncNATs) and PCgenes in castor bean endosperm and embryo tissues as illustrated by the boxplot.

(b) Heatmap representing tissue‐specific expression patterns of 40 lncRNAs using quantitative real‐time‐polymerase chain reaction (qRT‐PCR). The asterisks indicate lncRNAs with expression levels not consistent with the RNA‐seq data.

(c) Summary of various types and numbers of lncRNA–PCgene pairs in castor bean seeds. Red and blue lines depict lncRNAs (lincRNAs or lncNATs) and PCgenes, respectively, at the right of the diagram. Arrows indicate the direction of transcription initiation. The lincRNA–PCgene and PCgene–PCgene pairs were restricted to adjacent 5‐kb regions.

(d) The density distribution of the Pearson correlation coefficient for lincRNA–PCgene, lncNAT–PCgene and PCgene–PCgene pairs.

(e) The density distribution of the Pearson correlation coefficients for promoter‐associated lincRNAs and nearby PCgenes. ‘US’ indicates lincRNAs upstream of PCgenes with the same direction of transcription; ‘UO’ indicates lincRNAs upstream of PCgenes with the opposite direction of transcription.

(f) Two examples of lincRNA–PCgene pairs with a strong positive correlation (rp > 0.9). The panels from top to bottom show read coverage, read alignment and transcript models.

LncRNAs can affect gene expression in a cis (neighboring genes) or trans (distant genes) manner. In this study, the Pearson correlation coefficient (rp) was used to estimate the expression correlation of different transcript pairs within 5 kb. First, we removed those transcripts (including lncRNAs and PCgenes) with low expression levels (FPKM < 2) and those lncRNAs that were present within the 500‐bp flanking region of a PCgene. Accordingly, we identified 2838 lincRNA–PCgene, 615 lncNAT–PCgene and 23 364 PCgene–PCgene pairs (Figure 2c). In addition, 2254 lincRNAs were located more than 5 kb from the nearest PCgene (Figure 2c). We observed a high percentage of positive correlations (rp ≥ 0.8) in the lincRNA–PCgene pairs (12.8% versus 6.2% for randomly sampled lincRNA pairs) and in the PCgene–PCgene pairs (19.2% versus 6.5% for randomly sampled PCgene pairs; Figure 2d). For lncNAT–PCgene pairs, a large number of transcript pairs exhibited neither positive nor negative correlations (Figure 2d). In addition, there were 2528 distant lincRNA sites exhibiting strong positive correlations (rp ≥ 0.8) with PCgenes. Further, we identified 1415 lincRNAs that were transcribed from regions near the promoters of PCgenes (promoter‐associated lncRNAs), including 780 that were transcribed in the same direction as the adjacent PCgene (named US) and 635 that were transcribed in the opposite direction to the adjacent PCgene (named UO; Figure 2c). The lincRNA–PCgene pairs with the same transcription direction exhibited a stronger correlation than those pairs with opposite directions of transcription (Figure 2e). Notably, among these 1415 promoter‐associated lncRNAs, 230 pairs were found to be located inside the collinearity regions between castor bean and other species, including 122 US pairs and 108 UO pairs.
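The pairwise correlation analysis described above can be illustrated with a short sketch. It computes rp for each neighbouring pair and reports the fraction of pairs at or above the 0.8 cut‐off used in the text. The function names and the layout of the input (a dictionary of expression vectors keyed by transcript identifier) are assumptions made for illustration; the selection of pairs within 5 kb is taken as already done.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one vector is constant
    return sxy / math.sqrt(sxx * syy)


def fraction_strongly_correlated(pairs, expr, threshold=0.8):
    """Fraction of (transcript, transcript) pairs with r_p >= threshold.

    pairs: list of (id_a, id_b) tuples, e.g. neighbouring lincRNA-PCgene pairs.
    expr:  dict mapping each id to its expression values across samples.
    """
    if not pairs:
        return 0.0
    hits = sum(1 for a, b in pairs if pearson(expr[a], expr[b]) >= threshold)
    return hits / len(pairs)
```

Running the same function on randomly sampled pairs gives the kind of background value against which the observed fractions (for example 12.8% versus 6.2%) are compared.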

A gene ontology analysis of those PCgenes showing strong positive correlation (rp ≥ 0.8) with the lncRNAs revealed that most lncRNAs were involved in binding, catalytic and metabolic processes (Figure S2). For instance, we identified the lncRNA (TCONS_00023103) in the promoter region of gene 29733.m000747 (Figure 2f), which was homologous to AT1G80270 (PPR596) in Arabidopsis, and involved in RNA binding and editing (Doniwa et al., 2010). Another lncRNA (TCONS_0005030) was expressed at the promoter region of gene 30074.m001369, which was homologous to Arabidopsis AT1G72180, functioning in the regulation of nitrogen metabolism (Tabata et al., 2014; Figure 2f).

The differential expression network between the embryo and endosperm development

Although the embryo and endosperm tissues are the products of double fertilization and have nearly identical genetic information, they exhibit dramatic differences in multiple characteristics. Differences in gene expression and regulatory networks are critical for determining embryo and endosperm development. To exploit the potential function of lncRNAs in the development of the castor bean embryo and endosperm, we first performed a hierarchical clustering analysis of their 2345 lncRNAs (mean FPKM ≥ 1) and 12 035 PCgenes (mean FPKM ≥ 2), and found that the lncRNAs exhibited stronger tissue specificity than the PCgenes (Figure 3a). A large majority of lncRNAs in cluster I were highly or specifically expressed in at least one embryo tissue, while they were lowly expressed in endosperm tissue. LncRNAs in cluster II were specifically expressed in the endosperm. This contrast showed that the differences between embryo and endosperm development may be partly due to the differences in the expression of transcripts, especially for lncRNAs. Furthermore, we performed a weighted gene co‐expression network analysis (WGCNA) and obtained 13 distinct modules, as shown in the dendrogram (Figure 3b), in which major tree branches define the modules (labeled with different colors). The modules closely related to embryo or endosperm development were of particular interest in this study. Therefore, we correlated the modules with the distinct samples, as we had already obtained the module's expression profile. We found that two modules, ‘darkorange’ and ‘orange’, were the most significantly associated with the embryo and endosperm samples, respectively (Figure 3c).
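The module–sample association step can be sketched as a correlation between a module's summary expression profile and a binary tissue indicator. Note the substitution: WGCNA summarizes a module by its eigengene (the first principal component of the module's expression matrix), whereas this sketch uses the mean of z‐scored expression as a simpler stand‐in. The function names and the toy input layout are hypothetical.

```python
import math


def zscores(values):
    """Standardize one transcript's expression across samples."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [0.0] * n if sd == 0 else [(v - mean) / sd for v in values]


def module_profile(module_expr):
    """Mean z-scored expression per sample (a stand-in for the module eigengene)."""
    rows = [zscores(row) for row in module_expr]
    n_samples = len(rows[0])
    return [sum(r[j] for r in rows) / len(rows) for j in range(n_samples)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def module_trait_correlation(module_expr, trait):
    """Correlate a module profile with a 0/1 tissue indicator (1 = sample is that tissue)."""
    return pearson(module_profile(module_expr), trait)
```

A module whose transcripts are high in embryo samples and low in endosperm samples gives a correlation near +1 with the embryo indicator and near −1 with the endosperm indicator, which is the pattern color‐coded in a module–trait heatmap.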

Figure 3. Co‐expression network of transcripts including long non‐coding RNAs (lncRNAs) and protein‐coding genes (PCgenes) involved in endosperm and embryo development in castor bean.

(a) The expression abundance and cluster of PCgenes (FPKM ≥ 2, in the top) and lncRNAs (FPKM ≥ 1, in the bottom) across all samples.

(b) Hierarchical cluster tree and color bands indicating the 13 modules identified by weighted gene co‐expression network analysis (WGCNA).

(c) The analysis of module–trait correlations. Each row represents a module and each column represents a specific tissue. Each cell at the row–column intersection is color‐coded by correlation according to the color legend.

(d) Heatmap indicating the eigengene expression profile for the ‘darkorange’ module in embryos and endosperms.

(e) The partial correlation network of the ‘darkorange’ module. The lncRNAs and PCgenes are colored in light blue and green, respectively. The hub lncRNAs with high eigengene connectivity are indicated by larger circles in red.

The module ‘darkorange’ comprised transcripts that were highly expressed in all embryo tissues (Figure 3d). The putative functions of these transcripts were significantly enriched for ribosome biogenesis and assembly, mRNA metabolism (including RNA splicing, transport, degradation and surveillance) and nucleotide excision repair (Table S5). Four hub lncRNAs (TCONS_00023544, TCONS_00060634, TCONS_00043996 and TCONS_00068183) were highlighted in the network because of their high eigengene connectivity (Figure 3e). Among the PCgenes connected to these lncRNAs, several embryo‐specific genes encode transcription factors, including bHLH (30146.m003550 and 29738.m001043), AP2 (29805.m001494), MYB (30170.m013948), bZIP (28226.m000857) and B3 family members (Auxin response factor, 30185.m000956); these transcription factors are known to play important roles in embryo development of Arabidopsis (Le et al., 2010).

The transcripts in the ‘orange’ module were over‐represented in endosperm tissues (Figure S3a) and were substantially enriched for functions in a variety of metabolic processes, such as carbon metabolism (especially for glycolysis), nitrogen metabolism, amino acid metabolism, flavone and flavonol biosynthesis, and zeatin biosynthesis (Table S6). For example, we identified a hub lncRNA, TCONS_00079888, that showed high connection with key genes involved in glycolysis, including SUS1 (29739.m003693), fructose‐1,6‐bisphosphatase (29576.m000229) and glyceraldehyde 3‐phosphate dehydrogenase (30169.m006270; Figure S3b).

DNA methylation of lncRNAs

Considering the important regulatory roles of lncRNAs, the expression of lncRNA itself must be tightly regulated. The regulation by DNA methylation of the expression of PCgenes is well studied, but its regulatory role for lncRNAs has not been well characterized. Therefore, we investigated the DNA methylation profiles around lncRNAs, and found that the average level of methylation within the 2‐kb flanking region and body region of lncRNAs was substantially higher than that of PCgenes (Figure 4a). For both lncRNAs and PCgenes, the level of DNA methylation was reduced near the transcription start and stop sites. Notably, the methylation levels in CG and CHG sequence contexts in the lncRNAs and PCgenes in endosperm tissues were lower than those in embryo tissues, whereas the CHH methylation level in endosperm tissues was comparable to that in embryo tissues (Figure 4a). We evaluated the level of DNA methylation for the lncRNAs with different expression levels, including low expression (0 < FPKM ≤ 1), moderate expression (1 < FPKM ≤ 10) and high expression (FPKM > 10). The highly expressed lncRNAs were found to have the lowest methylation levels of the CG, CHG and CHH sequence contexts at both their flanking and body regions (Figure 4b). In contrast, lncRNAs with low expression levels had the highest methylation levels for all three sequence contexts. This indicated that DNA methylation (regardless of sequence contexts and regions) and lncRNA expression were negatively correlated, in contrast to our previous finding for PCgenes where only the methylation of CG contexts in the promoter region was negatively correlated with PCgene expression (Xu et al., 2016). Moderately expressed PCgenes had the highest level of CG methylation within the gene body (Xu et al., 2016), but this was not observed in the moderately expressed lncRNAs. 
In addition, we found that, interestingly, TE‐lncRNAs exhibited the highest methylation level of all three sequence contexts (Figure 4c) and lowest expression level as compared with non‐TE‐associated lncRNAs (Figure 4d). In particular, we investigated the relationship between these TE‐lncRNAs and 24‐nucleotide siRNA abundance (Xu et al., 2016), and found that there were more 24‐nucleotide siRNAs enriched in TE‐lncRNA loci, but fewer siRNAs in non‐TE‐associated lncRNA loci (Figure 4e).
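The grouping of lncRNAs by expression level, and the comparison of average methylation across groups, can be sketched as follows. The class boundaries are those given in the text; the function names and the input layout (one `(fpkm, methylation_level)` tuple per lncRNA for a single sequence context) are assumptions for illustration.

```python
from collections import defaultdict


def expression_class(fpkm):
    """Return the expression class used in the text, or None if FPKM is not positive."""
    if fpkm <= 0:
        return None
    if fpkm <= 1:
        return "low"        # 0 < FPKM <= 1
    if fpkm <= 10:
        return "moderate"   # 1 < FPKM <= 10
    return "high"           # FPKM > 10


def mean_methylation_by_class(records):
    """Average methylation level per expression class.

    records: iterable of (fpkm, methylation_level) tuples, with the level
    expressed as a fraction between 0 and 1 for one sequence context.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fpkm, level in records:
        cls = expression_class(fpkm)
        if cls is not None:
            sums[cls] += level
            counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in counts}
```

A negative relationship between methylation and expression would appear as a decreasing mean from the low class to the high class, repeated for the CG, CHG and CHH contexts.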

Figure 4. DNA methylation profiles of long non‐coding RNAs (lncRNAs) and protein‐coding genes (PCgenes) in castor bean seeds.

(a) Average DNA methylation levels across lncRNAs (blue lines) and PCgenes (red lines) in castor bean embryos and endosperm, respectively.

(b) Association between DNA methylation and lncRNA expression in CG, CHG and CHH sequence contexts. The average methylation level for each 100‐bp interval is plotted. The dashed lines represent the point of alignment.

(c) DNA methylation of CG, CHG and CHH sequence contexts in transposable element (TE)‐associated lncRNAs and non‐TE lncRNAs.

(d) The expression levels of TE‐associated lncRNAs and non‐TE lncRNAs in embryo.

(e) The abundance of 24‐nucleotide siRNAs within TE‐associated lncRNA and non‐TE lncRNA regions.

To examine whether endosperm hypomethylation can promote the expression of lncRNAs, we identified a set of 343 lncRNAs with fourfold or greater transcription levels in endosperm compared with embryo (Table S7) and compared their DNA methylation levels. These endosperm‐preferred lncRNAs showed relatively low levels of methylation in CG and CHG sequence contexts in endosperm tissues relative to embryo (Fisher's exact test, P < 0.01), whereas the CHH sequence context exhibited a similar methylation level (Figure 5a). As illustrated in Figure 5(b), positions 9000–16 000 on scaffold 30 150 exhibited obvious differences in methylation in the CG and CHG sequence contexts between embryo and endosperm tissues, in which five endosperm‐enriched lncRNAs (TCONS_00060963–TCONS_00060967) were highly expressed in endosperm tissues. To further test whether a decrease in DNA methylation in the embryo might enhance the expression of lncRNAs (and because no methyltransferase mutants are available in castor bean), we cultured early whole embryos in vitro on basic MS medium with 25 μmol 5‐azacytidine (5‐Az), and compared the expression of 32 lncRNAs in the 5‐Az‐treated and untreated embryos. Approximately 84.4% of these lncRNAs (27 of 32) exhibited increased expression (at least twofold higher) in the 5‐Az‐treated embryos, whereas only one lncRNA (TCONS_00074481) was unchanged and four (TCONS_00076127, TCONS_00069950, TCONS_00080228 and TCONS_00012069) were downregulated (Figure 5c). In sum, these results indicated that DNA methylation was negatively correlated with the expression of lncRNAs and suggested that endosperm hypomethylation may promote their expression.
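The selection of endosperm‐preferred lncRNAs by a fourfold criterion can be sketched as a simple ratio filter. The pseudocount added to both values to avoid division by zero is an assumption of this sketch and is not described in the text, as are the function name and the input layout.

```python
def endosperm_preferred(expr, min_fold=4.0, pseudocount=0.01):
    """Return ids whose endosperm/embryo expression ratio is at least min_fold.

    expr: dict mapping a transcript id to (endosperm_fpkm, embryo_fpkm).
    pseudocount: small constant added to both values (an assumption of this
    sketch) so that transcripts silent in the embryo can still be scored.
    """
    selected = []
    for tid, (endosperm, embryo) in expr.items():
        fold = (endosperm + pseudocount) / (embryo + pseudocount)
        if fold >= min_fold:
            selected.append(tid)
    return sorted(selected)
```

The same ratio with a twofold threshold could be applied to treated versus untreated embryos to count lncRNAs with increased expression after 5‐Az treatment.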

Figure 5. Local DNA hypomethylation and expression of long non‐coding RNAs (lncRNAs) in endosperm tissues.

(a) Average methylation levels for endosperm‐enriched genes with fourfold higher expression levels relative to the embryo are plotted for each 100‐bp interval.

(b) DNA methylation levels of all three sequence contexts in positions 9000–18 000 on scaffold 30 150 in endosperm and embryo tissues. The read coverage of lncRNAs in endosperm (green peaks) and in embryo (gray peaks) tissues is shown at the bottom.

(c) The differences in expression levels of 32 lncRNAs in 5‐azacytidine (5‐Az)‐treated embryos relative to untreated controls. TCONS_00060967 (in red) with hypermethylation in embryo tissues, as shown in (b), was strongly activated in 5‐Az‐treated embryos.

Inheritance patterns of allelic lncRNAs

Currently, little is known about expression patterns of allelic lncRNAs, especially when the lncRNA transcripts from two accessions are highly divergent in sequence (Table S8) and/or when the parental genome dosage is unbalanced in the endosperm tissue. Here, we evaluated the inheritance patterns of the lncRNAs by comparing the allelic expression ratios in the hybrids based on their single nucleotide polymorphisms (SNPs). The majority of allelic lncRNAs in the hybrid embryos, similar to PCgenes, exhibited the expected 1:1 maternal to paternal ratio, while the expected 2:1 maternal to paternal ratio was observed in the hybrid endosperms (Figures 6a and S4a). However, we found that the expression of a small proportion of lncRNAs deviated from the ratio expected from the parental genome contribution in hybrid seeds of castor bean.

Figure 6. Allelic expression of long non‐coding RNAs (lncRNAs) in hybrid endosperms and imprinted lncRNAs.

(a) Log2 normalized read counts for all single nucleotide polymorphism (SNP) loci for the hybrid endosperms. The gray and green dots represent the alleles from protein‐coding genes (PCgenes) and lncRNAs, respectively. The red and blue dashed lines denote the 1:1 and 2:1 ratios (maternal to paternal), respectively.

(b) Changes of parental allelic expression in the hybrid endosperms. The tables show all possible changes to parental allelic expression levels. The symbols =, > and < represent no change (equal to the ratio of parent genome dosage), upregulation (significantly greater than the expected ratio) and downregulation (significantly less than the expected ratio), respectively. The colored tables indicate the main patterns of allelic expression.

(c) Imprinted lncRNAs confirmed by Sanger sequencing of RT‐PCR products. The upper panel indicates the overall expression level of transcribed regions as shown in blue for the hybrid endosperms. The relative proportion of expression levels for specific SNP sites are shown for both ZB107 (blue lines) and ZB306 alleles (orange lines). The exons and introns are depicted by red rectangles and black lines, respectively. The lower panel shows the RT‐PCR sequencing chromatographs around the imprinted SNP site (with red underline).

Such biased expression of alleles in hybrid cells has previously been explained by the interaction of parental alleles such as dominance, where the expression level of one allele is similar to that of a specific parent. To understand the parental effects on allelic lncRNA expression in hybrid castor bean seeds, we examined the relative ratio of ZB107 and ZB306 alleles for the hybrids in which the lncRNAs were significantly differentially expressed in the two parents (Figure S4b; Tables S9 and S10). We found that the alleles in the hybrids were most likely to exhibit significant deviation from the expected ratio when the lncRNAs had substantially differential expression in the parents (Figures 6b and S4c). For example, when the lncRNAs had a similar expression level in the endosperm of two parents (referred to as PZB107 = PZB306), over 70% of allelic expression in the hybrid endosperm followed a 2:1 maternal to paternal ratio (AZB107 = AZB306). However, when the lncRNAs were highly expressed in the ZB107 accession compared with the ZB306 (PZB107 > PZB306), the expression of most alleles was not proportional to the genomic contribution, with significantly preferential expression of the ZB107 allele in both hybrid endosperms (a ratio of ZB107 allele to ZB306 allele greater than 2:1, AZB107 > AZB306; Figure 6b). When the lncRNAs were highly expressed in the ZB306 accession compared with the ZB107 (PZB107 < PZB306), most of the ZB306 alleles had a higher expression level in the hybrid endosperm tissues as well (a ratio of ZB306 allele to ZB107 allele greater than 2:1, AZB107 < AZB306; Figure 6b). Interestingly, we found that the expression of ZB107 alleles in hybrid endosperm tissues exhibited a stronger activation than did ZB306 alleles, resulting in a small portion of alleles (21.7% and 12.1% in the two hybrid endosperms) deviating from the expected 2m:1p ratio in the case of PZB107 = PZB306 (Figure 6b). A similar inheritance pattern was also observed in hybrid embryo tissues (Figure S4c).

Although lncRNAs from the two parents exhibited divergent sequences, our results revealed a clear pattern of expression inheritance in the hybrids, such that the allelic expression of most lncRNAs could be reconciled and was expressed according to their parental genome contribution. Meanwhile, we demonstrated that the expression levels of the majority of alleles were not affected by the dosage imbalance of the parental genome in the hybrid endosperm. A small fraction of allelic lncRNAs exhibited biased expression in the hybrid seeds, occurring most frequently at loci where parental expression levels were substantially different.

Identification and clustering of imprinted lncRNAs

The differential expression of allelic lncRNAs can be explained in part by the parent‐of‐origin effects or genomic imprinting. Based on stringent criteria (see Experimental procedures), we identified 20 lncRNAs that showed a significant deviation from the expected ratio of 2m:1p [false‐discovery rate (FDR) < 0.05] and that exhibited maternally imprinted expression patterns in reciprocal endosperm at 25 DAP (Table 1). Among them, seven lncRNAs (TCONS_00057398, TCONS_00012904, TCONS_00021068, TCONS_00057400, TCONS_00066202, TCONS_00008995 and TCONS_00066278) were lncNATs, and the remaining lncRNAs were lincRNAs. We also identified 56 newly imprinted PCgenes, including 55 maternally imprinted genes (MEGs) and one paternally imprinted gene (PEG) in reciprocal endosperm (Table S11). To validate their imprinted status, we selected four lncRNAs (TCONS_00012904, TCONS_00016515, TCONS_00021097 and TCONS_00032648) with high read‐mapping support, which enabled us to easily detect their allele‐specific expression using RT‐PCR sequencing. Three maternally imprinted lncRNAs were validated using independent sources of RNA from reciprocal endosperms at the same developmental stage, and these predominantly expressed the maternal allele in reciprocal crosses. One maternally imprinted lncRNA (TCONS_00032648) was biallelically expressed in the ZB107 × ZB306 cross, whereas it was specifically maternally expressed in the ZB306 × ZB107 cross, suggesting an incomplete imprinting status (Figure 6c).

Table 1. Identification of maternally imprinted lncRNAs in castor bean endosperm

                                                        ZB107 × ZB306   ZB306 × ZB107
lncRNA ID        Scaffold  Position  SNP (ZB107/ZB306)  m       p       m       p       FDR
TCONS_00003196   27942     86152     C/A                85      0       57      6       0.011
TCONS_00006363   28359     258606    T/C                29      0       23      0       7.834E‐06
TCONS_00007832   28629     169937    C/T                119     21      122     22      6.86E‐06
TCONS_00008995   28842     236283    A/G                63      12      31      5       0.003
TCONS_00011704   29506     18417     A/C                22      4       15      2       0.035
TCONS_00012904*  29596     156161    G/A                823     0       283     53      1.24E‐145
TCONS_00016515*  29647     552053    G/A                84      0       101     8       1.62E‐15
TCONS_00021068   29709     8205      A/G                22      0       11      1       0.000
TCONS_00021097*  29709     63208     C/T                84      0       118     12      1.42E‐16
TCONS_00027657   29794     554854    A/T                68      9       24      3       8.72E‐06
TCONS_00032648*  29841     1125638   A/G                190     27      115     12      2.59E‐06
TCONS_00038584   29904     848625    G/T                366     0       63      14      0.040
TCONS_00043999   29958     1065389   G/A                26      0       25      3       2.64E‐05
TCONS_00057400   30138     272617    A/G                43      0       29      5       2.69E‐08
TCONS_00061981   30162     1844853   A/G                23      0       11      0       8.92E‐05
TCONS_00063679   30170     1972242   C/T                20      2       28      5       0.007
TCONS_00066202   30174     2588562   T/A                25      0       22      2       3.97E‐05
TCONS_00066278   30174     2830658   T/C                12      0       10      0       0.007
TCONS_00067543   30179     190807    T/C                25      4       16      3       0.040
TCONS_00057398   30138     272617    A/G                43      0       29      5       2.69E‐08

  • The letters ‘m’ and ‘p’ represent the read counts for the maternal allele and the paternal allele, respectively. The asterisks indicate the lncRNAs that were confirmed by RT‐PCR sequencing.
  • FDR, false‐discovery rate; lncRNA, long non‐coding RNA; SNP, single nucleotide polymorphism.

In mammals, clusters of imprinted genes are governed by imprinting control regions that often contain one or more lncRNAs as their regulator, such as Air and Kcnq1ot1 (Sleutels et al., 2002; Thakur et al., 2004; Mancini‐Dinardo et al., 2006). Here, we tested whether the castor bean imprinted genes (including those newly identified in this study and those from our previous study; Xu et al., 2014) and lncRNAs are similarly clustered. Based on their positions in the castor bean genome, we identified six candidate clusters containing one imprinted gene and one lncRNA within a region of 10 kb (Figure S5a). Among these clusters, we found that maternally imprinted lncRNAs were always clustered with maternally imprinted PCgenes, but not with paternally imprinted PCgenes, different from the observation in mammals. Interestingly, there were two clusters (TCONS_00003196/27942.m000160 and 29506.m000164/TCONS_00011704) in which imprinted lncRNAs exhibited a strong positive correlation with adjacent imprinted PCgenes (rp ≥ 0.8; Figure S5b). These results point to a potential function of imprinted lncRNAs in imprinted clusters.

Discussion

LncRNAs serve as versatile regulators of many aspects in plant growth and development (Zhang and Chen, 2013). However, little is known about their roles in seed development, especially in endosperm, which is a triploid tissue with complex epigenetic regulation. Here, we undertook a systems‐level analysis of the expression network, inheritance patterns and epigenetic regulation of lncRNAs in castor bean seeds with persistent endosperms.

We identified 4640 lincRNAs and 716 lncNATs in castor bean seeds by integrating 10 ssRNA‐seq data sets and five high‐depth RNA‐seq data sets. The sequence, exon–intron architecture and expression of lncRNAs were confirmed by RT‐PCR sequencing and/or qRT‐PCR analysis, giving us high confidence that these are genuine lncRNAs. The number of lncRNAs we identified in castor bean seeds is far fewer than in other model plants such as Arabidopsis, maize and cotton, most likely because of the strict filter criteria applied in this study. The number of lncRNAs can be strongly influenced by different filter criteria (e.g., whether the precursors of small RNAs were considered) in different analyses (Ulitsky, 2016). The identified lncRNAs from castor bean seed shared most of the common features of lncRNAs reported in other plants, such as short sequence length, single exons, rich A/U content and low conservation between species, suggesting that lncRNAs might share a common and ancient evolutionary origin and have undergone a rapid sequence evolution (Ulitsky et al., 2011). As for the low sequence conservation of lncRNAs among species, one potential complication worthy of consideration is heterogeneous data and different identification approaches; thus, the conservation of lncRNAs might be underestimated. It should be noted that more lncRNAs exhibited positional conservation than sequence conservation between species, consistent with findings reported in vertebrates (Hezroni et al., 2015) and other plants (Mohammadin et al., 2015; Deng et al., 2018). Interestingly, we found that 874 lncRNAs in the castor bean genome overlapped with TEs, similar to observations in vertebrates (Kapusta et al., 2013) and other plants (Wang et al., 2015, 2016, 2017), implying that TEs might make a contribution to the origin and divergence of lncRNAs among organisms. Recent studies have revealed the potential function of TE‐derived lncRNAs in plants (Wang et al., 2016, 2017), indicating that the presence of a TE within the lncRNA is not deleterious to lncRNA function. However, more extensive experimental studies are needed.

LncRNAs can regulate the expression of nearby PCgenes in a cis or trans manner (Fatica and Bozzoni, 2014). In the current study, a large number of distant lncRNAs (more than 5 kb away from PCgenes) exhibited a strong and positive correlation in expression with PCgenes, but whether these lncRNAs exert their function in trans, or as enhancers or insulators, needs to be further determined. Only a small fraction of lincRNA–PCgene pairs (within 5 kb) exhibited a strong and positive correlation in expression, suggesting that transcription of these pairs may be coordinately regulated. It is tempting to speculate that coordinated transcription of lincRNAs with nearby PCgenes may be due to common regulatory sequences in their promoter regions, and/or that these lncRNAs themselves can positively regulate the transcription of nearby genes in cis. Intriguingly, the coordinated transcription of some lincRNA–PCgene pairs might have evolutionary consequences, given their positional conservation among species. Further studies on the mechanisms underlying coordinated transcription of lncRNA–PCgene pairs should provide additional insights into the function of lncRNAs in plants.

One of the main objectives in this study was to understand the co‐expression network controlling the development of the endosperm and embryo in castor bean. Expression cluster analysis showed that lncRNAs exhibited more obvious tissue‐specific expression patterns relative to the PCgenes, suggesting that the function of lncRNAs may be closely related to embryo and endosperm identity and development. The co‐expression network analysis with WGCNA revealed tissue‐specific modules and important hub genes involved in the development of embryo and endosperm. The expression network mediated by lncRNAs in the embryo was mainly responsible for DNA replication and excision repair, RNA splicing and editing, and protein processing, which guaranteed the faithful transmission of genetic information during cell division. Some hub lncRNAs that were highly correlated with transcription factors were involved in embryo development, whereas the transcripts including PCgenes and lncRNAs in endosperm were involved in various metabolic processes, especially carbon metabolism. The hub lncRNA TCONS_00079888 might play an important role in the control of carbon flux. These findings add new knowledge in understanding the development processes of endosperm and embryo in plants.

Compared with PCgenes, lncRNAs exhibited overall higher methylation levels, which might, at least in part, explain the overall lower expression of lncRNAs. Meanwhile, a negative correlation was observed between DNA methylation and the expression of lncRNAs, independent of the methylated regions and different methylation sequence contexts. In particular, TE‐associated lncRNAs have a higher methylation level and lower expression level than non‐TE‐associated lncRNAs. Although the biological functions of these TE‐lncRNAs largely remain unknown, they might be involved in the RdDM pathway (as illustrated in Figure 4e), as more 24‐nucleotide siRNAs are associated with these TE‐lncRNAs. There is evidence that lncRNAs can work as RNA scaffold molecules to modulate RdDM (Böhmdorfer et al., 2014; He et al., 2014; Matzke and Mosher, 2014). Furthermore, a large proportion of lncRNAs with low expression in castor bean embryos were upregulated after treatment with a methyltransferase inhibitor in vitro. In addition, endosperm hypomethylation in CG and CHG sequence contexts appeared to promote the expression of endosperm‐preferred lncRNAs. These findings indicate that the expression of lncRNAs in castor bean seeds is tightly linked to DNA methylation.

Upon hybridization, we monitored the expression of allelic lncRNAs that were transcribed from different intergenic regions in the parents. A large majority of alleles were expressed according to their parental contribution, suggesting that hybrid cells can reconcile the expression of the majority of divergent lncRNAs inherited from their parents. The expression ratios of most allelic lncRNAs in the hybrid endosperms were consistent with the ratio of maternal to paternal genome (2m:1p), suggesting that the expression of most lncRNAs was not dosage‐sensitive and conformed to Mendelian inheritance. However, a relatively small proportion of lncRNA loci exhibited significant deviation from their expected expression levels during hybridization, such that the expression of allelic lncRNAs in the hybrid was largely determined by their expression levels in the parents. When lncRNAs were dominantly expressed in one of the parents, their alleles were most likely to be significantly highly expressed in the hybrids, which may be due to the effects of cis interactions. However, several lncRNAs that were significantly differentially expressed between the parents still showed expression of alleles that followed the contribution of parental genome dosage, suggesting that the hybrid cells can reconcile the divergent genomes in a trans‐interaction manner in some cases. This study represents the first report of, and sheds light on, the inheritance patterns of allelic lncRNAs in developing seeds at the early stage, but whether these cis and trans effects on allelic expression will be present at later stages of seed development or be inherited in the next generation needs further study.

Additionally, the existence of triploid endosperm with a 2:1 maternal:paternal genome ratio raises a question about the evolutionary significance of parental genome imbalance. Genome imprinting may be an important mechanism for modulating the genome dosage, where genes are expressed in a parent‐of‐origin‐dependent manner (Bauer and Fischer, 2011). However, the extent and potential role of imprinted lncRNAs in the triploid endosperm remain unclear. Recent explorations in plant endosperm, such as rice, maize and now castor bean, have identified a handful of imprinted lncRNAs involved in the modulation of genome dosage (Luo et al., 2011; Zhang et al., 2011). These imprinted lncRNAs are likely to be dosage sensitive or important for the developmental processes of the endosperm, and warrant further inquiry.

Conclusion

We comprehensively identified and characterized 5356 lncRNAs in castor bean seeds. Based on genome‐wide analyses, we found that a small fraction of lncRNAs exhibited coordinated expression with nearby PCgenes. Further co‐expression network analysis with WGCNA revealed the potential functions of lncRNAs during castor bean seed development. Genomic DNA methylation analyses revealed that the expression level of lncRNAs was tightly linked to DNA methylation and that endosperm genome hypomethylation can significantly promote the expression of lncRNAs. We further characterized the inheritance patterns of allelic lncRNA expression during hybridization. Importantly, these results provide valuable information pointing to potential roles for lncRNAs in the development of castor bean seeds. Our findings also shed light on the inheritance patterns of lncRNA expression and the epigenetic regulation of lncRNAs themselves in plants.

Experimental procedures

Plant materials and strand‐specific transcriptome sequencing

The seeds from two inbred lines of castor bean, ZB107 and ZB306 (kindly provided by Zibo Academy of Agricultural Sciences, Shandong, China), were germinated and planted in a greenhouse at Kunming Institute of Botany (Kunming, Yunnan, China). Seeds from ZB107, ZB306 and their reciprocal crosses (F1: ZB107 × ZB306 and F1′: ZB306 × ZB107) were harvested at the early stages of development (25 DAP), while those from a cross between ZB306 and ZB107 were harvested as late‐stage seeds (35 DAP). The endosperm and embryo tissues were immediately isolated from each sample, and washed with sterile water to avoid tissue contamination. In total, 10 samples were collected, frozen in liquid nitrogen, and stored at −80°C for total RNA isolation.

Total RNA was isolated from these samples using TRIzol (Invitrogen, Carlsbad, CA, USA) and purified using an RNeasy Mini Kit (Qiagen, Valencia, CA, USA) following the manufacturer's protocols. High‐quality RNA was used to construct ssRNA libraries using the TruSeq Stranded Total RNA with Ribo‐Zero Plant kit (Illumina, San Diego, CA, USA) following the manufacturer's instructions. The prepared libraries were sent to BGI‐Shenzhen for transcriptome sequencing on the Illumina HiSeq™ 2000 system (producing paired‐end 125‐bp reads).

Identification of lncRNAs and expression analysis

We integrated 10 ssRNA‐seq data sets from this study and five mRNA‐seq data sets from our previous seed libraries (SRX485027; Xu et al., 2014). All raw reads were filtered to remove adapter sequences, contaminating sequences and low‐quality reads. The clean reads were aligned to the castor bean reference genome (http://castorbean.jcvi.org/index.php) using the spliced‐read aligner TopHat 2.0 (Trapnell et al., 2012), and a second alignment of reads was performed as proposed by Cabili et al. (2011) to increase the number of mapped reads in splice junctions derived from all samples. The transcriptome of each sample was assembled independently using CUFFLINKS and then merged to form a consensus transcriptome assembly using CUFFMERGE. The expression level of each transcript was normalized using its FPKM value.

To identify high‐confidence lncRNAs, the transcripts were subjected to the following stringent filtration steps. In brief, the newly assembled transcriptome was compared with the castor bean reference genome annotations using CUFFCOMPARE, and transcripts that overlapped with known genes in the sense strand were discarded. Transcripts shorter than 200 bp and those with FPKM less than 0.5 were removed. The transcripts derived from structural non‐coding RNAs (such as ribosomal RNAs, transfer RNAs, small nuclear RNAs and small nucleolar RNAs) were excluded (E ≤ 10⁻⁵). To remove transcripts encoding proteins or protein domains, those with significant hits (BlastX, E ≤ 10⁻⁵) in the SwissProt and Pfam databases were removed. Finally, the CPC (Kong et al., 2007) and CNCI software (Sun et al., 2013) were used to evaluate the coding potential of the remaining transcripts, and those with a significant coding potential (CPC score ≥ 0 or CNCI score ≥ 0) were discarded. The identified lncRNAs were further divided into two classes, lincRNAs and lncNATs, according to the anatomical properties of their gene loci.
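As an illustration, the filtration steps above can be expressed as a single predicate. The following Python sketch is not the pipeline used in this study: the `Transcript` record, its field names and the function `is_candidate_lncrna` are hypothetical, and the homology flags and CPC/CNCI scores are assumed to have been computed beforehand by the respective tools.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    # Hypothetical record; field names are illustrative only.
    tid: str
    length: int                  # transcript length in bp
    fpkm: float                  # expression level (FPKM)
    overlaps_sense_gene: bool    # overlaps an annotated gene on the sense strand
    structural_ncrna_hit: bool   # rRNA/tRNA/snRNA/snoRNA hit at E <= 1e-5
    protein_hit: bool            # SwissProt or Pfam hit at E <= 1e-5
    cpc_score: float             # precomputed CPC score
    cnci_score: float            # precomputed CNCI score


def is_candidate_lncrna(t: Transcript) -> bool:
    """Return True if the transcript passes every filter described in the text."""
    if t.overlaps_sense_gene:
        return False
    if t.length < 200 or t.fpkm < 0.5:
        return False
    if t.structural_ncrna_hit or t.protein_hit:
        return False
    # A score >= 0 from either tool is treated as evidence of coding potential.
    if t.cpc_score >= 0 or t.cnci_score >= 0:
        return False
    return True
```

Writing the filters as one predicate makes the order of the steps irrelevant to the final set, although in practice the inexpensive length and expression filters would normally be applied before the homology searches.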

Differential expression analysis and characterization of lncRNAs

The edgeR software package was employed to identify differentially expressed transcripts between pairs of samples (FDR ≤ 0.05 and fold change ≥ 2; Robinson et al., 2010). The length, exon number and A/U content of the transcripts, including lincRNAs, lncNATs and PCgenes, were analyzed by a custom Perl script. LincRNAs that partially overlapped with TEs, but were not fully contained within them, were defined as TE‐associated lincRNAs. To evaluate the sequence conservation, lncRNAs identified in castor bean seeds were used as the query data set in a BLASTN search against the genomes of other species, including J. curcas, Arabidopsis thaliana, O. sativa and Z. mays (retrieved from Phytozome 12.0; Nordberg et al., 2014), and against lncRNAs from A. thaliana, O. sativa and Z. mays (downloaded from the plant lncRNA database GreeNC; Paytuví Gallart et al., 2016). The searches were performed with a cutoff query identity ≥ 50% and E ≤ 10⁻⁵. Similar to the strategy reported by Deng et al. (2018), the positional conservation was assessed by investigating whether any lncRNAs were transcribed from close to (500 bp ≤ distance ≤ 5 kb) or within the collinearity regions in the same orientation between castor bean and A. thaliana, O. sativa or Z. mays. The collinearity regions between species were retrieved from the PGDD database (http://chibba.agtec.uga.edu/duplication/), developed by Lee et al. (2012).
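The definition of TE‐associated lincRNAs given above (partial overlap with a TE, but not full containment) can be made concrete with a small interval check. This is a minimal Python sketch under stated assumptions: coordinates are taken to be 1‐based and inclusive on a single scaffold, and the function name `is_te_associated` is hypothetical rather than part of the original scripts.

```python
def is_te_associated(lnc_start, lnc_end, te_intervals):
    """Return True if the lincRNA overlaps at least one TE and is not fully
    contained within any TE.

    Coordinates are assumed to be 1-based inclusive and on the same scaffold.
    te_intervals is an iterable of (te_start, te_end) tuples.
    """
    overlaps_any = False
    for te_start, te_end in te_intervals:
        if te_start <= lnc_start and lnc_end <= te_end:
            # Fully embedded in a TE: excluded by the definition in the text.
            return False
        if lnc_start <= te_end and te_start <= lnc_end:
            overlaps_any = True
    return overlaps_any
```

For example, a lincRNA at 100 to 500 is classified as TE‐associated when a TE spans 400 to 800, but not when a TE spans 50 to 600 (full containment) or 600 to 700 (no overlap).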

Nearest‐neighbor analysis

To determine the expression relationship between the lncRNAs and protein‐coding RNAs, we identified a set of transcript pairs between the lincRNAs and the nearest PCgenes (500 bp ≤ distance ≤ 5 kb), and between the lncNATs and the corresponding PCgenes. The Pearson correlation coefficient (rp) was employed to estimate the similarity of the expression patterns using the following formula by a custom Perl script:

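\[
r_p = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

where $x_i$ and $y_i$ are the expression levels of the lncRNA and the paired PCgene in sample $i$, $\bar{x}$ and $\bar{y}$ are their means across samples, and $n$ is the number of samples.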
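The calculation in this study was performed with a custom Perl script that is not reproduced here. As a hedged illustration of the same standard coefficient, the Python sketch below computes the Pearson correlation between two expression vectors; the function name `pearson_r` and the choice to return NaN for a constant vector are assumptions made for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Undefined when one transcript has constant expression across samples.
        return float("nan")
    return sxy / math.sqrt(sxx * syy)
```

A lincRNA and PCgene pair would then be called strongly and positively correlated when the returned value reaches the threshold used in the text (for example, rp ≥ 0.8 for the imprinted clusters).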

Expression network construction

Hierarchical cluster analyses were separately performed for the PCgenes (mean FPKM ≥ 2) and lncRNAs (mean FPKM ≥ 1) using the OmicShare tools (www.omicshare.com/tools). WGCNA (v1.47) was used to construct the unsigned co‐expression networks based on the transcript expression matrix (Langfelder and Horvath, 2008). A step‐by‐step network construction and module detection method were adopted using the ‘cutreeDynamic’ and ‘mergeCloseModules’ with the following parameters: the power was 13; the minModuleSize was 30; the cutHeight was 0.25. We investigated the relationships between the transcripts in the modules and the samples, and the important modules that were significantly associated with the measured sample trait were identified. To understand the biological functions of the modules, the genes in the modules were subjected to KEGG pathway enrichment analysis. Finally, the co‐expression network was visualized by Cytoscape (v3.5.0) software (Shannon et al., 2003).
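For orientation, in an unsigned WGCNA network the adjacency between two transcripts is the absolute value of their expression correlation raised to the soft‐thresholding power (13 in this study). The sketch below shows only this step, in Python rather than in the R package actually used; the function name `unsigned_adjacency` is hypothetical and the correlation matrix is assumed to have been computed beforehand.

```python
def unsigned_adjacency(cor, power=13):
    """Soft-thresholded unsigned adjacency: a_ij = |cor_ij| ** power.

    cor is a square correlation matrix given as a list of lists.
    """
    return [[abs(c) ** power for c in row] for row in cor]
```

With a power of 13, a correlation of 0.8 (or of −0.8) yields an adjacency of 0.8¹³, so only strongly correlated transcript pairs retain appreciable connection strength before module detection.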

Analysis of DNA methylation of lncRNAs

DNA methylation data from the endosperm and embryo tissues of the castor bean inbred line ZB107 were downloaded from the NCBI Short Read Archive under accession number SRX1267331, and the methylation ratios of CG, CHG and CHH sequence contexts were calculated as described in our previous study (Xu et al., 2016). The methylation profiles in the 2‐kb flanking regions and the lncRNA bodies were plotted based on the average methylation level for each 100‐bp interval.
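The 100‐bp profiling described above amounts to averaging per‐site methylation levels within consecutive windows. The following Python sketch illustrates this for a fixed‐width region such as a 2‐kb flank. It assumes that the methylation level of a cytosine is the fraction of reads supporting methylation at that site, which may differ from the exact definition in Xu et al. (2016); the function name `binned_methylation` and the input layout are hypothetical.

```python
def binned_methylation(sites, start, end, bin_size=100):
    """Mean per-site methylation level in consecutive bins covering [start, end).

    sites: iterable of (position, methylated_reads, total_reads) tuples.
    Returns one value per bin, or None for bins without covered cytosines.
    """
    n_bins = (end - start + bin_size - 1) // bin_size
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for pos, methylated, total in sites:
        if start <= pos < end and total > 0:
            i = (pos - start) // bin_size
            sums[i] += methylated / total
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Applying the function separately to CG, CHG and CHH sites, and then averaging each bin over all lncRNAs, would give one profile per sequence context.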

To confirm the effect of DNA methylation on the expression of lncRNAs, we compared the expression of lncRNAs between 5‐Az‐treated embryo tissues and normal embryos cultured in vitro. Briefly, we isolated 20 whole embryos from seeds collected at 25 DAP; 10 embryos were cultured in basic MS medium containing 25 μM 5‐Az, and the other 10 embryos were cultured in normal MS medium as the negative control. After 3 weeks, total RNA was extracted from the embryo tissues and subjected to qRT‐PCR with three independent biological replicates. We detected 32 lncRNAs that were expressed at low levels in the embryos. The qRT‐PCR program was as follows: pre‐cycling at 95°C for 2 min followed by 40 cycles of 95°C for 30 sec, 56°C for 30 sec and 72°C for 30 sec. All primers used in this experiment are listed in Table S12.

Allelic expression of lncRNAs and imprinted lncRNAs in hybrid endosperm tissues

To determine the expression of allelic lncRNAs in the castor bean seeds from the reciprocal crosses, we first detected SNPs between the two parents, ZB107 and ZB306. The ‘mpileup’ module in the SAMtools software (Li et al., 2009) was employed to call the SNPs using the following stringent criteria: (1) high base quality (Q > 20); (2) high coverage (>10 reads); and (3) exclusion of heterozygous sites.

Based on high‐quality SNPs, the read numbers of ZB107 and ZB306 alleles from each cross were summed from a ‘pileup’ file that was generated from the ‘mpileup’ command using a custom Perl script. The SNP site was considered confirmed when it was supported by at least 10 reads in both the endosperm and embryo from each cross. A binomial one‐sided test was performed at each SNP site to test whether the ratio of observed maternal to paternal allele reads significantly deviated from the expected parental genome contribution in each hybrid endosperm (where the maternal to paternal allele ratio is 2 to 1) and in each hybrid embryo (where the maternal to paternal allele ratio is 1 to 1; P < 0.05). SNP sites with significant bias (greater than or less than 2:1) in both hybrid endosperm tissues were considered as potentially imprinted loci. To obtain a subset of high‐confidence imprinted loci, we applied stringent standards requiring >90% uniparental transcripts in the endosperm from each cross, similar to the standard used in our previous study (Xu et al., 2014). The FDR was employed to adjust the maximal P‐value, and SNP sites with an FDR ≤ 0.05 were sorted out for subsequent analysis.
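The statistical steps above can be sketched as follows. This is an illustration rather than the original Perl implementation: it covers only the test for an excess of maternal reads, it assumes the Benjamini–Hochberg procedure for the FDR adjustment (the procedure is not named in the text), and all function names are hypothetical.

```python
from math import exp, lgamma, log


def binom_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are computed in log space so that large read counts do not overflow.
    """
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1 - p))
        total += exp(log_term)
    return min(total, 1.0)


def maternal_excess_pvalue(maternal, paternal, expected_maternal=2 / 3):
    """One-sided test for more maternal reads than expected.

    expected_maternal is 2/3 for endosperm (2m:1p) and would be 1/2 for embryo.
    """
    return binom_upper_tail(maternal, maternal + paternal, expected_maternal)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order (assumed method)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def is_uniparental(maternal, paternal, threshold=0.9):
    """True if more than `threshold` of the reads derive from the maternal allele."""
    total = maternal + paternal
    return total > 0 and maternal / total > threshold


# Read counts (maternal, paternal) for one SNP in the two reciprocal crosses,
# taken from the TCONS_00006363 row of Table 1.
crosses = [(29, 0), (23, 0)]
p_max = max(maternal_excess_pvalue(m, p) for m, p in crosses)
```

When all 29 reads are maternal, the one‐sided P‐value reduces to (2/3)²⁹, and taking the larger P‐value of the two crosses mirrors the use of the maximal P‐value described in the text before FDR adjustment.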

qRT‐PCR and sequencing

We collected root, stem, leaf, pollen, ovule, embryo and endosperm tissues from the ZB107 accession and used qRT‐PCR to determine the tissue‐specific expression profiles of 40 selected lncRNAs. In addition, we subjected 10 lncRNAs that were highly expressed in castor bean seeds to RT‐PCR sequencing to confirm their sequences and exon–intron splicing sites. To validate the imprinted lncRNAs, we isolated total RNA from the endosperms of reciprocal crosses at the same stage and performed RT‐PCR. The amplified products were purified and sequenced. All primers are listed in Table S12.

Accession numbers

The clean reads are deposited in Sequence Read Archive (SRA) under SRP143459.

Acknowledgements

The authors thank Dr Fei Li for helping with dissecting castor seeds and determining the seed developmental stages. This work was jointly supported by Chinese National Key Technology R&D Program (grant no. 2015BAD15B02), National Natural Science Foundation of China (31661143002, 31771839 and 31701123) and Yunnan Applied Basic Research Projects (2016FB060).

    Authors’ contributions

    AL, DL and WX conceived and designed the experiments; TY, BW, BH and YW performed the experiments; WX analyzed the data; HZ assisted in sequencing analysis; WX and AL wrote the article.

    Conflict of Interests

    The authors declare that they have no conflict of interests.
