Proteomic analysis of reserve proteins in commercial rice cultivars.

Abstract Rice consumption is rising in western countries with the adoption of new nutritional styles that require the avoidance of gluten. Nevertheless, there are reports of allergic reactions to rice. Rice grains contain a low amount of protein, most of which consists of storage proteins represented by glutelins, prolamins, albumins, and globulins. Some of these are seed allergenic proteins, such as α‐amylase/trypsin inhibitor, globulins, β‐glyoxylase, and several glutelins. Italy is the major rice producer in Europe; for this reason, the seed reserve proteins of four Italian rice cultivars were characterized by 2D‐GE analysis. Some differentially abundant proteins were identified and classified as allergenic proteins, prompting further characterization of the genes encoding some of them. In particular, a deletion in the promoter region of the 19 kDa globulin gene was identified, which may be responsible for the different abundance of the protein in the Karnak cultivar. This polymorphism can be applied to cultivar identification in commercial samples. The seed proteome was characterized by a variable combination of several proteins, which may determine a different allergenic potential. Proteomic and genomic analyses made it possible to identify the protein profile of four commercial cultivars and to develop a molecular marker useful for the analysis of commercial products.

The digestibility and biological value of rice proteins are higher than those of the other major cereals (Amagliani, O'Regan, Kelly, & O'Mahony, 2017).
Rice is generally recognized as a hypoallergenic food, is the first solid food introduced into the diet of infants, and is used in most elimination diets for food allergy diagnostic programs in children and adults. Rice flour is a common ingredient in the preparation of gluten-free products such as bread and pasta, and rice proteins contribute significantly to the quality and technological functionality of these products (Amagliani et al., 2017). The use of rice proteins as food supplements in sports is also increasing, substituting those commonly obtained from casein, whey, and soy. Some studies have shown that rice protein concentrates can be used as value-added ingredients in the production of bread (Jiamyangyuen, Srijesdaruk, & Harper, 2005), biscuits (Yadav, Pandey, & Kumar, 2011), and edible films (Adebiyi, Adebiyi, Jin, Ogawa, & Muramoto, 2008), improving their nutritional and functional properties.
At variance with other cereals, the rice seed proteome is made mainly of glutelins (60% to 80%), which are encoded by 15 genes, while only 5% is represented by prolamins, which are encoded by 34 genes (Kawakatsu, Hirose, Yasuda, & Takaiwa, 2010). Seed storage proteins (SSP) are stored in rice endosperm cells within protein bodies (PB); in particular, glutelins and globulins are deposited in PB-II storage vacuoles, whereas prolamins accumulate in the endoplasmic reticulum (ER)-derived protein body I (PB-I) structures that form within the lumen of the rough ER (Kim, Lee, Yoon, Lim, & Kim, 2013; Saito et al., 2008). Some proteins encoded by a single gene have also been identified, such as the seed allergenic proteins RAG2 and RA5, the 19 kDa globulin (Goliáš et al., 2013), and the 56 kDa granule-bound starch synthase I (GBSSI) (Krishnan & Chen, 2013).
Prolamins are classified into three groups (10, 13, and 16 kDa) according to their mobility on SDS-PAGE gels.
Rice was the first crop to have its genome publicly available (International Rice Genome Sequencing Project & Sasaki, 2005), which gave the opportunity to develop functional genomics tools such as proteomics, invaluable for assessing global changes in protein profiles (Agrawal & Rakwal, 2006; Agrawal & Rakwal, 2011; Hirano et al., 2016).
This study used two-dimensional gel electrophoresis (2D-GE) to examine the proteomic profile of mature seeds of four Italian rice cultivars (cvs). The cvs analyzed belong to different rice commercial groups: Carnaroli and Karnak are of the Carnaroli group; Arborio and Volano are of the Arborio group. Our specific goals were to compare the proteomic profile of SSP in these Italian cvs and to verify the expression of those proteins considered as allergens.
Seeds of O. sativa L. spp. japonica, cvs Arborio, Volano, Carnaroli, and Karnak, were kindly provided by the Rice Research Unit, CREA (Vercelli, Italy). The main characteristics of each cv are reported in Table S1. Commercial samples of Carnaroli and Karnak were purchased from a local market. For both proteomic and genomic analyses, seeds were milled with the refrigerated sample mill Knifetec™ 1095 (Foss) to obtain a fine powder.

| Protein extraction
Proteins were extracted from rice flours under denaturing conditions according to Khan et al. (2008) with minor modifications.
Fifty mg of rice flour was mixed with 1.4 ml of extraction buffer containing 125 mM tris(hydroxymethyl)aminomethane (Tris)-HCl pH 6.8, 8 M urea, 4% (w/v) sodium dodecyl sulfate (SDS), 5% (v/v) β-mercaptoethanol, and 20% (w/v) glycerol. Samples were vortexed vigorously and incubated at room temperature overnight with agitation. Supernatants were collected after centrifugation at 12,000 g for 15 min at room temperature and then precipitated with 4 volumes of cold acetone. Pellets were dissolved in ¾ volumes of water and ¼ volumes of cold 50% (v/v) trichloroacetic acid. Samples were kept in an ice bath for 30 min and centrifuged at 12,000 g for 15 min at 4°C. Pellets were rinsed three times with cold acetone.
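The buffer composition above can be turned into weighed amounts with simple arithmetic. The following is a minimal sketch, not part of the published protocol: the function names are hypothetical, the molar masses of Tris base and urea are standard values that the text does not state, and Tris is assumed to be weighed as the free base before the pH is adjusted with HCl.

```python
# Molar masses in g/mol (standard values; not given in the text).
MOLAR_MASS = {"tris_base": 121.14, "urea": 60.06}

# Buffer volume used per 50 mg flour sample, as stated in the text.
BUFFER_ML_PER_SAMPLE = 1.4


def buffer_recipe(volume_ml):
    """Amounts needed for `volume_ml` of extraction buffer.

    Molar components are converted to grams; % (w/v) means g per 100 ml
    and % (v/v) means ml per 100 ml.
    """
    litres = volume_ml / 1000
    return {
        "tris_base_g": 0.125 * litres * MOLAR_MASS["tris_base"],  # 125 mM
        "urea_g": 8.0 * litres * MOLAR_MASS["urea"],              # 8 M
        "sds_g": 4.0 * volume_ml / 100,                           # 4% (w/v)
        "beta_mercaptoethanol_ml": 5.0 * volume_ml / 100,         # 5% (v/v)
        "glycerol_g": 20.0 * volume_ml / 100,                     # 20% (w/v)
    }


def total_buffer_ml(n_samples):
    """Buffer volume required to extract `n_samples` flour samples."""
    return BUFFER_ML_PER_SAMPLE * n_samples


if __name__ == "__main__":
    for name, amount in buffer_recipe(100).items():
        print(f"{name}: {amount:.3f}")
```

For 100 ml of buffer this gives about 48 g of urea, which illustrates why the urea should be dissolved before the solution is brought to its final volume.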
The rehydration step was carried out for 12 hr at 20°C. Isoelectrofocusing (IEF) was carried out using a Protean® i12™ IEF System (Bio-Rad®) according to the following program: 250 V for 1 hr, a linear ramp to 4,000 V for 1 hr, and finally 4,000 V to reach 15 kVh.
The pH 6-11 IPG strips (GE Healthcare) were rehydrated for 12 hr at 20°C in 135 µl of the rehydration solution previously reported, containing the specific pH 6-11 buffer (GE Healthcare). A total of 100 µg of protein extract was loaded using the cup-loading method. IEF was carried out using a Protean® i12™ IEF System (Bio-Rad®) according to the following program: 150 V for 2 hr, 300 V for 2 hr, 600 V for 2 hr, 1,000 V for 1 hr, a linear ramp to 4,000 V for 1 hr, and 4,000 V to reach 15,000 Vh.
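The duration of the final focusing step is not stated, but it can be estimated from the program. The sketch below rests on two assumptions that the text does not confirm: that the final figure is a target for the volt-hours accumulated over the whole run, and that the linear ramp starts from the voltage of the preceding step. The function names and the step encoding are hypothetical.

```python
# Each step is (start_voltage, end_voltage, hours); a hold step has start == end.
# pH 6-11 program from the text; the ramp is ASSUMED to start at 1,000 V.
PROGRAM_PH_6_11 = [
    (150, 150, 2),
    (300, 300, 2),
    (600, 600, 2),
    (1000, 1000, 1),
    (1000, 4000, 1),  # linear ramp
]


def volt_hours(steps):
    """Accumulated volt-hours; a linear ramp contributes its mean voltage."""
    return sum((v_start + v_end) / 2 * hours for v_start, v_end, hours in steps)


def final_hold_hours(steps, hold_voltage, target_vh):
    """Hours at `hold_voltage` needed to reach `target_vh` after `steps`."""
    remaining = target_vh - volt_hours(steps)
    return max(remaining, 0) / hold_voltage


if __name__ == "__main__":
    print(volt_hours(PROGRAM_PH_6_11))
    print(final_hold_hours(PROGRAM_PH_6_11, 4000, 15000))
```

If the instrument instead counts volt-hours within the final step only, the hold time is simply the target divided by 4,000 V.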

| Image and data analysis
For each sample, a master gel including all reproducible spots from three replicate gels was obtained. Gels were imaged with a ChemiDoc™ MP System (Bio-Rad®) and Image Lab Software 4.0 (Bio-Rad®). PDQuest 8.0 2D-GE Analysis Software (Bio-Rad®) was used to compare three 2D-GE raw gels (biological replicates) for each rice cultivar. The analysis included spot detection, background subtraction, gel matching, generation of a master gel, and relative quantification of each spot. The spot intensity in each master map is proportional to the amount of protein in replicate gels and is normalized to the total protein fraction. Spots were considered differentially abundant when the fold variation was at least ±2 and Student's t test gave a p-value < .05.
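The selection criterion can be illustrated with a short sketch. This is not the PDQuest implementation: the function names and intensity values are hypothetical, the test is assumed to be a two-sided pooled-variance t test between two cultivars, and the p < .05 decision is made by comparing the t statistic with the tabulated critical value for 4 degrees of freedom (two groups of three gels) rather than by computing a p-value.

```python
from math import sqrt
from statistics import mean, variance

# Two-sided critical value of Student's t at alpha = .05 with 4 degrees of
# freedom (3 + 3 replicate gels - 2).
T_CRIT_DF4 = 2.776


def fold_change(a, b):
    """Signed fold variation of mean(a) versus mean(b).

    Positive when a is more abundant, negative when it is less abundant.
    """
    mean_a, mean_b = mean(a), mean(b)
    return mean_a / mean_b if mean_a >= mean_b else -(mean_b / mean_a)


def t_statistic(a, b):
    """Pooled-variance (Student's) t statistic for two independent groups."""
    n_a, n_b = len(a), len(b)
    pooled = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / n_a + 1 / n_b))


def is_differential(a, b, min_fold=2.0):
    """True when a spot passes both the fold and the t-test criterion."""
    if len(a) != 3 or len(b) != 3:
        raise ValueError("the critical value used here assumes 3 gels per group")
    return abs(fold_change(a, b)) >= min_fold and abs(t_statistic(a, b)) > T_CRIT_DF4


if __name__ == "__main__":
    # Hypothetical normalized spot intensities from three replicate gels.
    print(is_differential([100, 110, 90], [40, 45, 35]))
```

Applying both filters together prevents a spot with a large but poorly reproducible intensity difference from being reported as differentially abundant.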

| Protein identification
For in-gel digestion, protein spots were excised from the 2D-GE gels with a razor blade and destained overnight in a 1:1 (v/v) solution of 100 mM ammonium bicarbonate/acetonitrile (ACN). Proteins were digested in a solution of 10 mM NH4HCO3 and 10% (v/v) ACN containing trypsin (13 ng/µl) at 37°C overnight. The digested peptides were then suspended in 10 μl of 0.1% trifluoroacetic acid and purified with a ZipTipC18 (Merck Millipore) using the procedure recommended by the manufacturer. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF/MS) analysis in linear mode was carried out as previously described (Visioli et al., 2016) using a 4800 Plus MALDI-TOF/TOF™ (AB SCIEX). Three biological replicates were analyzed for each sample.
Peptides were also analyzed by ORBITRAP MS/MS using an LTQ Orbitrap XL system (Thermo Fisher Scientific), as previously reported (Graziano et al., 2019). This analysis was performed either to confirm the MALDI-TOF results or as an alternative when MALDI-TOF failed.

| Genomic DNA extraction
Genomic DNA was extracted from 300 mg of Carnaroli and Karnak flours using the GK-Resin method as previously reported (Pafundo, Gullì, & Marmiroli, 2011). DNA yields were determined spectrophotometrically using a VARIAN Cary® 50 UV-VIS device (Agilent Technologies) by measuring the absorbance (A) at 260 nm. The quality of the DNA was estimated by agarose gel electrophoresis and by evaluating the A260/A280 ratio.
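The spectrophotometric readings translate into yield and purity as sketched below. The conversion factor of 50 ng/µl per absorbance unit is the usual convention for double-stranded DNA in a 1 cm light path and is an assumption here, since the text does not state it; the function names and example readings are hypothetical.

```python
# Conventional factor for double-stranded DNA with a 1 cm path length:
# an A260 of 1.0 corresponds to about 50 ng/µl (assumed, not from the text).
DSDNA_NG_PER_UL_PER_A260 = 50.0


def dna_concentration(a260, dilution_factor=1.0):
    """DNA concentration in ng/µl of the undiluted extract."""
    return a260 * DSDNA_NG_PER_UL_PER_A260 * dilution_factor


def purity_ratio(a260, a280):
    """A260/A280 ratio; values near 1.8 are commonly taken as clean DNA."""
    return a260 / a280


if __name__ == "__main__":
    # Hypothetical readings for an extract diluted 1:10 before measurement.
    print(dna_concentration(0.5, dilution_factor=10))
    print(purity_ratio(0.9, 0.5))
```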

| Proteomic data
Peptide mass fingerprinting was carried out using the Mascot program (http://www.matrixscience.com). Proteins were identified by searching the SWISS-PROT and NCBI nonredundant databases (limited to Oryza sativa).
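Peptide mass fingerprinting compares measured peptide masses with those predicted from an in silico digest of each database sequence. The sketch below shows only the digestion step and is not the Mascot implementation: it assumes the common trypsin rule (cleavage after K or R unless the next residue is P), and the function name and example sequence are hypothetical.

```python
def tryptic_peptides(sequence, max_missed=2):
    """In silico tryptic digest allowing up to `max_missed` missed cleavages.

    Cleavage is assumed after K or R, except when the next residue is P.
    """
    if not sequence:
        return []
    # Positions at which the chain is cut, including both termini.
    cuts = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        # Joining k + 1 consecutive fragments corresponds to k missed cleavages.
        last = min(start + 1 + max_missed, len(cuts) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides


if __name__ == "__main__":
    # Hypothetical sequence; the R followed by P is not a cleavage site.
    print(tryptic_peptides("AAKGGRPLKMM", max_missed=1))
```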

Raw data obtained by ORBITRAP MS/MS were submitted to Proteome Discoverer (Thermo Fisher Scientific) and searched against the Viridiplantae database (http://www.expasy.org); Oryza sativa was chosen as the taxonomic category, and 0.2 Da was used as the mass error tolerance. In both cases, the search parameters were as follows: trypsin as the digestion enzyme with a maximum of two missed cleavages; carbamidomethylation of cysteines (delta mass, 57.0215) as static side chain modification; oxidation of methionine (delta mass, 15.9994) and deamidation of asparagine, glutamine, and arginine (delta mass, 0.9840) as dynamic side chain modifications; precursor mass tolerance 10 ppm; and fragment mass tolerance 0.8 Da.
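How the mass tolerances and modification masses enter a match can be sketched as follows. This is an illustration only and not the search engine's scoring: the function names and example masses are hypothetical, and the delta masses are copied from the text as given.

```python
# Delta masses in Da, as listed in the search parameters.
CARBAMIDOMETHYL_CYS = 57.0215   # static: applied to every cysteine
OXIDATION_MET = 15.9994         # dynamic: applied only when present
DEAMIDATION = 0.9840            # dynamic: the +0.9840 Da shift on N, Q, or R


def modified_mass(unmodified_mass, n_cys, n_oxidized_met=0, n_deamidated=0):
    """Peptide mass after adding static and dynamic modification masses."""
    return (unmodified_mass
            + n_cys * CARBAMIDOMETHYL_CYS
            + n_oxidized_met * OXIDATION_MET
            + n_deamidated * DEAMIDATION)


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def precursor_matches(observed, theoretical, tolerance_ppm=10.0):
    """True when the precursor mass lies within the ppm tolerance."""
    return abs(ppm_error(observed, theoretical)) <= tolerance_ppm


def fragment_matches(observed, theoretical, tolerance_da=0.8):
    """True when a fragment mass lies within the absolute Da tolerance."""
    return abs(observed - theoretical) <= tolerance_da


if __name__ == "__main__":
    # Hypothetical peptide of 1000.0 Da with two cysteines.
    target = modified_mass(1000.0, n_cys=2)
    print(target, precursor_matches(target + 0.005, target))
```

A relative (ppm) tolerance scales with the precursor mass, whereas the fragment tolerance is a fixed window in Da, which is why the two checks are written separately.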
The relative abundance of the protein spots was visualized using Heatmapper, a freely available web server (Babicki et al., 2016).

| Genomic data
Sequence information for the gene encoding the 19 kDa globulin was retrieved from Ensembl Plants (http://plants.ensembl.org/Oryza_sativa/Transcript) (Fig. S1). Specific primers were designed using the software Primer3 (http://bioinfo.ut.ee/primer3-0.4.0). Primer specificity was verified in silico using the BLAST program (http://blast.ncbi.nlm.nih.gov/). Primers were purchased from Sigma-Aldrich. Information about the target genes and primer characteristics is reported in Table 1.

| Proteomic profile and protein identification in rice seeds
Proteomic analysis of seed proteins of four Italian rice cvs was carried out by 2D-GE. In particular, protein extracts were initially separated across broad-range IPG strips (pH 3-10) (Figure 1) and then across midrange strips (pH 6-11) (Figure 2) to allow a better separation of basic proteins. In total, 73 spots were identified in pH 3-10 strips and 95 spots in pH 6-11 strips, of which 19 spots were differentially abundant (Table 2). The basic range IPG strips, combined with the cup-loading method, allowed a better separation of basic proteins ranging from 10 to 60 kDa.
Among the differentially abundant spots, the most abundant were identified as different members of the glutelin family (spots 2-6, 9-12, Figures 1e and 2e) and as globulin (spot 1, Figure 1f).
The relative amount of each spot was compared in the four samples, and five glutelins (spots 2, 3, 4, 9, and 10) were found to be more abundant in Arborio and Volano than in Carnaroli and Karnak (Figure 3). In particular, GluA2 (spot 4) had the lowest amount in Karnak, and GluB2 (spot 9) was significantly more abundant in Volano.
The isoforms of GBSSI (spots 7-8 in Figure 1e) were more abundant in Carnaroli and Karnak than in Volano and Arborio, in which they were almost undetectable on the gel. This enzyme, which is encoded by the Waxy gene, regulates the elongation of amylose chains in developing seeds. The level of grain amylose is directly associated with the amount of GBSSI in the endosperm (Mikami et al., 2008), and differences in the abundance of GBSSI are directly related to rice cooking quality. Arborio and Volano have an apparent amylose content (AAC) of about 17.4%, whereas Carnaroli and Karnak have 22.1% and 20.9%, respectively (data from Ente Nazionale Risi). According to the suggested commercial classification, the former are therefore low-AAC (<20%) and the latter medium-AAC (21%-25%) cultivars (Biselli et al., 2014).
The differences observed are strictly related to the genetic characteristics of the Wx sequence; in particular, Karnak is characterized by the allelic pattern G/C (intron 1/exon 6 SNPs), while Arborio has the T/A pattern (Biselli et al., 2014).
Furthermore, eight spots of low molecular weight were identified as allergenic proteins (Table 2).
The globulins of rice have not received the same level of attention as the prolamins and glutelins. To date, four globulins of molecular weight 16 kDa, 25 kDa (Komatsu & Hirano, 1992; Krishnan & White, 1992), 26 kDa (Nakase et al., 1996), and 19 kDa (Krishnan & Pueppke, 1993; Shorrosh et al., 1992) have been isolated from rice grain endosperm. Each of these proteins is expressed as a precursor, which is processed to generate the mature protein. DNA sequence analysis of the 26 kDa globulin suggested that the protein is very similar to the wheat HMW glutenin subunit and barley D hordein (Nakase et al., 1996), while the 19 kDa globulin is most likely encoded by a single-copy gene (Shorrosh et al., 1992) and is similar to the globulins of wheat, rye, and triticale (Krishnan & Pueppke, 1993). It has also been demonstrated that the 19 kDa globulin is mainly localized in the inner part of the rice grain; it is therefore also present in polished rice and can be responsible for allergic reactions (Satoh, Tsuge, Tokuda, & Teshima, 2019).
The study of the promoter region and the ORF of this gene may be useful for identifying functional markers for the breeding/selection of Italian rice cvs with low allergenic potential.
Rice is commonly regarded as a hypoallergenic cereal; however, it has attracted increasing public attention since the first allergic reaction was reported in 1979 (Shibasaki, Suzuki, Nemoto, & Kuroume, 1979). Since then, a number of clinical cases of rice allergy, contracted by contact with raw rice, inhalation of rice powders or vapors, or ingestion of rice, have been reported (Zhu et al., 2015).
Figure caption (fragment): Proteins were focused on IPG strips pH 6-11, and differentially abundant spots are numbered on the master gel.

TABLE 2 Identification of highly abundant protein spots by MALDI-TOF/MS and ORBITRAP MS/MS
The allergenic activity of these molecules is not well understood (Goliáš et al., 2013; Krishnan & Chen, 2013; Satoh et al., 2019).
The α-amylase/trypsin inhibitor family includes several members like RAG2, and their proteomic profile varies among rice cultivars; their relative abundance was either higher or lower than the reference Nipponbare (Teshima, Nakamura, Satoh, & Nakamura, 2010).

| Amplification of genes encoding seed storage proteins
Among the seed allergenic proteins identified in this study as differentially abundant, RAG2, RA5, and the 19 kDa globulin are each encoded by a single-copy gene. Considering the available genomic sequences, we used the primer pairs shown in Table 1 to isolate the corresponding genomic sequences from the four cvs. The promoter sequences of the 19 kDa globulin gene were studied in different cvs using different pairs of primers (Figure 5, Table 1).

| PCR endpoint on commercial products
Endpoint PCR was performed using the primer pairs 19B (Figure S2a) and 18S (Figure S2b) on both commercial rice samples purchased at a local market and on Carnaroli and Karnak control samples. All samples were amplifiable with 18S primers, as expected (Figure S2b). Using 19B primers, gDNAs extracted from the Karnak flour and the commercial "Karnak rice" were both amplified, giving amplicons of the same size (123 bp); in contrast, the gDNA extracted from the Carnaroli flour did not amplify, while gDNA extracted from the commercial "Carnaroli rice" was amplified, giving an amplicon of 123 bp (Figure S2a). This last result may be explained by considering that the commercial "Carnaroli rice" could be a mixture of Karnak and Carnaroli grains. Indeed, it is quite difficult to find commercial rice sold under the name Karnak, since in most cases it is labeled as "Carnaroli rice," which is the common product group according to Italian regulation.
In conclusion, some proteins showing a significant increase/decrease were identified by mass spectrometry; the most abundant were members of the glutelin family and the globulin. Each cultivar showed a specific profile of these allergenic proteins, suggesting variability in their allergenic potential. The genomic analysis of the 19 kDa globulin, one of the most abundant proteins, revealed the presence of a new polymorphism in the promoter region of the Karnak allele, which could explain the different proteomic profile. Since rice is a staple food and an important source of protein worldwide, a combined proteomic and genomic approach could be justified for a qualitative screening of different cultivars and for developing functional markers for the breeding/selection of rice varieties with lower allergenic potential.
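The way the two amplifications are read together can be written as a small decision rule. This is a sketch of the reasoning described above rather than a validated assay: the function name and the result labels are hypothetical, and the 18S amplification is treated as a control for amplifiable DNA.

```python
def interpret_sample(amplified_18s, amplified_19b):
    """Interpret endpoint PCR results for one rice sample.

    amplified_18s: True when the 18S control amplicon is observed.
    amplified_19b: True when the 123 bp amplicon of primer pair 19B is observed.
    """
    if not amplified_18s:
        # Without the control amplicon the 19B result cannot be interpreted.
        return "DNA not amplifiable"
    if amplified_19b:
        return "Karnak-type allele detected"
    return "Karnak-type allele not detected"


if __name__ == "__main__":
    # Results as described in the text: (18S, 19B) for each sample.
    samples = {
        "Karnak flour": (True, True),
        "Carnaroli flour": (True, False),
        'commercial "Karnak rice"': (True, True),
        'commercial "Carnaroli rice"': (True, True),
    }
    for name, (s18, s19) in samples.items():
        print(f"{name}: {interpret_sample(s18, s19)}")
```

Because endpoint PCR is qualitative, a positive 19B result in a commercial sample indicates that Karnak-type grains are present but not in what proportion.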

ACKNOWLEDGMENTS
This study was funded by a grant from AGER Foundation (RISINNOVA project grant n. 2010-2369). We wish to thank Mr. D. Campioli for technical assistance in carrying out 2D-GE analysis.

CONFLICT OF INTEREST
The authors do not have any conflicting interests.

ETHICAL APPROVAL
This study does not involve any human or animal testing.