The Plantain Proteome, a Focus on Allele Specific Proteins Obtained from Plantain Fruits

Abstract
Proteomics has great potential for elucidating molecular mechanisms in plants. This is especially true for non‐model crops whose genomes have not yet been sequenced or are not well annotated. Plantains are cooking bananas that are economically very important in Africa, India, and Latin America. The aim of this work was to characterize the fruit proteome of common dessert bananas and plantains and to identify proteins that are only encoded by the plantain genome. We present the first plantain fruit proteome. All data are available via ProteomeXchange with identifier PXD005589. Using our in‐house workflow, we found 37 alleles unique to plantain, covered by 59 peptides. Although we do not (yet) have access to whole‐genome sequencing data from triploid banana cultivars, we show that proteomics is an easily accessible complementary alternative for detecting allele specific SNPs/SAAPs. These unique alleles might contribute to the differences in metabolism between dessert bananas and plantains. This dataset will stimulate further analysis by the scientific community, boost plantain research, and facilitate plantain breeding.

The copyright line for this article was changed on 3 August 2018 after original online publication.
complicate the proteome analysis of crops. Bananas and plantains are polyploid crops that originated from two wild diploid species: Musa acuminata (AA), which is highly polymorphous, with spindly plants that grow in clumps, and Musa balbisiana (BB), a more homogeneous, hardy plant with a massive pseudo-trunk. Nowadays there are diploid, triploid, and tetraploid genome groups. [6,7] The main genome groups are AA, AB, AAA, AAB, and ABB. Most dessert banana cultivars are AAB or AAA. The Cavendish subgroup, which is sold on the export market, [8] has an AAA genome constitution, while plantains are AAB. Plantains are sweet acid starchy bananas with typically long fruits and are mostly consumed after frying or boiling. Plantains are an important staple crop in West and Central Africa, India, and Latin America. [6] Both dessert bananas and plantains are considered non-model crops, and the complexity of their genomes makes it challenging to analyze the transcriptome and the proteome. [9] Here we used an easy and reproducible protocol for protein extraction and identification, and we present the first proteome of plantain fruits (AAB). We created our own workflow to tackle the difficulties of working with a triploid non-model species without an available database. The mass spectrometry proteomics data have been deposited at the ProteomeXchange Consortium Partner Repository [10] via PRIDE [11] with the dataset identifier PXD005589. These results will stimulate further analysis by the scientific community, boost plantain research, and facilitate breeding.
Plantain fruits and Cavendish fruits were bought in a local supermarket in Leuven, Belgium. Five biological replicates (fruits) of each cultivar were selected based on their phenotypic characteristics and the same green peel color. All fruits were kept separate, cleaned, and peeled; the pulp was cut into thin slices and immediately immersed in liquid nitrogen. All ten samples were lyophilized to a water content of 2.5%. After drying, the samples were hermetically sealed and stored at room temperature until the proteomics analysis was performed. Banana tissues are considered difficult for protein extraction because of the many interfering compounds they contain. [12] Lyophilization yields material that is easier to handle, without loss of protein content, and is an easy and safe way to transport the samples. [12] Protein extractions were performed according to the phenol extraction/ammonium acetate precipitation method we published [13] and adapted for gel-free proteomics. [14] Twenty μg of protein was digested with trypsin (Trypsin Protease, MS Grade, Thermo Scientific) and purified on Pierce C18 Spin Columns (Thermo Scientific). The digested samples (0.5 μg/5 μL) were separated on an Ultimate 3000 (Thermo Scientific) UPLC system and then analyzed on a Q Exactive Orbitrap mass spectrometer (Thermo Scientific) as described. [15] The Q Exactive Orbitrap mass spectrometer (Thermo Scientific, USA) was operated in positive ion mode with a nanospray voltage of 1.5 kV and a source temperature of 250°C. ProteoMass LTQ/FT-Hybrid ESI Pos. Mode Cal Mix (MSCAL5-1EA SUPELCO, Sigma-Aldrich) was used as an external calibrant and the lock mass 445.12003 as an internal calibrant.
The instrument was operated in data-dependent acquisition (DDA) mode with a survey MS scan at a resolution of 70 000 (FWHM at m/z 200) over the mass range m/z 400-1600 for precursor ions, followed by MS/MS scans of the ten most intense peaks with charge states +2, +3, +4, and +5 above a threshold ion count of 16 000, at 17 500 resolution, using a normalized collision energy (NCE) of 25 eV, an isolation window of 3.0 m/z, and a dynamic exclusion of 10 s. All data were acquired with Xcalibur 3.0.63 software (Thermo Scientific). For protein identification, we searched with MASCOT version 2.2.06 (Matrix Science) against our in-house Musa A-B database containing the acuminata AA proteins (DH Pahang V1), the non-redundant unique balbisiana BB proteins (PKW) (http://banana-genomehub.southgreen.fr/), and the usual contaminants for mass spectrometry (76 220 proteins). The search parameters were: parent mass tolerance of 10 ppm, fragment tolerance of 0.02 Da, oxidation of M as variable modification, carbamidomethyl C as fixed modification, and up to one missed cleavage allowed for trypsin. Results from MASCOT were imported into Scaffold version 3.6.5. In Scaffold, the threshold was set to a minimum of one peptide identified with 95% confidence, and the false discovery rate (FDR) was calculated automatically using the software's default parameters.
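The 10 ppm parent mass tolerance quoted above is a relative tolerance, so the absolute window it allows grows with m/z. The following minimal Python sketch illustrates the arithmetic only; the function names and the example m/z values are hypothetical and are not taken from the MASCOT search itself.

```python
from __future__ import annotations


def ppm_window(mz: float, tol_ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a relative tolerance given in ppm."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of an observation, in ppm of the theoretical value."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float = 10.0) -> bool:
    """True if the observation lies within +/- tol_ppm of the theoretical value."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Hypothetical precursor at m/z 800: 10 ppm corresponds to +/- 0.008 m/z.
low, high = ppm_window(800.0, 10.0)
print(low, high)
print(within_tolerance(800.004, 800.0))  # about 5 ppm off
print(within_tolerance(800.012, 800.0))  # about 15 ppm off
```

At the upper end of the survey range (m/z 1600) the same 10 ppm corresponds to a window twice as wide, which is why precursor tolerances are usually stated in ppm rather than in Da.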
Using our Musa A-B database, we identified 2144 different proteins in total at 0.2% FDR (Supporting Information, Table 1). Taking into account only the proteins identified in at least two biological replicates reduces this number to 1731, of which 1344 proteins were identified in Cavendish fruits and 1363 in plantain fruits (Supporting Information, Table 1). Esteve et al. [16] used ProteoMiner beads to characterize the proteome of Cavendish fruits. The authors were able to identify 1131 proteins using a cross-species approach (Musa EST database and UniProt Viridiplantae database). The three most abundant annotation categories were oxidation-reduction, ATP binding, and nucleotide binding. In our work, we used a merged database derived from two diploid species (AA and BB). Four years later, we identified and annotated more proteins, owing to the availability of more powerful mass spectrometry and more genetic resources. In the category Molecular Function, 525 different gene ontology terms could be retrieved. The five most represented terms were GO:0016491 oxidoreductase activity (168 proteins); GO:0016787 hydrolase activity (122 proteins); GO:0000166 nucleotide binding (112 proteins); GO:0003824 catalytic activity (103 proteins); and GO:0005524 ATP binding (99 proteins) (Supporting Information, Table 1).
The aim of our study was to characterize the proteome of plantain fruits and compare it to that of Cavendish fruits, in order to identify important allele specific proteins in a cultivar that has not been sequenced, plantain. The main contrasting characteristics between plantain and Cavendish are undoubtedly related to unique alleles that, together with epigenetic regulation, can explain the different phenotypes. [17] To find allele specific peptides in plantain fruits, we used a basic but very useful principle: spectral counting (Scaffold). Potential plantain allele specific peptides were filtered using the following conditions.
Maximum spectral count in Cavendish = 0, which means the peptide was never identified in Cavendish; median spectral count in plantain ≠ 0, which means the peptide was identified in at least three of the five biological replicates. To detect single amino acid polymorphisms (SAAPs) between acuminata (A) and balbisiana (B) alleles, the identified plantain unique peptides were filtered further. Only peptide sequences that were exclusively identified in a B derived protein accession were accepted. Their allelic acuminata homolog was searched using the Greenphyl homolog function (http://www.greenphyl.org/cgibin/get_homologs.cgi) to determine the SAAP. Only plantain specific proteins for which the acuminata homolog was successfully identified were accepted (Supporting Information, Table 2). This allowed us to assign an A and a B allele version to each protein. Further annotations of the proteins were retrieved from UniProt (http://www.uniprot.org/uploadlists/). Gene functions of the allele specific proteins were analyzed through GO enrichment annotation with our in-house software (https://labtrop.shinyapps.io/UniGO/).
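The two spectral-count conditions and the SAAP comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peptide sequences and counts are invented, the function names are hypothetical, and the B peptide is assumed to be already aligned, without gaps, to the matching stretch of its acuminata homolog. It is not the Scaffold or Greenphyl workflow itself.

```python
from statistics import median


def is_plantain_specific(cavendish_counts, plantain_counts):
    """Apply the two filters from the text to one peptide.

    Maximum count in Cavendish must be 0 (never identified there) and the
    median count in plantain must be non-zero. With five replicates and
    non-negative counts, a non-zero median implies at least three
    replicates with an identification.
    """
    return max(cavendish_counts) == 0 and median(plantain_counts) != 0


def find_saaps(b_peptide, a_homolog_segment):
    """List (position, A residue, B residue) for each differing position.

    Assumes both sequences are aligned and of equal length (no indels).
    Positions are 1-based within the peptide.
    """
    if len(b_peptide) != len(a_homolog_segment):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (pos, a_res, b_res)
        for pos, (a_res, b_res) in enumerate(zip(a_homolog_segment, b_peptide), start=1)
        if a_res != b_res
    ]


# Invented spectral counts for five replicates per cultivar.
counts = {
    "AVLDGTK": {"cavendish": [0, 0, 0, 0, 0], "plantain": [2, 0, 1, 3, 0]},
    "SLEEFGR": {"cavendish": [0, 1, 0, 0, 0], "plantain": [2, 2, 1, 3, 1]},
    "TTPQWNK": {"cavendish": [0, 0, 0, 0, 0], "plantain": [1, 0, 0, 2, 0]},
}

specific = [
    peptide
    for peptide, c in counts.items()
    if is_plantain_specific(c["cavendish"], c["plantain"])
]
print(specific)  # ['AVLDGTK']

# Invented A homolog segment differing from the B peptide at one residue.
print(find_saaps("AVLDGTK", "AVLEGTK"))  # [(4, 'E', 'D')]
```

In this toy data the second peptide is rejected because it was seen once in Cavendish, and the third because it was identified in only two plantain replicates, so its median count is zero.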
Following our workflow, we identified 37 interesting loci spread over all 11 chromosomes (Table 1). We assigned 59 peptides as B allele specific and 47 peptides as A allele specific. The introduction of M. balbisiana genes is reported to correlate with hardiness, drought tolerance, a changed nutritional value, increased starchiness, and a different maturation process. [18-20] To check which pathways are affected by mutations/polymorphisms, we performed a GO annotation for the 37 loci. GO:0004134 (4-alpha-glucanotransferase activity) and GO:0004133 (glycogen debranching enzyme activity) are the two most significant GO terms for Molecular Function (p-values 3.4e-06 and 2.0e-05, respectively) (Supporting Information, Table 3). A single amino acid change can drastically affect the function of a protein. [21-23] Mutations that arise in the coding region of a gene during evolution are likely to alter its biological function, especially if they occur in a protein domain, since domains are generally considered the basic units of protein folding, evolution, and function. [24] Ramu et al. [25] highlighted some possibly deleterious mutations in domesticated cassava using whole-genome screening of wild ancestors and cultivars. Like banana, cassava cultivars are clonally propagated, and this genomic screening study suggests that many deleterious mutations have not been eliminated through crossing. We expect a similar situation in banana. Advanced whole-genome screening experiments enable the identification and interpretation of mutations at the genome level. [24,25] Although we do not (yet) have access to whole-genome sequencing data from triploid banana cultivars, we show that proteomics is an easily accessible complementary alternative for detecting allele specific SNPs/SAAPs.
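The text does not state which statistical test underlies the GO enrichment p-values reported above. A common choice for this kind of analysis is a one-sided hypergeometric test, and the minimal Python sketch below shows that calculation purely as an assumption; the function name, the choice of background, and the example counts of annotated proteins are all hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of proteins in the background set
    annotated:  background proteins carrying the GO term
    sample:     number of proteins in the set of interest
    hits:       proteins in the set of interest carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # math.comb returns 0 when k > n, so impossible draws contribute nothing.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Assumed background: the 1731 proteins seen in at least two replicates.
# Set of interest: the 37 loci. The counts 3 and 2 are invented.
p_value = hypergeom_enrichment_p(population=1731, annotated=3, sample=37, hits=2)
print(f"{p_value:.2e}")
```

A rare term found more than once among only 37 loci gives a small p-value under this model, which is consistent in spirit with the strong enrichment of the two glucan-related terms, although the exact values depend on the background and test actually used.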
To our knowledge, this is the first proteomic investigation in plantain fruits, and the most extensive fruit proteomic study in the genus Musa. This public release of the plantain fruit proteome is an important step for plantain varietal selection and breeding.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.