Common bean (Phaseolus vulgaris L.) with increased cysteine and methionine concentration

In common bean (Phaseolus vulgaris L.), seed storage protein deficiency is associated with increased total cysteine and methionine concentration. The goal of this study was to generate germplasm lines that combine this characteristic with adaptation to short season conditions in Manitoba, Canada. A recombinant inbred line population was developed by crossing the storage protein deficient genotype, SMARC1N‐PN1 with the cultivar Morden003. Two lines, 2‐37 and 3‐84, with a stable protein profile over 2 years at two locations were identified. Like SMARC1N‐PN1, both lines had a significantly higher cysteine concentration than Morden003, by approximately 35%. Methionine levels were elevated by approximately 15%, while tryptophan levels were also increased by approximately 30%. Line 2‐37 had a significant increase in protein quality, as measured by in vitro protein digestibility corrected amino acid score, by approximately 40%, as compared with Morden003. The increased protein quality for this line is attributable to higher levels of total cysteine and methionine, while having an overall reduction in crude protein concentration. Line 2‐37 had a similar seed yield as SMARC1N‐PN1, with a maturity comparable to Morden003. The results of high‐density single nucleotide polymorphism (SNP) genotyping and quantitative trait locus analysis of recombinant inbred lines indicated that variation in cysteine concentration was determined by the phaseolin locus, while variation in methionine concentration was determined by both the phaseolin and lectin loci. SNP markers that track the introgression of phaseolin and lectin deficiency into the Morden003 background were identified and validated.


| INTRODUCTION
Common bean (dry bean, Phaseolus vulgaris) plays a major role in addressing the nutritional needs of a growing human population, and is particularly important as a crop in Sub-Saharan Africa and South and Central America. Common bean is primarily a source of protein in human diets (De Ron et al., 2015). Like other pulses, the quality of its protein, as defined by the balanced composition of nutritionally essential amino acids, is limited first and foremost by the low levels of the sulfur amino acids, methionine and cysteine, and to some extent tryptophan (Nosworthy et al., 2017). In combination with cereals, common bean, like other legumes, can provide a balanced source of protein. Nevertheless, from a biofortification standpoint, improving the protein quality of common bean represents a valuable goal.
A recent example from cereal crops is the development of improved Quality Protein Maize, with enhanced levels of both lysine and methionine (Planta & Messing, 2017). Improving protein quality of common bean and other pulses is also relevant to nutritional claims on protein content (Wiggins et al., 2018).
Globulins are a group of seed storage proteins belonging to the cupin superfamily. The 7S globulin, phaseolin, normally constitutes approximately half of total seed protein in cultivated varieties of common bean (Vitale & Bollini, 1995). Phaseolins are encoded by a single, complex locus on chromosome 7 containing multiple genes in tandem (Joshi et al., 2017; Talbot et al., 1984). Although phaseolins have few methionine residues, it was previously reported that the total concentration of methionine is positively correlated with phaseolin levels, because phaseolins are so abundant in seed (Gepts & Bliss, 1984). Montoya et al. (2010) proposed using phaseolin isoforms from wild accessions as a possible strategy to improve protein quality. Lectins are the second most abundant seed proteins after phaseolins and account for approximately 5%-10% of total seed protein. Lectins protect seeds from insect pests and herbivory. Most lectins are encoded at a single, complex arcelin/phytohemagglutinin/α-amylase inhibitor (APA) locus on chromosome 4 (Freyre et al., 1998; Osborn et al., 1986).
We previously reported that a progressive deficiency in the major seed proteins, phaseolin and lectins, results in a significant increase in sulfur amino acid concentration in a series of genetically related lines, with cysteine concentration elevated by up to 70% and methionine concentration by 10%-20% (Taylor et al., 2008). The changes in sulfur amino acid levels occur through proteome rebalancing, which compensates for the absence of phaseolin and lectins and favors the accumulation of sulfur-rich proteins (Liao et al., 2012; Marsolais et al., 2010; Pandurangan et al., 2016; Pandurangan, Sandercock, et al., 2015; Yin et al., 2011). In the present work, we hypothesized that the characteristics identified in SMARC1N-PN1 could be transferred to other germplasm lines in a different genetic background.
SMARC1N-PN1, deficient in phaseolin and major lectins (Osborn et al., 2003), was crossed with the cultivar Morden003 (Mündel et al., 2004) to recover lines having increased cysteine and methionine concentration, with adaptation to short season conditions in Manitoba, Canada. High-density single nucleotide polymorphism (SNP) genotyping and quantitative trait locus (QTL) mapping were performed to determine the effect of genetic loci on sulfur amino acid profiles.

| Plant material
A total of 183 F2:8 recombinant inbred lines (RILs) were developed through a single-seed descent method from a cross between SMARC1N-PN1 and Morden003. Morden003 is a navy bean cultivar with early maturity and primary adaptation to southern Manitoba, Canada (Mündel et al., 2004). SMARC1N-PN1 is a navy bean germplasm line in the 'Sanilac' genetic background, with deficiency in phaseolin and lectin polypeptides (Osborn et al., 2003).
Each trial consisted of two rows per line with 5 m length and 75 cm row spacing, in a randomized complete block design with three replications. Each plot was harvested in bulk at natural harvest maturity (no desiccants). The average growing season minimum temperatures in Morden were 10.8 °C and 11.7 °C, and the maximum temperatures were 23.6 °C and 25.7 °C, during the 2 years of the study. The average growing season minimum temperatures in London were 13.2 °C and 13.2 °C, and the maximum temperatures were 24.8 °C and 24.6 °C. The total precipitation during the growing seasons of 2014 and 2015, respectively, was 243.9 mm and 171.5 mm in Morden, and 249.7 mm and 269.5 mm in London.
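As an illustration of why lines advanced by single-seed descent are treated as essentially homozygous, the short sketch below computes the expected residual heterozygosity per locus under strict self-pollination. It assumes a fully heterozygous F1 and no selection or outcrossing; the function name is hypothetical and the calculation is textbook inbreeding arithmetic, not something reported in the study.

```python
def expected_heterozygosity(generation: int) -> float:
    """Expected fraction of heterozygous loci in generation F_n under selfing.

    Assumes the F1 (n = 1) is heterozygous at every segregating locus and
    that heterozygosity halves with each generation of self-pollination.
    """
    if generation < 1:
        raise ValueError("generation must be >= 1")
    return 0.5 ** (generation - 1)


# Generations relevant to an F2:8 population (and the F2:9 seed derived from it).
for n in (2, 8, 9):
    print(f"F{n}: expected heterozygosity = {expected_heterozygosity(n):.4f}")
```

Under these assumptions, fewer than 1% of loci are expected to remain heterozygous by the F8, so a line showing a high percentage of heterozygous marker calls stands out as atypical.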

| Determination of phaseolin and lectin protein profiles
Seed samples (1.2 g) from the F2:8 generation collected in the greenhouse were ground with a Kleco ball mill (Garcia Machine, Visalia, CA). Soluble protein was extracted from 50 mg of ground tissue using 1 ml of cold extraction buffer (50 mM Tris-HCl pH 8.0, 50 mM KCl, 1 mM CaCl2, 10% glycerol) containing 1 mM phenylmethylsulfonyl fluoride (PMSF) and 1 mM dithiothreitol (DTT). Extracts were centrifuged twice at 25,000 × g for 20 min at 4 °C. A total of 4.5 μl of the 4× SDS protein sample buffer was added to 1.5 μl of extract. Samples were boiled for 5 min at 99 °C.

| Amino acid analysis
Seed samples from the F2:9 generation collected in the greenhouse were prepared and total amino acids analyzed as described by Jafari et al. (2016). Samples were acid hydrolyzed in the presence of 6 N HCl and 1% phenol. Hydrolysis was performed using an Eldex Workstation (Napa, CA). Norvaline was used as an internal standard. Cysteine was detected separately as cysteic acid, after performic acid oxidation. The linearity of the peak areas at different concentrations was determined.
Calculations were based on the area under the peak for a known concentration. For a subgroup of lines, amino acids were analyzed by HPLC after derivatization with phenylisothiocyanate at the SPARC BioCentre of SickKids Hospital (Toronto, ON) as previously described (Pandurangan, Pajak, et al., 2015). For tryptophan, samples were subjected to alkaline hydrolysis with barium hydroxide (20 hr at 110 °C in an autoclave) and analyzed as per ISO protocol 13904 (International Organization for Standardization, 2016). For quality control, the NIST soy flour Standard Reference Material 3234 was used. Tryptophan was quantified, using α-methyltryptophan as the internal standard, on a Shimadzu UPLC system (Columbia, MD), complete with an SIL-30 AC autosampler.

| Measurement of crude protein and protein digestibility
The pH of the protein solution was monitored and recorded every 1 min for 10 min, and the in vitro protein digestibility (IVPD) was calculated from ΔpH10min, where ΔpH10min refers to the change in pH from the initial value of 8.0 to the end of the 10-min period.
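The equation relating IVPD to the pH drop is not reproduced in this text. The sketch below shows how such a calculation is typically set up, using as default coefficients the linear regression commonly cited for the pH-drop method (IVPD % = 65.66 + 18.10 × ΔpH10min). These coefficients are an assumption and may differ from those used in the study; the function and parameter names are hypothetical.

```python
def ivpd_percent(ph_initial: float, ph_at_10_min: float,
                 intercept: float = 65.66, slope: float = 18.10) -> float:
    """Estimate in vitro protein digestibility (%) from a pH-drop assay.

    delta_ph is the drop in pH between the start of digestion and the
    10-min reading. The default intercept and slope are ASSUMED values
    from a commonly cited pH-drop regression, not taken from this study.
    """
    delta_ph = ph_initial - ph_at_10_min
    return intercept + slope * delta_ph


# Hypothetical reading: pH falls from 8.0 to 7.0 over 10 min.
example_ivpd = ivpd_percent(8.0, 7.0)
print(f"IVPD = {example_ivpd:.2f}%")
```

Passing the coefficients as parameters makes it straightforward to substitute the values from whichever calibration the assay actually follows.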

| SNPs and linkage analysis
One hundred eighty-one RILs were kept, and two were removed due to a high percentage of heterozygosity and missing data. A total of 474 SNP markers were obtained after filtration for distortion, missing data and redundancy. Markers with greater than 20% missing data were removed for linkage map construction. A genetic linkage map was developed for Morden003/SMARC1N-PN1 using MapDisto v. 1.8.1 (Lorieux, 2012) with an r max of 0.24, LOD min of 3, and the Kosambi function. Marker order was optimized by the "order" and "ripple" functions. Linkage groups were assigned to a physical common bean chromosome (v2.1) by using BLASTn results from Phytozome.
Table S1 provides the high-density SNP genotyping information for the two parents and RILs, along with their genotype at the phaseolin and lectin loci, and the concentration of cysteine, methionine, cysteine + methionine, and S-methylcysteine. SNP genotyping information is also provided for the parents of SMARC1N-PN1: the navy bean cultivars Sanilac and Great Northern US 1140, which carries an APA locus haplotype conferring erythroagglutinating phytohemagglutinin (pha-E) deficiency (Osborn & Bliss, 1985), along with three phaseolin-deficient P. coccineus genotypes, the wild accession G12882 that contains the insecticidal lectin arcelin-1, and the related germplasm lines, SARC1 and SMARC1-PN1 (Osborn et al., 2003). For a subgroup of 26 lines, sulfur amino acids were determined separately at the SPARC BioCentre (Table S2). These data were excluded from the QTL analysis. However, they revealed similar contrasts between protein phenotypic groups (Viscarra Torrico, 2017). Amino acid data were missing for an additional subgroup of 14 lines.
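Two of the map-construction settings mentioned above can be illustrated with a short sketch: the 20% missing-data threshold for markers, and the Kosambi mapping function that converts a recombination fraction into a map distance. This is not MapDisto code; the helper names and the toy genotype calls are hypothetical, and only the standard Kosambi formula is used.

```python
import math


def kosambi_cm(r: float) -> float:
    """Convert a recombination fraction r (0 <= r < 0.5) to centimorgans
    with the Kosambi mapping function: d = 25 * ln((1 + 2r) / (1 - 2r))."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must be in [0, 0.5)")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def missing_fraction(calls):
    """Fraction of genotype calls that are missing (encoded here as None)."""
    return sum(call is None for call in calls) / len(calls)


def filter_markers(genotypes, max_missing=0.20):
    """Keep markers whose missing-data fraction does not exceed max_missing."""
    return {name: calls for name, calls in genotypes.items()
            if missing_fraction(calls) <= max_missing}


# Hypothetical calls for five RILs: "M" = Morden003 allele, "S" = SMARC1N-PN1 allele.
toy_genotypes = {
    "marker_1": ["M", "S", "S", "M", "M"],    # no missing calls, kept
    "marker_2": ["M", None, "S", None, "M"],  # 40% missing, removed
}
kept = filter_markers(toy_genotypes)
print(sorted(kept))
print(f"r = 0.24 corresponds to {kosambi_cm(0.24):.1f} cM (Kosambi)")
```

With the Kosambi function, a maximum recombination fraction of 0.24 corresponds to roughly 26 cM between linked markers.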

| QTL analysis of sulfur amino acid traits
QTL mapping was performed to identify genomic loci contributing to the variation in sulfur amino acid concentration. A genetic map was built using the high-density genotyping information obtained with the BARCBean6K_3 array (Figure S1). The phenotypic markers, Lectin and Phaseolin, were mapped on Pv04 (~46 Mb) and Pv07 (~3 to 6 Mb), respectively (Figure 1a). Table 1 summarizes the information on the QTL identified in this study. Table S3 summarizes the genotyping results and allele information for the markers. Table S4 lists the primers or sequence interval targeted for genotyping.
Supplementary data document the detection of specific alleles at each locus (Dataset S2 to Dataset S5).
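To make the idea of a QTL peak concrete, the following minimal sketch computes a single-marker LOD score by comparing a model with one mean per genotype class against a model with a single overall mean. This is a generic single-marker regression, offered as an assumption-laden illustration; it is not the software or the mapping method used in the study, and the phenotype values are invented.

```python
import math
from statistics import fmean


def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def single_marker_lod(phenotypes, genotypes):
    """LOD score for one marker: (n / 2) * log10(RSS_null / RSS_marker).

    Lines with a missing phenotype or genotype (None) are skipped, which
    mirrors how lines lacking amino acid data would drop out of a scan.
    """
    pairs = [(y, g) for y, g in zip(phenotypes, genotypes)
             if y is not None and g is not None]
    groups = {}
    for y, g in pairs:
        groups.setdefault(g, []).append(y)
    if len(groups) < 2:
        raise ValueError("marker is not segregating among the scored lines")
    rss_null = sum_of_squares([y for y, _ in pairs])
    rss_marker = sum(sum_of_squares(ys) for ys in groups.values())
    if rss_marker == 0:
        raise ValueError("no residual variation within genotype classes")
    return (len(pairs) / 2) * math.log10(rss_null / rss_marker)


# Hypothetical cysteine values for six RILs; "M" = Morden003, "S" = SMARC1N-PN1 allele.
toy_cysteine = [1.0, 1.2, 0.8, 2.0, 2.2, 1.8]
toy_calls = ["M", "M", "M", "S", "S", "S"]
toy_lod = single_marker_lod(toy_cysteine, toy_calls)
print(f"LOD = {toy_lod:.2f}")
```

Repeating this calculation at every marker along a chromosome and plotting the result gives a LOD profile whose maximum marks the most likely QTL position.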

| Protein quality evaluation of selected RILs
A group of seven SS lines was selected for further analysis.

FIGURE 1 (caption, partially recovered) Horizontal bar and error bars indicate average ± standard deviation. Red indicates the genotype derived from parental cultivar Morden003; blue, the genotype derived from germplasm line SMARC1N-PN1. (d) Methionine concentration of genotypes contrasted at SNP marker SS715645808, corresponding to the peak of the lectin QTL, and SS715646465, corresponding to the peak of the phaseolin QTL. (e) Methionine + cysteine concentration of genotypes contrasted at SNP marker SS715645808, corresponding to the peak of the lectin QTL, and SS715646455, corresponding to the peak of the phaseolin QTL.

| DISCUSSION
The results of QTL analyses performed in this study clarify the effects of the phaseolin and lectin loci on variation in cysteine and methionine concentrations in the Morden003 × SMARC1N-PN1 RIL population. QTL were identified even though amino acid data for some of the lines were either missing or excluded because they were obtained on a different analytical platform, and therefore were not directly comparable with those of the majority of the RILs. The phaseolin locus was the only QTL associated with cysteine concentration (Table 1 and Figure 1b). Phaseolin deficiency led to increased methionine concentration, with an additive effect of lectin deficiency (Figure 1d).
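The notion of an additive effect of two loci can be sketched with the four two-locus genotype classes: if the effects are additive, the gain from lectin deficiency is the same in both phaseolin backgrounds and the interaction term is close to zero. The code below is a hypothetical illustration with invented values and allele codes ("M" for Morden003, "S" for SMARC1N-PN1); it does not reproduce the study's data or analysis.

```python
from statistics import fmean


def class_means(records):
    """Mean trait value for each (phaseolin allele, lectin allele) class."""
    groups = {}
    for phaseolin, lectin, value in records:
        groups.setdefault((phaseolin, lectin), []).append(value)
    return {key: fmean(values) for key, values in groups.items()}


def two_locus_effects(means):
    """Return (phaseolin effect, lectin effect, interaction).

    Each main effect is the S-minus-M difference averaged over the two
    backgrounds of the other locus; the interaction is the difference
    between the lectin effects in the S and M phaseolin backgrounds.
    """
    mm, sm = means[("M", "M")], means[("S", "M")]
    ms, ss = means[("M", "S")], means[("S", "S")]
    phaseolin_effect = ((sm - mm) + (ss - ms)) / 2
    lectin_effect = ((ms - mm) + (ss - sm)) / 2
    interaction = (ss - sm) - (ms - mm)
    return phaseolin_effect, lectin_effect, interaction


# Invented methionine values: (phaseolin allele, lectin allele, concentration).
example_records = [
    ("M", "M", 1.00), ("M", "M", 1.04),
    ("S", "M", 1.20), ("S", "M", 1.24),
    ("M", "S", 1.10), ("M", "S", 1.14),
    ("S", "S", 1.30), ("S", "S", 1.34),
]
effects = two_locus_effects(class_means(example_records))
print("phaseolin effect, lectin effect, interaction:", effects)
```

In this toy data set, both loci raise the trait and the interaction is zero, which is what a purely additive pattern looks like.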
The SNP markers identified in this study may be useful to test genetic material for phaseolin or lectin deficiency, and to follow the introgression of the traits (Figure 2). Parent-of-origin specific SNPs present on the genotyping array are able to track the segregation of phaseolin deficiency, although they are relatively distant, approximately 1 Mb on either side of the corresponding locus (Table S3).
Two lines, 2-37 and 3-84, had a stable protein profile over 2 years and locations (Dataset S6). Factors that may have affected the stability of protein composition include outcrossing, although its frequency is low in common bean (Ibarra-Perez et al., 1997). To address this potential problem, physical isolation might be required to grow storage protein deficient common bean genotypes, as is practiced with waxy corn, for example.
Environmental variability is another possible factor that may have influenced the stability of the protein profile. It is worth noting that SMARC1N-PN1 is not completely devoid of phaseolin, due to the presence of a residual, functional copy of a β-phaseolin gene. A 20-fold decrease in β-phaseolin accumulation was associated with a polymorphism converting a proximal G-box to an ACGT motif in the promoter of β-phaseolin. The presence of this residual phaseolin gene may contribute to variation in protein profile in the RILs. Tracking phaseolin and lectin deficiency at different stages of breeding, using the PCR markers identified in this study (Figure 2), is likely to facilitate the recovery of pure genotypes having a stable protein profile.
The results of amino acid analyses of genotypes grown over 2 years at two locations confirmed prior results linking storage protein deficiency with increased cysteine concentration (Table 4) (Taylor et al., 2008). However, the increase in cysteine concentration (approximately 30%-40%) was not as high as previously reported in the initial study (up to 70%, when compared to SARC1 as the reference genotype). To maximize the concentration of sulfur amino acids, it may be necessary to apply sulfate fertilizer. This was only done at the Morden site, a common practice in Manitoba, but not generalized in Ontario. Cysteine concentration was previously shown to respond positively to sulfate nutrition in SMARC1N-PN1, under controlled conditions (Pandurangan, Sandercock, et al., 2015). In the present study, line 2-37 displayed a substantial increase in PDCAAS, based on the in vitro digestibility assay (Table 6). The increased protein quality for this line is attributable to higher sulfur amino acid levels, while having a slightly lower protein concentration. The differences in crude protein concentration, as determined by elemental analysis, were consistent with those determined for the sum of total amino acids, measured after acid hydrolysis. The increase in in vitro PDCAAS measured for line 3-84 as compared with Morden003 was relatively marginal, equal to only 6.5%. This reveals how potentially difficult it may be to implement a strategy based on storage protein deficiency to improve protein quality. Tryptophan is the second most limiting amino acid in pulses after sulfur amino acids (Nosworthy et al., 2017). The amino acid score for tryptophan was substantially increased.

Note: Average, n = 3. %CP, percent crude protein determined by elemental analysis. Asp, aspartate; Thr, threonine; Ser, serine; Glu, glutamate; Pro, proline; Gly, glycine; Ala, alanine; Cys, cysteine (determined by oxidation method); Val, valine; Met, methionine; Ile, isoleucine; Leu, leucine; Tyr, tyrosine; Phe, phenylalanine; His, histidine; Lys, lysine; Arg, arginine; Trp, tryptophan (determined via alkaline hydrolysis).

Genotypes deficient in phaseolin and major lectins have agronomic characteristics similar to those of regular genotypes (Hartweck & Osborn, 1997; Osborn & Bliss, 1985). This is consistent with the findings from this study. The lectin deficiency present in SMARC1N-PN1 does not involve a complete lack of lectins. Rather, the absence of erythroagglutinating phytohemagglutinin encoded by pha-E, leucoagglutinating phytohemagglutinin encoded by pha-L and lectin encoded by lec4-B17 is compensated by high levels of leucoagglutinating phytohemagglutinin encoded by PDLEC2, α-amylase inhibitor 1, α-amylase inhibitor-like protein and mannose lectin FRIL. Kusolwa et al. (2016) recently introduced an arcelin from tepary bean (Phaseolus acutifolius) into a phaseolin and lectin-deficient background. They found that the new weevil-resistant lines had higher levels of threonine (34%-44%), cysteine (15%-20%), methionine (0%-11%), lysine, and valine, whereas the levels of isoleucine and leucine stayed the same or were decreased, similar to the present study. Giuberti et al. (2019) recently examined correlations between variations in protein composition and nutritional characteristics of common bean. They reported a strong positive correlation between genotypes lacking phaseolin and having high levels of α-amylase inhibitor, as in the present study, and the accumulation of iron and zinc. This could be of further interest from a biofortification point of view.
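The in vitro PDCAAS comparisons discussed in this section combine two quantities: the score of the first limiting amino acid relative to a reference pattern, and the protein digestibility. The sketch below shows that arithmetic under stated assumptions: the reference values are taken from the FAO/WHO (1991) pattern for preschool children (25 mg/g protein for methionine + cysteine, 11 for tryptophan, 58 for lysine), which may not be the pattern used in the study, and the amino acid profile and digestibility are invented.

```python
# ASSUMED reference pattern (mg amino acid per g protein); a subset only.
REFERENCE_MG_PER_G_PROTEIN = {"Met+Cys": 25.0, "Trp": 11.0, "Lys": 58.0}


def amino_acid_scores(profile, reference=REFERENCE_MG_PER_G_PROTEIN):
    """Ratio of each amino acid (mg/g protein) to its reference value."""
    return {aa: profile[aa] / reference[aa] for aa in reference}


def in_vitro_pdcaas(profile, ivpd_percent, reference=REFERENCE_MG_PER_G_PROTEIN):
    """Return (limiting amino acid, in vitro PDCAAS).

    The score of the first limiting amino acid is multiplied by the
    digestibility expressed as a fraction, and the result is capped at 1.0.
    """
    scores = amino_acid_scores(profile, reference)
    limiting = min(scores, key=scores.get)
    return limiting, min(scores[limiting] * ivpd_percent / 100.0, 1.0)


# Hypothetical profile (mg/g protein) and digestibility (%).
example_profile = {"Met+Cys": 20.0, "Trp": 11.0, "Lys": 70.0}
limiting_aa, pdcaas = in_vitro_pdcaas(example_profile, ivpd_percent=80.0)
print(f"limiting amino acid: {limiting_aa}, in vitro PDCAAS: {pdcaas:.2f}")
```

Because only the limiting amino acid enters the final value, a line gains in PDCAAS only to the extent that its sulfur amino acid score rises, and only until another amino acid, such as tryptophan, becomes limiting.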
In conclusion, a germplasm line, designated HS 2-37, has been generated, that exhibits a substantial increase in protein quality as compared with its commercial parent cultivar. This property might be interesting from a nutritional standpoint, or for specific food applications, since it is also relevant to nutritional claims on protein content.
In addition to testing agronomic characteristics, breeding for this trait likely requires stringent controls to ensure genetic purity, using molecular markers, as well as monitoring for seed quality characteristics, including protein concentration and essential amino acid profiles.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
RCV-T, RLC, PNM, AH and FM contributed to conceptualization; RCV-T, AP, ASG, BZ and SP to investigation; ASG and MD to formal analysis; QS, PBC, PNM, and JDH contributed resources; RCV-T, AP, ASG, BZ, AH, and FM contributed to writing-original draft; SP, QS, JDH, PNM, AF, and FM contributed to writing-review and editing; FM and AH contributed to supervision; AH was responsible for funding acquisition.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available through the Scholars Portal Dataverse at https://doi.org/10.5683/SP2/JL7Y2I. Lines developed through this study have been deposited at the Plant Gene Resources of Canada under the following names and accession numbers: HS 2-37 (CN 120262) and HS 3-84 (CN 120263).