Dissecting the genetic architecture of leaf morphology traits in mungbean (Vigna radiata (L.) Wizcek) using genome‐wide association study

Mungbean (Vigna radiata (L.) Wizcek) is an important pulse crop, increasingly used as a source of protein, fiber, low fat, carbohydrates, minerals, and bioactive compounds in human diets. Mungbean is a dicot plant with trifoliate leaves. The primary component of many plant functions, including photosynthesis, light interception, and canopy structure, are leaves. The objectives were to investigate leaf morphological attributes, use image analysis to extract leaf morphological traits from photos from the Iowa Mungbean Diversity (IMD) panel, create a regression model to predict leaflet area, and undertake association mapping. We collected over 5000 leaf images of the IMD panel consisting of 484 accessions over 2 years (2020 and 2021) with two replications per experiment. Leaf traits were extracted using image analysis, analyzed, and used for association mapping. Morphological diversity included leaflet type (oval or lobed), leaflet size (small, medium, large), lobed angle (shallow, deep), and vein coloration (green, purple). A regression model was developed to predict each ovate leaflet's area (adjusted R2 = 0.97; residual standard errors of < = 1.10). The candidate genes Vradi01g07560, Vradi05g01240, Vradi02g05730, and Vradi03g00440 are associated with multiple traits (length, width, perimeter, and area) across the leaflets (left, terminal, and right). These are suitable candidate genes for further investigation in their role in leaf development, growth, and function. Future studies will be needed to correlate the observed traits discussed here with yield or important agronomic traits for use as phenotypic or genotypic markers in marker‐aided selection methods for mungbean crop improvement.


INTRODUCTION
Mungbean (Vigna radiata (L.) Wizcek) is an important pulse crop mostly in tropical areas of the world, with rapidly growing usage in northern latitude countries (Nair & Schreinemachers, 2020;Sandhu & Singh, 2021).It is a source of protein, fiber, low fat, carbohydrates, minerals, and bioactive compounds (Hou et al., 2019;Sandhu & Singh, 2021;Singh et al., 2021;Tang et al., 2014).Mungbeans are mostly used for human consumption as mature whole/split seeds to make soups, sprouts, and pastries with minimal use as livestock feed (Nair & Schreinemachers, 2020).In recent years, mungbean has gained a reputation in the campaign for plantbased protein intake in lieu of animal protein due to the lower carbon footprint to combat climate change (Iseki et al., 2018;Tang et al., 2014;van Vliet et al., 2020).Major breeding objectives in mungbean are to increase seed yield and protein.To increase seed yield, direct or indirect selection can be used (Singh et al., 2021).Direct selection can be done by testing varieties in the field and measuring the harvested seed yield.Indirect selection can be done with the use of phenomics approaches, such as reflectance and vegetation indices (Chiozza et al., 2021;Parmley et al., 2019), canopy coverage (Howard & Jarquin, 2019;Xavier et al., 2017), and reproductive organs (Riera et al., 2021).Physiologically, yield is a product of total biomass and its harvest index.Leaves are an integral part of total biomass.In addition to canopy coverage, leaves also have distinct shapes and sizes and orientations.
Leaves are central to various plant processes like photosynthesis, light interception, disease and pests warning signals, soil erosion from leaf residue, crop-weed competition, and overall canopy structure (Schrader et al., 2021;Stewart & Dwyer, 1999;Wright et al., 2004Wright et al., , 2005)).Variations in leaf traits such as shape, orientation, anatomy, placement, and other functional traits contribute to the overall performance of a leaf (Baldocchi et al., 1985;X. Yu et al., 2020).Previous studies have found a positive correlation between a single leaf photosynthetic rate, biomass, and yield in Cassava (Manihot esculenta Crantz) (El-Sharkawy et al., 1990), cowpea (Vigna unguiculata (L.) Walp) (Digrado et al., 2022), and soybean (Glycine max) (Boerma & Ashley, 1988).For example, soybean narrow leaflet type cultivars have more seeds per pod than their broad leaflet counterpart, which can lead to a potential yield increase (Dinkins et al., 2002;Jeong et al., 2011;Sayama et al., 2017).Using gamma radiation to induce mutations, Tah (2008) studied the effect of multifoliate (>3 leaflets) on seed yield in different mungbean mutant generations.No direct correlation was determined with seed yield, although an indirect correlation was established with yield components such as the number of pods/plant and branches per plant.Traditionally, leaves have served as the early warning signs of pathogen infection, nutrient deficiency, waterlogging, and drought in plants (Isaac et al., 2018).

Core Ideas
• Mungbean exhibits phenotypic diversity in leaf morphology traits.
• An interaction regression model to predict ovate leaflet area was developed.• Careful attention is needed when using unified linear mixed models for association mapping of binary traits.• Candidate genes showed a lot of overlap for different trait across the leaflets.
Broadly speaking, leaves are important plant organs, and a study to characterize them is needed.The insights from such work can be useful for breeding programs.Useful leaf traits can be used as phenotypic markers if linked to yield or other important agronomic traits in marker-assisted selection (Collard & Mackill, 2008;Pottorff et al., 2012).A study of genetic diversity for traits of interest provides additional usefulness in science.Researchers have used genetic characterization of diversity panels to conduct genome-wide association studies (GWAS), for example, disease-related traits such as sudden death syndrome (Zhang et al., 2015), and iron deficiency chlorosis (Assefa et al., 2020) in soybean, fusarium wilt, plant height, days to flowering and seed coat color in mungbean (Sandhu & Singh, 2021), and root-related traits in soybean (Falk, Jubery, O'Rourke, et al., 2020) and mungbean (Chiteri et al., 2022).GWAS studies have proved vital in the detection of the marker-trait association whose top-level outcome can kickstart further investigation of genes for transgenic crop improvement or through marker-assisted selection as reviewed by Tibbs Cortes et al. ( 2021) and Zhu et al. (2008).For example, Jun et al. (2014) found that the soybean's narrow leaf was highly correlated with the number of seeds per pod, a yield component trait.A few quantitative trait loci (QTL) mapping studies for leaflet type/shape have been done in soybean (Jeong et al., 2011;Jun et al., 2014;L. Wang, Cheng, et al., 2019), cowpea (Pottorff et al., 2012), and mungbean (Jiao et al., 2016).C. Fang et al. (2017) conducted a GWAS of soybean leaflet area, length, width, and shape.
The objectives of this study, therefore, included (a) phenotypic characterization, (b) development of a leaf area prediction model, and (c) conducting genome-wide association mapping of important leaf traits of the Iowa Mungbean Diversity (IMD) panel.In this study, we use high throughput image analysis to extract the length, width, perimeter, apex and base angles, and the area of each leaflet to develop a predictive model within the IMD panel.We then conducted a comprehensive GWAS for length (L), width (W), perimeter (P), apex and base angles, and the area (LA) of mungbean leaflets.

Planting and experimental design
The IMD panel (Sandhu & Singh, 2021)  In 2020, planting was done on June 5 at the Burkey and Bruner farms, while in 2021, planting was done on June 3 at the AEA and Bruner farms.The farms can be viewed here using the ISU Lands app.Each accession was planted in 7 ft single-row plots consisting of 50 plants.A 2″ and 30″ spacing was used between plants in a plot and between plots, respectively.A randomized complete block design was used with two replicates at each location.Standard agronomic practices were used in the management of the crop.

Leaf collection, image capture, and trait extraction
Leaves were collected from one replication per location for the two years giving us four data points.Leaf collections were done during the vegetative growth and took between 3 and 5 days at each location, weather permitting and the availability of labor.Leaves were collected on the following dates in 2020: Burkey 2-5 September, Bruner 7, 13-15 September, and 2021: AEA 27-28 July, 9-11 August, Bruner 11-13, 16-17 August.The third trifoliate leaf from the top (most recent bud) on the plant was plucked destructively as the leaf below was already senescing in some plants and already dropped in others.The third leaf represented the mature leaf on the plant.Three trifoliate were collected randomly per plot, put in a Ziploc, and temporarily stored in a cooler box.The cooler box was later transported to the imaging station.
We used a high throughput imaging station to capture the leaf images described by Falk, Jubery, Mirnezami, et al. (2020), Falk, Jubery, O'Rourke, et al. (2020), and Chiteri et al. (2022).The station consists of a utility cart, a camera, a light mounting platform, and a file storage system.The 18-megapixel Canon Rebel T5i digital SLR camera (Canon USA, Inc., Melville, NY) was used.Barcodes enabled the automated renaming of the images using the Smartshooter software (Hart, n.d.).Each trifoliate was imaged separately, making a total of three images per plot, amounting to 5736 images for the 2 years (492 accessions *3 images *2 locations for 2020 and 484 accessions *3 images *2 locations for 2021).Images were routinely transferred to a local server for longterm storage pending analysis.An effort was made to make sure the leaflets were not touching each other to make it easier for the trait extraction.The leaf images were annotated using the bean_annotater tool at https://bitbucket.org/baskargroup/ leaf_annotator/src/master/ by drawing a straight line from the proximal-most to the distal-most point of the laminar (length) and between any touching leaflets.Image analysis was used to extract traits from the images.The traits extracted (Table 1) were guided by what is provided in the manual of leaf architecture (Ash et al., 1999) and other articles reviewed (Digrado et al., 2022;Y. Wang, Jin, et al., 2019;X. Yu et al., 2020) and as diagrammed in Figure 1.The trait extraction pipeline is  easily scalable with small modifications to real-time settings where the images could be captured nondestructively in their natural setting.

Phenotypic description and statistical analysis
The diversity panel leaves visual phenotypic traits were noted during the growing seasons.The accessions had oval trifoliate in most cases.Only the genotypes in the 2020 and 2021 growing seasons were kept for the study, and those with fewer than three measurements for each attribute per plot were filtered away.The average of the three measurements per trait for the remaining 462 genotypes was used to calculate best linear unbiased predictors (BLUPs).BLUPs and estimates of broad-sense heritability were calculated using the R "inti" package (Lozano-Isla, 2021) developed for the analysis of multi-environment trials using the linear mixed model: where Y ij is the phenotypic value of the j th genotype in the i th environment, μ is the overall mean, (1|loc_year) i is the random effect due to the i th environment, (1|line) j is the random effect due to the j th genotype, and e ij is the random error following N(0, σ e 2 ).Broad-sense heritability was estimated using the formula below as described previously (Cullis et al., 2006;Piepho & Möhring, 2007): where  2  , is the broad-sense heritability, δ 2 is the variance, g is the genotypes, and BLUP is the average standard error of the BLUPs.Variance components, blups distribution, and trait correlations were also estimated.Trait correlations were performed and visualized using the hierarchical clustering by setting parameter "hc.order = TRUE" in the ggcorrplot package.All statistical analyses were performed in the R statistical computing environment (R Core Team, 2021).

Modeling the leaf area
Several simple and multiple empirical regression models were developed to estimate the leaflet and trifoliate area of the 458 ovate-shaped mungbean accessions.No model was developed for the lobed leaves as they were only four genotypes.Simple models included using a single variable such as length, width, and perimeter traits, while multiple models included a combination of length and width.The parameters length, width, and perimeter are simple to measure on a small scale hence have frequently been used in the estimation of leaf area in other crops such as soybean (Bakhshandeh et al., 2011;Wiersma & Bailey, 1975), cowpea (Digrado et al., 2022), mungbean (Hamid & Agata, 1989), vines (X.Yu et al., 2020), and other crops (cereal and legumes) as tabulated by Bakhshandeh et al. (2010).For each model, the models, r-square (R 2 ) and adjusted r-square (adj.R 2 ), and the residual standard errors were extracted, recorded (Table S1), and plotted, respectively.Regression was performed for each leaflet and the trifoliate area.

Genome wide association study and candidate gene identification
Association mapping was conducted for each trait (length, width, perimeter, area, apex, and base angles) per leaflet using BLUPs and a leaflet type trait (normal or lobed).Several methods for association were compared.The single-locus unified mixed linear model (MLM) (J.Yu et al., 2006) was implemented in both Trait analysis by ASSociation, Evolution, and Linkage (TASSEL) (Bradbury et al., 2007) and Genome Analysis and Prediction Integrated Tool (GAPIT) (J.Wang & Zhang, 2021).Multilocus models FarmCPU (Liu et al., 2016) and Bayesian methods were implemented in GAPIT and Selection of Variables with Embedded screening using Bayesian methods (SVEN) (Li et al., 2022), respectively.The population structure (first four principal components [PC]) and kinship (K matrix) were calculated from the single nucleotide polymorphism (SNP) data in TAS-SEL and fed as model covariates in GAPIT for association analysis.For the binary trait, leaf type, an additional generalized linear model (GLM) with a binomial family was fit using the GWASTools package (Gogarten et al., 2012).Additionally, we randomly sampled five times within the ovate genotypes and then included the four lobed genotypes.The lobed leaf, in this case considered minor allele frequency, was maintained at 2%.Note that 26550 SNP markers generated using genotype by sequencing technology (Sandhu & Singh, 2021) were used in the study after filtering out those minor allele frequencies of >0.01 and retaining those with <15% missing data.Candidate gene search was performed by using the National Center for Biotechnology Information (NCBI) genome viewer and Legume Information System (LIS) GBrowse (https://www.legumeinfo.org/)tools to locate the significant SNP on the mungbean genome (Annotation release 101) (Kang et al., 2014) alongside the annotation deposited with the NCBI (Crop Genomics Lab, n.d.; Sayers et al., 2022).Gene ontology was inferred using UniProtKB (UniProt Consortium, 2021).

Phenotypic description, trait correlations, descriptive statistics, and analysis of variance
Mungbean leaves exhibit diverse phenotypic diversity as recorded during the growing seasons (Figure 2).The prominent phenotypes noted include leaflet type (A: ovate or lobed), leaflet size (B: small, medium, large), lobed angle (C: shallow, deep), and vein coloration (D: green, purple).
The statistical analysis focused on the quantitative traits length, width, perimeter, area, apex, and base angles for each genotype whose overall BLUPs were normally distributed following the removal of the outliers (Bernal-Vasquez et al., 2016).As shown in Figure 3 (Trait Distribution), phenotypes were grouped according to the units of measurements, that is, in cm (A), cm 2 (B), and angles (C).In subpanel A, although the perimeter showed the highest values (∼4X) than length and width, they were all normally distributed.The area (subpanel B) showed a constant variation in all the leaflets.The base angle (subpanel C) for the terminal leaflet was relatively higher than both the left and right leaflet base angles.The same pattern was observed at the apex angle.In Figure 3 (Trait Correlation), all traits showed varying levels of correlations.By employing hierarchical clustering, length, width, perimeter, and area showed strong positive correlations between them (0.78-0.98), and apex and base angle showed moderate to high positive correlation (0.58-0.92), while the two groups showed varying levels of both negative and positive correlations between them (−0.08 to 0.57).The width between the leaflets showed a strong positive correlation (0.93-0.98) than the lengths between the leaflets (0.82-0.84).
Traits exhibited variation in the standard deviations (SD) (Table 2) within a leaflet but retained a similar pattern between leaflets.Area had the largest SD, followed by apex and base angle, and perimeter, while length and width had the smallest SD.For the coefficients of variation (CV), the apex and base had the lowest (<5%), while the area had the highest at 13%.Width (7%) had a higher coefficient of variation than length (4%).The results are similar among the leaflets.Broad sense heritability (H2.cullis) ranged between 51% and 72% with length having a lower range (51%-53%), while width had a range of 70%-72%.
Variance components of the phenotypic variations of the traits are as tabulated (Table 3).Similar patterns with slight deviation were seen in the percentages of the total variance explained by each component.For the left leaflet, line and loc_year (Environment) accounted for 2%-40% of the F I G U R E 3 Boxplots showing the distribution of the phenotypes (G) grouped by units of measurements (cm) (A), cm 2 (B), angles (C), and correlations between the phenotypes variation for area, perimeter, length, width, apex, and base traits.Line contributed just 12% of the variation for length, while loc_year contributed the least for apex (2%) and base (5%) trait variations.For the terminal leaflets, the line and loc_year effects accounted for 1%-37% of the trait's variability.Line contributed 18% of the variation for length, while loc_year contributed the least for apex (2%) and base (1%) variance.For the right leaflet, the line and loc_year effects accounted for 1%-43% of the trait's variation.Line contributed 14% of the length, while loc_year accounted for only 1% in apex and 3% in base variations.

Modeling the leaf area
Several regression models were fitted for predicting the leaflet area.Length, width, and perimeter showed high correlations (Figure 3) with the area and were used to build the models.Each model's residual distribution, residual standard error, and R 2 /adjusted R 2 were examined (Figure 4, Tables S1).
For each leaflet area (Figure 4A), the multiple regression model, including the interaction between the length and width parameters, had an adjusted R 2 of 0.97 and low residual standard errors.Simple regression models using length as the only predictor variable performed dismally with R 2 of between 0.76, 0.70, and 0.76 and large standard errors (2.92, 3.21, 2.97) for the left, terminal, and right leaflets, respectively.Models with width as the only predictor had an R 2 of 0.95, only 2% less than the interaction models with moderate residual standard errors (1.41, 1.5, 1.31).Figure 4B shows that width was a better predictor of the trifoliate area than length.
Figure S1 shows the normal distribution of residuals indicating homoscedasticity of the variance and quantile-quantile (Q-Q) plots for both the top (interactions) and width models.

Genome wide association and candidate gene identification
Single locus (MLM) and multiple-locus (FarmCPU and SVEN) models for association mapping detected varying numbers of significant markers (Figure 5, Table S2).Across leaflet and traits, GAPIT_FarmCPU had the greatest number of significant SNPs, followed by TaSSEL, SVEN, and GAPIT_MLM, respectively (Figure 5A).The multilocus models detected more significant associations than the single locus models (Table S2).Markers 1_12063422 and 11_5315338 did overlap between TaSSEL and GAPIT_MLM (Figure 5B), while markers 1_12063422, 2_22658605, and 7_8776168 overlapped between GAPIT_FarmCPU and SVEN (Figure 5C).The overlapped markers showed significant associations with the different traits as discussed below.Multilocus models have a high power to detect association provided (J.Wang & Zhang, 2021;S.-B. Wang et al., 2016).For most traits in this study, more than three significant markers were detected, but only the top three (smallest p-values) from the multilocus GAPIT_FarmCPU were picked for candidate gene analysis (Table 4).The description of the significant markers and candidate genes is presented in Table 4. Figures 6, 7, and Figure S2 are the manhattan and corresponding Q-Q plots showing the distribution of significant markers across the genome for traits associated with each leaflet.For Figure S2.1, except for the apex angle, the rest of the traits were significantly associated with at least two or more SNP markers for the left leaflet.In Figure S2.2, each trait for the right leaflet had more than three significant SNP markers.Similarly, in Figure S2.3, all the traits were associated with more than three significant SNP markers for the terminal leaflet.For the leaflet type (ovate or lobed) trait, there were more than three associated significant markers with a strong signal with marker 2_5935179 on chromosome 2.All the Q-Q plots showed slight variations from the theoretical normal distribution quantiles.Significant markers associated with leaflet type from the five randomly run analysis have significant SNPs mostly on chromosomes two, six, and eleven (Figure S3).Only the first random sampling had a significant SNP 3_11482658 on chromosome three.
Fifty-eight percent of the candidate genes ontology were classified as molecular function, 28% as a biological process, 0% for the cellular component, and 13% had no classification.Forty-three percent of the significant SNPs were found in exon genomic regions, while 32% were in the introns and 22% in intergenic regions.The candidate genes identified (Table 4) are involved in various biological functions such as adenosine 5′-triphosphate (ATP) hydrolysis, protein refolding, T A B L E 4 Description of identified significant marker-trait associations from the GAPIT_FarmCPU model and the candidate genes associated with them inferred from the mungbean genome Vradiata_var6 at National Center for Biotechnology Information (NCBI) Annotation release 101 (Sayers et al.,   response to heat, cofactors, transport, exocytosis, cell surface receptor signaling pathways, and regulation of transcription, RNA processing and methylation, kinase activity, and cell cycle.Of interest were those significant SNPs overlapping between traits and also among leaflets.Marker 1_12063422 found within the intron of LOC106771372/Vradi01g07560 described as chaperone protein ClpB3, chloroplastic is impor-tant for ATP hydrolysis, protein refolding, and response to heat.Marker 7_8776168 was not associated with any locus.Marker 7_35888986 within the exon of LOC106766374 described as (3S,6E)-nerolidol synthase 1 is involved in binding to a magnesium (Mg) ion.Marker 5_1309071 within the exon of LOC106761475/Vradi05g01240 is described as probable purine permease 4 is involved in purine nucleobase transmembrane transporter activity.Marker 4_9008678 was not associated with any locus.Marker 2_5935179 within the exon of LOC106775932/Vradi02g05730 described as protein indeterminate-domain 9 is important in metal binding.Marker 8_35030148 is found in the intron of an uncharacterized LOC106769596/Vradi08g15000.Marker 2_5972317 within the intron of LOC106756136 described as beta-galactosidase 8-like is involved in beta-galactosidase activity, carbohydratebinding, and metabolic processes.Marker 3_725105 within the exon of LOC106757286/Vradi03g00440 is described as LRR receptor-like serine/threonine-protein kinase ERL2 is involved in kinase activity.

DISCUSSION
In the present study, we characterized the phenotypic diversity, developed a linear model to predict leaf area, and conducted a genome-wide association mapping for mungbean leaflet traits within the IMD panel.The major phenotypes observed in the field include the ovate or lobed leaflets, varying leaf sizes (small, medium, and large), shallow or deep lobed angle, and green or purple vein coloration (Figure 2).These characteristics have been documented before and provided a useful resource to classify and characterize the IMD panel (Poehlman & Milton, 1991).In this panel, the oval, medium-size leaflet type, deep lobed angle, and green vein coloration are the predominant traits.The shallow lobed angle is a characteristic of the F1 progeny of the cross between ovate and lobed leaflet type accessions as studied using QTL mapping by Jiao et al. (2016), showing dominant expression of lobed angled over ovate and the low-level outcrossing in the panel as a characteristic of mungbean (H.Chen, Wang, et al., 2016).The left and right leaflets are mirrors of each other.However, the correlation of length (0.82-0.84) between leaflets was lower than the width (0.93-0.98) between leaflets.These results coincide with other studies that used the original Gielis equation (Shi et al., 2018) and simplified Gielis equation (Su et al., 2019), which showed that leaf width manifested less variation than the length.The relatively stable broad sense heritability (51%-70%) for all traits reflects a significant contribution of the line effect to the total variance.Line and loc_year components contributed more to the total phenotypic variation.In the case of length, line contributed less than loc_year toward the variance.loc_year explained low variations (1%-5%) in the apex and base angle traits.A significant amount of the phenotypic variation was compounded in the residuals, which could be due to the lack of replicates as only one sample was collected per block per location per year.
We developed a multiple regression model for the 458 ovate genotypes in our panel.The model LA = b 0 + b 1 L + b 2 W + b 3 L × W was the best at predicting the area of each leaflet with an adjusted R 2 of 0.97 and < = 1.10 residual standard errors (Figure 4A, LA is the Leaf Area, b 0 is the intercept, b 1 , b 2 , and b 3 are regression coefficients).Our study presents findings that researchers can confidently use as a nondestructive robust leaf area prediction model in mungbean in instances when they still need to use low-throughput tools like caliphers or rulers.In other disciplines such as pathology and entomology where destructive sampling of many samples is still needed to estimate the level of injury or disease incidence, the high-throughput nature of our methods will be time saving.Koyama and Smith (2022) even propose to scale a model to predict the total area of shoots in five species and diverse plant taxa.Leaf area has been used to predict shoot biomass in Arabidopsis (Weraduwage et al., 2015), an important parameter in the calculation of LAI, and a critical variable in processes such as transpiration, photosynthesis, and light interception (Bakhshandeh et al., 2011;El-Sharkawy et al., 1990;H. Fang et al., 2019).Simple or multiple linear and quadratic regression equations have been developed for use nondestructively in other crops such as sunflower (Helianthus annuus L., cv.Melody) (Rouphael et al., 2007), black gram (V.mungo L. Hepper) (Mishra et al., 2000), soybean (Pandey & Singh, 2011;Wiersma & Bailey, 1975), common bean (Pohlmann et al., 2021), horticultural crops (Khan et al., 2016), 15 vines (X.Yu et al., 2020), and maize (Stewart & Dwyer, 1999).Using 3125 leaves from 780 taxa, Schrader et al. (2021) developed a nondestructive method to estimate leaf size using a correction factor when only the length, width, and shape are available.In mungbeans, Hamid and Agata (1989) used the model y = b 0 + b 1 L × W to estimate the terminal leaflet area for each of the five varieties; however, they argued against using a universal model for all the leaflets, considering the significant variation from one genotype and location to another.However, over the past three decades, we have moved to an omics age where larger panels are common; therefore, there is a need for a robust model to predict leaflet area and avoid any further use of destructive methods in the plots (Y.Yang et al., 2021).Our results using >450 genotypes compared to their five genotypes came up with a similar observation that both L and W had a high correlation with the leaf area, which is expected.Therefore, this study complements similar work in other crops, while our image extraction methods can be scaled and used even for nondestructively collected images giving additional advantage to existing methods.
We used our imaging methods to conduct a comprehensive GWAS conducted on mungbean leaf traits, which is currently lacking in published works.Our study reports more than 50 significant SNPs (Figures 6 and 7, Figure S3 and Table 4) associated with the various leaflet traits.The novelty of our work is reporting SNP markers that overlapped between traits, leaflets, and different association mapping methods utilized for a comprehensive investigation.C. Fang et al. (2017) conducted a similar study on soybean leaflet area, width, length, and shape.Our study reflects the case of pleiotropy of genes and the polygenic nature of leaf traits (Sayama et al., 2017).As pointed out in the results, the major overlapping genes included Vradi01g07560, Vradi05g01240, Vradi02g05730, Vradi08g15000, and Vradi03g00440, which were associated with similar traits in different leaflets.The genes are involved in various plant processes such as response to heat, transport activity, metal binding, kinase activity, and metabolic process (Kosentka et al., 2017;Lin et al., 2017;Seddigh & Darabi, 2014).GWAS/QTL studies have been conducted in various crops for leaf traits such as leaf area, rachis length, and total dry weight in oil palm (Eleaeisguineensis) (Babu et al., 2019), shape, length, width, area, and specific leaf weight in soybean (Jeong et al., 2011;Jun et al., 2014;Sayama et al., 2017;L. Wang, Cheng, et al., 2019).More genetic studies on soybean leaf-associated traits can be found at https://www.soybase.org/.The significant marker 3_11482658 detected in the random sampling was ∼6,000,000 bases away to be further considered in this study.
Additionally, QTL mapping studies for different leaflet shapes have been previously conducted in several other legumes, including soybean (narrow vs. broad) (Jeong et al., 2011), cowpea (sub-globose vs. hastate) (Pottorff et al., 2012), and mungbean (ovate vs. lobed) (Jiao et al., 2016).Although FarmCPU did detect various signals with a strong association on chromosome 2 (Figure 6) for leaflet type, we believe this to be a spurious result.This was confirmed by the GLM logistic regression model, which did not detect any significant results (Figure 7).A previous QTL mapping study in mungbean by Jiao et al. (2016) narrowed down the classical lobed leaflets margin (lma) locus to chromosome three, which has a syntenic region on chromosome one in common bean.Within the lma locus, the gene Vradi03g04470, encoding an A20/AN1 zinc finger domain transcription factor, was the probable LMA gene.There could be two reasons for the two outcomes.First, the unbalanced case-control data cause a violation of the constant residual variance assumption with linear mixed models (J.Chen, Somta, et al., 2016;Dai et al., 2021).Out of the 458 genotypes used for association mapping, only four genotypes were lobed, while the rest were ovate.Second, it could be an indication of the weakness of using unified linear mixed models instead of logistic mixed models for binary traits unless certain conditions are met (H.Chen, Wang, et al., 2016;Dai et al., 2021;Hayeck et al., 2015;Jiang et al., 2015;Shenstone et al., 2018;J. Yang et al., 2014).The relationship between leaf-associated traits and yield, as previously emphasized, can be used effectively exploited by plant breeders to meet their objective of increasing yield by using them either as phenotypic markers or direct targets of selection (Baldocchi et al., 1985;Heath & Gregory, 1938;Board & Harville, 1992;Jeong et al., 2011;Jun & Kang, 2012;Jun et al., 2014;Ma et al., 2022;Sayama et al., 2017;L. Wang, Cheng, et al., 2019).

CONCLUSIONS
Mungbean leaves in the Iowa diversity panel exhibited a range of phenotypic variations during the growing seasons.
The linear regression model developed in this study can be used to nondestructively predict the area of the ovate leaflets.
Using GWAS, the genes LOC106771372/Vradi01g07560, LOC106761475/Vradi05g01240, LOC106775932/ Vradi02g05730, LOC106756136, and LOC106757286/ Vradi03g00440 are associated with multiple traits (length, width, perimeter, and area) across the leaflets (left, terminal, and right) would be suitable potential candidates for further investigation in their role in leaf development, growth, and function.This study also serves as a caution against employing unified linear mixed models for association mapping of binary variables or phenotypes as shown with leaflet type and instead favor logistic regression models.To employ the observed features discussed here as phenotypic or genotypic markers in marker-aided selection techniques Description of the leaf traits used in the study Trait Description Unit Length The distance from proximal-most to distal-most point of the lamina cm Width The distance across the laminar that lies perpendicular to the axis of greatest length the apical termination of the midvein to the pair of points where a line perpendicular to the midvein and 0.75 × length from the base intersects the margin Degrees ( o ) Base angle The angle from the leaflet base to the pair of points where a line perpendicular to the midvein and 0.25 × length from the base intersects the margin Degrees ( o )Note.Traits were prefixed with a letter indicating the leaflet, that is, l_(left), t_(terminal), r_(right).

F I G U R E 1
Representation of how traits were measured for each leaflet.The red line represents the length (L), and the blue line represents the width (W).The blue and orange lines are perpendicular to the redline.The perimeter is not shown in the diagram.

F
I G U R E 2 Phenotypic traits observed within the mungbean diversity panel as A (leaflet type), B (leaflet size), C (lobed angle), and D (vein coloration)

F
Regression models to predict individual leaflet area (A) and the trifoliate area (B) using both simple and multiple explanatory variables.The appended numerals/text to the extreme right of each bar shows the residual standard errors (A and B) and leaflet (in B).

F
Venn diagram showing the number of significant single nucleotide polymorphism (SNP) markers associated with all traits grouped by the statistical method and package use.Both single and multi-locus methods used are shown in (A).(B and C) Markers associated with single and multi-locus models F I G U R E 6 Manhattan and corresponding quantile-quantile (Q-Q) plots showing results of GAPIT_FarmCPU linear mixed model (LMM) genome-wide association studies (GWAS) for traits associated with the leaflet type (oval or lobed).The blue line indicates the Bonferroni genome-wide correction set with "cut_off/∝" = 0.05.

F
Manhattan and corresponding quantile-quantile (Q-Q) plot of showing results of the logistic regression general linear model (GLM) for leaflet type implemented in GWAS Tools.The blue line indicates the Bonferroni genome-wide correction set with "cut_off/∝" = 0.05.
Summary statistics, coefficients of variation (CV), and broad-sense heritability (H2.cullis) of the leaflet traits Variance components of the model used in the analysis T A B L E 2Abbreviations: CV, coefficient of variation; Max, maximum; Min, minimum.T A B L E 3