Genome‐wide association study for yield and quality of granulated cassava processed product

The starchy storage roots of cassava are commonly processed into a variety of products, including cassava granulated processed products (gari). The commercial value of cassava roots depends on the yield and quality of processed products, directly influencing the acceptance of new varieties by farmers, processors, and consumers. This study aims to estimate genetic advance through phenotypic selection and identify genomic regions associated and candidate genes linked with gari yield and quality. Higher single nucleotide polymorphism (SNP)‐based heritability estimates compared to broad‐sense heritability estimates were observed for most traits highlighting the influence of genetic factors on observed variation. Using genome‐wide association analysis of 188 clones, genotyped using 53,150 genome‐wide SNPs, nine SNPs located on seven chromosomes were significantly associated with peel loss, gari yield, color parameters for gari and eba, bulk density, swelling index, and textural properties of eba. Future research will focus on validating and understanding the functions of identified genes and their influence on gari yield and quality traits.


INTRODUCTION
Over half a billion people in Africa, Asia, and Latin America depend on cassava (Manihot esculenta Crantz), a clonally propagated root crop for dietary calorie sources (Montagnac et al., 2009).Its remarkable resilience to climate change and poor soil conditions has made it a valuable crop for farmers, processors, and consumers (Burns et al., 2010).It is a versatile crop with the potential for food, industrial (starch and biofuel), and export (chips) applications.Africa primarily produces cassava for human consumption, accounting for over 90% of the total production, whereas in Asia and South America, human consumption takes about 50% and 43%, respectively (Nweke, 2004).Cassava roots, being highly perishable and containing cyanogenic glucosides, are commonly processed into various granulated and paste products for detoxification, increase in shelf-life, and value addition.By far, the two processed cassava products account for the largest proportion of cassava food consumption in Africa, absorbing more than 90% of the produced roots (Nweke, 2004;Phillips et al., 2004).These products not only play a crucial role in food security but also contribute significantly to local economies by providing employment in both agriculture and food processing industries, thereby offering sustainable income opportunities for smallholder farmers.
The granulated cassava product also known as gari in West Africa and farinha in Brazil is made through grating, fermenting, dewatering, and toasting of semi-dried mash on dry heat (Udoro et al., 2014).Gari can be eaten dry or soaked in water or milk (akin to breakfast cereals) or made into eba, a West African stiff dough enjoyed with soup (Oduro et al., 2000).
Under optimized processing conditions, genetic variations in gari yield, quantified as the amount of gari produced per hectare (t/ha) or as a percentage (%) relative to the initial weight of the storage roots, can be observed among different varieties (Almazan, 1992;Bassey, 2018;Ibe & Ezedinma, 1981).The yield of gari from a particular cassava variety has significant economic implications, given the fixed costs involved in processing, such as labor, time, energy, and transportation (Ibekwe et al., 2012;Nweke, 2004;Wossen et al., 2017).Besides gari yield, other quality parameters of the processed products are also important and many of these are determined both by varietal (genetic) differences and processing methods (Ngoualem Kégah & Ndjouenkeu, 2023).
Cassava gari quality is one of the most important factors that processors and consumers consider when choosing a variety.Some determinants of gari quality include the color and functional properties such as bulk density and swelling index (Ndjouenkeu et al., 2021;Ngoualem Kégah & Ndjouenkeu, 2023;Teeken et al., 2021).Bulk density is an indicator of the weight of gari and can affect its packaging and distribution.The swelling index is used to evaluate the expansion of gari particles in the presence of cold water.The swelling of gari influences its texture, taste, ease of preparation, digestibility, satiation, and overall consumer satisfaction (Oluwamukomi & Lawal, 2020).Two sensory attributes, including hardness and gumminess measured using the compression test, are important determinants of gari acceptance by consumers (Teeken et al., 2021).Hardness describes the softness or firmness of a product, while gumminess is a secondary parameter of cohesiveness or stickiness (Olaosebikan et al., 2023).
With cassava steadily gaining popularity worldwide and across Africa and a shift toward a demand-led breeding approach, breeding programs are increasingly focusing on product quality traits in addition to yield.This strategic shift is essential to ensure the adoption of new varieties that are not only high-yielding, pest-and disease-resistant but also meet the specific quality demands of consumers and processors (Bechoff et al., 2018;Ndjouenkeu et al., 2021;Teeken et al., 2021;Wossen et al., 2017).Traditionally, breeders have faced significant challenges in addressing quality traits in cassava due to several factors.The large population sizes and the limited number of roots available in early breeding stages hinder effective phenotypic selection (Ceballos et al., 2015).
Furthermore, the complexity of converting fresh roots to gari and the heterogeneity in processing methods complicate the assessment of quality traits (Abass et al., 2013).These challenges are compounded by the inherently long breeding cycle of cassava, which further delays the realization of improvements in product quality through conventional breeding methods.Despite these challenges, the potential benefits of advanced molecular breeding tools are substantial.These tools offer the possibility to significantly accelerate the development of new cassava varieties that meet specific quality parameters, thereby overcoming traditional limitations and enhancing breeding efficiency.
Among cutting-edge tools, genome-wide association studies (GWASs), employing high-density genetic markers, have proven to be a powerful approach for uncovering genomic regions underlying variation in traits of interest (Wallace et al., 2014).The molecular markers identified through GWAS are instrumental in facilitating marker-assisted selection, significantly enhancing breeding efficiencies.For cassava, these markers are especially valuable as they allow breeders to make selections during the early testing stages, where a large number of entries are typically evaluated.This approach not only increases selection intensity but also shortens the breeding cycle and reduces the need to manage and process large numbers of plots, thereby significantly cutting costs.
Several studies have successfully utilized GWAS to uncover quantitative trait loci (QTLs) associated with a variety of traits in cassava, including resistance to cassava brown streak disease (Kayondo et al., 2018;Nandudu et al., 2023), cassava root mealiness (Uchendu et al., 2021), cassava green mite pest (Ezenwaka et al., 2018), and cassava root rot (Hohenfeld et al., 2022), as well as dry matter content, provitamin A carotenoid content (Rabbi et al., 2017), and starch pasting properties (Phumichai et al., 2022).These studies highlight the power of GWAS in providing valuable genetic insights for developing cassava varieties with superior agronomic and nutritional traits.QTL discovery and candidate gene identification studies have been conducted for end-use quality traits in other crops like wheat (Triticum aestivum) and rice (Oryza sativa).Specific genomic regions associated with flour yield and dough characteristics have been identified in wheat (Chang et al., 2022;Ji et al., 2021;Kristensen et al., 2018Kristensen et al., , 2019)).In durum wheat, multiple chromosomes have been linked to semolina dough color, yield, and firmness (Johnson et al., 2019).In rice, Misra et al. (2018) identified multiple major and minor quantitative trait nucleotides (QTN) linked to traits like hardness, adhesiveness, and springiness.
The objective of this study was to identify specific genomic regions and candidate genes associated with gari yield and quality in cassava, using GWASs and estimate the genetic progress achievable through phenotypic selection.

Core Ideas
• This study is the first comprehensive genome-wide association study (GWAS) to uncover the genetic factors influencing gari quality in cassava.• GWAS identified nine significant single nucleotide polymorphisms (SNPs) closely linked to variations in the measured traits.• Five core candidate genes and hypothetical proteins were identified, co-localized with significant SNPs.• Phosphoglucosamine mutase family protein and the pectin lyase-like superfamily protein demonstrated associations with swelling index.These findings underscore the potential of molecular breeding to enhance gari yield and quality.

Plant material and field trials
An association mapping panel of 200 cassava clones sourced from the International Institute of Tropical Agriculture (IITA), Oyo, Nigeria, was used in this study.The selection of genotypes was based on their fresh root weight in previous seasons as well as their pedigree and single nucleotide polymorphism (SNP) relationships (https://cassavabase.org/ breeders/trial/6630).The genotypes were carefully selected from cycle 4 breeding of the Next Generation Cassava Breeding Project (www.nextgencassava.org)and comprised 119 distinct families derived from biparental crosses involving 63 female and 58 male parents (Table S1).
The collection was established in Ikenne in the first planting season (2020/2021) and Ikenne and Ibadan in the second season (2021/2022).Ikenne is a humid forest located at 6˚84′N and 3˚69′E, while Ibadan is a derived savanna located at 7˚49′N and 3˚90′E.The trials were established in October of each year and harvested in October of the following year.The field trials were planted in an augmented design, with each block comprising three replicated standard checks.In each block, a total of 25 genotypes were included, with the three standard checks randomly distributed within the block.The selection of the three common standard checks, namely, TMEB 419, TMS-IBA000070, and TMS30572, was based on their stability and consistent product yield (Aghogho et al., 2022).Each plot had dimensions of 4 m by 5 m and consisted of 20 plants established in four rows, with a spacing of 1 m between rows and 0.8 meters within rows.Healthy stem cuttings measuring 20-25 cm in length were used as planting material.Regular manual weeding was performed throughout the experimental period to maintain a clean field.

Gari processing
Ten kilograms of cassava roots harvested 12 months after planting were processed into gari at the IITA research facility, following the established standard operating procedure (Abass et al., 2013).The roots were peeled, washed, drained, and mechanically grated (Dandrea Agriport Grater, Maquinas d' Andrea).The resulting grated mash was then placed into jute bags and allowed to ferment naturally at room temperature for 72 h.The fermented mash was subsequently dewatered using a hydraulic press for 24 h and the resulting pressed cake was pulverized and sieved to eliminate any fibrous materials.The semi-dry product was subsequently toasted into gari in a stainless-steel pan at a temperature of 120.5˚C until it reached a moisture content ranging between 8% and 10%.Once the gari had cooled down, it was carefully stored in a cool dry place in airtight containers until further quality analysis.Strict monitoring was carried out at each processing step to minimize operator variation and reduce sample batch effects.

Eba preparation
Eba, the stiff dough, was prepared in duplicate from 100 g of gari samples according to the approved standard operating procedure of the RTB foods (https://doi.org/10.18167/agritrop/00604).Gari was gradually added into 250 mL of hot water in a plastic bowl, which was covered and allowed to stand for 60 s to enable gelatinization.After this initial step, the gari-water mixture was transferred to a 30 L planetary dough mixer set at 142 rpm and mixed for another 60 s to form a stiff dough.The freshly prepared eba was immediately scooped into aluminum foil, wrapped tightly to prevent moisture loss, and placed in a styrofoam box to maintain warmth until quality assessment.The temperature of the boiling water and the dough was consistently monitored using a thermometer to ensure accuracy in replication.

Product quality trait phenotyping
The association panel was phenotyped for selected traits based on their influence on either productivity or consumer acceptability.The productivity-related traits included peel loss and yield of gari (expressed as a percentage), which are critical for evaluating the efficiency and output of the production process.Consumer-oriented traits were color parameters for both gari and eba, bulk density of gari, swelling index of gari, and textural properties of eba such as hardness and gumminess.
The specific methodologies and measurement units for each trait are detailed in Table 1.

Diversity Arrays Technology sequencing (DArTseq) genotyping and genotypes filtering
Leaf samples were collected in 96-well boxes and freezedried to preserve DNA integrity.DNA extraction, genotyping, and SNP calling for each sample were performed using the DArTseq genotyping platform (https://www.diversityarrays.com/technology-and-resources/dartreseq/).Twelve of the 200 genotypes with more than 20% of their SNP marker data missing were excluded from the association analysis.SNPs with a missing call rate above 50% and a minor allele frequency (MAF) below 5% in the remaining genotypes were also removed.This resulted in a final dataset of 53,150 markers, which were used for GWAS, population structure, and genetic kinship analyses.SNP marker quality control was conducted using the Trait Analysis by Association, Evolution, and Linkage (TASSEL) software (Bradbury et al., 2007).

2.6
Statistical analysis

Phenotype data analysis
Summary statistics for phenotypic data were done using the function stat.desc from the pastesc package and visualized using violin-imposed boxplot in the R software.
A linear mixed model was used to obtain the best linear unbiased predictions (BLUPs) for each genotype.The lme4 package in the R software was used to fit the model (Bates, 2007) following the below statistical representation: , where y is the response vector of a trait for a given location, μ is the grand mean, check  are the checks present in each block, env  is the environment effect, Block  (env  ) is the block within environment effect, gen  is the genotypic effect, env  ×   is the genotype by environment effect, and   is the residual.Variance was partitioned and used to estimate broad sense and SNP-based heritability.
Association and correlation analysis was performed using the obtained deregressed BLUPs values as phenotypes.Deregressed BLUPs were estimated from the BLUPs obtained from the above model using the formula suggested by Garrick T A B L E 1 Trait, class, and phenotyping method of the 12 traits assessed in the present study.

Class Phenotyping method
Gari (%) Processing traits Gari (%) was calculated as a percentage of the initial weight of starting roots (Aghogho et al., 2022).

Peel loss (%) Processing traits
Peel loss (%) was calculated as a percentage of the initial weight of starting roots (Aghogho et al., 2022).

L* gari Color parameters
A chroma meter (CR-400 Konica Minolta) was used to measure the Commission on Illumination (CIE) tristimulus L*(white/black), a*(red/green), and b* (yellow/blue) parameters.Gari and eba samples for color measurement were prepared as described by Aghogho et al. (2023).A 250-mL graduated measuring cylinder was filled with gari samples and gently tapped on the laboratory bench about 50 times until they were leveled to the 250 mL mark (Aghogho et al., 2023).
Swelling index (SI) 10 g of gari samples was placed in a 100-mL measuring cylinder and the initial volume was measured.Cold distilled water was added to the 100 mL mark of the cylinder and allowed to sit for 4 h before observing the final volume of gari.SI was calculated to be a multiple of the original volume (Aghogho et al., 2023).
Eba gumminess (%) et al. ( 2009) and given as follows: where PEV is the predicted error variance of the BLUP and  2  is the genotypic variance component.Correlation analysis was carried out using the cor package and visualized using ggcorrplot in core R version 4.1.1(R Core Team, 2020)

Heritability estimates
Broad-sense heritability was estimated as suggested by Piepho and Möhring (2007) and given as follows: where  2 is the broad-sense heritability estimate across all environments,  2  is genotypic variance,  2  is the variance of genotype by environment interaction,  2  is the residual variance, e is the number of environments, and r is the number of replications.
SNP-based heritability was estimated as suggested by Yang et al. (2017) and given as follows: where  2 (SNP) is the SNP-derived genetic variance and  2  is the SNP-derived error variance.
SNP-based genetic variance and SNP-derived error variance were extracted using a compressed linear mixed model in TASSEL.

Genetic and phenotypic variation estimates
The phenotypic and genotypic coefficients of variation as well as genetic advance (GA) and genetic advance as percentage of mean (GAM) were estimated as per the formula prescribed by Burton and Devane (1953).
Phenotypic coefficient of variance Genetic advance as a percentage of mean where  2  is the phenotypic variance,  2  is the genotypic variance,  is the mean, GA is the expected GA, K is the standardized selection differential at 5% selection intensity (K = 2.063), [ 2  ] 1∕2 is the phenotypic standard deviation on a mean basis, and ℎ 2 SNP is the SNP-based heritability.

Population structure analysis
Linkage disequilibrium (LD) pruning was conducted using Plink 1.9 to select 16,210 SNP markers with a pairwise LD value of <0.5 within each 50-SNP window size.These LDpruned SNPs were subsequently used for population structure analysis, including hierarchical clustering and ADMIXTURE analysis.ADMIXTURE uses a model-based clustering algorithm to assign proportions of ancestry to each individual based on their genotype data.For this study, a range of subpopulation counts (K) from 2 to 15 was tested to identify distinct ancestries.A 10-fold cross-validation procedure was employed to ensure the accuracy of the ADMIXTURE analysis results.

Genome-wide association analyses
For each trait, GWAS was conducted using both single-locus and multi-locus models to identify significant SNP associations.The single-locus model utilized was the compressed mixed linear model (CMLM), as recommended by Zhang et al. (2010).This model helps mitigate false positives by incorporating population structure (Q) and kinship (K) matrices into its corrections.Conversely, the multi-locus model employed was the Bayesian information and linkage disequilibrium iteratively nested keyway (BLINK), as suggested by Huang et al. (2019), which addresses both false positives and false negatives by incorporating pseudo-QTN as covariates.
Both models were implemented using the Genome Association and Prediction Integrated Tool version 3 R package (J.Wang & Zhang, 2021).False discovery rate and marker allele frequency thresholds were set at 0.05.Significant SNPs were initially identified using the Bonferroni correction p-value threshold of −log10(p) = 6.00.However, for traits where this threshold proved too stringent, a more relaxed threshold of −log10(p) = 4.00 was applied based on the insights from quantile-quantile (QQ) plots and the distribution of p-values, as discussed in the literature (Ji et al., 2021;Kristensen et al., 2019;Lou et al., 2021).
Manhattan plots of top SNPs from the CMLM and BLINK results were visualized using the ggplot2 package in R (Wickham, 2016).QQ plots were used to visualize the results of GWAS using the CMplot package in R software (Yin, 2020).

Linear regression and dependence of peak SNPs
We assessed the linear dependence of peak SNPs by fitting a linear regression model with peak SNPs at the identified loci as independent variables and the traits' BLUPS as the response variables.We used the olsrr package (Hebbali & Hebbali, 2017) to perform collinearity diagnostics on the output from the linear regression model.SNPs with variance inflation factors (VIFs) above 50 and infinity (inf) were dropped until only SNPs with VIF <5 were left in the model.The R 2 and non-collinear peak SNPs were visualized using a bar plot from ggplot2.

Candidate gene identification
Candidate gene identification analysis was performed using non-correlated significant SNPs selected from the GWAS results.The significant SNPs were mapped to their corresponding genes or to surrounding genes using the genome annotation available as a general feature format (gff3) file, M. esculenta_305_v6.1.gene.gff3, of the cassava reference genome v6 .1 available in Phytozome v12.1 (https:// phytozome-next.jgi.doe.gov/).The gff3 file was converted to a browser extensible data (bed) file using the gff2bed function from the bedops program (Neph et al., 2012).The closes-tBed function from bedtools (Quinlan & Hall, 2010) was then used to identify genes in the LD decay regions and then narrowed down to genes closest to significant SNPs.Gene ontology annotation was carried out on the plant Ensembl website (https://plants.ensembl.org/index.html).The National Center for Biotechnology Information genome data viewer using Manihot esculenta v6.1 (GCF_001659605.1)was used to confirm genes not found in the Ensembl plant.

Phenotypic variation and relationships among yield and quality of gari-related traits
Comprehensive statistical information on 12 essential traits related to gari yield and the quality of gari and eba are reported in Figure 1 and Table S2.The data showed significant genetic and environmental variation for all traits except for a* gari.Genotype by environment interaction was only significant The Plant Genome F I G U R E 1 Summary statistics for yield and quality of gari-related traits.CV, coefficient of variation; SE, standard error; BLUEs, best linear unbiased estimates.in a couple of traits (L* gari, bulk density, swelling index, and eba gumminess).Across all measured traits, phenotypic variations exceeding two-fold differences between maximum and minimum values were observed, with an average coefficient of variation (CV) of 25%.The color parameter for eba (a*) exhibited a significantly higher CV (67%).These differences observed between the maximum and minimum values are an indication of broad phenotypic variability within the association panel.

Estimates of variance components for yield and quality of gari-related traits
To assess the impact of additive and nonadditive genetic effects on the observed phenotypic values, we computed estimates for broad-sense and genomic heritability estimates from variance components.Estimates of these variance components for each of the studied traits are presented in Table 2. SNP-based heritability ranged from 21% (peel loss) to 70% (b* gari and b* eba).Traits such as a* eba, b* eba, L* eba, swelling index, eba hardness, and eba gumminess exhibited higher SNP-heritability estimates (>0.5).Phenotype-based broad-sense heritability ranged from 6% (L* gari) to 63% (b* gari and peel loss).Traits such as gari yield (%), peel loss, L* gari, a* gari, b* gari, and bulk density showed lower heritability estimates (<0.4).We observed that SNP-based heritability was generally higher than phenotype-based broad-sense heritability.The phenotypic coefficient of variation (PCV) for all traits was relatively high, ranging from 4.23% to 58.91%, indicating substantial variation and, consequently, potential for improvement through selection.The genetic coefficient of variation (GCV) values were moderate, ranging from 0.52% to 16.86%.These results suggest considerable potential for phenotypic improvement across all traits through targeted selection.

3.3
Correlation analysis for yield and quality of gari and eba-related traits Significant variations were observed in the phenotypic correlations for gari yield, ranging from a negative correlation of −0.55 with eba hardness and swelling index to a positive correlation of 0.64 with bulk density (Figure 2).Notably, peel loss exhibited low, negative correlations with gari L* (−0.24, p < 0.001) and b* eba (−0.26, p < 0.001).Eba hardness correlated positively with eba a* (0.27, p < 0.01), a* gari F I G U R E 2 Correlation coefficient using deregressed best linear unbiased predictions (BLUPs) for 12 traits.

Genotypic relationships and informativeness in SNP data
Cluster analysis using kinship matrix (Figure 3a) as well as principal component analysis (Figure 3b) was used to assess the genetic relationship among the accessions in the GWAS panel.PC1 and 2 explained about 15% of the total genetic variation, suggesting broad genetic diversity.In terms of their phenotypic characteristics, the genotypes exhibited a variation in root dry matter content and fresh root weight, ranging from 21.02% to 44.8% and 10 to 38 kg, respectively.

Distribution of SNPs, LD and population structure
Genome-wide SNP distributions on 18 chromosomes of cassava are shown in Figure 4a.SNPs coverage per chromosome varied from 1623 on chromosome 7 to 7311 on chromosome 1, indicating considerable differences in SNP density across chromosomes.Figure 4b shows the decay of genomewide LD for 188 cassava genotypes.The LD decay across the whole genome had a maximum LD (r 2 ) value of 0. 46 and decreased to 0.23 at 10 kb, suggesting rapid genetic recombination and potential for higher resolution mapping (Figure 4b).The population structure of the 188 cassava genotypes based on 16,210 SNPs is presented in Figure 4e.The population stratification inferred by using the model-based ADMIXTURE clustering method indicated the presence of seven subpopulations (Figure 4e).

Genome-wide association results
Only traits with broad-sense heritability estimates above 0.30 were analyzed in the GWAS (Table 2).This resulted in eight traits related to gari yield and quality of gari and eba being considered for association analysis: L* eba, b* eba, bulk density, b* gari, eba hardness, peel loss, and gari yield (%).A combined total of 58 unique genomic regions associated with these traits were identified using CMLM and BLINK models, with nine and eight unique regions found exclusively by each model, respectively, and 41 regions identified by both (Figure 5).The most significant trait-marker associations, including specific genomic regions, minor allele frequencies, p-values, and the proportion of phenotypic variance explained by the SNPs, are provided in Table S3.In the following sections, we present the results for each trait.
The GWAS for color-related traits in gari and eba revealed several significant associations across multiple chromosomes.For b* gari, which measures the yellowness of gari, eight loci were identified on chromosomes 1, 11, and 15.The locus on chromosome 11, tagged by S11_23058961 (p-value of 4.65E-09) with an MAF of 0.47, explained 8.76% of phenotypic variance.In contrast, L* eba, assessing the brightness of eba, was linked to a significant locus on chromosome 11 at S11_23133377 (p-value = 3.21E-05), which had an MAF of 0.19 and accounted for 8.0% of the variance.Additionally, the yellowness of eba, denoted as b* eba, was associated with a locus on chromosome 15.The marker, S15_7457511 (pvalue = 1.53E-079), with an MAF of 0.470, had a %PVE of 8.85%.
Several loci were found to be associated with gari and eba physicochemical traits.For the swelling index, a trait indicative of gari's ability to absorb water and swell, we found significant association signals on chromosomes 10 and 15.The most significant locus on chromosome 15 was tagged by marker S15_3653278 (p-value = 1.13E-05), which has an MAF of 0.44.The bulk density of gari, which relates to its compactness and heaviness, was associated with five SNP markers on chromosomes 1 and 6.Four of these markers were located on chromosome 1, with the most prominent being S1_28607143, characterized by a p-value of 2.70E-11, with an MAF of 0.34 and a %PVE of 12.1%.Conversely, eba hardness, which affects the firmness and mouthfeel of the prepared food product, did not reveal any significant associations with the markers analyzed in this study.The absence of notable loci implies that eba hardness may be controlled by complex genetic factors that are not easily captured by the current SNP array or may be more influenced by environmental factors and postharvest processing methods.

Linear regression and dependence of peak SNPs
To assess the strength of the association of Peak SNPs with studied traits, we performed a multiple linear model using F I G U R E Multiple barplot across traits and non-collinearity peak single nucleotide polymorphisms (SNPs).Numbers in the barplot denote the number of loci in the regression model with the R 2 in the bracket.
peak SNPs at the identified loci (Figure 5) as independent variables against the traits BLUPs as the response variables.The R 2 values for the peak SNPs at the identified loci ranged from 0.05% (b* eba) to 0.24% (swelling index) (Figure 6).The markers for L* eba, bulk density, and peel loss had the lowest predictive ability (R 2 < 0.1).The result of the multicollinearity analysis revealed that among the 39 peak SNPs observed for swelling index, two were found to be non-correlated (Figure 6; Table S4).Among the five peak SNPs associated with bulk density, two were found to be uncorrelated.

Putative candidate gene identification
Non-correlated significant SNPs were further examined to identify annotated genes with putative functions using the M. esculenta 305_v6.1.gene.gff3file of the cassava reference genome available on Phytozome v12.1.The number of genes identified using the LD decay regions varies across traits, ranging from 117 for the b*gari trait to 268 for traits such as gari percentage and bulk density of gari (Table S5).Further, we then narrowed down the search to genes close to the significant SNPs (Table 3).The candidate gene Manes.01G189200,coding for ubiquitin-conjugating enzyme E2 protein, was found closest to the significant SNP (SI_28607159) identified for gari yield (%) and bulk density of gari.An uncharacterized protein was found to be closest to the significant SNP (S5_6824266) identified for peel loss.Gene identification for color-related traits for gari and eba revealed that candidate gene Manes.11G121400encodes a leucine-rich repeat (LRR) protein kinase family protein and was found as the closest gene to S11_23058961.For eba brightness (L* eba) and the candidate gene Manes.11G122000,encoding for the B-box (BBX) zinc finger family protein was found closest to S11_23133377.
Two putative candidate genes were found to be closest to S10_4428114 and S15_3653278 SNPs identified to be associated with swelling encoding for proteins belonging to the phosphoglucosamine mutase (PGM) family (Manes.10G046600)and pectin lyase-like (PEL) superfamily (Manes.15G048700),respectively.

DISCUSSION
The genetic architecture of gari yield and quality is essential for varietal development and acceptability.Despite advances in studying cassava root quality traits, the genetic blueprint for its processed products remains unexplored.This study aimed to estimate the GA and identify genomic regions and candidate genes associated with cassava gari yield (%) and quality.While significant phenotypic variation was observed for all traits, L* gari and a* gari showed minimal variation.This is because L* and a* measurements capture the T A B L E 3 Putative candidate genes-closest to significant single nucleotide polymorphisms (SNPs) associated with gari yield and quality.red-green (a*) and lightness-darkness (L*) spectrums of color.However, these spectrums alone might not fully represent the complexities of color variation observed in gari.This was evident by the more than two-fold difference between the minimum and maximum values, and the significant genetic and environmental variation.The range and mean values observed for the traits under study were consistent with what was reported for gari produced from frozen roots (Oyeyinka et al., 2019).The similarity in results may be due to the same range of genetic diversity in the genetic materials or phenotyping methods used in both studies.However, the significant genetic variance offers a valuable resource for genetic improvement for these traits.The estimation of SNP-based heritability is a valuable tool for quantifying the contribution of additive genetic variation to phenotypic variance.This is because it considers individual genetic variants' contribution to trait expression (Yang et al., 2017).We observed a higher SNP-based than broad-sense heritability for all traits measured except for gari yield, peel loss, and bulk density.This difference may be due to the influence of environmental and genotype by environment factors on phenotypic broadsense heritability (Nyquist & Baker, 1991) or other factors such as population stratification, LD, sample size, and genotyping data quality on SNP-based heritability (Zhu & Zhou, 2020).Both phenotypic broad-sense and SNP-based heritability estimates were moderate to high, indicating the significant influence of additive genetics on these traits.This suggests promising prospects for genetic improvement and highlights the repeatability and reproducibility of the experimental procedures.All traits exhibited PCV greater than GCV, indicating significant influence of genetics and environment on traits.This trend is consistent with other studies involving cassava yield traits such as dry matter content and starch content (Peprah et al., 2020).The GAM values obtained in this study are within the range of values reported for other cassava studies (Adjebeng-Danquah et al., 2017;Akinwale et al., 2010).Despite the lower SNP-based heritability estimate of eba gumminess (0.47) compared to b* eba (0.70), we observed similar GAM.This suggests that while eba gumminess likely has higher nonadditive variance and environmental influence, genetic progress remains achievable for both traits in each selection cycle (Pradeepkumar et al., 2001).Traits with high estimates of genetic improvement potential (PCV, GCV, GA, and GAM) offer opportunities for genetic improvement.This study revealed moderate values for these measures, suggesting potential for genetic improvement in gari yield and quality traits.

Traits
Trait correlation plays a pivotal role in improving the accuracy of selection for complex traits via indirect selection.However, negative correlations between two desirable traits could potentially hinder simultaneous genetic progress.Significant correlations among traits may arise from genetic linkage and/or pleiotropic effects of different genes.This study revealed a stronger positive correlation between eba hardness and both bulk density and swelling index compared to what was reported by Awoyale et al. (2022).The difference could be attributed to variations in population size and location between the two studies.The findings from our study were consistent with the findings of Almazan (1992) for the negative correlation between swelling index and gari yield (%).The negative correlation between eba hardness and bulk density presents a significant challenge for breeders aiming to simultaneously improve both traits.Advanced breeding methods such as marker-assisted selection and genomic selection will be necessary to identify genotypes with favorable combinations of eba hardness and bulk density.Also, multi-trait selection strategies can aid breeders in navigating this negative correlation and achieving simultaneous improvement in both traits (Gamal El-Dien et al., 2015;Merrick & Carter, 2021).
For a GWAS analysis, understanding the relationship among genotypes and assessing the informativeness of SNPs are crucial steps for identifying genetic variants associated with phenotypic traits.Previous studies have identified a correlation between root dry matter content and the yield and quality of gari (Almazan, 1992;Laya et al., 2018).Therefore, a population exhibiting the range of dry matter content reported in this study offer a platform for conducting gene discovery analysis.Also, the DArTseq genotyping platform generates a moderate to high density of markers, making it suitable for GWAS in cassava.The availability of numerous markers in the platform allows for the detection of genetic variations associated with complex traits.Compared to other genotyping platforms such as SNP arrays and whole-genome sequencing, DArTseq offers a good balance between marker selection, cost, and marker density, making it more feasible for large-scale association studies in cassava (Davey et al., 2011).DarTseq in association analysis for cassava has been reported for several marker discoveries studies in traits such as root tuber quality traits (Rabbi et al., 2017), flowering-related traits (Baguma et al., 2024), and nitrogen use efficiency in cassava (Mbe et al., 2024).
Population structure and cryptic genetic relatedness can lead to misleading associations in GWAS, even with advanced statistical methods (Hellwege et al., 2017).To ensure accurate analysis, this study adopted a combination of methods to assess population structure and minimize spurious associations.This included the use of a 15-fold cross-validation ratio, estimating admixture ancestry, and performing hierarchical clustering.The analysis revealed 188 genotypes clustered into seven distinct subpopulations.Despite this structure, genotypes with low to high dry matter content were distributed across all subpopulations, suggesting that the population is suitable for association analysis.
GWAS has identified genes associated with traits like yield, disease resistance, and starch content in cassava (Zargar et al., 2015) and quality traits in cooked rice (Misra et al., 2018).Taking examples from previous GWAS works, this study focuses on putative candidate genes for product quality traits in cassava.The ubiquitin-conjugating enzymes (E2s) found closest to significant SNPs related to gari (%) and bulk density are deeply involved in hormone signaling transduction (Santner & Estelle, 2010) and responses to biotic and abiotic stresses (Becker et al., 1993;Lee & Kim, 2011).This suggests a potential link to stress resilience which could indirectly influence yield and bulk density.In cassava, specific subfamilies of the ubiquiting-conjugating (UBC) superfamily (UBC35 subfamily) have been associated with root development (Salcedo et al., 2014), a major factor in determining gari yield (%) and bulk density.In Arabidopsis, the UBC34 subfamily regulates the SUC2 transporter, influencing sucrose transport and potentially carbohydrate metabolism (Liu et al., 2020).In yeast, the UBC8 subfamily is critical for the degradation of fructose 1,6-bisphosphatase 1 (Fbp1), a key element in carbohydrate metabolism (Schüle et al., 2000;Z. Wang et al., 2022).Given the link between carbohydrate metabolism, sucrose transport, and the accumulation of dry matter in cassava roots (Beyene et al., 2020), the implications of this gene in gari production are even more significant.This aligns with previous research by Teeken et al. (2021), which suggested a potential correlation between dry matter content and gari yield (%).
The LRR protein kinase, associated with variations in b* gari as described by Jones and Jones (1997), encompasses receptor serine-threonine kinases and polygalacturonaseinhibiting proteins.Previous research has drawn a connection between LRR protein kinase family protein expression and flour color in wheat (Ji et al., 2021).In melon fruit, the LRR protein has shown substantial correlations with the orange gene (CmOr), regulating the accumulation of β-carotene (Chayut et al., 2021).The interaction between the orange gene and the PSY1 gene in cassava has a role in carotenoid accumulation (Jaramillo et al., 2022), but the relationship between the LRR protein kinase, orange gene, and PSY1 gene in cassava's carotenoid regulation remains unexplained.
The BBX family protein, which was found close to the significant SNPs associated with the brightness of eba (L*eba), comprises zinc-finger transcription factors containing one or two highly conserved BBX motifs at their N-termini (Fang et al., 2019).BBX proteins are key players in plant developmental and growth processes like seedling light response, shade avoidance, flowering time regulation, stress resilience, fruit pigmentation, and carotenoid accumulation (Kim et al., 2019;Soitamo et al., 2008;Wu, 2011;Xiong et al., 2019).In tomatoes (Solanum lycopersicum) and cucumber (Cucumis sativus), it functions by binding to the promoter region of phytoene synthase 1 (PSY1), leading to elevated carotenoid accumulation within the chloroplast (Cao et al., 2023;Obel et al., 2022;Xiong et al., 2019).Further research is needed The Plant Genome to understand the regulatory mechanisms of BBX and its interactions with pigment-related pathways in cassava roots.
For the swelling index of gari, two putative candidate genes were identified near significant SNPs.These genes are potentially involved in cellulose synthesis and degradation of the pectate component of the plant cell wall.The PGM family protein plays a role in the accumulation of reserve starch and tuber growth by catalyzing the interconversion of glucose 6-phosphate and glucose 1-phosphate (Fettke et al., 2010;Tauberger et al., 2000).A lower PGM activity reduces starch reserve and tuber growth and increases the production of uridine diphosphate-glucose, a precursor of cellulose synthesis (Swissa et al., 1980).Cellulose has been shown to affect water distribution and absorption in wheat dough (Hu et al., 2022;Jiang et al., 2020).The PEL superfamily protein is known to be involved in the degradation of the pectate component of the plant cell wall by depolymerizing enzymes degrade the polygalacturonic acid (Henrissat et al., 1995).This degradation initiates cell separation and wall loosening during fruit ripening (Sun & van Nocker, 2010;Voragen et al., 2009).Cell wall separation and loosening reduce the adhesion between cells, a crucial factor affecting cell hydration properties (Auffret et al., 1994;Kunzek et al., 1999;Müller & Kunzek, 1998).Pectin has been linked to water absorption capacity in cassavabased products (Awoyale et al., 2023).Although our study identified associations between the swelling index of gari and both PGM and PEL, we did not observe a clear interaction between these genes.This complexity suggests that water cassava granules water absorption likely involves an intricate interplay of genetic factors beyond individual genes.
This study offers a valuable starting point for improving the yield and quality of gari by cassava breeders.Future steps include investigating the specific function of these genes, validating their roles and developing molecular markers for breeding selection, to improve traits related to yield and quality of gari.

CONCLUSION
This study represents a pioneering effort in cassava research, as it is the first to conduct a comprehensive GWAS to identify the underlying genetic factors influencing gari yield and the quality of gari and eba.This powerful approach identified nine significant SNPs closely associated with variations in measured traits.These identified polymorphisms hold immense potential as molecular markers enabling breeders to effectively select for desired traits for increased genetic gain.Nonetheless, research and experimental validation will help elucidate these genes' functional roles and their potential contributions to gari yield and quality, ultimately advancing our knowledge and practical application in cassava breeding programs.

A C K N O W L E D G M E N T S
We acknowledge the Bill & Melinda Gates Foundation, Commonwealth & Development Office (FCDO) of the United Kingdom through the Next Generation Cassava Breeding project (https://www.nextgencassava.orggrant no.IVN-007637).We thank the Deutscher Akademischer Austauschdienst (DAAD) for funding the Cynthia Idhigu Aghoghor's PhD scholarship in plant breeding at the West Africa Centre for Crop Improvement (WACCI), University of Ghana Legon, Accra, Ghana.We thank the reviewers and Associate Editor for their valuable comments, which have contributed to the improvement of the manuscript.

C O N F L I C T O F I N T E R E S T S T A T E M E N T
The authors declare no conflicts of interest.

D A T A AVA I L A B I L I T Y S T A T E M E N T
The datasets analyzed for the current study and original GWAS output files are available at the https://cassavabase.org/ftp/manuscripts/Aghogho_et_al_2023/.

F
I G U R E 3 (a) Heat map of single nucleotide polymorphism (SNP)-based relationship matrices for 200 cassava genotypes, showing genotype relatedness; (b) Scatter plot of first and second principal component of 200 genotypes showing marker informativeness across genotypes.DM, dry matter content; FYLD, fresh root yield (t/ha).

F
I G U R E 4 (a) Genome-wide single nucleotide polymorphism (SNP) distributions on 18 chromosomes of cassava.The horizontal axis displays the chromosome length in base pairs; the vertical axis on the left displays the number of SNPs; 0-10,927 color-coded legend insert indicates the density of SNPs; the number of SNPs is also indicated for each of the 18 chromosomes.(b) Linkage disequilibrium (LD, r 2 ) decay plot of 53,150 marker pairs as a function of physical distance (bp) for the 188-cassava genotype.(c) The population structure of cassava accessions for 188 genotypes based on 16,210 SNPs and a 10 fold cross-validation error rate for K = 2 to K = 15.(d) Hierarchical clustering (Ward's minimum variance method) dendrogram.(e) Individual ancestry estimated ADMIXTURE analysis (K = 7).The colored sections within each bar indicate membership of the genotypes in the different subpopulations.
Estimates of variance components for gari yield and quality-related traits.: GA, genetic advance, GAM, genetic advance as percentage of mean; GCV, genotypic coefficient of variation; H 2 , broad sense heritability; ℎ 2 SNP , SNPs-based heritability; PCV, phenotypic coefficient of variation.