Heterogeneity of a dwarf phenotype in Dutch traditional chicken breeds revealed by genomic analyses

Abstract The growth of animals is a complex trait, in chicken resulting in a diverse variety of forms, caused by a heterogeneous genetic basis. Bantam chicken, known as an exquisite form of dwarfism, has been used for crossbreeding to create corresponding dwarf counterparts for native fowls in the Dutch populations. Here, we demonstrate the heterogeneity of the bantam trait in Dutch chickens and reveal the underlying genetic causes, using whole‐genome sequence data from matching pairs of bantam and normal‐sized breeds. During the bantam‐oriented crossbreeding, various bantam origins were used to introduce the bantam phenotype, and three major bantam sources were identified and clustered. The genome‐wide association studies revealed multiple genetic variants and genes associated with bantam phenotype, including HMGA2 and PRDM16, genes involved in body growth and stature. The comparison of associated variants among studies illustrated differences related to divergent bantam origins, suggesting a clear heterogeneity among bantam breeds. We show that in neo‐bantam breeds, the bantam‐related regions underwent a strong haplotype introgression from the bantam source, outcompeting haplotypes from the normal‐sized counterpart. The bantam heterogeneity is further confirmed by the presence of multiple haplotypes comprising associated alleles, which suggests the selection of the bantam phenotype is likely subject to a convergent direction across populations. Our study demonstrates that the diverse history of human‐mediated crossbreeding has contributed to the complexity and heterogeneity of the bantam phenotype.


Figure S1
The breed reduced body weight and the phylogenic structure of the three groups. Figure S2 The association studies of structural variations for three groups on GGA1 and GGA4 respectively。 Figure S3 Manhattan plot of GWAS results of the "pooled" analysis, HMGA2 analysis in group 3, and sex-linked analysis. Figure S4 The enrichment analysis of Gene Ontology (GO) terms. Figure S5 The histogram shows the comparison of the count of IBD fragments in the three groups. Figure S6 Distribution of rIBD and Fst estimation on GGA4 the group 3 specific region. Figure S7 The haplotype of genomic associated sites in the three groups. Figure S8 NJ-tree topology of haplotype and PCA structure (page 9-10).

Figure S1
The breed reduced body weight and the phylogenic structure of the three groups. The Breed standard bodyweight of Dutch chicken is collected (Supplementary file 1) and the breed reduced body weight of each breed was summarized in three groups. Figure (A) shows the mean ratio of reduced body weight (shown in the y-axis) summarized according to three groups (displayed by the x-axis). Figure (B) shows the reduced ratio of breeds. The figures (C-D) show the NJ-tree topology of neo-bantams of each group and the bantams source in the complete dataset, namely Dutch bantam (group 1), Java bantam and Sebright bantam (group 2); true bantams in group 3 were not sampled in this study, therefore three representative neo-bantams that possess diversification are used to represent the group. The color scheme is based on the group, red nodes show individuals from group 1, yellow shows group2, and group 3 is in blue nodes.

Figure S2
The association studies of structural variations for three groups on GGA1 and GGA4 respectively. The red horizontal line shows the Bonferroni threshold of P-value (0.05/the number of tested variants), and a suggestive cutoff threshold (P=5x10 -8 ) is indicated by blue horizontal line. To investigate the variation associated with the bantam phenotype, we tested the Structural Variation (SV) in the dataset. Firstly, we checked the variation within the genetic region that is of high risk identified by SNP based GWAS. We do not have the evidence that any deletion or duplication in HMGA2 gene is associated with bantam in the three bantam groups. The associated variation was also manually checked for coverage as well. Secondly, using the information of the genotype of each variation, we conducted an association study in each group. In group 2, a duplication spanning GGA1:1,101,992-1,102,562 shows tentative association with the bantam phenotype, the frequency of the duplication in case:control is 44%:14%. The duplication is in the first intron of ARF5 (ADP-ribosylation factor 5). However, it is not clear that ARF may functionally influence body growth. Two deletions, GGA1:164,677,677,819,and GGA1:166,849,849,983, are additionally reported as association, however there is no gene annotated to the deletions.

Figure S4
The enrichment analysis of Gene Ontology (GO) terms. The gene set includes associated genes identified by four studies (the three group-based GWAS and the meta-analysis). Three GO categories, BP, CC and MF are shown by the three plots, with colored circle as the adjusted P value, the x-axis as the count of the terms, and the size of the circle as the gene ratio. We have found no statistically significant enrichment, but the GO terms associated with bone morphogenesis, developmental process, skeleton muscle, and bone development can be found in Table S6.

Figure S5
The histogram shows the comparison of the count of IBD fragments in the three groups. In each group, the count of IBD on GGA1 (at length in bp) between bantam and neo-bantam is plotted (denoted as bantam-neobantam in red), as well as the IBD between neo-bantam and their normal-sized counterpart (denoted as neo-bantam-large in green).

Figure S7
The haplotype of genomic associated sites in three groups. (A-C) The horizontal axis represents the significant variant sites surrounding GGA1:34-35Mb ordered by chromosome and position, each row represents one individual, the breed name is on the right vertical axis and annotated by the left colored boxes. The bantam and normal-sized breeds are separated by a black horizontal line. The highlighted SNP in the orange box on the horizontal axis is the most significant peak variant (NC_006088.5:g.34326548G>C) found in the meta-analysis. (D) Manhattan plot shows the computed linkage disequilibrium (r 2 ) between markers in the three groups (group 1 to 3) and metaanalysis. The value of r 2 was calculated for each variant against the peak SNP (NC_006088.5:g.34326548G>C). The colors of points indicate the values of r 2 , ranging from 0 (blue) to 1 (red).

Figure S8
NJ-tree topology of haplotype and PCA structure. In the NJ-tree panel, we used the haplotype coves the variant of interest and 1Kb surrounding region. The two haplotypes of one individual is denoetd by the number one and two at the end of the tip name. The color scheme is based on the group, red shows bantams in group 1, yellow shows bantams in group2, and bantams in group 3 are in blue nodes, and the normal-sized ones are colored black. In the PCA panel, the phenotypes (1 is for normal-sized, 2 is for bantam) are shown in colors, and groups are shown in different shapes. We showed 5 haplotype blocks, denoted by the lead variants: (A) Variant NC_006088.5:g.34326548G>C; (B) SNP (rs732144338) has the frequency of bantam associated allele in case:control of each group as, 45.67%: 0.00% (group 1); 10%: 2.5% (group 2), while in group 3 it is not tested in this study. (C) SNP (rs13643124) has the bantam allele frequency in case:control of each group as, 26.67%: 23.68% (group 1); 42%: 9.76% (group 2); 80.67%: 5.83% (group 3). (D) SNP (rs316652476) has the bantam allele frequency as, 90%: 2.78% (group 1); 20%:19.51% (group 2 ); 61.11%: 31.82% (group 3). (E) Variant (rs735861847) has the bantam allele frequency in case:control 75%: 26.32%(group 1); 79.17%: 10% (group 2); 38.89%: 36.36% (group 3). Specially, individuals without genotype in this interval were removed in the PCA, the imputed haplotype was used in phylogeney construction. The position showen in the figure is based on the genome build GRCg6a (GenBank Accession: GCA_000002315.5).