Segregation analysis identifies specific alpha-defensin (DEFA1A3) SNP–CNV haplotypes in predisposition to IgA nephropathy

Background: Immunoglobulin A (IgA) nephropathy is a disorder of the immune system affecting kidney function, and genome-wide association studies (GWAS) have defined numerous loci with associated variation, all implicating components of innate or adaptive immunity. Among these, single nucleotide polymorphisms (SNPs) in a region including the multiallelic copy number variation (CNV) of DEFA1A3 are associated with IgA nephropathy in both European and Asian populations. At present, the precise factors underlying the observed associations at DEFA1A3 have not been defined, although the key alleles differ between Asian and European populations, and multiple independent factors may be involved even within a single population. Methods: In this study, we measured DEFA1A3 copy number in UK family trios with an offspring affected by IgA nephropathy, used the population distributions of joint SNP–CNV haplotypes to infer the likely segregation in trios, and applied transmission disequilibrium tests (TDT) to examine joint SNP–CNV haplotypes for over- or undertransmission into affected offspring from heterozygous parents. Results and Conclusions: We observed overtransmission of 3-copy class 2 haplotypes (raw p = 0.029) and some evidence for undertransmission of 3-copy class 1 haplotypes (raw p = 0.051), although these apparent effects were not statistically significant after correction for testing of multiple haplotypes.

to IgA nephropathy, including a characteristically high frequency of the disorder specifically in East Asian populations, and the identification of associated variation at specific loci using genome-wide association studies (GWAS) (Kiryluk et al., 2010; Sanchez-Rodriguez et al., 2021; Xie et al., 2013).
Although IgA nephropathy was the first example of a human phenotype significantly associated with DEFA1A3 variation, single nucleotide polymorphisms (SNPs) neighboring DEFA1A3 have also been shown to have significant associations with periodontitis (Munz et al., 2017) and with measured properties of human blood cells, including differential white blood cell counts and neutrophil granularity (Akbari et al., 2020; Astle et al., 2016). There are indications that the key allelic factors may overlap between the three phenotypes or may possibly be identical.
Genome-wide significant association of SNP alleles at DEFA1A3 with IgA nephropathy has been observed in both East Asian (Ai et al., 2016; Kiryluk et al., 2014; Qi et al., 2015; Yu et al., 2012) and European (Kiryluk et al., 2014) populations, but the precise allelic factors involved appear to differ between Asian and European populations (Kiryluk et al., 2014). Furthermore, although DEFA1A3 and the neighboring DEFA4 are the only protein-coding genes in the associated linkage disequilibrium block, it is not clear in any population what the crucial functional difference is between risk alleles and others, or indeed whether there could be multiple alleles carrying different levels of risk or protection.
Analysis of DEFA1A3 copy number and sequence variants in Chinese populations demonstrated that copy numbers of DEFA1A3 and its sequence variants were more closely associated with IgA nephropathy than the lead single-copy SNPs (Ai et al., 2016), suggesting that the underlying causal variation was likely to be in the copy number variation (CNV) region rather than in single-copy flanking DNA. Gene copy number at DEFA1A3 can vary extensively, from a minimum of three copies per person up to more than 20 in some populations, and the contributions of the DEFA1 and DEFA3 gene variants to the total gene complement can also be highly variable (Black et al., 2014; Hughes et al., 2020; Khan et al., 2013).
In other examples of multiallelic structural variation associated with human phenotypes, such as the association of complement C4 with schizophrenia (Sekar et al., 2016) or of haptoglobin with cholesterol levels (Boettger et al., 2016), different structural variants have different effects, in patterns not simply associated with individual flanking SNPs or proportional to gene dosage. In a detailed analysis of SNP-based association with IgA nephropathy at the DEFA1A3 locus in Han Chinese, at least three independent signals were defined, with indications that further independent effects remained to be discovered (Li et al., 2015). Risk factors associated with the copy-variable DEFA1A3 gene region may therefore show graded multiallelic effects, and the relationship between gene copy number and alpha-defensin function is not clear; although early work suggested a simple proportionality between gene number and peptide levels (Linzmeier & Ganz, 2005), subsequent analysis of leukocyte mRNA levels failed to show a correlation with gene number (Aldred et al., 2005).
To address these challenges, we attempted to identify key allelic factors at DEFA1A3 in predisposition to IgA nephropathy in a European (UK) population using the combined linkage and association approach of the transmission disequilibrium test (TDT). In this work, we examined DEFA1A3 SNP and CNV variation in DNA from family trios assembled at the UK MRC glomerulonephritis DNA Bank collection, each trio composed of an IgA nephropathy proband and both their parents.

DEFA1A3 copy number measurement by PRT
Paralogue ratio test (PRT) methods (Armour et al., 2007; Carpenter et al., 2015; Shwan et al., 2017; Walker et al., 2009) were used to determine gene copy numbers for DEFA1A3, including the relative contribution of sequence variants within the gene repeats; PCR products from different assays for the same individual were multiplexed in a single electrophoresis run. The experimental methods followed those previously described (Ai et al., 2016; Khan et al., 2013), except that two new, improved PRT measurements were adopted to measure total copy number for the DEFA1A3 genes. The cen1 PRT assay used primers DEFAcen1F (CCCAGAGAGCTCCTTCATT) and DEFAcen1R (TCCTAGAAAGCTGGTTGCTC) to amplify a 444 bp product from the centromeric gene position and a 332 bp product from other (noncentromeric) copies; the tel2 PRT assay used primers DEFAtel2F (AGAGCAGCCGTGCACAAAC) and DEFAtel2R (GCATCTYGGGGTCCATTGT) to amplify a 260 bp product from the telomeric gene position and a 263 bp product from other copies. Measurements of DEFA1A3 copy number deduced from cen1 and tel2 PRT data, and the splits of other sequence variants, were used to infer the most likely copy number for each sample as described (Khan et al., 2013).
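The ratio logic behind the cen1 and tel2 assays can be illustrated with a short Python sketch. It assumes that exactly two gene copies per diploid genome occupy the fixed (centromeric or telomeric) position and that both products amplify with equal efficiency; calibration against reference samples, which real PRT data require, is omitted. The function name and the peak-area inputs are hypothetical and are not part of the published pipeline.

```python
def copy_number_from_prt(variable_signal, fixed_signal, fixed_copies=2):
    """Estimate total diploid DEFA1A3 copy number from one PRT ratio.

    variable_signal: peak area of the product from the copy-variable
        positions (e.g. the 332 bp cen1 product).
    fixed_signal: peak area of the product from the fixed position
        (e.g. the 444 bp cen1 product).
    fixed_copies: copies assumed at the fixed position per diploid genome.
    """
    if fixed_signal <= 0:
        raise ValueError("fixed-position signal must be positive")
    ratio = variable_signal / fixed_signal
    # ratio = (N - fixed_copies) / fixed_copies, so N = fixed_copies * (1 + ratio)
    return fixed_copies * (1.0 + ratio)


# Hypothetical peak areas for a cen1 run (illustrative values only).
estimate = copy_number_from_prt(3000.0, 2000.0)
print(round(estimate, 2))
```

The same function applies to the tel2 assay by passing the 263 bp signal as the variable product and the 260 bp signal as the fixed product.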

DNA samples and SNP data
DNA samples from IgA nephropathy families and controls were obtained from the MRC glomerulonephritis DNA Bank held at the CIGMR Biobank (University of Manchester). Individual samples were genotyped for 318,127 autosomal and X-chromosomal SNPs with the Illumina Sentrix HumanHap300 BeadChip as previously reported (Feehally et al., 2010). For an additional 47 family trio samples not genotyped by those methods, SNP genotypes flanking DEFA1A3 were determined using the PCR-RFLP assays described (Black et al., 2014; Khan et al., 2013).

Common European SNP haplotypes
Previous work on variation in European population samples (Black et al., 2014; Khan et al., 2013) used sequencing of a 4.1 kb region immediately adjacent to the centromeric boundary of the repeat array to define five common haplotypes. The haplotype represented on the GRCh37/hg19 reference assembly was designated the "Reference" haplotype, and two other common variant haplotypes were designated Class 1 and Class 2. Sequencing also demonstrated that two haplotypes included segmental exchanges at the border of the repeat region, and these were therefore termed Exchange 1 and Exchange 2. Although the repeat exchanges are closely adjacent to the SNPs that define the major European haplotypes, the key SNPs are not themselves included in the replacement events. Each of the common haplotypes in European populations is associated with distinctive numbers and sequence variants of the repeat unit; for example, the Class 2 (C2) haplotype is nearly always associated with CNV alleles containing two or three repeat units, of which the most telomeric repeat contains the DEFA3 gene (Black et al., 2014). The composition and properties of these common European haplotypes are summarized in Table S1.

Joint phasing of SNP-CNV haplotypes
SNP and DEFA1A3 CNV data from 989 Europeans (combining data from Khan et al., 2013, and the European population samples in the 1000 Genomes Project phase 3; The 1000 Genomes Project Consortium, 2015) were phased using MOCSphaser (Kato et al., 2008), which infers the likely joint SNP-CNV haplotypes underlying the observed variation. MOCSphaser analysis produced haplotype frequency (*.hpfq) outputs giving a frequency distribution of different DEFA1A3 copy number states on each of the five common SNP haplotype backgrounds (Table S2). We then used this information about the common SNP-CNV haplotypes in European populations to infer the likely haplotype segregation in family trios. For each trio, a custom R script iterated across all possible haplotype segregation patterns consistent with the observed SNP and CNV profiles, and the frequencies of the implicated SNP-CNV haplotypes were used to assign relative probabilities to each of the possible solutions. To reduce the search space without appreciable loss of accuracy, we assumed that the minimum DEFA1A3 copy number per haplotype was 2, in keeping with observations from large-scale typing (Black et al., 2014; Hughes et al., 2020; Khan et al., 2013), in which 1-copy haplotypes are rare and 0-copy haplotypes have not been observed. As a general index of confidence in the most likely segregation pattern, we calculated the probability of the most likely segregation pattern relative to the total for all possible patterns (Table S3).
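The enumeration described above can be sketched in Python as follows. This is an illustration only and not the authors' R script: it assumes that the transmitted and untransmitted SNP haplotype classes of each parent are already known from flanking SNP segregation, and the frequency table `cn_freq` contains made-up values standing in for the MOCSphaser-derived distributions of Table S2. All function and variable names are hypothetical.

```python
from itertools import product


def segregation_posteriors(father, mother, child_total, cn_freq, min_cn=2):
    """Enumerate copy-number assignments for the four parental haplotypes.

    father, mother: (transmitted_class, untransmitted_class, total_copies)
    child_total: measured diploid copy number of the offspring
    cn_freq: {snp_class: {copies: frequency}} (population distribution)
    Returns [(relative_probability, (f_trans, f_untrans, m_trans, m_untrans))],
    most likely solution first.
    """
    f_t_class, f_u_class, f_total = father
    m_t_class, m_u_class, m_total = mother
    solutions = []
    f_range = range(min_cn, f_total - min_cn + 1)
    m_range = range(min_cn, m_total - min_cn + 1)
    for f_t, m_t in product(f_range, m_range):
        if f_t + m_t != child_total:
            continue  # transmitted haplotypes must sum to the child's total
        f_u, m_u = f_total - f_t, m_total - m_t
        # Weight each solution by the product of the four haplotype frequencies.
        weight = (cn_freq[f_t_class].get(f_t, 0.0)
                  * cn_freq[f_u_class].get(f_u, 0.0)
                  * cn_freq[m_t_class].get(m_t, 0.0)
                  * cn_freq[m_u_class].get(m_u, 0.0))
        if weight > 0:
            solutions.append((weight, (f_t, f_u, m_t, m_u)))
    total = sum(w for w, _ in solutions)
    return sorted(((w / total, s) for w, s in solutions), reverse=True)


# Hypothetical frequencies of copy-number states per SNP haplotype class.
cn_freq = {
    "C2": {2: 0.5, 3: 0.5},
    "REF": {2: 0.1, 3: 0.6, 4: 0.3},
    "C1": {2: 0.1, 3: 0.7, 4: 0.2},
}
father = ("C2", "REF", 5)  # transmits a Class 2 haplotype, total 5 copies
mother = ("C2", "C1", 5)
for prob, solution in segregation_posteriors(father, mother, 5, cn_freq):
    print(round(prob, 3), solution)
```

The first element of the top-ranked tuple corresponds to the confidence index described above: the probability of the most likely pattern relative to the total over all compatible patterns.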

Transmission disequilibrium test
To test over- or undertransmission of specific haplotypes from heterozygous parents into affected offspring, we recoded genotypes to target specific haplotypes and then used transmission disequilibrium tests implemented via the "--tdt" option of PLINK v1.90b6.2 (Chang et al., 2015).
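For readers unfamiliar with the test, the basic TDT statistic can be written out in a few lines of Python. This is a minimal sketch of the standard McNemar-type calculation and is not a substitute for PLINK output; the transmission counts in the example are hypothetical. The Bonferroni helper mirrors the correction applied in the Results for six haplotypes.

```python
import math


def tdt(transmitted, untransmitted):
    """Basic TDT for one haplotype, counted in heterozygous parents only.

    Returns (chi-square with 1 df, p value, odds ratio).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative transmissions")
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = b / c if c else math.inf
    return chi2, p_value, odds_ratio


def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Hypothetical counts: 30 transmissions versus 20 non-transmissions.
chi2, p, odds = tdt(30, 20)
print(round(chi2, 2), round(p, 4), round(odds, 2))

# A raw p of 0.029 tested across six haplotypes.
print(round(bonferroni(0.029, 6), 3))
```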

High-precision typing of DEFA1A3 copy number
Building on previous methods for characterizing DEFA1A3 CNV in genomic DNA (Black et al., 2014; Khan et al., 2013), we developed two new PRT (Armour et al., 2007) methods for assessing DEFA1A3 gene copy number (Figure 1). The first (cen1) exploits a consistent sequence length difference between the copy of DEFA1A3 found at the centromeric end of the array (shown on the right in Figure 1) and the other copies, and uses a single primer pair to amplify different-length products from the two centromeric copies (per diploid genome) and the variable number of noncentromeric copies. The total copy number can be estimated from the ratio of noncentromeric to centromeric PCR products, assuming that there are exactly two centromeric copies (see Methods above). Similarly, a consistent length difference at the gene position at the telomeric end of the array, shown on the left in Figure 1, is the basis for the tel2 PRT, which compares the signal amplified from the (exactly two) telomeric positions with the variable number of nontelomeric positions.
FIGURE 1 Overview of the DEFA1A3 copy-variable region and the basis of the new measurement methods described in this report. The DEFA1A3 genes are represented as three tandemly arranged copies (blue arrows) on the hg19 reference assembly. Each copy of the gene is included within near-identical repeats of a 19 kb full repeat unit or (at the centromeric end, shown here on the right) a shorter partial repeat (gray arrows). Flanking these copy-variable regions of near-identical sequence are diverged sequences that do not show common structural variation (lighter gray arrows). The cen1 PRT measurement uses a single primer pair to amplify a 444 bp product (dark blue) from a constant flanking site, and products of 332 bp from other repeats (light blue); the ratio of internal (332 bp) to flanking (444 bp) products can be used to deduce the total copy number for the sample. Similarly, the tel2 primers amplify a 260 bp product (red) from a flanking site and 263 bp products from copy-variable internal locations (pink).
Despite the risk of structural variants disrupting the spatial arrangement of these sequences, we found that in practice the cen1 and tel2 measurements gave consistent outcomes, clustered as expected around integer copy number values (Figure 2). In addition to the cen1 and tel2 PRT measurements, we included assays of other ratios between different gene sequence variants, including the DEFA1 and DEFA3 variants of the DEFA1A3 gene. For example, a sample showing a DEFA1:DEFA3 ratio of about 1.5 (i.e., 3:2) is consistent with a copy number of 5 or 10 but is harder to reconcile with a copy number of 4. Combining all these ratio measurements with knowledge of the observed magnitudes and distributions of measurement errors allowed us to derive a maximum likelihood copy number (MLCN) for each sample, as previously described (Black et al., 2014; Khan et al., 2013).
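The idea of combining several ratio measurements into a maximum likelihood copy number can be sketched in Python. This is a simplified illustration and not the published MLCN procedure: it assumes independent Gaussian measurement errors, and the standard deviations `sd_prt` and `sd_fraction`, the candidate range, and the function names are all hypothetical choices made for the example.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def mlcn(cen1_estimate, tel2_estimate, defa3_fraction,
         sd_prt=0.35, sd_fraction=0.03, candidates=range(3, 21)):
    """Return the integer copy number that best explains all measurements.

    cen1_estimate, tel2_estimate: unrounded copy numbers from the two PRTs.
    defa3_fraction: measured DEFA3 share of all DEFA1A3 copies
        (a DEFA1:DEFA3 ratio of 3:2 corresponds to 0.4).
    """
    best_n, best_ll = None, -math.inf
    for n in candidates:
        ll = normal_logpdf(cen1_estimate, n, sd_prt)
        ll += normal_logpdf(tel2_estimate, n, sd_prt)
        # The DEFA3 share must equal k/n for some whole number k of copies;
        # use the best-fitting k for this candidate n.
        ll += max(normal_logpdf(defa3_fraction, k / n, sd_fraction)
                  for k in range(n + 1))
        if ll > best_ll:
            best_n, best_ll = n, ll
    return best_n


# PRT estimates midway between 4 and 5; a 3:2 DEFA1:DEFA3 ratio favors 5.
print(mlcn(4.6, 4.4, 0.4))
```

In this example the two PRT estimates alone cannot separate 4 from 5 copies, and the DEFA1:DEFA3 ratio breaks the tie, which is the situation described in the text.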
We applied these methods to obtain 1146 measurements of DEFA1A3 copy number. In addition to members of the 193 informative family trios, the samples typed included members of uninformative trios, other (nontrio) family members, or spouses of affected individuals. The DNA samples we initially typed included members of 210 family trios with an offspring with IgA nephropathy, of which flanking SNP segregation could be deduced unambiguously for 193 informative trios independently of the CNV status. All of the 193 IgA nephropathy family trios for which the segregation of flanking SNP genotypes could be resolved had flanking SNP haplotypes corresponding to the five common haplotype classes defined in Europeans (Black et al., 2014). To identify the key features of risk or protective haplotypes for IgA nephropathy, we aimed to exploit the known relationships between flanking SNP haplotypes and DEFA1A3 copy numbers (Black et al., 2014) to infer the likely segregation of joint SNP-CNV haplotypes in family trios consisting of IgA nephropathy probands and their parents. To provide a systematic basis for the inferences, we phased SNP and DEFA1A3 CNV data from 989 Europeans using MOCSphaser (Kato et al., 2008), from which the haplotype frequency (*.hpfq) outputs gave a frequency distribution of different DEFA1A3 copy number states on each of the five common SNP haplotype backgrounds (Table S2). These distributions could then be used to calculate probabilities for each of the possible segregation patterns for each trio, and to define the most likely segregation pattern in each case (Figure 3 and Table S3). Although in some trios there were numerous possible patterns of SNP-CNV segregation, in most families there was a clear most likely segregation pattern, and in 167 trios (87%) the most likely segregation pattern had a posterior probability of 0.75 or greater (Table S3).
FIGURE 2 Results from 1146 PRT measurements of DEFA1A3 copy number as part of this study, including members of 193 informative family trios. The two independent measurements using the cen1 or tel2 ratios are well correlated (r² = 0.86), and the underlying accuracy of the measurements is validated by the strong clustering of data points around integer values, most clearly at copy numbers of 6 or fewer.
FIGURE 3 A simple example from IgA nephropathy family data illustrating the deduction from (a) SNP and CNV genotypes of (b) joint SNP-CNV haplotypes. In (a), the total DEFA1A3 copy number as measured by PRT methods is shown in red, with the SNP haplotypes deduced from SNP genotypes shown in blue. In (b), the joint haplotypes are designated in the form (for example) "3REF," which indicates a "Reference" SNP haplotype carrying three copies of DEFA1A3. Neglecting the possibility that a haplotype can have 0 or 1 copies of DEFA1A3, the only resolution for the child is to have two 2-copy class 2 (C2) haplotypes, from which the nature of the untransmitted haplotypes in each parent can be deduced by simple subtraction. This trio has low DEFA1A3 copy numbers for all members, making the inferred haplotypes shown highly likely; by contrast, most trios include an offspring with higher copy number, allowing a wider range of compatible possibilities. In those cases, given the known relationships between SNP haplotype backgrounds and DEFA1A3 content, the relative likelihoods for all the different possible solutions can be evaluated.
In evaluation of the transmission of specific haplotypes into affected offspring from heterozygous parents using TDTs, we first examined the four individual SNPs that between them define the five common haplotypes in Europeans, plus one SNP (rs2738048) highlighted as a lead SNP in several different association studies of IgA nephropathy (Ai et al., 2016; Kiryluk et al., 2014; Yu et al., 2012); none of these appeared to be significantly over- or undertransmitted from heterozygous parents to individuals affected with IgA nephropathy (Table 1).
We then examined (Table 2) the six most common joint SNP-CNV haplotypes defined in our segregation analysis, which indicated overtransmission of 3-copy Class 2 haplotypes ("3C2," odds ratio 1.59, raw p = 0.029) and possible undertransmission of 3-copy Class 1 haplotypes ("3C1," odds ratio 0.60, raw p = 0.051). In analyses restricted to the 181 trios for which the most likely segregation pattern had a relative likelihood greater than 0.6 of the total, the apparent unequal transmission remained nominally significant for both overtransmission of 3-copy Class 2 haplotypes (p = 0.024) and undertransmission of 3-copy Class 1 haplotypes (p = 0.026). However, given that six common haplotypes were examined, these possible effects were no longer statistically significant after Bonferroni correction for multiple testing.

DISCUSSION
In this work, we have applied new approaches to infer the detailed segregation of structural variants at the multiallelic CNV DEFA1A3 in family trios, by combining accurate measurement of gene copy number with knowledge of the population relationships between SNP variation and CNV status. Although some trios with high copy number in the proband had numerous possible segregation patterns with similar likelihoods, the constraints imposed by the observed segregation and the known copy number spectrum on each SNP-defined haplotype led to the definition of segregation in many trios with good levels of confidence. Despite this high level of informativeness for segregation, the signals of apparent overtransmission of 3-copy class 2 haplotypes and undertransmission of 3-copy class 1 haplotypes into affected offspring did not reach levels that were statistically significant after correction for testing of multiple haplotypes (Table 2). Because of recombination across the CNV repeat region, the phylogenetic relationships of the major European SNP haplotypes are complex and do not shed light on the association findings. For example, the Exchange 1 haplotype appears to be of Neanderthal origin, and the ancestral sequence is represented by the reference haplotype, but these haplotypes do not seem to have a specific role in the associations observed.
If the observed overtransmission of 3-copy class 2 haplotypes into affected offspring is a true signal of a risk haplotype, the effect is not a property of class 2 haplotypes generally, as shown by the even segregation of the class 2-tagging SNP rs7825750 (Table 1); similarly, overtransmission is not seen for 2-copy class 2 haplotypes (Table 2). Functionally important differences between 3-copy and 2-copy class 2 haplotypes could arise from background-specific gene dosage effects, haplotype-specific gene sequence variants, or conceivably because of the presence of sequences important in longer-range control of gene expression distant from the DEFA1A3 region.
The implication of 3-copy class 1 alleles as a potentially protective factor is interesting in the light of the appearance of the class 1-tagging SNP rs7826487 as a lead SNP in this region in studies of the cellular composition of blood, including associations with monocyte and basophil counts (see Table S4 of Astle et al., 2016). The most frequent sequences of 3-copy class 1 and 3-copy class 2 alleles differ not only in their flanking SNPs but also in the internal structure of the DEFA1A3 gene array, with a copy of DEFA3 found on class 2 but not class 1 backgrounds (Black et al., 2014).
Finally, if the 3-copy class 2 and class 1 alleles do in fact have opposing functional effects, the otherwise puzzling consistency with which GWAS data implicate SNPs like rs2738048, which have no close relationship with DEFA1A3 structural variation, could be explained as an aggregate effect, effectively summing the varied effects of different haplotypes when analysed from the standpoint of a single SNP.

AUTHOR CONTRIBUTIONS
Study design DPG JB JALA; Data collection NS DPG JALA; Contribution of new analytical tools and methodology NS ECM PEN AJF JALA; Data analysis NS JALA; Manuscript preparation JALA, with input from all authors.

ACKNOWLEDGMENTS
This work was supported by an Innovation Grant from Kidney Research UK (IN_005_20170302).

CONFLICT OF INTEREST
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available in the supporting information of this article.

14691809, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/ahg.12481 by University College London UCL Library Services, Wiley Online Library on [19/10/2022]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License.

TABLE 1
Summary statistics for TDT analysis of SNP segregation in 193 family trios with an IgA nephropathy proband offspring. The first four SNPs act as tags for common haplotypes, and the relevant haplotypes are indicated in the second column.