APOBEC3 genes encode cytidine deaminases that can inhibit retroviruses and retrotransposons. These genes have been targets of natural selection throughout primate evolutionary history. We analyzed their selection patterns in human populations and observed that APOBEC3F and 3G are evolving neutrally. Conversely, nucleotide diversity was extremely high for APOBEC3H, and most tests rejected the hypothesis of selective neutrality in Eurasian populations. Haplotype analysis and the derived intraallelic nucleotide diversity (DIND) test indicated that positive selection has driven the increase in frequency of one haplotype (Hap I) outside Africa. Consistently, population genetic differentiation between African and non-African populations was higher than expected under neutrality. A case–control association analysis indicated that Hap I is associated with protection from sexually transmitted HIV-1 infection. Hap I carries a protein-destabilizing variant and a residue conferring resistance to Vif-mediated degradation. These data suggest that lower protein stability might have been traded off against a higher ability to circumvent Vif-mediated hijacking. Alternatively, transcriptional regulatory variants might represent the selection target. Our data provide an example of how the selective pressures exerted by extinct or unknown viral agents can be exploited to obtain valuable information on the allelic determinants of susceptibility to modern infections.

Infectious agents have been among the strongest selective pressures acting on human populations. In particular, viruses have affected the human lineage since before humans emerged as a species. This is shown by the fact that roughly 8% of the human genome consists of recognizable endogenous retroviruses (Lander et al. 2001), the fossil remnants of past infections. In addition, the human genome contains a large number of both extinct and active non-LTR transposons (Lander et al. 2001). Higher eukaryotes have evolved mechanisms to sense and fight viral infections and to restrict the activity of mobile elements. Among these, APOBEC3 genes encode cytidine deaminases able to inhibit retroviruses and retrotransposons (reviewed in Ross [2009]). Different mammals have distinct complements of APOBEC3 genes: mice carry a single active gene, whereas humans have seven APOBEC3 genes (A to D and F to H) clustered on chromosome 22 (Conticello 2008; Ross 2009).

In line with the paradigm whereby genes involved in the host–pathogen equilibrium are common targets of natural selection, interspecies analyses have indicated that APOBEC3 genes have been subjected to positive selection throughout primate evolutionary history (Sawyer et al. 2004; OhAinle et al. 2006). This is possibly a consequence of adaptation to an ever-changing landscape of viruses and transposons. Little is known about the forces that have shaped APOBEC3 gene diversity in the more recent past, i.e., during the evolutionary history of human populations. A previous study analyzed APOBEC3G nucleotide diversity in a relatively small number of humans with different ancestry and described no deviation from neutrality (Zhang and Webb 2004). More recently, OhAinle et al. (2008) showed that two polymorphic variants that destabilize the protein product of APOBEC3H have arisen independently in human populations. They suggested that the different frequencies of stable and unstable products in distinct geographic areas may be the result of natural selection. The same authors reported that four major haplotypes segregate in human populations. Only one of these (generally referred to as haplotype II) encodes a stable APOBEC3H protein with relatively high antiviral activity against HIV-1, whereas the remaining haplotypes give rise to unstable proteins (OhAinle et al. 2008; Zhen et al. 2010). Nevertheless, one of these less stable proteins (the one derived from haplotype I) seems to be completely resistant to the HIV-1-encoded virion infectivity factor (Vif) (OhAinle et al. 2008; Li et al. 2010a; Zhen et al. 2010). Vif was previously shown to bind APOBEC3F and APOBEC3G and to inhibit their activity by targeting these proteins for ubiquitination and proteasomal degradation (reviewed in Ross [2009]).
HIV-1 has thus evolved an effective strategy to circumvent the inhibitory activity of human cytidine deaminases. Vif or Vif-related proteins from different retroviruses show species specificity in their ability to degrade APOBEC3 molecules (Ross 2009), suggesting coevolution and genetic conflict. The extent to which genetic diversity in APOBEC3 genes affects susceptibility to HIV-1 infection and viral load control in humans is still poorly understood. Previous studies indicated that a nonsynonymous variant in APOBEC3G is associated with faster progression to AIDS. They also indicated that a rare intron 4 polymorphism in the same gene seems to confer an increased risk of HIV-1 infection (reviewed in Piacentini et al. [2009]). Other data mapped a locus conferring resistance to HIV-1 infection to chromosome 22q12–13, where the human APOBEC3 gene cluster is located (Kanari et al. 2005). Recent results also indicate that the levels and activity of APOBEC3 proteins modulate the progression of HIV infection, as they are associated with higher CD4 counts and lower viral set points (Land et al. 2008; Ulenga et al. 2008). Finally, significantly increased quantities of APOBEC3G in CD14+ cells were also shown to be present in HIV-exposed but uninfected individuals (Biasin et al. 2007).

Here we show that natural selection has shaped genetic diversity at APOBEC3H by driving the increase in frequency of haplotype I in Europeans and Asians; this Vif-resistant haplotype is associated with natural protection from HIV-1 infection in an Italian population.

Materials and Methods


Human genomic DNA from HapMap subjects was obtained from the Coriell Institute for Medical Research (20 individuals for each population: YRI, Yoruba; CEU, Europeans; EAS, East Asians). The APOBEC3H gene region we analyzed was PCR amplified and directly sequenced; primer sequences are available upon request. The genomic span of the resequenced region is chr22:37,825,628–37,828,074 (NCBI36/hg18 build). PCR products were treated with ExoSAP-IT (USB Corporation, Cleveland, OH). They were then directly sequenced on both strands with a BigDye Terminator sequencing kit (version 3.1, Applied Biosystems) and run on an Applied Biosystems ABI 3130 XL Genetic Analyzer (Applied Biosystems Italia, Monza, Italy). Sequences were assembled using AutoAssembler version 1.4.0 (Applied Biosystems) and inspected manually by two independent operators. Table S1 lists HapMap individual IDs and all variants/haplotypes identified in the APOBEC3H region we analyzed.


Blood samples were collected from 70 Italian HIV-exposed seronegative (HESN) individuals and their HIV-infected sexual partners. Inclusion criteria for HESN were as follows:
- a history of multiple unprotected sexual episodes for more than 4 years at enrollment;
- at least three episodes of at-risk intercourse within the 4 months prior to study entry;
- an average of 30 (range, 18 to >100) reported unprotected sexual contacts per year.

These subjects are part of a well-characterized cohort of serodiscordant heterosexual couples that has been followed since 1997 (Biasin et al. 2010). The study was reviewed and approved by the institutional review board of the S.M. Annunziata Hospital, Florence. Written informed consent was obtained from all subjects.

Variants in the APOBEC3H genomic region were genotyped in the HESN and control samples by direct sequencing, as described above. Specifically, we analyzed two regions:
- 386 bp in exon 2 (chr22:37,826,174–37,826,559), encompassing rs34522862;
- 282 bp in exon 4 (chr22:37,827,193–37,827,474), encompassing rs139297/rs139299.

The polymorphic 32-bp deletion at the CCR5 locus was typed using a PCR-based method, as previously described (Cagliani et al. 2010b).


Genotype data for APOBEC3G and APOBEC3F were obtained from the SeattleSNPs Variation Discovery Resource website (http://pga.mbt.washington.edu/). Both genes have been resequenced (with some minor gaps in introns) in 24 HapMap Yoruba and 23 HapMap CEU individuals. Genotype data for 5-kb regions from 238 resequenced human genes were derived from the NIEHS SNPs Program website (http://egp.gs.washington.edu). The NIEHS SNPs Program focuses on environmental response genes. Among the 647 genes included in the Program, we selected all those that have been resequenced in samples of defined ethnicity (NIEHS panel 2). From these data, genotype information for Yoruba, Asian, and European subjects was obtained.

Haplotypes were inferred using PHASE version 2.1 (Stephens et al. 2001; Stephens and Scheet 2005).

The following statistics were calculated using libsequence (Thornton 2003):
- Tajima's D (Tajima 1989);
- Fu and Li's D* and F* (Fu and Li 1993);
- the diversity parameters θW (Watterson 1975) and π (Nei and Li 1979).

Calibrated coalescent simulations were performed using the cosi package with 10,000 iterations and best-fit parameters for YRI, CEU, and EAS (Schaffner et al. 2005). Coalescent simulations were also run using multiple demographic models (Marth et al. 2004; Voight et al. 2005; Gutenkunst et al. 2009). In all cases, coalescent simulations were conditioned on recombination and mutation rates. Estimates of the population recombination rate parameter ρ were obtained with the Web application MAXDIP (http://genapps.uchicago.edu/maxdip/) and converted to cM/Mb. The maximum-likelihood-ratio Hudson–Kreitman–Aguadé (HKA) test was performed using the MLHKA software (Wright and Charlesworth 2004). We used 16 reference loci, as previously proposed (Fumagalli et al. 2009). Briefly, the 16 reference loci were randomly selected among NIEHS loci shorter than 20 kb that have been resequenced in the three populations. The only criterion was that Tajima's D did not suggest the action of natural selection, i.e., that Tajima's D was higher than the 5th and lower than the 95th percentile of the distribution for NIEHS genes.
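The conversion from ρ to cM/Mb is not detailed above. The Python sketch below illustrates one common approach. It assumes ρ = 4Ne·r, where r is the recombination rate per base pair per generation. The function name and the effective population size (Ne) used in the example are hypothetical; the value adopted in this study is not stated.

```python
def rho_to_cm_per_mb(rho_per_bp: float, ne: float) -> float:
    """Convert a population recombination rate into cM/Mb.

    Assumes rho = 4 * Ne * r, with rho and r expressed per base pair and
    per generation. `ne` is supplied by the caller (hypothetical here).
    """
    if ne <= 0:
        raise ValueError("effective population size must be positive")
    r_per_bp = rho_per_bp / (4.0 * ne)  # per-bp recombination rate per generation
    # 1 Mb = 1e6 bp and 1 Morgan = 100 cM, so cM/Mb = r_per_bp * 1e6 * 100
    return r_per_bp * 1e6 * 100.0


# Hypothetical example: rho = 8e-4 per bp and an assumed Ne of 10,000
print(rho_to_cm_per_mb(8e-4, 10_000))
```

With the assumed Ne of 10,000, a ρ of 8 × 10⁻⁴ per bp corresponds to approximately 2 cM/Mb. The result scales inversely with the assumed Ne, so the choice of Ne directly determines the reported map rate.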

The DIND (Derived Intraallelic Nucleotide Diversity) test was performed as previously proposed (Barreiro et al. 2009). Significance thresholds for each MAF interval were computed via coalescent simulations incorporating a demographic model (Schaffner et al. 2005).
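Frequency-dependent thresholds of this kind can be obtained by binning simulated (DAF, iπA/iπD) pairs by derived allele frequency and taking an upper quantile of the neutral ratios in each bin. The sketch below follows that approach under stated assumptions. The bin width (0.05), the 95th-percentile cutoff, and all function names are illustrative and are not parameters reported for this study. The simulated values in the example are synthetic.

```python
import math
from collections import defaultdict


def nearest_rank_quantile(values, q):
    """Nearest-rank empirical quantile (0 < q <= 1) of a non-empty list."""
    ordered = sorted(values)
    # Small epsilon guards against floating-point error in q * n (e.g. 0.95 * 100)
    k = max(1, math.ceil(q * len(ordered) - 1e-9))
    return ordered[k - 1]


def dind_thresholds(simulated, width=0.05, q=0.95):
    """Map each DAF bin index to the q-th quantile of neutral iπA/iπD ratios.

    `simulated` is an iterable of (derived_allele_frequency, ratio) pairs
    from neutral coalescent simulations (hypothetical input format).
    """
    bins = defaultdict(list)
    for daf, ratio in simulated:
        bins[int(daf / width)].append(ratio)
    return {b: nearest_rank_quantile(r, q) for b, r in bins.items()}


def is_significant(daf, ratio, thresholds, width=0.05):
    """True if an observed ratio exceeds the neutral threshold for its DAF bin."""
    threshold = thresholds.get(int(daf / width))
    return threshold is not None and ratio > threshold


# Synthetic neutral values: 100 simulated ratios, all with DAF of about 0.52
sims = [(0.52, float(v)) for v in range(1, 101)]
thr = dind_thresholds(sims)
print(thr[10])                            # 95.0 (bin covering DAF 0.50 to 0.55)
print(is_significant(0.53, 120.0, thr))   # True
```

An observed variant with no simulated values in its frequency bin is reported as not significant here. This is a conservative choice made for the sketch.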

The reduced-median network to infer haplotype genealogy was constructed using NETWORK 4.5 (Bandelt et al. 1999).

Because the chimpanzee reference sequence contains a gap in the APOBEC3H region, the orangutan sequence was used as the outgroup in all analyses. Sequence information for Pongo pygmaeus abelii was obtained from the UCSC website (http://genome.ucsc.edu/) and derives from the 2007 draft assembly (WUGSC 2.0.2/ponAbe2).

In order to test for gene conversion events, we applied Sawyer's gene conversion algorithm (Sawyer 1989) implemented in the GENECONV program. Significance was assessed using the approximate P-value method described in Karlin and Altschul (1990).

Haplotype association analyses were performed using PLINK (Purcell et al. 2007).

Data on expression quantitative trait loci (eQTLs) were retrieved from the eQTL Browser (http://eqtl.uchicago.edu/).


Results

The ability of APOBEC3 proteins to restrict HIV-1 infection in vitro has been demonstrated for family members 3G, 3F, and 3H (Conticello 2008; OhAinle et al. 2008; Li et al. 2010b; Zhen et al. 2010). Based on their tissue and cell-type distribution, these three genes have been suggested to be the most likely contributors to HIV-1 resistance in humans (Refsland et al. 2010). To study the evolutionary history of APOBEC3G, 3F, and 3H, we selected gene regions known to carry functional (or putatively functional) variants.

For both APOBEC3G and APOBEC3F, we analyzed regions covering exons 3 to 5. In APOBEC3G, this gene portion (3.1 kb) contains the R186H variant (in exon 4), which has previously been associated with accelerated AIDS progression (An et al. 2004). In APOBEC3F, each of the three exons harbors one nonsynonymous variant. Both genes have been resequenced in Yoruba (YRI) and Europeans (CEU) by the SeattleSNPs Variation Discovery Resource.

For APOBEC3H, the four major haplotypes found in human populations result from combinations of SNPs located in exons 2 and 3. Specifically, the deletion of residue 15 (N15Del; rs34522862) and the G105R (rs139297) variant were reported to affect protein stability (OhAinle et al. 2008). The K121D polymorphism, in contrast, is critical for Vif sensitivity/resistance (OhAinle et al. 2008; Li et al. 2010a; Fig. 1). We therefore focused on a 2.4-kb gene portion comprising exons 2 to 4, which we resequenced in YRI, CEU, and East Asians (EAS) from HapMap. A total of 26 variants were identified in the APOBEC3H region (Table S1), none of which was a novel nonsynonymous substitution. Among known variants, rs139293 (R18L) has a relatively high minor allele frequency (MAF) in CEU and EAS (27.5% and 12.5%, respectively). By contrast, its reported MAF in HapMap is very low in these populations (1.7% and 0%, respectively).
We therefore checked the electropherograms and resequenced heterozygous subjects twice, obtaining the same results. This discrepancy affects the distribution of major APOBEC3H haplotypes (see below).

Figure 1.

Schematic representation of APOBEC3F, APOBEC3G, and APOBEC3H. The exon–intron structure of APOBEC3F (A) and APOBEC3G (B) is shown together with the location of known nonsynonymous variants. The regions we analyzed are boxed within hatched lines. The shaded regions indicate resequencing gaps in SeattleSNPs data. The four major APOBEC3H haplotypes are shown in (C). They were designated as in previous works (OhAinle et al. 2008; Li et al. 2010a) and the region we resequenced is boxed.

For the three regions in APOBEC3G, 3F, and 3H, we calculated nucleotide diversity by means of two indexes:
- θW (Watterson 1975), an estimate of the expected per-site heterozygosity derived from the number of segregating sites;
- π (Nei and Li 1979), the average number of pairwise nucleotide differences per site.

To compare the values obtained for the APOBEC3 gene regions, we calculated θW and π for 5-kb windows (hereafter referred to as reference windows). These windows derive from 238 genes resequenced by the NIEHS program in the same population samples. The percentile rank of APOBEC3G, 3F, and 3H in the distribution of reference windows is reported in Table 1. Ranks were always above the 95th percentile for the APOBEC3H region, whereas APOBEC3F diversity was within the range of values calculated for reference windows. For APOBEC3G, unusually high θW and π were obtained for CEU and YRI, respectively.
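The minimal Python sketch below illustrates the two estimators and the percentile-rank comparison. It assumes an ungapped alignment of equal-length sequences with no missing data, so it does not reproduce the gap handling described in the Table 1 footnotes. The percentile rank is taken here, as an assumption, to be the fraction of reference windows with a value less than or equal to the observed one.

```python
from itertools import combinations


def watterson_a(n: int) -> float:
    """a_n = sum of 1/i for i = 1 .. n-1 (Watterson's normalizing constant)."""
    return sum(1.0 / i for i in range(1, n))


def segregating_sites(seqs):
    """Number of alignment columns carrying more than one allele."""
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def theta_w_per_site(seqs):
    """Watterson's theta: S / a_n, divided by alignment length."""
    return segregating_sites(seqs) / watterson_a(len(seqs)) / len(seqs[0])


def pi_per_site(seqs):
    """Mean pairwise nucleotide differences, divided by alignment length."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / len(pairs) / len(seqs[0])


def percentile_rank(value, reference):
    """Fraction of reference windows whose statistic is <= value."""
    return sum(r <= value for r in reference) / len(reference)


toy = ["AAAA", "AAAT", "AATT", "AATT"]  # hypothetical 4-sequence alignment
print(theta_w_per_site(toy), pi_per_site(toy))
```

This toy alignment has S = 2 over 4 sites, which gives θW ≈ 0.273 and π ≈ 0.292 per site. Table 1 reports the same quantities scaled by 10⁴.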

Table 1. Summary statistics, MLHKA test, and FST for the APOBEC3 gene regions we analyzed.

Gene | Pop.(a) | L(b) | N(c) | S(d) | θW(e) | Rank(g) | π(f) | Rank(g) | Tajima's D | Rank(g) | P(h) | Fu & Li's D* | Rank(g) | P(h) | Fu & Li's F* | Rank(g) | P(h) | MLHKA k | MLHKA P | FST vs CEU (rank) | FST vs EAS (rank)
3G | YRI | 3.1 | 48 | 20 | 15.26 | 0.94 | 17.92 | 0.96 | 0.56 | 0.89 | 0.06 | −0.15 | 0.57 | 0.30 | 0.12 | 0.68 |  |  |  | 0.18 (0.70) | n.a.(i)
3G | CEU | 3.1 | 46 | 18 | 13.86 | 0.96 | 8.74 | 0.77 | −1.17 | 0.14 | 0.91 | −1.88 | 0.08 | 0.92 | −1.95 | 0.08 | 0.93 | 2.3 | 0.19 | - | n.a.(i)
3F | YRI | 4.7 | 48 | 28 | 14.87 | 0.90 | 11.82 | 0.84 | −0.68 | 0.37 | 0.53 | −1.27 | 0.16 | 0.76 | −1.27 | 0.19 | 0.73 | n.p.(j) | n.p.(j) | 0.12 (0.48) | n.a.(i)
3F | CEU | 4.7 | 46 | 13 | 6.97 | 0.69 | 10.59 | 0.86 | 1.58 | 0.93 | 0.03 | 0.01 | 0.54 | 0.42 | 0.63 | 0.72 | 0.20 | n.p.(j) | n.p.(j) | - | n.a.(i)
3H | YRI | 2.4 | 40 | 26 | 24.98 | 0.99 | 29.35 | 0.99 | 0.59 | 0.89 | 0.06 | −0.20 | 0.55 | 0.37 | 0.79 | 0.67 | −0.21 | 2.57 | 0.018 | 0.36 (0.95) | 0.42 (0.97)
3H | CEU | 2.4 | 40 | 21 | 20.17 | 0.99 | 32.62 | >0.99 | 2.05 | 0.96 | 0.01 | 1.68 | 0.98 | <0.01 | 2.13 | 0.98 | <0.01 | 2.48 | 0.007 | - | 
3H | EAS | 2.4 | 40 | 24 | 23.06 | 0.99 | 29.85 | 0.99 | 0.99 | 0.79 | 0.20 | 1.42 | 0.97 | 0.024 | 1.51 | 0.96 | 0.047 | 3.33 | 0.003 | −0.008 (0.05) | -

(a) Population.
(b) Length (kb) of the analyzed region. APOBEC3G and APOBEC3F were resequenced by the SeattleSNPs Variation Discovery Resource with some minor resequencing gaps in intronic regions. The actual sizes of the resequenced regions therefore amount to 2.9 and 4 kb, respectively. Gaps were accounted for in all calculations.
(c) Sample size (chromosomes).
(d) Number of segregating sites.
(e) Watterson's θ estimate per site (×10⁻⁴).
(f) Nucleotide diversity per site (×10⁻⁴).
(g) Percentile rank relative to a distribution of 238 5-kb segments from NIEHS genes.
(h) P value from coalescent simulations incorporating demographic scenarios.
(i) n.a., not available.
(j) n.p., not performed.

Natural selection acting on specific gene regions can distort the site frequency spectrum (SFS). Common neutrality tests based on the SFS include Tajima's D (DT) (Tajima 1989) and Fu and Li's D* and F* (Fu and Li 1993). DT tests for departure from neutrality by comparing θW and π. Positive values indicate an excess of intermediate-frequency variants, and negative values indicate an excess of rare variants. Fu and Li's F* and D* are also based on the SFS. They differ from DT in that they also take into account whether mutations occur on external or internal branches of a genealogy.

Because population history, in addition to selective processes, is known to affect the SFS, we evaluated the significance of neutrality tests by performing coalescent simulations that incorporate demographic scenarios (see Methods and Table S2 for multiple models; Schaffner et al. 2005). As explained above, we also applied an empirical comparison by calculating the percentile rank of DT, F*, and D* for the APOBEC3 gene regions relative to the 5-kb reference windows.

Neutrality tests for APOBEC3H indicated departure from neutrality, with significantly positive values for most statistics in CEU and EAS (Table 1). In line with these findings, all three statistics calculated for APOBEC3H rank above the 95th percentile of the distribution of 5-kb reference windows in CEU, as do Fu and Li's D* and F* in EAS (Table 1). Conversely, SFS-based statistics for APOBEC3H yielded no significant value in the YRI sample, and no deviation from neutrality was observed for APOBEC3G and 3F (Table 1).
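For reference, DT standardizes the difference between the mean number of pairwise differences and S/a1, using the variance constants given by Tajima (1989). The Python sketch below implements that formula directly. It is illustrative only and omits the demographic simulations used above to assess significance.

```python
import math


def tajimas_d(n: int, s: int, k: float) -> float:
    """Tajima's D (Tajima 1989).

    n: number of sequences; s: number of segregating sites;
    k: mean number of pairwise differences (not divided by length).
    Returns NaN when there are no segregating sites.
    """
    if s == 0:
        return float("nan")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)
    # Numerator: pi-based minus Watterson-based estimate of theta for the region
    return (k - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


print(tajimas_d(4, 1, 3 / 6))  # a singleton in 4 sequences
print(tajimas_d(4, 1, 4 / 6))  # a variant carried by 2 of 4 sequences
```

Consider n = 4 sequences with one segregating site. A singleton (mean pairwise difference 0.5) yields a negative D, whereas a variant carried by two of the four sequences (mean pairwise difference 4/6) yields a positive D. This matches the interpretation of DT given above.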

Our data (Table 1) indicate that nucleotide diversity indexes are extremely high for APOBEC3H in all populations. Notably, APOBEC3G also shows an increase in θW and π, although less marked than that observed for 3H. Yet, polymorphism levels also depend on local mutation rates. Under neutral evolution, both within-species polymorphism and between-species divergence are proportional to the mutation rate, so their ratio is expected to be similar at all loci in the genome. The multilocus HKA test was developed to verify this expectation (Wright and Charlesworth 2004). We thus applied the MLHKA test, comparing polymorphism and divergence levels at the APOBEC3H and APOBEC3G genomic regions with those of 16 NIEHS genes resequenced in YRI, CEU, and EAS. These 16 loci were selected among those resequenced by the NIEHS Program as being shorter than 20 kb and displaying no overt evidence of natural selection (see Methods; Fumagalli et al. 2009). The APOBEC3H region displays a significant excess of polymorphism compared with divergence in all populations (Table 1). Conversely, no significant excess of intra- versus interspecies diversity is observed for APOBEC3G (Table 1).

As reported above, APOBEC3 genes are organized in a cluster on chromosome 22. This raises the possibility that nonallelic gene conversion between paralogs is responsible for the high nucleotide diversity observed in APOBEC3H and, to a lesser extent, in APOBEC3G. We therefore applied Sawyer's gene conversion algorithm (Sawyer 1989), which identified no region of gene conversion within the analyzed APOBEC3H region (see Methods). In the case of APOBEC3G, a region of apparent gene conversion with APOBEC3D was evident. In particular, the global P value was significant in a relatively small (510-bp) 3′ portion of the analyzed region.

Under neutral evolution, genetic differentiation between populations results from demographic and stochastic (genetic drift) effects. Natural selection can either increase or decrease population genetic differentiation, as allele frequencies may be driven to differ more, or less, than expected on the basis of neutral forces alone. We measured population genetic differentiation by means of FST (Wright 1950). Again, we compared the pairwise FST values between populations with those obtained for reference windows. As summarized in Table 1, FST between YRI and EAS is 0.42 for APOBEC3H, an unusually high value (above the 95th percentile). The same applies to the CEU/YRI comparison (0.36). Conversely, FST between CEU and EAS is significantly lower than expected (Table 1). For APOBEC3G and APOBEC3F (Table 1), no unusual FST value was obtained.
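The specific FST estimator is not stated beyond the citation of Wright (1950). As an illustration only, the sketch below computes a heterozygosity-based FST = (HT − HS)/HT for biallelic SNPs in two populations. It combines sites as a ratio of summed numerators and denominators. This uncorrected form cannot be negative, so the negative CEU/EAS value in Table 1 suggests that a sample-size-corrected estimator was used in practice. The code should therefore not be read as the method applied here.

```python
def expected_het(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic site."""
    return 2.0 * p * (1.0 - p)


def fst_components(p1: float, p2: float):
    """Return (H_T - H_S, H_T) for one SNP with allele frequencies p1 and p2."""
    h_t = expected_het((p1 + p2) / 2.0)                # pooled (total) heterozygosity
    h_s = (expected_het(p1) + expected_het(p2)) / 2.0  # mean within-population heterozygosity
    return h_t - h_s, h_t


def fst_region(freqs_pop1, freqs_pop2):
    """Combine SNPs in a region as a ratio of summed numerators and denominators."""
    num = den = 0.0
    for p1, p2 in zip(freqs_pop1, freqs_pop2):
        a, b = fst_components(p1, p2)
        num += a
        den += b
    return num / den if den > 0 else 0.0


# Hypothetical allele frequencies at two SNPs in two populations
print(fst_region([0.1, 0.5], [0.9, 0.5]))
```

In this example, the first SNP is strongly differentiated and the second is not, and the sketch returns 0.32 for the region.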

Overall, these data suggest that APOBEC3F and APOBEC3G are evolving neutrally in humans. APOBEC3H, in contrast, has evolved in response to selective forces that have acted with different strengths in the populations examined.


Further insight into the evolutionary history of a gene region can be gained by inferring haplotype genealogies. This has a descriptive purpose (i.e., showing the relationships among alleles and their distribution in human populations) and can also be used to infer the underlying selective scenarios. We reconstructed the APOBEC3H haplotype genealogy using a reduced-median network (Bandelt et al. 1999). This analysis revealed a relatively complex genealogy with four major haplotypes showing very different frequencies in the three populations (in agreement with the FST values we calculated; Fig. 2). Some recurrent mutations are also evident and possibly result from recombination/gene conversion events. The recombination rate in the region is relatively high, ranging from 1.2 to 2.7 cM/Mb depending on the population.

As reported above, the frequency of the R18L variant is higher than previously estimated, and we observed no CEU chromosome carrying the previously described haplotype III. All European haplotypes with the N15 deletion also harbor the 18L variant and were therefore classified as Hap IV (Fig. 2). As is evident from the network, Hap I represents a major, homogeneous haplotype in CEU and EAS. All European chromosomes carrying 105G (variant 21 in the network) and 121K (resulting from the derived allele at position 23 and the ancestral allele at position 24) are identical, with the exception of one chromosome that differs by a single variant. A similar observation applies to EAS chromosomes. In this population, however, a single 105G/121K chromosome is highly divergent, possibly as a result of a complex recombination/gene conversion event. Conversely, CEU and EAS chromosomes carrying the 105R and 121D alleles are split into different haplotype clades. This observation suggests that Hap I has rapidly increased in frequency in CEU and EAS owing to selection, leaving little time for neutral diversity to accumulate.
To test this hypothesis, we applied the DIND test, which has been shown to have high statistical power for selected alleles at frequencies lower than 70%, as in this case (Barreiro et al. 2009). The DIND test is based on the ratio of intraallelic nucleotide diversity associated with the ancestral and derived alleles (iπA/iπD), plotted against the derived allele frequency (DAF) (Fig. 3). A high iπA/iπD value for a variant with high DAF is suggestive of positive selection: the neutral diversity associated with the derived allele is limited despite its high frequency in the population. We applied the DIND test to APOBEC3H and assessed statistical significance by coalescent simulations that incorporate a demographic model (see Methods). Significantly high DIND values were obtained in CEU for rs139297 (105G) and rs139298/rs139299 (121K). In EAS, a significant DIND value was obtained only for rs139285 (variant 1 in the network). The failure to reach statistical significance for the 105G and 121K alleles is due to the presence of the recombinant EAS chromosome. Its removal from the dataset resulted in significantly high iπA/iπD values for these variants as well (Fig. 3B). Very similar results were obtained when coalescent simulations were performed using multiple demographic models (Fig. S1).
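The DIND statistic itself can be sketched as follows. The sketch assumes phased haplotypes encoded as equal-length strings and a derived allele already polarized against the outgroup. Intraallelic diversity is computed here as the mean number of pairwise differences among haplotypes carrying each allele. Returning infinity when all derived haplotypes are identical is an assumption of this sketch, not necessarily how Barreiro et al. (2009) handle that case. The encoding of the derived allele as "1" is also hypothetical.

```python
from itertools import combinations


def mean_pairwise_diff(haps, skip):
    """Mean pairwise differences among haplotypes, ignoring column `skip`."""
    pairs = list(combinations(haps, 2))
    if not pairs:
        return 0.0
    total = sum(
        sum(a != b for i, (a, b) in enumerate(zip(h1, h2)) if i != skip)
        for h1, h2 in pairs
    )
    return total / len(pairs)


def dind(haps, focal, derived="1"):
    """Return (derived allele frequency, iπA / iπD) for the SNP at index `focal`."""
    der = [h for h in haps if h[focal] == derived]
    anc = [h for h in haps if h[focal] != derived]
    i_pi_a = mean_pairwise_diff(anc, focal)
    i_pi_d = mean_pairwise_diff(der, focal)
    daf = len(der) / len(haps)
    if i_pi_d == 0:
        # Assumed convention: identical derived haplotypes give an unbounded ratio
        return daf, float("inf") if i_pi_a > 0 else float("nan")
    return daf, i_pi_a / i_pi_d


# Hypothetical haplotypes; the focal SNP is column 0 and "1" marks the derived allele
haps = ["1000", "1000", "1001", "0110", "0011", "0101"]
print(dind(haps, 0))
```

In this toy example, the three derived haplotypes differ on average at 2/3 sites and the three ancestral ones at 2 sites. The result is iπA/iπD = 3 at DAF = 0.5.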

Figure 2.

Genealogy of APOBEC3H haplotypes. The haplotype genealogy (A) was reconstructed through a reduced-median network. Each node represents a different haplotype, with the size of the circle proportional to frequency. Nucleotide differences between haplotypes are indicated on the branches of the network. Circles are color-coded according to population (green: YRI, blue: CEU, red: EAS). The most recent common ancestor (MRCA) is also shown (black circle). The relative position of mutations along a branch is arbitrary. Positions that may have been involved in recombination/gene conversion events are highlighted in different colors depending on whether they are located on branches leading to Haplotype I (red) or Haplotype II (blue).

Figure 3.

DIND test for APOBEC3H. The DIND test was performed for CEU (A) and EAS (B). P values were calculated through coalescent simulations that incorporate demographic scenarios (Schaffner et al. 2005). The continuous and hatched blue lines indicate the median and the 95th percentile, respectively. In the case of EAS, the test was performed using all chromosomes (green dots) and with the exclusion of the single divergent chromosome carrying Hap I (red dots).


We next verified whether APOBEC3H haplotypes modulate susceptibility to HIV-1 infection. Most humans are susceptible to the virus, but a minority of individuals do not seroconvert despite multiple exposures. We analyzed a cohort of well-characterized Italian heterosexual HESN individuals with a history of unprotected sex with their seropositive partners. The partners were used as the control population (HIV). No HESN was homozygous for the CCR5Δ32 variant, which in the homozygous state confers resistance to R5 HIV-1 strains (Samson et al. 1996). Four variants were typed in the HESN and HIV populations: rs139292 (N15Del), rs139297 (R105G), and rs139298/rs139299 (D121K). These variants allow discrimination of the major APOBEC3H haplotypes (Hap I, Hap II, and Hap III/IV). Hap III and Hap IV were pooled because R18L, the variant that distinguishes them, has never been shown to alter protein stability, antiviral activity, or Vif sensitivity (OhAinle et al. 2008; Li et al. 2010a; Zhen et al. 2010). As shown in Table 2, APOBEC3H haplotypes were differentially represented in HESN and HIV individuals. The most significant difference was an over-representation of haplotype I in HESN (P = 0.0056).

Table 2. Association analysis of APOBEC3H haplotypes in HESN and HIV-infected subjects.

Haplotype | Frequency (HESN) | Frequency (HIV) | χ2 | P
II | 0.1291 | 0.2231 | 4.099 | 0.0429
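For a single haplotype tested against all others, the association reduces to a 2 × 2 table of chromosome counts. PLINK works from genotype data and estimates haplotype frequencies internally. The minimal sketch below is therefore a simplification that assumes integer haplotype counts are already known. The counts in the example are hypothetical and are not the study's data.

```python
import math


def haplotype_chi2(hap_case, n_case, hap_ctrl, n_ctrl):
    """Pearson chi-square (1 df, no continuity correction) comparing the count
    of one haplotype versus all others between cases and controls.

    Counts are numbers of chromosomes. Returns (chi2, p).
    """
    observed = [[hap_case, n_case - hap_case], [hap_ctrl, n_ctrl - hap_ctrl]]
    row_totals = [n_case, n_ctrl]
    col_totals = [hap_case + hap_ctrl, (n_case - hap_case) + (n_ctrl - hap_ctrl)]
    total = n_case + n_ctrl
    if 0 in col_totals:
        raise ValueError("haplotype is absent (or fixed) in both groups")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 10 of 100 case chromosomes vs 20 of 100 control chromosomes
chi2, p = haplotype_chi2(10, 100, 20, 100)
print(round(chi2, 3), round(p, 4))
```

For these counts, the sketch gives χ2 ≈ 3.92 and P ≈ 0.048.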

Analysis of linkage disequilibrium (LD) over a genomic region that encompasses APOBEC3F, 3G, 3H, and CBX7 (Fig. 4) indicated that, in Europeans, the APOBEC3H SNPs we analyzed are not in LD with variants located in nearby genes. Inspection of known expression QTLs (eQTLs) in the region revealed that several eQTLs for APOBEC3H, APOBEC3G, and APOBEC3F are located within the LD block comprising the SNPs we analyzed.
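The r² values plotted in Fig. 4 measure pairwise LD between biallelic SNPs. A minimal sketch of the standard definition, r² = D²/[pA(1 − pA)pB(1 − pB)] with D = pAB − pApB, is shown below. It assumes phased haplotypes coded 0/1 and does not reproduce how the HapMap-derived plot was generated.

```python
def ld_r2(haps):
    """r^2 between two biallelic loci from phased haplotypes.

    `haps` is a list of (allele_at_locus1, allele_at_locus2) pairs coded 0/1.
    Returns NaN if either locus is monomorphic.
    """
    n = len(haps)
    p_a = sum(a for a, _ in haps) / n                        # frequency of allele 1 at locus 1
    p_b = sum(b for _, b in haps) / n                        # frequency of allele 1 at locus 2
    p_ab = sum(1 for a, b in haps if a == 1 and b == 1) / n  # joint 1-1 haplotype frequency
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return float("nan") if denom == 0 else d * d / denom


print(ld_r2([(1, 1), (0, 0), (1, 1), (0, 0)]))  # perfectly associated SNPs
print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))  # independent SNPs
```

Two perfectly associated SNPs give r² = 1. SNPs whose alleles combine independently give r² = 0.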

Figure 4.

Analysis of linkage disequilibrium. LD (r²) plot in CEU for a genomic region encompassing APOBEC3H and covering upstream and downstream genes. Data were derived from HapMap. The region where the three APOBEC3H coding variants are located is shaded (only rs139298 has been included in HapMap). The inset shows the LD block where rs139298 is located. Known eQTLs are indicated by circles and labeled with the APOBEC3 family member they affect. eQTLs have been mapped in lymphoblastoid cell lines (rs139297, rs139314, rs139316, rs139317), monocytes (rs139291, rs139294, rs139314), or fibroblasts (rs139298).

Discussion

Host–pathogen interactions are typically dynamic, vary spatially and temporally, and are believed to be among the major determinants of molecular evolution. On the one hand, the evolutionary history of humans (as that of most living organisms) has witnessed continuous waves of viral infections that contributed to shaping our repertoire of antiviral genes. On the other hand, newly emerged viral species have encountered human populations only very recently and have therefore had no time to leave a detectable selective signature. Rather, these new pathogens interact with a repertoire of antiviral genes and mechanisms that has been largely shaped by other infectious agents. Notably, it has recently become clear that selective events driven by unknown or extinct pathogens have the potential to affect susceptibility/resistance to modern infections (Johnson and Sawyer 2009; Emerman and Malik 2010).

Genes that modulate susceptibility to HIV-1 infection or progression to AIDS have been targets of balancing (Bamshad et al. 2002; Cagliani et al. 2010a; Cagliani et al. 2010b) or positive (Ortiz et al. 2009) selection in human populations. The selective pressure underlying the detected signatures, however, must have been exerted by pathogens other than HIV-1. It follows that analysis of the selection patterns of antiviral response genes might provide valuable information on the allelic determinants of susceptibility to modern infections. The results we present herein provide a further example of a gene, APOBEC3H, that is involved in HIV-1/host interaction and has been targeted by a relatively recent selective pressure. Surprisingly, our data indicate that positive selection has driven the frequency increase of an APOBEC3H haplotype that encodes proteins with limited stability and antiviral activity, which fail to localize within the HIV virion core (Ooms et al. 2010).

APOBEC3 genes have been targets of natural selection throughout primate evolutionary history (Sawyer et al. 2004; OhAinle et al. 2006). This indicates that diverse viruses in distinct species, over different time periods, have exerted selective pressure on these loci. In line with a previous report that analyzed a limited sample of human chromosomes (Zhang and Webb 2004), our data indicate that the APOBEC3G region we analyzed is evolving neutrally in human populations. This region includes the R186H variant, which has been associated with rapid progression to AIDS. The same applies to a region in APOBEC3F that carries three nonsynonymous variants. Conversely, our data strongly support the notion that natural selection has shaped the distribution of APOBEC3H haplotypes in humans.

High nucleotide diversity, a shift of the SFS toward intermediate-frequency alleles, and an excess of polymorphism relative to divergence are usually regarded as strong signatures of balancing selection (Charlesworth 2006). Yet, both balancing and positive selection begin with the spread of a newly selected allele (or haplotype) in a population. This spread continues until selection either opposes (balanced situation) or promotes (complete sweep) its fixation (Charlesworth 2006). In their initial stages, therefore, the two selective regimes are indistinguishable. Haplotype analysis and the DIND test suggest that Hap I has rapidly increased in frequency in Europeans and Asians. This observation is consistent with both balancing selection and an ongoing selective sweep at APOBEC3H. These results are unlikely to be due to the confounding influence of demography, as both empirical comparisons and multiple demographic models resulted in the rejection of neutrality.

OhAinle et al. (2008) previously noted that the distribution of APOBEC3H haplotypes differs markedly among human populations of different ancestry. This observation, together with the independent acquisition of two destabilizing variants (105G and N15Del), led the authors to suggest that APOBEC3H might be subject to natural selection. Such selection would possibly favor the spread of unstable haplotypes in Eurasia (loss-of-function hypothesis) and the selective preservation of haplotypes encoding stable proteins in Africa (OhAinle et al. 2008). Herein, we provide formal statistical evidence of natural selection acting on APOBEC3H. In agreement with OhAinle et al. (2008), our data suggest that one of the two variants resulting in decreased enzyme stability, namely glycine at residue 105, has been driven to high frequency by natural selection outside Africa. The deletion of residue 15 (N15Del), in contrast, seems to be evolving neutrally in all populations. An intriguing possibility is that old selective pressures in the ancestral African environment promoted the maintenance of APOBEC3H diversity, including the stable and effective Hap II. Out-of-Africa migrations might subsequently have exposed human populations to novel pathogen landscapes. There, despite the lower stability of the protein product, one or more infectious agents may have favored the spread of haplotype I.

The selection target may be the 121K allele, and glycine 105 may have risen in frequency in EAS and CEU through genetic hitchhiking. Residue 121 has been shown to be critical for Vif-mediated degradation of APOBEC3H, with a lysine at this position conferring full resistance (Li et al. 2010a). Therefore, lower protein stability might have been traded off for a greater ability to circumvent the hijacking exerted by Vif or by Vif-related proteins encoded by one or more unknown retroviruses. Alternatively, the selection target might not be an APOBEC3H coding variant but a regulatory polymorphism located in the region. Indeed, several eQTLs for three APOBEC3 family members (3G, 3F, and 3H) are in tight LD with rs139298 (which accounts for the D121K variant; Fig. 4). These SNPs have been reported to modulate the expression of APOBEC3 genes in monocytes, lymphoblastoid cell lines, and fibroblasts, suggesting that one of the eQTLs (or a combination of them) may represent the selection target and affect the expression of APOBEC3 genes in tissues that are relevant to the transmission of viral infections, including HIV-1. This possibility would also help reconcile our data with previous in vitro studies that described Hap II as encoding the protein most effective at restricting HIV-1, with both wild-type and vif-defective viruses (OhAinle et al. 2008; Li et al. 2010a; Zhen et al. 2010). Most analyses of the role of APOBEC3H in HIV-1 restriction have relied on infection experiments in cultured cells in which distinct protein products, derived from the major haplotypes, are expressed from exogenous promoters (OhAinle et al. 2008; Li et al. 2010a; Zhen et al. 2010). Although valuable, these analyses do not take into account the expression pattern of APOBEC3H (and of other APOBEC3 family members) under physiological conditions. Recent data have indicated that APOBEC3H is expressed in both the cervix and the colon (Refsland et al. 2010).
Whether APOBEC3H expression in these tissues is modulated by SNPs in the region under positive selection remains to be evaluated, but it is worth noting that antiviral mucosal responses might play a central role in protection from sexually transmitted HIV-1 infection (Haynes and Shattock 2008). Furthermore, APOBEC3H expression is strongly induced upon CD4+ T-cell activation, a finding that led Refsland et al. (2010) to suggest that transcriptional regulation of distinct APOBEC3 genes may have a central role in determining their effective contribution to HIV-1 resistance. In line with these findings, recent reports in experimental mouse models have indicated that in vitro studies might underestimate the role that antiviral factors play in vivo (reviewed in Ross [2009]). Despite these considerations, it is also worth mentioning that Hap I-derived APOBEC3H proteins, despite being less stable, are incorporated into HIV-1 virions at the same level as those derived from Hap II, but fail to localize within the virion core because they do not interact with the nucleocapsid (Ooms et al. 2010). This observation suggests that, whatever its expression level, Hap I-encoded APOBEC3H is expected to have limited antiviral activity against HIV-1. Yet previous data on APOBEC3G have suggested that the protein can block the infectivity of incoming HIV-1 virions without inducing editing (reviewed in Cullen [2006]); if this were also the case for APOBEC3H, Hap I might display some antiviral activity despite its mislocalization. Thus, we do not favor a scenario whereby selection operating on APOBEC3H represents a “less is more” instance (Olson 1999), as in that case we would not expect the selected Hap I to be associated with protection from HIV-1. Moreover, our LD analysis indicates that the SNPs accounting for the major APOBEC3H haplotypes form a single block that covers the gene and its flanking regions.
Therefore, the possibility that the association we observed is secondary to variants located in nearby genes can be ruled out. However, as mentioned above, the SNPs we analyzed are in tight LD with eQTLs for APOBEC3G, 3F, and 3H, suggesting that one or more selected regulatory variants may mediate resistance to sexually transmitted HIV-1 (e.g., through increased expression in mucosal tissues). Further experiments will be required to address this possibility. In particular, infection assays using peripheral blood cells from healthy donors genotyped for APOBEC3H, together with expression analyses in mucosal tissues, might be instrumental in clarifying the role of APOBEC3H haplotypes in modulating natural resistance to HIV-1.
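The argument above rests on pairwise linkage disequilibrium between SNPs. As an illustration only (not the software or block definition used in this study), the usual r² measure between two biallelic sites can be computed from phased haplotypes as r² = D² / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB and pA, pB, pAB are the frequencies of the '1' allele at each site and of the '1'/'1' haplotype. The sketch assumes haplotypes coded as '0'/'1' strings with no missing data; the function name is hypothetical.

```python
def r_squared(haplotypes, i, j):
    """Pairwise LD (r^2) between biallelic sites i and j.

    Assumes phased haplotypes coded as equal-length '0'/'1' strings;
    both sites must be polymorphic in the sample.
    """
    n = len(haplotypes)
    p_a = sum(h[i] == "1" for h in haplotypes) / n  # allele frequency, site i
    p_b = sum(h[j] == "1" for h in haplotypes) / n  # allele frequency, site j
    p_ab = sum(h[i] == "1" and h[j] == "1" for h in haplotypes) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


if __name__ == "__main__":
    print(r_squared(["11", "11", "00", "00"], 0, 1))  # perfectly associated sites
    print(r_squared(["11", "10", "01", "00"], 0, 1))  # independent sites
```

An r² close to 1 between a typed SNP and an eQTL means the two are nearly interchangeable as markers, which is why an association signal at one cannot, on its own, be attributed to the coding variant rather than the regulatory one.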

Although research on APOBEC3 genes has mostly focused on 3F and 3G, results herein demonstrate that 3H has been much more heavily targeted by selection during recent human evolution than the other two genes. Data indicating that 3H haplotypes are differentially represented in HIV-1-exposed seronegative (HESN) individuals reinforce the hypothesis that ancient events that selected gene variants are responsible for modulating resistance and susceptibility to current infectious agents.

Associate Editor: C. Burch


MC is supported by grants from Istituto Superiore di Sanità, “Programma Nazionale di Ricerca sull’AIDS”, the nGIN EC WP7 Project, the Japan Health Science Foundation, 2008 Ricerca Finalizzata [Italian Ministry of Health], 2008 Ricerca Corrente [Italian Ministry of Health], Progetto FIRB RETI: Rete Italiana Chimica Farmaceutica CHEM-PROFARMA-NET [RBPR05NWWC], and Fondazione CARIPLO.