The existence of organ-specific human immunodeficiency virus type 1 (HIV-1) populations within infected hosts has long been studied. Previous work established that population subdivision by organs occurs at the envelope (env) gene, but less is known about other genomic regions. Here, we used a population genetics approach to detect organ compartmentalization in proviral sequences of the HIV-1 gag and pol genes. Significant population structure was found in pol (100% of cases) and gag (33%) pairwise organ comparisons. The degree of compartmentalization positively correlated with the ratio of nonsynonymous to synonymous substitutions, and codons showing organ compartmentalization were more likely to be under significantly positive selection. This suggests that HIV-1 populations dynamically adapt to locally variable intra-host environments. In the case of the pol gene, differential penetration of antiretroviral drugs might account for the observed pattern, whereas for the gag gene, local selective pressures remain unexplored.
Progression of human immunodeficiency virus type 1 (HIV-1) infection is often accompanied by increasing viral diversity (Shankarappa et al. 1999). During highly active antiretroviral therapy, plasma HIV-1 RNA falls to undetectable levels. However, it has been shown that viral replication is not completely suppressed and that proviral DNA persists in latently infected cells (Chun et al. 1997; Finzi et al. 1999; Natarajan et al. 1999). Some of these cells are located in reservoirs, in which antiviral drugs are not efficient enough to extinguish the virus. Indeed, several studies have shown that most viral populations belonging to the same patient are compartmentalized and hence can have different evolutionary trajectories.
Although the most prominent case is that of the central nervous system, differences between samples taken from other body sites, such as blood and semen, blood and cervical swabs, or blood and lung have also been reported (reviewed in McGrath et al. 2001 and in Petito 2004). However, in some cases, nonstructured populations have also been found (Ball et al. 1994; Hughes et al. 1997; van der Hoek et al. 1998; van't Wout et al. 1998). Difficulties in drawing general conclusions from the above studies probably stem from the limited number of patients analyzed and the variety of statistical or phylogenetic methods used to detect compartmentalization.
Several molecular mechanisms can be involved in the maintenance of organ-structured viral populations. Resident macrophages are the main focus of infection in nonlymphoid organs and hence, cell tropism can account for differences between env sequences sampled from different organs (Ball et al. 1994; Korber et al. 1994; Reddy et al. 1996). Similarly, some viral epitopes can trigger local immune responses, thus creating spatially heterogeneous selective pressures (van't Wout et al. 1998). In addition, antiviral drug treatments can promote population differentiation, as suggested by comparisons between blood and brain (Wong et al. 1997; Gatanaga et al. 1999; Fellay et al. 2002; Strain et al. 2005), or blood and lymph nodes (Haddad et al. 2000). On their own, however, these data do not establish whether organ compartmentalization is the consequence of differential selective pressures. Physical barriers to virus migration or limited cell trafficking, as well as differences either in host-cell turnover or in viral replication rate, might also promote divergence of viral subpopulations (Itescu et al. 1994; Karlsson et al. 1998).
To overcome difficulties caused by the variety of methods used to detect compartmentalization and the reduced sample sizes, we previously analyzed sequence data from the env hypervariable V3 region from several studies (Sanjuan et al. 2004) and we used the classical FST statistic and the analysis of molecular variance (AMOVA) (Excoffier et al. 1992) to quantify population structure. The results indicated that population differentiation correlated with the intensity of positive selection between organs and hence were consistent with adaptive changes in cellular tropism. Yet, no evidence was found for or against restricted virus migration between organs. Because selection, unlike migratory restrictions, can be gene-specific, the previous finding of organ differentiation at the env gene does not necessarily imply that the same should hold for other regions of the genome. Moreover, recombination rates in HIV-1 are high enough to produce nearly independent evolution of env, gag, and pol genes (Morris et al. 1999; Philpott et al. 2005; Rhodes et al. 2005) and rapid dissipation of linkage disequilibrium (Kitrinos et al. 2005).
Because the env gene is a major determinant of cell tropism (Liu et al. 1990; Cann et al. 1992), sequencing effort has usually concentrated on this region, leaving other potentially important genes relatively unexplored. The goals of this study are to quantify organ compartmentalization at gag and pol genes and to ascertain whether selection plays a significant role in the establishment of this population structure.
Materials and Methods
Proviral gag and pol sequences were downloaded from the NCBI database (http://www.ncbi.nlm.nih.gov). Accession numbers were drawn from Hughes et al. (1997, patients P4, P5, and P6) and Morris et al. (1999, patients NA021 and NA234) for gag gene, and from Wong et al. (1997, patients A, B, C, and D) and Gatanaga et al. (1999, patients 1 to 4) for pol gene. In these datasets, there were at least two nonidentical sequences per organ and two organs per patient, which are necessary conditions to perform an FST analysis.
We detected several problems with the data from Gatanaga et al. (1999). First, a large portion of the sequences annotated as belonging to the pol gene from patient 2 were in fact env V3 sequences (for instance AB007272, AB007289, or AB007268). Second, some sequences from patient 4 were clearly divergent. After constructing a neighbor-joining tree, we found that they clustered together with pol sequences from patient 2 (not shown). We excluded the above env sequences and the pool of highly divergent sequences from patient 4. Table 1 shows relevant information about the sampled organs and the number of different sequences available from each patient.
Table 1. References, patients, and sampled organs for gag and pol regions. For each organ, the numbers of sequences that were analyzed are shown in parentheses.
Nucleotide alignments were performed with Clustal-X version 1.81 (Thompson et al. 1997) and edited manually to verify that reading frames were not disrupted.
POPULATION STRUCTURE ANALYSIS
To quantify population structure between pairs of organs of the same patient, we calculated FST statistics obtained from AMOVA (Excoffier et al. 1992). Briefly, the frequency of haplotype i in organ j can be written as $x_{ij} = x + a_j + b_{ij}$, where $a_j$ and $b_{ij}$ are the organ-specific and the within-organ haplotype-specific effects, respectively. These two factors have associated variances $\sigma_a^2$ and $\sigma_b^2$. Denoting by $\sigma_T^2$ the total variance among haplotypes within a patient, FST equals the ratio $\sigma_a^2/\sigma_T^2$ and can be estimated from a nested analysis of variance (ANOVA) as described in Schneider et al. (1997). The FST between each given pair of organs measures the level of population structuring. The statistical significance of FST was evaluated by a permutation test with 1000 pseudo-replicates. Because the P-values are themselves subject to statistical error, the standard error of the P-value associated with the FST statistic, denoted $SE_p$, was also obtained from the permutation tests. A statistic $S = P + t_{0.95,df} \times SE_p$ was calculated, where $t_{0.95,df}$ is the 95% confidence value of a t-distribution with $df = n_i + n_j - 1$ degrees of freedom and $n_i$ and $n_j$ are the sample sizes for organs i and j. Conservatively, FST can be considered significant if S < 0.05. These calculations were performed using ARLEQUIN version 3.01 (Excoffier et al. 2005) and SPSS version 12.0 (http://www.spss.com).
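The procedure above can be illustrated with a minimal Python sketch. It is not the ARLEQUIN implementation: it assumes that the number of pairwise nucleotide differences is used as the squared distance between haplotypes, that the P-value is the proportion of permutations giving an FST at least as large as the observed one (with a +1 correction), and that the standard error of P is the binomial one. The function names and the `t_crit` argument (the tabulated t value, supplied by the user) are hypothetical.

```python
import math
import random
from itertools import combinations


def hamming(a, b):
    """Number of differing positions between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def phi_st(groups):
    """AMOVA-style fixation index for a list of groups of aligned sequences.

    Assumption: pairwise differences are used as squared distances.
    Each group needs at least two sequences.
    """
    seqs = [s for g in groups for s in g]
    n_total, n_groups = len(seqs), len(groups)
    ssd_total = sum(hamming(a, b) for a, b in combinations(seqs, 2)) / n_total
    ssd_within = sum(
        sum(hamming(a, b) for a, b in combinations(g, 2)) / len(g) for g in groups
    )
    ms_among = (ssd_total - ssd_within) / (n_groups - 1)
    ms_within = ssd_within / (n_total - n_groups)
    # Effective group size for unequal sample sizes.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_among = (ms_among - ms_within) / n0
    var_total = var_among + ms_within
    return var_among / var_total if var_total > 0 else 0.0


def permutation_test(organ_i, organ_j, n_perm=1000, seed=0):
    """Return (observed FST, P-value, standard error of P)."""
    rng = random.Random(seed)
    observed = phi_st([organ_i, organ_j])
    pooled = list(organ_i) + list(organ_j)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign sequences to organs at random
        permuted = phi_st([pooled[:len(organ_i)], pooled[len(organ_i):]])
        if permuted >= observed - 1e-12:
            hits += 1
    p = (hits + 1) / (n_perm + 1)
    se_p = math.sqrt(p * (1 - p) / n_perm)  # binomial standard error (assumption)
    return observed, p, se_p


def conservative_s(p, se_p, t_crit):
    """S = P + t * SE_p; the comparison is called significant if S < 0.05."""
    return p + t_crit * se_p
```

With real data, each organ would be passed as a list of aligned sequences of equal length, and the resulting S compared against 0.05.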
SYNONYMOUS AND NONSYNONYMOUS SUBSTITUTION RATES
We calculated mean synonymous (dS) and nonsynonymous (dN) pairwise distances using Nei and Gojobori's method, modified with Jukes and Cantor's correction for multiple substitutions (Zhang et al. 1998). These analyses were conducted using MEGA version 3.1 (Kumar et al. 2004). The average pairwise ratio of nonsynonymous to synonymous substitutions (dN/dS) was calculated for each patient. For each organ i within each patient, an average intra-organ dN/dS was estimated using only the $n_i(n_i - 1)/2$ sequence pairs belonging to this organ. Similarly, for each organ pair (i, j), the average inter-organ dN/dS was obtained using only the $n_i n_j$ sequence pairs taken from different organs.
We also used the CODEML sitewise method to detect codons that were under significantly positive selection. First, for each set of sequences belonging to the same organ, we compared models of evolution using Modeltest version 3.7 (Posada and Crandall 1998). In all cases, the best model was Hasegawa-Kishino-Yano (HKY) (Hasegawa et al. 1985) with rates varying across sites according to a Gamma distribution. Then, consensus minimum evolution trees were built from 10,000 bootstrap pseudo-replicates with MEGA version 2.1 and, because this program does not implement the HKY model, we selected the slightly more general Tamura-Nei model (Tamura and Nei 1993). Finally, we ran CODEML within the freely available PAML package version 3.13 (Yang 1997, http://abacus.gene.ucl.ac.uk/software/paml.html). Model M3 was judged optimal among M0, M3, and M8 according to a likelihood ratio test. This allowed us to obtain for each codon the estimated ratios of nonsynonymous to synonymous substitutions (ω-values) and to test whether these ratios were significantly larger than one.
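As an illustration of how sequence pairs are partitioned into intra- and inter-organ sets, the following Python sketch enumerates the pairs and applies the Jukes-Cantor correction to a proportion of differences. It is only a sketch: the function names are hypothetical, and the average ratio is computed here as mean dN divided by mean dS, which is an assumption since the text does not state whether a ratio of means or a mean of ratios was used.

```python
import math
from itertools import combinations


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of observed differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def split_pairs(organ_of):
    """Partition all sequence pairs into intra- and inter-organ sets.

    organ_of maps a sequence identifier to the organ it was sampled from.
    Returns two dicts: organ -> pairs, and (organ_i, organ_j) -> pairs.
    """
    intra, inter = {}, {}
    for a, b in combinations(sorted(organ_of), 2):
        oa, ob = organ_of[a], organ_of[b]
        if oa == ob:
            intra.setdefault(oa, []).append((a, b))
        else:
            inter.setdefault(tuple(sorted((oa, ob))), []).append((a, b))
    return intra, inter


def mean_dn_ds(pairs, dn, ds):
    """Ratio of mean dN to mean dS over the given pairs (assumed definition).

    dn and ds map a pair (a, b) to its corrected distance.
    """
    mean_dn = sum(dn[p] for p in pairs) / len(pairs)
    mean_ds = sum(ds[p] for p in pairs) / len(pairs)
    return mean_dn / mean_ds
```

For an organ with three sequences and another with two, `split_pairs` yields 3 and 1 intra-organ pairs and 6 inter-organ pairs, matching the counts given by the two formulas in the text.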
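The likelihood ratio test used to compare nested codon models can be sketched as follows. This is a generic illustration, not output from CODEML: the log-likelihood values in the usage note are hypothetical, the degrees of freedom must be supplied by the user (four is commonly used when comparing M0 with a three-class M3, which is an assumption about the configuration here), and the chi-square tail probability is computed with a closed form that holds only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return the statistic 2*(lnL_alt - lnL_null) and its P-value."""
    stat = 2 * (lnl_alt - lnl_null)
    return stat, chi2_sf_even_df(max(stat, 0.0), df)
```

For example, `likelihood_ratio_test(-1234.5, -1220.0, 4)` would compare a hypothetical null model with a richer alternative; a small P-value favors the richer model.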
DRUG RESISTANCE ANALYSIS
Sequences from pol gene were tested for drug resistance mutations in the reverse transcriptase using the HIV Drug Resistance Database from the Stanford University site (http://hivdb.stanford.edu). We summed all resistance values for each site to obtain a total additive resistance value. Resistance values from Gatanaga et al. (1999) were directly taken from the original publication. Subsequent statistical analyses were performed with SPSS version 12.
Results
GAG SEQUENCES ANALYSIS
We calculated, for each of four patients, all FST statistics between pairs of organs. After introducing a conservative correction to account for inaccuracy in P-values, 7/21 FST comparisons were statistically significant at the 5% level. Five cases remained significant (24%) after Bonferroni multiple test correction.
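A Bonferroni correction of this kind amounts to comparing each P-value with the nominal level divided by the number of tests. The short Python sketch below shows the arithmetic; the function name is hypothetical, and whether the correction was applied to the raw P-values or to the conservative statistic S is not stated in the text, so the input is simply called `p_values`.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Flag each test as significant at the Bonferroni-adjusted level."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test significance level for n_tests comparisons."""
    return alpha / n_tests
```

With 21 comparisons and a nominal level of 5%, the per-test threshold is 0.05/21, or roughly 0.0024.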
We calculated dN/dS for each patient and we found that in all cases, selection was predominantly purifying. The across-patient average was significantly lower than one (dN/dS= 0.509 ± 0.035, n= 5, Wilcoxon's Z=−2.023, P= 0.043). For each patient, we calculated dN/dS at interorgan and intraorgan levels and we found no statistical differences between them (interorgan dN/dS= 0.499 ± 0.050, intraorgan dN/dS= 0.497 ± 0.054, n= 5, paired Wilcoxon's Z=−0.135, P= 0.893).
To assess whether compartmentalization could be attributed to differential selective pressures, we divided the 21 organ pairwise comparisons into two groups according to the statistical significance of FST. As can be seen in Figure 1, the average interorgan selection was more relaxed for organ pairs showing population structure (dN/dS= 0.577 ± 0.070) than for those lacking population structure (dN/dS= 0.418 ± 0.025; Z =−2.164, n= 21, P= 0.031). In contrast, at the intraorgan level, there were apparently no differences in dN/dS between these two groups (dN/dS= 0.471 ± 0.056 and dN/dS= 0.452 ± 0.029, respectively; Z=−0.480, P= 0.689). However, the results were not fully conclusive, because differences between the two groups were not reproducible after using Bonferroni-corrected P-values.
To explore the relation between organ-specific selection and compartmentalization in further detail, we first sought to determine which codons were under significantly positive selection, for each organ within each patient. Then, we calculated FST indexes and their statistical significance on a per-site basis. Codons in which at least one position showed significant FST were classified as compartmentalized. Among noncompartmentalized codons, 25.5% showed evidence for positive selection, whereas among compartmentalized codons, this percentage increased to 42.9%. Codons that were both under positive selection and compartmentalized were 48.9% more frequent than expected by chance (Table 2). This statistically significant association between compartmentalization and positive selection (Fisher's exact test, P= 0.022) further suggests that adaptation is organ specific.
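The association test on a 2 × 2 table of codon counts can be sketched in Python as follows. This is a plain implementation of the two-sided Fisher's exact test together with the expected count under independence; the function names are hypothetical and no counts from Table 2 are reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def expected_count(a, b, c, d):
    """Expected value of cell a if rows and columns were independent."""
    return (a + b) * (a + c) / (a + b + c + d)


def excess_over_expected(a, b, c, d):
    """Relative excess of the observed count in cell a over its expectation."""
    return a / expected_count(a, b, c, d) - 1
```

Here `a` would be the number of codons that are both compartmentalized and under positive selection, `b` compartmentalized only, `c` selected only, and `d` neither. A value of 0.489 from `excess_over_expected` would correspond to codons being 48.9% more frequent than expected by chance.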
Table 2. Association between compartmentalization and positive selection. For each category, observed and expected (in parentheses) numbers of codons are shown. Counts from all patients and studies are pooled together. To avoid bias, only variable codons were included in the analysis. Most codons were polymorphic at only one of the three nucleotide sites. Codons that were polymorphic at more than one nucleotide site were classified as compartmentalized if at least one position showed a significant FST. Indeed, in most cases, FST significance was concordant for the two or three polymorphic sites and, in the rare nonconcordant cases, inspection of the sequences clearly revealed that the same nucleotide site was the target of compartmentalization and positive selection. Codons were classified as being under positive selection if the ω-value was significantly larger than one in at least one organ.
POL SEQUENCES ANALYSIS
We calculated, for each of eight patients, FST statistics between pairs of organs. All 14 FST statistics were significant even after introducing a conservative correction to account for inaccuracy in P-values, indicating strong organ compartmentalization. After applying the Bonferroni correction for multiple tests, 12 out of 14 comparisons remained significant.
We calculated dN/dS for each of the eight patients and we found that selection was predominantly purifying (dN/dS < 1) in six of them, though the mean value across patients did not significantly differ from neutrality (dN/dS= 0.722 ± 0.187, Wilcoxon's Z=−1.260, P= 0.208), most probably due to the low sample size. We then computed, for each patient, dN/dS at the interorgan and intraorgan levels. In 13 out of 14 cases, purifying selection was stronger at the intraorgan level (interorgan dN/dS= 0.658 ± 0.147, intraorgan dN/dS= 0.481 ± 0.080, paired Wilcoxon's Z=−2.542, P= 0.011).
To determine whether compartmentalization was explained by selection, we correlated FST between each organ pair with dN/dS at the intra- and interorgan levels (Fig. 2). The correlation between FST and interorgan dN/dS was significantly positive, despite the presence of a clear outlier (Spearman's correlation ρ= 0.670, n= 14, P= 0.009, Fig. 2A). In contrast, the correlation between FST and within-organ dN/dS was weaker and not significant (ρ= 0.451, n= 14, P= 0.106, Fig. 2B). The results remained essentially the same after the removal of the outlier (interorgan ρ= 0.665, n= 13, P= 0.013; within-organ ρ= 0.418, n= 13, P= 0.156).
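Spearman's correlation is the Pearson correlation of the ranks, which is why a single outlier has limited influence on it. The following Python sketch computes the coefficient with average ranks for ties; the function names are hypothetical and the significance test is omitted.

```python
import math


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rank correlation between x and y."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting, `x` would hold the FST of each organ pair and `y` the corresponding interorgan (or within-organ) dN/dS.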
As above, the association between organ-specific selection and compartmentalization was further explored using a sitewise method. Among noncompartmentalized codons, 17.3% were under significantly positive selection, whereas among compartmentalized codons, this percentage increased to 31.3%. Codons that showed both compartmentalization and significantly positive selection were 56.3% more frequent than expected by chance (Table 2). This significant association (Fisher's exact test, P= 0.022) indicates that selection is involved in organ compartmentalization.
DRUG RESISTANCE SCORES
A possible explanation for organ-specific adaptation is that differences in drug penetrability or stability between different compartments could create differential selective pressures acting on the pol gene. As a consequence, susceptibility to antiretroviral therapy should vary across organs. To explore this possibility, we obtained resistance scores for pol sequences belonging to each of the eight patients from Wong et al. (1997) and Gatanaga et al. (1999). All patients had been treated with zidovudine (AZT) but the therapy failed due to the emergence of resistance. Patients A and D from Wong et al. (1997) and patient 1 from Gatanaga et al. (1999) had also been intermittently treated with ddI and/or ddC but no primary mutations for resistance to these drugs were found. We thus focused on predicted AZT-resistance scores.
A nested ANOVA, accounting for the effects of the study (two studies), the patient (eight patients), and the organ (spleen, lymph nodes, cerebrospinal fluid, and brain), detected marginally significant differences between studies (F1,6= 5.190, P= 0.062) and no differences between patients (F6,11= 1.673, P= 0.217), whereas differences between organs were evident (F11,250= 33.368, P < 0.001). These differences remained fully significant when samples from the nervous system (cerebrospinal fluid and brain) were pooled together (F10,250= 36.420, P < 0.001). For data drawn from Wong et al. (1997), over all patients, samples from the nervous system had a lower predicted resistance to AZT than samples from the rest of the body (nervous system score = 16.39 ± 2.458, n= 71; rest of the body score = 23.80 ± 2.126, n= 86; Mann-Whitney U=−2.217, P= 0.027), though these results could not be generalized to the whole dataset.
After carrying out a sitewise analysis, we found AZT-resistance mutations in 2.7% of noncompartmentalized codons, and in 9.4% of compartmentalized codons. Codons that were both compartmentalized and had some resistance mutations were 2.3 times more likely than expected by chance (Fisher's exact test, P= 0.028).
Discussion
Using the FST statistic, specifically conceived for the analysis of population structure, we detected significant compartmentalization in the vast majority of pairwise organ comparisons involving the pol gene and, to a lesser extent, also in the gag gene. These results extend our previous findings with the env V3 hypervariable region (Sanjuan et al. 2004) and seem to indicate that in HIV-1, organ compartmentalization is genome-wide. In all three regions, there was a correlation between the ratio of nonsynonymous to synonymous substitutions and the degree of compartmentalization.
Does the link between dN/dS and FST prove the involvement of selection in organ compartmentalization? In a large nonstructured population splitting into two subpopulations, without environmental changes or subsequent adaptation in either population, FST could increase through time, but with no change in dN/dS. Therefore, if compartmentalization arose as a mere consequence of restrictions to gene flow, no correlation between FST and dN/dS would be expected. However, if some environmental changes occurred, whether before or after the split, there would be a transient increase in within-organ dN/dS, which would remain detectable until the fixation of the new adaptive mutations. At the interorgan level, unless perfect parallel evolution occurred, changes in dN/dS could also be observed and, importantly, the latter would remain recorded, therefore producing a positive correlation between FST and interorgan dN/dS. Also, if environmental changes occurred in only one subpopulation, the other could be thought of as a viral reservoir in which ancestral genotypes would be archived. This would create a temporal rather than spatial structuring, but otherwise, the interpretation of FST and dN/dS values would remain the same. Importantly, the predictions would be the same if gene flow was unrestricted.
In all cases, environmental changes in at least one body site would change the local selective pressures and the genetic composition of the resident viral population. Therefore, in the absence of drift, the finding of a positive correlation between inter-organ dN/dS and FST and a weaker, if any, positive correlation between within-organ dN/dS and FST strongly supports the role of selection in organ compartmentalization.
Despite a correlation between the dN/dS ratio and the degree of compartmentalization in env, gag, and pol regions, there is an important difference between env and the other two genes. The former is under predominantly positive selection (dN/dS > 1), whereas the latter are under predominantly negative selection (dN/dS < 1). In a setting with generalized positive selection, terminal branches of intrahost genealogies might harbor recent slightly deleterious mutations, whereas deep branches might accumulate most of the adaptive changes. As a consequence, interorgan dN/dS would tend to be higher than within-organ dN/dS. This effect would be more pronounced for increasing FST values, and thus could produce an artefactual correlation between FST and interorgan dN/dS values. However, this artefact can be ruled out for the gag and pol genes because there is no generalized positive selection. Because dN/dS ratios are lower than one, this artefact, if present, would produce a negative correlation between FST and between-organ dN/dS values.
However, alternative interpretations, some of them nonselective, are also possible in the case of generalized negative selection. First, our observation that compartmentalization is associated with relaxed negative selection can be thought of as the consequence of positive selection acting on some codons in some organs. These sites would show a high dN/dS ratio, hence elevating the average ratio. Alternatively, compartmentalization could merely be the consequence of restricted viral migration between organs. This would reduce the effectiveness of selection in purging deleterious mutations from subpopulations, and the presence of drift would make the dN/dS ratio tend to one. However, there are two observations that, without discarding the role of restricted gene flow and drift, strongly suggest that selection is necessary to explain the observations. First, under the nonselective hypothesis, compartmentalization should inflate dN/dS values both within and between organs, but the data only provide evidence for an increase in dN/dS between organs. Second, a sitewise analysis shows that codons showing compartmentalization are more likely to be under positive selection than noncompartmentalized codons.
For the gag gene, little is known about the putative local environmental variations that could shape population structure, but for the pol gene, there exists the possibility that organs and cell types differ in drug penetrability or degradation rates (Taylor et al. 2001; Fellay et al. 2002; Solas et al. 2003). Despite the fact that drug absorption and elimination can substantially vary across individuals (Telenti et al. 2002), we found consistent differences in resistance scores between organs. For one of the datasets (Wong et al. 1997), samples isolated from the nervous system had a higher predicted susceptibility to AZT, consistent with reduced drug penetration through the blood-brain barrier. In a related study (Sheehy et al. 1996), it was found that proviral isolates from blood had predominantly AZT-resistant genotypes, whereas brain samples consisted mainly of sensitive genotypes. Similarly, differences between cerebrospinal fluid and blood (Di Stefano et al. 1995; Venturi et al. 2000) were reported. Recently, Strain et al. (2005) compared levels of drug resistance in plasma and cerebrospinal fluid RNA isolates from 18 individuals and, in 50% of cases, resistance values were significantly lower in the latter. This pattern was associated with higher levels of env C2-V5 region compartmentalization and intraorgan positive selection in some specific residues of this region. Differences in drug penetrability could also be responsible for compartmentalization in organs other than the brain, as suggested by differences in resistance scores between rectal mucosa and plasma (Monno et al. 2003). Interestingly, some authors (Cunningham et al. 2000) have suggested that drug-resistance genotypes could originate in body compartments where drug concentration is relatively low, because this would allow viral replication and hence favor the fixation of genotypes with gradually increasing levels of resistance, which would rapidly spread to organs where drug penetration is efficient.
Whereas the above data indicate that local differences in drug concentration can promote the evolution of genetically distinct pol variants and hence produce compartmentalization, this does not necessarily exclude the possibility that other selective factors might contribute to the evolution of compartmentalization in the pol gene. To address this question, population structure analyses should be conducted in untreated patients. Available data are insufficient to perform such analysis, but some clues were provided in the recent paper by Strain et al. (2005). Consensus pol sequences were used to evaluate the resistance profiles of plasma and cerebrospinal fluid samples and these same samples were tested for organ compartmentalization using C2-V5 env sequences. Population structure was detected in 11/13 treated patients, but only in 2/5 untreated patients.
In all, our results indicate that sequences from pol gene and, to a lesser extent, sequences from gag gene show organ compartmentalization and that, without discarding nonadaptive factors, there is evidence showing that selection contributes to generating this population structure. Differences in antiviral drug concentrations are only one of several possible mechanisms by which selective pressures might vary across organs.
Associate Editor: S. Elena
AVB and FMC contributed equally to this work. AVB was supported by a predoctoral fellowship from the CSAI. FMC was supported by the CSIC Bioinformatics I3P program and a Marie Curie postdoctoral fellowship (Transfer of Knowledge EU FP-6, project number 014436). RS was the recipient of a CSIC I3P postdoctoral contract and was supported by grant GV06/031 from the Generalitat Valenciana.