Characteristics and growth of the genetic HIV transmission network of Mexico City during 2020

Abstract Introduction Molecular surveillance systems could provide public health benefits to focus strategies to improve the HIV care continuum. Here, we infer the HIV genetic network of Mexico City in 2020, and identify actively growing clusters that could represent relevant targets for intervention. Methods All new diagnoses, referrals from other institutions, as well as persons returning to care, enrolling at the largest HIV clinic in Mexico City were invited to participate in the study. The network was inferred from HIV pol sequences, using pairwise genetic distance methods, with a locally hosted, secure version of the HIV‐TRACE tool: Seguro HIV‐TRACE. Socio‐demographic, clinical and behavioural metadata were overlaid across the network to design focused prevention interventions. Results A total of 3168 HIV sequences from unique individuals were included. One thousand one hundred and fifty (36%) sequences formed 1361 links within 386 transmission clusters in the network. Cluster size varied from 2 to 14 (63% were dyads). After adjustment for covariates, lower age (adjusted odds ratio [aOR]: 0.37, p<0.001; >34 vs. <24 years), being a man who has sex with men (MSM) (aOR: 2.47, p = 0.004; MSM vs. cisgender women), having higher viral load (aOR: 1.28, p<0.001) and higher CD4+ T cell count (aOR: 1.80, p<0.001; ≥500 vs. <200 cells/mm3) remained associated with higher odds of clustering. Compared to MSM, cisgender women and heterosexual men had significantly lower education (none or any elementary: 59.1% and 54.2% vs. 16.6%, p<0.001) and socio‐economic status (low income: 36.4% and 29.0% vs. 18.6%, p = 0.03). We identified 10 (2.6%) clusters with constant growth, for prioritized intervention, that included intersecting sexual risk groups, highly connected nodes and bridge nodes between possible sub‐clusters with high growth potential.
Conclusions HIV transmission in Mexico City is strongly driven by young MSM with higher education level and recent infection. Nevertheless, leveraging network inference, we identified actively growing clusters that could be prioritized for focused intervention with demographic and risk characteristics that do not necessarily reflect the ones observed in the overall clustering population. Further studies evaluating different models to predict growing clusters are warranted. Focused interventions will have to consider structural and risk disparities between the MSM and the heterosexual populations.


I N T R O D U C T I O N
The fast intra-host evolution of HIV leads to the accumulation of genetic diversity [1][2][3][4]. This high diversity makes it possible to infer local HIV transmission networks. Annotating the observed genetic clusters with demographic, clinical and behavioural metadata could enrich the design of focused prevention interventions, guiding the allocation of scarce resources to produce higher public health impact [5][6][7]. Several approaches for the prioritization of HIV genetic clusters for intervention have been reported, ranging from identification of rapidly growing clusters [8], clusters with incident cases [9,10], individuals with specific transmission risk characteristics, such as high network connectivity, use of injectable drugs, belonging to specific age groups or with specific sexual practices [11][12][13][14][15], to the development of combined transmissibility scores [6] or modelling [14,16,17,18]. The HIV epidemic in Mexico is concentrated in key populations [19][20][21][22][23][24][25]. Although heterosexual transmission plays an increasingly important role at the country level, the epidemic in the metropolitan zone of Mexico City remains highly concentrated in men who have sex with men (MSM) [22]. The Condesa Clinic, with two operational branches within Mexico City, is one of the largest HIV care facilities in Latin America, with over 18,000 clients. The clinic is also a major centre for HIV testing in central Mexico, diagnosing nearly 3500 new infections in 2020 (92% cisgender men, 6% cisgender women and 2% transgender men), which represent over a fourth of all new HIV diagnoses in the country and 70-80% of diagnoses in the metropolitan area of Mexico City [26]. Since 2016, the Center for Research in Infectious Diseases of the National Institute of Respiratory Diseases (CIENI/INER), a reference laboratory for HIV genotyping, has worked with the Condesa Clinic to perform HIV drug-resistance surveillance [27].
Baseline HIV genotyping is not standard-of-care in Mexico [28] and this collaboration represents an additional effort to improve the HIV care continuum locally.
Here, we infer the HIV genetic network of Mexico City in 2020, and identify and describe actively growing clusters that could represent relevant targets for intervention.

M E T H O D S
Study population
All persons newly admitted at Condesa Clinic, including new diagnoses, referrals from other institutions and persons returning to care, were invited to participate. These inclusion criteria were defined according to the World Health Organization recommendations for performing pre-treatment drug-resistance surveillance [29]. Participants answered a computer-based, self-administered questionnaire, including socio-demographic, clinical and behavioural metadata. Paper-based questionnaires were available for persons preferring not to use the computer-based option. Participants then donated a blood sample for HIV genotyping. Enrolment took place from July 2019 to December 2020, excluding April to June 2020, when HIV services were temporarily restricted due to the coronavirus disease 2019 (COVID-19) epidemic. The study was approved by the INER institutional review board (project code E02-20). Participants provided written informed consent to use sequencing data both for HIV drug resistance and HIV genetic network studies.

HIV sequencing
A fragment, including the complete HIV gag and pol genes (5462 bp; HXB2 positions: 769-6231), was amplified and sequenced using standard next-generation sequencing (NGS) techniques (Illumina, San Diego, CA) (see Supplementary Methods). Reads were filtered and assembled using HyDRA Web (Public Health Agency of Canada) [30,31]. Consensus sequences were generated at a 20% variant-frequency threshold and used in HIV drug resistance and clustering analyses. This threshold has previously been shown to provide excellent agreement between NGS and standard Sanger sequencing [32].

Clustering analyses
Clusters were defined by a genetic distance matrix method, using Seguro HIV-TRAnsmission Cluster Engine (Seguro HIV-TRACE), a locally adapted and secured version of the HIV-TRACE tool [34]. Seguro HIV-TRACE, like the Secure HIV-TRACE implemented for U.S. public health departments, permits the analysis and storage of HIV transmission clusters accessible only to registered users. The HIV-TRACE display was translated into Spanish and adapted to include the variables of interest and fulfil the local data security requirements (see Supporting Information). Clusters were defined as sequences with pairwise Tamura-Nei 93 genetic distance <0.015. This threshold has been previously demonstrated to be in line with the expected divergence between sequences within an individual [35] and in accordance with the genetic distance between named HIV-risk partners [36,37]. Georeferencing of participants according to municipality and zip code of residence was performed according to the National Institute of Statistics and Geography (INEGI) coding, using the program QGIS v3.16.
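The threshold-based cluster definition can be illustrated with a minimal sketch. It assumes the pairwise TN93 distances have already been computed, and it groups sequences into connected components of pairs below the 0.015 cutoff. The function name `find_clusters`, the sequence identifiers and the toy distances are hypothetical and are not part of Seguro HIV-TRACE.

```python
THRESHOLD = 0.015  # TN93 distance cutoff stated in the text


def find_clusters(distances, threshold=THRESHOLD):
    """Return (clusters, links) from precomputed pairwise distances.

    `distances` maps a pair of sequence IDs to their genetic distance.
    A link is any pair below the threshold; a cluster is a connected
    component of linked sequences with at least two members.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    links = []
    for (a, b), dist in distances.items():
        root_a, root_b = find(a), find(b)
        if dist < threshold:
            links.append((a, b))
            parent[root_a] = root_b  # merge the two components

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    clusters = [members for members in groups.values() if len(members) >= 2]
    return clusters, links


# Hypothetical toy distances, for illustration only.
toy_distances = {
    ("s1", "s2"): 0.004,
    ("s2", "s3"): 0.012,
    ("s1", "s3"): 0.020,  # above the cutoff, but s1 and s3 still cluster via s2
    ("s3", "s4"): 0.031,
    ("s4", "s5"): 0.014,
    ("s1", "s6"): 0.050,  # s6 remains unclustered
}
clusters, links = find_clusters(toy_distances)
```

In this toy input, s1 and s3 fall in the same cluster without sharing a direct link, which is why cluster membership and link counts are reported separately.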
To identify clusters with active growth, we divided the study period into five 3-month stages and looked for new clustering cases during each stage: (1) July-September 2019, (2) October-December 2019, (3) January-March 2020, (4) July-September 2020 and (5) October-December 2020; from April to June 2020, no enrolment took place due to limitations imposed by the COVID-19 sanitary emergency. Clusters with constant growth were defined as those with at least one node added across each stage; new clusters, as those formed during the last two stages; clusters with growth reactivation, as those with at least one node added during the last stage and no growth in previous stages; and clusters with no recent growth as those with no nodes added during the last stage.
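The growth categories above can be sketched as a small classification rule. This is only one reading of the definitions, under three assumptions that are not stated in the text: founding nodes count as nodes added in their stage, a new cluster is one whose nodes were all enrolled in stages 4 or 5, and any other cluster with a node in the last stage is treated as growth reactivation. The function name `classify_growth` and the example clusters are hypothetical.

```python
N_STAGES = 5  # the five 3-month enrolment stages listed in the text


def classify_growth(node_stages, n_stages=N_STAGES):
    """Classify one cluster from the enrolment stage (1..n_stages) of each node."""
    stages = set(node_stages)
    last = n_stages
    if stages == set(range(1, n_stages + 1)):
        # At least one node in every stage.
        return "constant growth"
    if min(stages) >= last - 1:
        # Assumption: all nodes enrolled during the last two stages.
        return "new cluster"
    if last in stages:
        # Assumption: older cluster, not constant, with a node in the last stage.
        return "growth reactivation"
    return "no recent growth"


# Hypothetical clusters: each list holds the enrolment stage of every node.
examples = {
    "A": [1, 2, 3, 4, 5, 5],
    "B": [4, 5],
    "C": [1, 1, 5],
    "D": [1, 2, 2],
}
labels = {name: classify_growth(stages) for name, stages in examples.items()}
```

The order of the checks matters: a cluster is tested for constant growth first, so it is never relabelled as new or reactivated.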

Statistical analyses
Data were systematically collected and stored in a local server. The comparison and selection of multivariable models is shown in Table S1. All analyses were performed using STATA v16.

R E S U L T S
[...] to nucleoside reverse transcriptase inhibitors, 3.5% to protease inhibitors and 0.9% to integrase inhibitors (Figure 1). The number of links per node ranged from 1 to 12 with 242/1361 (18%) of edges belonging to dyads. Considering clusters with three or more individuals, the most common type of genetic link was between two cisgender men (1064/1119, 95.1%) and the least frequent was between two cisgender women (1/1119, 0.1%). Transgender persons were members of 18/1119 (1.6%) links.
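The link counts reported here can be cross-checked against the totals given in the Abstract (3168 sequences, 1150 clustering sequences, 1361 links, 386 clusters). The short sketch below only recomputes percentages from numbers already stated in the text. It relies on the fact that a dyad contains exactly one link, so the number of dyad links equals the number of dyads. The helper `pct` is a hypothetical name.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


total_sequences = 3168
clustering_sequences = 1150
total_links = 1361
total_clusters = 386
dyad_links = 242  # one link per dyad, so also the number of dyads

# Links in clusters with three or more individuals.
larger_cluster_links = total_links - dyad_links

clustering_rate = pct(clustering_sequences, total_sequences)
dyad_link_share = pct(dyad_links, total_links)
dyad_cluster_share = pct(dyad_links, total_clusters)
men_men_share = pct(1064, larger_cluster_links)
transgender_share = pct(18, larger_cluster_links)

# Average number of links per clustering node (each link touches two nodes).
mean_links_per_node = round(2 * total_links / clustering_sequences, 2)
```

The denominator of 1119 used for link types is therefore the 1361 total links minus the 242 dyad links, and the 242 dyads correspond to the 63% of clusters reported as dyads.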

Characteristics of individuals within clusters
Comparing clustering versus non-clustering individuals, we observed significant differences in median age (Table 2). Assessing crude associations with belonging to clusters, cisgender men had higher odds of clustering than cisgender women (odds ratio [OR]: 2.52, p<0.001) and MSM had higher odds of clustering than heterosexual men (OR: 1.90, p = 0.006). Persons with recent infection, higher log viral load, higher CD4+ T cell count, higher education and using venues for sex had higher odds of clustering. Persons spending more time at home (compared to work), and of older age, had lower odds of clustering (Table 2). After adjustment for confounders, lower age (adjusted odds ratio [aOR]: 0.37, p<0.001; >34 vs. <24 years), being an MSM (aOR: 2.47, p = 0.004; vs. cisgender women), being a transgender woman (aOR: 3.81, p = 0.005; vs. cisgender women), having higher viral load (aOR: 1.28, p<0.001) and having higher CD4+ T cell count (aOR: 1.80, p<0.001; ≥500 vs. <200 cells/mm3) remained associated with higher adjusted odds of clustering (Table 3). No associations were found stratifying the model by sexual risk categories. Belonging to larger clusters (>5 nodes) among clustering individuals was associated only with age (aOR: 0.55, 0.33-0.90, p = 0.02; >34 vs. <24) after adjusting for sexual risk, viral load and CD4+ T cell count.

We observed a decreasing trend in clustering rate towards the end of 2020 after the onset of the COVID-19 pandemic (fifth vs. third period: p<0.0001) (Figure 3). The proportion of participants with at least one link with genetic distance <0.5% did not vary significantly across time stages. We observed a significant increase in mean age comparing clustering individuals in the second (29.8 years, standard deviation [SD] 8.9) versus the fifth stage (31.6 years, SD 9.3) (p = 0.002).
Considering clusters with three or more nodes, the most frequent links included persons of the third age quartile (28-34 years) (31.4%), followed by the first (<24 years) (30.2%). We observed 201/1361 (14.8%) links between persons with 10 or more years of difference in age. The proportion of links with genetic distance <0.5% varied across age quartiles: 23%, 22%, 17% and 29% in persons <24, 24-27, 28-34 and >34 years, respectively, and was significantly higher in persons >34 years compared to all other age strata (p<0.05). The average number of links per node (grade, i.e. node degree) showed a decreasing trend by age quartiles for MSM (2.9, 2.3, 2.4 and 1.7), but not for cisgender women (1. [...]

The percentages of cisgender women in clusters with no recent growth, recent formation, growth reactivation and active growth were 3.1%, 2.8%, 2.1% and 3.2%, respectively; for transgender women, 1.2%, 3.4%, 0.9% and 3.2%, respectively. The percentages of persons with CD4+ T cell counts under 200 cells/mm3 were 31.0%, 46.3%, 31.5% and 30.1%; and the median age was 28, 29, 28 and 26, respectively. Among the 10 clusters with constant growth, seven involved only cisgender men, two included cisgender women and men, and one included transgender women, cisgender women and men, which suggests sexual risk intersectionality (Figure 5). Clusters 16 and 164 included a majority of cisgender men <24 years. Clusters 164, 17 and 1 included high-grade nodes and cluster 164 showed the highest number of links with genetic distance <0.5% (11 links). Cluster 164 included highly connected nodes formed by older MSM with low genetic distance links to much younger (at least 10 years) MSM, while cluster 17 included highly connected nodes formed by MSM <24 years. Individuals forming cluster 219 were concentrated in two municipalities of Mexico City, while individuals forming the remaining actively growing clusters showed higher geographical dispersion, with seven of the clusters including persons residing in Mexico State.
Also relevant is the presence of clusters with apparent bridge nodes linking possible sub-clusters (66, 62 and 1) (Figure 5 and Figure S1).
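The notions of node grade (number of links per node) and bridge node can be made concrete with a small sketch. It assumes, as one possible operational definition not given in the text, that a bridge node is a node whose removal splits its cluster into more components (an articulation point). The edge list is a hypothetical toy cluster, and the function names are illustrative.

```python
def build_adjacency(edges):
    """Build an undirected adjacency map from a list of links."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def count_components(adj, removed=None):
    """Count connected components, optionally ignoring one removed node."""
    seen, count = set(), 0
    for start in adj:
        if start == removed or start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for neighbour in adj[node]:
                if neighbour != removed and neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
    return count


def bridge_nodes(adj):
    """Nodes whose removal increases the number of components."""
    base = count_components(adj)
    return sorted(n for n in adj if count_components(adj, removed=n) > base)


# Hypothetical cluster: two triangles that share node "c".
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("c", "e"), ("d", "e")]
adj = build_adjacency(toy_edges)
degree = {node: len(neighbours) for node, neighbours in adj.items()}
mean_degree = sum(degree.values()) / len(degree)
bridges = bridge_nodes(adj)
```

The brute-force removal test is quadratic in cluster size, which is acceptable for clusters of the sizes reported here (2 to 14 nodes).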

D I S C U S S I O N
We implemented an HIV molecular surveillance system, Seguro HIV-TRACE, to construct and monitor the HIV genetic network in Mexico City. The population was enrolled at the largest primary HIV care facility in Mexico. Sampling density was high, with more than a third of the participants belonging to clusters. Clustering rate was similar to or even higher than that observed in other large metropolitan areas [12,14]. However, sampling density in highly populated, geographically delimited settings such as Mexico City could be improved with network-informed strategies for in-depth sampling of high-risk populations, such as MSM [38]. In our study, enrolment bias could exist, as additional diagnoses, mainly in persons with formal employment residing in municipalities where social security clinics are available, could be missed. Also, not all persons diagnosed at the clinic were enrolled due to logistic complications and staff availability, mainly in the afternoon shift. However, comparing our study database with the database of the clinic, we estimate a significant improvement of the enrolment percentage, from approximately 40% at the beginning of the study period (late 2019), to nearly 80% at the end (late 2020), which could explain differences in sample size in the same months of different years along the study period (Figure 3), and could affect completeness of the network. In spite of this, the socio-demographic characteristics of the study participants matched expectations, according to previous reports [19,20,22,27]. The overall clustering rate was significantly reduced after the onset of the COVID-19 pandemic. This observation could reflect behavioural change, mostly within MSM, associated with social distancing or, alternatively, a change in diagnosis trends, with a reduced proportion of young MSM attending the clinic for an HIV test, given the discrete but significant increase in average age of clustering individuals observed towards the end of the study period.
Nevertheless, HIV detection and clinical services at the Condesa clinics remained open during the COVID-19 sanitary emergency, providing substantial support for other clinics and hospitals that suspended these services. This could have caused subtle changes in the demographics of the serviced population that might explain our observations [26].
After adjusting for confounders, we observed that local HIV transmission and network growth were strongly driven by young MSM with recent infection and higher education, which supports previous observations by our group [27], is consistent with the highly concentrated epidemic in the region, and suggests a clear population for immediate intervention. In agreement with previous reports [22], the socio-demographic characteristics of cisgender women and heterosexual cisgender men in the network contrast with those observed in MSM: the former were generally older, with lower education and lower socioeconomic status than the latter.
Leveraging network inference to suggest more focused interventions, we identified 10 clusters of immediate interest, given their constant growth across the study period with an average incorporation of two nodes per trimester. Prevention interventions, including pre-exposure prophylaxis, not yet widely available in Mexico, could be most cost-effective when focused on persons associated with specific transmission chains represented by clusters such as the ones selected here. Moreover, some of these actively growing clusters represent opportunities for intervention that would be missed if based only on the majoritarian clustering population characteristics. For example, cluster 66 illustrates a transmission chain combining sexual risk categories, including cisgender men (both MSM and heterosexual men), cisgender women and transgender women. This opens the opportunity of targeting contacts of connecting nodes between sexual risk groups, as well as possible hard-to-reach individuals not present in the cluster, but suggested by less frequently expected links such as between two transgender women. Also, clusters with high-grade nodes with different characteristics such as 164, including highly connected older MSM with low genetic distance links to much younger (at least 10 years) MSM, and cluster 17 with highly connected MSM <24 years could represent opportunities to break important transmission chains with the involvement of nodes of unusual characteristics. Also relevant is the presence of clusters with apparent bridge nodes linking sub-clusters with different assortativity characteristics (66, 62 and 1). This type of cluster topology has been previously identified as a possible target for public health intervention [39].

Figure legend: The colour of the nodes corresponds to attributes described in the legends of each panel, including classification by growth, age, enrolment period and CD4+ T cell counts. Links are coloured by genetic distance. The network was inferred using Seguro HIV-TRACE as described in the Methods. Clusters with active growth are defined as those in which at least one node was added during each time stage (trimester); clusters with recent growth include new clusters appearing during the last two enrolment stages; clusters with growth reactivation include clusters in which at least one node was added during the last stage and no growth was observed in previous stages; and clusters without growth include those for which no nodes were added during the last stage. Age is expressed in years and CD4+ T cell counts are expressed in cells/mm3. Abbreviation: TN93, Tamura-Nei genetic distance.
The present study has important limitations. As explained, enrolment bias could exist by including only one clinic and by missed opportunities of enrolment, even if this clinic concentrates the majority of local diagnoses. Although the demographics of the studied population suggest high representativeness with respect to the overall population of persons living with HIV in Mexico City, completeness of the network could be improved by increasing the proportion of newly diagnosed persons participating in the study. Importantly, the completeness of the metadata could be improved, as many variables had a high proportion of missing data. Missing data make it difficult to further discuss relationships and intersectionality of variables in the context of the network, limiting our conclusions to general overall observations for the clustering population. This is especially true for the sexual risk composite variable, for which nearly 60% of the data were missing, mainly limiting distinction between heterosexual cisgender men and MSM. Additionally, the captured metadata did not allow us to identify persons returning to care, and information on previous exposure to antiretroviral treatment was incomplete and did not allow us to study ART defaulters as a separate group, nor their role within the network. Information bias could also exist, as many participants refused to answer sensitive or private questions given that many had just received their diagnosis. We were able to improve response rates towards the end of the study period by increasing the number of staff overseeing enrolment and by adjusting the moment along the process of linkage to care in which potential participants are invited to participate in the study.

C O N C L U S I O N S
The present study suggests that HIV transmission in the metropolitan zone of Mexico City is strongly driven by links between young MSM with higher education level, higher viral load and higher CD4+ T cell counts, suggesting an immediate target group for intervention. Nevertheless, leveraging network inference, we identified actively growing clusters which could be prioritized for focused intervention, that included intersecting sexual risk groups, highly connected nodes with demographic and risk characteristics that do not necessarily reflect the ones observed in the overall clustering population and bridge nodes between possible subclusters with high growth potential. Further studies testing different models to predict growing clusters and evaluating the effectiveness of focused intervention based on prediction of priority clusters versus interventions based on overall characteristics of the clustering population are warranted. Furthermore, interventions will have to consider structural and risk disparities between the MSM and heterosexual populations.

C O M P E T I N G I N T E R E S T S
The authors declare no competing interests.

A U T H O R S ' C O N T R I B U T I O N S
SAR, GRT and AGR conceived, planned and designed the study. CGM, VDC and ELO analysed the data. MMF performed HIV sequencing. HEPJ implemented the electronic questionnaire for data collection. CGM, DTT, DMLS and PIH coordinated sample collection and processing. VDC, AB, MS, MBR, PGE and IMG coordinated participant enrolment and data collection. SW and JOW designed and implemented Seguro HIV-TRACE for clustering analyses. SAR wrote the manuscript. All authors have read and approved the final manuscript.

A C K N O W L E D G E M E N T S
We thank Edna H. Rodríguez, Ramón Hernández-Juan and the CIENI/INER Virology Diagnostic Laboratory staff for performing HIV viral load and CD4+ T cell count tests. We thank AIDS Healthcare Foundation (AHF) Mexico for their feedback and support.

S U P P O R T I N G I N F O R M A T I O N
Additional information may be found under the Supporting Information tab for this article: Table S1. Comparison and selection of multivariable models. Figure S1. Summary of characteristics of clusters with active growth.