Extent of polymorphism and selection pressure on the Trypanosoma cruzi vaccine candidate antigen Tc24

Abstract. Introduction: Chagas disease, caused by the protozoan parasite Trypanosoma cruzi, is a major public health problem in the Americas, and existing drugs have severe limitations. In this context, a vaccine would be an attractive alternative for disease control. One of the difficulties in developing an effective vaccine lies in the high genetic diversity of T. cruzi. In this study, we evaluated the level of sequence diversity of the leading vaccine candidate Tc24 in multiple parasite strains. Methods and Results: We quantified its level of polymorphism within and between T. cruzi discrete typing units (DTUs) and how this potential polymorphism is structured by different selective pressures. We observed a low level of polymorphism of Tc24 protein, weakly associated with parasite DTUs, but not with the geographic origin of the strains. In particular, Tc24 was under strong purifying selection pressure and predicted CD8+ T‐cell epitopes were mostly conserved. The strong conservation of Tc24 may be associated with structural/functional constraints to preserve EF hand domains and their calcium‐binding loops, and Tc24 is likely important for parasite fitness. Discussion: Together, these results show that a vaccine based on Tc24 is likely to be effective against a wide diversity of parasite strains across the American continent, and further development of this vaccine candidate should be a high priority.

Vaccination has the advantages of relying on short administration regimens, and the induction of multiple effector mechanisms against the pathogen may have high efficacy to control the infection and lower the possibilities of resistance (Bahloul et al., 2003; Boyer et al., 1997; Lai, Pakes, Ren, Lu, & Bennett, 1997; Lodmell & Ewalt, 2001; Lowrie et al., 1999). During the last decade, several vaccine types have been found immunogenic and protective in mouse models, providing proof-of-concept data on the feasibility of a preventive or therapeutic vaccine to control a T. cruzi infection (see for review Quijano-Hernandez & Dumonteil, 2011). However, one of the difficulties in developing an effective vaccine lies in the high levels of genetic variability of T. cruzi, which may lead to antigenic variability and immune evasion of some parasite strains (Haolla et al., 2009). Indeed, T. cruzi has been divided into seven discrete typing units (DTUs, TcI-VI) based on molecular markers (Telleria & Tibayrenc, 2017; Zingales et al., 2012), including two hybrid lineages (TcV and TcVI), and one found mostly in bats (TcBat) (Marcili et al., 2009; Ramírez et al., 2014). Therefore, it remains essential to identify how this genetic diversity is distributed in the endemic regions and to consider its impact on antigenic diversity for vaccine and diagnostic development.
Accordingly, the aim of this study was to evaluate in detail the extent of Tc24 diversity in multiple T. cruzi parasite strains and DTUs.
To do so, we quantified its level of polymorphism within and among T. cruzi DTUs from multiple countries, and evaluated how its polymorphism may be structured by selective evolutionary pressures. Such analyses have been found important to assess forces driving protein evolution (Bitencourt Chaves et al., 2017; Kumar et al., 2018).

| Tc24 sequences
Raw sequence reads from whole genome sequencing projects from 32 T. cruzi strains were obtained from the NCBI Sequence Read Archive database for analysis, as well as five annotated genome sequences obtained from the TriTryp database (Table 1). These strains covered TcI to TcVI DTUs, although TcI was over-represented, and originated from multiple countries across the Americas.
Tc24 nucleotide sequences were extracted from the reads of T. cruzi strain genomes by rapid sequence mapping at a medium-low sensitivity using the software Geneious 9.1. The aligned Tc24 sequence reads were annotated for variants using the SNP/FreeBayes function of Geneious (Garrison & Marth, 2012). Every significant change in the sequences was recorded to generate lists of Tc24 nucleotide sequence variants for each T. cruzi strain. For analysis of copy number, we used the annotated genomes of the Dm28c (TcI) and TCC (TcVI) strains, which were obtained by long-read sequencing on a PacBio single-molecule real-time platform, and represent some of the most complete genome assemblies currently available for T. cruzi (Berná et al., 2018).
These genomes were searched for Tc24 sequence using BLAST, and only matches including the full-length coding sequence of Tc24 were considered. Similar BLAST searches of other assembled T. cruzi genomes were also performed. We calculated nucleotide diversity (π) and haplotype diversity (Hd) for Tc24 sequences. All the Tc24 nucleotide sequences were translated to the corresponding protein sequences using the software Geneious 9.1. Amino acid sequences were aligned using MUSCLE (Edgar, 2004a, 2004b), and phylogenetic trees were created using the maximum-likelihood method as implemented in PhyML. To determine whether there is a DTU or country effect in structuring Tc24 protein diversity, we compared phylogenetic distances (pairwise genetic distances) among nodes within and between different groups through a nonparametric Wilcoxon test using R software 3.6.1. We further tested for a spatial structure by evaluating isolation by distance through a Mantel test with 10,000 permutations.
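As an illustration of the two diversity statistics mentioned above, the following minimal sketch computes nucleotide diversity (π, the average proportion of pairwise differences per site) and haplotype diversity (Hd, Nei's estimator) from a set of aligned sequences. It is not the software used in the study; the function names and the toy alignment are hypothetical, and gaps or ambiguous bases are not handled.

```python
from collections import Counter
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average number of pairwise differences per site (pi)."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # Count mismatching sites over every unordered pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2))
        for s1, s2 in combinations(seqs, 2)
    )
    n_pairs = n * (n - 1) // 2
    return total_diffs / (n_pairs * length)


def haplotype_diversity(seqs):
    """Nei's haplotype diversity: Hd = n/(n-1) * (1 - sum(p_i^2))."""
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(seqs)  # identical sequences form one haplotype
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Toy alignment (hypothetical data, not Tc24 sequences).
toy_alignment = ["ATGGCA", "ATGGCA", "ATGACA", "ATGACT"]
pi = nucleotide_diversity(toy_alignment)
hd = haplotype_diversity(toy_alignment)
print(f"pi = {pi:.4f}, Hd = {hd:.4f}")
```

In this toy alignment, three haplotypes are present among four sequences, so Hd is high while π stays low because the haplotypes differ at only two sites.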

| Analysis of selection pressures
Analysis of selection pressures on Tc24 nucleotide sequences was performed in MEGA software (version 10.0.4). We performed a single-likelihood ancestor counting (SLAC) analysis, which uses a combination of maximum-likelihood (ML) and counting approaches to infer nonsynonymous (dN) and synonymous (dS) substitution rates on a per-site basis for a given coding alignment and corresponding phylogeny. This method assumes that the selection pressure for each site is constant along the entire phylogeny, and statistical significance is ascertained at each site using an extended binomial distribution (Kosakovsky Pond & Frost, 2005). We also performed a McDonald-Kreitman (MK) test to assess selection among T. cruzi Tc24 genes (Egea, Casillas, & Barbadilla, 2008). For estimates of divergence, we used a closely related T. rangeli Tc24 sequence (accession #KC544829).
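The logic of the MK test can be sketched as a 2x2 contingency comparison of nonsynonymous and synonymous changes that are polymorphic within a species (Pn, Ps) or fixed between species (Dn, Ds). The example below is an illustrative sketch only: it uses a two-sided Fisher exact test written with the standard library and the neutrality index NI = (Pn/Ps)/(Dn/Ds), which may differ from the exact statistic reported by the tool cited above. All counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as observed.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def mk_test(pn, ps, dn, ds):
    """Return (neutrality index, Fisher p-value) for MK counts."""
    if ps == 0 or dn == 0 or ds == 0:
        ni = float("nan")  # index undefined when a denominator is zero
    else:
        ni = (pn / ps) / (dn / ds)
    # Rows: nonsynonymous, synonymous; columns: polymorphic, fixed.
    p_value = fisher_exact_two_sided(pn, dn, ps, ds)
    return ni, p_value


# Hypothetical counts, for illustration only.
ni, p_value = mk_test(pn=2, ps=10, dn=8, ds=4)
print(f"NI = {ni:.2f}, p = {p_value:.4f}")
```

Under the usual reading, NI above 1 points to an excess of nonsynonymous polymorphism (consistent with purifying selection on slightly deleterious variants), whereas NI below 1 points to an excess of nonsynonymous fixed differences.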

| Epitope identification
We identified the Tc24 protein epitopes able to bind to the HLA-I alleles reported as more frequent in the Mexican mestizo population. Finally, the number of peptides that can be recognized by each of the alleles evaluated and their location in the amino acid sequence of the corresponding protein was determined (Doytchinova, Guan, & Flower, 2006).

| Mapping of amino acid variants on 3D protein structure
We used the previously determined 3D structure of T. cruzi Tc24 protein (Wingard et al., 2008) to map the position of sites under significant selection pressure, and assess potential structural and functional constraints on the protein. Molecular graphics and visualization of residues under selection pressure were performed with UCSF Chimera (Pettersen et al., 2004).

| RESULTS
We analyzed the full genome sequences currently available from 37 T. cruzi strains, to identify a total of 367 Tc24 nucleotide sequences, corresponding to 96 unique Tc24 protein sequences (211 amino acids) with 1 to 7 variant protein sequences per strain/genome.
Most strains (28/37) had two sequence variants, two had a unique Tc24 protein sequence, and some (8/37) presented 3-7 sequence variants. Phylogenetic analysis of this intra-strain sequence diversity showed two clear clusters of sequences, with a similar level of sequence diversity irrespective of the DTU of the strains (Figure 1ad sequence diversity appeared to be focused on a limited number of sites within the protein, which is otherwise highly conserved among T. cruzi strains. The four cysteine residues that were mutated to serine in our vaccine antigen to facilitate its large-scale production process (C4, C66, C74, and C124) corresponded to highly conserved residues.
We next analyzed potential selective pressures on Tc24. To further understand the selection pressure on Tc24, we assessed whether the codons under diversifying selection were located within potential epitopes with a high probability of HLA binding. We predicted 25 Tc24 protein epitopes with a high probability of binding to class I HLA alleles, with six binding to HLA-A*02, six to HLA-A*24, five to HLA-B*35, and eight to HLA-B*39 (Figure 3b).
In addition, some epitopes were predicted to have a high probability of binding to more than one HLA allele, such as peptide RLDEFTSGV, which can bind to alleles A*02 and B*39, and peptide EFLEFRLML, which can bind to alleles A*02 and A*24. Furthermore, the protein sequence between amino acids 109 and 136 included multiple overlapping predicted epitopes for several HLA alleles, and corresponds to a conserved region of the protein. Detailed analysis of 17 nonredundant predicted epitopes indicated that only four (23%) had amino acids subject to significant diversifying selection (Figure S1), while seven (41%) had amino acids subject to significant purifying selection, and an additional six (35%) were conserved but without significant selection. Thus, selection pressure for immune evasion could explain part of the diversifying selection detected on some Tc24 residues. In addition, four predicted epitopes included a cysteine residue that was mutated to serine in Tc24-C4 antigen, corresponding to C66 in one predicted epitope and C124 in three overlapping epitopes.
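The classification of predicted epitopes by the selection acting on their residues can be sketched as a simple interval-overlap check. The sketch below is only an illustration: the protein sequence, peptides, and selected sites are hypothetical toy data, the function names are invented for this example, and the rule that diversifying selection takes precedence when an epitope contains both kinds of sites is an assumption rather than a rule stated in the study.

```python
def locate_peptide(protein, peptide):
    """Return the 1-based inclusive (start, end) of a peptide, or None."""
    idx = protein.find(peptide)
    if idx < 0:
        return None
    return idx + 1, idx + len(peptide)


def classify_epitope(span, diversifying_sites, purifying_sites):
    """Label an epitope by the selected sites falling within its span."""
    start, end = span
    positions = set(range(start, end + 1))
    # Assumption: diversifying sites take precedence over purifying ones.
    if positions & diversifying_sites:
        return "diversifying"
    if positions & purifying_sites:
        return "purifying"
    return "no significant selection"


# Toy data (hypothetical; not the Tc24 sequence or its real epitopes).
toy_protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
toy_peptides = ["AYIAKQRQI", "SFVKSHFSR", "QLEERLGLI"]
toy_diversifying = {6}
toy_purifying = {10, 25}

labels = {}
for pep in toy_peptides:
    span = locate_peptide(toy_protein, pep)
    labels[pep] = classify_epitope(span, toy_diversifying, toy_purifying)
    print(pep, span, labels[pep])
```

Tallying the labels over a nonredundant epitope list gives the kind of breakdown reported above (epitopes with diversifying sites, with purifying sites, or conserved without significant selection).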
Finally, we assessed Tc24 structural/functional constraints that may contribute to the selection pressure detected on the protein by mapping the amino acids under purifying and diversifying selection onto the 3D structure of Tc24 (Wingard et al., 2008).
Importantly, 17/35 sites under purifying selection (49%) were distributed within the four EF hand domains of the protein, with four of these sites located within the Ca2+-binding loops (Figure 4), suggesting some important constraints to conserve these functional domains. On the other hand, only one of the seven sites under diversifying selection (14%) was located in one of the EF hand domains (EF3), and the remaining six sites were spread within some of the α-helices of Tc24, but not within these critical domains (Figure 4). Thus, functional/structural constraints on Tc24 protein appear to contribute at least in part to the overall strong purifying selection acting on the protein.

FIGURE 3 Selective pressure and CD8+ T-cell epitopes in Tc24 antigen. (a) Selective pressures on Tc24 protein, expressed as dN-dS ratio, were determined by SLAC analysis. Statistically significant selection pressure is highlighted in green (purifying selection) and red (diversifying selection). (b) Localization of the Tc24 protein epitopes with a high probability of binding to HLA-I alleles. Horizontal lines correspond to epitopes for the indicated HLA alleles.

| DISCUSSION
The development of an effective vaccine against T. cruzi needs to take into account the high levels of genetic variability of this parasite, as antigenic variability and immune evasion of some parasite strains may restrict the protective efficacy of a vaccine (Haolla et al., 2009).
Nonetheless, few studies have investigated the antigenic diversity of T. cruzi vaccine antigens (Knight, Zingales, Bottazzi, Hotez, & Zhan, 2014). Members of the trans-sialidase family, the largest family of surface proteins of the parasite, were found to be under strong evolutionary pressure, likely from the immune system, for the selection of variants leading to immune evasion, and frequent recombination was identified as a contributing mechanism (Weatherly, Peng, & Tarleton, 2016). Diversifying selection and variant motifs within trypomastigote small surface antigen have also been identified, which has led to DTU-specific serological diagnosis of the infection (Bhattacharyya et al., 2010). In this study, we evaluated the extent of polymorphism of one of the leading vaccine antigens, the flagellar-associated calcium-binding protein Tc24, among multiple T. cruzi strains from most of the American continent.
Our identification of Tc24 genes in the genomes of multiple strains of T. cruzi is in agreement with initial observations indicating that it is a multicopy gene located in tandem arrays (Porcel et al., 1996). We further identified 120 and 43 full-length copies in and it has been incorporated as part of recombinant antigen mixtures in some commercial tests.
Implementing a vaccine targeting this protein may impact the evolution of the parasite in the field and induce a vaccine escape phenomenon, which could produce potentially detrimental outcomes such as an increase in parasite virulence. Such a response has already been identified for various diseases (Kennedy & Read, 2017); it is therefore important to consider. One possibility could be to develop a "cocktail" vaccine targeting multiple proteins in order to distribute these selective pressures over multiple parasite antigens and therefore further reduce the opportunities for the parasite to escape vaccine-induced immunity. In that respect, we have proposed TSA-1 antigen as an additional component of our vaccine (de la Cruz et al., 2019; Dumonteil et al., 2004; Quijano-Hernández et al., 2013; Villanueva-Lizama et al., 2018), which was also found to be highly conserved among T. cruzi DTUs (Knight et al., 2014).
Nevertheless, this study has some limitations, the main one being that strain diversity may be further expanded as mentioned above, particularly for non-TcI parasite strains, as additional sequence variants may be present in these DTUs as well as from some of the less represented countries from our study. Further genotyping of Tc24 antigens from strains currently circulating in Chagasic patients across the Americas should help expand our study.
In conclusion, we have demonstrated that Tc24 antigen is highly conserved in parasite strains originating from a wide geographic range in the Americas and covering DTUs TcI to TcVI. In addition, diversifying selection pressure was restricted to a few residues, which would limit immune evasion, and most of the protein was under strong purifying selection. This was likely associated, at least in part, with functional/structural constraints on the protein. These results indicate that Tc24 is an excellent vaccine candidate, which would be effective against a wide diversity of T. cruzi parasite strains across the continent. Further development of this vaccine candidate should represent a scientific and public health priority.

ACKNOWLEDGEMENTS
This work was partially funded by grant #632083 from Tulane University School of Public Health and Tropical Medicine and grant #187714 from the Carlos Slim Foundation via Baylor College of Medicine.

DATA AVAILABILITY STATEMENT
Sequence data for this study are available at the TriTryp (https://tritrypdb.org/tritrypdb/) and SRA (https://www.ncbi.nlm.nih.gov/sra) databases.