Characterization of human respiratory syncytial virus (RSV) isolated from HIV‐exposed‐uninfected and HIV‐unexposed infants in South Africa during 2015‐2017

Abstract

Background: RSV is a leading cause of lower respiratory tract infection in infants. Monitoring RSV glycoprotein sequences is critical for understanding RSV epidemiology and viral antigenicity in the effort to develop anti-RSV prophylactics and therapeutics.

Objectives: To characterize the circulating RSV strains collected from infants in South Africa during 2015-2017.

Methods: A subset of 150 RSV-positive samples obtained in South Africa from HIV-unexposed and HIV-exposed-uninfected infants from 2015 to 2017 was selected for high-throughput next-generation sequencing of the RSV F and G glycoprotein genes. The RSV G and F sequences were analyzed by a bioinformatic pipeline and compared to samples from the USA collected over the same three-year period.

Results: Both RSV A and RSV B co-circulated in South Africa during 2015-2017, with a shift from RSV A (58%-61% in 2015-2016) to RSV B (69%) in 2017. RSV A ON1 and RSV B BA9 emerged as the most prevalent genotypes in 2017. Variations at the F protein antigenic sites were observed for both RSV A and B strains, with dominant changes (L172Q/S173L) at antigenic site V observed in RSV B strains. RSV A and B F protein sequences from South Africa were very similar to the USA isolates except for a higher rate of RSV A NA1 and RSV B BA10 genotypes in South Africa.

Conclusion: RSV G and F genes continue to evolve and exhibit both local and global circulation patterns in South Africa, supporting the need for continued national surveillance.


| INTRODUCTION
Respiratory syncytial virus (RSV) is the most common cause of acute lower respiratory tract infection (LRTI) in children globally. It was estimated that in 2015, 33.1 million episodes of LRTI, 3.2 million hospitalizations, and as many as 118,200 deaths were attributable to RSV in children <5 years of age worldwide, with the greatest burden of RSV-associated hospitalization and death occurring in infants younger than 6 months of age. 1 Moreover, developing countries have a much higher incidence of severe RSV LRTI compared to developed countries, with approximately 91% of all RSV-associated hospitalizations and 99% of deaths occurring in developing countries. 2

The RSV genome encodes 11 proteins. The two surface glycoproteins, the fusion (F) and the attachment (G) protein, are crucial for virus infectivity and pathogenesis, and are the major antigens that stimulate the production of neutralizing antibodies. [3][4][5] While the G protein is responsible for the attachment of the virus to the host epithelial cells, the F protein mediates viral entry by fusing viral and cellular membranes, leading to the subsequent release of viral RNA into the host cell cytoplasm. 6 RSV has two subtypes, A and B, which are further characterized into different genotypes, based on antigenic and genetic variability of the second hypervariable region (HVR2) of the G protein. 7 In contrast to the G protein, the F protein is well-conserved between the two RSV subtypes and among different genotypes. Six consensus antigenic sites have been identified in the F protein, in its pre-fusion and/or post-fusion conformation. [8][9][10] Some of these sites are potential targets for prophylactic antibodies, such as site II, the target of the only approved anti-RSV immunoprophylaxis for high-risk infants (palivizumab); site Ø, the target of nirsevimab (MEDI8897) 11; site IV, the target of MK-1654 12; and site V, the target of suptavumab. 13,14 Immune pressure, coupled with the RSV error-prone RNA polymerase, leads to viral evolution and drift that should be carefully monitored.
Hospital surveillance studies from South Africa demonstrated that children born to HIV-infected mothers, but not HIV-infected themselves, have 1.4- to 2.1-fold greater risk of RSV-associated hospitalization and death, respectively, compared to HIV-unexposed children. 15,16 The increased burden of RSV disease observed in HIV-exposed infants is likely due to reduced transplacental transfer of RSV antibodies in women living with HIV. 17

We investigated the molecular epidemiology and genetic variability of RSV isolates collected during 2015 to 2017 from HIV-exposed-uninfected and HIV-unexposed hospitalized infants <12 months old in South Africa. The RSV sequence data obtained from South Africa were further compared to RSV sequences obtained from the USA during the same period. 18

| Study population
Active surveillance was conducted at Chris Hani Baragwanath Academic Hospital (CHBAH) in Soweto, South Africa. Admissions to all pediatric medical wards, including a short stay ward, were screened for enrollment into the study from January 2015 to December 2017. The surveillance case definition included infants <3 months of age who were diagnosed with suspected sepsis or physician-diagnosed LRTI, and infants 3-12 months of age who had a physician-diagnosed LRTI.

| Ethics statement
The RSV surveillance study was approved by the Human Research Ethics Committee of the University of the Witwatersrand (131109) and conducted in accordance with the Good Clinical Practice guidelines. Caregivers were informed of the nature, purpose, and process of the study, and provided written informed consent for the use of their infants' samples in future studies on infectious diseases.
Ethical approval for this specific analysis was also obtained from the same Ethics Committee (M170965).

| Sample collection and testing
Nasopharyngeal swabs were collected using a commercially available nylon flocked tipped swab (FLOQS, Copan Flock Technologies) as previously described. 19 Sequence data from 147 samples were obtained and included in the analysis. Three samples were excluded from data analysis due to "quality/quantity not sufficient" (QNS). The sample CONSORT diagram is shown in Figure 1A. The cDNAs containing the G and F genes were sequenced on the Illumina MiSeq instrument as described previously. 18

| Next-generation sequence assembly
Assembly of the sequencing reads into target amplicon sequences (contigs) was performed with the Next-Generation Sequencing Microbial Surveillance Toolbox (NGS-MSTB) 20 (manuscript is under review), a fully automated distributed pipeline that was implemented at AstraZeneca with the Common Workflow Language (CWL), and with a user interface based on the Galaxy bioinformatics workbench. 21

| Amino acid sequence analysis of the RSV F proteins
The F gene sequences in FASTA format were translated to amino acid sequences and aligned against the reference sequences derived from Netherlands RSVA/13-005275 (GenBank accession no. KX858757) and Netherlands RSVB/13-001273 (GenBank accession no. KX858756). Amino acid variation was determined and reported from pairwise alignments of sample sequences to the reference. The mapping of antigenic site changes onto the prefusion and post-fusion forms of the F protein structure has been described previously. 18
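The translate-and-compare step described above can be illustrated with a minimal sketch. It is not the pipeline used in the study: it assumes the sample and reference coding sequences are already aligned, in frame, and free of insertions or deletions, whereas the study used pairwise alignments. The function names `translate` and `call_substitutions` and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(nucleotides):
    """Translate an in-frame coding sequence; unknown codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, usable, 3)
    )


def call_substitutions(reference_aa, sample_aa):
    """List substitutions such as 'L172Q' (reference residue, position, sample residue).

    Assumption: both proteins have the same length (no indels), so position i
    in the sample corresponds to position i in the reference.
    """
    if len(reference_aa) != len(sample_aa):
        raise ValueError("sequences must be the same length (no indels assumed)")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference_aa, sample_aa), start=1)
        if ref != alt and alt != "X"  # skip ambiguous sample residues
    ]


# Toy example (hypothetical sequences, not RSV data).
reference = translate("ATGGCTCTT")  # M A L
sample = translate("ATGACTCAA")     # M T Q
substitutions = call_substitutions(reference, sample)
```

In practice, a pairwise aligner would be needed before the comparison so that insertions and deletions do not shift the position numbering.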

| Subtyping and genotyping analysis based on the RSV G gene
The assignment of RSV genotypes was performed with a combination of a nearest neighbor classifier and phylogenetic clustering, using a reference database of the previously described genotypes. 22 Phylogenetic analyses were conducted in MEGA7. 23 RSV G gene sequences were translated into protein sequences and aligned using MUSCLE 24 along with the reference sequences for RSV A G and RSV B G (GenBank accession no. KX858754 and KX858755, respectively).
Phylogenetic trees were generated using the maximum likelihood method based on the JTT matrix-based model and were visualized and annotated using iTOL v3. 25
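The nearest neighbor part of the genotype assignment can be sketched as follows. This is an illustrative simplification, not the classifier used in the study: it assumes the query and reference G sequences are already aligned to the same length, uses a plain p-distance (proportion of differing sites, ignoring gap positions), and omits the phylogenetic clustering step. The function names and the toy reference sequences are hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not compared:
        return 1.0  # nothing comparable: treat as maximally distant
    return sum(a != b for a, b in compared) / len(compared)


def assign_genotype(query, reference_db):
    """Return (genotype, distance) of the closest reference sequence.

    reference_db maps a genotype label to a list of aligned reference
    sequences for that genotype.
    """
    best_label, best_distance = None, float("inf")
    for label, sequences in reference_db.items():
        for ref in sequences:
            d = p_distance(query, ref)
            if d < best_distance:
                best_label, best_distance = label, d
    return best_label, best_distance


# Toy reference database (hypothetical sequences, not real RSV G genes).
toy_references = {
    "ON1": ["ACGTACGTAC"],
    "NA1": ["ACGTTTGTAC"],
}
genotype, distance = assign_genotype("ACGTACGTAA", toy_references)
```

A distance cutoff could be added so that queries far from every reference are flagged for phylogenetic review rather than assigned automatically.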

| Distribution of RSV subtypes and genotypes in South Africa
A total of 742 RSV-positive nasopharyngeal samples were collected from HIV-unexposed (HU, n = 544, 73.3%) and HIV-  However, due to the small sample size, there was no statistically significant difference in the RSV A and B subtype distribution between HU and HEU infants in the selected samples.
The distribution of the RSV A and B genotypes based on the G protein sequence analysis, and by month of year when identified, is shown in Figure 1B  containing F15L/A103V/L172Q/S173L substitutions compared to the reference strain. 18 The frequency and individual polymorphisms in the RSV A and B F protein sequences are presented in Figure 3A.

| Polymorphisms in the RSV F protein
Overall, the RSV A F protein had fewer amino acid sequence changes at a frequency >10% compared to RSV B. Only 2 amino acid changes in the RSV A F protein were detected in more than 10% of the isolates: A23T in the signal peptide and A518T near the transmembrane domain. The RSV B F protein displayed several amino acid differences at a frequency >10% compared to the reference strains, and additional changes occurred over the 3-year period.
The polymorphic amino acid changes in the six major antigenic sites (Ø, I-V) 9,10,26 of the F protein were further examined, and amino acid variations at these sites with frequency >1% in their respective seasons were labeled on the pre-fusion and post-fusion forms of the protein structure ( Figure 3B). There were a total of 7 amino acid differences in the F protein antigenic sites of RSV A with frequencies ranging from 3.3% to 6.7% and 11 amino acid differences in the F protein of RSV B with frequencies ranging from 3.0% to 97.0%. Some differences were only observed in a specific season and were not detected in the subsequent season, such as RSV A G71D at site Ø, Y33H and N380H in site I, N165K in site V, and RSV B D73E and Q209K at site Ø. However, L172Q and S173L at the antigenic site V of RSV B increased from 14.3% and 9.5% in 2015 to 97.0% by 2017.
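The per-season frequencies reported above amount to counting, for each substitution, the share of isolates in a season that carry it and keeping those above a threshold. A minimal sketch of that arithmetic follows. The function name `substitution_frequencies` and the toy isolate lists are hypothetical and do not reproduce the study data.

```python
from collections import Counter


def substitution_frequencies(isolates, min_percent=1.0):
    """Percentage of isolates carrying each substitution in one season.

    isolates: list with one collection of substitution labels per isolate.
    Only substitutions with a frequency strictly above min_percent are kept.
    """
    total = len(isolates)
    if total == 0:
        return {}
    # Count each substitution at most once per isolate.
    counts = Counter(sub for subs in isolates for sub in set(subs))
    return {
        sub: round(100 * n / total, 1)
        for sub, n in counts.items()
        if 100 * n / total > min_percent
    }


# Toy season with four isolates (hypothetical data).
toy_season = [
    ["L172Q", "S173L"],
    ["L172Q", "S173L"],
    ["L172Q"],
    [],
]
frequencies = substitution_frequencies(toy_season)
```

Running the same function on each season separately gives the kind of year-by-year trend described for L172Q and S173L.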

| Sequence comparison of RSV F proteins from South Africa and the USA
RSV F protein sequences from South Africa were compared to those collected in the USA OUTSMART-RSV study during the same period, despite the age difference between the two populations. 18

| DISCUSSION
The South Africa RSV cohort samples studied here represent local RSV epidemiology in the 2015-2017 period. We showed that the HIV exposure status of these infants did not affect the distribution of RSV subtype or genotype. We also noted a greater proportion of RSV-positive samples in male infants than in female infants, which is consistent with the findings from the USA surveillance study 18 and a recently reported study in Spain. 28

The RSV A ON1 genotype, characterized by a 72 nucleotide (23 amino acid) duplication in the G-HVR2, has replaced NA1 and other genotypes as the most prevalent strain in the world over the past 10 years. 18,29 The RSV B BA9 genotype, which replaced BA10 over the three study years in South Africa (14.3%, 63.2%, and 97.0%, respectively), has become the most prevalent RSV B genotype worldwide. 30 Both BA9 and BA10 genotypes are characterized by a 60 nucleotide (20 amino acid) duplication in the G-HVR2 region. 31 The duplication sequence has been hypothesized to increase viral fitness. 32 In addition, we observed a seven amino acid extension of the RSV B G protein in 17 (23%) isolates. Interestingly, this same seven amino acid extension was also found in RSV B isolates from the USA surveillance study conducted during the same period, and at a similar frequency of 22%. 18 The biological significance of the seven amino acid extension observed for RSV B strains remains to be determined.

RSV epidemiology studies in South Africa prior to 2012 have been reported, 33,34 showing that positive selection drove RSV A and B evolution and replacement of genotypes. However, these studies were limited to G protein gene sequencing. Since the F protein is a major target for anti-RSV monoclonal antibodies, it is critical to understand F protein sequence evolution and antigenicity changes in contemporary circulating strains. Based on analysis of the antigenic sites in the RSV F protein, RSV B had more variability than RSV A (Figures 3 and 4).
The RSV A F protein had 7 changes at 5 antigenic sites at a frequency <10% compared with the reference strain. The RSV B F protein had a total of 11 changes distributed over 3 antigenic sites, with 2 changes in antigenic site I and 2 changes in antigenic site V at a frequency >10%. Thus, although the F protein structure is generally conserved, changes at various antigenic sites can occur.
The sequence containing L172Q/S173L changes in the antigenic site V (target of suptavumab 13,14 ) increased from <15% in 2015 to 97.0% by 2017, which was also observed in the USA and other countries over the same period. 18,35 These changes led to the resistance of RSV B to suptavumab. 36 The I206M/Q209R changes at antigenic site Ø (target of nirsevimab) were detected at a low frequency of 3.0% in 2017, but were present in the USA RSV isolates at a frequency of approximately 19% in the 2016/17 RSV season. 18 Importantly, these two amino acid changes do not affect the suscep-