Regional patterns of genetic diversity in swine influenza A viruses in the United States from 2010 to 2016

Background: Regular spatial and temporal analyses of the genetic diversity and evolutionary patterns of influenza A virus (IAV) in swine inform control efforts and improve animal health. Initiated in 2009, the USDA passively surveils IAV in U.S. swine, with a focus on subtyping clinical respiratory submissions, sequencing the hemagglutinin (HA) and neuraminidase (NA) genes at a minimum, and sharing these data publicly.

Objectives: In this study, our goal was to quantify and describe regional and national patterns in the genetic diversity and evolution of IAV in U.S. swine from 2010 to 2016.

Methods: A comprehensive phylogenetic and epidemiological analysis of publicly available HA and NA genes generated by the USDA surveillance system and collected from January 2010 to December 2016 was conducted.

Results: The dominant subtypes and genetic clades detected during the study period were H1N1 (H1‐γ/1A.3.3.3, N1‐classical, 29%), H1N2 (H1‐δ1/1B.2.2, N2‐2002, 27%), and H3N2 (H3‐IV‐A, N2‐2002, 15%), but many other minor clades were also maintained. Year‐round circulation was observed, with a primary epidemic peak in October‐November and a secondary epidemic peak in March‐April. Partitioning these data into 5 spatial zones revealed that genetic diversity varied regionally and was not correlated with aggregated national patterns of HA/NA diversity.

Conclusions: These data suggest that vaccine composition and control efforts should consider IAV diversity within swine production regions in addition to aggregated national patterns.

Vaccine strategies are impaired when the strains in the vaccine do not reflect current diversity because of evolutionary processes. 5,6 Although all eight genes of the virus continuously evolve, evolution of the HA envelope glycoprotein has great potential for biologic consequence through antigenic shift (via reassortment of different gene segments) and antigenic drift (via gradual accumulation of point mutations), resulting in the cocirculation of multiple genetically diverse lineages and genetic clades. 7,8 In swine in the United States, five major H1 and H3 genetic lineages have been described. The first lineage of viruses circulating in U.S. swine emerged coincident with the 1918 human pandemic and is referred to as classical H1N1 IAV. This lineage was predominant and relatively conserved until the late 1990s, when a second lineage was detected in swine. The second lineage was a novel triple-reassortant H3N2 virus; in addition to being maintained in U.S. swine, it reassorted with endemic classical H1N1 viruses, resulting in new genetic clades of H1N1 and H1N2 viruses. 9,10 Although reassortment of the HA and/or NA segments was commonly detected, the internal gene pattern, referred to as the triple reassortant internal gene (TRIG) constellation, was maintained. 11,12 The third lineage of viruses established in U.S. swine was the result of two spillovers of human seasonal H1 viruses; these are referred to as delta-lineage viruses. 9,10,13 The fourth is the H1N1pdm09 lineage, which has been repeatedly introduced into swine herds throughout the world and has undergone reassortment with other swine IAV. 14-17 An additional H3 spillover from humans to swine in the early 2010s resulted in the most recent establishment of a new lineage. 18
Given the substantial expansion of genetic diversity within each of these lineages, it has become necessary to further divide the HA genes into genetic clades. 19 Identifying patterns of genetic diversity and how they change over space and time is critical for appropriate intervention efforts.
Quantifying IAV diversity in U.S. swine is complicated by the common practice of transporting swine across regions for production efficiency. An estimated one million pigs 20 are moved within the United States every day, and large numbers of swine also enter the United States from Canada. This movement of pigs has previously been implicated in the dissemination of the δ-lineage from the Southeastern and South-central United States into the Midwest. 21 Despite this dynamic, assessments of diversity have relied on national analyses of H1 and H3 viruses without separation into geographic regions; these data revealed yearly cocirculation of H1N1, H1N2, and H3N2, with H1-δ1 (1B.2.2), H1-γ (1A.3.3.3), and H3-Cluster IV-A representing the majority of HA sequences. 22,23 However, given the rapid and extensive movement of swine, the genetic diversity in one region of the United States likely depends on movement patterns, and national diversity patterns may not be evenly distributed across all states or regions. A better understanding of the role of pig movement and its implications for IAV spread could facilitate surveillance efforts and provide objective criteria for selecting appropriate vaccine components for improved regional control.
In 2009, following the emergence and spread of the 2009 H1N1 pandemic virus, the United States Department of Agriculture (USDA) initiated a national surveillance system to address concerns over the quality and quantity of virologic swine IAV data. 22,23 The objectives of the surveillance are to monitor the genetic evolution of IAV in swine; to make isolates available for research, diagnostic reagent, and vaccine development through an IAV isolate repository; and to provide publicly available sequence data for animal and human health purposes. Against this background, this manuscript describes the current spatial and temporal genetic diversity of swine IAV in the United States. We analyzed data collected through the USDA surveillance system from January 2010 to December 2016, quantified genetic diversity within and between the five USDA IAV surveillance reporting regions, determined whether aggregated national metrics of diversity are relevant at regional spatial scales, and developed four biologically informed spatial zones that may more accurately delineate U.S. IAV genetic diversity.

[…] (n = 2828) using default settings in MAFFT v7.222, 25,26 with subsequent manual correction in MEGA7. 27 Maximum likelihood trees for each gene alignment were inferred using IQ-TREE v1.3.14, 28 implementing the TVM + I substitution model identified by its automatic model selection function. Branch support was assessed using the ultrafast bootstrap approximation 29 with 1000 replicates. The inferred phylogenetic trees were used to classify H1N1 and H1N2 sequences into previously defined genetic clades: H1-α, H1-β, H1-γ, H1-γ2, H1-pdm09, H1-δ1, and H1-δ2. 11,13,22 These clades correspond to a recently released global nomenclature for H1 HA genes from swine IAV. 19 H3N2 sequences were assigned to Cluster IV, IV-A, IV-B, IV-C, IV-D, IV-E, IV-F, or human-like H3. 18,23,31 NA N1 isolates were assigned to either the classical or the pandemic genetic clade, 23 while NA N2 isolates were assigned to the 1998 or the 2002 lineage. 32 The M gene was classified as either TRIG or pandemic. 23,33 These analyses used the resources of the USDA-ARS computational cluster Ceres on ARS SCINet.

| Regional analysis
The United States is divided into five regions for IAV surveillance reporting purposes based on USDA-APHIS Veterinary Services districts, with districts 1 and 2 combined into one […] was not included because of data limitations, n = 14 from 2010 to 2016) and nationally, we conducted hierarchical clustering on the Kendall rank correlations of distances between indices calculated separately for each year. Following these analyses using the administrative USDA reporting regions, we conducted a similar analysis to determine whether U.S. states could be grouped into zones with more similar HA/NA pairings. First, we clustered the observed data from 2010 to 2016 using data from those states that comprised ≥1% of the total data. Second, we calculated Shannon's diversity indices from the HA/NA counts for those states and then used the distances between the diversity indices to perform hierarchical clustering with Ward's linkage method.
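The state-level zoning step can be sketched as follows. This is a minimal illustration, not the authors' code, assuming Shannon's index H = -Σ p_i ln(p_i), where p_i is the proportion of a state's isolates in HA/NA pairing i (natural logarithm assumed), and summarizing each state by a single pooled index (the text does not say whether yearly indices were used as vectors). The helper names (`shannon_index`, `states_above_share`, `cluster_states`) and the counts are hypothetical; Ward linkage comes from SciPy's `scipy.cluster.hierarchy.linkage`, which operates on Euclidean distances between observations.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def shannon_index(counts):
    """Shannon's diversity index H = -sum(p_i * ln(p_i)) over nonzero HA/NA pairing counts."""
    counts = np.asarray(counts, dtype=float)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())


def states_above_share(pairing_counts, min_share=0.01):
    """Keep states that contribute at least `min_share` of all isolates (the >=1% rule)."""
    totals = {state: sum(c) for state, c in pairing_counts.items()}
    grand_total = sum(totals.values())
    return [s for s, n in totals.items() if n / grand_total >= min_share]


def cluster_states(pairing_counts, n_zones, min_share=0.01):
    """Group retained states into zones by Ward clustering of their Shannon indices."""
    states = states_above_share(pairing_counts, min_share)
    # One row (observation) per state, one column (the pooled index)
    h = np.array([[shannon_index(pairing_counts[s])] for s in states])
    tree = linkage(h, method="ward")  # Ward linkage on Euclidean distances between indices
    labels = fcluster(tree, t=n_zones, criterion="maxclust")
    return dict(zip(states, labels))


# Hypothetical counts of four HA/NA pairings per state (illustrative only, not study data)
pairing_counts = {
    "State A": [120, 80, 40, 5],
    "State B": [110, 90, 35, 10],
    "State C": [20, 10, 150, 0],
    "State D": [25, 5, 140, 2],
    "State E": [1, 0, 0, 0],  # well under 1% of all isolates, so it is dropped
}
zones = cluster_states(pairing_counts, n_zones=2)
```

Here States A and B (several well-represented pairings) fall into one zone and States C and D (dominated by a single pairing) into another. With yearly indices, each row of `h` would simply become a vector of per-year values.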

| Time series analysis
To study seasonal patterns of IAV in swine in the United States, we conducted a time series analysis using the number of influenza isolates aggregated by month from January 2010 to December 2016.
The time series was decomposed in R v3.1.2 34 using the ts and decompose functions (R stats package) together with the forecast package. 36,37 An additive decomposition was used because the magnitude of seasonal variation in the data was constant over time.
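To make the additive decomposition concrete, the sketch below is a Python translation of the steps performed by R's `decompose(type = "additive")`: a centered 2 × 12 moving average for the trend, a seasonal figure obtained by averaging the detrended values at each calendar month and centering them on zero, and a remainder. It is an illustration under assumptions, not the analysis code used in the study; the function name `additive_decompose` and the synthetic monthly series (with a hypothetical October peak) are ours.

```python
import numpy as np


def additive_decompose(x, period=12):
    """Classical additive decomposition, x = trend + seasonal + remainder,
    following the steps of R's stats::decompose(type = "additive")."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Centered moving average; an even period uses a 2 x period MA (half weights at both ends)
    if period % 2 == 0:
        weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
    else:
        weights = np.ones(period) / period
    half = len(weights) // 2
    trend = np.full(n, np.nan)  # undefined for the first and last `half` months
    for t in range(half, n - half):
        trend[t] = weights @ x[t - half : t + half + 1]
    detrended = x - trend
    # Seasonal figure: mean detrended value at each cycle position, centered on zero
    figure = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    figure -= figure.mean()
    seasonal = np.resize(figure, n)  # repeat the figure along the whole series
    remainder = x - trend - seasonal
    return trend, seasonal, remainder, figure


MONTH_NAMES = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Synthetic monthly series for January 2010 to December 2016 (84 months); not study data
months = np.arange(84)
pattern = 20 * np.cos(2 * np.pi * (np.arange(12) - 9) / 12)  # hypothetical October peak
series = 50 + 0.3 * months + np.resize(pattern, months.size)
trend, seasonal, remainder, figure = additive_decompose(series)
peak = MONTH_NAMES[int(np.argmax(figure))]
```

Because the series starts in January, `figure[0]` corresponds to January, so the largest seasonal effect identifies the peak calendar month (October in this synthetic example). An additive model fits when the seasonal swing has roughly the same absolute size every year; a multiplicative model would be the usual alternative if the swing grew with the overall level.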

| HA, NA, and M evolutionary trends in swine IAV
During the study period, a total of 4458 isolates were analyzed, of which 35% were H1N1, 36% were H1N2, and 26% were H3N2 viruses. A very small percentage of isolates were H3N1 (0.4%) or mixed subtype (3.3%), and one H1 virus had no NA sequence for subtyping (Figure S1). We excluded the mixed-subtype viruses and the single H1 virus that did not receive NA typing. Ten isolates had no state information available and were also excluded from the analyses. Following removal, our detailed spatial and temporal analyses considered 4298 virus isolates. […] To understand seasonal patterns in swine IAV, we aggregated the sequenced subtypes by month (Figure S3) across the 7 years of our study (2010-2016). These data revealed year-round detection of swine IAV in clinical respiratory submissions, with a primary epidemic peak in October-November and a secondary epidemic peak in March-April of each year (Figure S3C). Of 307 H1-δ2 (1B.2.1) viruses overall, 63% (n = 194) were from Region 1 (Figure 1). There was a steady increase in H1-δ2 viruses […] Each region had an additional 10-17 unique HA/NA pairings that ranged in detection frequency from 0.4% to 8.8% (Figure 3) and were not replaced by viruses with the dominant HA/NA clades.

FIGURE 2 Temporal and regional patterns in H3 swine influenza A in the United States. Swine H3 isolates collected from 2010 to 2016 within the USDA Influenza A virus Swine Surveillance System were classified to H3 phylogenetic clade (Cluster IV-A through F and human-like H3) and are presented by year and USDA-APHIS Veterinary Services reporting district (Region 5 was omitted due to insufficient data).
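The exclusion rules and the monthly aggregation described above can be sketched as a small filtering and counting step. This is a hypothetical illustration rather than the surveillance system's data model: the `Isolate` record, the subtype labels `"mixed"` and `"H1"` (an isolate without an NA subtype), and the helper names `is_analyzable`, `monthly_profile`, and `busiest_months` are all assumptions made for the example.

```python
from collections import Counter
from datetime import date
from typing import NamedTuple, Optional


class Isolate(NamedTuple):
    subtype: str          # hypothetical labels, e.g. "H1N1", "mixed", or "H1" when NA was not typed
    state: Optional[str]  # None when no state information was recorded
    collected: date


def is_analyzable(iso: Isolate) -> bool:
    """Exclusion rules from the text: drop mixed subtypes, isolates without an
    NA subtype, and isolates without state information."""
    if iso.subtype == "mixed" or "N" not in iso.subtype:
        return False
    return iso.state is not None


def monthly_profile(isolates):
    """Count analyzable isolates per (subtype, calendar month), pooling all years."""
    return Counter(
        (iso.subtype, iso.collected.month) for iso in isolates if is_analyzable(iso)
    )


def busiest_months(profile, k=2):
    """Calendar months with the most isolates, summed over subtypes."""
    by_month = Counter()
    for (_, month), n in profile.items():
        by_month[month] += n
    return [month for month, _ in by_month.most_common(k)]


# Hypothetical records (not study data)
sample = [
    Isolate("H1N1", "IA", date(2010, 10, 3)),
    Isolate("H1N1", "MN", date(2011, 10, 20)),
    Isolate("H1N2", "IA", date(2013, 11, 14)),
    Isolate("H3N2", "IA", date(2012, 3, 5)),
    Isolate("mixed", "IA", date(2012, 11, 1)),  # excluded: mixed subtype
    Isolate("H1", "NC", date(2013, 4, 9)),      # excluded: no NA subtype
    Isolate("H1N2", None, date(2014, 11, 2)),   # excluded: no state information
]
profile = monthly_profile(sample)
```

Pooling by calendar month (ignoring the year) is what makes a recurring October-November or March-April peak visible across the 7 years; the per-subtype keys allow the same counts to be stacked by subtype as in a monthly bar chart.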

| National and regional comparisons
Because of the differences in relative proportions observed between regions (Figure 3), we next compared national and regional diversity. We calculated Shannon's diversity indices by year for the entire United States and by region (Table 1) […] Figure 6E). All zones maintained minor HA/NA pairings whose frequencies varied from year to year, but the relative proportions by year were zone-dependent.

FIGURE 5 Correlation of genetic diversity between national and USDA-APHIS Veterinary Services reporting districts. A, Hierarchical clustering dendrogram of distances between diversity indices calculated for the whole of the United States and for the different regions; a complete agglomeration method was implemented. B, Kendall rank correlation plot demonstrating the association between diversity aggregated to a national level (USA) and diversity delineated to Regions 1 to 4 of the USDA-APHIS Veterinary Services IAV-S reporting districts (Region 5 was omitted due to insufficient data). Correlations with P-values >.01 are not considered significant and are depicted as blank cells in the correlation plot, as seen with Region 1.

| DISCUSSION
The purpose of this study was to characterize the genetic diversity of […] the long-distance dispersal of swine also disseminates novel IAV clades. 21,45 However, our data suggest that the U.S. swine herd is not endemic with a homogeneous population of IAV; instead, most IAV transmission and diversity appear to be regional, with some mixing due to pig movement, consistent with Kyriakis et al. 46 This is supported by the consistent detection of the major HA/NA clades in all regions, along with the detection of minor genetic clades and subtypes within specific regions. This is a distinct strength of the long-term USDA passive surveillance data: they now capture regional circulation and diversity and provide a baseline for identifying the emergence of novel genetic clades. 18,45 Nationally […] However, the regional divisions currently implemented for surveillance reporting and described above are unlikely to reflect the realities of swine production systems. Consequently, we used the HA/NA data themselves to generate spatial divisions or "zones" (

CONFLICT OF INTEREST
The authors declare no conflict of interest.