Novel Druggable Hot Spots in Avian Influenza Neuraminidase H5N1 Revealed by Computational Solvent Mapping of a Reduced and Representative Receptor Ensemble

The influenza virus subtype H5N1 has raised concerns of a possible human pandemic threat because of its high virulence and mutation rate. Although several approved anti-influenza drugs effectively target the neuraminidase, some strains have already acquired resistance to the currently available anti-influenza drugs. In this study, we present the synergistic application of extended explicit solvent molecular dynamics (MD) and computational solvent mapping (CS-Map) to identify putative ‘hot spots’ within flexible binding regions of N1 neuraminidase. Using representative conformations of the N1 binding region extracted from a clustering analysis of four concatenated 40-ns MD simulations, CS-Map was utilized to assess the ability of small, solvent-sized molecules to bind within close proximity to the sialic acid binding region. Mapping analyses of the dominant MD conformations reveal the presence of additional hot spot regions in the 150- and 430-loop regions. Our hot spot analysis provides further support for the feasibility of developing high-affinity inhibitors capable of binding these regions, which appear to be unique to the N1 strain.

The avian influenza virus type A, especially subtype H5N1, is becoming the world's largest pandemic threat because of its high virulence and lethality in birds, quickly expanding host reservoir, and high rate of mutation (1). The virus functions in combination with two major membrane glycoproteins: hemagglutinin (HA) and neuraminidase (NA), which together play important roles in the interactions with host cell surface receptors. While HA mediates virion (i.e. viral particle) entry into the cell, NA facilitates viral shedding by cleaving the sialic acid linkage formed between the HA and sialic receptors on the surface of the host cell (2). Several approved anti-influenza drugs effectively target NA, which cleaves these terminal sialic acid residues and facilitates the release of viral progeny from infected cells (3). However, antigenic drift has given rise to new strains that are resistant to existing NA inhibitors, and antigenic shift is resulting in new virulent subtypes of the flu virus, underscoring the need to design novel NA and HA inhibitors that can be combined for optimal prophylaxis and treatment.
Neuraminidase enzymes are phylogenetically categorized into two groups: group-1 includes N1, N4, N5, and N8 and group-2 includes N2, N3, N6, N7, and N9 (4). Although active site residues are largely conserved across both the groups, different NA subtypes exhibit varied drug susceptibility (5) and resistance profiles (6,7). Currently available NA inhibitors, including oseltamivir (marketed as Tamiflu, Roche Pharmaceuticals, Basel, Switzerland) and zanamivir (marketed as Relenza, GlaxoSmithKline, Brentford, London, UK), have been designed against crystal structures of group-2 enzymes [Ref. (3) and references therein]. Tamiflu, which has been stockpiled by many nations in efforts to avert a possible pandemic, is the only orally available drug that is effective against H5N1; therefore, additional drug discovery efforts against this enzyme are exceptionally well-motivated.
The first crystal structures of a group-1 NA in apo form and in complex with currently available drugs (8) revealed that although the binding pose of oseltamivir (Tamiflu) was similar to that reported in previous crystallographic complexes (9,10), the 150-loop adopted a distinct conformation, opening a new cavity adjacent to the active site. Under certain crystallization conditions, however, the 150-loop adopted the same closed conformation as previously seen in group-2 NA structures, suggesting that a slow conformational change may occur upon inhibitor binding (8). Importantly, previously published group-2 NA structures have only reported the 150-loop in the closed conformation (9,10). Therefore, the new structural observation of the 150-loop in the open conformation by Russell et al. (8) was postulated to be able to be exploited in the development of more effective drugs against N1. However, standard X-ray crystallography experiments only provide a single snapshot of the structure, making the interpretation of dynamical properties a difficult task and naturally motivating further studies that are able to investigate and characterize dynamics.
Molecular dynamics (MD) simulation is a useful theoretical tool to study the properties of biomolecular systems with atomic resolution based on classical mechanics (11)(12)(13). They complement experimental results by providing distributions and time series of any physically definable observable, within force field accuracy and finite sampling limitations. Extensive all-atom explicit solvent MD simulations of the apo and oseltamivir-bound (i.e. holo) systems revealed that the 150loop and adjacent binding-site loops may be even more flexible than observed in the crystal structures (14). In the apo simulations, the 150-loop was observed to open to a greater extent than in the X-ray structures, and its motion was often coupled to an outward movement of the adjacent 430-loop. These coupled motions significantly expanded the active site cavity, increasing its solvent-accessible surface area as compared with both open and closed crystal structures. It was postulated that topological changes and additional expansion of the N1 inhibitor-binding pocket revealed by the MD simulations could potentially play an important role in the rational design of inhibitors for N1 (14).
In this work, we present new druggable hot spots, i.e. binding subregions that account principally for the ligand-binding energy, revealed in the N1 MD simulations by computational solvent mapping (CS-Map) (15,16). Originally developed for binding-site identification, the CS-Map algorithm replicates the experimental Multiple Solvent Crystal Structures (MSCS) method developed by Mattos et al. (17,18), where it was shown that consensus binding locations of co-crystallized solvent molecules are highly predictive of binding regions. Using only X-ray structures as input, numerous studies utilizing CS-Map have shown that the method accurately reproduces the results of MSCS experiments in addition to identifying important binding features for other well-studied proteins. The CS-MAP algorithm differs from similar methods such as GRID (19) and MCSS (20) in three major respects: (i) the initial rigid body search provides better sampling of protein surface regions with favorable electrostatics and desolvation than grid search or local minimization; (ii) the scoring potential accounts for desolvation and ligand flexibility; and (iii) the docked ligand positions are clustered and the clusters are ranked on the basis of their average binding affinity. Importantly, this allows us to capture consensus sites rather than the binding sites of individual probes. The combination of these elements yields an algorithm that shows remarkable robustness against variations in the protein structure and changes in energy parameters (16).
In a recent study performed by Landon et al. (21), CS-Map was applied to identify hot spots within druggable protein binding regions and showed excellent agreement with biophysical experiments, where the authors were able to successfully use CS-Map to identify hot spots for a range of important pharmaceutical targets. The study highlighted the accurate identification of a novel hot spot within the peptide-binding region of renin that was recently discovered at Novartis (Novartis, Basel, Switzerland) (22). After more than 20 years of research efforts, the discovery of this hot spot led directly to the development of the first orally bioavailable FDA-approved renin inhibitor. The same CS-Map-based hot spot study reported excellent correspondence between the hot spots predicted for the FK-506 binding protein and ketopantoate reductase to those originally identified by NMR (23) and isothermal titration calorimetry experiments (24), respectively. The authors of this study concluded that CS-Map is a valuable complement for biophysical experiments, in which the data resulting from mapping studies can be used to design and conveniently select drug discovery experiments.
Here, we apply CS-Map to identify additional hot spots within the sialic acid binding region of N1 using MD-generated ensembles as input. The application of CS-Map in conjunction with MD simulations is conceptually similar to the so-called 'dynamic pharmacophore' technique originally developed by Carlson et al. (25), which was one of the first experimentally verified methods to account for receptor flexibility in structure-based drug design through the use of MD simulations. This technique has been extended by the Carlson group to include the use of structural ensembles derived via NMR and X-ray crystallography, termed the 'multiple protein structure models' approach (26). Although both the dynamic pharmacophore method and the one described herein utilize small molecule probes to investigate the effects of structural flexibility on ligand binding, the CS-Map technique has the advantage of being directly comparable with MSCS experiments. As emphasized by Mattos and Ringe (15), the major problem with approaches exemplified by GRID (16) and MCSS (17), is that they result in too many energy minima on the surface of the protein, rendering it difficult to determine which of the minima are actually relevant. A careful study comparing the results of mapping calculations with X-ray structures of thermolysin determined in four different organic solvents (27) shows that both GRID and MCSS find minima close to the experimentally observed positions, but the closest minima are generally not among those with the lowest free energies, resulting in false positives (i.e. conformations with favorable energy which are not located near any experimentally observed binding site). However, it has been recently reported that re-ranking MCSS results with a potential that accounts for solvation improves agreement with experimental data (28).

Hot Spots in Avian Influenza Neuraminidase H5N1
The work we present here also combines a root-mean-square difference-(RMSD-) clustering algorithm to reduce the structural redundancy in the MD ensemble. Previous studies tackled this same computational issue using different algorithms (14,29,30). However, we note that the present study is based on considerably longer MD trajectories -equivalent of 160 ns -therefore providing an extensive sampling of the receptor configurational space. The combination of techniques in this work presents an efficient method to map target hot spots based on a reduced and representative ensemble of receptor structures, and it represents a novel application of existing computational methods for the investigation of protein-ligand interactions in the presence of significant conformational flexibility within a ligand-binding site.
The RMSD-clustering analysis of the explicitly solvated 40-ns tetramer MD simulations yielded a set of 10 structures representing the apo ensemble and 5 structures representing the holo ensemble. A series of mapping analyses were subsequently performed on the representative snapshots. The results suggest that low-energy hot spots within the 150-and 430-loop regions, determined by consensus binding positions of fragment-sized molecules, may provide additional opportunities for drug discovery that could potentially mitigate the effects of drug resistance. Furthermore, this study tackles two major methodological issues. First, it demonstrates the benefit role of employing MD simulations to capture receptor flexibility. Second, it provides a framework to apply CS-Map in conjunction with reduced and representative MD receptor ensembles to predict the binding of molecules within flexible regions with increased accuracy and efficiency.

Apo system setup
The crystal structures used in this study were 2HU0 for the oseltamivir-bound system and 2HTY for the apo system (8). Protonation states for histidine residues were defined at an apparent pH 6.5 using the PDB2PQR web server (31). All crystallographically resolved water oxygen positions were retained in the apo system, as well as Ca 2+ ions, which are required for optimal NA function (32). To mimic future experimental inhibition assay conditions, a 20-mM NaCl salt bath was introduced. The simulated system contained 112 311 atoms. Additional details on the simulation setup can be found elsewhere (14).
Holo (i.e. oseltamivir-bound) system setup The 2HU0 structure had a single oseltamivir molecule bound in the active site of chain B (8). To introduce the oseltamivir within each of the other chains, chain B was aligned to chain A, C, and D by superimposition of the backbone C a -atoms and the resulting transformation matrix was also applied to the oseltamivir molecule. As no water molecules or ions were reported in the 2HU0 structure, we structurally aligned the 2HTY and 2HU0 systems and kept all crystallographic water molecules that did not clash with oseltamivir in the binding pocket. The Ca 2+ ions were also retained as they are required for proper function. Amber9 was used to set up the oseltamivir-bound system using an identical protocol to the apo system (see earlier), with the exception of the additional oseltamivir parameters. Oseltamivir was parameterized as described in Ref. (14). The composite tetrameric oseltamivir-bound system, comprising one oseltamivir molecule in each of the four active sites and calcium ions in 20-mM NaCl salt bath, contained 112 457 atoms.

Molecular dynamics simulations
The energy of apo and holo systems were minimized for 5 · 10 4 steps using NAMD version 2.6b1 (http://www.ks.uiuc.edu/Research/ namd/) (33). During the first phase of the minimization, only hydrogen atoms were relaxed for 5 · 10 3 steps, holding all other atoms fixed. Hydrogen atoms, water molecules, and ions were relaxed during additional 5 · 10 3 steps. In the last cycle, the protein backbone was fixed, minimizing all other atoms for additional 5 · 10 3 steps. No constraints were applied to the final 2.5 · 10 3 steps. The system was then equilibrated at 310 K in the (N,p,T) ensemble, using rectangular periodic boundary conditions and the particle mesh Ewald approach to evaluate electrostatics (34). All hydrogen bond lengths were constrained using the RATTLE algorithm (35), thus allowing a 2-fs integration timestep. A multiple time-stepping algorithm was utilized, where bonded interactions were evaluated at every timestep, short-range non-bonded interactions were evaluated at every two timesteps, and long-range electrostatic interactions were evaluated at every two timesteps. A cutoff of 14 was enforced for the non-bonded calculations, including a switching function at 12 . The first 1-ns period was divided into a series of four 2.5 · 10 5 step runs, where harmonic constraints were employed with 1-fs timesteps to gradually relax the system before the free sampling phase. A harmonic constraint force constant of 4.0 kcal ⁄ mol ⁄ 2 was applied to protein backbone atoms with scaling factors of 1.0, 0.75, 0.50, and 0.25 for the sequential segments, respectively. Free dynamics were subsequently performed using a 2-fs timestep, for a total of 40 ns for each system. Trajectories were generated on the DataStar machine (San Diego Supercomputing Center, University of California, San Diego, La Jolla, CA, USA) (benchmark time of 0.3 days ⁄ ns using 112 parallel processors).

RMSD-clustering to extract representative MD structures
To generate a reduced, representative set of N1 structures for the CS-Map calculations, RMSD conformational clustering was performed based on a previously reported clustering algorithm (36) as implemented in the rmsdmat2 and cluster2 programs of the GROMOS++ analysis software (37), part of the GROMOS05 software for biomolecular simulation (http://www.igc.ethz.ch/GROMOS). The same procedure has been recently applied to the context of protein surface loops (38) and of an improved relaxed complex method for drug design (39).
For the apo and oseltamivir-bound simulations, each chain of the tetramer was extracted at 10-ps intervals over the 40-ns simulation. A total of 1.6 · 10 4 trajectory structures for each simulation were superimposed using all C a -atoms to remove overall rotation and translation. The RMSD-clustering was performed on the subset of 62 residues that line the entire binding-site area, which we define here as the binding-site residues: 117-119, 133-138, 146-152, 156,  Figure 1). These residues were clustered into batches of similar configurations using the backbone atom-positional RMSD of all atoms (including side chains and hydrogens) as the similarity criterion. A cutoff of 1.3 was chosen after evaluation of the dependence of cluster populations against the total number of clusters found for each simulation using a cutoff in the range 1.0-1.5 . Next, for both the apo and holo simulations, hydrogen bonding was monitored alternatively using structures from the X-ray crystallographic models, the entire MD ensemble, or the structures belonging to each of the three most populated clusters.

Computational solvent mapping
The CS-Map algorithm is described in the sequence below. Required input is a protein structure, where prior to initiation, any bound ligands and ⁄ or water molecules are removed. The 14 organic probes utilized in this study correspond to those employed in the previous hot spot analysis (21) (Figure 2). Mapping simulations were performed on the IBM Blue Gene supercomputer at Boston University, Boston, MA, USA (http://www.ibm.com), where the average CPU time for the first two steps averages 15 min using 220 processors.

Rigid body search
For each probe, we generate 222 initial positions on the protein surface along vectors originating from the center of the protein at every 18°. From each of these 222 points, we launch 30 simplex (40) runs with initial simplexes randomly selected around each point, resulting in 6660 docked conformations. Simplex moves are evaluated according to the energy equation: The terms in the equation denote the Coulombic contribution of the electrostatic energy, the desolvation energy, and the excluded volume penalty, respectively. The electrostatic energy (DE elec ) is the summation over all probes atoms of the product of the electrostatic field of the solvated protein, denoted as F, at a particular probe position, determined by a finite-difference Poisson-Boltzmann method (41,42), and the charge q of the probe atom. Mathematically, The desolvation term is calculated in accordance with the atomic contact potential (ACP) model (43), where partial charges are calculated using Quanta (http://www.accelyrs.com). The excluded volume term (V exc ) is set to zero if the probe does not overlap the protein.

Refinement of docked conformations
Resulting from the rigid body search are 6660 protein-probe complexes that undergo further minimization. At this stage, a van der Waals energy is included in the free energy calculation DG = DE elec + DE vdw + DG des *. The new desolvation term includes the solute-solvent van der Waals interaction energy. The electrostatic and desolvation terms are calculated using the analytic continuum electrostatic (ACE) model (44); each protein-probe minimization is performed using the Newton-Rhapson method implemented in CHARMM (45). While the protein is held rigid during the minimization, the probe molecules are allowed to be flexible. A maximum of 1000 minimization steps are needed; often fewer steps are required for convergence of DG.
Probe clustering, scoring, and ranking of conformations Once the probes are minimized in step two, they are clustered based on their residue interactions. Clusters are initiated by seeding with an unclustered probe with the lowest free energy. Next, the  Hot Spots in Avian Influenza Neuraminidase H5N1 remaining unclustered probes are searched to find the probe with the lowest distance score D to the seed probe. The score D is determined by the equation where u and v are binary strings representing the interactions for each ligand. If this score is below 0.35, then the probe is added to the cluster. The threshold score of 0.35 was chosen such that unclustered probes could be assigned uniquely to one cluster. Once more than two probes are part of the cluster, additional probes are added to the cluster by checking that the average distance score between the new probe and all existing members is below 0.35. If the new probe has an average score above 0.35, then the probe is rejected and a new search begins. Once all probes are checked for admittance to a cluster and no additional probes can be added, a new cluster begins by repeating the process. After initial creation of clusters, probes are re-clustered such that if the average overlap score D of two clusters can be improved by moving a probe from one cluster to the other, then the probe is moved. Subsequent to reclustering, clusters consisting of less than 20 probes are removed. For each remaining cluster i, the probability p i is calculated based on a partition function Q, which is the sum of Boltzmann factors over all conformations j : and Q i is obtained by summing the Boltzman factors over conformations in the i th cluster only. The mean values DG of free energy of cluster is calculated by the formulation

Determination of consensus sites
From each of the five lowest free energy clusters for each probe, the lowest free energy conformation is selected and superimposed with all chosen clusters. The locations on the protein where clusters of different probe types aggregate are termed 'consensus' sites, and these positions on the protein are considered to be putative binding pockets subject to further evaluation.

Results and Discussion
Clustering of MD snapshots to generate a reduced and representative receptor ensemble To distill the most dominant configurations of N1 from the MD simulations, RMSD-based clustering was performed on snapshots which were extracted from the trajectories for every 10 ps. Although the tetramer N1 was used in the simulations, the clustering analyses were carried out on individual monomer protein chains. Therefore, each 40-ns tetramer simulation yielded the equivalent of four times the monomer sampling (160 ns), and 1.6 · 10 4 structures for each system were employed in the analyses.
For the apo simulation, the 1.3-cutoff resulted in a total of 51 clusters, with 90% of the ensemble represented by 10 clusters (Supplementary Material (SM) Figure 1A). In comparison, the holo simulations revealed an overall less flexible system, as the same cutoff yielded a total of 27 clusters, with 90% of the ensemble being represented by five clusters ((SM) Figure 1B). For both the apo and holo simulations, although the entire binding-site region was used in the clustering, the greatest structural diversity is found in the 150-and 430-loop areas (Figure 3), indicating that these areas are particularly flexible.
To add physical and structural insight to the resulting clusters, a hydrogen-bonding analysis was performed on the 150-loop region (comprised of residues 140-152) for the X-ray crystal structures, over both the entire concatenated trajectory of the four monomers, and the structures within each of the most dominant clusters ( Table 1). The crystal structures have five fully formed hydrogen bonds between residues in the 150-loop region: K143-D142, H144-D142, S145-D142, N146-K143, and T147-N146. Hydrogen bond occurrences differ between the apo and holo simulations. Additional hydrogen bonds involving H144-D142 and S145-D142, which are not present in the crystal structures, form transiently throughout the 160 ns of MD.

Landon et al.
In the apo simulations, the most highly populated cluster (representing approximately 20% of the total apo ensemble, SM Figure 1A) is characterized by seven hydrogen bonds in the 150-loop area.
Although the hydrogen bond between T147 and N145 is lost, high occupancies of hydrogen bonds in the D142-K143-H144 region contribute to the overall stability of this structure. The 150-loop in the first cluster for the apo simulations is significantly more open than in the 'open-loop' crystal structure published in Ref. (8). It is interesting to note that this so-called 'wide-open' structure (14) seems to be energetically and structurally stabilized by locally-formed hydrogen bonds. The second most highly populated cluster also accounts for nearly 20% of the ensemble (SM Figure 1A) and the additional stability of the T147-N146 hydrogen bond. Notably, although several hydrogen bonds are less often formed on average, as compared to the first cluster the S145-D142 hydrogen bond is present over the entire cluster set. From a structural standpoint, the second cluster represents the open-loop crystal structure (PDB 2HTY). The third most highly populated cluster is, on an average, lacking the N146-K143 hydrogen bond and exhibits overall a reduced occupancy of hydrogen bonds. This cluster represents nearly 20% of the total apo ensemble, and its central member structure is similar to the closedloop crystal structure ( Figure 3).
In the holo simulations, the most highly populated cluster represents nearly 40% of the overall ensemble and has more hydrogen bonds than the second or third cluster (totally seven hydrogen bonds). These interactions persist throughout the simulations, as indicated by the high percentages for each hydrogen bond (Table 1). Structurally, the most dominant cluster in the holo simulations is similar to the open-loop crystal structure (8). The second cluster, which is structurally similar to the fully-open MD structure, is defined by a loss of the T147-N146 hydrogen bond and a marked reduction in the persistence of the hydrogen bonds between N146-K143 and S145-D142. The second cluster represents slightly more than 20% of the overall holo ensemble. By comparison with the apo simulation, the clustering results suggest that the presence of oseltamivir in the active site tends to decrease the amount of time the 150-loop samples wide open-loop configuration. The third clus-ter for the holo simulation is structurally similar to the closed-loop conformation and represents approximately 15% of the overall holo ensemble. This cluster of structures is defined by a loss of the S145-D142 hydrogen bond and markedly decreased values for the K143-D142 and H144-D142 hydrogen bonds relative to both the other clusters.
Mapping analysis of the apo and holo N1 X-ray structures Using the chemically diverse probe set defined in Figure 2, mapping analyses were performed on the N1 apo and holo structures available from the PDB (2HTY, 2HU0, and 2HU4). It is important to note that although mapping was performed using the entire structures as input, our analyses here are restricted to those consensus sites found within close proximity to the sialic acid binding region.
The following columns show occurrences, calculated from structures in only the first, second, and third most-populated conformational clusters. Hydrogen bonds are defined to have a maximum hydrogen-acceptor distance of 3.5 and a minimum donor-hydrogen-acceptor angle of 120°. Only hydrogen bonds occurring for at least 10% of the simulation are shown.
Hot Spots in Avian Influenza Neuraminidase H5N1 may be more robust to changes in conformation of the 150-loop position than molecules designed to bind within the 150-loop region; however, molecules that stabilize the open confirmation 150-loop may also be a viable approach to lead design.

Mapping analysis of reduced and representative apo and holo MD ensembles
Mapping of the 10 apo MD and 5 holo MD representative conformations was performed to explore the effects of receptor flexibility on the formation of hot spots within the 150-and 430-loop binding regions. For each mapping analysis, the entire surface of the protein was utilized by CS-Map. On an average, seven densely-populated (£10 probe clusters) consensus sites were found for each apo conformation and six were uncovered for each holo conformation. Figure 5 illustrates consensus sites formed within the binding regions of the apo and holo MD ensembles, where in Figure 5A, mapping results for the 10 apo conformations are superimposed onto PDB 2HTY, and in Figure 5B, mapping data for the five holo conformations are superimposed onto PDBs 2HU0 and 2HU4. Similar to mapping results for the X-ray structures, hot spots emerge within the sialic acid, 150-and 430-loop regions of both the apo and holo A B Landon et al.
MD ensembles. For both the apo and holo ensembles, the average rank of consensus sites, in terms of the number of clusters present, located within the sialic acid, 150-and 430-loop regions were fourth, second, and third, respectively. These rankings suggest that the 150-and 430-loop regions represent druggable sites; the relationship between druggability and consensus site ranking is supported by a previous study (21).
Overall, comparing the mapping results shown in Figure 4, a significant expansion in the volume occupied by probe clusters within the 430-and 150-loop regions is observed for both MD ensembles, suggesting that the loop can be stabilized in multiple conformations that would be amendable to ligand design. In Figure 6,  Table 2 provides statistics on the average sizes, in terms of cluster population, and cluster energy rankings of consensus sites found within each of the three binding regions for the representative MD ensemble structures. Consensus sites were formed in the 150-loop region for 9 of the 10 representative apo conformations; the only apo conformation that did not contain a consensus site in the 150loop region was the closed conformation of the 150-loop. Similarly, four of the five representative holo conformations contained consensus sites within the 150-loop region with the exception of the closed conformation. Within the 430-loop region, consensus sites were not observed in three of the representative apo conformations and one of the representative holo conformations, which is attributed to structural effects of the wide-open and closed conformations. Based on this data, we hypothesize that hot spots formed within the 150-and 430-loop binding regions are roughly comparable in terms of their ability to bind a variety of probe types with good affinity; however, given the effects of loop dynamics on hot  (5) 11 (3) The average number of fragment clusters populating each binding region is reported along with the average energy ranking of each fragment cluster in parentheses.  Hot Spots in Avian Influenza Neuraminidase H5N1 spot formation, it may be more feasible to develop a lead that stably binds the 150-loop region when compared with the 430-loop region.
Although the average affinity of probes for the sialic-acid binding region is also very high (Table 2), the diversity of probes binding in this region is significantly decreased when compared with those binding to both of the loop regions. In particular, only the most polar functional groups in the probe set -namely t-butanol, isopropanol, and phenol -bind to the sialic acid binding region, while nearly all probe types, both polar and non-polar, bind with high affinity to the loop regions. This finding suggests that the effects of charge on the bioavailability of N1 inhibitors may be alleviated by the addition of high-affinity, non-polar functional groups that can bind one of the loop regions. We note that such insights would not emerge from the application of CS-Map based on single, static crystal structures only.
As the last part of this analysis, in Table 3, we provide a list of residues predicted to significantly mediate either non-bonded and ⁄ or hydrogen-bonded ligand contacts within the sialic acid, 430loop, and 150-loop binding regions. Using the 10 lowest Boltzmann average free energy probe clusters determined for each holo MD representative, all non-bonded and hydrogen-bonded atomic-level contacts existing between residues of the binding region and the mapped probes were calculated using the program HBPLUS (http:// www.biochem.ucl.ac.uk/bsm/hbplus) (46). Atomic interactions were summed up for each residue and then normalized by the total number of interactions occurring within the binding region. In Table 3, we report only those residues that account for at least 2% of the total atomic-level interactions occurring within the binding region.
Residues marked with an asterisk indicate those accounting for greater than 5% of all atomic-level probe-residue interactions.
Previous studies have established that protein-probe interactions correlate strongly to known protein-ligand interactions (16,21). In this study, the residues predicted to strongly mediate ligand contacts within the sialic acid binding region -E227, E277, R371, and Y406 -are well-known to be important in the binding of Tamiflu and Relenza, suggesting that those predicted to mediate contacts within the 130-and 430-loop regions -in particular, Q136, D151, I437, K432, and T439 -are likely of importance in the development of a novel class of high-affinity N1 inhibitors.

Conclusions
The influenza virus subtype H5N1 is of great concern because of its high virulence and mutation rate, and some strains have already acquired resistance to the currently available anti-influenza drugs such as Tamiflu and Relenza. Two recent studies, one experimental (8) and the other computational (14), uncovered flexibility within the 150-and 430-loop binding regions of N1, and these findings were conjectured to provide new opportunities for drug design. Here, we examined the effects of this newly discovered receptor flexibility on the emergence of hot spots within these regions through the integration of extensive MD simulations, conformational clustering, and CS-Map. We have identified novel hot spots within flexible binding regions of the N1 neuraminidase and shown that such a clustering procedure significantly improves the efficiency of the CS-Map application.
Few computational methods can account for the effects of receptor flexibility, although its importance is well-established (47)(48)(49). Recent examples include the development of inhibitors for MAP kinase p38 (50) and PTP1B (51) that have entered clinical trials and the recently approved HIV integrase inhibitor, raltegravir (marketed as Isentress, Merck, Whitehouse Station, NJ, USA) (52,53). Despite these difficulties, by using a multi-faceted approach, we were able to predict the emergence of stable hot spots within the 150-and 430-loop regions of N1. These results provide further support for the feasibility of developing high-affinity inhibitors capable of binding these newly proposed areas. We hypothesize that a putative novel class of inhibitors, which are able to exploit these new hot spots, may potentially exhibit increased oral bioavailability and be less susceptible to structural mutations in N1 and ⁄ or evade existing mutations raised in response to currently available drugs.