The structural basis for high‐affinity uptake of lignin‐derived aromatic compounds by proteobacterial TRAP transporters

The organic polymer lignin is a component of plant cell walls, which, like (hemi)‐cellulose, is highly abundant in nature and relatively resistant to degradation. However, extracellular enzymes released by natural microbial consortia can cleave the β‐aryl ether linkages in lignin, releasing monoaromatic phenylpropanoids that can be further catabolised by diverse species of bacteria. Biodegradation of lignin is therefore important in global carbon cycling, and its natural abundance also makes it an attractive biotechnological feedstock for the industrial production of commodity chemicals. Whilst the pathways for degradation of lignin‐derived aromatics have been extensively characterised, much less is understood about how they are recognised and taken up from the environment. The purple phototrophic bacterium Rhodopseudomonas palustris can grow on a range of phenylpropanoid monomers and is a model organism for studying their uptake and breakdown. R. palustris encodes a tripartite ATP‐independent periplasmic (TRAP) transporter (TarPQM) linked to genes encoding phenylpropanoid‐degrading enzymes. The periplasmic solute‐binding protein component of this transporter, TarP, has previously been shown to bind aromatic substrates. Here, we determine the high‐resolution crystal structure of TarP from R. palustris as well as the structures of homologous proteins from the salt marsh bacterium Sagittula stellata and the halophile Chromohalobacter salexigens, which also grow on lignin‐derived aromatics. In combination with tryptophan fluorescence ligand‐binding assays, our ligand‐bound co‐crystal structures reveal the molecular basis for high‐affinity recognition of phenylpropanoids by these TRAP transporters, which have potential for improving uptake of these compounds for biotechnological transformations of lignin.


Introduction
High-affinity solute uptake is an essential requirement for the survival of microorganisms in low-nutrient environments. There are three major families of high-affinity transport systems that rely on an extra-cytoplasmic solute-binding protein (SBP) to recognise and bind specific cargo and traffic it to hydrophobic subunits embedded in the cytoplasmic membrane [1]. The ATP-binding cassette (ABC) transporters, discovered almost 50 years ago, are by far the best characterised of these systems and, as their name suggests, drive transport using ATP hydrolysis [2]. Conversely, the tripartite tricarboxylate transporters and the tripartite ATP-independent periplasmic (TRAP) transporters are SBP-dependent secondary transporters that use either the proton motive force or sodium (Na+) ions for solute uptake [3]. Since their discovery by Forward et al. in 1997 [4], the advent of mass genome sequencing has revealed that TRAP transporters are widely encoded in bacteria and archaea [5]. They are key to the survival of human pathogens such as Vibrio cholerae [6] and Haemophilus influenzae [7], as well as being involved in plant virulence [8] and the uptake of carbon sources in many environmental bacteria [9][10][11].
Tripartite ATP-independent periplasmic transporters typically comprise three subunits: a periplasmic SBP (the P subunit) and two membrane protein components (the Q and M subunits) [4,12,13]. The larger M subunit is composed of 12 transmembrane helices (TMHs) and is thought to form the translocation pore, whilst the function of the smaller four-TMH Q subunit, which has both its N and C termini positioned in the cytoplasm, has not yet been definitively established [5,14]. Although the structure of the transmembrane pore region and the mechanism of solute transport have yet to be determined for TRAP transporters, the structures and ligand specificities of several SBPs have been determined, revealing an expanding list of biologically important solutes [3,15,16].
Lignin has a structural role in maintaining cell wall stability in plants and, after (hemi)-cellulose, is the most abundant organic polymer on the planet [17,18]. Because of its environmental abundance, lignin is also of great biotechnological interest as a potential starting material for the derivation of commercially valuable products [19]. However, like (hemi)-cellulose, lignin is generally resistant to degradation [20]. In the environment, lignin polymers are degraded upon breakage of β-aryl ether linkages by extracellular laccases and peroxidases, which are released by a consortium of aerobic bacteria and fungi [21][22][23]. This in turn leads to the release of a range of aromatic compounds, mainly phenylpropanoids such as the hydroxycinnamates (HCMs) coumarate and ferulate [24], which are more readily available to bacteria as a source of carbon. Many organisms from various ecological niches can further degrade these lignin-derived aromatic compounds; of these, aerobic microorganisms such as members of the Roseobacter clade use the well-characterised β-ketoadipate pathway to convert protocatechuate, derived from phenolic compounds such as lignin monomers, to β-ketoadipate and subsequently to the tricarboxylic acid cycle intermediates succinyl-CoA and acetyl-CoA [25].
The metabolically versatile photosynthetic purple bacterium Rhodopseudomonas palustris anaerobically degrades lignin-derived phenylpropanoid compounds via a 'non-β-oxidation' pathway, which has been described in detail elsewhere [26][27][28]. The gene cluster that encodes these enzymes also encodes a TRAP transporter (TarPQM) and an ABC transporter (CouPSTU) (Fig. 1A), the expression of which is under the control of the coumarate-responsive transcriptional regulator CouR [28][29][30]. The SBP from the CouPSTU ABC transporter (CouP_Rhp) has been shown to bind a subset of phenylpropanoids, with the structure of CouP in complex with ferulate revealing the molecular basis of specificity in this family of transporters [29,31]. The SBP of TarPQM (TarP_Rhp) also binds several aromatic ligands, including coumarate, with sub-micromolar affinity [29,32]. However, neither the structure of TarP_Rhp nor that of any of its close homologues from other bacteria has been determined; thus, the molecular basis of aromatic ligand specificity in TRAP transporters remains unknown.
In this study, we use bioinformatics and phylogenetic analysis to identify putative lignin-derived phenylpropanoid TRAP transporters from a range of environmental bacteria known to be enriched for the use of TRAP transporters in their biology [33]. In addition to TarP_Rhp, two other TarP homologues were selected for detailed analysis: the proteins from the halophilic γ-proteobacterium Chromohalobacter salexigens (Csal_0280; TarP_Csal) and the α-proteobacterial Roseobacterium Sagittula stellata E-37 (SSE37_24379; TarP_Sse). Both of these aerobic marine-dwelling organisms are known to grow on a range of lignin-derived aromatic compounds [34][35][36][37]. Moreover, TarP_Sse has been shown to be required for growth of S. stellata E-37 on ferulate and coumarate, confirming its involvement in phenylpropanoid metabolism [37]. In this study, tryptophan fluorescence ligand-binding assays show that TarP_Csal and TarP_Sse bind a range of lignin-derived aromatic compounds with subtly different specificities from each other and from TarP_Rhp. High-resolution crystal structures of the three different TarP proteins in complex with a range of aromatic compounds reveal a highly similar mode of binding involving recognition of the carboxyl group of the ligand via an invariant arginine residue, which is conserved in all known TRAP transporter SBPs. Comparison of the three structures to each other and with other TRAP SBPs reveals that amino acid substitutions elsewhere in the binding pocket confer ligand specificity, whilst comparison of TarP_Rhp with CouP_Rhp reveals how the same ligand cargo is recognised differently by a secondary TRAP transporter and a primary ABC transporter from the same organism. These data expand the structural and biochemical characterisation of SBPs for the uptake of lignin-derived aromatics and of the role of TRAP import systems in the utilisation of ecologically and biotechnologically important HCM compounds.
Fig. 1. (A) The genetic arrangement of the tarPQM genes in the genomes of (i) Rhodopseudomonas palustris CGA009 (Rhp), (ii) Chromohalobacter salexigens DSM 3043 (Csal) and (iii) Sagittula stellata E-37 (Sse). Locus numbers and gene names are labelled below/above the corresponding gene arrows. The tarPQM genes are shown in mauve, with the tarP gene (encoding the periplasmic binding protein subunit) outlined in black. In panel (i), the couPSTU ABC transporter and couR transcriptional regulator genes are shown in green and yellow, respectively. In panel (iii), box genes encoding enzymes for benzoate degradation are shown in red. In all panels, genes encoding proteins that have known/hypothesised roles in HCM degradation are brown, and genes encoding proteins with unknown roles in HCM degradation, genes not predicted to be involved in HCM degradation or hypothetical proteins of unknown function are grey. (B) (1) The general chemical structure of HCM compounds. The R1 and R2 substituent group positions are at C4 and C3 in the phenyl ring, respectively. (2) The chemical structure of 4HPA. Images generated with Marvin JS (ChemAxon, Budapest, Hungary). (C-H) Overviews of the structures of TarP_Csal (green, C-E), TarP_Sse (blue, F, G) and TarP_Rhp (beige, H) in complex with ligands (coumarate, green/magenta; ferulate, orange; caffeate, brown; cinnamate, purple; and 4HPA, yellow). (I) A superposition of the TarP_Csal and TarP_Sse complexes with coumarate (coloured as in A and D). (J) A superposition of the TarP_Csal complex with coumarate and the TarP_Rhp complex with 4HPA (coloured as in A and F). (K) An overlay of TarP_Csal in complex with coumarate and the apo structure of TarP_Csal (pink), superimposed on domain 1. The black arrow indicates the axis of rotation between the two domains. Helices are shown as cylinders, and ligands are shown as spheres to indicate the position of the binding pocket in each case. Images in C-K were generated using PYMOL.

Results and Discussion
Enrichment of TRAP transporters with potential specificity for HCMs in marine proteobacteria
Although TRAP transporters are widespread across the bacterial domain, we have previously noted their enrichment in phylogenetically diverse marine bacteria, presumably to enable scavenging of a diverse range of organic acids present at low concentrations in oceanic/aquatic environments [33]. Remarkably, some of these organisms have over 20 genes encoding TRAP SBPs, compared with just one in the model organisms Escherichia coli and Bacillus subtilis [33]. To assess the potential diversity of the transporters within these organisms, we extracted the sequences of all full-length TRAP SBPs from a subset of marine bacteria enriched for TRAP transporters [33], namely C. salexigens DSM 3043, Aurantimonas manganoxydans (formerly Aurantimonas sp. str. SI85-9A1), Ruegeria pomeroyi DSS-3 (formerly Silicibacter pomeroyi DSS-3), Jannaschia sp. CCS1, S. stellata E-37 and Labrenzia aggregata IAM 12614 (see Table S1 for a list of genomes and geneID abbreviations). The sequences were aligned and compared with the TRAP SBP subset from the known HCM degrader R. palustris (including TarP_Rhp), and genome context analysis was used to identify linkage to potential catabolic genes. One striking clade, represented by at least one example from each genome (Fig. S1A,B), shares a close relationship with mandelate-binding SBPs and is linked to genes with likely functions in HCM degradation. On this basis, Csal_0280 from C. salexigens (TarP_Csal) and SSE37_24379 from S. stellata (TarP_Sse), in addition to TarP_Rhp (see Fig. 1A for gene contexts), were recombinantly expressed and purified for further biochemical characterisation.
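As a rough illustration of the kind of sequence comparison described above, the sketch below computes pairwise percent identity over the gap-free columns of a multiple sequence alignment and reports each sequence's closest neighbour. It is a simplified proxy for clade assignment, not the phylogenetic method used in this study, and the toy aligned sequences and their names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def closest_neighbours(alignment: dict[str, str]) -> dict[str, str]:
    """Map each sequence name to the other sequence with the highest percent identity."""
    scores: dict[tuple[str, str], float] = {}
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        pid = percent_identity(s1, s2)
        scores[(n1, n2)] = pid
        scores[(n2, n1)] = pid
    return {
        name: max((o for o in alignment if o != name), key=lambda o: scores[(name, o)])
        for name in alignment
    }


# Hypothetical toy alignment (not real TRAP SBP sequences).
toy_alignment = {
    "sbp_query": "MKRLA-GTYE",
    "sbp_homologue": "MKRLAQGTFE",
    "sbp_distant": "MSQIV-NAYD",
}
print(closest_neighbours(toy_alignment))
```

In practice, a tree built from the full alignment (as in Fig. S1) captures relationships that a single nearest-neighbour score cannot, so this is only a quick sanity check.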
Binding assays indicate related patterns of ligand specificity in the TarP SBP family
To investigate ligand specificity, we screened TarP_Sse and TarP_Csal proteins by tryptophan fluorescence spectroscopy in the presence of equimolar amounts of coumarate, caffeate, cinnamate or ferulate. The basic structure of these ligands is an α,β-unsaturated phenylpropanoate, in which the phenyl ring carries additional hydroxyl or methoxy substituents at the C3 and/or C4 positions (Fig. 1B). A change in fluorescence emission upon exposure to the ligand indicated a ligand-induced conformational change in the environment of intrinsic tryptophan residues. Ligands that resulted in large quenches (25-40%) were selected for ligand titrations, monitoring the peak fluorescence emission over time and allowing determination of dissociation constants (see Fig. S2 for representative titrations and Table 1). […] (Table 1), it was not possible to co-crystallise it with any of these ligands. Crystallisation of TarP_Rhp appeared to be dependent on the formation of a pyroglutamate at the N terminus of the protein (Fig. S4), which results in crystal formation taking many months (see Materials and methods for further details of crystallisation conditions). TarP_Rhp also co-purified and crystallised in complex with 4-hydroxyphenylacetate (4HPA) (Fig. 1B), which is a common by-product of aromatic amino acid fermentation in microorganisms [38] and can also be derived from the degradation of lignin [39]. TarP_Rhp binds 4HPA in solution with a Kd of ~0.4 nM (Fig. S2), which is ~20 times tighter than coumarate, providing an explanation as to why it remains intrinsically bound. Attempts to crystallise TarP_Rhp that had been gently denatured (to remove intrinsically bound ligands) and refolded were not successful. This may have been due to the presence of a disulphide bond between C117 and C235, which is visible in the final refined TarP_Rhp structure (Fig. S4).
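Dissociation constants of this kind are obtained by fitting titration curves. One common way to do this, sketched below, assumes a 1:1 binding model with ligand depletion (the quadratic, or "tight-binding", equation), which matters when Kd is comparable to or below the protein concentration, as for the ~0.4 nM 4HPA complex. The protein concentration, titration points, maximal quench and the simple grid-search fitting strategy are all hypothetical choices for illustration, not the procedure reported in Materials and methods.

```python
import math


def fraction_bound(p_total: float, l_total: float, kd: float) -> float:
    """Fraction of protein in a 1:1 complex, accounting for ligand depletion.

    All concentrations share one unit (nM here). Uses the numerically stable
    form of the physical quadratic root: [PL] = 2*P*L / (b + sqrt(b^2 - 4*P*L)),
    with b = P + L + Kd.
    """
    b = p_total + l_total + kd
    complex_conc = 2.0 * p_total * l_total / (b + math.sqrt(b * b - 4.0 * p_total * l_total))
    return complex_conc / p_total


def fit_kd(ligand_nm, quench_pct, p_total, log10_min=-3.0, log10_max=4.0, steps_per_decade=500):
    """Grid search over log10(Kd); the maximal quench is solved by linear least squares."""
    best = None
    n_steps = int(round((log10_max - log10_min) * steps_per_decade))
    for i in range(n_steps + 1):
        kd = 10.0 ** (log10_min + i / steps_per_decade)
        f = [fraction_bound(p_total, l, kd) for l in ligand_nm]
        sff = sum(x * x for x in f)
        if sff == 0.0:
            continue
        amplitude = sum(x * y for x, y in zip(f, quench_pct)) / sff
        sse = sum((y - amplitude * x) ** 2 for x, y in zip(f, quench_pct))
        if best is None or sse < best[0]:
            best = (sse, kd, amplitude)
    if best is None:
        raise ValueError("no ligand was added; cannot fit Kd")
    return best[1], best[2]


# Hypothetical noiseless titration: 500 nM protein, true Kd 120 nM, 35% maximal quench.
P_TOTAL = 500.0
ligand = [0, 50, 100, 200, 400, 600, 800, 1200, 1600, 2400, 3200]
observed = [35.0 * fraction_bound(P_TOTAL, l, 120.0) for l in ligand]
kd_fit, max_quench = fit_kd(ligand, observed, P_TOTAL)
print(f"Kd = {kd_fit:.1f} nM, maximal quench = {max_quench:.1f}%")
```

With real, noisy data a nonlinear least-squares routine and replicate titrations would normally be used; the grid search is chosen here only because it is short and dependency-free.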
Although we were unable to determine structures of TarP_Rhp in complex with HCM ligands, we determined the structure bound to 4HPA to atomic resolution (1.1 Å), enabling us to carry out modelling studies with the protein coordinates. We also determined high-resolution co-crystal structures of both TarP_Sse and TarP_Csal in complex with coumarate, of TarP_Sse with cinnamate and of TarP_Csal with caffeate and ferulate; the latter structure is surprising given that fluorescence-based solution-binding assays did not detect binding of ferulate to TarP_Csal. The apo structure of TarP_Csal was also determined, enabling observation of the conformational change between the open (apo) and closed (ligand-bound) conformations of this subfamily of TRAP SBPs. Comparison of these structures (Fig. 1C-K), in the context of the binding assays, reveals a conserved mechanism of ligand binding and identifies differences within the binding pockets that facilitate ligand specificity.
The arrangement of the binding pocket in the TarP family of SBPs
The overall fold of all three TarP family homologues is typical of DctP-type TRAP SBPs (InterPro: IPR038404), consisting of two α/β domains connected by hinge regions, with a ligand-binding site buried at the domain interface. The organisation of the secondary structure elements in each of the three proteins is conserved, and the closed, ligand-bound conformations are highly similar (RMSD Cα: 1.2-1.8 Å) (Fig. 1C-K). Comparison of the TarP_Csal apo structure and the complex with coumarate shows that domain 1 (residues N-1-124, 150-155, 213-258, 298-324-C) and domain 2 (residues 129-148, 157-210, 262-293) are connected by a number of flexible regions (residues 125-128, 149-156, 211-212, 259-261, 294-297) (Fig. S5), enabling the protein to open and close by a 13° rotation of one domain against the other around an axis that lies between the two domains (Video S1 and S2, Fig. 1K). Notwithstanding that we may have crystallised a partially open conformation, this degree of domain rotation is much smaller than in the archetypal TRAP SBP, SiaP, where the domain rotation between the open and closed forms is ~28° (comparing the apo structure, PDB ID: 2CEY, with the ligand-bound structure, PDB ID: 2XXK) [15]. This may be explained both by the larger size of the ligand (sialic acid) and by the requirement in SiaP to capture an extensive network of intrinsically bound water molecules along with the ligand [40], which may require a more open conformation. In all three TarP proteins in this study, the binding pocket is largely lined with hydrophobic residues that generate packing interactions with the ligand, as well as conserved hydrogen bonding groups, which interact with the phenyl ring substituents at one end of the ligand and a carboxyl group at the other, conferring ligand specificity. The details of these interactions are shown in Figs 2A-C and 3A-C, with the ligand fit to the density shown in Figs 2D and 3D.
The features of the structures are further analysed in the following sections.
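The Cα RMSD values and the 13° domain rotation quoted above are the kind of quantities obtained from structural superposition. A minimal sketch of that calculation, assuming paired Cα coordinates are already available as NumPy arrays, uses the Kabsch algorithm: superimpose one atom set on another, report the RMSD, and read the rotation angle from the rotation matrix as θ = arccos((tr R − 1)/2). For a domain rotation, one would first superimpose the open and closed structures on domain 1 and then fit domain 2. The coordinates below are synthetic, and this is not necessarily the software or procedure used for the published values.

```python
from __future__ import annotations

import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation R and translation t minimising the RMSD of (R @ mobile_i + t) to target_i.

    Both arrays have shape (n_atoms, 3), with rows paired atom-for-atom.
    """
    mobile_centroid = mobile.mean(axis=0)
    target_centroid = target.mean(axis=0)
    h = (mobile - mobile_centroid).T @ (target - target_centroid)  # 3x3 covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against an improper rotation (reflection)
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = target_centroid - r @ mobile_centroid
    return r, t


def rmsd(a: np.ndarray, b: np.ndarray) -> float:
    """Root-mean-square deviation between paired coordinate sets."""
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))


def rotation_angle_deg(r: np.ndarray) -> float:
    """Rotation angle of a proper rotation matrix, from its trace."""
    cos_theta = np.clip((np.trace(r) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


def axis_angle_matrix(axis: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rodrigues' formula, used here only to build a synthetic test case."""
    k = axis / np.linalg.norm(axis)
    kx = np.array([[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]])
    a = np.radians(angle_deg)
    return np.eye(3) + np.sin(a) * kx + (1.0 - np.cos(a)) * (kx @ kx)


# Synthetic "domain 2" coordinates in the open form, rotated by 13 degrees to mimic closure.
rng = np.random.default_rng(0)
open_domain2 = rng.normal(scale=10.0, size=(150, 3))
true_rot = axis_angle_matrix(np.array([0.3, -0.5, 0.8]), 13.0)
closed_domain2 = open_domain2 @ true_rot.T + np.array([2.0, -1.0, 0.5])

r, t = kabsch(open_domain2, closed_domain2)
fitted = open_domain2 @ r.T + t
print(f"RMSD after fit: {rmsd(fitted, closed_domain2):.2e}")
print(f"Domain rotation: {rotation_angle_deg(r):.2f} degrees")
```

Real structures also differ by local conformational changes, so the RMSD after fitting is non-zero and dedicated tools additionally locate hinge residues and the rotation axis.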
A conserved glutamate controls binding at the phenyl ring end of the ligand
In TarP_Sse and TarP_Csal, a sequentially and structurally conserved glutamate from domain 2 (E188/E190, respectively) forms a hydrogen bond to one or both hydroxyl groups on the phenyl rings of coumarate, caffeate and ferulate, but not cinnamate, which lacks ring substituents (Figs 2A,B and 3A-C). With a coumarate ligand, there is an additional set of water-mediated hydrogen bonds between the glutamate and the phenyl ring -OH of the coumarate (involving two buried waters in TarP_Sse and one buried water in TarP_Csal) (Fig. 2A,B). In TarP_Rhp, this glutamate is not sequentially conserved (it aligns to G187 in the TarP_Rhp sequence), and in the structure, a serine (S188) satisfies the demand for a hydrogen bond to the hydroxyl group of the 4HPA in the corresponding region of the binding pocket, along with an additional interaction with H17 (Fig. 2C). There is, however, a glutamate positioned nearby in TarP_Rhp (E251), which is spatially equivalent to E188/E190 in the other two SBPs (Fig. 2E,F). Although we could not obtain co-crystal structures of TarP_Rhp with any tight-binding HCM ligands, given the longer backbone of HCMs compared with 4HPA (Fig. 1B) and the spatial conservation of E251, it seems likely that this glutamate facilitates binding of HCMs in TarP_Rhp. This suggests that for this subfamily of SBPs, a glutamate acting as a hydrogen bond acceptor (given the pH of the crystallisation experiments) is important for binding the hydroxyl group(s) on the phenyl ring of the ligand. We note that the full hydrogen bonding potential of the hydroxyl group is not utilised by any of the proteins for which we have determined structures.

Understanding differences in cinnamate-binding affinity
Our assays show that although TarP_Sse and TarP_Csal both bind coumarate with similar affinity, TarP_Sse binds cinnamate 2-fold more weakly and TarP_Csal shows no binding to cinnamate (Table 1). TarP_Rhp also has a much lower binding affinity for cinnamate than for coumarate (~5-fold weaker; see Table 1 and ref. [29]). Together, these results suggest that the absence of the phenyl ring hydroxyl group in cinnamate has a detrimental effect on binding, which is not surprising given the clear demand for hydrogen bond formation between the ligand and the protein in this region of the binding pocket. Comparison of the TarP_Sse/cinnamate and TarP_Sse/coumarate complexes shows that despite the loss of the hydrogen bond to E188, there is no dramatic change in the position of the ligand, the surrounding residues (sidechain movements < 0.4 Å) or the adjacent water structure. The volume of the binding pocket is approximately equivalent in both structures, with a slight increase in the cinnamate complex compared with the coumarate complex (36 Å³ vs 34.9 Å³, respectively) (Table 2). This perhaps reflects the loss of the hydrogen bond in the cinnamate complex, which may generate a slightly more open binding pocket. These subtle changes are consistent with a 2-fold change in binding affinity, with Kd values of 120 ± 4 nM and 247 ± 21 nM for TarP_Sse with coumarate and cinnamate, respectively (Table 1). In TarP_Csal, the structural basis for the total lack of cinnamate binding is less clear because the hydrogen bonding groups around the ligand-binding pocket are conserved between TarP_Csal and TarP_Sse. However, the residues that provide packing interactions between the ligand and the walls of the binding pocket are different.
A direct comparison between coumarate binding in TarP_Csal and TarP_Sse demonstrates that although the volume of the pocket is similar (Table 2), the ligand position is slightly different (Fig. 4A,B). This is in part due to Q251 in TarP_Sse, the Nε atom of which is positioned 3.2 Å away from the oxygen atom of the phenyl ring -OH group of the coumarate, forming a steric interaction, and 3 Å away from the nearest oxygen atom of E188, forming a long hydrogen bond (Fig. 2A). This forces the phenyl ring to sit higher in the pocket than it does in TarP_Csal, where this glutamine is replaced with a methionine (M254), which cannot form the same set of interactions with the ligand in this region of the binding pocket and therefore occupies a subtly different space within the structure (compare Fig. 4A,B). This amino acid difference results in a ~1 Å change in the position of the phenyl ring -OH group, causing the hydrogen bonding network between the ligand -OH and E190/E188 to differ between the two proteins (Fig. 4A,B). In TarP_Csal, E190 forms two hydrogen bonds, one to the hydroxyl group of the phenyl ring (2.6 Å) and a second to a water molecule (HOH1) (2.6 Å) that is also hydrogen-bonded to the phenyl ring -OH (2.9 Å), forming a ring of five bonded atoms (Fig. 4A). In TarP_Sse, the displacement of the ligand -OH group by Q251 draws E188 into a slightly different position, outside hydrogen bonding distance of HOH1. To complete the hydrogen bonding network, a second, solvent-exposed water molecule (HOH2) forms a bridging interaction from E188 to the phenyl ring -OH via HOH1 (distances: HOH2–HOH1 2.9 Å and HOH1–phenyl -OH 3.0 Å), forming a ring of 6 bonded atoms. HOH1 and HOH2 are held in these positions by mainchain interactions with the amino groups of N16 and V17, respectively (Fig. 4B). In the TarP_Sse/cinnamate complex, the positions of HOH1 and HOH2 are conserved, satisfying the hydrogen bonding network around E188 despite the loss of the ligand -OH group (Fig. 4C).
Fig. 3. […] (F) After refinement of the TarP_Csal/ferulate structure, the difference map (green mesh, contoured at 3σ) indicated that there are two binding modes of ferulate in the structure (the major binding mode is shown in orange, surrounded by the experimental map, contoured at 3σ). The minor binding mode of ferulate (aqua) binds in a similar position to caffeate, cinnamate and coumarate in the other structures. This location is associated with a different position of F12, F13 and A222 (shown in grey from the TarP_Csal/caffeate structure), evidence for which can also be seen in the difference map. All images in this figure were generated using PYMOL.
Our observed changes in the water networks involved in substrate binding by TarP_Csal and TarP_Sse, in which ligand binding depends on a more limited set of hydrogen bonding interactions for TarP_Csal, offer a possible explanation for the lack of cinnamate binding by TarP_Csal. The absence of the hydroxyl group in cinnamate could disrupt the water network, with a concomitant decrease in binding affinity for the ligand.
Fig. 4. (A) In TarP_Csal (pale green), the coumarate (COU, green) phenyl ring -OH group forms hydrogen bonds (orange dashes) to E190 and a water molecule (HOH1, numbered 1), which is also hydrogen-bonded to E190, forming a ring of four bonded atoms. (B) In TarP_Sse (pale blue), the coumarate (magenta) is in a slightly different position due to the position of the Q251 sidechain; this changes the water structure around the phenyl ring -OH of the ligand. HOH1 is outside hydrogen bonding distance of E188; thus, a second water (HOH2, numbered 2) forms bridging interactions with the ligand, forming a ring of five bonded atoms. The two water positions are stabilised by hydrogen bonds to mainchain amino groups (N16 and V17). (C) In the TarP_Sse (pale blue) structure with cinnamate (CIN, purple), the water structure is maintained despite the loss of the phenyl ring -OH group on the cinnamate, with HOH1 and HOH2 providing stabilising interactions with E188. All images in this figure were generated using PYMOL.
In SiaP, achieving high-affinity ligand binding relies on a strict network of 14 buried water molecules within the binding pocket, and disrupting this network by point mutations has been shown to severely reduce the binding affinity for sialic acid [40]. Our study shows that water molecules are also important for modulating ligand-binding affinity in the TarP SBPs, with 1-4 buried water molecules playing a key role in the hydrogen bonding networks surrounding the ligands in all of the proteins investigated here.
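To put the 2-fold (TarP_Sse) and ~5-fold (TarP_Rhp) affinity differences for cinnamate versus coumarate into energetic terms, the standard relation ΔG° = RT ln(Kd/c°), with c° = 1 M, converts a dissociation constant into a binding free energy, so the difference between two ligands is ΔΔG = RT ln(Kd,2/Kd,1). The sketch below applies this to the TarP_Sse Kd values given earlier in this section. The assay temperature of 25 °C is an assumption, and the uncertainty on the fold change is propagated assuming independent errors.

```python
from __future__ import annotations

import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1
T_K = 298.15  # ASSUMED assay temperature (25 degrees C); not stated in this section


def binding_free_energy(kd_molar: float, temperature: float = T_K) -> float:
    """Standard binding free energy in kJ/mol, dG = RT ln(Kd / 1 M); negative for Kd < 1 M."""
    return R_KJ * temperature * math.log(kd_molar)


def fold_change(kd_a: float, err_a: float, kd_b: float, err_b: float) -> tuple[float, float]:
    """Ratio kd_b / kd_a with first-order propagation of independent relative errors."""
    ratio = kd_b / kd_a
    return ratio, ratio * math.hypot(err_a / kd_a, err_b / kd_b)


# TarP_Sse Kd values (Table 1): coumarate 120 +/- 4 nM, cinnamate 247 +/- 21 nM.
kd_cou, err_cou = 120e-9, 4e-9
kd_cin, err_cin = 247e-9, 21e-9

ratio, ratio_err = fold_change(kd_cou, err_cou, kd_cin, err_cin)
dg_cou = binding_free_energy(kd_cou)
dg_cin = binding_free_energy(kd_cin)
ddg = dg_cin - dg_cou  # equals R*T*ln(ratio)

print(f"cinnamate binds {ratio:.2f} +/- {ratio_err:.2f} fold more weakly")
print(f"dG(coumarate) = {dg_cou:.1f} kJ/mol, dG(cinnamate) = {dg_cin:.1f} kJ/mol")
print(f"ddG = {ddg:.2f} kJ/mol ({ddg / 4.184:.2f} kcal/mol)")
```

Under the same assumed temperature, the 2-fold difference corresponds to only about 1.8 kJ/mol (roughly 0.4 kcal/mol), and a 5-fold difference to RT ln 5, about 4 kJ/mol, which fits the small structural changes seen between the TarP_Sse complexes.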
A conserved Y/R pair and a buried water molecule are responsible for binding the carboxyl end of the ligand
In all three TarP proteins, a strictly conserved Y/R pair (TarP_Rhp: Y72/R145, TarP_Sse: Y72/R146 and TarP_Csal: Y74/R148) forms a core component of a highly networked salt bridge between the arginine and the carboxyl group of the ligand. The network of interactions with the carboxyl group of the ligand is completed by a second conserved tyrosine and an additional histidine residue, which interact with the carboxyl via a spatially conserved buried water molecule (TarP_Rhp: H122/Y207, TarP_Sse: H123/Y211, TarP_Csal: H124/Y214). The number of buried water molecules in this region appears to be species-specific: TarP_Rhp and TarP_Sse both have one, whilst TarP_Csal has an additional buried water molecule 3.6 Å away from the first, which forms an extra hydrogen bond to the ligand carboxyl. Its position is stabilised by hydrogen bonds to T184 (which is a valine in TarP_Sse) and the mainchain (Figs 2B and 3A). As with the interactions around the phenyl ring end of the ligand and the example of SiaP, this suggests that the specific location of buried water molecules is important for ligand recognition in TRAP SBPs.
The arginine from the Y/R pair is sequentially and spatially conserved in all TRAP family proteins [16,41,42]. Positioned at the end of β6 within domain 2, it is anchored by several hydrogen bonds to the mainchain of two adjacent β-strands (β7 and β8): the carbonyl and amide of V186 (TarP_Csal), T184 (TarP_Sse) and T183 (TarP_Rhp), and the carbonyl of M168 (TarP_Csal), M166 (TarP_Sse) and A165 (TarP_Rhp). In TarP_Csal and TarP_Sse, an additional hydrogen bond between the Nε atom of the arginine and a threonine sidechain (T185 and T183, respectively) further reinforces this position. This results in the arginine essentially substituting for the C-terminal residue of β6 (G149, G148 and A147 in TarP_Csal, TarP_Sse and TarP_Rhp, respectively) by fulfilling the hydrogen bonds on the edge of the surrounding strands. This generates a split β-sheet, of which the arginine is a key structural component. The surrounding secondary structure restricts the rotational freedom of the arginine sidechain, forcing it to adopt an unusual rotamer in the TarP_Rhp and TarP_Sse structures. In the SBP from the HCM ABC transporter, CouP_Rhp (PDB ID: 4JB0) [29], an equivalent arginine (R197) forms the same kind of interaction with the carboxyl group of similar ligands (Fig. 5C). However, the fold of CouP_Rhp is quite different from that of TarP_Rhp, and the structural context around the arginine is also distinct. In the CouP_Rhp complex with ferulate, R197 packs against an underlying tyrosine (Y166), and apart from the ligand, its only hydrogen bonding partners are D168 (a single hydrogen bond) and a molecule of glycerol that sits close by in the crystal structure. As this arginine is solvent-exposed and on a loop region, it seems feasible that it functions as a flexible cap to the binding pocket.
This contrasts with the situation in the TarP SBPs, where the arginine has an integral role in stabilising secondary structure elements within domain 2, is buried and is highly conformationally constrained.
In all our TarP structures, the arginine forms an end-on, bidentate twin-nitrogen, twin-oxygen interaction with the carboxyl group of the ligand. An in-plane interaction is favoured for intramolecular salt bridges within proteins, as this optimises the overlap of the guanidinium hydrogen atoms with the syn lone pairs of the carboxyl oxygen atoms. However, in many high-resolution crystal structures, deviations of up to 50° out of plane are common [43,44]. In all of the TarP_Csal and TarP_Sse complexes, the carbon backbone of the ligand (including the carbon of the carboxyl group) lies ~70° out of the plane of the guanidinium group (Fig. 5B), which is unusual compared with these types of interactions within proteins in general and, more specifically, within TRAP proteins [41]. The only structure in this study that is in-plane is the TarP_Rhp/4HPA complex, which is probably due to the smaller size and greater conformational freedom of the ligand compared with the HCMs.
In CouP_Rhp, the carbon backbone of the ligand is more in-plane with the guanidinium group of the arginine, but here the two groups are twisted 45° in relation to each other (Fig. 5D). A suboptimal arrangement of the ligand carboxyl in relation to the critical arginine may indicate a shared mechanism of solute release, ensuring that binding of the ligand to the SBP is not favoured over release and subsequent transport. This may be even more important in the TRAP transporters, where transport is independent of ATP hydrolysis. In SiaP, the carboxyl group of the sialic acid is in-plane with the critical arginine, but SiaP has a double arginine motif providing end-on and side-on interactions. The much larger size of sialic acid compared with HCMs requires a larger binding pocket, and the ligand occupies a different region of the binding pocket from the HCMs. Moreover, binding of sialic acid relies on a network of several water molecules. Thus, even though the overall fold is the same in both of these TRAP SBPs, the interactions between the ligand and the protein are quite different, suggesting that SiaP may require a different mechanism of solute release.
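Out-of-plane angles such as the ~70° tilt of the ligand backbone relative to the guanidinium group can be computed directly from atomic coordinates. One way, sketched below, is to take the normal n of the guanidinium plane from three of its atoms and compute the angle between that plane and a ligand vector v as arcsin(|n·v|/(|n||v|)). The choice of atoms (here assumed to be NE, NH1 and NH2 of the arginine for the plane, and the ligand carboxyl carbon to the adjacent backbone carbon for the vector) and the toy coordinates are assumptions for illustration, not the definition used for Fig. 5.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    """Vector pointing from b to a."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def angle_to_plane_deg(p1: Vec, p2: Vec, p3: Vec, start: Vec, end: Vec) -> float:
    """Angle (0-90 degrees) between the vector start->end and the plane through p1, p2, p3."""
    normal = cross(sub(p2, p1), sub(p3, p1))
    v = sub(end, start)
    s = abs(dot(normal, v)) / (norm(normal) * norm(v))
    return math.degrees(math.asin(min(1.0, s)))  # clamp guards against rounding above 1


# Hypothetical toy coordinates (Å): guanidinium nitrogens lying in the z = 0 plane.
ne = (-1.33, 0.0, 0.0)
nh1 = (0.66, 1.15, 0.0)
nh2 = (0.66, -1.15, 0.0)

# Hypothetical ligand atoms: carboxyl carbon and the next backbone carbon, tilted 70 degrees.
tilt = math.radians(70.0)
carboxyl_c = (3.5, 0.0, 0.0)
next_c = (3.5 + 1.5 * math.cos(tilt), 0.0, 1.5 * math.sin(tilt))

print(f"out-of-plane angle: {angle_to_plane_deg(ne, nh1, nh2, carboxyl_c, next_c):.1f} degrees")
```

A value near 0° corresponds to the in-plane arrangement seen for the TarP_Rhp/4HPA complex; the measured angle depends on which atoms are chosen, so the same definition should be used across all structures being compared.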
Finally, we note that the electron density around the coumarate in both the TarP_Sse and TarP_Csal structures shows that the carbon backbone of the ligand is visibly bent, suggesting an element of strain in the conformation of the ligand, which is reflected in the refinement, with plane distortion centred around the C3 atom of the ligand. Atom C3 is −0.28 […] defined by the 6 neighbouring atoms of the ligand in TarP_Sse and TarP_Csal, respectively (Fig. S6). This could also indicate one mechanism of solute release in TRAP transporters.
[…] Å), respectively (Fig. 6). The R148 sidechain is essentially pinned to β7 (via M168), sitting much further away from β8 (~8 Å) than it does in the closed complex. The refined sidechain atoms of R148 had 2-fold higher B factors than the surrounding protein, suggesting that it is flexible. Consequently, the water structure surrounding R148 is disordered, such that many of the exposed hydrogen bonding groups along the edges of β7 and β8 do not have corresponding solvent molecules. This contrasts with the many ordered solvent molecules in and around the binding pocket in the apo structure. All but four of these water molecules are expelled to bulk solvent during domain closure of TarP_Csal (Fig. 3B), providing an entropic driver for ligand binding. Domain closure forces the sidechain of R148 closer to β8, which requires a change in the R148 rotamer, flipping the direction of the Nε by 180° so that its hydrogen atom points towards β8 and picks up an interaction with T185. Of the four hydrogen bonds formed by R148 in the open structure, two are retained during the conformational change to the closed complex (R148 NH2-M168 carbonyl and R148 carbonyl-V186 mainchain amide), one is lost and one is replaced, whilst four others are made.
As well as the expulsion of water to bulk solvent, this net gain in hydrogen bonding groups, which forms part of an extended network of interactions around the carboxyl group of the ligand, represents a key enthalpic driver for binding affinity in TarP SBPs. Videos S1 and S2 illustrate the major changes in domain movement and bonding described in this section.
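The hydrogen-bond bookkeeping for R148 can be checked with simple set arithmetic. Reading "replaced" as one bond swapped for a bond to a different partner, the closed state has 2 retained + 1 replacement + 4 new = 7 bonds, a net gain of three over the four in the open state. In this minimal sketch, only the two retained bonds are named in the text; all other labels are hypothetical placeholders.

```python
def hbond_bookkeeping(open_bonds: set, closed_bonds: set) -> dict:
    """Compare the hydrogen bonds made by one residue in two conformational states."""
    retained = open_bonds & closed_bonds   # present in both states
    broken = open_bonds - closed_bonds     # lost, or replaced by a bond to a new partner
    formed = closed_bonds - open_bonds     # replacement bonds plus newly made bonds
    return {
        "retained": len(retained),
        "broken": len(broken),
        "formed": len(formed),
        "net": len(closed_bonds) - len(open_bonds),
    }


# The two retained bonds are named in the text; the other labels are hypothetical.
open_state = {"R148 NH2-M168 O", "R148 O-V186 N", "lost_1", "replaced_1"}
closed_state = {"R148 NH2-M168 O", "R148 O-V186 N", "replacement_1",
                "new_1", "new_2", "new_3", "new_4"}

summary = hbond_bookkeeping(open_state, closed_state)
```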

Ferulate has multiple modes of binding in TarP_Csal
In our binding assays, we could not detect any binding of TarP_Csal to ferulate (Table 1); however, with the high ligand concentrations used in the crystallisation experiment, we were able to obtain a co-crystal structure of this complex. The electron density map provides evidence that ferulate has two binding modes in the structure, with the conformation of the binding pocket adjusting to accommodate either (Fig. 3E,F). The minor binding mode (not modelled in the deposited structure) is very similar to that of caffeate, and the difference map indicates that the surrounding protein also adopts a second conformation that closely matches its conformation in the caffeate complex.

Fig. 6. Ligand binding is driven by the conformational change in the region surrounding the critical arginine (see also Videos S1 and S2). In the open conformation of apo-TarP_Csal (pink), the R148 sidechain is pinned against β7 via two hydrogen bonds (orange dashed lines) to the mainchain carbonyl of M168, whilst its mainchain amide and carbonyl form hydrogen bonds to mainchain groups on β8 (T185 carbonyl and V186 amide). The directions of β7 and β8 are indicated by grey arrows, and the strands are drawn with a transparent molecular surface (grey) to show their proximity to R148. The approximate location of the binding pocket is indicated by the grey circle (labelled BP), and the residues that mediate binding of the ligand carboxyl group in the closed complex (Y74, H124, Y214) are shown for context. All water molecules in this region of the structure are shown as small red spheres. Interestingly, the buried water molecule that forms part of the network of interactions around the ligand in the closed structure (hydrogen-bonded between H124 and Y214) is already bound in the open structure. All images in this figure were generated using PyMOL.
The major binding mode, representing ~80% of the protein complexes in the crystal (based on the relative size of the map peaks), has the carboxyl group in approximately the same position as caffeate, but the phenyl ring of the ligand is rotated by ~160° and packed against the opposite face of the binding pocket, under A222. This position requires the helix bearing A222 to move by ~2.5 Å (Cα–Cα) compared with its location in the caffeate complex. On the other side of the binding pocket, F12 and F13 adopt different rotamers to fill the space created by the repositioning of the ligand. Analysis of the binding pocket in the TarP_Csal/ferulate complex shows that its volume expands as a result of these conformational changes (50.4 Å³ vs 39.2 Å³ with caffeate) (Table 2). Presumably, the lack of detectable ferulate binding to TarP_Csal in our assays is due to ineffective closure of the protein around the ligand, caused in part by the arrangement of hydrophobic groups around the binding pocket in TarP_Csal, which are incompatible with the position of the C3-methoxy group of the ligand. TarP_Sse has weak but detectable binding of ferulate (Kd 486 ± 72 nM), and a comparison of the TarP_Csal and TarP_Sse structures shows that F13 is replaced with a leucine (L11) in TarP_Sse (Fig. 2B,C), generating space adjacent to where the ferulate C3-methoxy group would sit in the optimal binding mode. Thus, it seems feasible that TarP_Sse could achieve a fully closed complex, explaining why ferulate binding could be detected for this species variant.
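The two quantities in this paragraph, the pocket expansion and the ~80% occupancy of the major mode, reduce to simple ratios. The sketch uses the reported CASTp volumes; for occupancy, it assumes that map peak size scales linearly with occupancy (a rough approximation), and the peak heights are hypothetical values chosen only to illustrate a 4:1 ratio.

```python
def percent_change(new: float, reference: float) -> float:
    """Relative change of `new` with respect to `reference`, in percent."""
    return 100.0 * (new - reference) / reference


def occupancy_from_peaks(major_peak: float, minor_peak: float) -> float:
    """Rough fractional occupancy of the major mode, assuming peak size scales with occupancy."""
    return major_peak / (major_peak + minor_peak)


# Pocket volumes reported in the text (Å^3).
v_ferulate, v_caffeate = 50.4, 39.2
expansion = v_ferulate - v_caffeate                # absolute expansion, Å^3
relative = percent_change(v_ferulate, v_caffeate)  # relative expansion, %

# Hypothetical peak heights illustrating a 4:1 ratio; these are not measured values.
major_fraction = occupancy_from_peaks(4.0, 1.0)
```

On these numbers, the pocket grows by about 11 Å³, or roughly 29% relative to the caffeate complex.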

Modelling coumarate binding in TarP_Rhp
Although HCM ligands bind tightly to TarP_Rhp ([29]; Table 1), we could not obtain crystals of these complexes (see Materials and methods for further detail). We therefore used the structures of our protein/ligand complexes from the other two species variants to model coumarate binding in TarP_Rhp (Kd 8 ± 5 nM). Using the protein structure from the TarP_Rhp/4HPA complex, and without moving any of the sidechains, we manually docked a molecule of coumarate in the conformation observed in the TarP_Sse/coumarate complex. By optimising the hydrogen bond distances between the ligand and the protein at both ends of the binding pocket, we obtained a reasonable fit (Fig. 7A). The only problem area was a clash between the phenyl ring of the coumarate and the indole ring of W247. Because subtle changes in sidechain positions could alleviate this clash, we performed computational modelling with flexible fitting to remove bias from our interpretation. Water molecules were removed from the model, and rotational freedom was allowed for all sidechains surrounding the binding pocket and for the rotatable bonds of the ligand. The top-ranked result places coumarate in the expected position, with a slight conformational change in the ligand backbone and a subtle rotation of the ligand pose within the binding pocket compared with the manually docked model. The only major change in sidechain rotamer is at S188, which rotates by 120°, creating space for E251, W247 and H17 to form hydrogen bonds with the phenyl-ring –OH of the ligand (Fig. 7B). This result agrees with our initial observation that this subfamily of TRAP SBPs requires a glutamate hydrogen bond acceptor to interact with the phenyl –OH group of the HCM ligand.
Very little movement in any of the other residues is required to bind coumarate, and we could directly model back in the buried water molecule that sits between Y207 and H122 from the TarP_Rhp/4HPA structure whilst maintaining reasonable hydrogen bonding distances. We therefore conclude that the structure of TarP_Rhp is compatible with binding coumarate and other HCMs, notwithstanding our inability to generate crystals of these complexes.
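A docked or manually placed ligand can be screened for plausible hydrogen bonds and steric clashes by measuring interatomic distances, as in the assessment of the TarP_Rhp model above. In the sketch below, the distance windows (2.5–3.3 Å for N/O pairs, and a 3.0 Å clash cutoff for other heavy-atom pairs) are common rules of thumb assumed here, not criteria given in the text. The atom labels and coordinates are hypothetical.

```python
import math


def classify_contacts(ligand, protein, polar=("N", "O"),
                      hb_range=(2.5, 3.3), clash_cutoff=3.0):
    """Classify ligand-protein heavy-atom pairs by distance.

    ligand, protein: lists of (label, element, (x, y, z)).
    Returns (hbonds, clashes), each a list of (ligand_label, protein_label, distance).
    Polar pairs inside hb_range count as candidate hydrogen bonds; polar pairs
    closer than hb_range[0], or other pairs closer than clash_cutoff, count as clashes.
    """
    hbonds, clashes = [], []
    for l_label, l_el, l_xyz in ligand:
        for p_label, p_el, p_xyz in protein:
            d = math.dist(l_xyz, p_xyz)
            both_polar = l_el in polar and p_el in polar
            if both_polar and hb_range[0] <= d <= hb_range[1]:
                hbonds.append((l_label, p_label, round(d, 2)))
            elif d < (hb_range[0] if both_polar else clash_cutoff):
                clashes.append((l_label, p_label, round(d, 2)))
    return hbonds, clashes


# Hypothetical labels and coordinates (Å), for illustration only.
ligand = [("COU O4", "O", (0.0, 0.0, 0.0))]
protein = [
    ("E251 OE1", "O", (2.8, 0.0, 0.0)),   # within the hydrogen-bond window
    ("W247 CZ2", "C", (0.0, 2.0, 0.0)),   # too close for a non-polar pair: clash
    ("H17 NE2", "N", (0.0, 0.0, 5.0)),    # too far to interact
]
hbonds, clashes = classify_contacts(ligand, protein)
```

A flexible-docking program does far more than this, but a distance check of this kind is a quick way to confirm that a pose keeps "reasonable hydrogen bonding distances".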

Comparison of the binding pockets of TarP family SBPs with those of other TRAP SBPs
To put our findings into a wider context, we drew on a high-throughput study of ligand specificity in TRAP SBPs that determined structures of a range of TRAP SBPs in complex with their endogenously bound ligands [16]. Some of the ligands that co-crystallised in these structures have been confirmed as the natural cargo of the SBP in question, whilst others have yet to be fully characterised. Three proteins from the Vetting et al. study [16] co-crystallised with aromatic/HCM-like molecules. These are SBPs from Bordetella bronchiseptica strain RB50 bound to mandelate (PDB ID: 4P56, UniProt: Q7WJQ1), Ru. pomeroyi strain DSS-3 bound to a number of hydroxybenzoate derivatives (PDB IDs: 4PAI, 4PAF and 4PBH; UniProt: Q5LSJ5) and Polaromonas sp. strain JS666 bound to benzoyl formate compounds (PDB ID: 4MNC; UniProt: Q122C7). All three species (or their close relatives) have aromatic degradation pathways [45,46].
A comparison of these three structures with our TarP proteins (Fig. S7) reveals some conserved structural motifs and some marked differences. The overall folds are clearly related to those of the TarP SBPs (all-atom RMSDs compared with TarP_Csal: PDB ID 4P56, 1.6 Å; 4PAI, 6.0 Å; 4MNC, 6.5 Å). Of the three examples, the hydroxybenzoate-bound SBP from Ru. pomeroyi has the lowest sequence conservation around the binding pocket. Like SiaP, it has a double arginine motif at the carboxyl end of the ligand, and the rotamer of the critical arginine differs from that in the TarP SBP family. The hydrogen bonding groups around the hydroxybenzene are similar to those in the TarP SBPs, but there are many unfulfilled hydrogen bonds in this region. The binding pocket is also very spacious (73.8 Å³) given the small size of the ligand, and it contains a number of water molecules that are connected to external bulk solvent, suggesting that the pocket is not fully closed. Together, these observations imply that in the crystal structure the ligand may be bound suboptimally and therefore may not be the true cargo of this SBP.
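The all-atom RMSD values quoted above are root-mean-square distances between paired atoms after the two structures have been superposed. This minimal sketch covers only that final step. It assumes that superposition (for example with PyMOL's `align` or `super` commands) and the one-to-one atom pairing have already been done; the text does not say which tool produced the values, and the coordinates below are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) coordinates.

    Assumes the structures are already superposed and atoms are paired one-to-one.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Hypothetical example: every atom displaced by 1.6 Å along x gives an RMSD of 1.6 Å.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 1.0, 0.0)]
shifted = [(x + 1.6, y, z) for x, y, z in reference]
example = rmsd(reference, shifted)
```

Because the deviation is squared before averaging, a few poorly matching regions can dominate the value, which is one reason that distantly related folds can show RMSDs of 6 Å or more.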
Like the SBP from Ru. pomeroyi, the benzoyl formate-bound SBP from Polaromonas sp. strain JS666 differs significantly from the TarP SBP subfamily, both in the relative organisation of the secondary structure elements within its fold and in the sequence conservation around the end of the binding pocket where the benzene ring is bound. There are no conserved acidic hydrogen bonding groups around the benzene ring of the ligand, and this region of the binding pocket is largely hydrophobic, dominated by a tyrosine residue (Y44) that blocks the end of the pocket. Three water molecules accompany the ligand in the pocket, making mainchain interactions with the protein and packing against the benzene ring of the ligand. As in TarP, the critical arginine is held in a highly networked salt bridge with the additional threonine on the adjacent β-strand.
Finally, the mandelate-bound SBP from B. bronchiseptica RB50 is the most structurally similar to the TarP SBPs. The position and rotamer of the arginine are exactly the same as in the TarP SBPs (Fig. 2A–C,E–F), including its involvement in a highly networked salt bridge. There is also high sequence conservation around the rest of the binding pocket, including the strictly conserved buried water that sits between the Y/H pair (Y234/H146) on one side of the ligand carboxyl group. The other waters in the pocket are not conserved, and they are linked to bulk solvent via a channel adjacent to the ligand. This channel arises from subtle packing differences between the loop at the N-terminal end of β6 and the neighbouring secondary structure in the mandelate-binding protein, which create more space; in TarP_Csal, for example, the sidechains of D86, P150 and T216 fill this channel. The glutamate that binds the –OH group of the ligand phenyl ring in the TarP SBPs is spatially conserved in the B. bronchiseptica RB50 protein (E212), but because it does not engage with the mandelate it occupies a different position, leaving a number of unsatisfied hydrogen bonds to the surrounding residues. A tryptophan residue (W274) dominates the space in this region of the binding pocket, selecting for the smaller mandelate ligand over, for example, larger HCMs. However, with a subtle rotation of this sidechain, the binding pocket would closely resemble that of a TarP SBP. In support of this, orthologues of the B. bronchiseptica RB50 mandelate-binding SBP group very closely with the TarP SBPs in the phylogenetic tree in Fig. S1B.

Conclusion
We have demonstrated the basis of ligand specificity in a group of TRAP transporter SBPs that bind and transport HCM ligands formed as part of lignin degradation in marine environments. Like other known TRAP transporter family members, the proteins in our study rely on a critical interaction between the carboxyl group of the ligand and a conserved arginine residue on the protein that is conformationally restricted within the structure. This differs from SBPs from ABC transporters, where an arginine with a similar role forms a flexible latch to close the binding pocket. Along with variations in the surrounding network of hydrogen bonding groups and buried water molecules, this interaction confers ligand specificity to this family of SBPs.
We have further shown that the position of spatially conserved water molecules within the binding pocket can subtly modulate ligand-binding specificity, even between very structurally similar ligands. This supports evidence from studies of SiaP showing that the position of water molecules is important, and points to a shared mechanism for ligand specificity. In addition, analysis of our protein/ligand complexes and comparison with other TRAP transporter SBPs suggest that the bound conformation of the ligand is under some strain (Fig. S6), related both to the structurally constrained binding pocket and to the relative orientation of the critical arginine and the ligand carboxyl group. This may represent a mechanism for ligand release in these ATP-independent transport systems, but further work is required to test this directly. Overall, a deeper understanding of how proteobacteria use TRAP transporters to acquire HCMs may help in exploiting such transporters for the future production of commercially valuable chemicals from lignin-derived aromatic feedstocks.

Production and purification of recombinant protein
Escherichia coli BL21 (DE3) transformed with either pETc-sal_0280 or pETsse37_24379 was grown at 37 °C to an optical density at 600 nm (OD600) of 0.6 in LB medium containing carbenicillin (50 µg·mL−1) (Melford Laboratories, Ipswich, UK). Overexpression of the genes encoding recombinant TarP_Sse or TarP_Csal was induced by the addition of 0.4 mM isopropyl-β-D-thiogalactopyranoside (IPTG), and cells were incubated at 25 °C and 250 r.p.m. for a further 5 or 3 h, respectively. TarP_Rhp was produced using the same method, but cells were grown post-induction for 2 h at 37 °C. Selenomethionine (SeMet) protein was produced using the same method as for the native proteins, but IPTG-induced overexpression was carried out in M9 minimal medium supplemented with 40 mg·L−1 L-selenomethionine [52]. Cells were harvested by centrifugation (25 min, 4 °C), and the cell-free extract was applied to a HiTrap HP nickel affinity column (GE Healthcare, Amersham, UK). Native and SeMet proteins were eluted over a 20–500 mM imidazole gradient in 20 mM sodium phosphate buffer pH 7.4 containing 500 mM sodium chloride. To remove any endogenously bound ligands that may have co-purified with TarP_Sse and TarP_Csal, purified protein was urea-treated and dialysed as described in Salmon et al. [29]. TarP_Rhp was not urea-treated because a disulphide bond in its structure caused problems with correct refolding. Proteins were concentrated and buffer-exchanged into 50 mM Tris/HCl pH 7.2, 100 mM NaCl prior to structural studies.
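To map elution fractions back to imidazole concentration, the 20–500 mM gradient can be interpolated linearly. This is a minimal sketch assuming a strictly linear gradient; the gradient volume used here (100 mL, e.g. 20 column volumes on a 5 mL column) is hypothetical, as the text does not give it.

```python
def imidazole_at(volume_ml: float, gradient_ml: float,
                 start_mm: float = 20.0, end_mm: float = 500.0) -> float:
    """Imidazole concentration (mM) after `volume_ml` of a linear gradient of length `gradient_ml`."""
    if not 0.0 <= volume_ml <= gradient_ml:
        raise ValueError("volume must lie within the gradient")
    return start_mm + (end_mm - start_mm) * volume_ml / gradient_ml


# Hypothetical gradient length; the text does not state the column or gradient volume.
GRADIENT_ML = 100.0
for fraction_end_ml in (0.0, 25.0, 50.0, 75.0, 100.0):
    print(f"{fraction_end_ml:5.1f} mL -> {imidazole_at(fraction_end_ml, GRADIENT_ML):.0f} mM imidazole")
```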

Tryptophan fluorescence spectroscopy
Changes in the UV fluorescence of intrinsic tryptophan residues in recombinant TarP_Sse and TarP_Csal were measured on a Cary Eclipse fluorimeter (Agilent Ltd, Stockport, UK) in 10 mM Tris/HCl pH 7.4 at 30 °C in a 3 mL stirred quartz cuvette. For emission scans, samples were excited at 280 nm (5 nm slit width) and emission was recorded at 300–400 nm (20 nm slit width). Ligand titrations were performed with 0.2 µM recombinant protein (unfolded, dialysed and refolded for TarP_Sse and TarP_Csal) in 10 mM Tris/HCl pH 7.4 at 30 °C, with excitation at 280 nm and emission at 340 nm, using 5 nm excitation and 20 nm emission slit widths.
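Dissociation constants from tryptophan fluorescence titrations like these are commonly obtained by fitting the signal to a 1:1 binding model. Because the protein concentration (0.2 µM) is well above some of the reported Kd values (e.g. 8 ± 5 nM for TarP_Rhp), ligand depletion matters, so the sketch below uses the exact quadratic solution rather than a simple hyperbola. The text does not state which model or software was used; the grid-search fitter, its names and the synthetic data are illustrative assumptions.

```python
import math


def fraction_bound(p_total, l_total, kd):
    """Fraction of protein in the 1:1 complex, exact (quadratic) solution.

    All concentrations in the same units (nM here). Uses the numerically stable
    form 2PL / (b + sqrt(b^2 - 4PL)), with b = P + L + Kd.
    """
    b = p_total + l_total + kd
    complex_conc = 2.0 * p_total * l_total / (b + math.sqrt(b * b - 4.0 * p_total * l_total))
    return complex_conc / p_total


def fit_kd(l_totals, signal, p_total, kd_grid):
    """Grid search over Kd. For each trial Kd, baseline F0 and amplitude dF are solved by
    linear least squares, since F = F0 + dF * fraction_bound is linear in both."""
    best = None
    for kd in kd_grid:
        x = [fraction_bound(p_total, l, kd) for l in l_totals]
        n = len(x)
        mean_x, mean_y = sum(x) / n, sum(signal) / n
        sxx = sum((xi - mean_x) ** 2 for xi in x)
        if sxx == 0.0:
            continue  # no variation in occupancy: amplitude undefined
        sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, signal))
        d_f = sxy / sxx
        f0 = mean_y - d_f * mean_x
        sse = sum((yi - (f0 + d_f * xi)) ** 2 for xi, yi in zip(x, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, f0, d_f)
    if best is None:
        raise ValueError("no usable Kd in grid")
    return best[1], best[2], best[3]


# Synthetic, noise-free titration (illustrative only): 200 nM protein,
# true Kd 8 nM, 30% quenching of a baseline signal of 100.
P_NM = 200.0
ligand_nm = [0, 25, 50, 100, 150, 200, 300, 500, 1000, 2000]
signal = [100.0 - 30.0 * fraction_bound(P_NM, l, 8.0) for l in ligand_nm]
kd_grid = [10 ** (k / 100) for k in range(-100, 401)]  # 0.1 nM to 10 uM, log-spaced
kd_fit, f0_fit, df_fit = fit_kd(ligand_nm, signal, P_NM, kd_grid)
```

A grid search keeps the sketch dependency-free; a non-linear least-squares routine would normally be used on real data, and would also provide error estimates.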

Protein crystallisation
SeMet TarP_Sse was concentrated to 10 mg·mL−1 prior to the addition of 6 mM coumarate. Automated crystallisation screens were carried out at 290 K with a Hydra II crystallisation robot using commercial screens (Nextal, Molecular Dimensions, Sheffield, UK). This identified several high-salt conditions that were subsequently optimised by hanging-drop vapour diffusion with a 1 : 1 ratio of protein to mother liquor, resulting in large cuboidal crystals that grew over a few days in conditions containing 0.… Native TarP_Rhp (7 mg·mL−1) was screened in the presence of 6 mM coumarate by sitting-drop vapour diffusion. Thin, plate-like crystals grew from a drop containing 0.1 M Tris/HCl pH 8 and 20% (w/v) PEG 6000. However, unlike the other two TRAP proteins, whose crystals grew within days, crystals of TarP_Rhp grew over a number of months and were not readily reproducible in either commercial screens or manual optimisations.

Data collection, structural determination and analysis
All crystals were cryoprotected in their mother liquor plus 25% (w/v) ethylene glycol (PEG-based conditions) or glycerol (ammonium sulphate conditions) and then mounted in a liquid nitrogen cold stream (100 K) prior to data collection. All datasets were collected on the MX beamlines at Diamond Light Source (Table 3). For TarP_Sse, Seleno-MAD data were collected from a single crystal of the protein in complex with coumarate at two wavelengths (12 663 and 12 659 eV). Data were processed using Xia2 [53], which determined that the crystal belonged to space group P2₁ with cell dimensions a ≈ 83 Å, b ≈ 88 Å, c ≈ 97 Å and angles α = γ = 90°, β ≈ 92.3°. SHELXC/D/E [54] was used to determine a selenium substructure, from which preliminary phases were determined and an initial model was built. Forty-one selenium sites (~10 per subunit in the asymmetric unit) were found for TarP_Sse, from which ~70% of the four subunits within the asymmetric unit were built automatically by SHELXE. Model building was completed using PHENIX PHASE AND BUILD [55] before ligand density was interpreted in COOT [56] using ligand coordinates generated in JLIGAND [57]. The model coordinates were refined in REFMAC5 [58]. The TarP_Sse/cinnamate complex belonged to the same space group as the SeMet structure and was determined to 1.9 Å resolution by molecular replacement with PHASER [59] within CCP4I [60], using the protein coordinates of the SeMet/coumarate complex as the search model.
For TarP_Csal, Seleno-MAD data from a single crystal grown in the presence of caffeate were collected at three wavelengths (12 748, 12 672 and 12 700 eV). The data were processed by Xia2 in space group P2₁2₁2 with cell dimensions a = 81.91 Å, b = 119.59 Å, c = 61.96 Å and angles α = β = γ = 90°. The structure was determined to 1.67 Å resolution by Seleno-MAD using SHELXC/D/E [54], which autobuilt a preliminary poly-Ala backbone from 18 selenium sites contained within the two molecules in the asymmetric unit. Model building, refinement and interpretation of the ligand … [59] using two search models generated from each domain of the closed monomer of TarP_Csal.
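The MAD energies quoted above lie close to the selenium K absorption edge (about 12.66 keV) and can be converted to X-ray wavelengths with λ = hc/E, where hc ≈ 12 398.4 eV·Å. This short sketch does the conversion; the constant is the standard value, rounded, and the dictionary layout is only illustrative.

```python
HC_EV_ANGSTROM = 12398.41984  # Planck constant times speed of light, in eV·Å


def ev_to_angstrom(energy_ev: float) -> float:
    """X-ray wavelength (Å) for a photon energy given in eV."""
    return HC_EV_ANGSTROM / energy_ev


# Energies quoted in the text for the two Seleno-MAD experiments.
mad_energies_ev = {
    "TarP_Sse": [12663, 12659],
    "TarP_Csal": [12748, 12672, 12700],
}
for protein, energies in mad_energies_ev.items():
    for e in energies:
        print(f"{protein}: {e} eV -> {ev_to_angstrom(e):.5f} Å")
```

Higher energies give shorter wavelengths, so the 12 748 eV dataset for TarP_Csal was collected at the shortest wavelength of the five.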
For TarP_Rhp, high-resolution (1.1 Å) native data were processed using Xia2 [53], revealing that the crystal belonged to space group P2₁2₁2₁ with cell dimensions a = 38.67 Å, b = 50.5 Å, c = 142.56 Å and angles α = β = γ = 90°. The structure was determined ab initio using Arcimboldo [61], which auto-built the poly-Ala backbone (305 of 336 amino acids) of the single TarP_Rhp monomer in the asymmetric unit. Even though the protein was co-crystallised in the presence of 5 mM coumarate, the ligand density in the binding pocket corresponded to a single molecule of 4HPA, which had presumably co-purified with the protein. The resulting structure was built and refined in the same way as the TarP_Sse and TarP_Csal structures. The electron density showed that, after cleavage of the PelB leader peptide, two residues had been lost from the N terminus of the protein, leaving an N-terminal glutamine that had subsequently cyclised spontaneously to form a pyroglutamate. In the crystal lattice, the pyroglutamate packs within a depression on a neighbouring molecule, generating several crystal contacts. Crystallisation is therefore likely to depend on formation of the N-terminal pyroglutamate, which can form spontaneously in solution over a matter of weeks; this may explain why crystallisation took many months. Intriguingly, the cyclised glutamine is predicted to be the N-terminal residue of the mature protein, suggesting that formation of an N-terminal pyroglutamate may be important for the stability of this protein. In all structures, the predicted N-terminal glutamine residue of the mature protein is numbered as residue 1.
Binding pocket volumes were calculated with CASTp using a 1.4 Å probe radius [62]. Structure validation was carried out with MOLPROBITY [63] and COOT [56]. Flexible regions and the motion between the open and closed structures of TarP_Csal were analysed with DYNDOM [64].

Computational modelling
Computational modelling was carried out with FlexAID (NRGSuite v2.48I plugin for PyMOL v2.1.0 [65]; The PyMOL Molecular Graphics System, Version 2.0, Schrödinger, LLC) using desolvated protein coordinates from the TarP_Rhp/4HPA structure and a molecule of coumarate from the TarP_Sse/coumarate structure. Sidechain flexibility was enabled for 15 sidechains around the binding pocket (R145, Y72, F211, Y67, W12, H17, W247, E251, S188, H122, F184, L191, V13, F250 and F192), which was defined by a sphere of radius 8 Å centred on the binding position of 4HPA. Flexibility was also enabled for the two rotatable bonds of coumarate, and distance constraints of 2.7 Å were applied between the carboxyl oxygen atoms of coumarate and the NH1 and NH2 nitrogen atoms of R145.
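The binding-site definition above (an 8 Å sphere centred on the bound 4HPA) can be sketched as a simple distance filter that shortlists residues with any atom inside the sphere. This is an independent illustration, not FlexAID's or NRGSuite's own implementation. Centring the sphere on the ligand centroid, the atom representation and the coordinates below are all assumptions.

```python
import math


def centroid(coords):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def residues_in_sphere(atoms, centre, radius=8.0):
    """Residue labels with at least one atom within `radius` (Å) of `centre`.

    atoms: iterable of (residue_label, (x, y, z)).
    """
    selected = set()
    for residue, xyz in atoms:
        if math.dist(xyz, centre) <= radius:
            selected.add(residue)
    return sorted(selected)


# Hypothetical coordinates: a two-atom stand-in for 4HPA centred on the origin.
site_centre = centroid([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
protein_atoms = [
    ("R145", (3.0, 0.0, 0.0)),
    ("W247", (0.0, 7.5, 0.0)),
    ("L300", (20.0, 0.0, 0.0)),   # hypothetical distant residue
]
flexible_candidates = residues_in_sphere(protein_atoms, site_centre)
```

A shortlist like this would still be curated by hand, as the 15 flexible sidechains listed above were chosen around the pocket rather than taken wholesale from the sphere.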

Accession numbers and data availability
Coordinates for the novel structures reported in this study have been deposited in the Protein Data Bank with the following PDB codes: 7NQG, 7NR2, 7NRA, 7NRR, 7NSW, 7NTD and 7NTE. Data collection and refinement statistics can be found in Table 3.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Fig. S4. Details of structural features from the TarP_Rhp complex with 4HPA.
Fig. S5. The domain organisation of TarP SBPs.
Fig. S6. Analysis of plane distortions in the coumarate ligand.
Fig. S7. Superposition of TarP_Csal with possible related TRAP SBPs.
Table S1. Locus tag identifiers of the TRAP SBPs used in this study for the phylogenetic analysis in Fig. S1.
Video S1. Overview of the conformational changes associated with coumarate binding in TarP_Csal, with major residues surrounding the ligand binding pocket (E190, Y74 and R148) highlighted. The video was generated using Chimera by interpolating between the open apo structure and the closed ligand-bound structure of TarP_Csal. The protein backbone is shown as pale green ribbons with important residues and secondary structure elements shown as sticks. Coumarate is shown as darker green sticks.
Video S2. Conformational changes around the key arginine (R148) in TarP_Csal in more detail, including the changes in the hydrogen bonding network between the arginine and the surrounding β-sheet during domain closure (see also Figs 5A and 6). Hydrogen bonds and distances are shown in orange. The protein backbone is shown as pale green ribbons with important residues and secondary structure elements shown as sticks. Coumarate is shown as darker green sticks. Video generated using Chimera as for Video S1.