DUF916 and DUF3324 in the WxL protein cluster bind to WxL and link bacterial and host surfaces

Abstract Bacterial WxL proteins contain peptidoglycan‐binding WxL domains, which have a dual Trp‐x‐Leu motif and are involved in virulence. It was recently shown that WxL proteins occur in gene clusters, typically containing a small WxL protein (which in its mature form consists only of a WxL domain), a large WxL protein (which contains a C‐terminal WxL domain with N‐terminal host‐binding domains), and a conserved protein annotated as a Domain of Unknown Function (DUF). Here we analyze this DUF and show that it contains two tandem domains, DUF916 and DUF3324, which both have an IgG‐like fold and together form a single functional unit, connected to a C‐terminal transmembrane helix. DUF3324 is a stable domain, while DUF916 is less stable and is likely to require a stabilizing interaction with WxL. We suggest that the protein plays an important role in binding and stabilizing WxL on the peptidoglycan surface via the DUF916 domain, and in binding to host cells via the DUF3324 domain. AlphaFold2 predicts that a β‐hairpin strand from DUF916 inserts into WxL adjacent to its N‐terminus. We therefore propose to rename the DUF916‐DUF3324 pair as WxL Interacting Protein (WxLIP), with DUF916, DUF3324 and the transmembrane helix forming the first, second and third domains of WxLIP, which we characterize as the peptidoglycan binding domain (PGBD), host binding domain (HBD), and transmembrane helix (TMH), respectively.


Modelling Method
TrRefine Rosetta

iBAQ is a measure of the relative abundance of each protein. Abundance is the estimated proportion of total protein represented by the pDUF916 construct. The peptide score is a measure of uniqueness, with a larger score indicating higher confidence in the protein annotation.
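The iBAQ and abundance columns described in the note above can be illustrated with a minimal sketch. It assumes the standard iBAQ definition (summed peptide intensities divided by the number of theoretically observable peptides) and assumes that abundance is the construct's share of the summed iBAQ values; the source does not state how abundance was actually calculated. The function names and the numbers in the usage lines are hypothetical.

```python
def ibaq(peptide_intensities, n_theoretical_peptides):
    """iBAQ value: summed peptide intensities for one protein divided by
    the number of theoretically observable peptides of that protein."""
    if n_theoretical_peptides <= 0:
        raise ValueError("n_theoretical_peptides must be positive")
    return sum(peptide_intensities) / n_theoretical_peptides


def relative_abundance(ibaq_by_protein):
    """Fraction of the summed iBAQ signal contributed by each protein.

    Assumption: 'abundance' is taken here as iBAQ / total iBAQ; the
    source does not give the formula it used.
    """
    total = sum(ibaq_by_protein.values())
    if total <= 0:
        raise ValueError("total iBAQ must be positive")
    return {name: value / total for name, value in ibaq_by_protein.items()}


# Hypothetical values, for illustration only.
sample = {"construct": ibaq([2.0, 4.0, 6.0], 4), "contaminant": 1.0}
print(relative_abundance(sample))
```

Normalising to the summed iBAQ gives a dimensionless fraction, which matches the description of abundance as a proportion of total protein.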
Figure S1. Sequence alignment of 175 PGBD proteins from WxL clusters (Table S5). Sequences are identified using the GenBank code. Conserved residues (defined as residues invariant in at least 60 sequences) are indicated at the bottom of the alignment. The alignment was conducted using MUSCLE, and further organised using SeaView.

Figure S2. Sequence alignment of 175 HBD proteins from WxL clusters (Table S5). Sequences are identified using the GenBank code. Conserved residues (defined as residues invariant in at least 60 sequences) are indicated at the bottom of the alignment. The alignment was conducted using MUSCLE, and further organised using SeaView.
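The conservation criterion used in these alignments can be sketched as follows. This is one reading of "invariant in at least 60 sequences": a column counts as conserved when a single residue occurs there in at least 60 of the aligned sequences, ignoring gaps. The function name, the gap character and the toy sequences are assumptions for illustration, not the authors' procedure.

```python
from collections import Counter


def conserved_columns(aligned, min_count=60, gap="-"):
    """Return {column_index: residue} for alignment columns in which one
    residue occurs in at least `min_count` sequences (gaps are ignored)."""
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    conserved = {}
    for i in range(length):
        counts = Counter(seq[i] for seq in aligned if seq[i] != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
            if n >= min_count:
                conserved[i] = residue
    return conserved


# Toy alignment (hypothetical sequences), with the threshold lowered to
# suit four sequences rather than 175.
toy = ["NQIDK", "NQIDK", "NQLDK", "N-IDR"]
print(conserved_columns(toy, min_count=4))  # {0: 'N', 3: 'D'}
```

With 175 sequences and a threshold of 60, a column is flagged when roughly a third of the sequences share the same residue, so the criterion is permissive.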

Figure S3. Structure prediction of WxLIP proteins by Phyre2. Structures A-D are respectively EfmWxLIP1, EfmWxLIP2, EfmWxLIP3 and EfsWxLIP. The predicted structures were obtained in April 2021.

Figure S4. Robetta prediction of WxLIP proteins. The proteins are the same as those shown in Fig S3. The predicted structures were obtained in April 2021.

Figure S5. Conserved residues in WxLIP. The comparison is between (a) proteins in the clusters of human symbionts from Figures 2 and 3, and (b) all PGBD/HBD proteins from bacterial genomes (Table S5). PGBD is in blue, the buttressing loop in maroon, HBD in red, and conserved residues in cyan. The most obvious difference is the lack of cyan residues at the top of (b).

Figure S6. Design of constructs for locus C domains. (A) Sequence alignment of WxLIP proteins. EfmWxLIP1, 2 and 3 represent Locus A, B and C of E. faecium DO, and EfsWxLIP represents DUFE from E. faecalis V583. The sequence highlighted in yellow represents the signal peptide, dark blue is PGBD and cyan is HBD. Conserved residues are highlighted in red, sequences with one variation are highlighted in purple, and the highly conserved sequence NQIDK is boxed. β-strands are shown as blue arrows, green spirals represent α-helices, and the brown spiral represents the C-terminal transmembrane helix. The sequences highlighted in red boxes indicate the start and end points of the three different constructs; SAS to NE is pDUF916 1, SAS to NN is pDUF916 2 and NE to NN is pDUF916 3. (B) Protein sequences of pDUF916 1, pDUF916 2 and pDUF916 3, with the 6 His-Tag at the N-terminus in sky blue followed by the linker in bold. (C) Properties of the three constructs. The three constructs code for EfmWxLIP3 PGBD, EfmWxLIP3 PGBD and HBD, and EfmWxLIP3 HBD (i.e. DUF916, DUF916+DUF3324 and DUF3324) respectively.

Figure S7. Schematic representation of the different E. faecium DO Locus C EfmWxLIP3 domain constructs in pOPINF vectors. The three vectors pDUF916 1_pOPINF, pDUF916 2_pOPINF and pDUF916 3_pOPINF are shown; the region in green is PGBD, PGBD+HBD, and HBD in the three constructs respectively, with a His-Tag at the N-terminal end in sky blue.