Endomembrane trafficking is one of the most prominent cytological features of eukaryotes. Given their widespread distribution and specialization, coiled-coil domains, coatomer domains, small GTPases and Longin domains are considered primordial ‘building blocks’ of the membrane trafficking machineries. Longin domains are conserved across eukaryotes and were likely to be present in the Last Eukaryotic Common Ancestor. The Longin fold is based on the α-β-α sandwich architecture and a unique topology, possibly accounting for the special adaptation to the eukaryotic trafficking machinery. The ancient Per ARNT Sim (PAS) and cGMP-specific phosphodiesterases, Adenylyl cyclases and FhlA (GAF) family domains show a similar architecture, and the identification of prokaryotic counterparts of GAF domains involved in trafficking provides an additional connection for the endomembrane system back into the pre-eukaryotic world. Proteome-wide, comparative bioinformatic analyses of the domains reveal three binding regions (A, B and C) mediating either specific or conserved protein–protein interactions. While the A region mediates intra- and inter-molecular interactions, the B region is involved in binding small GTPases, thus providing an evolutionary connection among major building blocks in the endomembrane system. Finally, we propose that the peculiar interaction surface of the C region of the Longin domain allowed it to extensively integrate into the endomembrane trafficking machinery in the earliest stages of building the eukaryotic cell.
Over the past few decades, details of the protein machinery mediating subcellular trafficking in eukaryotic cells have been, and continue to be, elucidated primarily via elegant experimental work in model systems such as yeast and mammalian cell lines. This machinery is largely composed of protein families that perform similar mechanistic functions but at discrete cellular locations or transport pathways. These organelle-specific paralog families are often conserved across the broadest spans of eukaryotic diversity, implying that they are ancient and were present in the Last Eukaryotic Common Ancestor (LECA). Extensive comparative genomic and phylogenetic studies have reconstructed the history of many protein families within the endomembrane system, including the Rabs, the SNAREs, the various vesicle coats [5, 6] and the cargo adaptors [7, 8]. This information has been used to construct a mechanistic explanation for the evolution of non-endosymbiotic organelles [2, 9]. However, while the steps proximal to the LECA are tractably reconstructed, only a few attempts have gleaned information about the stages leading back from the LECA to earlier eukaryotic ancestors.
Some of the protein domains conserved in membrane traffic are common to different complexes involved in multiple trafficking steps. Hence, they can be considered as primordial building blocks, harkening back to the deepest origins of the endomembrane system. The Coatomer Protein (COP)I, COPII and Clathrin complexes share with the nuclear pore complex a common architecture composed of a unique arrangement of β-propeller and α-solenoid domains, thus representing a coatomer building block. This arrangement is potentially shared with the planctomycetes, although it remains unclear whether the relationship is homologous or analogous and what functional role these proteins play in prokaryotes. Small GTPases regulate vesicle formation, tethering and fusion in the eukaryotic cell; homologues of the small GTPase Ras superfamily and MglA are found in prokaryotes [8, 12]. SNARE-like coiled-coil domains are found in diverse other protein families in both prokaryotes and eukaryotes. SNAREs are conserved in all eukaryotes [4, 13, 14] and provide the driving force for membrane fusion.
The Longin domain was identified and defined as a ∼120-residue conserved domain with a globular structure consisting of five antiparallel β strands sandwiched between α helices. It was originally found to regulate SNARE fusion [16, 18]. Determination of subunit structures from the Adaptor Protein (AP) clathrin adaptor, TRAnsport Protein Particle (TRAPP) and Signal Recognition Particle (SRP) complexes [19-21] also revealed the involvement of Longin domains in vesicle budding and tethering events. Identification and characterization of many Longin-like trafficking proteins [23-25] suggested that the Longin domain is one of the central building blocks of the trafficking machinery. More recently, Longin domains were also recognized in the COPI complex and in DENN proteins, a family of Rab GTPase regulators [28, 29].
Longin domains were sometimes referred to as ‘Profilin-like’ because of a somewhat similar fold shape. However, Profilin belongs to the huge and highly heterogeneous group of PAS/GAF domains  that are present in bacteria, archaea and eukaryotes and play a wide range of cellular functions, mostly other than trafficking . By contrast, all Longin domains show a homogeneous cellular role and taxonomic distribution  and a peculiar connectivity between secondary structure elements (‘topology’, see ), that is different from the PAS/GAF topology of Profilin . We show here that Longin domains represent a large group consisting of seven ‘homologous superfamilies’  which are not significantly similar to Profilin or any other PAS/GAF domain. Therefore, special features of binding epitopes – rather than the fold per se – are likely to account for the special adaptation of Longin domains to the eukaryotic trafficking machinery . We performed comparative in silico analysis of Longin and GAF conformational epitopes to unveil common or peculiar features possibly involved in trafficking. In addition to highlighting unique features of the Longin domain, our results suggest an evolutionary hypothesis for its integration with other building blocks of the endomembrane system in the ancient eukaryotes before the LECA, over 2 billion years ago.
Results and Discussion
Comparative genomic analysis and classification of Longin homologous superfamilies
Since the discovery of the Longin domain , various individual analyses, performed with different taxon sampling [14, 24-26, 33, 34], addressed the distribution and evolution of a number of Longin ‘homologous superfamilies’ [Class, Architecture, Topology and Homologous (CATH) classification, see  and next section for more details]. In order to update the dataset, we performed PSI-BLAST searches  using the sequences of Longin domains reported in literature as queries. When structural data was not available, fold and topology were inferred by modelling (see Methods).
Current taxonomy divides eukaryotes into six large ‘supergroups’, of which four are well defined: Opisthokonta (animals and fungi), Amoebozoa, Excavata and Archaeplastida (plants). In light of the current debate about the remaining two supergroups [37-39], we treat the relevant taxa sampled as a single ‘SAR/CCTH’ supergroup pending resolution in the field. To examine the taxonomic distribution of Longin domains systematically, we performed a comparative genomic survey sampling broadly across eukaryotic diversity. This provides the first comprehensive view, in which Longin domain proteins are classified into seven homologous superfamilies: Longins, Adaptins, Sedlins, SANDs, Targetins, DENNs and AVLs (Figure 1). For the sake of clarity, each protein is hereafter referred to by the name of its Homo sapiens homologue.
Longins sensu stricto, i.e. VAMP7, YKT6 and Sec22b homologues, are known to be present across all eukaryotic supergroups sampled, a result that we extend here with increased taxon sampling.
Adaptins consisting of (or containing a) Longin domain are the σ and μ subunits of AP complexes 1–4 , which are involved in the process of vesicle budding at the Golgi, the endosomal compartment and the plasma membrane. These subunits are embedded in the trunk of the complex where they are crucial to complex assembly . Longin domains in the σ and μ subunits of the recently identified AP5 complex (involved in trafficking from the late endosome, ) were identified and are modelled here (Figure 2). The COPI complex plays the same role in the Golgi to endoplasmic reticulum (ER) retrograde trafficking route, and its overall structure is similar to AP complexes: indeed, δ and ζ subunits of COPI are structural and functional homologues of AP μ and σ subunits, respectively . Adaptins are found in the diversity of eukaryotes implying their presence in the LECA .
Sedlins are components of the TRAPP complexes I and II, which localize at the surface of cis- and trans-Golgi, respectively, and are required for tethering ER-derived vesicles to the Golgi membrane and for intra-Golgi trafficking. Sedlin is conserved also in the TRAPP III complex, which plays a role in autophagosome formation. Three Longin subunits are found in a solved TRAPP structure : Sedlin/TRAPPC2, Bet5/TRAPPC1 and Synbindin/TRAPPC4. This last structure shows domain insertion within a loop of the Longin domain . Furthermore, we found the Longin fold to be predicted also for TRAPPC2L . While eukaryotic genomes may lack some of the other TRAPP II components, Sedlin homologues are quite well conserved and found in organisms from all eukaryotic supergroups .
In H. sapiens, the SANDs superfamily encompasses the HPS-1/4, Mon1A/B and Ccz1/C7orf28A proteins. Mon1 and Ccz1 are involved in Rab conversion; the dimeric Mon1-Ccz1 complex, but neither protein alone, is the Rab Guanine nucleotide Exchange Factor (GEF) in yeast. Moreover, Mon1A and Mon1B may have different functions. According to structural modelling, SANDs represent the first example of proteins endowed with multiple Longin domains (not shown). HPS-1 and HPS-4 are components of, and direct interactors within, the Biogenesis of Lysosome-related Organelles Complex-3 (BLOC-3). Members of this superfamily are classified together based on previous analyses; however, since SANDs do not retrieve one another as reciprocal best-scoring sequences upon BLAST analysis, the superfamily is tentative. Our searches extend taxon sampling to haptophytes, additional opisthokonts and amoebozoans. A recent evolutionary analysis of the BLOC complex showed that homologues of human SANDs are present at least in opisthokonts and amoebozoans. We extend this observation here with homologues of Mon1 and of Ccz1/C7orf28A in all supergroups. Our findings are also consistent with those of Cheli and Dell'Angelica, who identified HPS-1 or HPS-4 homologues in excavates, in addition to unikonts. To shed more light on the evolution of Mon1A/B homologues in diverse eukaryotes, we undertook a phylogenetic analysis. On the basis of our results (Figure 3), the duplication giving rise to Mon1A and Mon1B potentially occurred in the vertebrates; proteins in other taxa should therefore be classified simply as Mon1 homologues.
Targetins are defined herewith as homologues of the N-terminal (N-ter) region of the α subunit of the SRP Receptor (SR), called SRx ; they mediate targeting of the ribosome to the ER by association with SRb . Our searches identify putative SRx homologues in taxa from each eukaryotic supergroup.
DENNs are regulators of the Rab GTPases function [28, 29]; we identified the Longin fold and topology in DENN-related proteins such as Avl9, FAM45A, FAM45B and LCHN. DENNs also span across all eukaryotic supergroups.
In conclusion, all seven Longin homologous superfamilies are widely conserved across all eukaryotes, albeit with some potentially interesting instances of loss (HPS1/4 in the mega plant clade) or lineage-specific innovation (e.g. Phytolongins in land plants, ). Importantly, for issues of the pre-LECA evolution of the membrane trafficking system, this analysis confirms both the specific presence of the Longin domain only in eukaryotes (already in the LECA) and apparent absence from bacteria and Archaea, highlighting its role as a building block in eukaryotic subcellular trafficking .
Structural comparison and classification
CATH is a long-established (since 1997) and regularly updated hierarchic system for the classification of individual protein domains; the acronym is derived from the four main levels of classification. Level 1 (Class) is quite general, as all known domains sort into only four classes. Of note, almost half of this universe belongs to Class 3, which groups mixed α-β domains (Figure 4). Level 2 (Architecture) is also very general, as it corresponds to the 3D shape (overall, ‘gross’ fold), i.e. it is independent of connectivity among secondary structure elements. Grouping at this level is rather broad: for instance, the three-layer α-β-α sandwich (Figure 4) is the most common fold in Class 3 and, together with the non-bundle fold from Class 1 (mainly α), it is one of the two most common folds in nature (CATH data). Each architecture contains an (often large) number of different domain folds that can be distinguished at the next level down: Level 3 or Topology. Fold similarity is rough among domains that only share class and architecture and becomes more accurate only when they also have the same topology. Immunoglobulin-like domains represent a well-known example of grouping at the topology level. Domains grouping at Level 3 may have quite different functions; by contrast, domains from the same ‘Homologous superfamily’ (Level 4) show high structural similarity and similar functions, and are likely to have evolved from a common ancestor. Longin and PAS/GAF domains are in different categories from Level 3 onward (Figure 4). CATH criteria and current evidence clarify that the occasionally used ‘Profilin-like’ name is no longer suitable for as extensive and diverse a set as the Longin domains. The single Profilin superfamily is in fact characterized by a topology shared with GAF domains [30, 31].
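The four-level hierarchy can be illustrated with a short Python sketch. It assumes CATH-style identifiers written as four dot-separated fields (Class.Architecture.Topology.Homologous superfamily) and reports the deepest level that two domains share. The function name and the identifiers in the example are hypothetical placeholders chosen for illustration; they are not the actual CATH assignments of Longin or PAS/GAF domains.

```python
# Names of the four CATH levels, from most general to most specific.
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def deepest_shared_level(code_a, code_b):
    """Return the name of the deepest CATH level shared by two codes, or None."""
    parts_a = code_a.split(".")
    parts_b = code_b.split(".")
    if len(parts_a) != 4 or len(parts_b) != 4:
        raise ValueError("expected four dot-separated fields")
    shared = None
    for name, a, b in zip(LEVELS, parts_a, parts_b):
        if a != b:
            break  # levels are nested, so stop at the first mismatch
        shared = name
    return shared


# Placeholder identifiers (hypothetical): same class and architecture,
# different topology, as described for Longin versus PAS/GAF domains.
domain_x = "3.99.1.1"
domain_y = "3.99.2.7"
print(deepest_shared_level(domain_x, domain_y))
```

Under this reading, two domains that diverge at Level 3 share only their gross fold, which is the situation described above for Longin and PAS/GAF domains.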
It is noteworthy that the topologies of Longin and PAS/GAF domains are permutated rather than randomly remixed. At the same time, all Longin domains show homogeneous distribution (widely conserved in, and only in, eukaryotes) and cellular function (part of the trafficking machinery), whereas PAS/GAF domains are present also in Archaea and Bacteria and they play a wide range of molecular and cellular functions. In particular, among domains with GAF topology, only Roadblock and Profilin domains (Figure 4) are involved in trafficking, whereas other GAF and all PAS domains are not.
In order to get a systematic perspective on the structural features of the Longin domain-containing proteins, as well as the PAS/GAF proteins, we undertook proteome-wide fold recognition at the DALI server. Searches were iterated to retrieve the largest ‘available’ dataset (limited to Protein Data Bank (PDB) structures). Longin and PAS/GAF structures (including only the domain of interest from multi-domain structures) were used as primary queries; newly retrieved structures were then used as further queries. None of the retrieved hits belonged to a domain family other than those of the queries. Such a distribution is proof of neither an evolutionary nor a functional relationship, because DALI recognizes the 3D ‘shape’ of the fold regardless of other very important features such as topology or homology: its method breaks the input structure into hexapeptide fragments, and aligned profile fragments may appear in different orders within the structures being compared. Moreover, when using other fold recognition tools (e.g. PDBeFold at the EBI server), further domains with a similar fold were retrieved from protein families other than those of the queries (not shown).
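The iteration scheme described above (primary queries first, then every newly retrieved structure as a further query, until nothing new appears) can be sketched in Python as follows. This is only an illustration of the bookkeeping: the `search` callable is a hypothetical stand-in for a fold-recognition request, and the toy hit table replaces real server output. It does not reproduce the DALI interface.

```python
from collections import deque


def iterate_searches(primary_queries, search):
    """Run `search` on each query and re-use new hits as queries until none remain."""
    seen = set(primary_queries)
    queue = deque(primary_queries)
    while queue:
        query = queue.popleft()
        for hit in search(query):
            if hit not in seen:
                seen.add(hit)      # record the newly retrieved structure
                queue.append(hit)  # and schedule it as a further query
    return seen


# Toy stand-in for a fold-recognition server (hypothetical identifiers).
TOY_HITS = {"A": ["B"], "B": ["A", "C"], "C": [], "D": ["D"]}


def toy_search(query):
    return TOY_HITS.get(query, [])


print(sorted(iterate_searches(["A"], toy_search)))
```

Because every structure is queried at most once, the loop terminates as soon as a round of searches returns only structures that are already in the dataset.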
Once topology and function from the query and the retrieved structures are compared, no preferential cross extraction is observed, i.e. when using trafficking domains as queries, also non-trafficking ones are retrieved, and vice versa.
Therefore, the DALI dataset was re-investigated in more detail. Relevant variation was observed in number, length and 3D orientation of secondary structural elements, but the different domain topologies were found to share a central block corresponding to a five-stranded β flat interrupted by insertion of an α helical region connecting β2 to β3, hereafter referred to as ‘principal’ (αp) because of its position (Figure 4). Some PAS domains consist of only such a ββαpβββ block, in turn corresponding to the PAS signatures deposited in sequence profile databases (PROSITE PS50112, Pfam PF00989 and others). PAS domains are sensors involved in a wide range of cellular functions; in addition to the PAS block/signature, they may have zero to two additional N-terminal α helices and a single C-terminal α helix. The PAS block is also found in GAF domains, where it is always nested between additional α helices (Figure 4). Sharing of both fold and topology among several PAS and GAF domains is in agreement with their evolutionary relationship [31, 32].
GAF domains also play a wide range of cellular functions: they are present in phosphodiesterases (PDEs)  and adenylate cyclases . Profilins  regulate actin polymerization and interact with cytoskeletal proteins linking actin to extracellular membrane  and Roadblocks are involved in scaffolding and cytoskeletal organization [52, 53].
Systematic comparison of number and position of secondary structural elements shows that the PAS block/signature is a simplification as the αp region in some PAS/GAF domains consists of multiple helices (PDB: 1y28 and 3mkf are just examples). The central block is also found in Longin domains, but the Longin topology always shows (i) no additional N-terminal α helix and (ii) at least two additional C-terminal α helices (Figure 4). Conservation of such Longin-specific topology may account for ancient evolutionary branching and for molecular adaptation as a building block to the more complex trafficking machinery of multi-compartment cells .
This prompted us to go into a deeper investigation of conservation within the central block. When comparing non-homologous sequences, conserved profiles can be identified by alphabet simplification followed by structure-based multiple alignment and position-specific comparison of amino acid properties. Therefore, ββ-αp-βββ sequence fragments from representative members of Longin and PAS/GAF homologous superfamilies were used for multiple alignment (Figure 5). The αp-β3 subregion shows high divergence and no profile conservation, according to its involvement in binding several different intra- and inter-molecular interactors . In β strands 1, 2, 4 and 5, hydrophobic patches (green) are conserved, but such a feature is widely spread as it mediates core stabilization rather than any specific function.
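As an illustration of alphabet simplification followed by position-specific comparison, the Python sketch below maps residues to a reduced, property-based alphabet and then lists the alignment columns in which every sequence carries the same property. The four-group alphabet, the function names and the toy alignment are all assumptions made for this example; the grouping actually used for Figure 5 is not specified in the text.

```python
# One possible reduced alphabet (an assumption made for illustration).
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGPH",  # polar or small
    "b": "KR",        # basic
    "a": "DE",        # acidic
}
REDUCED = {aa: symbol for symbol, members in GROUPS.items() for aa in members}


def simplify(sequence):
    """Translate a sequence into the reduced alphabet; gaps and unknowns pass through."""
    return "".join(REDUCED.get(res, res) for res in sequence.upper())


def conserved_columns(aligned, symbol="h"):
    """Return 0-based columns where every aligned sequence has the given reduced symbol."""
    reduced = [simplify(seq) for seq in aligned]
    length = len(reduced[0])
    if any(len(row) != length for row in reduced):
        raise ValueError("aligned sequences must have equal length")
    return [i for i in range(length) if all(row[i] == symbol for row in reduced)]


# Toy alignment (hypothetical fragments, '-' marks a gap).
toy_alignment = ["VLKDA", "IMRE-", "LFKDG"]
print([simplify(seq) for seq in toy_alignment])
print(conserved_columns(toy_alignment, "h"))
```

In this toy case the first two columns are hydrophobic in all three fragments even though no residue is identical, which is the kind of property-level conservation that identity-based comparison of non-homologous sequences would miss.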
As shown in Figure 4, homologous superfamilies are found only among domains having the same topology.
Significant similarity among Longin and PAS/GAF domains is still absent, when using as query sequence only the region corresponding to the central block. In order to further investigate the idea of convergent versus divergent evolution of these protein families, we tried to detect any trace of a possible, very ancient ‘PAGALO’ common ancestor for PAS/GAF and Longin domains by using signature scanning and PSI-BLAST.
Proteome-wide scanning with PAS signatures retrieves neither GAF nor Longin domains. Therefore, we used the only part of the central block that is profile-conserved in all domains (β strands 1, 2, 4 and 5, excluding αp-β3) to set up Position-Specific Score Matrices (PSSMs) by PSI-BLAST. When using default algorithm settings for stringency, PAS/GAF queries did not extract Longin domains and vice versa. Instead, the reciprocal retrieval of PAS and GAF confirmed their evolutionary relationship [30, 31]. When using lower stringency settings, additional sequences outside of the Longin and PAS/GAF domains were extracted, confirming the non-specific nature of the hydrophobic patch conservation. Finally, no significant trace of homology between Longin and PAS/GAF domains could be found using dedicated methods such as HMMER or HHPred. Regions of local similarity could indeed be found, but similarity was very weak, query coverage was poor and retrieved hits belonged to many different protein domain types, likely representing basic structural motifs (not shown).
In conclusion, the different topologies and fold variation, together with the absence of detectable sequence homology or of any significant common profile, still leave open the convergent evolution hypothesis. It is well established that (i) fold similarity is proof of neither an evolutionary relationship (convergent evolution is not rare) nor similar functions and (ii) even within a highly conserved fold, recognition (i.e. specific interactions) strongly depends on variation of linear and/or conformational epitopes (a well-known example is antigenic variation in influenza virus surface proteins).
Comparative analysis of binding epitopes
In order to shed light on shared and unique determinants possibly accounting for the special adaptation of Longin domains to the trafficking machinery, and considering that adaptation to trafficking cannot be explained by the fold shape per se, we started comparative analysis of binding epitopes, focusing on the large Longin domain group and on the only two GAF superfamilies involved in trafficking, i.e. Profilins and Roadblocks. Sharing of features with non-trafficking PAS/GAF domains could thus be used as a ‘negative control’ of relevance for trafficking.
When comparing domains with similar architecture and different (permutated) topologies, progressive numbering of secondary structure elements can be misleading, because differently numbered elements can occupy similar 3D positions within the shared architecture. For this reason, we used standard numbering only for the β flat (common to all permutations) and named α helices other than αp based on their 3D position. Therefore, additional α helices close to either β2 or β3 (the strands that occupy ‘outer’ positions in the flat) are hereafter named as αx or αy, respectively (Figure 6).
Three surface regions can be identified by structure superposition and clustering and suggested to represent important binding epitopes for Longin domains and for the two GAF superfamilies involved in trafficking: Profilin  and Roadblock  domains. The A region mediates intra- and inter-molecular interactions (often resulting in dimerization) and it is mainly formed by αp, β3 and the N-terminal part of β4; in some instances, β1 N-terminal, β5 C-terminal and αy N-terminal are also involved. The B region consists of the surface formed by αp and the β1-β2 hairpin; in the case of Longin domains, also the αx-αy helices are part of this region. It is involved in binding to small GTPases. The C region is the most variable one, because of the differing number of additional α helices. In different Longin homologous superfamilies, this binding region may involve either an individual αx or αy helix, or both; instead, in Profilins it is centered in between the two helices.
Borders of the A, B and C regions are partially overlapping. In particular, the αy C-terminal is usually common to the A and C regions. Also the αp helix is a borderline structural element in between the A and B regions, because the patch between αp and β3 is part of the A region and the other patch towards β2 is exploited within the B region.
In Longins sensu stricto, the A region is involved in either intra-molecular binding (to the SNARE motif) or inter-molecular binding to alternative interactors; this mechanism is crucial to regulate membrane fusion and subcellular localization [56, 57]. In DENNs, the Longin domain interacts with the C-terminal moiety via the A region (Figure 7A). In both Sedlins (Longin topology) and Roadblocks (GAF topology) a peculiar dimerization mode occurs, hereafter named ‘flat dimerization’: the αp-β3 regions of the two monomers lie side by side, forming an antiparallel 10-stranded flat by β completion, in which the two αp helices lie across the flat in an antiparallel manner (Figure 7B). This interaction is stable and physiologically relevant [52, 58]. Flat dimers share a very similar shape, and protrusion of the αp helix at the β3 side might be crucial to establish contacts between the monomers: when such a protrusion is missing, flat dimerization does not occur (Figure 7C) or, even when dimers are formed (e.g. in Targetins), these are both ‘non-flat’ and less stable. This feature is an example of functional specialization mediated by shaping of the variable (Figure 5) αp-β3 region and highlights the importance of the insertion at the C-terminus of αp. Another example is the Profilin:Actin interaction, which is accomplished by a substantial insertion in the αp-β3 loop that occupies the corresponding site at the A region (Figure 7A).
The B region is of special importance due to its interaction with small GTPases and the evolutionary adaptations thereof. Small GTPases are important regulators of vesicular trafficking and cytoskeleton dynamics, and their expansion is thought to trace eukaryotic evolution . The SRx-GTPase structure was presented as a prototype for the interaction of small GTPases with Longin domains . Further structural and functional evidence is accumulating on Longin and also GAF domains binding to a variety of small GTPases [28, 43-46, 59-66].
Recent crystallographic analyses reveal that Arf1 GTPase is involved in ‘unlocking’ the trunks of the AP and COPI complexes for cargo binding, hence orchestrating vesicular transport [67, 68]. Integral components of the trunks are the σ and μ N-terminal Adaptins (Longin domains). Within the reported trunk conformations, the Longin domains are not in direct contact with Arf1, which mediates unlocking by binding to the outside of the α-solenoid subunits. However, trunk opening (not complete in the unlocked AP-1 structure) swings out the μ C-terminal domain and thus the μ N-terminal Longin domain gets solvent accessible. Upon complete unlocking, an Arf1/adaptin contact via the B region might be well possible.
A synopsis of GAF:GTPase and Longin:GTPase interactions is reported in Table 1; superposition and clustering of available structures show that this interaction is mediated mainly by the B region (Figure 8A). A recently discovered example from the Roadblocks concerns the eukaryotic target of rapamycin (TOR) pathway. Evidence that Roadblock and GTPase modules are fused in the same polypeptide in both yeast Gtr1p/Gtr2p and the human homologues RagA/RagB and RagC/RagD [65, 70] stresses the intimate association between the two domains. The recent structure of a DENN identifies a whole Longin homologous superfamily interacting with small GTPases. The extraordinary conservation of the GAF:GTPase partnership is manifested by evidence from bacteria, where the founding member of the Ras-family GTPases, MglA, interacts with its effector MglB, a prokaryotic Roadblock [66, 71]. MglB mediates the MglA-GTP to MglA-GDP transition, thus preventing accumulation of MglA at the lagging cell pole and inhibiting reversal frequency. Therefore, even in prokaryotes the GAF:GTPase interaction seems to be somehow involved in ‘trafficking’ events. These lines of evidence were recently used to suggest (i) Roadblocks as likely ancestors (by circular permutation) of Longin domains and (ii) possible general ‘rules’ (with exceptions) for dimeric binding to small GTPases. However, multiple alternative evolutionary hypotheses (which are further discussed in the concluding remarks section) can be envisaged. In fact, Roadblocks and Profilins represent only two out of a number of domain superfamilies with PAS/GAF topology, and up to seven domain superfamilies with Longin topology are known (Figure 4). In addition, no Longin-GAF homology is found, even in the central ββ-αp-βββ block common to the permutated topologies (Figure 5).
Furthermore, while highlighting conservation of some functional and structural features, variation in Longin and GAF binding to small GTPases has to be stressed as well ( and this work). Therefore, multiple mechanisms rather than any rules with exceptions are likely to mediate binding to small GTPases, according to the structural diversity of small GTPases, of their interactors and of Longin and GAF domains as well. So far, the DENN:Rab35 complex is the only example from the Longin superfamily known to exploit a different surface for the interaction. However, in this case the actual buried surface at the Longin:GTPase interface is rather small, while in the same complex the GTPase is primarily interacting with the C-terminal lobe of the DENN module. A sterically restricted, conserved glycine at the β1-β2 hairpin (Figure 8B) was proposed to be crucial to Longin:GTPase binding . However, the presence of a longer β1-β2 hairpin in DENNs suggests the structural feature responsible for the GTPase discrimination might be the steric occupancy of β1-β2 hairpin as a whole. When comparing the same structures by superposition upon the GTPases, the binding surface for either Longin or GAF domains was found to be variable, according to evidence that Longin or GAF domains interacting with GTPases were assigned to have either GEF or GTPase Activating Protein (GAP) activity [24, 61, 66, 73] and within the complexes, the GTPase can be either nucleotide-bound or nucleotide-free. Current evidence is unable to determine whether such different functions are derived by either divergence or convergence. 
However, they clearly provide an example of evolutionary specialization of a domain epitope within a conserved interaction context and, despite difference in details, binding to GTPases via the B region in eukaryotic endomembrane targeting can be traced back to ancient bacterial transport systems, and its general architecture and function could be similar for a variety of GTPase mediated targeting events.
Table 1. Known examples of interaction between a Longin or GAF domain and a small GTPase. When known, the activity (towards the GTPase) is reported. Numbers in the last column correspond to entries in the reference list of the manuscript.
The C region is probably the most relevant to the special adaptation of Longin domains to the highly complex trafficking machinery of eukaryotes, because it is the one showing the highest surface variation among Longin and PAS/GAF domains. Even among Longin domains, involvement of the x and y α helices in this region is found to vary. For instance, in the Longins and Sedlins superfamilies either αx or αy is involved, whereas in the Adaptin superfamily both helices are involved in binding. Concerning GAF domains, in Profilins the binding site is centered between the two helices (Figure 9). In Roadblocks, one interesting example is represented by the dynein-light chain/Km-23 complex, the only GAF domain showing no αy at the C side. It is, however, able to reconstitute a full PAS/GAF domain when in complex with the dynein intermediate chain. Variation of the broad C side depends on the varying number of additional α helices. Further data are required to shed light on the structural code of this diverse region.
Binding regions co-operativity and αp dynamics
Co-operative binding between the A and C regions concerns both Longin and GAF domains. In Longin domains, co-operativity occurs in: (i) Sec22b, between the SNARE motif (A region) and the Sec23:Sec24 complex (C region) , (ii) VAMP7, where the recognition of the A region by the SNARE motif is essential for Varp binding at the C region  and (iii) Synbindin, between Ypt1 (B region) and Bet3p:Trs31p (C region) . In GAF domains, co-operative binding has been reported for Profilins, between Actin (A region) and a poly-Pro motif (C region) . We suggest here that co-operativity is likely to occur also in the other GAF superfamily involved in trafficking, i.e. in Roadblocks. Superposition of the monomeric form with the flat heterodimer of p14 shows that the αp helix shift at the A region is accompanied by structural change and re-orientation of one, out of the two, helices at the C side (Figure 10A). This in turn exposes a broader hydrophobic area and a deep groove (Figure 10B) suitable for motif binding and resembling poly-Pro binding by Profilins. Co-operativity between the A and C regions might be shared by other Roadblocks showing a similar shape of the C region (Figure 10C). The hypothesis is corroborated by functional data as for the MP1 protein, which is known to bind to a MEK poly-Pro region  and needs to be heterodimeric in order to anchor the MEK-ERK pathway to late endosomes .
The αp helix is conserved in its amphipathic nature and its flexible anchoring to the underlying curved β strands, which creates hydrophobic patches along both sides of the helix. This feature allows for helix displacement and explains the different orientations of the helix with respect to the β strands. However, so far, evidence of αp helix flexibility has been reported only for the Longin topology [24, 40]. By superposing structures of the same domains in different complexes, we can show that, upon flat dimerization, the αp helix may undergo significant conformational change in the Roadblock family as well (Figure 10D). Therefore, αp flexibility is likely important for the molecular adaptation to the trafficking machinery. Given that αp is the boundary element between the A and B regions, variations in its orientation might reciprocally modulate interactions at these sides.
This systematic assessment of the distribution of the Longin domains confirms the conservation of all superfamilies across the diversity of eukaryotes, consistent with their suggested role as a building block of the eukaryotic subcellular trafficking machinery. We now provide an unambiguous classification, which we hope will allow for clearer discourse on the comparative function and evolution of Longin domains.
The comparative fold, topology, profile and epitope analysis of Longin and PAS/GAF domains allowed us to highlight conserved structural features, as well as to pinpoint differences and special adaptations. The α-β-α sandwich is one of the most common folds in nature and it is shown by a large diversity of protein domains, including Longin and PAS/GAF superfamilies. Therefore, sharing such a widely conserved architecture is not indicative of any special relationship. Indeed, despite some apparent similarities, detailed analyses demonstrate that the real fold is not ‘common’ to Longin and PAS/GAF domains such as Profilin or Roadblocks. In fact, fold recognition analysis results in a collection of domain groups with similar shapes, each type however being characterized by significant local variation. Evident diversity concerns topological differences, depending on the optional presence and orientation of helical elements considered ‘additional’ with respect to an apparently conserved ‘central block’. However, relevant local variation was also found in the block itself, and no meaningful common profile is shared for this region. In the central block, αp-β3 in particular (a crucial part of the A region) shows high sequence variability, and variation is increased by the varying number and length of α helices found to correspond, in PAS/GAF domains, to the αp region. At the sequence level, no significant homology is found among Longin and GAF domains, even when using as a query sequence the only invariant structural regions of the central block (β1-β2-β4-β5). Indeed, Profilins and Roadblocks are GAF superfamilies, and GAF domains in turn are related to PAS domains; the PAS/GAF families include a large number of functionally and structurally heterogeneous domains [30, 31].
In the absence of detectable sequence homology and common profiles, a direct evolutionary link between Longin and PAS/GAF domains cannot be established. Several possible evolutionary scenarios can be envisaged. A deeply ancient ‘PAGALO’ ancestor could have existed, and the current distribution of the proteins could be explained by divergent evolution, with the exclusively eukaryotic Longin topology being derived by circular permutation from the more broadly distributed PAS/GAF domains such as the Roadblocks. A variation is the converse, namely that the Longin topology was lost during evolution from proto-prokaryotes to current Archaea and Bacteria. As an alternative to circular permutation, novel topologies may have arisen by recombination among contiguous, different α-β-α sandwich domains from the same protein. Last but not least, convergent evolution of the protein domains to a common architecture cannot be ruled out. Fold shaping by pressure for molecular adaptation to well-established interaction schemes is common and might have occurred, which would explain the common features shown by domains that are relatively small and consist of only a few elements. It has been reported that ancient shuffling and duplication events resulted in the creation of multiple subclasses of PAS domains with unique sequence features. This might also have occurred with (i) ancient GAF domains along their 2-billion-year history and (ii) hypothetical ‘PAGALO’ ancestors.
Regardless of their evolutionary derivation, perhaps the most intriguing question concerns the impressive expansion and specialization of the Longin topology superfamilies in the context of the eukaryotic subcellular trafficking machinery. What was, in other words, the molecular advantage of the Longin topology in performing trafficking functions in eukaryotes? Although we cannot provide definitive evidence, we can propose an explanation based on functional advantage. The A and B regions are shared characteristics between the two topologies, whereas the most noticeable difference lies in the binding of proteins to the C region. This region is responsible for binding poly-Pro motifs in both Profilin and Roadblock domains (Figure 10). Poly-Pro motifs are present in both prokaryotes and eukaryotes and represent an ancient ‘interaction code’ based essentially on linear motifs, as their 3D fold is constrained by the rigidity of the left-handed proline helix itself. By contrast, the C region is much more heterogeneous in Longin domains and is thus able to interact with large protein complexes (and usually with more than one protein at the same time), i.e. with conformational, rather than linear, epitopes. Therefore, the Longin domain topology allows for a shift from a linear interaction code to a more complex 3D code, which would be expected as a much more complex trafficking system evolved during the prokaryotic–eukaryotic transition. The high conservation of the peculiar topology of the Longin domains, together with the wider plasticity of their C region, might account for the extraordinary adaptation to (and conservation in) the eukaryotic trafficking machinery.
BLAST and PSI-BLAST searches were performed using query sequences from the following databases: JGI for Ostreococcus tauri (v2.0 filtered models), Chlamydomonas reinhardtii (Phytozome v8.0), Phytophthora ramorum (v1.1 final predicted proteins), Thalassiosira pseudonana (v3.0, chromosome, filtered model proteins), Emiliania huxleyi (v1 all proteins), Monosiga brevicollis (MonBr1, all proteins); The Galdieria sulphuraria Genome Project for G. sulphuraria; Baylor College of Medicine for Acanthamoeba castellanii. The NCBI non-redundant protein database was used for all the remaining organisms (models and environmental samples were excluded from the dataset to scan). BLAST searches using servers hosted by genome websites or NCBI were carried out using default settings. Concerning PSI-BLAST algorithm parameters, default settings were used except for matrix (BLOSUM 45), expect threshold (e = 100) and exclusion threshold (t = 0.1). HMMER and HHpred were used online at the hmmer.janelia.org server and the toolkit.tuebingen.mpg.de/hhpred server, respectively.
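For readers who wish to reproduce comparable settings locally, the non-default PSI-BLAST parameters listed above can be sketched as a command line for the standalone BLAST+ `psiblast` program. This is only an illustrative sketch: the searches described here were run on web servers, the file name, database name and number of iterations are hypothetical placeholders, and mapping the threshold t = 0.1 to the `-inclusion_ethresh` option is an assumption on our part.

```python
import shlex


def build_psiblast_command(query_fasta, database, iterations=3):
    """Assemble a psiblast argument list with the non-default settings from the text.

    query_fasta, database and iterations are hypothetical placeholders;
    the source does not state an iteration count.
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", database,
        "-matrix", "BLOSUM45",          # matrix: BLOSUM 45
        "-evalue", "100",               # expect threshold: e = 100
        # Assumption: the reported threshold t = 0.1 corresponds to the
        # E-value cutoff for including hits in the next-iteration profile.
        "-inclusion_ethresh", "0.1",
        "-num_iterations", str(iterations),
    ]


cmd = build_psiblast_command("longin_query.fasta", "nr")
print(shlex.join(cmd))  # shell-quoted command string, not executed here
```

The command is only assembled and printed, so the sketch can be inspected without a local BLAST+ installation.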
For the Mon1 family, a dataset consisting of 26 sequences from 19 different taxa was aligned using MUSCLE v3.6 and then manually adjusted as required. The resulting alignment, consisting of 376 positions, was trimmed, incorporating only unambiguously aligned regions of homology into the final mask, using MacClade v4.08 to display the alignment graphically. Three algorithms were used for phylogenetic inference: MrBayes v3.1.2, PhyML v2.4.4 and RAxML v2.2.3. Model testing by ProtTest v1.3 determined that the WAG model with a gamma distribution of rates among sites (alpha = 3.46) best fit our dataset; this model was implemented in MrBayes and PhyML, and the PROTCATWAG model was implemented in RAxML. Posterior probabilities were determined after 1 000 000 Markov Chain Monte Carlo generations, with the burn-in value determined graphically by removing all trees before the plateau. Bootstrap values were determined for both PhyML and RAxML using 100 pseudoreplicates for each method. The resulting phylogenetic trees were visualized using FigTree v1.2.
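As a minimal illustration of what a bootstrap pseudoreplicate is, the sketch below resamples alignment columns with replacement to produce 100 pseudoreplicate alignments. It is not the internal implementation of PhyML or RAxML; the toy sequences, taxon names and random seed are hypothetical and stand in for the Mon1 alignment.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudoreplicate: columns drawn with replacement, rows kept intact."""
    n_cols = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy alignment (hypothetical sequences, not the Mon1 dataset).
toy = {"taxonA": "MKVLAT", "taxonB": "MKILSA", "taxonC": "MRVLAT"}

rng = random.Random(42)  # fixed seed so the sketch is repeatable
replicates = [bootstrap_alignment(toy, rng) for _ in range(100)]

print(len(replicates), "pseudoreplicates of length", len(replicates[0]["taxonA"]))
```

Each pseudoreplicate would then be analysed with the same tree-building method, and the support for a branch is the fraction of pseudoreplicate trees that recover it.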
Fold recognition searches were done at the Dali server. Structure modelling was performed by threading using the Phyre2 server. Fold superposition and clustering were performed at the TopMatch server. Binding surface analysis was performed using the PISA webserver. Default settings were used for the methods above. PyMOL was used for molecular viewing, surface analysis and figure editing. Surface residues are coloured based on the Kyte–Doolittle scale, i.e. Ile, Leu, Val, Phe, Cys: red; Ala, Gly, Met, Trp, Tyr: orange; Thr, Pro, Ser: yellow; His, Asn, Asp, Gln, Glu, Lys, Arg: green.
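The four-class colouring scheme above can be written down as a simple lookup, sketched here together with PyMOL-style `color` command strings built from it. The names `COLOUR_CLASSES`, `RESIDUE_COLOUR` and `pymol_colour_commands` are illustrative assumptions and do not come from the scripts used for the figures; the commands are only generated as text and are not run inside PyMOL.

```python
# Residue classes and colours exactly as listed in the text.
COLOUR_CLASSES = {
    "red": ["ILE", "LEU", "VAL", "PHE", "CYS"],
    "orange": ["ALA", "GLY", "MET", "TRP", "TYR"],
    "yellow": ["THR", "PRO", "SER"],
    "green": ["HIS", "ASN", "ASP", "GLN", "GLU", "LYS", "ARG"],
}

# Reverse lookup: three-letter residue name -> colour.
RESIDUE_COLOUR = {
    residue: colour
    for colour, residues in COLOUR_CLASSES.items()
    for residue in residues
}


def pymol_colour_commands(selection="all"):
    """Build one PyMOL 'color' command string per colour class (hypothetical helper)."""
    return [
        f"color {colour}, ({selection}) and resn {'+'.join(residues)}"
        for colour, residues in COLOUR_CLASSES.items()
    ]


commands = pymol_colour_commands()
for line in commands:
    print(line)
```

The lookup covers all 20 standard amino acids, so every surface residue falls into exactly one colour class.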
We thank Andrea Zanardo and Fabio Cesarato for data collection, and Marco Vecchiato and Luca Tassoni for help with PSI-BLAST and profile analysis. We also thank Sandro Vivona for critical reading and Marta Lipinska for useful discussion. N. D. F. was supported by a fellowship from the CPDA 077345/07 project (awarded to F. F.). K. W. and I. S. are supported by the Collaborative Research Centre 638 (SFB 638) of the German Research Foundation (DFG). A. S. is supported by a studentship from the Natural Sciences and Engineering Research Council of Canada (NSERC). J. B. D. is supported by an NSERC Discovery Grant and an AITF New Investigator Award. J. B. D. is also the Canada Research Chair (CRC) in Evolutionary Cell Biology. F. F. is supported by funding from the Italian Ministry for University and Research (MIUR).