Evolutionary divergence of the nuclear pore complex from fungi to metazoans

Abstract
The nuclear pore complex (NPC) is the largest multimeric protein assembly of the eukaryotic cell and mediates nucleocytoplasmic transport. The constituent proteins of this assembly (nucleoporins) are present in varying copy numbers, giving a size from ~60 MDa (yeast) to 112 MDa (human). They share common ancestry with other membrane-associated complexes such as COPI/COPII and thus share the same structural folds. However, the nucleoporins across species exhibit very low percentage sequence similarity, and this is reflected in their distinct secondary structure and domain organization. We employed thorough sequence and phylogenetic analysis, guided by structure-based alignments of all the nucleoporins from fungi to metazoans, to understand the evolution of the NPC. Through evolutionary pressure analysis on various nucleoporins, we deduced that these proteins are under differential selection pressure and hence the homologous interacting partners do not complement each other in an in vitro pull-down assay. The supertree analysis of all nucleoporins taken together illustrates divergent evolution of nucleoporins and, notably, the degree of divergence is more apparent in higher-order organisms than in lower species. Overall, our results support the hypothesis that the protein-protein interactions in such large multimeric assemblies are species specific in nature and hence their structure and function should also be studied in an organism-specific manner.


Introduction
The nuclear membrane is embedded with a multiprotein structure called the nuclear pore complex (NPC), which aids bidirectional transport of cargos. NPCs act as a selectivity barrier for the transport of cargos across the nuclear envelope and are permeable to small molecules such as ions and small metabolites. 1 Tomographic studies depict the NPC as made up of three layers, namely the cytoplasmic filaments, the spoke region anchored to the nuclear membrane, and the nuclear basket towards the nucleoplasm, thus forming an hourglass-shaped architecture. 2 These three regions are populated by distinct proteins called nucleoporins (Nups), each named with a number that denotes its molecular weight. 3,4 The electron tomography-based structures from Xenopus oocytes and later mammalian cells show that the NPC follows an eight-fold rotational symmetry where each of the eight spokes consists of six subcomplexes: the hNup88-hNup214-hNup62 complex present on the cytoplasmic side, followed by two scaffold ring complexes annotated as the Y-shaped complex (hNup133-hNup107-hNup85-hNup160-hNup96-hSec13-hSeh1-hNup37-hNup43-hAladin-hELYS) and the adaptor ring complex (hNup93-hNup155-hNup35-hNup205/hNup188). 5,6 The adaptor ring is anchored to the nuclear membrane with the help of transmembrane proteins (hPOM121-hNDC1-hGp210-hTMEM33), and hNup93 of this subcomplex extends to interact with the central channel (hNup62-hNup54-hNup58) forming the pore of the NPC. 7,8 The hNup153-hNup50-hTPR complex forms the nuclear basket/ring of the mammalian NPC. 9 The total mass of the intact NPC varies from about 60 MDa in yeast to 112 MDa in vertebrates. 10,11 This difference in size arises from a number of factors such as the distinct number of Nups present in different species, the length of orthologous pairs of Nups, the oligomerization of Nups present in the various rings of the NPC, and post-translational modifications of mammalian Nups. 12,13 Although the composition of the NPC is believed to be taxonomically conserved, a number of proteomics-based studies from various species such as Aspergillus nidulans, 14 Schizosaccharomyces pombe, 15 Chaetomium thermophilum, 16 Arabidopsis thaliana, 17 Caenorhabditis elegans, 18 Trypanosoma brucei, 19 and Tetrahymena thermophila 20 report otherwise. Interestingly, all these species differ in the number of Nups identified (Table S1). Overall, the studies published to date show significant variation in the number of Nups among species.
The structural architecture of the individual nucleoporins is believed to indicate common ancestry with other eukaryotic endomembrane coatomer proteins such as those of COPI and COPII. 21 Recently, Ancestral Coatomer Element I (ACE1) was defined as the common architectural motif across diverse coatomer molecules, thus defining both sequence and structural conserved motifs between the vesicle transport and nuclear pore complexes. 22 The fold analysis study of the S. cerevisiae Nups also shows that all the nucleoporins can be classified into seven classes, namely α-helical, β-propeller, coiled coil, cadherin fold, RRM (RNA recognition motif), autoproteolytic fold (hNup98), and the unstructured FG repeat regions. 23 These predicted fold types were assigned to nearly 28 Nups of S. cerevisiae, which indicates that all of the Nups originated from a minimum set of precursor proteins by wide-ranging intragenic and intergenic duplications. 23 There has been enormous growth in the availability of tomography structures of the intact NPC from various sources such as Xenopus oocytes, 2 D. discoideum nuclei 24 and a human carcinoma cell line, 25 and the recent cryoelectron tomography structure from Xenopus oocytes significantly improved the resolution to ~20 Å in the Y-shaped region of the NPC and 50-60 Å in the central channel region. 26 Of the total number of X-ray crystallography structures of Nups deposited in the PDB, there are only 12 non-redundant partial domain structures from vertebrates, sourced from H. sapiens, 27-40 M. musculus, 41-43 R. norvegicus, 44-46 and X. laevis, 47 and four protein complex structures available from H. sapiens, 48,49 R. norvegicus, 50 and X. laevis. 47 The scarcity of crystal structures of metazoan origin is complemented by the presence of 11 non-redundant partial domain structures from fungi, sourced from S. cerevisiae 48,51-59 and C. thermophilum, 6,60 along with 10 protein complex structures available from S. cerevisiae 61-67 and C. thermophilum 6,60 (Table S2).
Recently, the complete interaction network within the human NPC has been reported to be deciphered using the X-ray crystal structures of either individual Nups or complexes from C. thermophilum together with the cryo-ET maps of the human NPC. 60 The protein-protein interaction network of the NPC has been reported for both yeast and human through the yeast two-hybrid method, 68 but owing to its complexity, the picture is still unclear for the human NPC. Although the predicted folds seem to be conserved for all the species, there are obvious differences in the size of many orthologous Nup pairs, which would lead to differences in their tertiary structure assemblies. Due to the lack of crystal structures of Nups of mammalian origin, these aspects have not been studied well to date.
In this study, we aimed to identify the pattern of evolution of the nucleoporins from a structural perspective. Through in-depth phylogenetic analysis, we report that the evolution of the NPC is divergent in nature. By performing secondary structure and fold composition analysis of the metazoan (H. sapiens) nucleoporins in comparison to fungal species (S. cerevisiae and C. thermophilum), we were able to identify distinct structural features in some of the nucleoporins, which can also be explained in terms of evolution. We report that a majority of Nups of metazoan origin harbor differences at the sequence, domain, secondary structure, or tertiary structure level. Furthermore, our phylogenetic analysis across various species showed that certain Nups have domains specific to a group of species. The residue-specific dN/dS analysis estimates the codon substitution rates in the group of sequences provided and gives the probability of a codon/amino acid being under positive selection pressure (mutability) or purifying selection pressure (evolutionarily conserved). This analysis gave us a clear insight into the presence of differential selection pressure on the sequences of Nups to harbor specific domains amongst different species. Our analysis sheds light on the regions of homologous Nups that are under positive selection pressure and which might lead to different interaction networks across species. These dissimilarities of specific nucleoporins from various species also indicate that NPC assembly could function in a species-specific manner and is likely to be linked with the unique structural features of Nups of a particular species.

Results and Discussion
Secondary structure prediction and domain analysis of all nucleoporins
S. cerevisiae and C. thermophilum are two well-studied fungal species with respect to nucleoporins; hence orthologous Nups of these two species were taken as queries to identify the metazoan homologs (H. sapiens) through HMM-based searches and validated through reciprocal searches. [Refer to Supplementary Material for details. HMM search results are listed in Tables S3(A-F) and UniProt IDs are listed in Table S4 (nomenclature translation table)]. Percentage sequence identity and similarity comparisons were made through both HMM searches and standard global alignments (Needleman-Wunsch algorithm) amongst the homologs of these fungal (S. cerevisiae and C. thermophilum) and metazoan (H. sapiens) species (Table S5). We could classify nucleoporins into two categories: those showing moderate sequence conservation, viz. Nup62, Nup54, Nup98, Nup155, Nup188, Nup205, Nup93, Sec13, TMEM33 and Rae1 (~20% sequence identity and ~30% sequence similarity), and a few Nups (Nup35, Nup42, Nup214, Nup160, Nup58, Nup88, and Nup50) that have very low sequence conservation (less than 20% sequence identity).
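To illustrate how a percentage identity can be derived from a global alignment, the following is a minimal sketch of the Needleman-Wunsch algorithm. The scoring values (match +1, mismatch -1, linear gap -2), the function names, and the choice of alignment length as the denominator are assumptions made for this example only; the alignments reported in Table S5 were presumably produced with a substitution matrix and affine gap penalties, which this sketch does not reproduce.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scores).

    Returns the two gapped sequences as strings of equal length.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Identical non-gap columns as a percentage of the alignment length (assumed convention)."""
    identical = sum(1 for x, y in zip(aln_a, aln_b) if x == y and x != "-")
    return 100.0 * identical / len(aln_a)


aln_a, aln_b = needleman_wunsch("ACGT", "ACT")
print(aln_a, aln_b, percent_identity(aln_a, aln_b))
```

Other conventions divide by the length of the shorter sequence or by the number of aligned (non-gap) columns, so identity values are comparable only when the same denominator is used throughout.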
Despite low percentage sequence similarity, it may be argued that a structural conservation exists in the homologous nucleoporins owing to common coatomer ancestry. To decipher the similarity or differences, which might exist, we performed secondary structure predictions for all the Nups from three species (H. sapiens, S. cerevisiae, and C. thermophilum) using PSIPRED 69 (Fig. 1). Along with secondary structure, the PFAM domains for each nucleoporin were also analyzed using HMMSCAN (Table S6). Notably, we observed some nucleoporins exhibit significant differences in the secondary structure and domain organization among the three species in spite of conservation at the sequence level (Fig. 2, S1). It is interesting to note that there are differences even between the two fungal species i.e. C. thermophilum and S. cerevisiae which are marked with * in Figure S1.
Comparing the composition of H. sapiens NPC with that of C. thermophilum [Figs. 2(a,c)], a number of differences can be highlighted. Apart from lacking the vertebrate-specific Nups (Nup358, Nup37, Nup43, and Aladin), C. thermophilum does not have a true homolog of Nup50 and Nup153 which are the major components of the nuclear basket/ring. [70][71][72] Instead, it has two additional Nups, Nup56 and Nup152, which have only the Ran-binding domain. While comparing the trans-membrane Nups from these two species, it was observed that the orientation in which Gp210 (C-terminal) 73 and POM152 (N-terminal) anchor the nuclear membrane 74 is distinct.
Interestingly, S. cerevisiae has many more components present in its NPC when compared to both C. thermophilum and H. sapiens [Figs. 2(a,b)]. The first difference is the presence of paralogs for certain Nups: hNup98 has three orthologs in budding yeast, i.e. Nup145N, Nup116, and Nup100. Similarly, hNup155, hNup35, and hTPR have two orthologs each, namely Nup157/Nup170, Nup53/Nup59, and MLP1/MLP2, respectively. Apart from these, Nup60 present in the nuclear basket is unique to S. cerevisiae and its orthologs are absent in both H. sapiens and C. thermophilum.
It can be hypothesized that these sequence-guided analyses and compositional differences would have implications for the overall assembly of the NPC in different species as well as for protein-protein interactions of individual Nups across species. Details and implications of some of these differences, apart from the ones mentioned above, are discussed in the following sections.

Distinct domain organization is present among species in ELYS and Nup133
A. Three classes of domain organization are predicted across species in ELYS. Embryonic large molecule derived from yolk sac (ELYS) has been identified as a transcription factor and was shown to interact with the outer ring complex of the NPC (Nup107-Nup160). 75 It is reported to be present in C. thermophilum though it is absent in S. cerevisiae and various other fungal species. When we analyzed the domain organization of H. sapiens ELYS, it was found to be composed of two domains, namely ELYS-bb [β propeller domain (PF16687.4)] and ELYS-a [α helical domain (PF13934.5)], followed by a long unstructured region. In C. thermophilum ELYS, only the ELYS-a (α domain) is present and the protein itself is smaller than H. sapiens ELYS (299 amino acids in C. thermophilum vs. 2266 amino acids in human) (Fig. 3).
Through our phylogenetic analysis of ELYS from 77 representative species, we observed that there are many species which have only the ELYS-bb domain, along with those similar in domain organization to H. sapiens and C. thermophilum. It is known that the β propeller region of ELYS is important for interaction with Nup160 of the Y-shaped complex as well as with Nup37, which is exclusively present in vertebrates. 76 The shorter ELYS-a of S. pombe, containing only the α helical region, is known to interact with Nup120 of the Y-shaped complex. The vertebrate ELYS is also known to help in the recruitment of POM121 and NDC1 to the NPC. 77 Hence, the additional β propeller domain in human ELYS strongly suggests that the interactome of the Nup107-Nup160 complex is likely to differ from that of its counterpart in fungal species.
B. Insertions/deletions in Nup133 lead to different domain identification. Nup133 is one of the major components of the Y-shaped complex (outer ring complex) of the NPC. 5 This protein is present in most species ranging from fungi to metazoans and is known to be evolutionarily conserved. The secondary structure fold defined for Nup133 is a seven-bladed β propeller domain followed by an α helical domain. There are three structures known for these distinct domains, two for the β propeller region (PDB ID: 4Q9T [fungal] and 1XKS [vertebrate]) and one for the partial α helical region (PDB ID: 3KFO [fungal]). In our sequence analysis, however, we observed a difference in its PFAM domain identification among H. sapiens, S. cerevisiae, and C. thermophilum. For H. sapiens, Nup133 was predicted to have only a single PFAM domain, Nucleoporin_C [defined as the non-repetitive C-terminal protein (PF03177)], whereas for S. cerevisiae, it was predicted to have a Nucleoporin_N domain [defined as the N-terminal half which forms a seven-bladed β propeller structure (PF08801)]. However, for C. thermophilum, both domains were predicted to be present. This misidentification at the sequence level could be due either to insertions/deletions or to sequence variability arising from mutations. The structure-guided alignment of these three representative species (Fig. S2) also shows that there are insertions/deletions at a number of places in the complete sequence, which would account for additional secondary structures being formed.
Our phylogenetic analysis of Nup133 in 84 representative species based on structure-guided alignments revealed three distinct classes of domain organization. Fungi contain both the Nucleoporin_N and Nucleoporin_C PFAM domains, with the exception of Saccharomyces species, which were predicted to contain only the Nucleoporin_N PFAM domain as per HMMSCAN. Higher organisms such as arthropods, nematodes, and plants were mostly predicted to have both domains, but chordates, platyhelminths, amoebozoans, and stramenopiles were predicted to have only the Nucleoporin_C domain (Fig. 4).
To assess the hypothesis of sequence variability, which can be translated back to the structure as well, we conducted a selection pressure analysis on the residues of Nup133 by dividing the complete dataset of 84 representative sequences into three groups based on the domains that were predicted by HMMSCAN. These three groups were analyzed separately to calculate the Bayes Empirical Bayes (BEB) probability of a residue being under purifying (conserved; dN/dS < 1), neutral (dN/dS = 1) or positive (not conserved; dN/dS > 1) selection pressure (Fig. S3). It was observed that in all three groups a number of residues show a high probability of being under positive selection pressure. This would limit the ability of even sensitive methods such as HMMSCAN to correctly identify the domains.
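The three selection classes defined above can be expressed as a short sketch. The input format (position, dN/dS estimate, posterior probability of dN/dS > 1), the function names, and the 0.95 posterior cutoff are assumptions chosen for illustration; they are not taken from the analysis pipeline used in this study.

```python
def classify_site(omega, tol=1e-6):
    """Map a per-site dN/dS estimate (omega) to a selection class."""
    if abs(omega - 1.0) <= tol:
        return "neutral"
    return "purifying" if omega < 1.0 else "positive"


def positively_selected_sites(site_estimates, min_posterior=0.95):
    """Return positions with omega > 1 and posterior support >= min_posterior.

    site_estimates: iterable of (position, omega, posterior_omega_gt_1).
    The 0.95 cutoff is an assumed threshold for this sketch.
    """
    return [pos for pos, omega, post in site_estimates
            if omega > 1.0 and post >= min_posterior]


# Hypothetical per-site values, for illustration only.
sites = [(10, 0.12, 0.01), (11, 1.0, 0.40), (12, 2.8, 0.97), (13, 1.9, 0.60)]
for pos, omega, post in sites:
    print(pos, classify_site(omega), post)
print(positively_selected_sites(sites))
```

Separating the class label (from the dN/dS estimate) from the support filter (from the posterior probability) keeps weakly supported sites from being reported as positively selected.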
We compared the structural organization of Nup133 by superimposing the available crystal structures of the Nup133 N-terminal domain from H. sapiens 31 (PDB ID: 1XKS) and Vanderwaltozyma polyspora (PDB ID: 4Q9T) and obtained an RMSD of 2.12 Å for the Cα atoms. It can be clearly observed that there are various insertions in both structures, giving distinct secondary structure features at certain stretches (Fig. S4). We also mapped the positive selection sites onto the crystal structures of the Nup133 N-terminal domain discussed above to determine their exact positions (Fig. S5). This analysis shows that the residues which do not superimpose well are mostly under positive selection pressure. Thus, from all these observations we may hypothesize that these insertions/deletions would be reflected in the interaction network of the Y-shaped complex in a species-specific manner.
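For reference, the RMSD quoted above is the root of the mean squared distance between paired Cα atoms. The sketch below computes it for coordinates that are assumed to be already superimposed and paired one-to-one; the optimal superposition step itself (for example, a Kabsch fit as performed by structure alignment tools) is deliberately not included, and the coordinates shown are invented for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) tuples.

    Assumes the two coordinate sets are already superimposed and that
    element i of coords_a is paired with element i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                  for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Hypothetical Calpha coordinates in angstroms, for illustration only.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
crystal = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 2.0)]
print(round(rmsd(model, crystal), 3))
```

Because the value depends on which atom pairs are included, an RMSD is best read together with the number of aligned Cα pairs.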

Structured region of central channel Nups depict differential selection pressures
Three metazoan Nups (Nup62, Nup54, Nup58) and corresponding Nups in fungi (Nsp1, Nup57, Nup49) are known to form the lining of the central transport channel (CTC) of the NPC and provide the selective permeability barrier for the biomolecules across the nuclear envelope. Based on sequence analysis and secondary structure prediction, we observed that Nup62 is comparatively more conserved Nup among the species (average of 25% sequence identity and 35% sequence similarity between human and yeast). Human Nup54 has an extended structural domain (α/β region) of about 143 amino acids, which spans only 66 amino acids in yeast homologous Nups (Nup57). Interestingly, at the sequence similarity level, structured regions of human Nup58 and yeast Nup49 (determined using PSIPRED) showed low e-values when C. thermophilum was used as a query and H. sapiens as the subject. The HMMER search with C. thermophilum as query could identify the human homolog at the fourth iteration with two regions of similarity (e-values of 0.0005 and 4.4) (Fig. S1). Moreover, in terms of secondary structure, human Nup58-structured region is completely α helical and is flanked by FG repeat regions at both ends whereas yeast Nup49 has a shorter structured α helical region and FG repeats are present only at the N-terminus.
To understand the evolution of the structured regions of CTC proteins, we deployed the selection pressure analysis on these protein sequences. Although the full-length protein and nucleotide sequences were used for the analysis, we focused the results obtained only on the structured regions as predicted by PSIPRED since all the proteins of CTC show low complexity FG repeat regions. To avoid the impact of these low complexity regions, we used structure-guided alignments [representative structure-guided alignments of CTC proteins Nup62 (322-525 region, Fig. S6), Nup54 (190-507 region, Fig. S7), and Nup58 (249-475 region, Fig. S8)]. C. thermophilum was taken as the representative sequence of fungi central channel and H. sapiens as the representative sequence of the metazoan central channel. The results obtained are tabulated in Table I and Figure 5 (also refer to Supplementary Material for details).
Overall, in terms of evolution of the CTC proteins from fungi to metazoan, Nup62 can be called evolutionarily conserved. Nup54 gained an extended α/β region which is also under purifying selection pressure and its α helical region is also conserved between the fungal and metazoan species. However, Nup58 has diverged from its ancestral Nup49 since the α helical region is not as well conserved as the structured regions of other two CTC Nups. The presence of FG repeat region in Nup58 at C-terminus also being under purifying selection indicates a gain of additional FG domain in the metazoan lineage. This could also indicate different spatial orientation or tethering of Nup58 in the central channel of the NPC and hence gain of FG-specific features in the vertebrate Nup58.
Additionally, the phylogenetic spread and domain prediction of Nup58 and its homologs from different phyla depict the presence of different PFAM domains: the Nucleoporin_FG2 domain [described as a family of chordate nucleoporins (PF15967)] and the Nucleoporin_FG domain [represents the family of Nups having FG repeat regions (PF13634)] (Fig. S9). Since a major fraction of amino acids is not conserved in the homologous sequences of Nup58, as indicated by the selection pressure analysis, this might lead to different PFAM domain predictions.
Figure 4. Based on structure-guided multiple sequence alignment, representative homologs of Nup133 were subjected to phylogenetic analysis (neighbor-joining). In the unrooted tree, branch labels represent the percentage bootstrap values and the branch length is scaled to evolutionary distance. The branches are colored according to the type of domain present in different species: blue represents species predicted to have only the Nucleoporin_C domain, green those predicted to have only the Nucleoporin_N domain, and red those for which both Nucleoporin_N and Nucleoporin_C were predicted. The domain organization of the three classes of Nup133 is shown below the tree. The species names are abbreviated for ease of representation and the detailed information is provided separately in Table S9 (* H. sapiens, ** C. thermophilum, and *** S. cerevisiae). The fungal species are grouped under purple color bars and the metazoan species under orange color bars.
To understand the three-dimensional structural implications of these sequence variations on fungal [...] [Fig. 6(a-c)]. The template used for generating these models is described in Table S7. Although both of the proteins folded as coiled coils, H. sapiens Nup58 consists of seven helices [α1-α7, Fig. 6(a)], whereas S. cerevisiae Nup49 is predicted to have five α-helices [α'1-α'5, Fig. 6(b)] and C. thermophilum Nup49 consists of only four helices [α"1-α"4, Fig. 6(c)]. An extended loop was observed between α'3 and α'4 (408-428) in S. cerevisiae and between α"2 and α"3 (343-380) in C. thermophilum, which is absent in the predicted Nup58 structure from H. sapiens. Apart from this, the C-terminus of Nup49 from C. thermophilum (450-470) is also predicted to be unstructured, whereas the corresponding region in both S. cerevisiae and H. sapiens consists of α helices. Recently, two crystal structures of the CTC (Nup62•Nup54•Nup58) were described, sourced from Xenopus (PDB ID: 5C3L) 47 and from Chaetomium (PDB ID: 5CWS; along with the 40 a.a. Nic96 interacting region). 6 We further compared our predicted structures with the available X-ray crystal structures to understand if any conformational modulation exists when Nup58/Nup49 is present alone and when it is present as a trimeric complex of the CTC. Since Xenopus is evolutionarily closer to H. sapiens, we compared the predicted human Nup58 structure with the Nup58 chain of the Xenopus trimeric complex structure (PDB ID: 5C3L). The structured regions of xNup58 (283-406) and hNup58 (249-374) show 88% sequence identity and 94% sequence similarity. The superimposition of the Nup58 modeled structure onto the corresponding Xenopus Nup58 chain of the trimeric structure (PDB ID: 5C3L, chain B) showed an RMSD of 0.922 Å (for 30 aligned pairs of Cα atoms) [Fig. 6(d)]. We could identify two regions, one being a loop (275-277) and the other a helix-turn-helix (322-345), which may play a role in the conformational changes when this protein is present as a trimer in the Nup62 subcomplex [Fig. 6(d)]. Both these regions show high sequence similarity [Fig. S10(a)]. The first region has an identical three-residue stretch (MSS) in both structures, which would lead to an untwisting of the first helix to attain the open conformation as explained in the Xenopus structure. The second region consists of a helix-loop-helix of 26 residues in hNup58 (320-346), out of which 19 are identical to the Xenopus Nup58 crystal structure, where this region corresponds to a loop-helix-loop conformation (xNup58 364-375). Based on all these observations, we propose that these regions could contribute to different conformations of hNup58 when present independently as opposed to in a complex with its other interacting partners (hNup62 and hNup54).
In contrast, while comparing the predicted C. thermophilum Nup49 structure with the crystal structure of the Chaetomium central channel (PDB ID: 5CWS), loop regions similar to those observed in the Xenopus structure and the hNup58 model were not observed [Fig. 6(e)]. The loop between the first and second helix of both the predicted ctNup49 structure and the ctNup49 crystal structure does not have any conserved residues [Fig. S10(b)]. It is also noteworthy that the crystal structure has two major regions with unresolved densities (334-365 and 414-423), owing to which the superimposition is also not complete (RMSD of 1.06 Å for 25 aligned pairs of Cα atoms). We also compared the Nup58 chain of 5C3L and the Nup49 chain of 5CWS and observed that they superimpose with an RMSD of 1.09 Å (for 68 aligned pairs of Cα atoms). This suggests that in the presence of other interacting partners (Nup62•Nup54 and Nsp1•Nup57, respectively) as well as the stabilizing agents (Fab and nanobody, respectively), Nup58/Nup49 undergoes major conformational alterations. It is evident from crystal structures of R. norvegicus CTC proteins that they exist in different quaternary assemblies when present as an independent protein (PDB ID: 5H1X, 4J3H, 2OSZ) 21,32,73 as compared to when in complex with one of the interacting partners (PDB ID: 3T97, 3T98). 50 For example, Nup54 when present as a single entity attains a homo-tetramer conformation (PDB ID: 4J3H), 45 and when present in complex with Nup62 forms a heterotrimer, with two chains of Nup62 and a single chain of Nup54 (PDB ID: 3T97). 50 Similarly, when Nup54 is present in complex with Nup58 it forms a heterotrimer with two chains of Nup54 and a single chain of Nup58 (PDB ID: 3T98). 50 These observations support our tertiary structure prediction that the Nup58 α helical region exhibits a compact conformation when present as an independent entity, as compared to an open conformation when present in a complex (as seen in the Xenopus crystal structure).
These differences in tertiary structure, as well as the absence of FG repeats on the C-terminus of yeast Nup49 may contribute to the distinct structural and functional role of Nup58 in the metazoan NPC and Nup49 in the fungi NPC, and also is likely to influence the interaction with other nucleoporins as well as the cargo molecules. All these observations taken together indicate the presence of a metazoan-specific Nup58 and fungi-specific Nup49 of the CTC.
To validate that the interaction networks of the CTC proteins Nup58 and Nup49 indeed differ, we devised an in vitro pull-down assay. It has been shown previously that C. thermophilum Nup49 forms a stable complex with Nup57 and Nsp1 6 and similarly their homologs in R. norvegicus form a Nup58-Nup54-Nup62 complex. 46 We generated a polycistronic plasmid to express the chimeric complex of ctNup49 with rNup54 and rNup62 [Fig. 7(aii)]. We observed that in the case of the chimeric complex, ctNup49 eluted as a single protein [Fig. S11(a)], whereas for the rat ternary complex, Nup58 pulled out Nup62 and Nup54 along with it under identical pull-down assay conditions [Fig. S11(b)]. The presence of His6-tagged proteins (ctNup49 and rNup58) was confirmed by anti-His antibody western blot [Fig. S11(c)] in the pull-down assays. Subsequently, the gel filtration elution profile of the rat ternary complex depicts the presence of a stable three-protein complex [Fig. 7(b(i))]. However, for the Chaetomium chimeric construct, a single protein corresponding to the molecular weight of ctNup49 [Fig. 7(b(ii))] was observed, supporting our hypothesis that Nup49 could not complement the interaction interface of Nup58 for the rat ternary complex formation.

Supertree of nuclear pore complex depicts divergent evolution
To understand the evolution pattern of all the nucleoporins taken together, a supertree approach 79 was deployed where individual trees of all nucleoporins, i.e. 35 source trees (including the ones unique to metazoans and fungi), were used as input. Compiling all the information available from the 35 input trees, which were built based on structure-guided multiple sequence alignments, the nucleoporins may be divided into three different classes. Class one includes all the nucleoporins which are evolutionarily conserved, namely Nup62, Nup93, Nup205, Nup155, Nup188, NDC1, TMEM33, Nup107, Nup85, Sec13, Seh1, Nup133, Gle1, and Rae1. These trees show the distribution of species as per the tree of life. Class two includes nucleoporins which are not evolutionarily conserved and for which the species do not follow the distribution as per the tree of life, namely Nup54, Nup58, Nup35, ELYS, Nup88, Nup214, Nup160, Nup98, Nup50, and TPR. The third class includes nucleoporins which are unique to either metazoans or fungi. Gp210, Nup37, Nup43, and Aladin are unique to metazoans; Pom121 is present only in chordates. Pom152 is only present in fungi but is conserved across all the families of fungi. Nup60 and Pom34 are present only in Ascomycetes.
Considering all these observations of the independent phylogenetic trees, we obtained a supertree depicting divergent evolution of nucleoporins across 613 species (Fig. 8). Interestingly, when we look at the positions of higher organism species such as those belonging to Viridiplantae, Arthropoda, and Chordata (colored green, cyan, and orange, respectively) in the supertree (Fig. 8), they do not form demarcated clades but are intermingled with each other. Such an observation could be either because of under-representation of a particular species due to compositional differences, or due to the divergence of the sequences from the parental ones. In both scenarios, it is evident that the higher organisms have diverged to a great extent from their ancestral forms owing to increasing complexity at the sequence level, due to insertions/deletions/mutations leading to changes in the sequence length, presence/absence of certain domains, as well as changes in the secondary structure organization. This analysis is thus indicative of a divergent evolution of nucleoporins. However, the sequence search space is limited to the current sequence data available.
Figure 8. [...] Table S11. As represented in the tree, the lower organisms group together into separate clades but the higher organisms (namely, Chordata, Viridiplantae) show segregated distribution throughout the supertree, indicating divergent evolution of nucleoporins among distinct taxa. (* H. sapiens, ** C. thermophilum, and *** S. cerevisiae).
While constructing individual trees of nucleoporins, structure-guided alignments were taken into consideration, suggesting that this supertree also depicts structure-guided sequence evolution of the NPC. In various recent studies, comparisons have been made between the crystal structures from fungi (C. thermophilum and S. cerevisiae) and the tomographic structure from H. sapiens, but this analysis suggests that such comparisons should be made with caution. As observed in the supertree, these three species occupy diverged positions from each other in terms of structure-guided evolution of sequences.

Conclusions
To completely decipher the complex protein-protein interaction network in this multiprotein complex of the cell, it is important to keep in mind the role of evolution on both the sequence and the structure of individual proteins. Large-scale interactome studies have reported that two types of evolutionary forces act on the interfaces of interacting proteins. One resists any change at the interacting interfaces, maintaining the functional relevance of that interaction for evolutionarily conserved proteins. The other accepts mutations so as to develop new interacting interfaces with proteins that might perform a similar function but are not evolutionarily conserved. 80 The NPC might be the perfect example in which both these evolutionary forces act at the same time, leading to a species-specific interaction network. In support of this view, two recent reports on the tomographic structure of the NPC from Chlamydomonas reinhardtii 81 and S. cerevisiae 82 depict architectural differences in the arrangement of major subcomplexes as compared to the H. sapiens 26 NPC. Hence, it is important to elucidate the structural information of the NPC in a species-specific context to fully understand its transport functions. Although metazoan and fungal NPCs share some similarities, the structural details of the metazoan NPC are likely to differ significantly from those of other species to accommodate more complex transport functions associated with cell- and tissue-specific requirements.

Secondary structure prediction
Secondary structure prediction was performed using the PSIPRED online server (version 3.3) 83 for all the nucleoporins from H. sapiens, S. cerevisiae, and C. thermophilum. A comparative chart containing all the predicted secondary structures for these three species was sketched in the IBS illustrator. 84

Phylogenetic analysis
Only the representative sequences generated by PROMALS3D 85 were considered, to reduce the size of the dataset as well as the computation time, for all the nucleoporins, including the ones exclusively present in H. sapiens and S. cerevisiae. The edited multiple sequence alignments were saved in PHYLIP format for phylogenetic analysis using PHYLIP (http://evolution.genetics.washington.edu/phylip.html). The alignments were bootstrapped 500 times as a test of phylogeny. The neighbor-joining method was used to construct unrooted trees from the distance matrices obtained. A consensus tree was generated using the SumTrees program of DendroPy 86 at a 75% majority rule. The advantage of SumTrees is that it retains the branch length information, which might be lost with other standard consensus programs, and yields a phylogram with branch lengths scaled to the evolutionary distances. The phylogenetic trees were visualized using FigTree software version 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
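As an illustration of the resampling and consensus steps described above, the following minimal Python sketch resamples alignment columns with replacement and keeps only the bipartitions supported by at least 75% of the replicates. It is not PHYLIP or SumTrees code: the function names, the simple p-distance, and the representation of a split as a frozenset of taxon names are assumptions made for this example.

```python
import random
from collections import Counter


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate: columns resampled with replacement.

    `alignment` maps a taxon name to its aligned sequence (equal lengths).
    """
    n_cols = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def p_distance(a, b):
    """Fraction of differing sites, skipping columns where either row has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def majority_rule_splits(replicate_splits, threshold=0.75):
    """Keep splits found in at least `threshold` of the replicate trees.

    `replicate_splits` holds one collection of splits per replicate tree;
    each split is a frozenset of taxon names (an assumed encoding).
    Returns a dict mapping each retained split to its support fraction.
    """
    n = len(replicate_splits)
    counts = Counter(s for splits in replicate_splits for s in set(splits))
    return {s: c / n for s, c in counts.items() if c / n >= threshold}
```

In a full workflow, each replicate alignment would be converted to a distance matrix, a neighbor-joining tree would be built from it, and the splits of the 500 resulting trees would be passed to the consensus step.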

Evolutionary pressure analysis
The evolutionary pressure analysis was performed on nucleoporins of the central channel (namely, Nup62, Nup54, and Nup58) from H. sapiens and their homologs (namely, Nsp1, Nup57, and Nup49) from C. thermophilum. Nucleotide sequences for all these proteins from different species were obtained from a UniProt database search. The nucleotide sequences were translated to protein sequences, and a structure-guided multiple sequence alignment was performed on the amino acid sequences. Using the structure-guided alignment as a reference, the nucleotide sequences were aligned using DAMBE5. 87 Maximum-likelihood phylogenetic analysis was performed on the aligned sequences using PHYLIP. The final maximum-likelihood tree and nucleotide alignment were used for evolutionary pressure analysis with the "codeml" program of PAML. 88 PAMLX, 89 a graphical user interface, was used to set the parameters for these calculations. The final output obtained for each dataset, i.e. the Bayes Empirical Bayes probabilities, was then plotted using the ggplot2 and reshape2 packages in RStudio (RStudio Team [2015]. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA, URL http://www.rstudio.com/).
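The step of aligning nucleotide sequences against a protein alignment can be sketched as follows. This is a generic illustration of codon threading, not the DAMBE5 implementation; the function name `thread_codons` and the assumption that the coding sequence starts in frame at the first aligned residue are introduced here only for the example.

```python
def thread_codons(aligned_protein, cds):
    """Map an ungapped coding sequence onto one gapped protein alignment row.

    Each residue consumes the next three nucleotides; each gap character
    '-' in the protein row becomes a three-nucleotide gap '---'.
    Nucleotides left over after the last residue (e.g. a stop codon)
    are dropped.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != "-")
    if len(cds) < 3 * n_residues:
        raise ValueError("coding sequence is shorter than the aligned protein")
    codons = []
    pos = 0
    for aa in aligned_protein:
        if aa == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)
```

Applying the function to every row of the protein alignment yields a codon alignment in which gaps never split a codon, which is the form of input that codon-based selection analyses expect.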

Tertiary structure prediction
Tertiary structure prediction using a threading approach was performed for Nup58 of H. sapiens and its homolog Nup49 of S. cerevisiae and C. thermophilum. pGenTHREADER 78 was used to search for the best template against the sequence of these nucleoporins from the three species. The best hit, with the lowest P-value and maximum query coverage, was used to model the sequence onto the structure. Since both Nup58 and Nup49 contain FG repeat regions/unstructured regions, the query was designed to contain only the structured region sequence based on secondary structure predictions from PSIPRED. The templates used for generation of models are described in Table S7. The predicted structures were visualized and analyzed using UCSF Chimera. 90

Cloning and purification of rat ternary and Chaetomium chimeric construct
The structured region of the central channel complex from Rattus norvegicus was cloned in modified pET28a as described earlier. 46 The construct contains a thrombin cleavage site at the N terminus of His6-tagged Nup58 (239-415) followed by Nup62 (322-525) and Nup54 (332-510). The region of Chaetomium thermophilum Nup49 (246-470) corresponding to the conserved region of Nup58 was gene synthesized (Invitrogen) and used to replace Nup58 in the above construct. Both constructs were subjected to the same protocol for Ni-NTA affinity purification. 46 Briefly, the BL21(DE3)-RIL strain of Escherichia coli was transformed with the construct, and the culture (2 L) was induced with 0.5 mM IPTG at OD ~0.6, incubated for 8 h at 18 °C, and the protein was purified using Ni-NTA agarose beads (Qiagen). The purified protein complex was dialysed against buffer (Tris-HCl pH 8, 250 mM NaCl, 1 mM DTT) and digested with thrombin at 4 °C. The digested protein was concentrated using a 3 kDa cutoff concentrator (Merck) and subjected to SEC on a Superdex 200 10/300 GL column (GE Healthcare) in SEC buffer (Tris-HCl pH 8, 250 mM NaCl, 0.5 mM EDTA, 1 mM DTT) at 4 °C.

Supertree construction
A supertree can be constructed when numerous species trees of different genes are available with at least a few overlapping species in all trees. All the phylogenetic trees (35 in total, with approximately 80 representative species in each tree) were used as input to construct a supertree by an average consensus method using the program Clann. 79 A nearest neighbor approach was utilized in merging the phylogenetic information from all the trees under consideration. The final supertree was visualized as a cladogram using FigTree software version 1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/).
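The idea behind an average consensus supertree can be illustrated with a short Python sketch: pairwise path-length distances from each source tree are averaged over the trees in which both taxa occur, and a tree is then built from the averaged matrix. The sketch covers only the averaging step and is not Clann code; the function name, the dictionary-based matrix format, and the toy taxon names in the usage note are assumptions for this example, and Clann's own procedure may differ in detail.

```python
from collections import defaultdict


def average_distance_matrix(matrices):
    """Average pairwise distances across source trees.

    `matrices` is a list with one dict per source tree, mapping a pair of
    taxon names to their path-length distance (each unordered pair listed
    once per tree). A pair is averaged only over the trees that contain
    both taxa. Keys of the result are alphabetically sorted name pairs.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for matrix in matrices:
        for (a, b), dist in matrix.items():
            key = tuple(sorted((a, b)))
            totals[key] += dist
            counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}
```

For two toy trees sharing the taxa "human" and "yeast", the human-yeast distance is the mean of the two source values, whereas pairs seen in only one tree keep that single value. Pairs that never co-occur in any source tree are absent from the result and would have to be estimated before a tree is built, which is why overlapping species across trees are required.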