The IDIP framework for assessing protein function and its application to the prion protein



I. INTRODUCTION
The prion protein is best known for causing fatal neurodegenerative diseases in a subset of mammalian species (Prusiner, 1982). These diseases, which are now commonly referred to as 'prion diseases', include Creutzfeldt-Jakob disease (CJD) in humans, bovine spongiform encephalopathy (BSE) in cattle, and chronic wasting disease (CWD) in deer and elk.
The normal 'cellular' form of the prion protein (PrPC) is found in almost all cells in vertebrates. In prion diseases, the prion protein has been shown to acquire alternative conformations (Basler et al., 1986), termed PrP 'scrapie' (PrPSc), after the first known prion disease in sheep. Once present, PrPSc can induce PrPC also to convert to PrPSc. This self-templating has become the widely accepted means of prion disease spread (Prusiner, 1998).
The function of PrPC is of interest for several reasons. Knowing more about its main biological function may suggest a rational approach for reducing PrPC levels, a proposed therapeutic avenue for arresting its conversion to PrPSc (Minikel et al., 2020). Moreover, cell death manifests in these diseases as a result of a poorly understood cascade of events that requires cell-surface PrPC (Brandner et al., 1996). Therefore, insights into the function of PrPC may also shed light on the molecular underpinnings of this phenomenon.
In our view, the debate over the function of the prion protein has become disjointed because of a lack of clarity about what one can reasonably expect to find when looking for the function of a given protein. To guide the discussion, it is useful first to clarify three concepts: what is meant by 'the function of a protein', the relationship between 'function' and 'mechanism', and the connection between 'function' and 'role'.

II. THE CONCEPT OF PROTEIN FUNCTION
Surprisingly, the terminology that governs research aimed at elucidating the function (from the Latin fungi, 'perform') of a protein is not well defined. Finding the function of various 'parts' in natural phenomena has been important throughout the history of philosophy and science, particularly to answer why a particular part exists. Functional studies have historically been closely associated with teleology (from the Greek telos, 'end') and Aristotelian 'final causes' (Aristotle, 1930).
The noun 'function' can be described as an "activity or purpose natural to or intended for a person or thing" (Lexico.com, 2020a). In the present context, references to 'purpose' and 'intention' are mostly counter to our understanding of evolution, leaving us with the definition that the function of biological entities ought to represent their 'natural activity'. Furthermore, we can think of goal-directedness (as distinct from notions of 'intention' or 'agency') in terms of, for example, the maintenance of protein homeostasis in cells (Narayan, Ehsani & Lindquist, 2014) or constant body temperature in endothermic organisms (Allen & Neal, 2020). We can choose to refer to these states as 'goals' of the cell or organism without implying any 'intentions'. As such, one way to think broadly of a function is to consider "the contribution that a structure or action makes to the realization of a goal state" (Hull, 2015, p. 1051).
It can also be observed that "the function of a feature of an organism is frequently defined as that role it plays which has been responsible for its genetic success and evolution"; in other words, "although the brain weighs down the shoulders, this is not its function, for this is not why entities with brains are successful" (Blackburn, 2016, p. 191). Similarly, one can distinguish "genuine biological functions from accidental utility (such as noses supporting glasses)" (Allen & Neal, 2020).
We can thus say that, for our purposes, we consider the function of a protein to be its main natural activity, responsible for its genetic success in the context of cell or organism survival, and as such we will use the phrases 'main role', 'the function', 'main function' and 'primary function' interchangeably. Once widely established, such a function may also be referred to as the 'canonical' function.

III. FUNCTION, MECHANISM AND ROLE
A further concept needing clarification is the issue of where a protein 'obtains' its function. A traditional answer is that a protein's structure determines its function, although research on 'intrinsically disordered' proteins has increasingly challenged the notion of 'rigid' protein structure (Uversky, 2020). Nevertheless, this structure-function association continues to inform much research in structural biology (AlQuraishi, 2020; Schmid & Hugel, 2020; Xie et al., 2020). However, given that we are aiming for precision, it might be more apt to consider the connection between structure and 'property' (as is usually done in chemistry; Yang et al., 2017) rather than structure and function. The complex and intricate physicochemical nature of the amino acid sequence of a protein and the resulting structure together 'endow' a protein's different domains with particular properties (or 'features'). It is these properties that then enable the protein's function. Because the sequence and predicted chemical properties of a protein of unknown function are now easily accessible, researchers investigating the function of a protein tend to engage in parallel efforts to build mechanistic models both of the internal dynamics of the protein itself (McKay et al., 2015), and also of the network of proteins and other molecules in the cell with which the protein is involved. Whilst these mechanistic models represent a main goal of many pathobiological investigations that might eventually reveal how a protein's function is exerted, this information should not be conflated with its function.
Mechanistic and functional explanations can complement each other in useful ways. For instance, knowing the function of a protein and also a particular cellular mechanistic model in which the protein plays a part can greatly enrich the explanation of a target phenomenon, with the functional explanation helping to answer why a protein does what it does, and the mechanistic model showing how it might take place (Theurer, 2018). In the protein world, mechanistic insights, in addition to functional understanding, may offer therapeutic angles that would otherwise be obscured.
A third concept in need of clarification is the terminological yet important distinction between 'function' and 'role'. In protein biology, some authors have resorted to using the terms 'role' or 'functional role' (instead of just 'function'); a role being "the function assumed or part played by a person or thing in a particular situation" [emphasis added] (Lexico.com, 2020b). This cautious approach reflects a reality that biologists are well attuned to, whereby the context (or paradigm) often has a critical influence on observations. By referring to 'roles', one can arguably evade the burden of convincing others that a proposed natural activity a protein exhibits in a particular situation is its main natural activity. Scientists using this term may take the position that, for example, the function of PrPC is not known or even knowable, and as such may be considered the 'sceptics' or 'agnostics' amongst the protein function seekers.
Related to the term 'role' is the concept of 'causal role', which philosophers of biology (particularly those writing on genetics) distinguish from the 'selected effect' notion of biological function (Neander, 1991; Amundson & Lauder, 1994). This distinction has been especially common in commentary (e.g. Brunet & Doolittle, 2014; Kellis et al., 2014) on the Encyclopedia of DNA Elements (ENCODE) effort which, since 2003, has sought to identify functional elements in the human genome (ENCODE Project Consortium, 2004). As explained recently, the overly broad causal role notion of function "applies to any of the effects which a component has on the system(s) that contain it, irrespective of their impact (or that system's impact) on fitness" (Linquist, Doolittle & Palazzo, 2020, p. 1). As will become clear in this review, the evolutionarily cognisant conception of protein function we advocate is in line with the selection-based notion of function.

IV. FRAMEWORK FOR PROTEIN FUNCTION ASSIGNMENT
Assigning a function to a protein-of-interest (POI) can be a daunting task. Faced with endless theoretical possibilities, it makes sense to look for data that narrow the number of possible functions. Accordingly, the framework proposed here is based on the recognition that certain types of data are particularly well suited to provide such constraints. These are data that can be collected when studying a protein's inheritance, distribution, interactions and phenotypes. The guideposts provided by each of these data types can be augmented through evolutionary considerations. Below we elaborate on the origins and nature of some of the more obvious constraints each of these data types can provide.

The function of the prion protein
(1) Inheritance
Novel protein-coding sequences rarely evolve de novo but most often arise through the duplication of genome segments or the retrotransposition of transcribed genes (Escudero et al., 2020; Cosby et al., 2021; Wacholder & Carvunis, 2021). Once formed, the genes that code for proteins are, like all other sequences within the genome, subject to continuous mutagenesis. Consequently, gene sequences coding for proteins which serve no function will decay over time because selective evolutionary pressures that keep them functional do not apply. Minimally, for a protein-coding gene sequence to survive intact over evolutionary timescales, its encoded protein needs to convey a fitness advantage to a species at a rate that exceeds the rate of decay of the gene sequence encoding it. As of 2019, the human genome was understood to comprise 19,116 protein-coding gene sequences (Piovesan et al., 2019). This number ignores the existence of small open reading frames (ORFs) (Chen et al., 2020; Martinez et al., 2020), but even if these were included, the number would seem small given the sheer volume and complexity of tasks the proteins they encode accomplish. Evolution has also limited the average length of protein chains to approximately 375 amino acids in humans (again ignoring the small ORFs) (Brocchieri & Karlin, 2005). Other organisms exhibit mostly similar average/median protein lengths. There is only so much functionality that can be embedded in a chemical structure of this size. Consequently, the natural activities of proteins are limited by the amino acid sequences they inherited from ancestral genes (Steiner & Sazanov, 2020).
The challenges posed by the small number of protein-coding genes and the relatively small average protein lengths have been met mainly through the evolution of a vast repertoire of transcriptional intricacies (e.g. splicing) and post-translational modifications (PTMs), as well as 'teamwork' amongst proteins that come together to form macromolecular protein complexes or cooperate in looser arrangements. In particular, PTMs tend to impart proteins with slightly altered properties, which are either adaptations to complex dynamic environments or widen the conditions under which they can be useful (see Section IV.5).
Natural activities of proteins evolve when changes to their coding sequence accumulate over time (Konaté et al., 2019). One way to assess functional conservation of related genes is to assess the functional divergence of orthologs and paralogs, i.e. genes that diverged by speciation or duplication, respectively. This has been done extensively in human-to-mouse ortholog comparisons. Available data revealed that 10-20% of ortholog pairs had acquired functional differences when these were assessed on the basis of gene expression similarity, consistency of alternative splicing patterns and genotype-phenotype data (Gharib & Robinson-Rechavi, 2011). Although this percentage indicates a surprisingly high divergence when viewed from the perspective of whether functional data generated in mice are predictive for human biology, the data still establish that 80-90% of orthologs have retained functional similarities. A recent analysis of paralogs in yeast suggests that about two-thirds had functionally diverged and one-third retained functional redundancy (Kuzmin et al., 2020). Irrespective of the type of homology, which often cannot be determined with certainty (Jensen, 2001; Altenhoff, Glover & Dessimoz, 2019), whenever an altered or new function is acquired, the parts of the protein central to it have to be maintained in subsequent generations. Even small changes to a coding sequence can have dramatic effects on protein function, for example if a catalytic centre is abrogated. Nonetheless, homologous protein-coding genes that share upwards of 50% sequence identity exhibit dissimilar functions in fewer than 6% of cases (Sangar et al., 2007). Even if the sequence identity drops to between 25 and 40%, homologous proteins may retain functional similarity (Pearson, 2013), emphasising that an approach that considers protein ancestry has merit in assigning protein functions. This approach does not have to align in its directedness with the evolution of organisms with simpler body plans into those with more complex body plans; useful inferences regarding the function of a POI of a single-celled species often can be drawn from functional data available for a highly conserved vertebrate ortholog.
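The identity ranges quoted above can be made concrete with a short sketch. The Python snippet below computes percent identity for two sequences that are assumed to be already aligned, and maps the result onto the ranges discussed in the text. The function names, the gap-handling convention (gapped columns are skipped) and the treatment of the 40-50% range, which the cited figures do not address, are illustrative assumptions rather than part of the cited studies.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where both aligned sequences carry a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # assumption: gapped columns are ignored (conventions differ)
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_band(identity: float) -> str:
    """Map percent identity onto the coarse ranges quoted in the text."""
    if identity >= 50.0:
        # dissimilar functions reported in fewer than 6% of such cases
        return "functional similarity likely"
    if identity >= 25.0:
        # 25-40% per the text; 40-50% is lumped in here as an assumption
        return "functional similarity possible"
    return "identity alone is uninformative"


# Toy aligned fragments (hypothetical, not real protein sequences)
identity = percent_identity("ACDE", "ACDF")
print(identity, identity_band(identity))
```

Such a score is only a first filter: as noted above, a single substitution in a catalytic centre can change function even when overall identity is high.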
(2) Distribution
The existence of all natural proteins begins with their ribosomal translation and ends with their eventual degradation. During this time, a protein might be in transit or reside for extended periods in a given location. Interestingly, systematic analyses have revealed that a majority of cellular proteins are mostly found in only one or two main subcellular locations (Thul et al., 2017). When we refer to a protein's distribution, we capture not only the locations at which it can be found but also its relative amounts in these locations (Bolognesi & Lehner, 2018). If the gene-founding event that gave rise to a particular protein is not a recent event but goes back hundreds of millions of years, evolution is expected increasingly to favour a gene expression profile that is mostly governed by the primary function of a protein. Organisms have evolved several mechanisms to ensure the presence of proteins in locations where they are most useful. These include the epigenetic regulation of gene expression, intricate regulation of transcription and translation, and ways of controlling the transport and residence of proteins inside and outside of cells (Wong et al., 2020). Several conclusions can be deduced from these basic observations. First, a protein cannot have a role or function in a location where it is not encountered. Second, the characteristics of a protein's distribution not only limit its possible roles but can also be informative of its function. And third, proteins that have a highly similar distribution are more likely to contribute to similar cellular programmes or may even form a functional complex. In summary, over evolutionary time, the distribution of a protein increasingly serves as a 'fingerprint' that can be informative of its function.
(3) Interactions
Most proteins associate with a subset of other molecules and are relatively inert to interactions with others, a remarkable achievement given the molecular crowding of biological systems, estimated at 2-4 million proteins per cubic micron in bacteria, yeast and mammalian cells (Milo, 2013). This has been accomplished through an exquisite co-evolution of molecular folds and surfaces. Thus, the number of distinct roles that can be attributed to a protein is expected loosely to reflect the number of distinct molecules or molecular complexes it associates with and the type of activity it contributes to these interactions. For example, a kinase may influence one to many distinct phosphorylation substrate(s), dependent on its substrate recognition properties, but it may also affect the biology of molecular complexes it is associated with independent of its kinase activity. For proteins that have no apparent enzymatic activity, including the prion protein, their main role is assumed to be conveyed by association with other molecules. There are myriad possible ways in which such a protein may exert a natural activity. Critically, molecules that associate with each other have been found to contribute to related natural activities. This 'partner-in-crime' logic has been used in countless studies that have assigned functions to proteins.
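The 'partner-in-crime' logic can be illustrated with a minimal sketch, assuming one already has a list of interaction partners for a POI and a lookup of the natural activities annotated for each partner. The function name, the partner identifiers and the activity labels below are hypothetical placeholders; real studies weigh interaction confidence and annotation quality, which this simple tally does not attempt.

```python
from collections import Counter


def rank_candidate_activities(partners, annotations):
    """Tally how many interaction partners are annotated with each activity.

    partners: iterable of partner identifiers for the protein-of-interest.
    annotations: dict mapping a partner identifier to its known activities.
    Returns (activity, partner_count) pairs, most frequent first.
    """
    counts = Counter()
    for partner in partners:
        # set() ensures each partner contributes at most once per activity;
        # partners without annotations contribute nothing
        for activity in set(annotations.get(partner, ())):
            counts[activity] += 1
    return counts.most_common()


# Hypothetical interactors and annotations, for illustration only
partners = ["partner_1", "partner_2", "partner_3", "partner_4"]
annotations = {
    "partner_1": ("cell adhesion",),
    "partner_2": ("cell adhesion", "ion transport"),
    "partner_3": ("cell adhesion",),
    # partner_4 has no annotation
}
print(rank_candidate_activities(partners, annotations))
```

The ranking only suggests which activities deserve attention first; it constrains the hypothesis space rather than assigning a function by itself.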
(4) Phenotypes
For many proteins, evolution has led to safeguards being in place should their expression be abnormally altered as a consequence of either mutagenesis or other internal or external factors. This protective adaptation, which mostly relies on built-in redundancies (including feedforward and feedback loops), translates into a reality whereby only approximately 5% of protein-coding sequences are 'essential' in a given paradigm, a term that is operationally defined by determining if the deficiency of a protein leads to non-viability. A practical consequence is that changes to the levels of a protein often, but not always, lead to distinct phenotypes that can be studied. One example of this approach is the systematic study of genotype-phenotype relationships undertaken in the yeast Saccharomyces cerevisiae, the first eukaryotic species for which a full genome sequence was completed (Goffeau, 1996; Duina, Miller & Keeney, 2014).
If a phenotype caused by the modulation of a protein can be discerned, it can be assumed that its existence is in some way linked to a natural activity of the POI. When discerning phenotypes at the organismal level, the inference of natural activities requires several levels of organisational hierarchy to be bridged (organism → organ → cell → subcellular compartment → molecular). Moreover, this hierarchy might at times need to be contextualised in light of organismal ageing and its effect on the rate of proteins' natural activities (Becker & Rudolph, 2021). These realities turn phenotypic analyses into powerful validation tools when orthogonal data inform a hypothesis of a POI's natural activity but pose a formidable hindrance in the absence of such a hypothesis.
As a case in point, if some orthogonal data suggest that a POI may play a role in controlling hypertension, then the observation of a heart defect in mice deficient for the respective gene can be critical evidence towards validating the proposed role. By contrast, the mere observation of a heart defect in the absence of data that can inform a causal hypothesis, although a useful data point, would leave the natural activity of the gene associated with this defect elusive. This is true because there are just too many possible scenarios that could implicate a POI in this phenotype.
One way to narrow down the natural activity of a POI is to look for more than one phenotype that can be linked to the genetic manipulation of its protein-coding gene. If encountered, this phenomenon would reflect a reality of the respective gene contributing to more than one phenotypic trait, a scenario referred to as 'pleiotropy'. In particular, for phenotypic traits that cannot immediately be reconciled, assuming only one natural activity may provide the strong constraint that could advance our understanding of the function of the POI. Naturally, when dealing with any genotype-phenotype relationship, it is critical to be alert to pitfalls associated with the genetic methodology that was employed to generate the model under investigation. Unless isogenic control organisms are available, phenotypes could be caused by inadvertent manipulations of other gene products and therefore need to be considered tentative.
In the absence of orthogonal data that facilitate the interpretation of phenotypic data, an immediate benefit of phenotypic analyses can be the ability to exclude proposed protein functions. For example, if a gene product has been considered essential for mitosis, then its deficiency should be lethal in a dividing cell paradigm. If no lethality is encountered, then the protein could at best contribute to mitosis but can no longer be considered essential for it. Thus, the characteristics of phenotypes can not only point toward the function of a protein but can also exclude proposed functions.
We already alluded to the fact that a subset of proteins can reliably be found in more than one subcellular location. But what is the evidence that a protein may cause distinct phenotypes by having acquired more than one function? We will revisit this pertinent question in Section V but note here that it is more common to find that proteins that contribute to more than one phenotype do so by having harnessed subsets of the same molecular programme for more than one purpose.
Consequently, if more than one phenotype exists that can be reliably linked to the abnormal expression of a POI, rather than assuming that multiple functions are involved, pleiotropy is a more likely cause, and attempts should be made to reconcile these under one model. In summary, phenotypes do not merely indicate possible functions but can be useful for the exclusion of proposed functions. This aspect of interpreting phenotypic data is particularly powerful when a proposed function is poorly equipped to explain more than one validated phenotype.
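The exclusion logic described in this section can be sketched in a few lines of Python. The snippet assumes that each proposed function comes with a set of phenotypes it predicts upon deficiency of the POI, and that the experimenter has recorded which phenotypes were observed and which were tested for but found absent. The class name, function name and phenotype labels are hypothetical and serve only to illustrate the reasoning; they do not describe an established tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedFunction:
    name: str
    predicted_phenotypes: frozenset  # phenotypes expected when the protein is deficient


def triage(proposals, observed, tested_absent):
    """Separate excluded proposals from retained ones.

    A proposal is excluded if any phenotype it predicts was tested for and
    found absent. Retained proposals are sorted by the number of observed
    phenotypes they explain, most first.
    """
    excluded = []
    retained = []
    for proposal in proposals:
        if proposal.predicted_phenotypes & tested_absent:
            excluded.append(proposal.name)
        else:
            explained = len(proposal.predicted_phenotypes & observed)
            retained.append((proposal.name, explained))
    retained.sort(key=lambda item: item[1], reverse=True)
    return excluded, retained


# Hypothetical proposals mirroring the mitosis example in the text
proposals = [
    ProposedFunction("essential for mitosis", frozenset({"lethality in dividing cells"})),
    ProposedFunction("hypothetical function A", frozenset({"phenotype 1", "phenotype 2"})),
    ProposedFunction("hypothetical function B", frozenset({"phenotype 1"})),
]
observed = {"phenotype 1", "phenotype 2"}
tested_absent = {"lethality in dividing cells"}
print(triage(proposals, observed, tested_absent))
```

In this toy case the mitosis proposal is excluded because the predicted lethality was not encountered, and the proposal that accounts for both validated phenotypes ranks ahead of the one that explains only one of them.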

(5) Functional adaptation and adjustment
There can be no doubt that transcriptional intricacies, including the alternative use of start codons, or alternative splicing, as well as PTMs, can profoundly alter the properties of a given protein, in particular as they relate to its distribution, interactions and phenotypes. As evidenced by a large body of literature, even small PTMs, such as the attachment of a single phosphate, can have pronounced consequences (Hunter, 2012).
Transcriptional variations or PTMs typically lead to two types of changes: (i) those that help a protein adapt its function to a complex and ever-changing environment (these types of alterations of a protein's physicochemical properties are not thought to alter the function of a protein but merely allow it to fulfil it better); and (ii) those that widen the range of conditions under which a protein can be useful. We noted in Section IV.1 the paradoxically small number and short lengths of protein-coding gene sequences given the sheer volume and complexity of tasks the proteins they encode accomplish. It is apparent that this challenge has at least in part been met by the ability to adjust expression products through small changes to widen the conditions under which the function of a gene's expression products can be useful. For instance, an ion channel capable of transporting a particular metal ion under one set of conditions may acquire the ability to channel related metal ions under a second set of conditions (Hanemaaijer et al., 2020).
Compared to achieving physicochemical changes by altering the coding sequences of genes, PTMs offer an energetically favourable solution because they can be applied to a large number of substrates, can be conferred on proteins dynamically and in a context-dependent manner, and are often reversible. In particular, the latter feature lends itself to regulating a protein's distribution, interactions and phenotypes as it adapts and adjusts to spatiotemporal changes. Consistent with this critical role of PTMs, systematic cross-species comparisons of functional orthologs indicate that the most conserved PTMs are the most likely to regulate a protein's distribution, interactions and phenotypes (Beltrao et al., 2012; Narasumani & Harrison, 2018). The same studies have, however, also uncovered that a majority of PTMs serve no significant biological role and may merely reflect differences in the environments in which they exist.
In the present context, it is important to ask whether the knowledge of transcriptional intricacies or PTMs can help identify the function of a protein, or at least restrict the range of possible functions that can be attributed to a protein, similar to how constraints provided by any of the aforementioned data types can be useful for identifying the function of a protein. Although knowledge of transcriptional variations of gene products can lead to mechanistic predictions regarding how a protein might be affected, this by itself does not usually predict the main function of a protein. Similarly, because most PTMs are not specific to a given protein but can be found on a multitude of proteins, characterising the PTMs associated with a POI usually provides almost no indication of the protein's defining function.
It should be no surprise then that it is difficult to find examples of proteins whose functions were not understood or were misunderstood before their transcriptional variations or PTMs were known. Thus, whereas it is easy to find reports that assign to a subset of proteins, generally referred to as enzymes, functions in the catalysis of a specific reaction that involves PTMs on substrate proteins, instances of a specific PTM defining the function of a carrier protein are difficult to encounter.
Altogether, the value of knowing a protein's transcriptional variations or PTMs is rather limited for identifying its main function. This knowledge can, however, be rewarding for making sense of data on the distribution, interactions and phenotypes of a POI. Such insights may also reveal the mechanistic nuances by which a protein has adapted to a complex dynamic environment and has widened the range of conditions under which its function can be useful.

(6) Evolutionary considerations
Any proposed function for a protein must be consistent with evolution. In particular, the relative timing of evolutionary events and the species affected by them offer constraints that can be useful for validating a proposed protein function.
Because evolution pervades all of biology, this type of accounting for evolutionary insights can be applied to data generated by each of the four methods for determining protein function.
First, when considering the ancestry of a protein, it can be useful to determine if the proposed function can be reconciled with data regarding the function of the most closely related molecules (inheritance in the context of evolution).
Next, because certain functions only evolved in some branches of life but not in others, another useful test is to evaluate whether the gene coding for the POI is represented in the branches of life that exhibit the proposed function (distribution in the context of evolution). For example, if a protein is proposed to play a critical role in the ancient programme of mitosis, it would be expected that orthologs of this protein exist in most species. Finding that such a protein only evolved in relatively recent evolutionary time would not by itself preclude a function in mitosis but should be interpreted as a challenge to make sense of this counterintuitive result.
Similarly, if a protein is proposed to form interactions with other molecules in order to exert its proposed function, then a close look at when and where in evolution its interactors emerged, relative to the POI, can be informative (interactions in the context of evolution).
Finally, if phenotypic traits are associated with the POI, it can be helpful to explore whether the evolution of the molecular programmes thought to underlie these traits is consistent with the known evolutionary emergence and species distribution of the POI (phenotypes in the context of evolution).
Biological Reviews (2021) 000-000 © 2021 The Authors.Biological Reviews published by John Wiley & Sons Ltd on behalf of Cambridge Philosophical Society.
Hence, each of the four aforementioned data categories for narrowing down the function of a POI has an evolutionary dimension that can aid in the validation of functional hypotheses.

V. MOONLIGHTING, FUNCTIONAL PRIONS AND FUNCTIONAL COORDINATION
Before we discuss how the information these constraints provide can be harnessed for determining the function of a protein, it is critical to introduce the concepts of the 'multifunctionality' of proteins, 'functional prions' and 'task coordination'.
When encountered, multifunctionality in proteins has been referred to as 'moonlighting', analogous to the colloquial use of this term applied to individuals holding a second job next to their regular employment. According to its original definition, this term should be restricted to scenarios that are not based on multiple RNA splice variants, gene fusions or pleiotropic effects (Jeffery, 1999). For example, lactate dehydrogenase, a major glycolytic enzyme that catalyses the interconversion of pyruvate and lactate, has been shown to be identical to a protein referred to as epsilon-crystallin, which represents the main constituent of the eye lens in ducks (Wistow, Mulders & de Jong, 1987). Intriguingly, the crystallin lens proteins in several other species have since been shown to be based on similar enzymes that moonlight. To date, more than 200 additional moonlighting proteins have been validated by biochemical and biophysical evidence (see the MoonProt database at http://www.moonlightingproteins.org) (Chen et al., 2018). Often moonlighting manifests in proteins that allocate specialised properties to the service of distinct functions. These properties may, for example, be reliant on different protein-protein interaction domains within the same polypeptide or may involve catalytic centres or post-translational features. Thus, whereas with crystallins the features that contribute to the enzymatic versus lens functions cannot easily be separated, for the majority of moonlighting proteins the features that promote dual functionality are distinct and may have evolved at distinct rates. A well-known example of this second type of moonlighting protein is the glycolytic enzyme phosphoglucose isomerase, which carries a small catalytic centre that can interconvert glucose 6-phosphate and fructose 6-phosphate but also carries specialised protein-protein interaction domains whose docking to specific receptors facilitates neurite outgrowth (Sun et al., 1999). The capacity for this type of multifunctionality is capped by the aforementioned average size of proteins, which also places an upper boundary on the number of features a protein can harbour.
Of relevance in the context of prion protein biology is another type of multifunctionality of a gene product, namely the ability of certain polypeptides to acquire two or more fundamentally different conformers (Eisenberg & Jucker, 2012; Prusiner, 2012). A related, yet distinct phenomenon is the existence of supramolecular protein-based phase transitions, which have emerged as a principle of cellular organisation and are a hallmark of several neurodegenerative disease proteins (Mathieu, Pappu & Taylor, 2020). Whereas for the prion protein and other neurodegenerative disease proteins the disease-associated conformers have no known function, for an ever-growing number of other proteins distinct functions can be assigned to their naturally occurring 'prions' (Halfmann et al., 2012; Dixson & Azad, 2020). This phenomenon was initially reported in yeasts (Wickner, 1994; Patino et al., 1996) and filamentous fungi (Coustou et al., 1997) but has since also been observed in an increasing number of proteins of higher organisms. Functions fulfilled by these natural prions cover a broad spectrum and may serve to increase the fitness of their host organisms under certain stress conditions, determine the mating type of filamentous fungi on the basis of heterokaryon incompatibility (Coustou et al., 1997), or consolidate memory in a wide range of species, including sea slugs (Miniaci et al., 2008; Si et al., 2010) and fruit flies (Keleman et al., 2007; Hervas et al., 2020). A similar type of multifunctionality can also be observed in what are called 'metamorphic' proteins that can adopt distinct folds leading to different functions (Dishman et al., 2021). In contrast to prions whose transition to misfolded conformers is typically irreversible, the acquisition of alternative folds by metamorphic proteins is reversible (Dishman & Volkman, 2018).
Much more widespread and of general applicability to a vast majority of proteins seems to be another phenomenon, namely that proteins can utilise several of their features to serve the same overall function in more than one way, a faculty we refer to as 'functional coordination'. This type of coordination exists as a consequence of evolution having equipped proteins with exquisite functional adaptation. Cyclin-dependent kinase 1 (CDK1), a small regulatory protein of the cell cycle, functionally coordinates when it executes its function through intricate interactions with several proteins (Hayward, Alfonso-Pérez & Gruneberg, 2019). More specifically, at the right time during S phase entry, CDK1 molecules are activated through the binding of specific cyclins. These interactions confer a target specificity to the protein kinase that is skewed towards substrate proteins whose phosphorylation promotes mitosis. Eventually, the CDK1/cyclin complex is inhibited through direct interaction with specific CDK inhibitor proteins (CKIs).
Because functional coordination appears to be a very common ability of proteins, whenever a proposed function of a protein is contested, it should be considered whether natural activities that seemingly indicate distinct functions cannot be reconciled as facets of a functionally coordinating protein that serve the same overall function.
(1) Communicating protein function
Considerable advances have been made in understanding CDK1 function. But how would an ideal description of the function of CDK1, or any POI for that matter, capture all aspects of its functional complexity? There are various ways to think about 'complexity', one of which is that the "complexity of an object composed of definable parts is defined as the size of the minimum description of the object" (Hinegardner & Engelberg, 1983, p. 7).
This raises the question: what level of detail is sufficient when assigning and communicating a function of CDK1, or any POI? There should be no reason to treat proteins in this regard differently from other units to which functions can be assigned. For more familiar objects, we routinely omit details when referring to functions. For example, we may describe the function of a clock as 'keeping time', without elaborating on its range of uses or the functions of its component parts, although we are quite aware that these are not identical for all clocks. Analogously, when describing the function of a protein, the primary concern should not be the level to which our understanding of its function has been refined, but rather that the function we communicate is not wrong. For example, referring to CDK1 as a dehydrogenase would be wrong but saying that it is a protein kinase is acceptable, even though this crude description omits all aspects of its function that relate to its role in cell cycle regulation. These aspects could be added to achieve a refined granularity in the functional description of this kinase, thereby helping to tell it apart from other kinases.
A second theme that transcends the language used to communicate the function of objects is the push towards brevity. Unfortunately, it is to be expected that the terminology we commonly use to label the function of familiar things may not lend itself to short descriptions of the functions of certain proteins. When we refer to a 'clock' we can rely on this term being understood to represent an instrument that fulfils the function of telling the time. By contrast, certain proteins may require more than a few words to convey their unfamiliar functions accurately. In the world of proteins, as our understanding improves, new terms are continuously introduced to facilitate discussion about them and their functions. For example, the fact that the enzymatic activity of CDK1 can be specified by the widely understood term 'protein kinase', as opposed to the description 'a protein that can transfer phosphate groups under hydrolysis of ATP to the side-chain of certain amino acids', greatly facilitates our ability to communicate its function.

VI. THE INTERSECTION OF INFERENCES: THE IDIP FRAMEWORK
Remarkably, the widespread discord over the function of individual proteins contrasts with broad agreement regarding the most useful methods for identifying the function of a protein. When investigators search for the function of a protein, they most often (i) investigate the function of its closest evolutionary relatives (inheritance); (ii) characterise its expression profiles (distribution); (iii) identify other molecules it binds to (interactions); and (iv) elucidate phenotypes that manifest when its expression is modulated or its gene sequence is mutated (phenotypes).
Disagreements seem to arise because there is little consensus regarding what to do with information gathered with these approaches, which we refer to by the acronym 'IDIP'. For example, consider that a POI associates with two other proteins belonging to distinct protein complexes, which themselves are linked to separate biological processes. Which of these interactions would be more informative for the goal of identifying the protein's predominant function? Similarly, if the knockout of a POI gives rise to several distinct phenotypes that are not easily reconciled as sharing a common functional explanation, how would one weigh the relative importance of these phenotypes for informing conclusions about the POI's function?
Our argument is that this conundrum can be overcome by a systematic approach. When considering the merits of any of the individual IDIP approaches for determining protein function, it should be apparent that none provides sufficient information for deductive reasoning, i.e. the type of reasoning that leaves no chance for a conclusion to be false so long as the premises and logic applied are sensible. Instead, each of these methods merely allows inferences to be made. Depending on the nature of the available data, these inferences can be strong but are not by themselves conclusive so long as only one of the approaches has been applied. For example, if a POI is repeatedly observed to co-purify only with one other protein, it might trigger an inference of it being a functional partner of this other protein. Yet, it would not allow for a determination of the function of the POI to be made because: (i) the interaction could still have been artificially generated during the in vitro preparation of the biological material that preceded the co-purification step (a not-uncommon phenomenon when cellular compartments are disrupted by the addition of detergents); (ii) an interaction between the POI and an unknown binding partner might exist in vivo but this interaction is not captured because it is disrupted during the sample-preparation step; and (iii) it would merely inform about a physical interaction but provide no information about the functional significance of this interaction.
This list of caveats is not exhaustive but makes the point that additional information will be needed to arrive at the function of a POI. Rather than undertaking further interaction analyses, researchers facing this situation could resort to orthogonal methods. For example, if the binding information between the POI and its selective interactor could be paired with expression profiling data, the investigators might find that the POI and the co-purifying protein exist in distinct subcellular compartments, which would strongly indicate that their co-purification was a post-lysis artefact. Alternatively, these proteins might strongly overlap in their distribution in one cell type or tissue but not in any other, including locations in which the POI exhibits its highest level of expression. This scenario would not by itself be conclusive but could suggest a shared role only in a specific cell or tissue context. Critically, it would also suggest that the main function of the POI may not relate to this particular molecular interaction. Finally, the subcellular distribution of these proteins and their tissue distribution during development could be found to be highly overlapping, which would strengthen the conclusion that their interaction is needed for the POI to exert its main function. Naturally, even if this third scenario was encountered, it would not clarify the significance of the molecular interaction between the POI and its binding partner. Instead, the significance may be inferred from insights into the function of a close relative of the POI or from a phenotype observed when a suitable model is made deficient of the POI.
It should be apparent that similar caveats can limit inferences made on the basis of data gathered with any of the IDIP approaches. Moreover, information that the IDIP approaches provide is not just useful for deriving these types of inferences. Rather, whenever IDIP data are in conflict with a proposed function of a POI, this should raise a red flag. For example, if a POI's proposed function fits with all available data but conflicts with functional data for proteins related to the POI, it should trigger efforts to document that the POI has diverged sufficiently from its protein relatives to have acquired a new function. Plausible explanations of this nature would be critical for a credible functional claim because, almost invariably, the function of proteins is clearest at the intersection of inferences made from each of the four IDIP approaches. In our view, the main reason that the functions of many proteins are still debated is linked to an arguably inconsistent practice of deriving conclusions regarding a protein's function under exclusion of evidence from at least one of the IDIP branches. Because this is a pivotal point to the present review, we will further explore its validity by introducing an analogy between a person and a society.

VII. PERSON-SOCIETY ANALOGY
In this analogy, the proteins-of-interest are represented by persons-of-interest (conveniently also abbreviated as POIs) and the functions of proteins correlate to professions (or main daily activities). If the goal is to identify the profession of a POI living in a society, we would try to gather information about the types of qualifications this POI acquired when training for the profession (tantamount to the 'inheritance' category). It further would be informative to learn where the POI spends most of their time when not asleep or pursuing other daily activities ('distribution'). A great deal might also be learned from identifying individuals with whom this POI associates ('interactions'). Finally, we could try to infer the POI's profession by gathering information about how this person contributes to society and their community; this might be done by looking at a person's online presence ('phenotypes'). Let us assume for a moment that the POI is working as a prion scientist. In that case we may learn that this person studied at some university a discipline within the natural sciences. Additionally, the POI may have trained in the laboratory of another prion scientist (inheritance), might spend most of the time at an institute whose self-declared mandate is to find solutions to neurodegenerative diseases (distribution), may collaborate primarily with neurodegenerative disease researchers and other scientists (interactions), and might have published several articles that are in some form related to prion science (phenotypes). With the above information at hand, it would be relatively easy to arrive at the conclusion that the POI is a neurodegenerative disease researcher. In fact, up to this point, this analogy seems trivial.
Its value becomes more apparent when considering other human activities. Suppose that this prion scientist writes a commentary expressing concern over the practice of not judging academic output on its scientific merit but on the basis of the impact factors of the journals in which the work is reported. The perspective may generate a splash of publicity that is more noticeable than other traces this person has left on the web (strong phenotype). Over time, interest in this perspective might lead to interactions with journal editors, research administrators and policy makers (interactions). These observations alone could lead to an inference that this person is also working in these related professions. Even the fact that the POI spends most of the time at an institute that studies neurodegenerative diseases could obviously be reconciled with such a role (distribution). Only by learning about qualifications this person acquired when training for the profession, namely having studied natural sciences and trained in the laboratory of another prion scientist, would it become more likely that the POI is foremost a prion scientist (inheritance), who also has an interest in science policy. This example showcases a major pitfall when conclusions are drawn from functional analyses, namely the tendency to attribute the most striking phenotype caused by a protein to reflect its function.
Another frequent misconception relates to the many ways that proteins influence their environment. Let us assume that the prion scientist commutes to work on a crowded bus, with the first bus stop on the commuter route being close to the prion scientist's home, affording a seat on the bus every time the POI commutes to work. In this scenario, close investigations may reveal that the presence of the POI on the commuter bus causes other passengers taking the same bus at later stops (when the bus is almost full) to increase their standing stamina. If we imagine that this pattern is so consistent that it can be robustly documented, it may lead to the conclusion that one role of the POI must be to generate this response in fellow passengers. In particular, proteins whose functions remain disputed are subject to the type of detailed analyses that may uncover trivial effects that they exert on proteins that surround them. Note that we deliberately chose an example here that requires the least active participation on the part of the POI, namely a steric effect, to emphasise this point. In the current protein function literature, any phenotype, large or small, that has been robustly attributed to a protein's existence is typically interpreted to represent a function of that protein. In fact, when no other more robust phenotypes have been reported for a given POI, this fallacy may lead authors to misinterpret the existing robust phenotype(s) to represent the main function of the protein.
To document that the robust association is the primary function, orthogonal information provided by the four IDIP methods needs to be integrated and the proposed function should align with evolutionary realities. Most importantly, if results from one of the approaches are in conflict with a proposed function, this should not be dismissed but used to stimulate additional research or to revise conclusions.
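The integration rule described above can be illustrated with a minimal sketch. It assumes that each proposed function is simply tagged with the IDIP branches that support it and the branches that conflict with it; the class name `Hypothesis`, the function `assess` and the returned labels are hypothetical choices made for illustration and are not part of the framework as published.

```python
from dataclasses import dataclass, field

# The four IDIP evidence branches named in the text.
BRANCHES = ("inheritance", "distribution", "interactions", "phenotypes")


@dataclass
class Hypothesis:
    """A proposed function plus the branches that support or contradict it."""
    name: str
    supports: set = field(default_factory=set)
    conflicts: set = field(default_factory=set)


def assess(hypothesis):
    """Classify a proposed function by how it sits across the four branches."""
    unknown = (hypothesis.supports | hypothesis.conflicts) - set(BRANCHES)
    if unknown:
        raise ValueError(f"unknown branch(es): {sorted(unknown)}")
    if hypothesis.conflicts:
        # A conflicting branch is not dismissed; it calls for more research
        # or a revised conclusion.
        return "red flag"
    if hypothesis.supports == set(BRANCHES):
        # The proposal sits at the intersection of all four inferences.
        return "supported by all four branches"
    missing = [b for b in BRANCHES if b not in hypothesis.supports]
    return "inconclusive, missing: " + ", ".join(missing)


# Hypothetical example mirroring the commuter-bus scenario: a robust effect
# documented by one type of data only.
steric_effect = Hypothesis("steric effect on neighbours", supports={"phenotypes"})
print(assess(steric_effect))
```

In this toy form, a proposal backed by a single branch is reported as inconclusive no matter how robust that one observation is, which is the point the analogy is meant to convey.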

VIII. GENE ONTOLOGY (GO) CONCEPTS
Since 1998, the Gene Ontology (GO) consortium has assembled a vast repository of information that aims eventually to catalogue the complexity of cellular biology in an exhaustive manner (Ashburner et al., 2000; The Gene Ontology Consortium, 2019). This information is contained in three continuously updated ontologies for the concepts 'Molecular Function', 'Cellular Component' and 'Biological Process'. Because computational approaches could potentially be less prone to bias, it would seem intuitive for researchers interested in the function of a protein to adopt the GO 'Molecular Function' ascription that has the strongest experimental support to its name. For instance, the current list of PrP C 's GO 'Molecular Functions' is 64 annotations long and includes 'microtubule binding', 'aspartic-type endopeptidase inhibitor activity' and 'NOT type 8 metabotropic glutamate receptor binding'. Several of its 'Molecular Functions' are evidenced by more than one type of data. For example, 'amyloid-beta binding' is supported by four types of evidence (ISS, inferred from sequence or structural similarity; TAS, traceable author statement; IPI, inferred from physical interaction; and IDA, inferred from direct assay) referenced in the GO repository. Because no other 'Molecular Function' is supported by more orthogonal evidence in the GO database, this observation might recommend the latter as the function of PrP C . Regrettably, such a 'mechanical' interpretation is, at least at this time, not warranted. For one reason, it is apparent that, although the sampling of data is already formidable (>600000 annotations; http://geneontology.org/docs/literature/), many reports are not yet included. Thus, the current GO repository lacks certain well-supported evidence, including ties of PrP C to neural cell adhesion molecule 1 (NCAM1) or G protein-coupled receptor 126 (GPR126) (see Sections X.1 and X.2).
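The 'mechanical' reading discussed above amounts to counting the distinct evidence codes attached to each term and picking the term with the most. A minimal sketch of that tally follows, assuming the annotations are available as (term, evidence code) pairs. The toy list is hypothetical: only the four 'amyloid-beta binding' rows mirror the evidence types named in the text, and the single code shown for 'microtubule binding' is a placeholder rather than a value taken from the GO repository.

```python
from collections import defaultdict

# Toy (term, evidence code) pairs; see the caveats in the lead-in.
annotations = [
    ("amyloid-beta binding", "ISS"),
    ("amyloid-beta binding", "TAS"),
    ("amyloid-beta binding", "IPI"),
    ("amyloid-beta binding", "IDA"),
    ("microtubule binding", "IDA"),  # placeholder code, for illustration only
]


def evidence_types_per_term(pairs):
    """Map each term to the set of distinct evidence codes attached to it."""
    codes = defaultdict(set)
    for term, code in pairs:
        codes[term].add(code)
    return dict(codes)


def rank_by_evidence_breadth(pairs):
    """Sort terms by number of distinct evidence codes, most first."""
    per_term = evidence_types_per_term(pairs)
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(per_term.items(), key=lambda kv: (-len(kv[1]), kv[0]))


for term, codes in rank_by_evidence_breadth(annotations):
    print(term, sorted(codes))
```

Such a tally only reflects what has been curated so far; as the text notes, interactions that are absent from the repository cannot be ranked at all, which is one reason the top-scoring term should not be equated with the function of the protein.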
The concepts captured by the GO 'Cellular Component' ontology are related to the term 'distribution' we use herein but do not explicitly relate to quantitative information, i.e. the expression level in a particular location, a facet of a protein's distribution we consider useful for making inferences on protein function. The merit of considering relative quantities of a POI in particular locations is apparent when considering that, currently, PrP C is associated with 24 GO 'Cellular Component' entries, including the Golgi apparatus, endoplasmic reticulum, cell surface, membrane raft, cytoplasm, nucleus, inclusion body and extracellular exosome, reflective of a body of literature that has localised PrP C in each of these subcellular structures. Because this ontology does not weigh the relative presence of PrP C , the information it provides masks the widely accepted dominant presence of PrP C within raft domains in the secretory pathway and at the plasma membrane.
The GO concepts captured by the 'Biological Process' ontology relate to "the larger processes, or 'biological programs' accomplished by multiple molecular activities" (see http://geneontology.org/docs/ontology-documentation/). As such, this ontology encompasses in one set of terms links of a POI to other molecules (molecular interactions) that are part of a process/programme and also the observable outcomes (phenotypes) to which they contribute. Currently, according to the GO database, PrP C has been associated with 36 processes or programmes. Most of these are broad, e.g. 'cell cycle arrest' or 'learning or memory', whilst others are more specific, including 'negative regulation of interleukin-17 production'. Unless a scholar interested in the function of PrP C peruses the primary reports that led to these annotations, the function's link to them is somewhat obscured. Importantly, the relatively large number of GO 'Biological Process' terms that PrP C is associated with do not identify a common theme. Instead, they seem to reflect the reality of deeply investigated proteins being observed to influence their environment in a variety of ways. Thus, like other large-scale, in-progress repositories of its kind, the GO repository can provide useful starting points for learning about PrP C 's function. It is, however, less useful for weighing and contextualising information, a step required for anyone intent on 'seeing the forest for the trees'.

IX. THE TAU PROTEIN EXAMPLE
Before discussing the function of PrP C , it is useful to showcase the application of the proposed IDIP approach to the tau protein, a protein whose main function is less controversial. When applied to tau, this approach would reveal that: (i) the microtubule-associated protein tau (MAPT) gene is a member of a small family of microtubule-associated protein (MAP) genes that code for MAPs 1-4, with MAP2 exhibiting the most sequence similarity to tau (inheritance). (ii) Tau is mainly expressed in the brain, where it is abundantly found in axons, i.e. the location where the ability of microtubules to handle the transport of cargo over relatively long distances is most sophisticated. In fact, co-immunofluorescence analyses have repeatedly shown an almost perfect co-localisation of tau and microtubules, although tau has also been reported to be associated with other subcellular spaces, including the endoplasmic reticulum, Golgi apparatus (Liazoghli et al., 2005), mitochondria (Amadoro et al., 2010; Manczak & Reddy, 2012), nucleus (Bukar Maina, Al-Hilaly & Serpell, 2016), exosomes (Saman et al., 2012) and tunnelling nanotubes (Tardivel et al., 2016) (distribution). (iii) [...] long list of other interactors, including Src family kinases (Lee et al., 1998; Larson et al., 2012), ribonucleoprotein complexes (Gunawardana et al., 2015) and pericentromeric DNA (Sjoberg et al., 2006) (interactions). (iv) The absence of tau is rather inconsequential, although it can generate several minor molecular phenotypes, including destabilisation and changes to the organisation of axonal microtubules in certain types of axons (Harada et al., 1994) (phenotypes).
With this information in hand, it is uncontroversial to recognise that tau's relationship to microtubules represents the theme that runs through data collected with all four methods for determining protein function. It therefore makes sense to infer that the function of tau relates to its association with microtubules, although many details of how its microtubule binding affects axonal transport or other aspects of microtubule biology still need to be resolved. Not surprisingly, this conclusion aligns well with evolutionary data, has framed our understanding of the conversion of tau into neurofibrillary tangles in Alzheimer's disease (Goedert et al., 1988) and related tauopathies (Goedert, Spillantini & Crowther, 2015), and is reflected in a majority of reviews that have been written on the function of the tau protein in health and disease (Goedert et al., 1988; Morris et al., 2011; Rosenmann et al., 2012; Spillantini & Goedert, 2013). Even authors who focus their work on deciphering roles of tau in other cellular locations tend to agree with this central understanding of the function of tau. In the following section, we will show that a similarly strong body of data indicates a particular function of PrP C .

X. THE FUNCTION OF THE PRION PROTEIN
Here we apply our proposed framework for determining PrP C 's function. Critically, decisions regarding the information discussed below were not made arbitrarily. Instead, they reflect how well a piece of information satisfies constraints provided by the IDIP methods for determining protein function. By using the IDIP framework, a majority of observations that relate to possible roles of the prion protein can be shown to be grounded on inferences based only on one type of data, which have gained no support from orthogonal data types. We consider it likely that these data reflect mere influences of PrP C on other proteins and its environment and hence are not central to understanding its function (potentially, a subset of the available data might also simply be incorrect in certain respects). That said, rather than applying a strict methodology, we also captured PrP C interactors or proposed roles to which the PrP C literature has paid particular attention. Although we are confident that we have not missed a major strand of the literature, we may have omitted discussion of many excellent individual reports. We apologise in advance for this shortcoming and ascribe this failure to an attempt to keep the reference list manageable.
(1) PrP inheritance
In 2009, the prion gene was shown to have evolved from an ancestral Zrt-, Irt-like protein (ZIP) metal ion transporter (Schmitt-Ulms et al., 2009). The genomic rearrangement that led to the prion founding gene was proposed to have involved the retrotransposition of an ancient messenger RNA (mRNA) of a member of the ZIP zinc transporter family (Ehsani et al., 2011b). These events added the prion gene and its paralogs Doppel (DPL) (Moore et al., 1999) and shadow of prion protein (SPRN, encoding the protein Shadoo) (Premzl et al., 2003) to a pre-existing family of ZIP zinc transporters already known to include 14 paralogs in humans and mice (Ehsani et al., 2011a). Detailed sequence analyses identified a subbranch within this gene family that includes ZIPs 5, 6 and 10 to have retained the highest sequence similarity to the prion protein (Ehsani et al., 2011a). Whereas expression of ZIP5 is mostly restricted to the digestive system, ZIP6 and ZIP10 are abundant in the brain and assemble into a heterodimeric complex (Taylor et al., 2016). The presence of maternally derived ZIP6 and ZIP10 was initially shown to be essential for the mammalian oocyte-to-egg transition (Kong et al., 2014). Signal transducer and activator of transcription 3 (STAT3)-dependent transcriptional induction also causes both proteins to be induced during epithelial-to-mesenchymal transition (EMT), a morphogenetic programme characterised by the loss of epithelial character, a shift from cell-cell to cell-substrate adhesion and an increase in cell motility (Yamashita et al., 2004; Macara et al., 2014; Miyai et al., 2014; Muraina et al., 2020; Lambert & Weinberg, 2021).
A subsequent in-depth investigation of proteins that interact with the ZIP6-ZIP10 heterodimer revealed NCAM1 to be its most prominent interactor in a cell model commonly used to study EMT (Brethour et al., 2017). When the same cell line was used to probe ZIP6 interactions before and after EMT induction, it was observed that ZIP6 facilitates the integration of NCAM1 into focal adhesion complexes, whose expression increases in cells that have acquired mesenchymal morphology (Brethour et al., 2017). Moreover, ZIP6 was shown to control glycogen synthase kinase 3 (GSK3)-mediated phosphorylation of the longest isoform of NCAM1 (NCAM180) in cells undergoing EMT.
Consistent with a functional role of these zinc transporters during EMT, ZIP6 and ZIP10 were reported to (i) promote the expression of STAT3 (Taylor et al., 2016; Nimmanon et al., 2021), (ii) cause the downregulation of cadherin (Taylor et al., 2016), and (iii) be essential for several steps that rely on EMT during vertebrate development (Yamashita et al., 2004; Taylor et al., 2016). Although ZIP6 and ZIP10 can work together to control EMT, the functional overlap between these two zinc transporters is only partial. Accordingly, whereas silencing of ZIP6 (Yamashita et al., 2004) or ZIP10 (Taylor et al., 2016) impacted gastrulation in zebrafish (Danio rerio) embryos, ZIP10 deficiency also prevented the formation of hatching glands in the same model (Muraina et al., 2020) and was reported to cause epidermal abnormalities in mice (Bin et al., 2017). Finally, part of the zinc transporter-dependent signalling that occurs during EMT seems to be harnessed for B-cell development, as ZIP10-deficient mice were reported to exhibit a STAT3-dependent defect in their B-cell homeostasis (Miyai et al., 2014).
(2) PrP distribution
The prion gene was reported to be expressed as early as embryonic day 6.5 in mice (Khalifé et al., 2011). Early expression is particularly strong in the cephalic mesenchyme and the caudal primitive streak (Tremblay et al., 2007). Neural crest cells that migrate from the cephalic mesenchyme give rise to a diverse range of cell types and structures, including neurons and glia, craniofacial cartilage, odontoblasts and Schwann cells. Similarly, primitive streak-derived mesenchymal cells populate the bone marrow with haematopoietic stem cells and contribute to heart development (Acloque et al., 2009). Consistent with this course of events, expression of the prion protein is present during development in the corresponding cranial structures, peripheral nerves, bone marrow and heart (Tremblay et al., 2007) (Fig. 1A), and has additionally been observed to be increased in tumours of mesenchymal subtype (Le Corre et al., 2019). Thus, PrP C expression seems to be prominent in multipotent post-mitotic mesoderm-committed progenitor cells that have undergone some degree of differentiation but are not yet fully lineage committed. Consistent with this characterisation, (i) PrP C -positive cells in the heart were described as mesoderm-derived cardiomyogenic progenitor cells that retain the potential to differentiate into cardiac or smooth muscle cells (Hidaka et al., 2010); (ii) PrP C is abundantly expressed in post-mitotic haematopoietic stem cells that retain the potential to give rise to several other lineages, including dendritic cells, B- and T-cells and natural killer cells (Zhang et al., 2006); (iii) expression of PrP C in the brain is particularly pronounced and critical for the development of post-mitotic cells that have been committed to the neuronal lineage within the dentate gyrus of the hippocampus and the subventricular zone of the rostral migratory system (Steele et al., 2006); and (iv) silencing of PrP C in a human embryonic stem cell paradigm altered the balance between lineages of the three germ layers (Lee & Baskakov, 2013).
The appearance of NCAM1 (often referred to as CD56 in stem cell research) has long been known to identify this type of mesoderm-committed progenitor cells that retain the potential to give rise to the cell types and structures listed above (Evseenko et al., 2010). The NCAM1 upregulation that marks these cells is paralleled by the loss of epithelial adhesion through adherens junctions. Together, these morphogenetic features are hallmarks of EMT (Frame & Inman, 2008; Lehembre et al., 2008). Accordingly, NCAM1 levels can be observed to increase in several cell models of EMT (Lehembre et al., 2008; Mehrabian et al., 2015; Životić et al., 2018).
The prion gene was long considered a housekeeping gene due to its widespread expression in many vertebrate cells and the presence of certain regulatory elements in its promoter often found in housekeeping genes, including a GC-rich region upstream of its transcription start site (Puckett et al., 1991). Today, we know that its expression is surprisingly varied across cell types and subject to complex regulation, with the highest levels of this protein observed in the brain. When NMuMG cells, a mammalian cell model for studying EMT, are exposed to transforming growth factor beta 1 (TGFB1) for 2 days, the cells lose their epithelial morphology, detach and acquire a fibroblastoid appearance. During this process, prion gene (Prnp) transcript levels were shown to increase more than tenfold and PrP C protein levels were more than fivefold upregulated (Mehrabian et al., 2015). To our knowledge, this is the strongest increase in PrP C expression in response to a natural stimulus reported in the literature. Thus, the prion gene shares with the Slc39a6 gene coding for its ZIP6 molecular cousin a profound upregulation in its expression in cells undergoing EMT (Brethour et al., 2017; Ashok et al., 2019).
The functional annotation of the mammalian genome 5 (FANTOM5) single-molecule complementary DNA (cDNA) sequencing data set published by RIKEN and other members of the FANTOM consortium remains one of the largest scale efforts to characterise the expression of human genes in more than 30 tissues (Forrest et al., 2014). Systematic pairwise comparisons of the expression profiles of the prion gene against all other human genes in the FANTOM5 data set revealed NCAM1 to be among the top 50 genes within the human genome whose expression profiles are most similar to the expression profile of the prion gene. NCAM1 ranked in position 42 in a list of more than 16000 genes whose expression profiles were available, corresponding to the 99.75th percentile. By contrast, the expression profiles of other gene products that are frequently considered functional interactors of PrP C deviated profoundly from its expression signature. For example, stress-induced phosphoprotein 1 (STIP1) ranked in position 3019 (81.4th percentile), the 37-kDa laminin receptor precursor (LRP, aka RPSA) occupied position 7469 in the list (53.96th percentile), and the metabotropic glutamate receptor 5 (GRM5) was listed in position 1932 (88.09th percentile). The expression profile of adhesion protein GPR126 (aka ADGRG6) deviated most from the respective profile for PrP C . More specifically, whereas GPR126 is strongly expressed in the placenta and is only present at very low levels in the brain (with intermediate levels of its gene products observed in the liver, colon and intestine), these tissues exhibit opposite trends in their PRNP transcript levels (Fig. 1B).
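The ranking and percentile arithmetic behind such comparisons can be sketched as follows. The text does not state which similarity measure was used or the exact number of genes compared (only 'more than 16000'), so the Pearson correlation, the toy profiles, the gene names and the total of 16225 genes used below are all assumptions made purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)


def rank_by_similarity(query, profiles):
    """Return gene names ordered from most to least similar to the query."""
    return sorted(profiles, key=lambda g: pearson(query, profiles[g]), reverse=True)


def rank_to_percentile(rank, n_genes):
    """Percentage of genes that sit below a 1-based rank position."""
    if not 1 <= rank <= n_genes:
        raise ValueError("rank must lie between 1 and n_genes")
    return 100.0 * (1.0 - rank / n_genes)


# Hypothetical expression levels across four tissues.
query_profile = [1.0, 2.0, 3.0, 4.0]
toy_profiles = {
    "gene_a": [2.0, 4.0, 6.0, 8.0],  # same trend as the query
    "gene_b": [4.0, 3.0, 2.0, 1.0],  # opposite trend
    "gene_c": [1.0, 3.0, 2.0, 4.0],  # partly similar
}
print(rank_by_similarity(query_profile, toy_profiles))

# Assumed gene total; the source gives only 'more than 16000'.
ASSUMED_TOTAL = 16225
for position in (42, 1932, 3019, 7469):
    print(position, round(rank_to_percentile(position, ASSUMED_TOTAL), 2))
```

With the assumed total, the ranks quoted in the text yield values close to the reported percentiles; small differences would be expected if the true gene count or the percentile convention differs from the one assumed here.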
(3) PrP interactions
A broad range of methods has been applied to identify molecules that interact with PrP C (reviewed in Miranzadeh Mahabadi & Taghibiglou, 2020). Early experiments based on a ligand-blotting methodology uncovered a possible interaction between PrP Sc and glial fibrillary acidic protein (GFAP) (Oesch et al., 1990). A genetic screen of a mouse brain cDNA library revealed that a secreted probe of PrP C fused to alkaline phosphatase binds to amyloid precursor-like protein 1 (APLP1) (Yehiely et al., 1997), and a yeast two-hybrid-based search for PrP C bait interactors using a HeLa cDNA library for the expression of prey proteins identified LRP as a candidate PrP C interactor (Rieger et al., 1997). A study that made use of an unconventional 'complementary hydropathy' approach uncovered a 66-kDa candidate interactor (Martins et al., 1997), later revealed to be STIP1 (Zanata et al., 2002). Several reports have suggested an interaction between cytosolic PrP C and tubulin leading to impaired microtubule assembly (Nieznanski et al., 2005; Dong et al., 2008). When PrP C was covalently linked to its neighbours by formaldehyde addition to neuroblastoma (Neuro2a) cells (Schmitt-Ulms et al., 2001) or by time-controlled transcardiac perfusion crosslinking of mouse brains (Schmitt-Ulms et al., 2004), the mass spectrometry-based analysis of pooled PrP C co-immunoprecipitates was dominated by Ncam1. The latter analysis additionally revealed that the environment of PrP C is enriched in proteins that harbour immunoglobulin-like motifs (e.g. Ncam1, contactin, basigin, Ncam2, neurofascin and neuronal membrane glycoprotein M6-a). Like PrP C , several of these proteins use glycosylphosphatidylinositol (GPI) anchors for their membrane attachment [e.g. the 120-kDa isoform of Ncam1, contactin, limbic system-associated membrane protein (Lsamp), neurotrimin, and opioid-binding protein/cell adhesion molecule (Opcml)], supporting their plausible proximity to PrP C within raft domains.
More recent co-immunoprecipitation-mass spectrometry analyses of PrP C -enriched detergent-resistant microdomains from cerebellar granule cells (Farina et al., 2009), myc epitope-tagged PrP C from transgenic mouse brains (Rutishauser et al., 2009), and endogenous PrP C from wildtype mouse brains (our unpublished results) have validated the conclusion that PrP C preferentially interacts with cell adhesion molecules comprising immunoglobulin domains, and have also revealed additional candidate interactors that do not fit this overall pattern.
More recently, a comparison of PrP C interactors in four different cell models provided additional insights into the organisation of the molecular environment of PrP C (Ghodrati et al., 2018). Remarkably, distinct subsets of proteins were observed in immediate proximity to PrP C in the four cell models. The exception to this trend was Ncam1, which emerged as the only robust PrP C interactor observed across all cell models tested (Ghodrati et al., 2018). GO analyses revealed that despite the compositional differences of the PrP C interactomes in the four cell lines, all models shared a strong enrichment of PrP C -proximal proteins that contribute to the organisation of extracellular laminin or fibronectin matrices as well as signalling centred on TGFβ and integrins, consistent with earlier work showing PrP C to influence integrin signalling in 1C11 neuroectodermal cells (Loubet et al., 2012). TGFβ and integrins are well-known sister complexes that control the transition from cell-cell towards cell-substrate attachment that accompanies EMT (recently reviewed by Nolte & Margadant, 2020).
(4) PrP phenotypes
PrP deficiency renders its mammalian hosts refractory to infection with prions, a profound phenotypic distinction that underscores PrP's central role in the aetiology of prion diseases (Bueler et al., 1994; Sailer et al., 1994). Apart from this special phenotype that is only seen in prion disease-challenged hosts, the most robust consequence of PrP deficiency in mammals manifests as a chronic demyelinating neuropathy, initially reported in mice (Bremer et al., 2010) and recently also observed in goats (Skedsmo et al., 2020). We will consider this phenotype in more detail in Section XI.
Both the discovery of the prion protein (Prusiner, 1982) and the first evidence that NCAM1 carries a unique PTM in the form of a linear sialic acid polymer [polysialic acid (PSA or 'polySia'); not to be confused with the single terminal sialic acid modification present on the antennae of some N-linked glycans, including those of the prion protein (Baskakov, 2020)] occurred in 1982 (Finne, 1982; Hoffman et al., 1982; Rothbard et al., 1982). Since then, dozens of papers have been published in what were separate fields of study reporting on phenotypes in mice that were manipulated to be deficient in PrP C or PSA-NCAM1 (Table 1). Close scrutiny of these reports reveals a remarkable similarity of pleiotropic phenotypes (described in detail in Mehrabian, Hildebrandt & Schmitt-Ulms, 2016), including evidence for (i) deficits in long-term potentiation that appear to involve reduced α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptor activity and hyperexcitation of NMDA receptors, (ii) altered circadian rhythm under continuous darkness, (iii) defective pathfinding of infrapyramidal bundles within the hippocampal formation, (iv) defects in haematopoietic stem cell renewal, (v) impaired myelination of peripheral axons, (vi) dentine structure defects, and (vii) insulin resistance.
Fig. 1. Distribution of the cellular prion protein (PrP C ). (A) Prnp promoter-driven bacterial β-galactosidase gene product (LacZ) expression has been shown to label neural crest- and primitive streak-derived structures during embryogenesis that are also positive for polysialylated neural cell adhesion molecule 1 (PSA-NCAM1). (B) Human mRNA expression profiles in 36 distinct types of tissue (Forrest et al., 2014). Data are shown for PRNP and a subset of genes whose gene products are known to interact with PrP C ; these proteins continue to generate interest because they underpin proposed PrP C functions.
Biological Reviews (2021) 000-000 © 2021 The Authors. Biological Reviews published by John Wiley & Sons Ltd on behalf of Cambridge Philosophical Society.
Zebrafish possess duplicated PrP ortholog genes PrP-1 and PrP-2. The morpholino-mediated acute reduction of PrP-1 in zebrafish embryos was reported to lead to defective embryogenesis (Malaga-Trillo et al., 2009). Subsequent work established that germline null mutants of PrP orthologs in zebrafish exhibit milder phenotypes, possibly as a consequence of compensation (Leighton et al., 2018). At the morphological level, acute PrP-1 deficiency was characterised by arrested gastrulation caused by deficient movement of mesenchymal cells. At the molecular level, this phenotype seemed to reflect perturbed choreography of EMT-related molecules, including E-cadherin, Src family kinases and catenins (Sempou et al., 2016), and resembled the phenotype that manifests in zebrafish embryos subjected to morpholino-based acute reduction of ZIP6 (Yamashita et al., 2004) or ZIP10 (Taylor et al., 2016). Although mammalian prion genes have diverged from their fish ancestors (as evidenced by profoundly different N-terminal sequences, notwithstanding the presence of repeat domains), the heterologous introduction of the mouse Prnp gene into PrP-1-deficient zebrafish embryos was able partially to rescue the gastrulation arrest phenotype, suggesting that mammalian PrP C shares key functional features with its fish ortholog (Malaga-Trillo et al., 2009).
With the advent of CRISPR-Cas9, the study of PrP C -deficiency phenotypes has become much more accessible. An investigation of PrP C -deficient NMuMG cells (Mehrabian et al., 2014a), which queried how the presence or absence of PrP C affects the levels of proteins critical for EMT, identified NCAM1 as the protein whose levels were most profoundly increased during EMT (approximately threefold) and were robustly reduced (0.75-fold) when PrP C was deficient (Mehrabian et al., 2015). A follow-on investigation on the effect of PrP C -deficiency on NCAM1 revealed that NMuMG cells made PrP C -deficient by CRISPR-Cas9 knockout or stable knockdown were severely compromised in their ability to polysialylate NCAM1 (Mehrabian et al., 2015). This outcome was shown to be caused by impaired transcriptional expression of the polysialyltransferase (St8sia2) responsible for this post-translational modification of NCAM1 during EMT in wild-type NMuMG cells.
In summary, the four main methods available for determining protein function all point towards a relationship of PrP C with NCAM1 and its role during EMT: (i) NCAM1 is the key interactor of the closely related ZIP zinc transporters; (ii) the expression profiles of PrP C and NCAM1 are profoundly correlated in neural crest-derived cell lineages during development and in adult human tissues, and they are particularly increased in cells undergoing EMT; (iii) NCAM1 is the only interactor that has been consistently observed in all large-scale brain PrP C interactome analyses and across cell models; and (iv) there is a striking overlap of deficiency phenotypes for Prnp expression or Ncam1 polysialylation that is consistent with the main pleiotropic traits associated with these proteins in mice (Fig. 2A).
(5) PrP evolutionary considerations
Does the proposed functional relationship between PrP C and NCAM pass scrutiny on the basis of evolutionary insights? ZIP zinc transporters whose canonical function is to import divalent cations into the cytosol are present in all living organisms, including bacteria (Kambe, Taylor & Fu, 2021). By contrast, the subset of ZIP zinc transporter paralogs that carry PrP-like ectodomains evolved only in the metazoan lineage, at around the same time as ancestral NCAM molecules (Ryan & Grant, 2009; Ehsani et al., 2011b). Consistent with this timing, NCAM and other members of the immunoglobulin superfamily acquired roles in cell adhesion, the movement of cells and other plasticity programmes that became necessary in motile animals with complex body plans.
Contemporary ZIP zinc transporters most closely related to PrP (ZIP6/ZIP10) are upregulated during EMT, binding to NCAM1 and promoting its post-translational phosphorylation during EMT (Brethour et al., 2017). This suggests that this subset of ZIPs may have functionally diverged from basic ZIP transporters lacking PrP-like domains. Thus, their metal import capabilities were co-opted into the control of NCAM-dependent events during EMT and related plasticity programmes. Current data suggest that the zinc-import properties of the ZIP6/ZIP10 heterodimer, combined with its serving as a hub for GSK3, provide a means to control the activity of these kinases, thereby affecting the GSK3-dependent phosphorylation of nearby substrates, including NCAM1 (Brethour et al., 2017). Consistent with this model, the activity of GSK3 is inhibited in the presence of free zinc but not when the kinase is exposed to other divalent cations (Ilouz et al., 2002).
Coinciding with the diversification of vertebrates, a subset of sialyltransferases evolved that acquired increased processivity of their sialic acid transfer activity (Harduin-Lepers et al., 2005, 2008). Instead of transferring a single sialic acid onto substrate proteins, these polysialyltransferases can attach long unbranched chains of sialic acid polymers of up to 90 residues or more in length (Galuska et al., 2008). Remarkably, almost all of this enzymatic activity became directed towards two specific N-glycans within immunoglobulin domain 5 (Ig5) in the ectodomain of NCAM1 (Nelson, Bates & Rutishauser, 1995), accounting for the vast majority of protein-bound polySia in vertebrates (Muhlenhoff et al., 2013). It has been proposed that the evolution of polysialyltransferases endowed a fitness advantage by providing a solution to the challenge of disentangling cell-to-cell contacts. With the increasing complexity of the membrane-embedded proteome, this challenge would have become more severe over time in the metazoan lineage, particularly following the whole-genome duplications that occurred around the origin of vertebrates (Bruses & Rutishauser, 2001; Rutishauser, 2008). According to one plausible model, this challenge could be met by the ability of polysialyltransferases to confer NCAM with negative charge clouds whose repellent forces could overcome weaker protein interactions of cellular adhesion molecules and receptors operating in trans (Johnson et al., 2005).
On an evolutionary timescale, the prion founder event took place shortly after the emergence of polysialyltransferases. It is tempting to speculate that the evolutionary success of PrP may be linked to the fact that it inherited from its ancestral ZIP transporter the ability to interact with NCAM and was able to fill a niche that had opened up with the appearance of polysialyltransferases, namely the need to control when and where these enzymes are expressed. This outline of a more detailed narrative published previously (Mehrabian et al., 2016) is included here to illustrate that the available evolutionary insights fit and augment the functional model inferred from the IDIP data. The proposed function of PrP would be much less plausible if (i) the emergence of ZIPs harbouring PrP-like ectodomains did not coincide with metazoan speciation and/or did not precede the prion gene founding event, (ii) PrP or ZIPs carrying PrP-like ectodomains shown to interact with NCAM were present in any branch of life that does not express NCAM, or (iii) the timing of the evolution of the polysialyltransferase ST8SIA2 or its species distribution within the phylogenetic tree did not align so well with evolutionary insights for PrP. While polysialyltransferases do exist in a subset of Gram-negative bacteria (e.g. E. coli K1), on the basis of different structural features (Coutinho et al., 2003; Cantarel et al., 2009) they are thought to represent an independent origin and to relate to arming capsular polysaccharides against host defences through molecular mimicry (Bliss & Silver, 1996; Cress et al., 2014).
In our view, the combined inferences that can be drawn from the IDIP data and their alignment with evolutionary insights converge on the conclusion that the function of PrP C centres on its interaction with NCAM1 and the control of its polysialylation in the context of EMT and related plasticity programmes (Fig. 2B). Complexity arises because the EMT programme is composed of many other molecules, cellular mechanisms and processes that respond dynamically to changes inside and outside cells. In a given cell, different subsets of parts of the programme are operational and are expected to engage PrP C , thereby contributing to the pleiotropic phenotypes linked to this protein. Although this allows PrP to contribute to a relatively broad range of outcomes, it also raises the expectation that the scope of its function aligns with and is limited by the scope of EMT and related plasticity programmes.

XI. OTHER POSSIBLE PrP FUNCTIONS
It may be thought that the vast PrP literature makes it possible to build a similar argument for any of the long list of proposed functions for PrP C by simply choosing reports that corroborate one's view. We now illustrate that this is not the case, using the example of the interaction of PrP C with GPR126 (Kuffer et al., 2016), an interaction discussed in recent reviews considering the possible function of PrP C (e.g. Wulf, Senatore & Aguzzi, 2017). This interaction has been put forward as an explanation for how PrP C deficiency may give rise to the robust peripheral myelin maintenance phenotype (Bremer et al., 2010). By applying the IDIP conceptual framework, it should be apparent that GPR126 is an unlikely candidate for being a major functional PrP C interactor in most locations, including the brain, due to the dramatically different expression profiles of the two proteins (distribution) (Fig. 1B). Moreover, a long list of other PrP C -deficiency phenotypes observed in mice, zebrafish and cell-based assays cannot be explained on the basis of this interaction. Although a few of these PrP C -deficiency phenotypes may be rejected following detailed investigation in coisogenic models (Nuvolone et al., 2016), it is likely that at least some of these will be supported (phenotypes). A sceptic could offer the view that the missing connections of GPR126 to the biology of PrP C outside of the myelin maintenance phenotype may merely reflect the reality of a poorly understood protein, which more work on GPR126 could clarify. Although this could certainly be true, it would not erase the concern that the distribution of GPR126 does not reflect the distribution of PrP C and that many PrP C -deficiency phenotypes are centred on tissues or cells that are not known to express GPR126. Naturally, there is the possibility that paralogs of GPR126 within the adhesion G protein-coupled receptor family could substitute for GPR126 in the brain and other locations that exhibit high PrP expression. However, the absence of such interactors in the many protein-protein interaction studies centred on PrP C to date makes this scenario increasingly unlikely. Nonetheless, it is not our view that the interaction between PrP C and GPR126 is irrelevant for understanding the myelin maintenance phenotype observed in Prnp-deficient mice (Bremer et al., 2010) and goats (Skedsmo et al., 2020). In fact, given the existing data on the GPR126-PrP C interaction and prior literature connecting GPR126 to myelin maintenance (Monk et al., 2011; Petersen et al., 2015), it is likely that GPR126 will be part of the eventual mechanistic explanation of how PrP C functionally coordinates myelin maintenance.
Could it be that PrP C 's role in myelin maintenance represents a moonlighting function? Although not impossible, this view seems unwarranted at this time. We anticipate that the relationship between PrP C and NCAM1 may prove critical in this context once the full molecular underpinnings of this phenotype are clear. This view is supported by literature that links NCAM1 deficiencies to myelination defects in peripheral nerves (Nait Oumesmar et al., 1995; Massaro, 2002; Fewou et al., 2007; Niezgoda et al., 2017; Werneburg et al., 2017; Szewczyk et al., 2016; Kim et al., 2019). In fact, it is increasingly apparent that the myelination of peripheral nerves relies on axons and Schwann cells using part of the cellular programme that drives EMT. Consistent with this interpretation, TGFB1 and many other molecular participants in the EMT programme, including plasminogen activator inhibitor-1 (PAI-1) (Einheber et al., 1995), which is the protein that shows the highest induced levels during EMT in NMuMG cells (Mehrabian et al., 2015), may have been 'repurposed' by evolution to control the myelination of peripheral nerves (Rogister et al., 1993). In our view there is no need to invoke a scenario whereby PrP C 's contribution to myelin maintenance represents an exception that cannot be reconciled with the critical role that NCAM1 seems certain to play within the context of EMT for understanding the function of PrP C .
Naturally, there is much still to be learned, in particular regarding the exact mechanism through which PrP C contributes to NCAM1 biology in the context of EMT and related plasticity programmes. Before highlighting some of the more pertinent avenues for future research, we should reiterate that mechanism is not function. Moreover, gaps in a model are not sufficient to dismiss its merit. They merely identify where additional work needs to be done. For example, there is currently no in vivo evidence for the effect of PrP C on St8sia2 and Ncam1 polysialylation in mice; while the overlap in pleiotropic PrP C and NCAM1 polysialylation deficiency phenotypes suggests this conclusion, it has not yet been formally demonstrated.
To refute or limit a hypothesis, flaws and inconsistencies need to be brought to the fore. For instance, when surveying PrP C functions discussed in the literature, no other proposed function is compatible with constraints imposed by data from all four IDIP approaches. In fact, the vast majority of alternative proposed PrP C functions only pass constraints provided by one or two IDIP data categories. More specifically, except for a proposed function of PrP C in iron transport (see Section XII), the available data for other proposed PrP C functions do not fit easily with constraints provided by our understanding of PrP being a paralog of ZIP zinc transporters (PrP inheritance). Similarly, the known distributions of proteins proposed to functionally interact with PrP C are often highly divergent from the expression profile of PrP C (PrP distribution). Although evidence for interactions with PrP C is often available for other proteins proposed to functionally interact with PrP C , the respective interaction data are at times unreliable or indirect (PrP interactions). Finally, when attempts have been made to link other proposed PrP C functions to PrP C -deficiency phenotypes, typically only one of the known pleiotropic phenotypes has been addressed (PrP phenotypes).
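The exclusion-filter logic of the preceding paragraph can be made concrete with a short Python sketch. It is a deliberately coarse simplification and not a scoring scheme used in this review: the class and field names are hypothetical, and the pass/fail values are an illustrative reading of the qualitative arguments made in the text for NCAM1 and GPR126 rather than measured quantities.

```python
from dataclasses import dataclass

IDIP_CATEGORIES = ("inheritance", "distribution", "interactions", "phenotypes")


@dataclass(frozen=True)
class Candidate:
    """A proposed functional partner of PrP C with a pass/fail flag per IDIP category."""
    name: str
    inheritance: bool
    distribution: bool
    interactions: bool
    phenotypes: bool

    def passed(self):
        """Names of the IDIP categories this candidate is compatible with."""
        return [c for c in IDIP_CATEGORIES if getattr(self, c)]

    def passes_all(self):
        """True only if the candidate survives every IDIP exclusion filter."""
        return len(self.passed()) == len(IDIP_CATEGORIES)


# Illustrative flags only (assumption): a binary reading of the text's arguments.
candidates = [
    Candidate("NCAM1", inheritance=True, distribution=True,
              interactions=True, phenotypes=True),
    Candidate("GPR126", inheritance=False, distribution=False,
              interactions=True, phenotypes=False),
]
surviving = [c.name for c in candidates if c.passes_all()]
```

A binary flag hides the nuance discussed in the text (for example, GPR126 may well contribute to the myelin maintenance phenotype while failing to account for the other pleiotropic phenotypes), so a graded score per category might be a more faithful representation.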
At the time of writing, the proposed function of PrP C influencing NCAM1 biology in the context of EMT, although informed by a large array of findings, does not rely exclusively on the wisdom of hindsight, which could then be said to have been 'moulded' to fit the data. This PrP C function was first suggested in 2014 (Mehrabian, Ehsani & Schmitt-Ulms, 2014b), and subsequently refined in 2016 (Mehrabian et al., 2016), since when much supporting data has become available. This includes evidence that ZIP6 forms a heterodimeric complex with ZIP10 that highly selectively interacts with NCAM1, controlling its GSK3-dependent phosphorylation during EMT (Brethour et al., 2017). It should be stressed that it was not necessarily expected that ZIP6/ZIP10 heterodimers and PrP C would share NCAM1 as their main interactor. For example, evolutionary divergence could have generated a situation where these ZIP/PrP protein family paralogs interacted with different partners. Indeed, protein-protein interactions are rarely conserved amongst homologous proteins following considerable sequence divergence (Lewis et al., 2012). The fact that PrP C and ZIP6/ZIP10 paralogs do share NCAM1 as their main interactor not only strengthens conclusions regarding their evolutionary relatedness but also argues that this interaction is central to understanding their functions.
A similar argument can be made for the FANTOM5 data, which became available after the initial formulation of the PrP C function proposed here. Had these data not revealed PrP C and NCAM1 to exhibit a highly similar expression profile across 36 human tissue types, then the function for PrP C discussed here would have been less convincing. This brief thought experiment highlights that the proposed PrP C function has already withstood independent scrutiny by orthogonal methods. We emphasise that an incorrectly proposed function would be unlikely to continue to pass exclusion filters defined by data from all four IDIP categories.

XII. UNRESOLVED MATTERS
We now briefly consider a few salient features of the biology of PrP C that still remain unresolved. Critically, when considering the nature of existing knowledge gaps for this protein, it is apparent that most of these are not PrP C specific. In fact, the intense scrutiny this protein has been subjected to has revealed deep insights not available for the majority of proteins. For example, several dozen high-resolution data sets have interrogated the structure of PrP C (Riek et al., 1996; Wuthrich & Riek, 2001). There is also no shortage of sophisticated investigations of the influence of metal ions on PrP C folding (Evans & Millhauser, 2017) and characterising the redox activity conferred by a subset of these metals (Meloni et al., 2012; Singh, 2014; Tripathi et al., 2015). Moreover, its protein-protein interactions have been mapped in several paradigms and in systematic comparisons across cell models. In-depth analyses of PrP-deficient cells and animal models have revealed this protein to contribute to pleiotropic phenotypic traits.
Still missing are insights into the precise molecular and cellular mechanisms through which PrP C influences its immediate environment. To gain such insights, we need to identify how PrP C changes over time and space in relation to molecules it binds to and is surrounded by. Although data are becoming increasingly available that shed light on the complex dynamics of individual macromolecular complexes, the more transient or low-affinity interactions that are expected to govern the biology of PrP C and a majority of other proteins have largely remained inaccessible. Without this information, we will not understand how many proteins, including PrP, exert their functions. Importantly, this type of information is not essential as far as assigning functions is concerned.
Based on current evidence, including our own in-depth interactome analyses, it is apparent that PrP C does not solely interact with NCAM1 but also engages in interactions with other proteins, in particular those that harbour, like NCAM1, immunoglobulin-like domains in their extracellular regions. Moreover, the PrP C -NCAM1 complex is embedded in a membrane region whose precise composition varies among cell types, in response to morphogenetic programming and over time (Mehrabian et al., 2015; Ghodrati et al., 2018). The proteins found in proximity to PrP C in four different cell types seem to be strongly enriched in functional annotations that link them to EMT, either through a known relationship to TGFB1 signalling or through connections to integrin signalling (Ghodrati et al., 2018; Prado et al., 2020). Technologies now exist that can begin to elucidate the dynamics within these specialised membrane domains, including mass spectrometry and novel imaging modalities. This type of work could represent a starting point toward the more ambitious goal of obtaining atomic-resolution molecular-mechanistic models. We anticipate increasing understanding that PrP C exerts its function within the EMT programme in a manner that is context dependent and dynamic, such that its interactions may appear to be stochastic or lacking spatiotemporal coordination. Yet, we also anticipate these interactions to be governed by its affinity for immunoglobulin-like domains and its GPI anchor-guided enrichment in specialised raft domains whose composition is dominated by molecules that control TGFB1 and integrin signalling.
One of the most intriguing remaining questions is how PrP C generates intracellular signals that originate in the raft domains. While there is much evidence regarding PrP C -dependent signals (Schneider et al., 2011), we are missing insights into the significance of these signals and the precise mechanisms that elicit them. For instance, although we know that the presence or absence of PrP C has a striking effect on transcription of the St8sia2 polysialyltransferase responsible for Ncam1 polysialylation, the mechanism by which PrP C acts as a switch in this context is unknown. Once triggered, this switch may not only control the transcription of St8sia2 but may also promote other facets of EMT. Several other transmembrane molecules are found near PrP C that are known to influence intracellular signalling and thus could mediate signals emanating from PrP C (Wang et al., 2013; Martin-Lanneree et al., 2017). There are also notable reports of PrP C being able to acquire alternative membrane topologies that would allow it to signal across the plasma membrane (Hegde et al., 1998; Stewart & Harris, 2005), either by making contact with intracellular molecules or by influencing the transport of small molecules or ions through the membrane (Solomon, Biasini & Harris, 2012). This wealth of possibilities contrasts with a dearth of information on when and where alternative signalling modes are active. Of particular interest would be detailed insights into PrP C -dependent signalling events that can protect cells against certain insults, including the neurotoxicity elicited by soluble aggregates of neurodegeneration-causing proteins (Brody & Strittmatter, 2018; Corbett et al., 2020). Recent studies have offered valuable starting points, such as the involvement of p38 mitogen-activated protein kinase (MAPK) synaptotoxic signalling in this context (Fang et al., 2018) and linking the well-documented ability of PrP C to protect cells against a tumor necrosis factor α (TNFα) insult to its ability to modulate β1 integrin signalling (Ezpeleta et al., 2017). Also needing clarification are PrP C -dependent mechanisms that lead to the autophosphorylation-mediated activation of Src family kinases, a pertinent question given repeated findings that PrP C is linked with the activation of this family of protein tyrosine kinases (Mouillet-Richard et al., 2000; Um et al., 2012; Sempou et al., 2016).
Given the evolutionary relationship of PrP C to ZIPs, as well as its similarity to the GPI-anchored family members Doppel (Dpl) and Shadoo (Sho), it seems plausible that PrP C may compete with or replace these paralogs in certain paradigms. There is elegant evidence that PrP can replace Dpl when the latter is inadvertently expressed in the brain, thereby rescuing an ataxic phenotype caused by Dpl (Moore et al., 1999). Moreover, the accumulation of PrP Sc in mice is paralleled by a profound and selective depletion of Sho, possibly a result of a protease attempting to reduce PrP Sc levels also targeting Sho (Watts et al., 2007). The discovery of the evolutionary origins of the prion founder gene was itself spurred by data that revealed ZIP6 and ZIP10 were interactors of PrP C (Schmitt-Ulms et al., 2009; Watts et al., 2009). As a more recent example, PrP C was reported to act as a ferrireductase partner for ZIP14 (Singh et al., 2015; Tripathi et al., 2015). Although such examples are intriguing and suggest common features between PrP C and its protein family members, the relative importance of these observations, for example for the manifestation of pleiotropic phenotypes that PrP C contributes to, remains elusive.

XIII. CONCLUSIONS
(1) This review was written with the overall goal of providing a systematised framework for protein function assignment to counter a prevalent view that the many influences the cellular prion protein (PrP C ) has been shown to exert in various paradigms somehow compel the conclusion that this protein may have multiple functions that will be understood only when we obtain a full picture of all nuances of its biology.
(2) The view of PrP C as being an "enigmatic actor" (Haigh & Collins, 2016, p. 239) whose "reason for being" is elusive (Aguzzi et al., 2008) arguably arises from a neglect within the field and beyond to clarify the terminology and to set expectations about what it is that defines the main function of a protein.
(3) We advocate determining the function of proteins by integrating information from the most pertinent and orthogonal data available, and illustrate how a narrow focus on inferences provided by a subset of approaches can be misleading.
(4) Rather than seeking to add more data points describing subtle effects of PrP C on additional paradigms, the path to understanding its function begins with shedding the 'noise' from the vast literature surrounding this topic by focusing on data generated with four widely accepted methods for determining protein function: the study of a protein's 'inheritance', 'distribution', 'interactions' and 'phenotypes' (IDIP).
(5) The application of this approach leads to the conclusion that PrP C 's function will reflect its intimate linkage to the biology of NCAM1 within the morphogenetic programme of EMT.
(6) A key benefit of the proposed framework for arriving at the function of proteins-of-interest is that it leads to testable hypotheses that could focus research on pertinent issues.
(7) We hope that this review will inspire a lively and constructive debate with the objective of arriving at widely agreed benchmarks that define when the search for the function of a protein can be considered to have accomplished its goal.

Biological Reviews (2021) 000-000 © 2021 The Authors. Biological Reviews published by John Wiley & Sons Ltd on behalf of Cambridge Philosophical Society.

Fig. 2. Proposed function of the cellular prion protein (PrP C ). (A) Venn diagram detailing evidence for a primary role of PrP C interacting with NCAM1 from the four IDIP (inheritance, distribution, interactions, phenotypes) methods for studying protein function, and additional evolutionary insights. (B) Cartoon showing how PrP C -dependent signalling could control the polysialylation of NCAM1. (C) Schematic depicting the EMT morphogenetic programme during which PrP C and NCAM1 seem to be most highly expressed, and when the polysialylation of NCAM1 is particularly pronounced. (D) Model of a ZIP6/ZIP10 heterodimer and PrP C controlling distinct post-translational modifications of NCAM1. EMT, epithelial-to-mesenchymal transition; GSK3, glycogen synthase kinase 3; NCAM1, neural cell adhesion molecule 1; PSA-NCAM1, polysialylated NCAM1; ST8SIA2, alpha-2,8-sialyltransferase 8B; ZIP, Zrt-, Irt-like protein.

Table 1. Shared expression characteristics of cellular prion protein (PrP C ) and polysialylated neural cell adhesion molecule 1 (PSA-NCAM1) and pleiotropic traits caused by their deficiency. Note that deficiency phenotypes are shown in italics and references relate to phenotypes in the same line.