Edinburgh Research Explorer

Functional evolution of the Colony Stimulating Factor 1 Receptor (CSF1R) and its ligands in birds

Macrophage colony-stimulating factor (CSF1 or M-CSF) and interleukin 34 (IL34) are secreted cytokines that control macrophage survival and differentiation. Both act through the CSF1 receptor (CSF1R), a type III transmembrane receptor tyrosine kinase. The functions of CSF1R and both ligands are conserved in birds. We have analyzed protein-coding sequence divergence among avian species. The intracellular tyrosine kinase domain of CSF1R was highly conserved in bird species, as in mammals, but the extracellular domain of avian CSF1R was more divergent, with multiple positively selected amino acids. Based upon crystal structures of the mammalian CSF1/IL34 receptor-ligand interfaces and structure-based alignments, we identified amino acids involved in avian receptor-ligand interactions. The contact amino acids in both CSF1 and CSF1R diverged among avian species. Ligand-binding domain swaps between chicken and zebra finch CSF1 confirmed the function of variants that confer species specificity on the interaction of CSF1 with CSF1R. Based upon genomic sequence analysis, we identified prevalent amino acid changes in the extracellular domain of CSF1R, even within the chicken species, that distinguished commercial broilers and layers and tropically adapted breeds. The rapid evolution in the extracellular domain of avian CSF1R suggests that, at least in birds, this ligand-receptor interaction is subjected to pathogen selection. We discuss this finding in the context of expression of CSF1R in antigen-sampling and antigen-presenting cells.

phenotypic consequences differ depending on genetic background and species but include osteopetrosis and postnatal growth retardation. 4,5 Conversely, administration of CSF1 to mice, rats, or pigs produces a monocytosis and expansion of tissue macrophage populations. [6][7][8] In humans, gain-of-function coding mutations in CSF1R have been associated with an autosomal-dominant neurodegenerative disease, 9,10 while two recent studies describe recessive loss-of-function CSF1R mutations 11,12 that share skeletal abnormalities with the mouse and rat Csf1r knockouts. Variants at the CSF1 locus are strongly associated with Paget's disease. 13 Differences in phenotype between Csf1r −/− mice and mice with a spontaneous Csf1 mutation (Csf1 op/op) suggested the existence of a second CSF1R ligand, which was subsequently identified and named interleukin 34 (IL34). 14 Mutation of the Il34 locus in mice revealed a specific function in development of subsets of tissue macrophages in skin and brain, where the gene is most highly expressed. 15 The two CSF1R ligands appear functionally equivalent. IL34 expressed under the control of the CSF1 promoter rescues the Csf1 op/op phenotype. 16 The CSF1R system of two ligands binding to one receptor was shown to be conserved throughout vertebrates, including birds 17 and fish. 18 An intronic enhancer that controls CSF1R expression is also conserved from reptiles to mammals. 19 Recombinant CSF1 administered to chicks produced a massive expansion of blood and tissue macrophage populations. 20 Solution of the tertiary structures of mouse and human CSF1 revealed the characteristic four alpha helices with two beta sheets, a structure shared by a large family of cytokines. The 3D structures of human/mouse IL34 also highlighted four antiparallel alpha helices, but with two shorter beta sheets partially replaced with an additional three alpha helices.
Subsequent studies revealed the distinctive structures of the complexes between CSF1, IL34, and the receptor. 21 Most immune proteins are subjected to an "arms-race" between host and pathogen and experience a strong positive selective pressure. 24,25 With some caveats, 26 the nonsynonymous (amino acid altering) to synonymous substitution rate ratio (ω = dN/dS) provides a measure of natural selection at the protein level, where ω = 1, ω < 1, and ω > 1 indicate neutral evolution, purifying selection, and positive selection, respectively. 27 The average dN/dS ratio of annotated immune-associated genes is up to four times higher than the genome-wide average for protein-coding genes. 24,25 Previous analysis of limited datasets indicated that both CSF1 and CSF1R were subject to positive selection in birds, whereas IL34 was subject mostly to purifying selection. 17 Since the original characterization of the CSF1R system in chicken and zebra finch, 17 the Avian Phylogenomic Consortium 28 completed the draft genome sequences for 48 bird species, representing all extant clades, and many targeted projects since that time have further expanded the number of partial or complete genomes to >300 and the pool of predicted protein sequences for genes expressed in avian immune cells. Among many applications, these data permitted a re-evaluation of the gene content of avian genomes and global analysis of dN/dS ratios. 29 The expanded number of genomic sequences has added greatly to the diversity of avian predicted CSF1R, CSF1, and IL34 protein sequences. The current study takes advantage of the multispecies genomic dataset to examine the contrasting evolutionary constraints on the CSF1R system in birds and mammals.
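As an illustration of how the dN/dS ratio is read, the following minimal sketch computes ω from a pair of substitution rates and labels the result. The function names, the tolerance around ω = 1, and the example rates are all hypothetical and are not taken from this study; real analyses estimate dN and dS with codon substitution models rather than from two supplied numbers.

```python
def omega(dn: float, ds: float) -> float:
    """Return the dN/dS ratio (omega); dS must be positive."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute dN/dS")
    return dn / ds


def classify_selection(w: float, tolerance: float = 0.05) -> str:
    """Label omega as neutral, purifying, or positive selection.

    The tolerance around 1 is an arbitrary illustrative choice,
    not a statistical test.
    """
    if abs(w - 1.0) <= tolerance:
        return "neutral"
    return "positive" if w > 1.0 else "purifying"


# Hypothetical rates, chosen only to exercise each branch.
examples = {
    "gene_a": (0.02, 0.20),
    "gene_b": (0.30, 0.15),
    "gene_c": (0.10, 0.10),
}
for name, (dn, ds) in examples.items():
    w = omega(dn, ds)
    print(name, round(w, 2), classify_selection(w))
```

A gene-wide ratio of this kind can mask positive selection that is confined to a few sites, which is one reason site-level analyses are used for receptor-ligand interfaces.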

Sequence collection and multiple sequence alignment
Avian CSF1, IL34, and CSF1R protein and gene sequences were retrieved from the National Center for Biotechnology Information (NCBI; http://ncbi.nlm.nih.gov) and from completed avian genomes analyzed by the Avian Phylogenomic Consortium. 28 Accession numbers for all protein sequences are provided in Supplementary Table 4.

Phylogenetic analysis
An MSA for avian sequences was created using CLUSTALW and phy- Full annotation of the whole genome sequences and analysis of genetic diversity of these chicken populations will be published elsewhere.

Assay of the biological activity of chicken and zebra finch CSF1 proteins using growth factor dependent cells
We have previously established a bioassay for chicken CSF1 by stably transfecting the interleukin 3 (IL3)-dependent BaF3 cell line with a chCSF1R expression plasmid. 17 The transfected BaF3 cells express chCSF1R on the cell surface 33

Sequence analysis of the CSF1 ligand-receptor system from birds and mammals
From available genomic DNA sequences and entries in NCBI GenBank, we were able to extract 68 CSF1R, 30 IL34, and 36 CSF1 predicted full-length protein sequences orthologous to the functional chicken proteins analyzed previously. 17 The relative paucity of avian CSF1 and IL34 sequences available reflects the difficulties in sequencing in the respective genomic regions, in common with multiple other GC-rich regions, in all avian genomes. 29 In many cases, the sequences annotated as CSF1 or IL34 in NCBI as a predicted protein were truncated at the N terminus relative to full-length chicken and zebra finch orthologs. Multiple sequence alignments (MSAs) of each of the avian CSF1, IL34, and CSF1R protein-coding regions are provided in Supplementary Table 1A-C. In mammals, the CSF1 locus encodes multiple isoforms of the protein generated by alternative splicing. 3 The longest cDNA encodes a membrane-bound precursor that is cleaved from the cell surface by TNF-alpha converting enzyme (TACE, ADAM17) 34 to release the minimal bioactive CSF1 protein. In transgenic mice, this longer form of the ligand is required to fully complement a CSF1 mutation and restore postnatal growth. 35 Consistent with previous evidence of the production of longer forms of The short intracellular domain contains a membrane-proximal basic region that is conserved between mammals and birds. The remainder of the intracellular domain is also strongly conserved in birds. Similar membrane-proximal basic domains are found in many membrane-associated proteins including G protein-coupled receptors. The intracellular domain may function to promote membrane trafficking from the Golgi 36 or conceivably also produce a reverse signal to the CSF1-producing cell. 37 The intervening region between the bioactive peptide and membrane is longer in mammals than in birds.
In common with many proteolytic cleavage domains, the obvious conserved feature is repeated proline (P), glutamate (E), serine (S), and threonine (T) amino acids.
At the N terminus, we also noted that there was considerable ambiguity among predicted protein sequences in GenBank regarding the location of the start codon and the length of the leader sequence.
For the purpose of the current analysis, we have aligned the processed peptide containing the 160 amino acids that make up the minimal bioactive 4-helix bundle. 17 In the case of IL34, the predicted avian proteins are all around 180 amino acids, truncated at the C-terminus relative to predicted mammalian IL34 proteins (230-240 amino acids).
In mammals, some of the C-terminal amino acids were found to be engaged in binding to CSF1R 23 but in birds the 180 amino acid protein contains the biological activity. 17 As noted based upon comparison of chicken and zebra finch, 17 the avian CSF1 sequences all showed conservation of cysteines that provides a strong reference framework for the alignment (Supplementary Table 1A). These conserved avian residues are predicted to form three intrachain disulfide bonds coincident with the cysteines involved in disulfide bonds in CSF1 of mammals and fish. 17 In all of the avian CSF1 peptides, the cysteine responsible for the interchain disulfide bond in mammalian CSF1 is substituted with glycine (G29 in Supplementary Table 1A; position 63 in Fig. 1). Nevertheless, the chicken protein forms a dimer through predicted large hydrophobic interfaces. 17 Early studies indicated that the interchain disulfide in human CSF1 was absolutely required for dimerization and biological activity, but this does not appear to be the case. 38 Mutation of this cysteine (C31S, numbered in the mature CSF1 peptide without the leader sequence) did not compromise refolding or biological activity of recombinant human CSF1.
Based upon structural analysis, two amino acids (Q26 and M27) were predicted to make strong contributions to dimer formation. 38 These are conserved in all bird and mammalian CSF1 sequences (Q25/M26 in the active mature chicken sequence shown in Supplementary Table 1A; positions 58/59 in Fig. 1). Indeed, D23, which made strong electrostatic and nonpolar contributions to the dimer interface in the C31S mutant human protein, is also conserved between birds and mammals and in all birds (Supplementary Table 1C). A second shorter segment in CSF1 that contributed to the dimer interface, R66-N73 in human CSF1 (positions 98-107 in Fig. 1), is also conserved between mammals and birds and the core (FKENS) is identical in all bird species. A combined C31S/M27R mutation produced a monomeric CSF1 that acted as a CSF1R antagonist. The absence of cysteine in this location in the avian ligand suggests that the C31S mutation in the mammalian protein is unlikely to be necessary to achieve this outcome. Our earlier analysis of available CSF1 sequences indicated significant divergence among species and evidence of positive selection. 17 This conclusion was confirmed using the larger dataset. 30 Figure 2 shows a neighbor-joining phylogenetic tree for the available sequences. This simple analysis reveals that the Galloanseriformes (chicken, turkey, guinea fowl, quail, and goose) clearly form a separate group.
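The conserved positions discussed above (the framework cysteines and the dimer-interface residues) are the kind of feature that can be read directly from the columns of a multiple sequence alignment. The sketch below lists columns in which every sequence carries the same residue. The function name and the toy alignment are hypothetical; the sequences are invented placeholders, not the avian CSF1 sequences in Supplementary Table 1A.

```python
def conserved_columns(alignment, gap="-"):
    """Return (position, residue) for columns identical in all sequences.

    Positions are 1-based alignment columns. A column containing a gap
    in any sequence is not reported as conserved.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    conserved = []
    for index in range(length):
        column = {seq[index] for seq in alignment}
        if len(column) == 1 and gap not in column:
            conserved.append((index + 1, column.pop()))
    return conserved


# Invented toy alignment, for illustration only.
toy_alignment = ["MCQMAC", "TCQMGC", "MCQM-C"]
print(conserved_columns(toy_alignment))
```

Strict identity is the simplest possible criterion; a scoring scheme that tolerates conservative substitutions would be an alternative design choice.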
Avian IL34, unlike CSF1, is subject to purifying selection. 17 Indeed, although CSF1 is highly divergent between birds and mammals, the core 145 amino acid chicken IL34 protein, excluding the leader sequence, is around 60% identical to the human protein and can be readily aligned (not shown). Despite this level of conservation, amino acid differences among mammalian species were associated with species-specific biological activity. 39  inactive on the chicken receptor. 17 The arginine substitution is present in all bird sequences. A neighbor-joining phylogenetic tree was then generated using the same package.
a much larger assembly of avian species 41 (see the phylogenetic tree image from this study in the graphical abstract, reproduced with permission) and recapitulates analysis based upon the divergence of the conserved intronic enhancer in the CSF1R locus. 19 As in the case of CSF1, the Galloanserae form a divergent group.
The overall sequence identity between the most disparate CSF1R protein sequences (e.g., between chicken and zebra finch), around 75%, is similar to the conservation between the most divergent mammalian sequences (primates and rodents 39
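Percent identity figures such as those quoted above depend on how aligned positions are counted. The following minimal sketch computes pairwise identity from two already-aligned sequences, assuming that columns with a gap in either sequence are excluded from the denominator; other conventions exist and give different values. The function name and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are ignored
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Invented toy sequences: 5 matches over 7 ungapped columns.
print(round(percent_identity("MKT-LLVA", "MRTQLLIA"), 1))
```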

Cross-species specificity of the CSF1 ligand in birds
The divergence also distinguishes chicken, quail, and turkey from duck and goose. The structure-based alignment of predicted contact residues in CSF1 reveals corresponding variation in Site 1 of the ligand, in particular multiple nonconservative substitutions between chicken T57 and E82, whereas Site 2 on CSF1 is conserved across all available avian sequences. The Site 1 interaction between chCSF1 and chCSF1R is predicted to involve a salt bridge between K73 in the ligand and E168 and E170 in the receptor (Table 2). This interaction is abolished in the zebra finch receptor (Q164, S166); substitutions shared by many bird species (Supplementary Table 1C

For the predicted CSF1-CSF1R interaction, the binding Sites 1 and 2 are based upon structure-based alignment of available human and mouse CSF1-CSF1R (D1-D3) and IL34-CSF1R (D1-D3) structures. Contact amino acids in CSF1 and CSF1R derived from the human structures are highlighted in gray, and asterisks indicate amino acids that differ between human and mouse. Where the corresponding amino acids diverge between zebra finch and chicken, they are set in bold.
Four constructs were expressed in HEK293T cells and supernatants containing recombinant CSF1 were tested. The supernatants from HEK293 cells transfected with zfCSF1 expression plasmid were able to promote survival of BaF3 cells expressing the chicken CSF1R to the same extent as supernatants from cells expressing chCSF1 (Fig. 5).
Both of the domain-swapped constructs zf_chCSF1 and ch_zfCSF1 were also active on the chCSF1R reporter cell line (Fig. 5A). We

The chicken and zebra finch CSF1R complexes were modeled based upon the human CSF1-CSF1R (D1-D3) structure as described in the Materials and Methods section. Non-conserved amino acids are set in bold.

week. In both cases, no cells survived in the absence of added growth factor (panels E and J). As shown in images in panels A-D, chicken bone marrow cells produced a relatively confluent lawn of macrophages in response to all of the supernatants. Conversely, only zfCSF1 or zf_chCSF1 (chCSF1 with ZF Site 1) directed macrophage proliferation and differentiation from ZF marrow (panels G and H). Images are representative of three separate experiments.
Site 1 of the chicken ligand (ch_zfCSF1) abolished the activity on zebra finch marrow. This observation confirms that the difference in cross-species reactivity between chicken and zebra finch CSF1 ligands can be attributed to the variation in receptor binding Site 1 (Table 1).

Polymorphism in the CSF1R, CSF1, and IL34 genes among selected chicken populations
Western commercial chickens have been subject to intensive selection of production traits: rapid growth and meat production or egg laying.
Selection has produced genomic signatures that can be detected as extended regions of homozygosity. 44 In mammals, mutations in CSF1 or CSF1R produce severe postnatal growth retardation suggesting a link between macrophages and the growth hormone/IGF1 axis. 3,5 Indeed, the CSF1R gene on chromosome 13 lies within an interval containing signatures of selection 44  A different selection pressure including heat stress and disease applies to indigenous chicken ecotypes selected for resilience and survival in tropical smallholder systems. 45 We predicted that genes such as CSF1 and CSF1R that diverge rapidly between species might also exhibit functional polymorphism within a species occupying many diverse environmental niches. We therefore explored genomic  Table 1 and most vary to some extent between species. Position F125 is also leucine (L) in most other avian species; position N153 is serine (S) in two species of tit, starling, and ruff and position 308 is threonine (T) in 2 manakin species (blue-crowned and golden-crowned) and glycine (G) in cuckoo roller (Supplementary Table 1C Table 3). This amino acid is conserved in bird species but lies outside the binding site for the receptor. In the biologically active portion of CSF1, we identified the N87D variant discussed above at low allele frequency in the majority of populations and a small number of rare potentially deleterious variants at low frequency in specific pop- Table 4). None of the variants altered contact amino acids. One other variant detected in all Ethiopian populations, E99K, is also present in duck and goose, but not in quail or guinea fowl reference sequences.
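The allele frequencies referred to above can be derived from genotype counts in each population. The sketch below shows the standard calculation for a biallelic variant in a diploid sample. The function name, the population labels, and the genotype counts are hypothetical placeholders and do not reproduce the data in the Supplementary Tables.

```python
def alt_allele_frequency(hom_ref: int, het: int, hom_alt: int) -> float:
    """Frequency of the alternative allele in a diploid sample.

    Each heterozygote carries one alternative allele and each
    alternative homozygote carries two.
    """
    individuals = hom_ref + het + hom_alt
    if individuals == 0:
        raise ValueError("no genotyped individuals")
    return (het + 2 * hom_alt) / (2 * individuals)


# Invented genotype counts (hom_ref, het, hom_alt) for two
# hypothetical populations, for illustration only.
populations = {
    "population_1": (18, 2, 0),
    "population_2": (5, 10, 5),
}
for name, counts in populations.items():
    print(name, alt_allele_frequency(*counts))
```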

DISCUSSION
We retained the ability to activate chCSF1R, whereas chCSF1 was inactive on zfCSF1R. Domain swap analysis confirmed that the amino acids K57-N82 within zfCSF1 (Site 1) that interact with domain D2 of CSF1R are both necessary and sufficient to enable activation of zebra finch BM cells. There are six amino acid differences between the two species in this short segment, all involving charged amino acids (Table 1). As discussed above, we suggest that the binding affinity of chicken CSF1 for chicken CSF1R depends upon charged amino acid interactions.
By contrast, there appear to be no predicted salt-bridge interactions in zebra finch CSF1 binding to its receptor, but two charged amino acid substitutions may permit the formation of salt bridges to the chicken receptor.
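The charge argument can be illustrated with a deliberately simple check of whether two residues carry opposite side-chain charges, using the Site 1 residues named in the Results (K73 in chCSF1; E168 and E170 in chCSF1R; Q164 and S166 at the corresponding positions in the zebra finch receptor). This is a sketch under strong assumptions: it considers only nominal charge at physiological pH, treats histidine as uncharged, and ignores the distance and geometry that structural modeling takes into account. The function names are hypothetical.

```python
POSITIVE = {"K", "R"}  # lysine, arginine
NEGATIVE = {"D", "E"}  # aspartate, glutamate


def side_chain_charge(residue: str) -> int:
    """Nominal side-chain charge at physiological pH (histidine treated as 0)."""
    if residue in POSITIVE:
        return 1
    if residue in NEGATIVE:
        return -1
    return 0


def could_form_salt_bridge(ligand_residue: str, receptor_residue: str) -> bool:
    """True when the two residues carry opposite nominal charges.

    Opposite charge is necessary but not sufficient: a real salt bridge
    also requires the side chains to be close enough in the complex.
    """
    return side_chain_charge(ligand_residue) * side_chain_charge(receptor_residue) == -1


# Chicken ligand K73 against the chicken receptor residues (E, E)
# and the zebra finch residues at the corresponding positions (Q, S).
for receptor_residue in ("E", "E", "Q", "S"):
    print("K", receptor_residue, could_form_salt_bridge("K", receptor_residue))
```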
The analysis of many more IL34 sequences in birds (Supplementary and syndecan-1. 47 The function of IL34 in birds has not been studied beyond the demonstration that the protein is active on the chicken CSF1R. 17 The most striking feature of our analysis, which clearly distinguishes birds from mammals, is the hypervariability of the CSF1/IL34 binding Site 1 in CSF1R. Why has selection in avian evolution apparently acted upon ligand binding to CSF1R? One major difference between birds and mammals lies in the expression of CSF1R. We developed monoclonal antibodies against CSF1R 33  propria of the intestine control M cell differentiation. 51 In a second contrast with mammals, we found that CSF1R is highly expressed by antigen-presenting dendritic cells, which are a prevalent cell population in the avian liver in addition to their well-recognized prevalence in bursa and spleen. 52 So, we suggest two nonexclusive explanations.
One is that a class of pathogen-associated virulence determinants acts to block binding of CSF1 or IL34 in order to compromise innate immunity or the function of the follicle-associated epithelium (FAE). Such a pathogenicity determinant exists in the form of the immunomodulatory BARF1 viral protein, which binds to human CSF1R. 43 A second nonexclusive explanation is that a pathogen or pathogen-associated molecule binds to CSF1R to enable receptor-mediated internalization. CSF1R is expressed on the cell surface and upon ligand binding promotes endocytosis of the ligand, either CSF1 or IL34. 3 Hence CSF1R could provide a portal for pathogen invasion.
The secondary question is how evolution in CSF1R can occur without compromising the innate immune system. Mice and rats with CSF1 or CSF1R knockout mutations 4,5 are macrophage deficient and have severe developmental abnormalities. This is also the case in zebrafish. 12 We have recently confirmed based upon CRISPR-mediated knockout in the germ line that the chicken CSF1R is also absolutely required for posthatch development (Balic A. and DAH, forthcoming). Previous studies of birds in smallholder systems in Ethiopia provided strong evidence for heritable disease resistance and resilience. 45 Comparative analysis of available sequences of western commercial and tropically adapted populations identified prevalent protein sequence variants (Supplementary Tables 2 and 3). Some CSF1R variants distinguished layer and broiler lines, consistent with evidence of signatures of selection in broiler lines in this region of chromosome 13 44 and QTL association with growth-related traits. CSF1R is clearly highly polymorphic in chickens and the coding variants distinguish western commercial birds from tropically adapted birds. Two common variants that distinguish commercial broilers and layers, A308T and S409L, also occur within domain 4 but whether they influence CSF1R function is unknown. Common variants detected in commercial birds are relatively rare in Ethiopian and Nigerian populations and one CSF1R variant, D91N, was prevalent and unique to Ethiopia and Nigeria. Each of the variants affects an amino acid that is conserved to some degree across avian species. Polymorphism is a common feature of innate immune receptors. 54 It remains to be determined whether any of these variants can be associated with disease resistance or production traits and could represent targets for marker-assisted selection.
Although the focus of this study has been on the avian CSF1R system, as mentioned in the Introduction, there is emerging interest in CSF1R as a drug target 2 and in functional analysis of loss-of-function and gain-of-function mutations in CSF1R in human patients. [9][10][11][12] The human and mouse equivalents of the BaF3-CSF1R cells we have used here to assess cross-reaction of avian CSF1 have previously been used to assay function of disease-associated human mutant receptors. 9 Our findings in birds suggest that focused mutagenesis of the interaction sites of CSF1 with CSF1R could provide the basis for generation of monomeric antagonists or higher affinity agonists.