The draft genome of the blood pheasant (Ithaginis cruentus): Phylogeny and high‐altitude adaptation

Abstract The blood pheasant (Ithaginis cruentus), the only species in the genus Ithaginis, lives in an extremely inhospitable high‐altitude environment, coping with hypoxia and ultraviolet (UV) radiation. To further investigate the phylogeny of Phasianidae species based on complete genomes and understand the molecular genetic mechanisms of the high‐altitude adaptation of the blood pheasant, we de novo assembled and annotated the complete genome of the blood pheasant. The blood pheasant genome size is 1.04 Gb with scaffold N50 of 10.88 Mb. We identified 109.92 Mb (10.62%) repetitive elements, 279,037 perfect microsatellites, and 17,209 protein‐coding genes. The phylogenetic tree of Phasianidae based on whole genomes revealed three highly supported major clades with the blood pheasant included in the “erectile clade.” Comparative genomics analysis showed that many genes were positively selected in the blood pheasant, which was associated with response to hypoxia and/or UV radiation. More importantly, among these positively selected genes (PSGs) which were related to high‐altitude adaptation, sixteen PSGs had blood pheasant‐specific missense mutations. Our data and analysis lay solid foundation to the study of Phasianidae phylogeny and provided new insights into the potential adaptation mechanisms to the high altitude employed by the blood pheasant.

of the blood pheasant are very limited; the biology and ecology of this species remained largely unexplored; and more genetic information associated with this species is needed for further research.
Geographic barriers, topographic features, typically low temperatures, hypoxia, and strong ultraviolet radiation have influenced species differentiation and high-altitude adaptability of the blood pheasant. The adaptation to high-altitude environment can be attributed to advantageous genetic mutations and selective pressure.
Many studies have been conducted to investigate the molecular genetic mechanisms underlying organisms' high-altitude adaptation in mammals like yak (Bos grunniens) (Qiu et al., 2012) and Tibetan antelope (Pantholops hodgsonii) (Ge et al., 2013). Among the responsible genes proposed by previous studies, the most prominent ones were EPAS1 and EGLN1 which were reported to be the key genes functioning at the upstream of the hypoxia-inducible factor (HIF) pathway (Gorkhali et al., 2016). Convergent evolution was proved to shape the similar genes in the high-altitude adaptation of different animals like EPAS1 which was shared by Tibetans, Tibetan mastiff (Canis lupus familiaris), Tibetan grey wolf (Canis lupus chanco), and Tibetan goat (Capra hircus) (Gorkhali et al., 2016). However, species with different genetic background could obtain unique adaptive mechanisms like Tibetan pig (from Tibet, Gansu, Sichuan, and Yunnan provinces in China) (Ai et al., 2014). However, the genetic mechanism of high-altitude adaptation in birds was poorly investigated (e.g., Tibetan chickens (Wang et al., 2015)). It is possible that the blood pheasant has adapted to high-altitude conditions through different genes or functional pathways.
Besides comparative genomics approach, understanding themechanisms of adaptation to high-altitude requires a robust phylogeny. Rapid radiation and convergent morphological evolution often result in conflicting phylogenetic relationships of species in the Phasianidae, one of four families in the Galliformes (Shen et al., 2014). Although many studies on the phylogenetic research have been conducted in the Phasianidae (Shen et al., 2014;Wang, Kimball, Braun, Liang, & Zhang, 2013), not surprisingly, many unsolved nodes and conflicts remain. The incongruence of phylogenetic relationships requires a reassessment of the phylogeny of the Phasianidae. Most previous phylogenetic analyses were performed based on either mitochondrial (mt) genes (Pereira & Baker, 2006;Shen et al., 2010), a few nuclear genes (Armstrong, Braun, & Kimball, 2001), or a combination of mt and a few nuclear gene sequences (Crowe et al., 2006;Kimball & Braun, 2008). Furthermore, mtDNA only reflect the matrilineal genealogy and not provide information on paternal contributions. With the fast development and F I G U R E 1 Sampling site and photographs of blood pheasant reducing cost of next-generation sequencing, an increasing number of Phasianidae species were sequenced (Cui et al., 2019;Zhou et al., 2018Zhou et al., , 2019Zhou et al., , 2020, which served as a powerful tool to infer phylogenetic relationships in the Phasianidae. So far, the genomic resources of the blood pheasant are very limited for deeper bioinformatic analysis. In this study, we aimed to describe the first blood pheasant genome and to provide pivotal molecular material for the study of the blood pheasant and Phasianidae family. With availability and ease to produce HTS data (whole genome level), we performed phylogenetic comparative genomics analyses in order to increase our knowledge of the blood pheasant's high-altitude adaptation and evolutionary history of Phasianidae species.

| Sampling and sequencing
Genomic DNA of blood pheasant was collected from muscle sample of a wild male in the museum of Sichuan University. DNA was isolated from the muscle using the Qiagen DNeasy Blood and Tissue Kit (Qiagen) following the manufacturer's protocol. Then, we constructed two kinds of libraries on Illumina HiSeq 2000 platform to sequence the genome using the whole genome shotgun approach, including two paired-end sequence libraries with insert sizes of 230 base pairs (bp) and 500 bp, and three mate-paired libraries with insert sizes of 2, 5, and 10 kb. Quality control composed of adaptor removal using Trimmomatic (Lohse et al., 2012), discard of short reads, trim of poor-quality bases, and removal of all identical paired-end reads (Doyle et al., 2014).

| Genome assembly and completeness assessment
The contig and scaffold assemblies of the blood pheasant genome were carried out by the SOAPdenovo2 (Luo et al., 2012). Then, the obtained scaffolds were assembled into super-scaffolds by SSPACE (Boetzer, Henkel, Jansen, Butler, & Pirovano, 2010). Finally, we used short-insert reads from Illumina to correct the remaining intrascaffold gaps by Gapcloser. The completeness of genome assembly was assessed by Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis with the aves_odb9 database (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015).

| Gene prediction and annotation
Two methodologies, homologous comparison and ab initio prediction, were used to annotate the protein-coding genes (PCGs) in the blood pheasant. For ab initio prediction, we simultaneously employed two tools of AUGUSTUS (Stanke et al., 2006) and GENSCAN (Burge & Karlin, 1997), in which repetitive sequences were masked as "N" based on the HMM algorithm. For homologous annotation, protein sequences including red junglefowl (Gallus gallus), turkey (Meleagris gallopavo), zebra finch (Taeniopygia guttata), peregrine falcon (Falco peregrinus), and human (Homo sapiens) were aligned against the blood pheasant genome using TblastN (Altschul et al., 1997) with an E-value cutoff of 1E-5. The TblastN hits were then conjoined by SOLAR (Ashburner et al., 2000) to obtain the best matches. GeneWise (Birney, Clamp, & Durbin, 2004) was used to predict the exact gene structure of the corresponding genomic region on each TblastN hits. According to these two approaches, all the gene models were finally integrated by EVidenceModeler (EVM) (Haas et al., 2008). Repeatmasker (Smit, Hubley, & Green, 2010) was used to detect and classify different types of repetitive sequences, via aligning the genome sequences against the Repbase library (version 17.01). To further evaluate the repetitive DNA content within the blood pheasant genome, we utilized Krait (Du, Zhang, Liu, Zhang, & Yue, 2018) to detect and characterize genome-wide tandem repeats (microsatellite loci), which could be used for identifying loci that could be employed in population genetic studies.

| Functional annotation
We functionally annotated the predicted proteins within blood pheasant according to homologous searches against SwissProt and TrEMBL protein databases (Boeckmann et al., 2003) using BLASTP tools with the E-value cutoff of 1E-5. Gene Ontology (GO) descriptions of predicted genes were retrieved from SwissProt results. All genes were uploaded to KAAS (Moriya, Itoh, Okuda, Yoshizawa, & Kanehisa, 2007) in order to find the best matching and involvement pathway for each gene. This is a web server for annotating genetic functions of the artificially modified KEGG gene database by BLAST using the bidirectional click (BBH) method.

| Analyses of gene family, phylogeny, and divergence
Gene families were constructed according to orthoMCL (Li, Stoeckert, & Roos, 2003). The detailed information of genomes used in this study was shown in Table S1. Phylogenetic analyses of these nine birds were constructed using 1:1 orthologous genes. Coding sequences from each 1:1 orthologous family were aligned by PRANK (Löytynoja & Goldman, 2010) and concatenated to one sequence for each species for building the tree. Then, the phylogenetic analysis was performed using maximum-likelihood (ML) algorithm in RAxML (Stamatakis, 2014). The best-scoring ML tree was inferred by rapid BP algorithm and ML searches after performing 1,000 rapid bootstraps. In this process, we implemented the Monte Carlo Markov Chain algorithm for divergence time estimation by MCMC tree tool in PAML package (Yang, 2007).

| Positive selection analysis
Nine birds (blood pheasant, turkey, red junglefowl, Hainan partridge (Arborophila ardens), peregrine falcon, mallard (Anas platyrhynchos), rock dove (Columba livia), zebra finch, ostrich (Struthio camelus)) were chosen for positive selection analysis. For each gene, the branch-site model in CODEML program from PAML package (Yang, 2007) was employed to detect whether the gene received positive selection in the blood pheasant branch. Two models were conducted to test the statistical significance of selective pressure specifically on the blood pheasant branch. One was the one-ratio model acting as the null model (NSsites = 0, model = 0), and the other was model 2 (NSsites = 2). The two models were compared with the likelihood ratio test (LRT), which was calculated from the log likelihood (lnL) values for both models. The p values were obtained by calculating twice the difference between lnL (model2) and lnL (one-ratio) and compared with a chi-square distribution. We then identified positively selected genes (PSGs) of the blood pheasant by means of FDR adjustment with Q values < 0.05. GO enrichment analysis of PSGs was implemented by the KOBAS software (Wu, Mao, Cai, Luo, & Wei, 2006;Xie et al., 2011). Genes of the chicken were set as the background.

| Protein structure determination
From the PSGs identified with PAML, we take MB for example to depict the protein and visualize the amino acid changes. The crystal structure of MB was obtained from SWISS-MODEL (Schwede, Kopp, Guex, & Peitsch, 2003). The cartoon and surface representations of gene mutants were visualized through PDB files, which were converted from PQR format with PDB2PQR server (Unni et al., 2011).

| Genome sequencing, assembly, and quality assessment
We sequenced 37.54 Gb paired-end reads and 54.99 Gb mate-paired reads derived from a muscle sample of a single wild male blood pheasant and assembled them into 3,898 scaffolds (N50 = 10.88 Mb, total size = 1.04 Gb) with the longest one 47.18 Mb. In addition, BUSCO evaluated that 4,622 orthologous single-copy genes assembled 94% of intact single-copy genes demonstrating high assembly integrity and sequencing uniformity (Table S2).

| Genome characterization
The GC content of the blood pheasant genome was approximately 41.23%, similar to other bird species such as red junglefowl and zebra finch. Using homologous sequence alignment and de novo prediction, we identified 109.92 Mb (10.62%) of the repeats in the genome, including DNA elements (1.15%) and retrotransposon (8.09%).
We found that 16,003 (92.99%) out of 17,209 identified PCGs were well supported by public protein databases (TrEMBL, SwissProt, GO, and KEGG) for the blood pheasant (Table S3). Totally, we identified 279,037 perfect SSRs in the blood pheasant (Table 2) which was lower than that in the buff-throated partridge .

| Phylogeny and divergence of Phasianidae species
A total of 1,550 one-to-one orthologous genes were obtained to reconstruct the phylogenetic tree. The most species-rich family within Galliformes, Phasianidae, recovered robust relationships among 17 species in the phylogenetic analysis ( Figure 2). In general, there were three major clades within Phasianidae. The earliest divergence was between the Hill partridge (Arborophila spp.) and the other phasianids. The second one included most of the pheasants, the turkey and the grouse. This group has been designated the "erectile clade" (Kimball & Braun, 2008), and this study extended those findings by

| Gene families and positive selection
Based on nine avian genomes, a total of 14,009 gene families were detected, of which 5,444 represented 1:1 orthologous gene families. We found that 550 of the 5,444 orthologous genes were under positive selection in blood pheasant. We identified biochemical pathways represented by the PSGs. The KEGG annotation of the PSGs suggested that they were involved in 51 pathways related to metabolism (53 genes), genetic information processing (32 genes), environmental information processing (54 genes), cellular processes  (Table 3).
We further performed GO enrichment with all the PSGs. GO enrichment identified significant overrepresentation of blood pheasant PSGs associated with high-altitude adaptation (Table S4) Table 3). We assessed damaging effects of Leu41Gln and Met56Leu in MB on the protein structure, which revealed the changes in cartoon presentation and surface presentation ( Figure 5).

| Phylogenetic relationships within Phasianidae
Mitochondrial genes and nuclear genes have been widely used in animal phylogenetic studies, but many studies have shown that the topologies of phylogenetic trees constructed by them may be different (Shaw, 2002;Sota & Vogler, 2001). Mitochondrial genes belong to single copy, strict maternal inheritance, and lack of recombination, so it is very convenient to apply in evolutionary analysis. However, different mitochondrial genes have different evolutionary rates and are subject to different selection pressures, so single gene phylogenetic trees may produce different results. Nuclear genes are relatively conservative and slow in evolution, so they are more suitable for solving phylogenetic problems of higher taxa. But nuclear genes have exchangeable nuclear recombination, as in Phasianidae, where interspecific hybridization is common, so nuclear genes can lead to false conclusions in evolutionary analysis. In addition, most nuclear genes have alleles, which alleles to choose in sequence analysis is also a dilemma. In this study, we obtained the largest phylogenetic tree of Phasianidae species on the basis of whole genomes to date.
The classification of turkey has long been controversial, and the traditional classification classifies turkeys into the Meleagrididae

Meleagris.
Based on the analysis of 102 morphologic traits, turkey and Phasianidae were grouped into monophyletes (Dyke, Gulas, & Crowe, 2003). Part of the mitochondrial genetic phylogenetic tree F I G U R E 2 Phylogenetic tree constructed using one-to-one orthologous genes. The time lines indicated divergence times among the species supports turkey and partridge as sister groups (Crowe et al., 2006).
Analysis of the four nuclear genes and two mitochondrial genes together showed that turkey and Koklass pheasant (Pucrasia macrolopha) were more closely related than other species (Kimball & Braun, 2008). In this study, it was found that turkey should be classified into Phasianidae and closely related to the grouse. Some of the recent researches were consistent with our conclusion, even placing turkeys and grouses sister to each other (Shen et al., 2010;Wang et al., 2013).
The phylogenetic tree in this study did not support the previous division of Tetraophasis spp. into the related genera of Arborophila spp.
or Tetraonidae (Potapov, 2002 its rectrices molted from the center to the outside, the same as the perdicini (Dyck, 1992). Johnsgard (1999) attributed the blood pheasant to phasianini. Both the protein sequence and the nucleic acid sequence of Cyt b gene showed that the phasianini and the blood pheasant clustered together (Kimball, Braun, Zwartjies, Crowe, & Ligon, 1999). In our evolutionary tree, blood pheasant was a separate branch, which was inconsistent with the previous classification of blood pheasant as pheasants or partridges. Therefore, although the whole phasiandae was monophyletic, neither the pheasants nor the partridges were monophyletic, and they were intermingled in our phylogenetic tree.

| Genetic mechanisms of highaltitude adaptation
Testing for positive selection is often difficult because positive selection tends to operate at several sites and in a short evolutionary time, and signals may be overwhelmed by ubiquitous negative selection. In this paper, we used the branch-site model in CODEML program from PAML package (Yang, 2007) to detect whether the gene received positive selection in the blood pheasant branch.
In this test, the branches of the tree are priori divided into foreground and background lineages, and a likelihood ratio test (LRT) is constructed by comparing a model that allows positive selection of foreground lineages with a model that does not allow positive selection. Its main advantage is the implementation of a rich evolutionary model that can be used to estimate the parameters in TA B L E 3 Blood pheasant-specific missense mutations of PSGs associated with high-altitude adaptation null assumptions of no selection may be rejected frequently, resulting in false positives (Zhang, 2004). If some sites evolve under the negative selection of background lineages, but experience a relaxation of constraints on foreground lineages, the test may be misled into wrongly rejecting the null neutral model (Zhang et al., 2005).
We found that ten PSGs (MB, NARFL, PAM, APOLD1, EGR1, CD38, APAF1, TNFRSF1A, VCAM1, and CASP3) with blood pheasant-specific missense mutations were associated with response to hypoxia. MB is a multifunctional heme protein that specializes in oxygen transport, storage, and buffering (Galluzzo, Pennacchietti, Rosano, Comoglio, & Michieli, 2009). The interaction between myoglobin and mitochondria was one of the important mechanisms of hypoxia adaptation: Oxygenated myoglobin was rapidly deoxidized during muscle hypoxia, which increased the oxygen pressure of capillaries and formed a concentration gradient, while its own saturation decreased, closed to the cell membrane and promoted oxygen diffusion. Meanwhile, the density of mitochondria and the cross-sectional area of muscle fiber decreased correspondingly, reducing the oxygen demand and the distance of oxygen diffusion within muscle fibers, and promoting the adaptation of animals to hypoxia (Jaspers et al., 2014). Previous study found that hypoxia markedly increased the expression of EGR1 (Nishi, Nishi, & Johnson, 2002). Hypoxia was reported to cause increased APAF1 protein expression . Under hypoxia, cells underwent a number of column biochemical changes to accommodate this condition. Some protective stress proteins' expression were enhanced, and NARFL could be directly used as an indicator of oxidative stress, which was due to the fact that NARF has been confirmed to be a major antioxidant stress component. In the case of increased levels of oxidative stress, NARFL expression increased to resist oxygen free radicals, whereas in the high oxygen condition after the knockdown of NARFL, the cell line showed increased death and increased sister chromosome breaks (Corbin, Rockx, Oostra, Joenje, & Dorsman, 2015). Therefore, NARFL mutation might cause a decrease in antioxidant stress levels, causing lung damage and malformation of the lung vessels (Kim & Byzova, 2014). In addition, HIF-1 (hypoxia-inducing factor 1) acted as a master regulator of numerous hypoxia-inducible genes under hypoxic conditions. Some studies have also confirmed that NARFL was an activity regulator of HIF-1, whereas the knockdown of NARFL was demonstrated to be able to upregulate the protein levels of HIF and increase their promoter activity and transcription in normoxic and hypoxic states (Huang et al., 2007). CD38 was the target gene of HIF-1 and promoted smooth muscle cell (SMCs) proliferation by inhibiting SIRT1 expression . It could also increase oxygen delivery by mediating Ca 2+ concentration to regulate the contraction of tracheal smooth muscle (Kuemmerle, Murthy, & Makhlouf, 1998).
Low expression of the DDB2 gene in cells would lead to the loss of DDB2 protein binding and scavenge UV-mediated damage (Bommi, Ravindran, Raychaudhuri, & Bagchi, 2018). ERCC8 and ERCC6 encoded proteins involved in the transcription-coupled DNA repair pathway (Laugel et al., 2010). The most prominent feature of ERCC8 was that it has multiple WD 40 repeating structures, which were the scaffold for protein-protein interaction and necessary for the construction of β helix structure (Koch et al., 2014). ERCC6 participated in nuclear and mitochondrial DNA oxidative damage repair through recruiting NER (nucleotide excision repair) factor from DNA damage site (Dianov, Bischoff, Sunesen, & Bohr, 1999).

| CON CLUS IONS
Our results provided the largest phylogenetic tree of Phasianidae species based on complete genomes. We found 10 hypoxia-related PSGs in the blood pheasant. These genes potentially enhanced function of the blood pheasant under hypoxic conditions by diverse pathways. Besides the hypoxia adaptation, seven PSGs required for high-altitude survival related to response to UV radiation were determined in blood pheasant, which was likely as a result of the increased exposure of blood pheasant to UV radiation. There is no doubt that the blood pheasant can be selected as an example for studying the process of adaptation to high-altitude environments.

ACK N OWLED G M ENTS
We are very grateful to Dr. Xingcheng He and Dr. Yongjie Wu for their valuable advice.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.