Genome‐wide comparative identification and analysis of membrane‐FADS‐like superfamily genes in freshwater economic fishes

Membrane fatty acid desaturase (FADS)‐like superfamily proteins (FADSs) are essential for the synthesis of unsaturated fatty acids (UFAs). Recent studies on FADS in fishes have mostly focused on marine species, and a comprehensive analysis of the FADS superfamily, including the FADS, stearoyl‐CoA desaturase (SCD), and sphingolipid delta 4‐desaturase (DEGS) families, in freshwater economic fishes is urgently required. To this end, we conducted a thorough analysis of the number, gene/protein structure, chromosomal location, gene linkage map, phylogeny, and expression of the FADS superfamily. We identified 156 FADSs genes in the genomes of 27 representative species. Notably, FADS1 and SCD5 were lost in most freshwater fish and other teleosts. All FADSs proteins contain 4 transmembrane helices and 2–3 amphipathic α‐helices. FADSs in the same family are often linked on the same chromosome; moreover, FADS and SCD or DEGS are frequently co‐located on the same chromosome. In addition, FADS, SCD, and DEGS family proteins share similar evolutionary patterns. Interestingly, FADS6, as a member of the FADS family, exhibits a gene structure and chromosome location similar to those of SCD family members and may represent a transitional form between FADS and SCD. This study sheds light on the type, structure, and phylogenetic relationship of FADSs in freshwater fishes, offering a new perspective on the functional mechanism analysis of FADSs.

Regarding phylogenetic analysis, fads2 has been discussed in marine economic fishes, such as Labrus bergylta, Sarpa salpa, Pegusa lascaris, Atherina presbyter, and aquatic invertebrates [21,22]. However, existing phylogenetic studies of FADSs have mainly focused on only one FADS family. Moreover, in freshwater fish, there has been limited comprehensive comparative analysis of FADS, SCD, and DEGS, although researchers have been interested in cloning, tissue expression, and function of FADS genes. Therefore, in this study, the FADSs superfamily genes/proteins were analyzed exhaustively from gene types, gene structure, and protein structure to phylogeny. These results can provide a new perspective for exploring the functional mechanism of unknown FADSs members among various fishes.
The HGNC database was used to classify the FADSs gene group, and the online HGNC comparison of orthology predictions (HCOP) tool (https://www.genenames.org/tools/hcop/) was employed to identify the orthologues of FADSs in the genomes of humans, mice, rats, chickens, tropical clawed frogs, zebrafish, worms, and fruit flies. Additionally, the orthologues in 19 other organisms were obtained through Ensembl Compara and HomoloGene [23]. To ensure the completeness of the FADSs member set, NCBI BLASTP and TBLASTN searches with default parameters were conducted, together with an extensive literature survey for previously reported FADSs genes. The gene and protein sequences of FADSs in all representative species genomes were downloaded from the NCBI database.

Gene structure analysis, conserved protein motifs, and topology detection
Based on the exon-intron organization of FADSs genes in the Ensembl database (release 104), the gene structures, including prominent FADSs transcripts, were drawn manually using PowerPoint (PPT).

Chromosomal location and gene linkage analysis
For the syntenic analysis of FADSs genes, the locations of chromosomes or scaffolds were obtained from Ensembl genome databases (http://asia.ensembl.org/index.html). Subsequently, genetic linkage maps were constructed using MAPDRAW 2.1 [28].

Sequence alignment and phylogenetic analysis
DNA and amino acid sequences were aligned by ClustalW with default parameters and then manually inspected. A neighbor-joining (NJ) phylogenetic tree was built using the Poisson model based on the multiple alignments, with 1000 replicates for bootstrap analysis and other default parameters in MEGAX [29]. The reliability of the phylogenetic analysis was verified by constructing a maximum-likelihood (ML) tree based on the Jones-Taylor-Thornton (JTT) model using MEGAX with default parameters. EVOLVIEW v3 [30] was used to visualize and annotate the NJ tree.
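As a minimal illustration of the Poisson model named above: the observed proportion of differing amino acid sites, p, is corrected for multiple substitutions at the same site as d = -ln(1 - p). The sketch below computes both quantities for two toy aligned fragments. The sequences and function names are hypothetical, and the gap handling (pairwise deletion) is an assumption that need not match the MEGAX settings used in this study.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped
    (pairwise deletion, an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def poisson_distance(seq_a: str, seq_b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p)."""
    p = p_distance(seq_a, seq_b)
    if p >= 1.0:
        raise ValueError("distance is undefined when all sites differ")
    return -math.log(1.0 - p)


# Toy aligned fragments (hypothetical, not real FADSs sequences).
toy_a = "HPGGHLQHDH"
toy_b = "HPGGQLQHD-"
print(round(p_distance(toy_a, toy_b), 4))
print(round(poisson_distance(toy_a, toy_b), 4))
```

A matrix of such pairwise distances is the input that a neighbor-joining algorithm clusters into a tree; the bootstrap step repeats the calculation on resampled alignment columns.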

Expression pattern determination
The gene expression data of zebrafish, tilapia, and salmon were downloaded from the NCBI database. The data were computed from RNA-seq alignments compared with the most recent RefSeq gene models on the reference genome and then normalized by RPKM (reads per kilobase of transcript per million mapped reads). Heatmaps of gene expression were generated using TBTOOLS (v1.075) [31].
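The RPKM normalization mentioned above can be written out as a short calculation: the read count of a gene is scaled by transcript length in kilobases and by the number of mapped reads in millions. The following is a sketch only; the function name and the example numbers are hypothetical and are not taken from the NCBI data used here.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (transcript_length_bp * total_mapped_reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2000 bp transcript in a library
# of 10 million mapped reads.
example = rpkm(read_count=500, transcript_length_bp=2000, total_mapped_reads=10_000_000)
print(example)
```

Because the length and depth terms cancel library-specific scale, values computed this way can be compared across genes within a sample and placed in a heatmap.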

Genome-wide identification of the FADSs superfamily in selected organisms
Referring to the nomenclature and classification of the FADSs in the human genome in the HGNC, we divided the FADSs superfamily into 8 classes, from FADS1 to FADS8. The FADS family comprises FADS1, FADS2, FADS3 and FADS6; the SCD family includes SCD5 (FADS4) and SCDs (FADS5); and the DEGS family consists of DEGS1 (FADS7) and DEGS2 (FADS8). In total, 156 FADSs genes were identified in the 27 collected organisms (Table 1). The numbers and types of FADSs varied among species: FADS2 and FADS6 were the most conserved across the 27 organisms, FADS1 and SCD5 were lost in most teleosts, and DEGS1 and DEGS2 were found in almost all teleosts.
In Table S1, we list the gene ID, protein ID, splice variant number, nucleic acid/protein length, exon/intron numbers, genome sites, and gene annotations of the 156 FADSs. The length of the FADSs genes varied greatly, from 972 bp (degs1 in Atlantic cod) to 5556 bp (fads6 in zebrafish). The protein lengths, however, showed relatively little variation, ranging from 294 (SCD5 in chicken) to 508 (fads1 in chicken) amino acids. Generally, each family exhibits similar protein lengths, and the average protein lengths are 444, 342, and 323 aa for FADS, SCD, and DEGS, respectively.
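The classification described above can be captured as a small lookup table. The sketch below maps the eight HGNC-style classes to the three families as stated in the text; the dictionary layout and the helper name `family_of` are illustrative assumptions, and species-specific paralogue names (for example scdb or fadsd6) are deliberately not covered.

```python
# Family membership as stated in the text (SCDs is written here as "SCD").
FADS_SUPERFAMILY = {
    "FADS": ["FADS1", "FADS2", "FADS3", "FADS6"],
    "SCD": ["SCD5", "SCD"],
    "DEGS": ["DEGS1", "DEGS2"],
}

# Superfamily aliases given in brackets in the text.
ALIASES = {"FADS4": "SCD5", "FADS5": "SCD", "FADS7": "DEGS1", "FADS8": "DEGS2"}


def family_of(gene_name: str) -> str:
    """Return the family (FADS, SCD, or DEGS) for a superfamily class name."""
    name = gene_name.upper()
    name = ALIASES.get(name, name)  # resolve FADS4/5/7/8 to their usual names
    for family, members in FADS_SUPERFAMILY.items():
        if name in members:
            return family
    raise KeyError(f"{gene_name} is not one of the eight superfamily classes")


print(family_of("fads2"), family_of("FADS4"), family_of("FADS7"))
```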

Structural characterization of FADSs genes
To investigate the relationship between structure and function, the gene structures of FADSs were analyzed in this study. We downloaded all existing gene structure information of FADSs in the Ensembl database and then manually drew classical gene structures without alternative splicing (Fig. 1 and Table S1). Each family displays its own typical structure. Most FADS1, FADS2, and FADS3 have 12 exons and 11 introns, with exons 2 to 10 being highly conserved in length (111, 198, 102, 129, 61, 77, 98, 97, 80, and 126 bp, respectively). Unlike most FADS family genes, the structure of FADS6 is more similar to that of SCD family genes, which have 6 exons and 5 introns. In addition, some SCD5 genes show a structure composed of 5 exons and 4 introns, with exons 2-4 exhibiting high conservation (131, 206, and 233 bp, respectively). DEGS family genes have 3 exons and 2 introns, with the second exon (743 bp) being highly conserved. Furthermore, some atypical gene structures are present (Table S2). For instance, 13 exons and 12 introns are present in the FADS2 of zebrafish, Atlantic cod, channel catfish, L. crocea, and S. aurata, and in the fadsd6 and d6fadc of Atlantic salmon; 14 exons and 13 introns are found in the fadsd5 of Atlantic salmon and the fads2 of tropical clawed frog; 11 exons and 10 introns are found in the fads1 of shark and the fads2 of common carp; 8 exons and 7 introns are observed in the fads2 of Atlantic salmon, the fat-4 of C. elegans, and the scd of shark and carp; 5 exons and 4 introns are found in the degs1 of shark; and 2 exons and 1 intron are present in the degs2 of channel catfish.
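Every structure reported above follows the rule that a gene with n exons has n - 1 introns. A simple consistency check along these lines can catch transcription errors when such counts are tabulated. The sketch below uses the exon/intron pairs quoted in this section; the labels and the function name are illustrative assumptions.

```python
# (exons, introns) pairs quoted in the text; labels are illustrative.
structures = {
    "FADS1/2/3 typical": (12, 11),
    "FADS6 and SCD typical": (6, 5),
    "some SCD5": (5, 4),
    "DEGS typical": (3, 2),
    "fads2 zebrafish (atypical)": (13, 12),
    "fadsd5 Atlantic salmon (atypical)": (14, 13),
    "fads2 common carp (atypical)": (11, 10),
    "fads2 Atlantic salmon (atypical)": (8, 7),
    "degs2 channel catfish (atypical)": (2, 1),
}


def is_consistent(exons: int, introns: int) -> bool:
    """A spliced gene with n exons (n >= 1) has exactly n - 1 introns."""
    return exons >= 1 and introns == exons - 1


inconsistent = [
    label for label, (exons, introns) in structures.items()
    if not is_consistent(exons, introns)
]
print(inconsistent)
```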

The conserved His motif and topology structure of FADSs proteins
Proteins from different FADSs families shared limited similarity, with pairwise sequence identity ranging from 1.1% to 99% (Table S3). The crystal structures of many FADSs proteins have not yet been elucidated. To gain insight into this, we simulated the protein structures of fads2, scd, and degs1 in zebrafish, common carp, and salmon using SWISS-MODEL. The regions of fads2 that mapped to the template (PDB ID: 1lj0) comprise α-helix and β-sheet and span residues 15-98, 15-98, and 29-112 in zebrafish, common carp, and salmon, respectively (Fig. 2A-C). The regions that matched the template (PDB ID: 4ymk) were comparatively complete for the scd of zebrafish, common carp, and salmon (residues 16-326, 15-286, and 23-330, respectively), comprising the transmembrane helices (TM), a membrane-bound region composed of amphipathic α-helices (AH), and another domain consisting of α-helix and β-sheet (Fig. 2F-H). The regions of degs1 aligned to the template (PDB ID: 4zyo) in zebrafish, common carp, and salmon were residues 84-146, 85-145, and 85-146, respectively, and were mainly composed of α-helices (Fig. 2K-M). Thereafter, based on the structure of mouse SCD1 [32], we predicted the topological structure of fads2, scd, and degs1 in zebrafish using Phobius. The structures of fads2 and scd in zebrafish, which contain four helices, were found to be similar to that of SCD1 in mice (Fig. 2D,I). However, degs1 was predicted to have six helices with high probability (Fig. 2N). Careful tracking of the path through the membrane suggested that, with six transmembrane helices, the HX2HH motif would be placed on the opposite side of the membrane from another His motif, making it impossible to form the metal centre. Therefore, the two helices in the centre of degs1 are likely to be AHs, like AH1 in SCD1. Thus, we propose that degs1 contains 2 AH and 4 TM. We then manually drew the predicted topologies, with 4 TM and 2-3 AH, of these three proteins (Fig. 2E,J,O).
Table footnote: The upper or lower part of the gene nomenclature refers to the Ensembl database. N, could not be identified. The names in brackets are aliases.

Gene linkage map of FADSs superfamily
Homologous genes are usually linked on the same chromosome; thus, we constructed FADSs linkage maps (Fig. 3). In mammals, FADS1, FADS2, and FADS3 tend to be localized on the same chromosome and are arranged in sequential order on human chromosome 11, mouse chromosome 19, and rat chromosome 1. In rats and mice, scd is also linked with the FADS genes. Similarly, fads1 and fads2 are typically found close together on the same chromosome in chickens and Xenopus. The fat-3/fat-4/fat-6 and fat-5/fat-7 linkage groups are present in the nematode. In fish, fads6/scd and degs1/scdb are usually situated on the same chromosome. In Atlantic salmon, degs1, degs2, scdb, fads6, and scd5 are all located on Primary_assembly ssa01. Conversely, none of the genes in the three families are located on the same chromosome in common carp.
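The co-location pattern described above amounts to grouping genes by species and chromosome and keeping the groups with at least two members. The sketch below shows that grouping on the human and Atlantic salmon placements quoted in the text, plus one clearly hypothetical singleton to show what is filtered out. It only detects shared chromosomes; it does not compute map distances as MAPDRAW does, and the record layout and function name are assumptions.

```python
from collections import defaultdict

# (species, chromosome, gene) records. The human and salmon rows follow
# the text; the last row is a hypothetical placeholder.
records = [
    ("human", "11", "FADS1"),
    ("human", "11", "FADS2"),
    ("human", "11", "FADS3"),
    ("Atlantic salmon", "ssa01", "degs1"),
    ("Atlantic salmon", "ssa01", "degs2"),
    ("Atlantic salmon", "ssa01", "scdb"),
    ("Atlantic salmon", "ssa01", "fads6"),
    ("Atlantic salmon", "ssa01", "scd5"),
    ("hypothetical species", "chrA", "geneX"),
]


def linked_groups(rows):
    """Group genes by (species, chromosome); keep groups of two or more."""
    by_location = defaultdict(list)
    for species, chromosome, gene in rows:
        by_location[(species, chromosome)].append(gene)
    return {loc: genes for loc, genes in by_location.items() if len(genes) >= 2}


groups = linked_groups(records)
for (species, chromosome), genes in groups.items():
    print(species, chromosome, ", ".join(genes))
```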

Phylogenetic relationship of FADSs
Based on the above results, we found that different families shared limited protein sequence similarity. Thus, we constructed separate protein and gene neighbor-joining (NJ) trees for each of the three families using MEGAX to thoroughly elaborate the evolution of FADSs. The topology of the protein trees was consistent with that of the gene trees. The protein trees are analyzed here, while the gene trees are provided as supplementary figures. In the FADS family, FADS1, FADS2, and FADS3 merged into one clade, while FADS6 formed another clade. In the FADS1/FADS2/FADS3 clade, one fads was present in Branchiostoma belcheri, which is the orthologue of FADS1/FADS2/FADS3 in vertebrates; conversely, two duplicated fads (fads1/fads2) were present in the shark genome (Fig. 4 and Fig. S4). In the SCD family, the SCDs of fruit flies, nematodes, amphioxus, Ciona, mammals, Chondrichthyes, Aves, and fish fell on different branches; Ciona had a single SCD, which was then duplicated and radiated in vertebrate genomes; however, SCD5 was lost in most teleost fish. Moreover, we found that SCD5 was present in salmon and rainbow trout and clustered with human and cartilaginous fish SCD5 (Fig. 5 and Fig. S5). Similarly, DEGS family proteins underwent an evolutionary process similar to that of the SCD and FADS families (Fig. 6 and Fig. S6).
Then, to illustrate the relationships within the FADSs superfamily, we merged all proteins into a single dendrogram (Fig. 7 and Fig. S7). In addition to the above-investigated sequences, we also selected two Δ12-related sequences, as LC-PUFA de novo synthesis requires the presence of Δ12 fatty acid desaturase enzymes. We found that FADS, DEGS, and SCD family proteins were divided into three distinct branches in the phylogenetic tree (Fig. 7). The topologies of the ML trees of FADSs superfamily proteins (Fig. S8) and genes (Fig. S9) were consistent with those of the NJ trees.

Gene expression of the FADSs superfamily
Based on data mining from gene expression databases and the literature, gene expression analysis of FADSs was performed in zebrafish, Atlantic salmon and tilapia. The results showed that the expression of degs1 was highest in the heads of adult female zebrafish (Fig. 8A); fads2 was highly expressed in the brain and pyloric caeca of Atlantic salmon (Fig. 8B), while the expression of fads2 was the highest in the liver of tilapia (Fig. 8C).

FADS1 and SCD5 are lost in the majority of fish species
The types, numbers and catalytic activities of FADSs are closely related to the biosynthetic capacity of LC-PUFAs [21]. In this study, we identified the following types of FADSs in 27 representative organisms: FADS (FADS1, FADS2, FADS3, FADS6), SCD (SCDs, SCD5), and DEGS (DEGS1, DEGS2). FADS1, FADS2, and FADS3 are found in mammals; FADS1 and FADS2 encode proteins with Δ5 and Δ6 desaturase activities, respectively, while the FADS3 expression product does not exhibit catalytic activity [33]. By contrast, FADS1 is lost and FADS3 is absent in teleosts, and only FADS2 is conserved in fish genomes. These results are consistent with previous studies [34,35]. In teleosts, FADS2 shows multifunctional enzyme activities, including Δ4, Δ5, Δ6, and Δ8 desaturase activities [16,36]. This phenomenon is consistent with the view that gene loss is a major driving force of functional innovation [37].
A previous study found that due to teleost-specific genome duplication (3R), the teleost SCD1 gene repertoire expanded to two copies, but SCD5 did not [3]. However, in this study, we found that SCD5 was present in the Atlantic salmon and rainbow trout genomes. Additionally, in freshwater fish, SCD was present in the carp and grass carp genomes, yet absent in the channel catfish genome. This may be due to the improvement of genome sequencing in freshwater fishes. However, the function of the desaturase gene in freshwater fish remains to be studied. In addition, it should be noted that the presence of the above FADSs genes is mainly based on existing research results and fish genome information. The number, type, and function of these genes in specific species must be further studied.

Exon gain, loss and functional divergence of FADSs genes
To characterize the structural information of FADSs, we examined 121 gene structures and found that, compared with mammals, most FADSs in fish have atypical gene structures involving exon gain and loss. For example, in contrast to the typical FADS family gene structure (12 exons and 11 introns), the FADS2s of catfish, carp, and large yellow croaker have 13 exons and 12 introns. These structural differences may induce functional divergence of FADS genes in fish [38,39]. Furthermore, the 12th exon of fads2 is lost in carp, but the gene still exhibits the ability to synthesize PUFAs [40], suggesting that the 12th exon may not encode the functional site through which desaturases interact with fatty acid substrates. In addition, because high-quality genomic information for carp is not yet complete, further study is needed to verify the functional mechanism of fads2.
Interestingly, as a member of the FADS family, FADS6 has a structure similar to that of SCD and SCD5. We found that the number of exons of fads6 has increased in common carp and swamp eel. In Trachinotus ovatus, Fads6 has been shown to possess Δ4 desaturation activity [41]. However, little information on FADS6 has been reported, and its exact functionality remains to be determined.

The synteny of FADSs genes
Assessing the linkage between family members may provide clues to understanding their evolutionary history [42]. Homologous genes are usually linked on the same chromosome. In this study, we observed that genes from different families were also frequently present on the same chromosome. For instance, the fads6/scd linkage was found in zebrafish, tilapia, channel catfish, cod, fugu, and goldfish, and degs1/scdb are located on the same chromosome in the zebrafish, rainbow trout, and snapper genomes. The synteny of FADSs genes may offer a novel insight into the evolution of these three families.

Three families exhibit similar evolutionary patterns
To understand the evolutionary relationships of the FADSs superfamily, we constructed phylogenetic trees for the individual gene families, as well as for the entire superfamily. We found that the three families share a similar evolutionary pattern.
In the FADS1/FADS2/FADS3 clade, only one fads existed in Branchiostoma belcheri, while two fads (fads1/fads2) were present in the shark genome. This suggests that the duplication and separation of FADS1 and FADS2 occurred after the divergence of invertebrate chordates and before that of cartilaginous fishes, with FADS1 subsequently being lost in teleost fish. Similarly, for SCD family members, the single SCD in Ciona was duplicated and radiated in vertebrate genomes; however, SCD5 is lost in most teleost fish. Furthermore, DEGS family proteins underwent an evolutionary process similar to that of the SCD and FADS families, with duplication events predating the Urochordata, followed by random losses as the organisms evolved. Thus, these results show that the duplication and separation events of the three families (FADS/SCD/DEGS) may have begun in cartilaginous fish.
Interestingly, although the gene structure and protein conserved His motif of FADS6 are similar to those of SCD, it is attached to FADS1/2/3 in the phylogenetic tree but is distant from SCD. Moreover, it has been demonstrated that in humans, the function of FADS6 is comparable to that of SCD [43]. Thus, we speculate that FADS6 may be a transitional form of FADS and SCD, thereby indicating the close kinship of these three families.
Taking into account the similar gene structures, conserved histidine motifs, gene linkage phenomena, and similar evolutionary events, we speculate that all FADSs superfamily genes evolved from the same ancestral gene through gene duplication and gene loss; furthermore, the superfamily subsequently diverged into three families through functional diversification after the invertebrate chordates. Therefore, convergent evolution must be considered when viewing the whole superfamily of membrane-bound desaturases and hydroxylases [4]. However, the Δ4 desaturation activities of the FADS and DEGS families are distinct and may be the result of an insertion/deletion event early in the evolution of this superfamily. The fatty acid Δ4-desaturase and the sphingolipid Δ4-desaturases most likely evolved their Δ4-regioselectivities independently [4]. Furthermore, freshwater and marine fish are likely to cluster together in the evolutionary tree, implying that the family's kinship is not determined by its habitat.
Figure caption (fragment): … Table S4. To make the phylogenetic tree more legible, we have merged the branches of each family that were clustered together. FADS1/2/3/6 are grouped together as FADS (light cyan); SCD/ACOD/scdb/SCD5 are grouped into the SCD branch (lilac); DEGS1/DEGS2 are clustered into the DEGS branch (pale green). Each branch is highlighted with a unique color. The numbers on the branches represent bootstrap values.

Specific expression and the possible function of the FADSs gene family in the primary LC-PUFA metabolism pathway
To reveal the expression patterns of the FADSs gene superfamily, we investigated gene expression databases and the literature. We found that these genes play different roles in various tissues and sites. For instance, sphingolipids are abundant in the tissues of the central nervous system [44], which is consistent with degs1 exhibiting its highest expression in the head of adult female zebrafish. Because fish genomes contain a limited number of FADSs, individual members may possess multiple functions; for example, the tissue expression of fads2 differed between Atlantic salmon and tilapia, possibly reflecting its versatility. However, given the limited data on FADSs gene expression in available databases, further research is needed to determine the specific functions of FADSs superfamily genes in fish. Functionally, all homologous FADSs perform specific functions at different locations in the PUFA synthesis pathway. According to the functional annotation from PubChem and the literature, we mapped FADSs proteins to the main metabolic pathways in fish (Fig. 9). FADS2 (Δ5, Δ6, Δ8) primarily catalyzes the desaturation of linoleic acid (LA, C18:2n-6) and α-linolenic acid (ALA, C18:3n-3) to form ARA (C20:4n-6), EPA (C20:5n-3), and other LC-PUFAs. DEGS1 (Δ4) and DEGS2 (Δ4) promote the synthesis of sphingolipids, mainly catalyzing the formation of sphingosine ceramide from dihydroceramide. SCD5 (Δ9) and SCD (Δ9) are responsible for the desaturation of stearoyl-ACP (C18:0) (ACP, acyl carrier protein) to form oleoyl-ACP (C18:1n-9).
Fig. 8. The gene expression of FADSs in zebrafish, salmon, and tilapia. The expression of degs1 was highest in the head of adult female zebrafish (A). fads2 was highly expressed in the brain and pyloric caeca of Atlantic salmon (B), but the expression of fads2 in the liver was the highest in tilapia (C). The raw data on the gene expression of three species are provided in Table S5.
Fish have a limited ability to endogenously synthesize PUFAs due to the lack of Δ12 and Δ15 desaturases [9]. Instead, they can convert stearic acid to oleic acid by SCD (Δ9). The biosynthesis of LC-PUFA from LA and ALA usually begins with desaturation by FADS2 (Δ6), followed by elongation and subsequent desaturation by FADS1 (Δ5) to produce ARA and EPA, respectively [45]. The other pathway, known as the 'Δ8 pathway', involves the elongation of LA and ALA by elongase, followed by Δ8 desaturation and FADS1 (Δ5) desaturation to generate ARA and EPA, respectively [46]. EPA produced by either pathway can then be used for DHA synthesis in vertebrates [47]. In general, the most recognized biosynthesis pathway of DHA in vertebrates is the 'Sprecher pathway', in which EPA is converted to C24:5n-3 by two successive elongation steps, followed by desaturation by FADS2 (Δ6) to C24:6n-3, which finally yields DHA through partial β-oxidation. In addition, there is a more direct route from docosapentaenoic acid (DPA, C22:5n-3) to DHA through Δ4 desaturation, known as the 'Δ4 pathway' [47]. The other desaturase family considered here, DEGS, is also Δ4 active but acts on sphingolipids, distinct from the fatty acid Δ4 desaturase mentioned above [4].
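The carbon and double-bond bookkeeping of the pathways above can be checked with a small worked example. In the sketch below, each desaturation adds one double bond, each elongation adds two carbons, and partial β-oxidation removes two carbons; applying the steps in the order given in the text reproduces ARA (C20:4), EPA (C20:5), and DHA (C22:6). The class and function names are illustrative assumptions, and the model tracks neither the n-3/n-6 series nor the position of each double bond.

```python
from typing import NamedTuple


class FattyAcid(NamedTuple):
    carbons: int
    double_bonds: int

    def __str__(self) -> str:
        return f"C{self.carbons}:{self.double_bonds}"


def desaturate(fa: FattyAcid) -> FattyAcid:
    """A desaturase introduces one double bond."""
    return FattyAcid(fa.carbons, fa.double_bonds + 1)


def elongate(fa: FattyAcid) -> FattyAcid:
    """An elongase adds a two-carbon unit."""
    return FattyAcid(fa.carbons + 2, fa.double_bonds)


def beta_oxidize(fa: FattyAcid) -> FattyAcid:
    """One round of partial beta-oxidation removes two carbons."""
    return FattyAcid(fa.carbons - 2, fa.double_bonds)


STEPS = {"desaturate": desaturate, "elongate": elongate, "beta_oxidize": beta_oxidize}


def run(start: FattyAcid, steps: list) -> FattyAcid:
    """Apply the named steps in order and return the product."""
    product = start
    for step in steps:
        product = STEPS[step](product)
    return product


LA = FattyAcid(18, 2)   # linoleic acid, C18:2n-6
ALA = FattyAcid(18, 3)  # alpha-linolenic acid, C18:3n-3

# Delta-6 route: delta-6 desaturation, elongation, delta-5 desaturation.
ara = run(LA, ["desaturate", "elongate", "desaturate"])
epa_d6 = run(ALA, ["desaturate", "elongate", "desaturate"])
# Delta-8 route: elongation, delta-8 desaturation, delta-5 desaturation.
epa_d8 = run(ALA, ["elongate", "desaturate", "desaturate"])
# Sprecher pathway: two elongations, delta-6 desaturation, beta-oxidation.
dha_sprecher = run(epa_d6, ["elongate", "elongate", "desaturate", "beta_oxidize"])
# Delta-4 pathway: elongation to DPA, then delta-4 desaturation.
dha_d4 = run(epa_d6, ["elongate", "desaturate"])

print(ara, epa_d6, epa_d8, dha_sprecher, dha_d4)
```

Both EPA routes and both DHA routes arrive at the same chain length and degree of unsaturation, which is why the alternative pathways are interchangeable at the level of the final product.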

Conclusions
To illustrate the relationship between FADSs and the synthesis of UFAs in freshwater economic fish, this study conducted a comprehensive genome-wide comparative analysis of the FADSs gene superfamily, including the FADS, SCD, and DEGS families. A total of 156 FADSs genes were identified in 27 organisms. Unlike in mammals, a multifunctional FADS2 is conserved in most freshwater economic fish and other teleosts. Most genes of the FADS, SCD, and DEGS families share typical exon-intron structures of 12-11, 6-5, and 3-2, respectively. Moreover, they share a similar protein architecture, consisting of 4 TM and 2-3 AH. The genes/proteins from the same family are usually located on the same chromosome and cluster in a single branch on an NJ tree. Phylogenetic analysis indicates that these three families have undergone similar evolutionary processes.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article.
Fig. S1. FADS family protein multiple sequence alignment and conserved His motif. The three conserved motifs, (R/Q)HPGG, LQHDX2H, and HFQHH, are present in the primary structures of most FADS family proteins (FADS1/FADS2/FADS3) (A), with the exception of FADS6 (B). In order to clearly demonstrate the His motif, we only present the conserved His and the amino acids that are adjacent to them. The omitted amino acids are indicated by dotted lines. Fig. S2 and Fig. S3 are presented in a similar manner.
Fig. S3. The DEGS family protein multiple sequence alignment and conserved His motif.
Fig. S4. NJ tree of the FADS gene family.
Fig. S5. NJ tree of the SCD gene family.
Fig. S6. NJ tree of the DEGS gene family.
Fig. S7. NJ tree of the FADSs gene superfamily.
Fig. S8. ML tree of FADSs superfamily proteins.
Fig. S9. ML tree of FADSs superfamily genes.
Table S1. FADSs gene locations and alternative splicing. In this table, the gene ID, protein ID, splice variant number, RNA/protein length, exon/intron number, differing genome sites, and other gene annotations are listed.