SCENARIO I: GENE IS WELL CHARACTERIZED
You have performed a microarray screen comparing primary Mus musculus (mouse) cells cultured with a specific growth factor to untreated cells. The microarray shows that the gene Hspg2 is significantly up-regulated in treated cells. You know nothing about the function of this gene or how it might be regulated by the growth factor. How can bioinformatics help determine whether Hspg2 is a “gene of interest” worth further investigation (Fig. 1)?
A. What Is Known About Mouse Hspg2?
Note: Gene databases access a wealth of information. They generally contain data that fall into the following categories: data summary, genomic location, gene models, links to sequences, protein domains, gene ontology, homology, expression, interactions, alleles, phenotypes, reagents, external links, and references.
Results sampled from MGI (version 3.54): Names: Perlecan, heparan sulfate proteoglycan 2. Phenotypes: Null mutants die at embryonic day 10.5 with cardiac outflow defects and/or brain exencephaly at birth. They also exhibit skeletal dysplasia, including micromelia and craniofacial defects (Arikawa-Hirasawa et al.,1999; Costell et al.,1999; Rodgers et al.,2007). An exon 3 deletion mutant shows only a lens defect (Rossi et al.,2003). Expression: Hspg2 is expressed in nearly all tissues examined, including muscle, heart, and brain. Alleles: Five alleles including targeted knockouts. Reagents: cDNAs, primer pairs, antibodies.
B. What Is Known About Human HSPG2?
Results sampled from Entrez Gene: Function: It stabilizes other molecules and regulates cell adhesion and glomerular permeability (Chakravarti et al.,1995; Morita et al.,2005). Expression: localized to cellular basement membrane.
C. Is HSPG2 Implicated in Any Human Genetic Diseases?
Go to NCBI Entrez. Type in “HSPG2”, scroll to “OMIM” (Online Mendelian Inheritance in Man; McKusick,1998).
Note: OMIM disease sites contain data that fall into the following categories: alternative titles, clinical features, other features, inheritance, mapping, molecular genetics, history of discovery, references.
OMIM search results: 142461 heparan sulfate proteoglycan of basement membrane; HSPG2 (last edited 12/15/2006); 255800 Schwartz-Jampel Syndrome, Type 1; SJS1 (last edited 6/11/2007); 224410 Dyssegmental Dysplasia, Silverman-Handmaker Type; DDSH (last edited 2/21/2007).
Function: HSPG2 has growth-promoting and angiogenic properties. It can act as a co-receptor for FGF2 (Sharma et al.,1998). Consistent with this, yeast two-hybrid (Y2H) data analysis shows it binds FGF binding protein 1 (FGFBP1; Mongiat et al.,2001).
Disease information: Schwartz-Jampel Syndrome, Type 1 (SJS1) can be caused by several mutations in HSPG2. The best characterized are mutations that result in a truncated protein, and a small deletion that leads to reduced expression of the nearly full-length protein (Nicole et al.,2000; Arikawa-Hirasawa et al.,2002; Stum et al.,2006). Phenotypes point to defects in neuromuscular function and cartilage formation. The syndrome is rare, and autosomal recessive.
Dyssegmental Dysplasia, Silverman-Handmaker Type (DDSH) is caused by homozygous functional null mutations resulting in lethal, neonatal, short-limbed dwarfism (Arikawa-Hirasawa et al.,2001).
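The Entrez lookups in steps A through C can also be scripted. The sketch below is not part of the original protocol: it assumes NCBI's E-utilities `esearch` service as a programmatic stand-in for the Entrez web form, and the helper names (`esearch_url`, `ids_from_esearch_json`) and the sample reply are illustrative only. It builds the query URL and parses a JSON reply; actually fetching the URL is left to the reader.

```python
import json
from urllib.parse import urlencode

# Base address of the NCBI E-utilities esearch service (an assumption of this
# sketch; the protocol itself uses the Entrez web interface).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an esearch URL for one Entrez database, e.g. 'gene' or 'omim'."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH + "?" + urlencode(params)


def ids_from_esearch_json(text):
    """Return the list of record IDs contained in an esearch JSON reply."""
    return json.loads(text)["esearchresult"]["idlist"]


# Hand-written sample in the shape of an esearch reply; NOT a real response.
SAMPLE_REPLY = '{"esearchresult": {"count": "1", "idlist": ["142461"]}}'

if __name__ == "__main__":
    print(esearch_url("omim", "HSPG2"))
    print(ids_from_esearch_json(SAMPLE_REPLY))
```

Keeping URL construction separate from parsing means the same two helpers can be pointed at other Entrez databases by changing only the `db` argument.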
D. Are There Mouse Models for SJS1 and DDSH?
Go to MGI. Type in “SJS1” or “DDSH”, scroll to “Phenotype/Human disease”.
MGI search results: Targeted mutant strains with phenotypes similar to SJS1 or DDSH are listed.
E. What Can Other Animal Models Tell Us About Hspg2?
Note: Entrez HomoloGene (go to NCBI Entrez, scroll to “HomoloGene”) is another good place to start. A search for “Hspg2” gives a list of genes identified as putative homologs.
1. Caenorhabditis elegans (nematode worm).
Go to WormBase http://www.wormbase.org. Type in “Hspg2”, scroll to “any gene” (default).
Results sampled from WormBase (release WS177, 7/2007): Function: C. elegans unc-52 is a Hspg2 ortholog. unc-52 regulates muscle differentiation, and growth factor-like signaling pathways (Mackinnon et al.,2002; Merz et al.,2003). Mutant phenotypes: Paralyzed, locomotion abnormal, sterile, lethal. Expression: Expression begins in mid-embryogenesis. It is localized primarily in basement membrane of muscle cells (Mullen et al.,1999). In larvae and adults it is also expressed in M-lines, dense bodies, and muscle cell margins of body wall muscle. Reagents: Alleles, transgene strain, primers, microarray probes, SAGE tags, cDNAs, antibodies.
2. Drosophila melanogaster (fruit fly).
Note: A search on FlyBase for “Hspg2” yields “hsp67B”, a heat shock protein. Because it is of a different protein class (see step I–F-1), it is not an ortholog of HSPG2. A search for “perlecan” yields “trol”; the two bear the same protein domains.
Go to FlyBase http://flybase.bio.indiana.edu/. Type “trol”, scroll to “genes” (default).
Results sampled from FlyBase (version FB2006_01, released 12/8/2006): Function: Terribly reduced optic lobes (trol) is a structural molecule that regulates neuroblast division, cell–cell adhesion, cell-matrix adhesion, and signal transduction (Ebens et al.,1993; Voigt et al.,2002; Park et al.,2003). Phenotypes: Allele phenotypes include lethal and sterile. Other phenotypes are not listed. See bibliography for details. Expression: Embryonic midgut, proventriculus, hypo/epipharynx, ventral nerve cord, lateral cord, embryonic and larval circulatory, and muscle systems. Reagents: Alleles, genomic clones, RNAi probes, cDNAs.
3. Danio rerio (zebrafish)
Identify closest related zebrafish sequence to mouse Hspg2.
a. Go to Basic Local Alignment and Search Tool (BLAST; Altschul et al.,1990) http://www.ncbi.nlm.nih.gov/BLAST. Select “Danio rerio”. On the next page, input the mouse Hspg2 amino acid sequence in FASTA format. Select database of choice and the TBLASTN program. RefSeq databases are useful to search against because they contain NCBI-curated, nonredundant sequences.
Note: Because the zebrafish genome sequencing project is not yet complete, not all proteins are yet curated. A search against a translated nucleotide database includes predicted proteins based on expressed sequence tags (ESTs) and gene models.
TBLASTN results: Closest related sequence is predicted hypothetical protein LOC565429, Entrez nucleotide accession XR_030096.1, E-value 0. Unlike the next closest match, it aligns with most of the mouse Hspg2 sequence.
b. Perform “reverse” BLASTP alignment of hypothetical protein LOC565429 against mouse RefSeq protein database.
Note: Unlike for most genes, an mRNA translation is not available on the XR_030096.1 Entrez Gene or Nucleotide pages. Perform a six-frame translation of the XR_030096.1 mRNA sequence in Baylor College of Medicine (BCM) Search Launcher (last modified 6/4/2004) http://searchlauncher.bcm.tmc.edu/seq-util/Options/sixframe.html. Input the translated sequence that aligns with mouse Hspg2 in the previous step.
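For readers without access to the web tool, a six-frame translation can be sketched in a few lines. This is a minimal illustration, not the Search Launcher implementation: it assumes the standard genetic code, translates every frame end to end without trimming at stop codons (shown as `*`), and reports codons containing ambiguous bases as `X`. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def clean(seq):
    """Remove whitespace, upper-case, and treat RNA (U) as DNA (T)."""
    return "".join(seq.split()).upper().replace("U", "T")


def reverse_complement(seq):
    """Return the reverse complement of a cleaned DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Return {frame: protein} for frames +1, +2, +3, -1, -2, -3."""
    forward = clean(seq)
    reverse = reverse_complement(forward)
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(forward[offset:])
        frames[-(offset + 1)] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in sorted(six_frames("ATGGCCTAA").items()):
        print(frame, protein)
```

The frame whose translation aligns with mouse Hspg2 in the TBLASTN step is the one to carry forward into the reverse BLASTP search.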
Reverse BLASTP results: Hspg2 is the closest related mouse sequence to LOC565429.
c. For associated EST expression, go to NCBI Entrez. Type in “LOC565429,” scroll to “UniGene.”
Note: UniGene contains information (gene expression, genomic location, protein similarities, cDNA clone information) for transcript sequences from a common locus, including ESTs.
UniGene results: ESTs representing LOC565429 are expressed in heart and genitourinary tissues.
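Steps a and b above amount to a reciprocal best hit test: the zebrafish sequence is the top hit for mouse Hspg2, and mouse Hspg2 is the top hit for the zebrafish sequence. The sketch below shows that logic under stated assumptions: it expects BLAST+ tabular output (`-outfmt 6`, default 12 columns, bit score in the last column) rather than the web report used in the protocol, and every score and every identifier other than Hspg2 and LOC565429 is invented for illustration.

```python
def best_hits(tabular_text):
    """Map each query to its highest-bit-score subject in tabular BLAST output."""
    best = {}
    for line in tabular_text.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def is_reciprocal_best_hit(forward, reverse, query, candidate):
    """True if candidate is query's best hit and query is candidate's best hit."""
    return forward.get(query) == candidate and reverse.get(candidate) == query


def _row(*fields):
    return "\t".join(str(f) for f in fields)


# Made-up rows: qseqid sseqid pident length mismatch gapopen
#               qstart qend sstart send evalue bitscore
FORWARD = "\n".join([
    _row("Hspg2", "LOC565429", 60.0, 3000, 1200, 10, 400, 3400, 1, 9000, 0.0, 3500),
    _row("Hspg2", "hypothetical_other", 35.0, 500, 325, 5, 1700, 2200, 1, 1500, 1e-50, 200),
])
REVERSE = "\n".join([
    _row("LOC565429", "Hspg2", 60.0, 3000, 1200, 10, 1, 3000, 400, 3400, 0.0, 3500),
    _row("LOC565429", "hypothetical_mouse", 30.0, 400, 280, 4, 1, 400, 1, 400, 1e-30, 150),
])

if __name__ == "__main__":
    fwd, rev = best_hits(FORWARD), best_hits(REVERSE)
    print(is_reciprocal_best_hit(fwd, rev, "Hspg2", "LOC565429"))
```

A reciprocal best hit is supporting evidence for orthology rather than proof, which is why the protocol goes on to examine domain structure, family members, and synteny.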
F. Are the Hspg2-Related Sequences True Orthologs?
1. Do they have similar domain structure?
Perform motif search on Prosite (Gasteiger et al.,2003) http://expasy.org/prosite/. Input protein sequences in FASTA format.
Note: Human Protein Reference Database (HPRD) http://www.hprd.org/ is a useful resource for identifying protein domains and visualizing domain structure in human proteins.
Prosite results: All proteins have the following predicted domains, although how often each domain is represented varies by species: LDL-receptor class A, Ig-like, Laminin IV type A, Laminin-type EGF-like, Laminin G, EGF-like. Mouse and human proteins also contain a SEA domain. Drosophila Trol is the only putative ortholog that bears a C-5 specific cytosine DNA methylases active site.
Sequence of the zebrafish predicted protein, LOC565429, is truncated at its 5' end. Its alignment with mouse Hspg2 initiates after the 5' SEA and LDL-receptor class A domains. Consistent with this, LOC565429 is lacking these two domains. Whether the protein bears these domains awaits completion of the genome sequence.
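The domain comparison above can be expressed as simple set operations. The sketch below encodes only the presence or absence of each Prosite domain as reported in the text (copy numbers are ignored), treats the truncated zebrafish prediction as lacking the SEA and LDL-receptor class A domains, and uses hypothetical helper names.

```python
# Domains reported for all full-length proteins in the Prosite results.
CORE = {
    "LDL-receptor class A", "Ig-like", "Laminin IV type A",
    "Laminin-type EGF-like", "Laminin G", "EGF-like",
}
METHYLASE = "C-5 specific cytosine DNA methylases active site"

# Presence/absence only, transcribed from the text; counts are not modelled.
DOMAINS = {
    "human": CORE | {"SEA"},
    "mouse": CORE | {"SEA"},
    "C. elegans": set(CORE),
    "Drosophila": CORE | {METHYLASE},
    "zebrafish (partial LOC565429)": CORE - {"LDL-receptor class A"},
}


def shared_domains(domains):
    """Domains present in every species."""
    return set.intersection(*domains.values())


def species_specific(domains):
    """For each species, the domains found in no other species."""
    result = {}
    for species, own in domains.items():
        others = set().union(*(d for s, d in domains.items() if s != species))
        result[species] = own - others
    return result


if __name__ == "__main__":
    print(sorted(shared_domains(DOMAINS)))
    print(species_specific(DOMAINS)["Drosophila"])
```

Because the zebrafish sequence is incomplete, its missing domains narrow the shared set here; dropping that entry shows what the full-length proteins have in common.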
2. Do they have close family members that could also be orthologous?
Perform BLASTP alignment of Hspg2 ortholog against own species database.
BLASTP results: There are no close family members to human, C. elegans, and Drosophila Hspg2-like genes. In mouse, a BLASTP alignment of Hspg2 against mouse RefSeq protein database finds a sequence similar to Perlecan, predicted protein LOC100047061.
3. Is synteny conserved?
Go to the Entrez Gene pages for mouse Hspg2, human HSPG2, and zebrafish LOC565429. Examine synteny under “Genomic context”.
Synteny results:
Mouse: Ela3–1700013G24–Hspg2–Ldlrad2–Usp48
Mouse: LOC638198 (similar to Ela3B)–1700013G24–LOC100047061 (similar to Perlecan)–Ldlrad2–LOC674195 (similar to USP48)–Rap1gap
Human: RAP1GAP–USP48–LDLRAD2–HSPG2–ELA3B–ELA3A
Zebrafish: LOC100007646–LOC100007663 (BLASTP alignment shows most similar to GSTκ1)–LOC565429–LOC561081 (similar to Oikosin1 protein)–LOC559970
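A quick way to compare the gene orders listed above is to ask which flanking genes two loci share. The sketch below is a deliberately simple illustration with hypothetical function names: it matches neighbors by gene symbol, ignoring case and gene order, so it is insensitive to the inverted orientation of the mouse and human listings but cannot recognize homologs that carry different names (for example Ela3 versus ELA3B, or the "similar to" annotations).

```python
# Gene orders transcribed from the "Genomic context" results in the text.
MOUSE = ["Ela3", "1700013G24", "Hspg2", "Ldlrad2", "Usp48"]
HUMAN = ["RAP1GAP", "USP48", "LDLRAD2", "HSPG2", "ELA3B", "ELA3A"]
ZEBRAFISH = ["LOC100007646", "LOC100007663", "LOC565429",
             "LOC561081", "LOC559970"]


def flanking(block, focal):
    """Upper-cased symbols of every gene in the block except the focal gene."""
    if focal not in block:
        raise ValueError(f"{focal} not found in block")
    return {gene.upper() for gene in block if gene != focal}


def shared_flanking(block_a, focal_a, block_b, focal_b):
    """Flanking gene symbols common to both loci (case-insensitive)."""
    return flanking(block_a, focal_a) & flanking(block_b, focal_b)


if __name__ == "__main__":
    print(sorted(shared_flanking(MOUSE, "Hspg2", HUMAN, "HSPG2")))
    print(sorted(shared_flanking(MOUSE, "Hspg2", ZEBRAFISH, "LOC565429")))
```

Name matching is a coarse proxy; a fuller comparison would first establish homology between neighboring genes by sequence, as done for LOC100007663 in the results above.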
4. Compare information between Hspg2-related genes.
Function: Mouse, human, C. elegans, and Drosophila genes have conserved roles in cell adhesion, and signal transduction of growth factor signaling pathways.
Additional functions are reported in different model systems. Phenotypes in human SJS1 and DDSH patients, and disease mouse models, implicate roles for HSPG2 in neuromuscular function and cartilage formation. Drosophila trol plays a role in neuroblast division. C. elegans unc-52 likely regulates muscle function.
Properties unique to different organisms may be attributed to unique protein domains. For example, human and mouse have a SEA domain that is not conserved in the Drosophila and C. elegans genes, and Drosophila trol bears a C-5 specific cytosine DNA methylases active site. Alternatively, perhaps appropriate experiments have not yet been done to identify those specific properties in other model systems.
Expression: While mouse Hspg2 expression is nearly ubiquitous, expression of the Drosophila and C. elegans genes is predominantly in muscle. At the cellular level, the gene products localize to the basement membrane. ESTs corresponding to zebrafish predicted protein LOC565429 are expressed in heart and genitourinary tissues.
Protein domains: See I-F-1.
Synteny: Human HSPG2, mouse Hspg2, and mouse predicted protein LOC100047061 have conserved synteny. Therefore LOC100047061 must be Hspg2. Zebrafish predicted protein LOC565429 does not have conserved synteny. Because of the significant evolutionary distance, synteny between zebrafish and mouse/human orthologs is not necessarily conserved.
Human/mouse Hspg2, Drosophila trol, and C. elegans unc-52 are likely functional orthologs in muscle. Whether zebrafish LOC565429 is also a functional ortholog awaits cloning of the entire gene and further experiments. Reagents are readily available in mouse, Drosophila, and C. elegans for further study.
Bioinformatics searches outlined in section II can be performed to determine the evolutionary relationship between the genes.