Application of Genomic and Molecular Methods to Fundamental Questions in Canine and Feline Reproductive Health

Authors

VN Meyers-Wallen

Author's address (for correspondence): VN Meyers-Wallen, Baker Institute for Animal Health, College of Veterinary Medicine, Cornell University, Ithaca, NY, USA. E-mail: vnm1@cornell.edu

Contents

Molecular tools are becoming increasingly available to investigate the genetic basis of reproductive disorders in dogs and cats. These were first successful in identifying the molecular basis of diseases inherited as simple Mendelian traits, and these are now being applied to those that are inherited as complex traits. In order to promote similar studies of reproductive disorders, we need to understand how we can play a proactive role in accumulating sufficient case material. We also need to understand these mutation discovery tools and identify collaborators who have experience with their use. The candidate gene and genomic approaches to mutation discovery in dogs are presented, including new sequencing methods and those used to confirm that a mutation has a role in disease pathology. As the final goal is to use our study results to prevent inherited disorders, we need to consider how we can promote efficiency in obtaining DNA test results and providing genetic counselling.

Introduction

Molecular tools are becoming increasingly available to investigate the genetic basis of reproductive disorders in dogs and cats. Clinicians play a central role in such studies, but they also need to understand how molecular tools can best be applied to advance our clinical knowledge and prevent inherited disorders. The role of the clinician in establishing and applying stringent diagnostic criteria that unquestionably identify both affected and control animals should not be underestimated, as it is essential to the success of such studies. Access to case material can be facilitated by establishing DNA banks in association with veterinary teaching hospitals, where many animals are evaluated and stringent diagnostic methods are readily available. Collaborative efforts of molecular biologists and computational scientists are equally important to apply appropriate molecular tools and analyses to identify causative mutations, design practical DNA tests for diagnostic use and demonstrate the role of a mutation in the disease process.

Phenotyping and Preliminary Work

Some of the reproductive problems we encounter are clearly inherited, while others are likely caused by a combination of inherited and environmental factors. At present, we have adequate tools to identify mutations causing the former, but few tools to tackle the latter. To maximize the use of our resources, we can start by selecting for study those disorders that are clearly inherited. Pedigree information should be collected on each case and control, including the related affected animals. The mode of inheritance of the disorder should be known or established early in the study, as it is important in calculating the minimum number of animals needed. It is also useful to have some knowledge of the disease pathophysiology.

Second, careful choice of cases and controls is as important to the study as any of the steps in molecular biology discussed below. For several reasons, clinicians are often obliged to begin treatment based upon a presumptive diagnosis. For genetic studies, however, definitive diagnosis is essential for each animal selected as a case or a control, with no phenotypic overlap between the two. For example, for disorders having phenotypes that range from mild to severe, the most severely affected animals are often chosen as cases. In addition, if the disorder has an adult onset, the controls should be much older than the mean age of onset for that disease. For genome-wide association studies (GWAS, below), controls should be obtained from the same family or breed. From the beginning, it is prudent to collate the diagnostic information on each potential case and control and to collect and appropriately store samples from which DNA can be extracted. These include whole blood in EDTA or buccal swabs. One can also search canine DNA banks for suitable cases and controls to obtain sufficient numbers for study.
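The selection rules above can be expressed as a simple filter over a case register. The following is a minimal sketch only: the record fields, the example animals and, in particular, the `age_factor` threshold used to interpret "much older than the mean age of onset" are illustrative assumptions, not criteria taken from any published study.

```python
from dataclasses import dataclass


@dataclass
class Animal:
    animal_id: str
    breed: str
    age_years: float
    definitive_diagnosis: bool  # True only if confirmed by stringent diagnostic tests
    affected: bool


def select_cases(animals, breed):
    """Affected animals of one breed with a definitive (not presumptive) diagnosis."""
    return [a for a in animals
            if a.breed == breed and a.definitive_diagnosis and a.affected]


def select_controls(animals, breed, mean_onset_years, age_factor=2.0):
    """Unaffected, definitively screened animals of the same breed that are
    well past the mean age of onset. age_factor is an arbitrary assumption."""
    min_age = age_factor * mean_onset_years
    return [a for a in animals
            if a.breed == breed and a.definitive_diagnosis
            and not a.affected and a.age_years >= min_age]


# Hypothetical register entries, for illustration only.
register = [
    Animal("D1", "breed_x", 4.0, True, True),
    Animal("D2", "breed_x", 9.0, True, False),
    Animal("D3", "breed_x", 3.0, True, False),   # too young to serve as a control
    Animal("D4", "breed_x", 10.0, False, False),  # presumptive diagnosis only
    Animal("D5", "breed_y", 11.0, True, False),   # different breed
]

cases = select_cases(register, "breed_x")
controls = select_controls(register, "breed_x", mean_onset_years=4.0)
print([a.animal_id for a in cases], [a.animal_id for a in controls])
```

Animals that fail any criterion are simply left out of both groups, which mirrors the requirement that there be no phenotypic overlap between cases and controls.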

Identification of the Genetic Basis for Inherited Disorders

Candidate gene approach

For some traits or disorders, there is sufficient evidence to hypothesize that a particular protein is abnormal or missing, indicating that the gene encoding that protein may contain the causative mutation. Such a candidate gene approach has been used to identify several mutations for disorders inherited as simple Mendelian traits (autosomal dominant, autosomal recessive, X-linked). In most cases, considerable knowledge was available before the study that implicated deficiency of a specific protein, and thus a gene candidate. For example, this approach was taken for persistent Mullerian duct syndrome (PMDS) in the miniature schnauzer, which is inherited as an autosomal recessive trait but can only be expressed in XY males (Meyers-Wallen 2012). Early canine studies indicated that Mullerian inhibiting substance (MIS) was present and active in affected foetuses, indicating a likely abnormality in the MIS receptor. In addition, mutations in human Mullerian inhibiting substance receptor II (MISRII) had been identified in human patients with PMDS. Therefore, exon scanning of canine MISRII was used to identify the causative mutation (Wu et al. 2009). Specifically, the complete sequence of the canine MISRII gene was obtained from the canine genome sequence (http://genome.ucsc.edu/). Using software available from the internet, such as Primer3 (http://frodo.wi.mit.edu/), several PCR primer pairs were designed, so that each pair would amplify all or part of an MISRII exon. The primers were used in PCR with DNA templates from PMDS dogs, proven carriers and controls. Then, the PCR products were sequenced and compared. A single nucleotide change in exon 3, which created a stop codon, was identified in affected and carrier dogs. As a result, the MISRII mRNA produced in affected dogs would be translated into a very short protein (80 amino acids vs normal 602 amino acids) lacking most of the protein domains required for receptor function. That discovery led directly to development of a DNA test to identify carrier and affected dogs in this breed (Pujar and Meyers-Wallen 2009). However, the candidate gene approach is limited. Candidate genes or proteins have not been identified for many inherited disorders, and there is frequently insufficient knowledge of the pathophysiology to predict a candidate.
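The comparison step in exon scanning, in which a single nucleotide change is found to create a premature stop codon, can be illustrated with a short sketch. The sequences below are invented toy sequences, not canine MISRII, and the function names are hypothetical; the code only shows how one substitution can truncate the predicted protein under the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")  # X = unknown codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def compare_alleles(reference_cds, variant_cds):
    """Return 1-based substitution positions and both predicted proteins."""
    differences = [(pos + 1, ref, var)
                   for pos, (ref, var) in enumerate(zip(reference_cds, variant_cds))
                   if ref != var]
    return differences, translate(reference_cds), translate(variant_cds)


# Toy sequences (assumption): a C>T change converts CAG (Gln) to TAG (stop).
reference = "ATGCAGTGGCTGTAA"
variant = "ATGTAGTGGCTGTAA"
differences, normal_protein, mutant_protein = compare_alleles(reference, variant)
print(differences, len(normal_protein), len(mutant_protein))
```

In practice the sequenced PCR products would be aligned to the reference exon first; the sketch assumes equal-length, already aligned coding sequences.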

Genome-wide approaches

When the aetiology of an inherited disorder is unknown and there are no obvious candidate genes, genome-wide searches are indicated. These include genome-wide linkage disequilibrium analysis (linkage analysis) of pedigrees and the genome-wide association study (GWAS) of cases and controls. Both are used to narrow a search from the entire genome to a chromosomal region and eventually to a smaller DNA interval where the causative mutation should be located. In recent years, GWAS has begun to replace linkage analysis for studies in purebred dogs, for two reasons. First, in contrast to linkage analysis, GWAS does not require DNA samples from a pedigree segregating the disorder, although such samples can be used. Second, the structure of the genome in purebred dogs is particularly advantageous for GWAS, as individuals from each breed share large regions of homozygous sequence. The number of animals that must be genotyped for GWAS varies with the mode of inheritance. Early calculations suggested that as few as 20 cases and 20 controls might be needed to identify the causative mutation for simple recessive traits, whereas 100 cases and 100 controls might be needed for complex traits (Lindblad-Toh et al. 2005). Subsequent studies have confirmed that relatively few individuals are needed. For example, only 13 cases and 21 controls were needed to identify a chromosomal region associated with cutaneous lupus in the German shorthaired pointer, a simple autosomal recessive trait (Wang et al. 2011). As an example of a complexly inherited disorder, only 87 cases and 51 controls were used to identify chromosome regions associated with canine systemic lupus erythematosus disease complex in Nova Scotia duck tolling retrievers (Wilbe et al. 2010).

The initial goal of a GWAS is to identify a chromosomal region that is significantly associated with the affected phenotype. Typically, genomic DNA from case and control dogs is obtained from a single breed in which the disorder is prevalent (Wade et al. 2006). The DNA is hybridized to canine whole-genome single nucleotide polymorphism (SNP) arrays, which contain thousands of SNPs spaced evenly over the entire genome (Karlsson et al. 2007). The SNPs included are generally known to be polymorphic between breeds, but not all possible SNPs are included in these arrays. The array output is analysed to generate all genotypes for the cases and controls from the thousands of SNPs on the array. Statistical analysis, such as Fisher's exact test, is used to identify which SNP genotypes are highly associated with the affected phenotype. For a simple Mendelian trait, SNPs with significant probability of association should cluster on one chromosome. These data are often displayed as a Manhattan plot, where the strength of association for each SNP (commonly expressed as the negative logarithm of its P-value) is plotted on the Y axis, and its chromosome location is plotted on the X axis.
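As an illustration of the statistical step, the sketch below applies a two-sided Fisher's exact test to allele counts at each SNP and converts the P-value to the negative log10 score usually plotted on the Y axis of a Manhattan plot. The SNP names and allele counts are hypothetical (they merely use the sample size of 13 cases and 21 controls mentioned above), and dedicated GWAS software would normally be used rather than hand-written code.

```python
from math import comb, log10


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the probabilities of all tables with the same margins that are
    no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_value = sum(p for p in (table_probability(x)
                              for x in range(lowest, highest + 1))
                  if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


def association_scores(allele_counts):
    """Map each SNP to -log10(P) from its allele counts:
    (case allele 1, case allele 2, control allele 1, control allele 2)."""
    return {snp: -log10(fisher_exact_two_sided(*counts))
            for snp, counts in allele_counts.items()}


# Hypothetical allele counts for 13 cases (26 alleles) and 21 controls (42 alleles).
allele_counts = {
    "snp_linked": (26, 0, 12, 30),   # cases homozygous for allele 1
    "snp_null": (13, 13, 21, 21),    # same allele frequency in both groups
}
scores = association_scores(allele_counts)
for snp, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(snp, round(score, 2))
```

Because thousands of SNPs are tested, a genome-wide significance threshold that corrects for multiple testing would be applied to these scores; that step is omitted here.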

After a chromosomal region is identified, the next step is fine mapping of this region by additional genotyping, to identify the smallest possible region in which to search for the mutation. For fine mapping with SNPs, DNA is usually obtained from additional cases and controls of the initial breed used in the GWAS as well as related breeds that have the identical disorder (Miyadera et al. 2012). This strategy facilitates differentiation between genotypes that are actually associated with the disorder (disease-associated alleles) and those that are present in affected dogs because they are the same breed (breed-associated alleles). Conventional sequencing methods can be used if the region contains a candidate gene that should be screened first. However, if there are several gene candidates in the region or the region is still too large for conventional sequencing, then new methods are available that are less constrained by the size of the region. The general term for these methods is resequencing rather than fine mapping. For example, canine custom designed SNP arrays can be ordered that cover all known SNPs in the region to be fine mapped (Golden Gate Array, Illumina). Alternatively, canine custom designed genomic tiling arrays can be ordered that contain overlapping probes for all the genome sequence in the region (Sequence Capture Array, 385K; Roche Nimblegen, Madison, WI, USA). An advantage of the latter is that one can detect all SNP genotypes in the region, including previously unknown SNPs, and other nucleotide differences such as insertions and deletions. To fine map with a canine genomic tiling array, genomic DNA from cases and controls is fragmented and then hybridized to the arrays. The DNA that binds specifically to the probes is captured to produce a DNA library for each dog. The libraries are then sequenced by next generation sequencing (NGS) methods (Illumina/Solexa). The sequence outputs from the cases and controls are aligned and compared to the canine genome to identify sequence differences that are specific to the cases.
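The final comparison can be sketched as a filter over variant genotypes. The example below assumes a fully penetrant autosomal recessive disorder, so that a candidate variant is homozygous in every case and homozygous in no control (controls may be carriers). The variant coordinates, animal names and genotypes are hypothetical, and real pipelines would work from aligned reads and variant call files rather than hand-built dictionaries.

```python
def recessive_candidates(case_genotypes, control_genotypes):
    """Return variants homozygous in all cases and in no control.

    Each argument maps an animal ID to a dict of
    variant -> number of copies of the non-reference allele (0, 1 or 2).
    Variants missing from a dict are treated as 0 copies.
    """
    variants = set().union(*(g.keys() for g in case_genotypes.values()))
    candidates = []
    for variant in sorted(variants):
        all_cases_homozygous = all(
            g.get(variant, 0) == 2 for g in case_genotypes.values())
        no_control_homozygous = all(
            g.get(variant, 0) < 2 for g in control_genotypes.values())
        if all_cases_homozygous and no_control_homozygous:
            candidates.append(variant)
    return candidates


# Hypothetical variants: (chromosome, position, reference allele, alternate allele).
v1 = ("chr1", 1000, "C", "T")
v2 = ("chr1", 2500, "G", "A")
v3 = ("chr1", 4000, "A", "G")

case_genotypes = {
    "case1": {v1: 2, v2: 2, v3: 1},
    "case2": {v1: 2, v2: 2},
}
control_genotypes = {
    "control1": {v1: 1, v2: 2},  # carrier of v1; homozygous for v2
    "control2": {v3: 1},
}

print(recessive_candidates(case_genotypes, control_genotypes))
```

Here `v2` is excluded because an unaffected control is also homozygous for it, the pattern expected of a breed-associated rather than a disease-associated allele.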

While GWAS has been successfully used to identify mutations that cause disorders inherited as simple traits, it also has the potential to identify causative mutations for complexly inherited traits. At least one group is using this approach to investigate canine cryptorchidism. Another group has used association analysis to determine whether SNPs located in candidate genes are associated with canine cryptorchidism (Zhao et al. 2010). Potentially, these methods could be used to study the genetic basis of other reproductive disorders that are inherited as complex traits. Whole-genome sequencing offers the promise of a simpler approach. This method is being used now to sequence the entire genome of a patient for direct comparison with the human genome sequence. Currently, this is very expensive, but in future, it may become the method of choice (Cordero and Ashley 2012).

Functional Tests to Demonstrate the Effect of the Mutation

Evidence that the mutation has a role in the disease pathology is necessary to confirm that the mutation causes the disorder. This information is also needed to design effective therapies. The approach can vary with the disorder and the type of mutation, but could include measurement of the mRNA transcribed from the gene, the translated protein or the binding of a protein to a receptor. Another common approach is to demonstrate an adverse effect upon the amount, timing or quality of mRNA expression. Methods used to demonstrate such adverse effects commonly include quantitative reverse transcription polymerase chain reaction (qRTPCR), expression microarrays and direct sequencing of RNA (RNA-Seq).

Some advantages of qRTPCR are that it is relatively inexpensive compared to other methods, and real-time PCR equipment is usually available in an academic environment. If the investigator is evaluating expression of a candidate gene, the method can be accurate if proper controls are used (Bustin 2010; Taylor et al. 2010). Disadvantages of qRTPCR are that the number of candidate genes and controls that can be assayed per run is limited, and proper control genes are often unknown for the canine tissue of interest. Similarly, expression microarrays are designed by choosing probes for known genes. If the causative gene is unknown, probes for that gene are unlikely to be included in a commercial array. Also, the array may not contain probes for a particular candidate gene or the proper control genes.
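To make the role of the control (reference) gene concrete, the sketch below uses the widely used comparative Ct calculation (2 to the power of minus delta-delta-Ct). This particular calculation is an assumption introduced for illustration and is not described in the studies cited above; it also assumes that the amplification efficiency of both genes is close to 100%, and the Ct values are invented.

```python
def relative_expression(ct_target_case, ct_reference_case,
                        ct_target_control, ct_reference_control):
    """Fold change of a target gene in a case relative to a control sample,
    normalised to a reference gene (comparative Ct method).

    Assumes product doubles every cycle (100% efficiency) for both genes.
    """
    delta_ct_case = ct_target_case - ct_reference_case
    delta_ct_control = ct_target_control - ct_reference_control
    delta_delta_ct = delta_ct_case - delta_ct_control
    return 2 ** -delta_delta_ct


# Invented Ct values: the target gene crosses threshold two cycles later in
# the affected tissue, while the reference gene is unchanged.
fold_change = relative_expression(
    ct_target_case=26.0, ct_reference_case=18.0,
    ct_target_control=24.0, ct_reference_control=18.0,
)
print(fold_change)  # a value below 1 indicates reduced expression in the case
```

The result depends entirely on the reference gene being stably expressed in the tissue studied, which is why the lack of validated control genes for canine tissues is a practical limitation.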

Although presently expensive, RNA-Seq has several advantages. For this method, total RNA of high quality is extracted from the tissue of interest. An amplified cDNA library is then produced from the RNA and sequenced by NGS methods (Mortazavi et al. 2008). Then, the sequence output is aligned to the canine genome so that the transcripts align with the gene of origin. The final output from RNA-Seq is called a transcriptome, which contains a database of all RNA transcripts that were present in the sample. Therefore, the transcriptome contains the expression data on all genes expressed in the tissue of interest, whether known or unknown (Trapnell et al. 2010). Through computational analysis, the relative abundance of each transcript and differentially spliced transcripts can also be determined. This is useful for measuring candidate gene expression in comparison with many control genes and for comparing gene networks at specific time points or in different tissues. This method is also used to compare expression between neoplastic tissue and normal tissue from the same animal. We are using this method to identify differences in gene expression between canine embryonic ovaries and testes at the time of gonadal sex determination. The function of differentially expressed genes can be explored using tools like the Pathway Interaction Database (PID, http://pid.nci.nih.gov), which contains molecular signalling and regulatory pathways, and KEGG (Kyoto Encyclopaedia of Genes and Genomes), which includes a database of enzymatic pathways. Thus, it is possible to identify which genetic pathways in the target tissue may be altered as a result of the mutation.
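The idea of relative transcript abundance can be illustrated with a simple normalisation of read counts. The sketch below computes reads per kilobase of transcript per million mapped reads (RPKM) and a log2 ratio between two tissues. The gene names, lengths and counts are invented, the pseudocount is an arbitrary assumption, and this is not necessarily the calculation used in the work described above; established RNA-Seq packages also model biological variation, which this sketch does not.

```python
from math import log2


def rpkm(read_counts, transcript_lengths_bp, total_mapped_reads=None):
    """Reads per kilobase of transcript per million mapped reads.

    If total_mapped_reads is not given, the sum of the supplied counts is used.
    """
    if total_mapped_reads is None:
        total_mapped_reads = sum(read_counts.values())
    return {gene: count * 1e9 / (transcript_lengths_bp[gene] * total_mapped_reads)
            for gene, count in read_counts.items()}


def log2_ratio(abundance_a, abundance_b, pseudocount=1.0):
    """log2(a / b) per gene; the pseudocount avoids division by zero."""
    return {gene: log2((abundance_a[gene] + pseudocount)
                       / (abundance_b[gene] + pseudocount))
            for gene in abundance_a}


# Invented example: two genes measured in embryonic testis and ovary.
lengths = {"gene_a": 1000, "gene_b": 2000}
testis_counts = {"gene_a": 500, "gene_b": 200}
ovary_counts = {"gene_a": 50, "gene_b": 200}

testis = rpkm(testis_counts, lengths, total_mapped_reads=1_000_000)
ovary = rpkm(ovary_counts, lengths, total_mapped_reads=1_000_000)
ratios = log2_ratio(testis, ovary)
print({gene: round(value, 2) for gene, value in ratios.items()})
```

A positive ratio indicates higher abundance in the first tissue; genes with ratios near zero behave like the control genes against which a candidate gene is compared.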

Applications to Problems in Canine and Feline Reproduction

Although there are reproductive disorders that are inherited as simple traits, many are likely to be inherited as complex traits. For example, the tendency of a bitch to exhibit ovulatory defects, to have long or short interoestrous intervals, or to develop pyometra is likely to be influenced by several genes. It can be difficult and expensive to identify the genetic basis for such complex disorders, as it will likely require at least 100 cases and 100 controls to attempt a GWAS. To aid collection of cases and controls for such studies, the College of Veterinary Medicine at Cornell University has developed a canine DNA bank. Such banks contain DNA samples from cases phenotyped by stringent diagnostic tests and control animals that have been similarly screened. This approach benefits from the careful phenotyping and diagnostic resources that are available at referral centres associated with veterinary teaching hospitals. Written client consent is acquired. The medical history and findings are included in the bank database so that researchers can pick cases and controls based upon medical history as well as laboratory test results and definitive diagnosis designated by the attending clinician. Expansion to include samples from other domesticated and wild animals could be helpful to provide even rare samples to researchers.

Improvement of Animal Health

While clinicians and researchers can use the knowledge gained from functional studies to design better treatments for inherited disorders, the final goal is to prevent production of affected animals. Mutation detection and DNA testing can play a decisive role in achieving this goal. There remains a need for effective communication of DNA results to veterinarians and state-of-the-art genetic counselling to breeders and owners. Thus, there is a growing need for training in genetic counselling for veterinarians and perhaps development of genetic counselling as a specialty practice within the veterinary profession.

Conflicts of interest

None of the authors have any conflicts of interest to declare.

Author contributions

Dr Meyers-Wallen is the sole contributor of this article.
