Keywords:

  • Bioinformatics;
  • genetics;
  • preterm birth

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Knowledge-based computational biology and bioinformatics approach
  5. Query development and data integration
  6. Insights from database for preterm birth
  7. Discussion
  8. Disclosures
  9. References

A vast body of literature has suggested genetic programming of preterm birth. However, there is a complete lack of an organized analysis and stratification of the genetic variants that may be involved in the pathogenesis of preterm birth. We developed a novel bioinformatics approach to identify the nominal genetic variants associated with preterm birth. We used semantic data mining to extract all published articles related to preterm birth. Genes identified from public databases and archives of expression arrays were aggregated with genes curated from the literature. Pathway analysis was used to impute genes from pathways identified in the curations. The curated articles and collected genetic information are available in a web-based tool, the database for preterm birth (dbPTB), which forms a unique resource for investigators interested in preterm birth.


Introduction

Preterm birth (PTB) is an important, poorly understood clinical problem. It imposes enormous clinical, economic and psychological burdens on society. While recent theories underscore the role of inflammation in preterm labor, simple explanations, single pathways and simple patterns of inheritance are inadequate to explain the pathogenesis of this enigmatic pregnancy complication. The pathogenesis of PTB could be better investigated if it were considered a complex, polygenic disorder that entails activation or suppression of a host of genes. We hypothesized that polymorphic changes in the genes that contribute to the risk of preterm birth could be identified using new bioinformatics approaches coupled with high-throughput technologies applied to appropriate cohorts of patients. This will lead to previously unrecognized insights into the relative contribution of the genetic and environmental factors that underlie preterm birth.

We developed an alternative approach to identify a more manageable set of candidate genes, which nonetheless incorporates some elements of genome-wide investigation. Our approach combined information from published literature with data from expression databases, linkage data and pathway analyses to identify biologically relevant genes for testing in an association study of genetic variants and preterm birth. These genes, their genomic location, the single nucleotide polymorphisms contained therein and any associated copy number variations are presented in a publicly available, searchable database, http://ptbdb.cs.brown.edu/dbPTBv1.php.

Knowledge-based computational biology and bioinformatics approach

We developed a web-based, semantic data mining and aggregation tool to ‘filter’ published literature for evidence of association of preterm birth with genes, genetic variants, single nucleotide polymorphisms (SNPs) or changes in gene expression. dbPTB used SciMiner™ to extract the gene and protein information from published articles specific to preterm birth.[1] More than 30,000 articles related to PTB potentially included relevant information on genes, SNPs or genetic variations. Using semantic language processing, we identified 980 articles with information about genes and genetic variants. We started with queries built from common, well-known keywords for PTB and genetics, for example, ‘preterm birth and genes’. After extracted articles were accepted, all the MeSH (Medical Subject Headings) terms associated with these papers were used to create new search queries.
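
The MeSH-driven query expansion described above can be sketched roughly as follows. This is only an illustration, not the SciMiner or dbPTB implementation: the function name `mesh_queries`, the record shape (a dict carrying a `mesh` list) and the choice to emit PubMed-style `[MeSH Terms]` query strings are all assumptions made for the example.

```python
from collections import Counter


def mesh_queries(accepted_articles, already_used=()):
    """Turn MeSH terms of accepted articles into new search queries.

    accepted_articles: iterable of dicts with a 'mesh' list of terms
                       (hypothetical record shape, not the dbPTB schema).
    already_used:      MeSH terms that earlier queries have covered.
    """
    # Count how many accepted articles carry each term (once per article).
    counts = Counter(
        term for article in accepted_articles for term in set(article["mesh"])
    )
    used = {term.lower() for term in already_used}
    # Most frequent terms first; ties broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        f'"{term}"[MeSH Terms]'
        for term, _ in ranked
        if term.lower() not in used
    ]
```

Ranking terms by how many accepted articles share them is one plausible way to prioritize new queries; the source states only that all MeSH terms of accepted papers were reused.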

Curation is the process by which the literature is searched by several junior and senior members of a biomedical research team. Our curation team consisted of researchers and medical students formally trained in the molecular and cell biology of preterm birth. Each article was carefully read with attention to study design, and relevant articles were deposited into the database with their unique PMID. We entered the genes, genetic variants, SNPs, rs numbers and annotations describing gene–gene interactions. We accepted the authors' criteria for statistical significance. All genes and genetic variants were entered into the database using their unique HUGO Gene Nomenclature Committee (HGNC) numbers for identification. SNPs were recorded with their appropriate rs number using HapMap Data Release 27.[2] Where specific haplotypes were shown to confer significant risk for preterm birth, all the individual SNPs within the haplotype were entered into the database. Inter-rater reliability was assessed, and kappa scores were measured after training.[3, 4] Articles that were accepted for PTB immediately became accessible to dbPTB queries along with all the relevant genetic data (Fig. 1).
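
As an illustration of the inter-rater reliability check, the sketch below computes Cohen's kappa for two curators' accept/reject decisions on the same articles. The source does not state which kappa variant was used or how many raters were compared, so the two-rater Cohen's kappa, the function name `cohen_kappa` and the labels are assumptions for the example.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's
    label frequencies.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For example, if two curators agree on three of four articles, with one curator accepting two and the other accepting one, the observed agreement is 0.75, the chance agreement is 0.5 and kappa is 0.5.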

Figure 1. Workflow for retrieval of articles, curation and extraction of genes from literature, microarray data and gene interpolation for pathway analysis.

Query development and data integration

High-dimensional databases of expression data, data from linkage analyses, databases of results from SNP arrays and data from proteomic platforms were searched for genes, genetic variants and proteins related to preterm birth or showing differential association with preterm birth. We also searched for articles that reported analyses of proteins in body fluids or compartments using contemporary proteomic techniques (for example, mass spectrometry). We also searched the repositories of the National Heart, Lung, and Blood Institute (NHLBI) and the National Human Genome Research Institute (NHGRI), the Human Gene Mutation Database and the Catalogue of Published Genome-Wide Association Studies hosted by the NHGRI.

For each deposited gene, we included SNP data and tag SNPs from 5 kb upstream to 5 kb downstream of the genomic sequence from HapMap (release number 27).[2] SNP information was taken from NCBI dbSNP Build 126. For each article, the abstract and related information such as the PMID, journal name, authors' names and title were also stored in dbPTB.
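
The flanking-window rule can be illustrated with a short sketch. It assumes SNPs are available as `(identifier, chromosome, position)` tuples and gene coordinates as 1-based inclusive start and end positions; the function name, the tuple layout and the sample identifiers used with it are hypothetical and do not reflect the dbPTB schema. Because the window extends 5 kb on both sides, gene strand does not affect the result.

```python
FLANK_BP = 5_000  # 5 kb on each side of the gene, as stated in the text


def snps_in_gene_window(snps, gene_chrom, gene_start, gene_end, flank=FLANK_BP):
    """Return identifiers of SNPs within the gene body plus flanking regions.

    snps: iterable of (snp_id, chromosome, position) tuples (assumed layout).
    gene_start, gene_end: 1-based inclusive genomic coordinates.
    """
    if gene_start > gene_end:
        raise ValueError("gene_start must not exceed gene_end")
    window_start = max(1, gene_start - flank)  # clamp at the chromosome start
    window_end = gene_end + flank
    return [
        snp_id
        for snp_id, chrom, pos in snps
        if chrom == gene_chrom and window_start <= pos <= window_end
    ]
```

With a gene spanning positions 10,000 to 25,000, the window covers 5,000 to 30,000 inclusive, so a SNP at 4,999 or 30,001 is excluded.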

We used Ingenuity Pathway Analysis (IPA, Ingenuity® Systems, www.ingenuity.com) to identify pathways and networks involving the genes we identified with significant evidence for their roles in preterm birth. We included the genes and genetic variants identified by curation and in public databases, largely transcriptome-wide array data sets[5, 6] and some proteomic analyses related to preterm birth.[7] The genes identified by Ingenuity Pathway Analysis were entered into the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.

Insights from database for preterm birth

We extracted 31,018 articles dealing with PTB from PubMed using SciMiner. The ‘filtered set’ included 980 articles with likely information on 1200 genes. We ‘accepted’ 142 articles described by a total of 960 unique MeSH terms. These articles provided associations of 186 genes with preterm birth that were accepted as statistically valid by the publishers and the curation team. We next imported 215 genes from both published and public databases containing array data and data from other proteomic analyses. Lastly, we identified and included an additional 216 genes based on the interpolation from pathway analysis. These genes were contained in 173 unique pathways. The workflow supporting retrieval of genes from the literature and public databases and gene interpolation from pathway analysis is shown in Fig. 1. These results are all retrievable from the publicly available database for preterm birth, http://ptbdb.cs.brown.edu/dbPTBv1.php. We have also included the 156,963 SNPs contained within the genomic and flanking regions of each gene in dbPTB. We physically mapped the genomic location for genes in dbPTB. The chromosomes and the number of genes mapped to each are shown in Fig. 2.

Figure 2. Number of genes among chromosomes identified from curated articles, databases and pathway analysis.

We identified a total of 25 networks. Several networks including ‘Inflammatory Response, Small Molecule Biochemistry, Cellular Development, Hematological System Development and Function, Cellular Function and Maintenance, Cardiovascular Disease, Connective Tissue Development and Function, Drug Metabolism, Genetic Disorder’ represented the largest portion of interaction domains among the major networks detected.

The database for preterm birth (dbPTB) allows investigators interested in preterm birth to pursue several query strategies to search related articles, genes, SNPs, chromosomes or keywords against the MeSH terms and abstracts of the curated articles. Search results include the authors, the title of the article, the name of the journal and a link to the original source. There are links to Online Mendelian Inheritance in Man (OMIM), the UCSC Genome Bioinformatics site and HGNC. Under the same search option, users are able to see all related SNP data for each gene.

Discussion

Recent studies have focused on genomic and proteomic approaches to diagnosing and determining the mechanism(s) of preterm labor. Polymorphic changes in the protein coding regions of specific genes and in regulatory and intronic sequences have been described. In most of the studies reported to date, candidate genes or proteins involved in inflammatory reactivity or uterine contractility have been investigated.[8-26] Summaries of these observations and candidate genes have been reported.[12] Most of the studies reported to date have involved modest-sized patient cohorts and polymorphisms from genes involved in infection/inflammation. The results suggest that alteration in the structure and/or expression of these proteins interacts with infection and/or other environmental influences and is associated with preterm birth. The results generally, however, do not provide insight into the causes of prematurity in the absence of inflammation. They also do not demonstrate whether the observed associations are reflective of genetic mechanism(s) and/or gene–environmental interactions.

The promises of the genomic era have been presented eloquently.[27-29] The genome-wide association study (GWAS) approach queries the genome in a hypothesis-free, unbiased manner, with the potential for identifying novel genetic variants. However, while there have been a number of important ‘hits’ (e.g., macular degeneration, obesity), there are many ‘misses’ and failures to replicate findings even from large-scale studies.[30-32] Moreover, the GWAS-based interrogation of large numbers of anonymous SNPs or CNVs severely limits power and makes it computationally difficult to examine combinatorial gene–gene interactions.[33-35]

We created a more manageable set of genes and genetic variants for which there is a priori evidence for involvement in preterm delivery. dbPTB was developed to create, aggregate and store this unique and specialized combination of information on preterm birth. We believe this smaller set of genes may make feasible important but otherwise computationally difficult approaches to examining gene–gene interactions in combinatorial or higher-order fashion. As the first basis for population of this database, we used published literature. One hundred and eighty-six genes were identified by literature-based curation, 215 genes came from publicly available databases and an additional 216 genes came from pathway-based interpolation. This total of 617 genes represents a parsimonious but robust set of genes for which there is good a priori biological evidence for involvement in preterm birth. These genes and genetic variants can now be used in case–control studies comparing genetic variants, SNPs or copy number variations for their relationship to PTB.
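
The three-source aggregation behind the 617-gene total can be sketched as below. The approach shown, crediting each gene to the first source that contributes it so that later sources add only new genes, is an assumption about how the counts were kept disjoint; the function names and the toy gene lists are illustrative and are not the actual contents of dbPTB.

```python
from collections import Counter

# Per-source gene counts as reported in the text.
REPORTED_COUNTS = {"literature": 186, "databases": 215, "pathway": 216}


def assign_provenance(sources):
    """Map each gene symbol to the first source that contributed it.

    sources: ordered list of (source_name, iterable_of_gene_symbols).
    Symbols are upper-cased so that 'tnf' and 'TNF' count as one gene.
    """
    provenance = {}
    for source_name, genes in sources:
        for gene in genes:
            provenance.setdefault(gene.upper(), source_name)
    return provenance


def count_by_source(provenance):
    """Number of unique genes credited to each source."""
    return dict(Counter(provenance.values()))


# The reported per-source counts sum to the stated total: 186 + 215 + 216 = 617.
TOTAL_GENES = sum(REPORTED_COUNTS.values())
```

Tracking provenance this way keeps the per-source counts additive, which is consistent with the text describing the database and pathway genes as additions to the curated set.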

Disclosures

None.

Acknowledgements

This work was supported by the National Foundation March of Dimes Prematurity Initiative # 21-FY08-563, and National Institutes of Health Grants NIH-5T35HL094308-02 and NIH-NCRR P20 RR018728.

References

  1. Hur J, Schuyler AD, States DJ, Feldman EL: SciMiner: web-based literature mining tool for target identification and functional enrichment analysis. Bioinformatics 2009; 25:838–840.
  2. International HapMap Consortium: A haplotype map of the human genome. Nature 2005; 437:1299–1320.
  3. Donner A, Klar N: The statistical analysis of kappa statistics in multiple samples. J Clin Epidemiol 1996; 49:1053–1058.
  4. Reed JF 3rd: Homogeneity of kappa statistics in multiple samples. Comput Methods Programs Biomed 2000; 63:43–46.
  5. Enquobahrie DA, Williams MA, Qiu C, Muhie SY, Slentz-Kesler K, Ge Z, Sorenson T: Early pregnancy peripheral blood gene expression and risk of preterm delivery: a nested case control study. BMC Pregnancy Childbirth 2009; 9:56.
  6. Weiner CP, Mason CW, Dong Y, Buhimschi IA, Swaan PW, Buhimschi CS: Human effector/initiator gene sets that regulate myometrial contractility during term and preterm labor. Am J Obstet Gynecol 2010; 202:474.e1–e20.
  7. Buhimschi CS, Dulay AT, Abdel-Razeq S, Zhao G, Lee S, Hodgson EJ, Bhandari V, Buhimschi IA: Fetal inflammatory response in women with proteomic biomarkers characteristic of intra-amniotic inflammation and preterm birth. BJOG 2009; 116:257–267.
  8. Adams KM, Eschenbach DA: The genetic contribution towards preterm delivery. Semin Fetal Neonatal Med 2004; 9:445–452.
  9. Crider KS, Whitehead N, Buus RM: Genetic variation associated with preterm birth: a HuGE review. Genet Med 2005; 7:593–604.
  10. Menon R, Fortunato SJ, Thorsen P, Williams S: Genetic associations in preterm birth: a primer of marker selection, study design, and data analysis. J Soc Gynecol Investig 2006; 13:531–541.
  11. Pennell CE, Jacobsson B, Williams SM, Buus RM, Muglia LJ, Dolan SM, Morken NH, Ozcelik H, Lye SJ, Relton C: Genetic epidemiologic studies of preterm birth: guidelines for research. Am J Obstet Gynecol 2007; 196:107–118.
  12. Plunkett J, Muglia LJ: Genetic contributions to preterm birth: implications from epidemiological and genetic association studies. Ann Med 2008; 40:167–195.
  13. Romero R, Espinoza J, Gotsch F, Kusanovic JP, Friel LA, Erez O, Mazaki-Tovi S, Than NG, Hassan S, Tromp G: The use of high-dimensional biology (genomics, transcriptomics, proteomics, and metabolomics) to understand the preterm parturition syndrome. BJOG 2006; 113(Suppl. 3):118–135.
  14. Weinberg CR, Shi M: The genetics of preterm birth: using what we know to design better association studies. Am J Epidemiol 2009; 170:1373–1381.
  15. Aidoo M, McElroy PD, Kolczak MS, Terlouw DJ, ter Kuile FO, Nahlen B, Lal AA, Udhayakumar V: Tumor necrosis factor-alpha promoter variant 2 (TNF2) is associated with pre-term delivery, infant mortality, and malaria morbidity in western Kenya: Asembo Bay Cohort Project IX. Genet Epidemiol 2001; 21:201–211.
  16. Fujimoto T, Parry S, Urbanek M, Sammel M, Macones G, Kuivaniemi H, Romero R, Strauss JF 3rd: A single nucleotide polymorphism in the matrix metalloproteinase-1 (MMP-1) promoter influences amnion cell MMP-1 expression and risk for preterm premature rupture of the fetal membranes. J Biol Chem 2002; 277:6296–6302.
  17. Genc MR, Gerber S, Nesin M, Witkin SS: Polymorphism in the interleukin-1 gene complex and spontaneous preterm delivery. Am J Obstet Gynecol 2002; 187:157–163.
  18. Kalish RB, Vardhana S, Gupta M, Perni SC, Witkin SS: Interleukin-4 and -10 gene polymorphisms and spontaneous preterm birth in multifetal gestations. Am J Obstet Gynecol 2004; 190:702–706.
  19. Landau R, Xie HG, Dishy V, Stein CM, Wood AJ, Emala CW, Smiley RM: beta2-Adrenergic receptor genotype and preterm delivery. Am J Obstet Gynecol 2002; 187:1294–1298.
  20. Lorenz E, Hallman M, Marttila R, Haataja R, Schwartz DA: Association between the Asp299Gly polymorphisms in the Toll-like receptor 4 and premature births in the Finnish population. Pediatr Res 2002; 52:373–376.
  21. Ozkur M, Dogulu F, Ozkur A, Gokmen B, Inaloz SS, Aynacioglu AS: Association of the Gln27Glu polymorphism of the beta-2-adrenergic receptor with preterm labor. Int J Gynaecol Obstet 2002; 77:209–215.
  22. Papazoglou D, Galazios G, Koukourakis MI, Kontomanolis EN, Maltezos E: Association of -634G/C and 936C/T polymorphisms of the vascular endothelial growth factor with spontaneous preterm delivery. Acta Obstet Gynecol Scand 2004; 83:461–465.
  23. Roberts AK, Monzon-Bordonaba F, Van Deerlin PG, Holder J, Macones GA, Morgan MA, Strauss JF 3rd, Parry S: Association of polymorphism within the promoter of the tumor necrosis factor alpha gene with increased risk of preterm premature rupture of the fetal membranes. Am J Obstet Gynecol 1999; 180:1297–1302.
  24. Simhan HN, Krohn MA, Roberts JM, Zeevi A, Caritis SN: Interleukin-6 promoter -174 polymorphism and spontaneous preterm birth. Am J Obstet Gynecol 2003; 189:915–918.
  25. Witkin SS, Vardhana S, Yih M, Doh K, Bongiovanni AM, Gerber S: Polymorphism in intron 2 of the fetal interleukin-1 receptor antagonist genotype influences midtrimester amniotic fluid concentrations of interleukin-1beta and interleukin-1 receptor antagonist and pregnancy outcome. Am J Obstet Gynecol 2003; 189:1413–1417.
  26. Gibson G: Hints of hidden heritability in GWAS. Nat Genet 2010; 42:558–560.
  27. Varmus H: Getting ready for gene-based medicine. N Engl J Med 2002; 347:1526–1527.
  28. Collins FS, Green ED, Guttmacher AE, Guyer MS: A vision for the future of genomics research. Nature 2003; 422:835–847.
  29. Feero WG, Guttmacher AE, Collins FS: Genomic medicine–an updated primer. N Engl J Med 2010; 362:2001–2011.
  30. Dewan A, Liu M, Hartman S, Zhang SS, Liu DT, Zhao C, Tam PO, Chan WM, Lam DS, Snyder M, Barnstable C, Pang CP, Hoh J: HTRA1 promoter polymorphism in wet age-related macular degeneration. Science 2006; 314:989–992.
  31. Mathew CG: New links to the pathogenesis of Crohn disease provided by genome-wide association scans. Nat Rev Genet 2008; 9:9–14.
  32. Glessner JT, Bradfield JP, Wang K, Takahashi N, Zhang H, Sleiman PM, Mentch FD, Kim CE, Hou C, Thomas KA, Garris ML, Deliard S, Frackelton EC, Otieno FG, Zhao J, Chiavacci RM, Li M, Buxbaum JD, Berkowitz RI, Hakonarson H, Grant SF: A genome-wide study reveals copy number variants exclusive to childhood obesity cases. Am J Hum Genet 2010; 87:661–666.
  33. Gui J, Andrew AS, Andrews P, Nelson HM, Kelsey KT, Karagas MR, Moore JH: A robust multifactor dimensionality reduction method for detecting gene–gene interactions with application to the genetic analysis of bladder cancer susceptibility. Ann Hum Genet 2011; 75:20–28.
  34. Cordell HJ: Detecting gene–gene interactions that underlie human diseases. Nat Rev Genet 2009; 10:392–404.
  35. Moore JH: Detecting, characterizing, and interpreting nonlinear gene–gene interactions using multifactor dimensionality reduction. Adv Genet 2010; 72:101–116.