A large body of literature suggests that preterm birth has a genetic basis. However, no organized analysis and stratification exists of the genetic variants that may be involved in the pathogenesis of preterm birth. We developed a novel bioinformatics approach to identify the nominal genetic variants associated with preterm birth. We used semantic data mining to extract all published articles related to preterm birth. Genes identified from public databases and archives of expression arrays were aggregated with genes curated from the literature. Pathway analysis was then used to impute additional genes from the pathways identified in the curations. The curated articles and collected genetic information are available in a web-based tool, the database for preterm birth (dbPTB), which forms a unique resource for investigators interested in preterm birth.