• Text mining;
• Anti-obesity effect;
• Anti-diabetes effect;
• Compound dictionary;
• Natural language processing


In the pharmaceutical industry, efficiently mining pharmacological data from the rapidly growing scientific literature is crucial for many stages of the drug discovery process, such as target validation and tool compound selection. Drug developers need a quick and reliable way to collect literature assertions about the biological and pharmacological effects of selected compounds to support hypothesis generation and decision-making. INFUSIS, the text mining system presented here, extracts data on chemical compounds from PubMed abstracts. In addition to co-occurrence analysis, it makes extensive use of customized natural language processing. In a proof-of-concept study, INFUSIS was used to search abstracts for several obesity- and diabetes-related pharmacological effects of the compounds listed in a compound dictionary. The system extracts assertions about the pharmacological effects of each given compound and scores them by relevance. For each selected pharmacological effect, the highest-scoring assertions from 100 abstracts were evaluated manually (800 abstracts in total). The overall accuracy of the inferred assertions was over 90 percent.
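To make the general idea of dictionary-driven co-occurrence with relevance scoring concrete, the following is a minimal sketch. It is not the INFUSIS implementation, whose internals are not described here. It assumes sentence-level co-occurrence between a compound synonym and an effect term, and a simple score that counts effect-term hits and adds a bonus for hypothetical "assertion cue" verbs. The dictionaries `COMPOUNDS`, `EFFECT_TERMS` and `CUES`, the weight `0.5`, and the functions `split_sentences`, `mentions` and `extract` are all illustrative assumptions.

```python
import re
from dataclasses import dataclass


@dataclass
class Assertion:
    compound: str
    effect: str
    sentence: str
    score: float


# Hypothetical toy dictionaries; a real compound dictionary and effect
# vocabulary would be far larger and curated.
COMPOUNDS = {"metformin": ["metformin", "dimethylbiguanide"]}
EFFECT_TERMS = {"anti-diabetes": ["glucose", "insulin sensitivity", "hyperglycemia"]}
# Hypothetical cue verbs suggesting an asserted effect rather than a bare mention.
CUES = ["reduced", "decreased", "improved", "lowered", "inhibited"]


def split_sentences(text: str) -> list[str]:
    # Naive splitter on sentence-final punctuation; a real system would
    # use a proper NLP tokenizer.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence: str, terms: list[str]) -> int:
    # Count how many of the terms occur as whole words (case-insensitive).
    low = sentence.lower()
    return sum(1 for t in terms if re.search(r"\b" + re.escape(t) + r"\b", low))


def extract(abstract: str) -> list[Assertion]:
    results = []
    for sent in split_sentences(abstract):
        for compound, synonyms in COMPOUNDS.items():
            if not mentions(sent, synonyms):
                continue  # compound not mentioned in this sentence
            for effect, terms in EFFECT_TERMS.items():
                hits = mentions(sent, terms)
                if hits == 0:
                    continue  # no co-occurrence with this effect
                # Toy relevance score: effect-term hits plus a cue bonus.
                score = hits + 0.5 * mentions(sent, CUES)
                results.append(Assertion(compound, effect, sent, score))
    # Highest-scoring assertions first.
    return sorted(results, key=lambda a: a.score, reverse=True)


if __name__ == "__main__":
    text = ("Metformin lowered fasting glucose in obese mice. "
            "Metformin is widely prescribed. "
            "Insulin sensitivity improved after treatment.")
    for a in extract(text):
        print(a.score, a.compound, a.effect, "|", a.sentence)
```

In this toy example only the first sentence yields an assertion, because it is the only one where the compound and an effect term co-occur. Sentence-level co-occurrence is a common baseline. The abstract indicates that INFUSIS adds customized natural language processing on top of co-occurrence, which this sketch does not attempt to reproduce.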