Semantic text mining in early drug discovery for type 2 diabetes

NIAID Data Ecosystem, indexed 2026-03-11

Download link:

https://zenodo.org/record/3603609


Description:

BACKGROUND: Surveying the scientific literature is an important part of early drug discovery, and with the ever-increasing number of biomedical publications it is imperative to focus on the most interesting articles. Here we present a project that highlights new understanding (e.g., recently discovered modes of action) and identifies potential novel drug targets, using a novel, data-driven text mining approach to score type 2 diabetes (T2D) relevance. We focused on monitoring trends and jumps in T2D relevance so that we are informed of important breakthroughs in a timely manner.

METHODS: We extracted over 7 million n-grams from PubMed and then clustered the roughly 240,000 n-grams linked to T2D into almost 50,000 T2D-relevant 'semantic concepts'. To score papers, these concepts were weighted according to how often they are co-mentioned with core T2D proteins. A protein's current T2D relevance was determined by combining the scores of the papers mentioning it in the preceding five years. The significance of a jump in a protein's rank was assessed by comparing it to previously observed jumps.

RESULTS: We show that T2D-relevant papers, including those that do not mention T2D explicitly, were assigned high scores because they mention semantic concepts often used in connection with T2D, as shown by the enrichment of well-known T2D proteins among the top-scoring proteins. Our 'high jumpers' identified important past developments in the understanding of how certain key proteins relate to T2D, indicating that our method will make us aware of future breakthroughs. In summary, this project made it easier to keep up with current T2D research by repeatedly supplying short lists of potential novel targets to our early drug discovery pipeline.
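The METHODS paragraph describes a four-step pipeline: weight concepts, score papers, aggregate paper scores into a protein relevance over a five-year window, and test rank jumps against past jumps. The Python sketch below shows one possible reading of those steps. It is not the authors' implementation, and several details are assumptions because the description does not specify them. A concept's weight is taken as the fraction of its papers that also mention a core T2D protein. A paper's score is the sum of its concept weights. A protein's relevance sums the scores of papers from the five years up to and including the target year. A jump is measured as log2 of the old rank divided by the new rank, and its significance is an empirical p-value against previously observed jumps. All names (`Paper`, `concept_weights`, `rank_jump`, etc.) and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record: publication year, semantic concepts and proteins mentioned.
    year: int
    concepts: frozenset[str]
    proteins: frozenset[str]


def concept_weights(papers, core_proteins):
    # Assumed weighting: share of a concept's papers that also mention a core T2D protein.
    mentions, co_mentions = defaultdict(int), defaultdict(int)
    for p in papers:
        has_core = bool(p.proteins & core_proteins)
        for c in p.concepts:
            mentions[c] += 1
            co_mentions[c] += has_core
    return {c: co_mentions[c] / mentions[c] for c in mentions}


def paper_score(paper, weights):
    # Assumed: a paper scores the sum of the weights of the concepts it mentions.
    return sum(weights.get(c, 0.0) for c in paper.concepts)


def protein_relevance(papers, weights, year, window=5):
    # Combine scores of papers from the `window` years ending at `year` (inclusive).
    relevance = defaultdict(float)
    for p in papers:
        if year - window < p.year <= year:
            score = paper_score(p, weights)
            for prot in p.proteins:
                relevance[prot] += score
    return dict(relevance)


def ranks(relevance):
    # Rank 1 = most relevant; ties broken alphabetically for determinism.
    ordered = sorted(relevance, key=lambda k: (-relevance[k], k))
    return {prot: i + 1 for i, prot in enumerate(ordered)}


def rank_jump(prev_ranks, cur_ranks, protein):
    # Assumed jump size: log2(old rank / new rank); positive means the protein moved up.
    return math.log2(prev_ranks[protein] / cur_ranks[protein])


def jump_p_value(jump, past_jumps):
    # Empirical p-value: how often a previously observed jump was at least this large.
    exceed = sum(1 for j in past_jumps if j >= jump)
    return (exceed + 1) / (len(past_jumps) + 1)


# Hypothetical toy corpus for illustration only.
TOY_PAPERS = [
    Paper(2015, frozenset({"insulin resistance", "beta cell"}), frozenset({"INS", "PPARG"})),
    Paper(2016, frozenset({"insulin resistance"}), frozenset({"GENEX"})),
    Paper(2016, frozenset({"cell cycle"}), frozenset({"TP53"})),
    Paper(2017, frozenset({"beta cell", "cell cycle"}), frozenset({"INS", "TP53"})),
]

if __name__ == "__main__":
    w = concept_weights(TOY_PAPERS, {"INS"})
    r2016 = ranks(protein_relevance(TOY_PAPERS, w, 2016))
    r2017 = ranks(protein_relevance(TOY_PAPERS, w, 2017))
    jump = rank_jump(r2016, r2017, "TP53")
    print(r2017, jump, jump_p_value(jump, [0.0, 0.2, 1.5, -0.3]))
```

In the toy data, TP53 moves from rank 4 in 2016 to rank 2 in 2017, a jump of 1.0 on the log2 scale. Measuring the jump as a rank ratio rather than a rank difference is a design choice in this sketch: it treats a move from rank 20 to 10 as more notable than a move from 1,000 to 990.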

Created: 2020-06-16
