NLM-Chem, a new resource for chemical entity recognition in PubMed full-text literature
Source: DataCite Commons. Updated: 2026-03-05. Indexed: 2025-05-10.
Download link:
https://datadryad.org/dataset/doi:10.5061/dryad.3tx95x6dz
Description:
Automatically identifying chemical and drug names in scientific
publications advances information access for this important class of
entities in a variety of biomedical disciplines by enabling improved
retrieval and linkage to related concepts. While current methods for
tagging chemical entities were developed for the article title and
abstract, their performance in the full article text is substantially
lower. However, the full text frequently contains more detailed chemical
information, such as the properties of chemical compounds, their
biological effects, and interactions with diseases, genes, and other
chemicals. We therefore present the NLM-Chem corpus, a
full-text resource to support the development and evaluation of automated
chemical entity taggers. The NLM-Chem corpus consists of 150 full-text
articles, doubly annotated by ten expert NLM indexers, with ~5000 unique
chemical name annotations, mapped to ~2000 MeSH identifiers. Using this
corpus, we built a substantially improved chemical entity tagger,
with automated annotations for all of PubMed and PMC freely accessible
through the PubTator web-based interface and API.
Provider:
Dryad
Created:
2021-03-22



