Two document-concept representations of the biomedical literature
Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5138385
Description:
These two datasets represent the biomedical literature (Medline abstracts and PubMedCentral articles) in the "document-concept matrix" format produced by TDC Tools. These datasets can be used in downstream IR applications such as Literature-Based Discovery.
Each of the two datasets corresponds to a specific data extraction method; see the resources listed below for details.
Paper: pending
Code: https://github.com/erwanm/tdc-tools
Documentation: https://erwanm.github.io/tdc-tools/
Important: the raw data from which these datasets are derived was downloaded from Medline, PubMedCentral and PubTatorCentral, provided courtesy of the U.S. National Library of Medicine (NLM). The data was extracted in January 2021 and does not reflect the most current or accurate data available from NLM. See the GitHub repository above to generate similar datasets from up-to-date data.
Created:
2021-07-27



