MeSHup
Source: arXiv | Updated: 2022-04-29 | Indexed: 2024-06-21
Download link:
https://github.com/xdwang0726/MeSHup
Resource description:
MeSHup is a large-scale biomedical document indexing dataset created jointly by the University of Western Ontario and the University of Toronto. It contains 1,342,667 full-text English articles together with their associated MeSH labels and metadata. The articles are collected from the MEDLINE database, and the dataset supports research on biomedical text classification and information retrieval by providing rich full-text information. It was built by downloading articles from PubMed Central Open Access and the MEDLINE/PubMed annual baseline and matching them against each other, a step intended to ensure data quality and diversity. The dataset is used in research on automated biomedical literature indexing, which aims to reduce the time and cost of manual indexing.
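The matching step mentioned in the description can be pictured as a join on the PubMed identifier (PMID). The sketch below is only an illustration under stated assumptions: the record layout (`pmid`, `text`, `mesh_terms`) and the function name `match_by_pmid` are hypothetical and are not taken from the MeSHup repository, whose actual pipeline and file format may differ.

```python
def match_by_pmid(fulltext_records, medline_records):
    """Join full-text articles to MEDLINE citations that share a PMID.

    Both arguments are lists of dicts in a hypothetical layout chosen
    for this example, not the dataset's real schema.
    """
    # Index MEDLINE citations by PMID for constant-time lookup.
    medline_by_pmid = {rec["pmid"]: rec for rec in medline_records}

    matched = []
    for article in fulltext_records:
        citation = medline_by_pmid.get(article.get("pmid"))
        # Keep only articles whose citation carries at least one MeSH label.
        if citation and citation.get("mesh_terms"):
            matched.append({
                "pmid": article["pmid"],
                "text": article["text"],
                "mesh_terms": citation["mesh_terms"],
            })
    return matched


# Toy records standing in for PMC full text and MEDLINE citations.
fulltext = [
    {"pmid": "111", "text": "Full text of article A."},
    {"pmid": "222", "text": "Full text of article B."},
]
medline = [
    {"pmid": "111", "mesh_terms": ["Humans", "Neoplasms"]},
    {"pmid": "333", "mesh_terms": ["Mice"]},
]
matched = match_by_pmid(fulltext, medline)
```

In this toy input only the article with PMID 111 appears in both sources, so it is the only record kept. Articles without a matching citation, or without MeSH labels, are dropped.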
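Provided by:
Department of Computer Science, University of Western Ontario; Department of Computer Science, University of Toronto; Vector Institute; Unity Health Toronto
Created: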
2022-04-29



