owaiskha9654/PubMed_MultiLabel_Text_Classification_Dataset_MeSH
Hugging Face · Updated 2023-01-30 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/owaiskha9654/PubMed_MultiLabel_Text_Classification_Dataset_MeSH
Resource description:
---
language:
- en
license: afl-3.0
source_datasets:
- BioASQ Task A
task_categories:
- text-classification
task_ids:
- multi-label-classification
pretty_name: BioASQ, PUBMED
size_categories:
- 10K<n<100K
---
This dataset consists of a collection of approximately 50k research articles from the **PubMed** repository. These documents were originally annotated manually by biomedical experts with MeSH labels, and each article is described by 10-15 MeSH labels. The dataset contains a very large number of distinct labels marked as MeSH majors, which leads to an extremely large output space and severe label sparsity. To address this, the dataset has been processed and each label mapped to its root, as described in the figure below.
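The root mapping described above can be illustrated with a minimal sketch. It assumes that a label's root is the top-level category letter of its MeSH tree number (for example `C` for a tree number beginning `C04`), which is how the MeSH hierarchy is organized. The function names, the category list, and the sample tree numbers are illustrative assumptions and are not taken from the dataset or its processing code.

```python
# Hypothetical sketch: collapse MeSH tree numbers to their top-level
# category letter, then encode the result as a multi-hot vector.
# Assumption: the root of a label is the first letter of its tree number.

# Top-level MeSH category letters (assumed label vocabulary for this sketch).
ROOT_CATEGORIES = list("ABCDEFGHIJKLMNVZ")


def to_roots(tree_numbers):
    """Return the sorted, de-duplicated root letters for a list of tree numbers."""
    return sorted({t.strip()[0].upper() for t in tree_numbers if t.strip()})


def multi_hot(roots, vocabulary=ROOT_CATEGORIES):
    """Encode a set of root letters as a 0/1 vector over the vocabulary."""
    present = set(roots)
    return [1 if letter in present else 0 for letter in vocabulary]


# Illustrative tree-number strings for one article (not taken from the dataset).
example_tree_numbers = ["C04.557.337", "C04.588", "D12.776", "A11.251"]

roots = to_roots(example_tree_numbers)
vector = multi_hot(roots)

print(roots)   # root letters present for this article
print(vector)  # one position per top-level category
```

Collapsing labels this way replaces a sparse output space of many thousands of descriptors with a small, fixed set of root categories, at the cost of losing the finer distinctions between descriptors under the same root.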


Provided by:
owaiskha9654
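Summary of original information
Dataset overview
Basic information
- Language: English
- License: AFL-3.0
- Source dataset: BioASQ Task A
- Task category: text classification
- Task ID: multi-label classification
- Pretty name: BioASQ, PUBMED
- Dataset size: 10K<n<100K
Dataset contents
- Data source: approximately 50,000 research articles from the PubMed repository
- Annotation: manually annotated by biomedical experts, with 10-15 MeSH labels per article
- Characteristics: a large number of MeSH major labels, resulting in an extremely large output space and severe label sparsity
- Processing: the dataset has been processed and its labels mapped to their root nodes; the mapping structure is shown in the figure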



