
owaiskha9654/PubMed_MultiLabel_Text_Classification_Dataset_MeSH

Source: Hugging Face. Updated: 2023-01-30. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/owaiskha9654/PubMed_MultiLabel_Text_Classification_Dataset_MeSH
Resource description:
---
language:
- en
license: afl-3.0
source_datasets:
- BioASQ Task A
task_categories:
- text-classification
task_ids:
- multi-label-classification
pretty_name: BioASQ, PUBMED
size_categories:
- 10K<n<100K
---

This dataset consists of approximately 50k research articles from the **PubMed** repository. The documents were originally annotated manually by biomedical experts with MeSH labels, and each article is described by 10-15 MeSH labels. The dataset contains a very large number of MeSH major labels, which leads to an extremely large output space and severe label sparsity. To address this, the dataset has been processed so that each label is mapped to its root, as shown in the figures below.

![Mapped Image not Fetched](https://raw.githubusercontent.com/Owaiskhan9654/Gene-Sequence-Primer-/main/Capture111.PNG)

![Tree Structure](https://raw.githubusercontent.com/Owaiskhan9654/Gene-Sequence-Primer-/main/Capture22.PNG)
Provider:
owaiskha9654
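Summary of original information

Dataset overview

Basic information

  • Language: English
  • License: AFL-3.0
  • Source dataset: BioASQ Task A
  • Task category: text classification
  • Task ID: multi-label classification
  • Pretty name: BioASQ, PUBMED
  • Dataset size: 10K<n<100K

Dataset contents

  • Data source: approximately 50,000 research articles from the PubMed repository
  • Annotation: manually annotated by biomedical experts; each article is associated with 10-15 MeSH labels
  • Data characteristics: a large number of MeSH major labels, resulting in an extremely large output space and severe label sparsity
  • Processing: the dataset has been processed and each label mapped to its root node; the mapping structure is shown in the figure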
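
The root mapping described above can be sketched roughly as follows. This is a minimal illustration, not the dataset author's actual preprocessing. It assumes each MeSH label is available as a tree number whose first letter identifies its top-level category. The `ROOTS` list, the helper names `roots_of` and `multi_hot`, and the example tree numbers are all hypothetical choices made for this sketch.

```python
from typing import Iterable, List

# Assumed set of root category letters (hypothetical for this sketch);
# the real dataset's label columns may differ.
ROOTS: List[str] = ["A", "B", "C", "D", "E", "F", "G",
                    "H", "I", "J", "L", "M", "N", "Z"]


def roots_of(tree_numbers: Iterable[str]) -> List[str]:
    """Collapse fine-grained tree numbers to their distinct root letters."""
    return sorted({t[0] for t in tree_numbers if t and t[0] in ROOTS})


def multi_hot(tree_numbers: Iterable[str]) -> List[int]:
    """Encode the root letters of one article as a 0/1 vector over ROOTS."""
    present = set(roots_of(tree_numbers))
    return [1 if r in present else 0 for r in ROOTS]


# Illustrative tree numbers for a single article (not taken from the dataset).
example = ["C04.557.337", "C04.588", "D12.776", "Z01.107"]
print(roots_of(example))   # ['C', 'D', 'Z']
print(multi_hot(example))  # [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Collapsing many fine-grained labels to a handful of roots shrinks the output space from a very large label set to a short fixed-length vector, which is the sparsity reduction the description refers to.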