harvardairobotics/MedConclusion-Compact
Source: Hugging Face. Updated 2026-04-20; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/harvardairobotics/MedConclusion-Compact
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
tags:
- medical
- biomedical
- abstract
- conclusion-generation
pretty_name: MedConclusion Compact
size_categories:
- 100K<n<1M
---
# MedConclusion-Compact
**MedConclusion** is a large-scale dataset of 5.7M PubMed structured abstracts for biomedical conclusion generation. Each instance pairs the non-conclusion sections of an abstract with the original author-written conclusion, providing naturally occurring supervision for evidence-to-conclusion reasoning. MedConclusion also includes journal-level metadata, such as biomedical category and SCImago Journal Rank (SJR), enabling subgroup analysis across biomedical domains.
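The pairing described above can be illustrated with a small sketch. The section labels, the `split_abstract` helper, and the sample abstract are all hypothetical; the dataset's actual field names and preprocessing are not documented in this card.

```python
from typing import Dict, Tuple

# Hypothetical labels treated as the conclusion section (an assumption,
# not the dataset's documented preprocessing).
CONCLUSION_LABELS = {"CONCLUSION", "CONCLUSIONS"}


def split_abstract(sections: Dict[str, str]) -> Tuple[str, str]:
    """Split a structured abstract into (input_text, target_conclusion)."""
    input_parts = []
    conclusion_parts = []
    for label, text in sections.items():
        if label.upper() in CONCLUSION_LABELS:
            # Held out as the generation target.
            conclusion_parts.append(text)
        else:
            # Everything else becomes model input, tagged with its label.
            input_parts.append(f"{label.upper()}: {text}")
    if not conclusion_parts:
        raise ValueError("abstract has no conclusion section")
    return "\n".join(input_parts), " ".join(conclusion_parts)


# Made-up abstract, for illustration only.
example = {
    "Background": "Drug X is widely used, but its effect on outcome Y is unclear.",
    "Methods": "We ran a randomized trial with 200 participants.",
    "Results": "Outcome Y improved in the treatment arm.",
    "Conclusions": "Drug X may improve outcome Y.",
}
source_text, target_text = split_abstract(example)
```

Here `source_text` holds the labeled background, methods, and results sections, while `target_text` holds the author-written conclusion a model would be asked to generate.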
This repository contains the **Compact** version of the dataset, designed for faster evaluation and model prototyping. For the full dataset (5.7M instances), see the [**Full Version**](https://huggingface.co/datasets/harvardairobotics/MedConclusion).
- **Train**: 100,000 instances
- **Validation**: 10,000 instances
- **Test**: 30,000 instances
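A minimal sketch of loading the Compact splits with the Hugging Face `datasets` library and checking them against the sizes listed above. It assumes the splits are named `train`, `validation`, and `test`, which this card does not state explicitly; `size_mismatches` and `report_split_sizes` are hypothetical helper names.

```python
# Split sizes as stated in this card.
EXPECTED_SIZES = {"train": 100_000, "validation": 10_000, "test": 30_000}


def size_mismatches(actual_sizes):
    """Return {split: (expected, actual)} for splits whose size differs."""
    return {
        name: (expected, actual_sizes.get(name))
        for name, expected in EXPECTED_SIZES.items()
        if actual_sizes.get(name) != expected
    }


def report_split_sizes():
    """Download the dataset from the Hub (needs network) and compare sizes."""
    from datasets import load_dataset  # pip install datasets

    dataset = load_dataset("harvardairobotics/MedConclusion-Compact")
    # Assumed split names: "train", "validation", "test".
    sizes = {name: split.num_rows for name, split in dataset.items()}
    mismatches = size_mismatches(sizes)
    print(mismatches or "all split sizes match the card")
    return mismatches
```

Calling `report_split_sizes()` downloads the data on first use; an empty result means the downloaded splits match the counts listed in this card.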
## Benchmark Information
This dataset is introduced in the paper [MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts](https://arxiv.org/abs/2604.06505).
## Citation
```bibtex
@article{li2026medconclusion,
title={MedConclusion: A Benchmark for Biomedical Conclusion Generation from Structured Abstracts},
author={Li, Weiyue and Qian, Ruizhi and Li, Yi and Li, Yongce and Long, Yunfan and Cai, Jiahui and Luo, Yan and Wang, Mengyu},
journal={arXiv preprint arXiv:2604.06505},
year={2026}
}
```
Provider:
harvardairobotics



