quantiles/PubMedQA

Name: quantiles/PubMedQA
Creator: quantiles
Published: 2026-04-26 00:03:09
License: No description available

Hugging Face: updated 2026-04-26, indexed 2026-05-03

Download link:

https://hf-mirror.com/datasets/quantiles/PubMedQA

Description:

PubMedQA is a dataset for biomedical research question answering. The task is to answer a research question with yes, no, or maybe (e.g., "Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?") using the corresponding abstract. The dataset has three configurations: pqa_artificial (artificially generated data, 211,269 examples), pqa_labeled (labeled data, 1,000 examples, of which 500 are used as the test set), and pqa_unlabeled (unlabeled data, 61,249 examples). Each record has the features pubid (integer identifier), question (question string), context (a sequence with fields including contexts, labels, and meshes), long_answer (long answer string), and final_decision (final decision string). The dataset is monolingual English, falls in the 10K to 1M size categories, is intended for multiple-choice question answering, and has an official leaderboard for evaluating model performance.
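As a rough illustration of the record layout described above, the following Python sketch builds a prompt from one record and scores yes/no/maybe predictions. The record here is a hand-written placeholder that only mirrors the documented field names; its text values and its final_decision are not taken from the dataset. The helper names build_prompt and accuracy are hypothetical, and the pairing of labels with contexts by position is an assumption. In practice the records would probably be loaded with the Hugging Face datasets library rather than typed by hand.

```python
from typing import Dict, List

# The three answer classes named in the dataset description.
VALID_DECISIONS = ("yes", "no", "maybe")

# Placeholder record: field names follow the description, values are illustrative only.
# Real records would likely come from something like
#   datasets.load_dataset("quantiles/PubMedQA", "pqa_labeled")
# (repository id and config name taken from this page; not executed here).
example: Dict = {
    "pubid": 0,  # placeholder identifier
    "question": "Do preoperative statins reduce atrial fibrillation "
                "after coronary artery bypass grafting?",
    "context": {
        "contexts": ["<abstract section 1>", "<abstract section 2>"],
        "labels": ["BACKGROUND", "RESULTS"],
        "meshes": ["<MeSH term>"],
    },
    "long_answer": "<conclusion text>",
    "final_decision": "yes",  # placeholder label, not the true answer
}


def build_prompt(record: Dict) -> str:
    """Join the labeled abstract sections and the question into one prompt."""
    ctx = record["context"]
    # Assumption: labels and contexts are parallel lists of equal length.
    sections = [f"{label}: {text}" for label, text in zip(ctx["labels"], ctx["contexts"])]
    return "\n".join(sections + [f"Question: {record['question']}", "Answer (yes/no/maybe):"])


def accuracy(predictions: List[str], records: List[Dict]) -> float:
    """Fraction of predictions that match final_decision, ignoring case and whitespace."""
    if not records or len(predictions) != len(records):
        raise ValueError("predictions and records must be non-empty and the same length")
    correct = 0
    for pred, rec in zip(predictions, records):
        normalized = pred.strip().lower()
        if normalized not in VALID_DECISIONS:
            continue  # anything outside yes/no/maybe counts as wrong
        if normalized == rec["final_decision"]:
            correct += 1
    return correct / len(records)


if __name__ == "__main__":
    print(build_prompt(example))
    print(accuracy(["Yes"], [example]))
```

Treating out-of-vocabulary predictions as wrong rather than raising an error is a design choice for this sketch; an evaluation against the official leaderboard may follow different rules.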

Provider:

quantiles
