quantiles/PubMedQA
Source: Hugging Face | Updated: 2026-04-26 | Indexed: 2026-05-03
Download link:
https://hf-mirror.com/datasets/quantiles/PubMedQA
Description:
PubMedQA is a dataset for biomedical research question answering. The task is to answer a research question with yes, no, or maybe (e.g., Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?) using the corresponding abstract. The dataset has three configurations: pqa_artificial (artificially generated data, 211,269 examples), pqa_labeled (labeled data, 1,000 examples, of which 500 are used as the test set), and pqa_unlabeled (unlabeled data, 61,249 examples). Each record has the features pubid (integer identifier), question (question string), context (a sequence containing contexts, labels, meshes, and other fields), long_answer (long answer string), and final_decision (final decision string). The dataset is monolingual (English), spans the 10K to 1M size categories, targets multiple-choice question answering, and has an official leaderboard for model evaluation.
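As a rough illustration of how the features described above fit together, the sketch below builds a yes/no/maybe prompt from one record and scores predictions against final_decision. It rests on several assumptions: the record is hypothetical and uses placeholder text rather than real dataset content, context is assumed to be a mapping of parallel lists (contexts, labels, meshes), and the helper names build_prompt, normalize, and accuracy are invented for this example.

```python
# Minimal sketch under stated assumptions; not the dataset's official tooling.

LABELS = ("yes", "no", "maybe")

# Hypothetical record in the shape described in the listing.
# All text values are placeholders, not real PubMedQA content.
example = {
    "pubid": 0,
    "question": ("Do preoperative statins reduce atrial fibrillation "
                 "after coronary artery bypass grafting?"),
    "context": {
        # Assumed layout: parallel lists, one entry per abstract section.
        "contexts": ["<background text>", "<results text>"],
        "labels": ["BACKGROUND", "RESULTS"],
        "meshes": ["<MeSH term>"],
    },
    "long_answer": "<conclusion of the abstract>",
    "final_decision": "yes",
}


def build_prompt(record):
    """Join the labeled abstract sections and append the question."""
    ctx = record["context"]
    sections = [f"{label}: {text}"
                for label, text in zip(ctx["labels"], ctx["contexts"])]
    abstract = "\n".join(sections)
    return (f"Abstract:\n{abstract}\n\n"
            f"Question: {record['question']}\n"
            "Answer (yes/no/maybe):")


def normalize(answer):
    """Lowercase and validate an answer against the three labels."""
    cleaned = answer.strip().lower()
    if cleaned not in LABELS:
        raise ValueError(f"unexpected answer: {answer!r}")
    return cleaned


def accuracy(predictions, records):
    """Fraction of predictions that match each record's final_decision."""
    if not records or len(predictions) != len(records):
        raise ValueError("predictions and records must be non-empty "
                         "and of equal length")
    correct = sum(normalize(p) == normalize(r["final_decision"])
                  for p, r in zip(predictions, records))
    return correct / len(records)


if __name__ == "__main__":
    print(build_prompt(example))
    print(accuracy(["yes"], [example]))
```

With the Hugging Face datasets library installed, real records would presumably be obtained with load_dataset("quantiles/PubMedQA", "pqa_labeled"); that call and the exact field layout it returns are not verified here.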
Provided by:
quantiles



