qitai123/PubMedQA
收藏Hugging Face2026-02-26 更新2026-03-29 收录
下载链接:
https://hf-mirror.com/datasets/qitai123/PubMedQA
下载链接
链接失效反馈官方服务:
资源简介:
---
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
- 10K<n<100K
- 1K<n<10K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- multiple-choice-qa
paperswithcode_id: pubmedqa
pretty_name: PubMedQA
config_names:
- pqa_artificial
- pqa_labeled
- pqa_unlabeled
dataset_info:
- config_name: pqa_artificial
features:
- name: pubid
dtype: int32
- name: question
dtype: string
- name: context
sequence:
- name: contexts
dtype: string
- name: labels
dtype: string
- name: meshes
dtype: string
- name: long_answer
dtype: string
- name: final_decision
dtype: string
splits:
- name: train
num_bytes: 443501057
num_examples: 211269
download_size: 233411194
dataset_size: 443501057
- config_name: pqa_labeled
features:
- name: pubid
dtype: int32
- name: question
dtype: string
- name: context
sequence:
- name: contexts
dtype: string
- name: labels
dtype: string
- name: meshes
dtype: string
- name: reasoning_required_pred
dtype: string
- name: reasoning_free_pred
dtype: string
- name: long_answer
dtype: string
- name: final_decision
dtype: string
splits:
- name: train
num_bytes: 2088898
num_examples: 1000
download_size: 1075513
dataset_size: 2088898
- config_name: pqa_unlabeled
features:
- name: pubid
dtype: int32
- name: question
dtype: string
- name: context
sequence:
- name: contexts
dtype: string
- name: labels
dtype: string
- name: meshes
dtype: string
- name: long_answer
dtype: string
splits:
- name: train
num_bytes: 125922964
num_examples: 61249
download_size: 66010017
dataset_size: 125922964
configs:
- config_name: pqa_artificial
data_files:
- split: train
path: pqa_artificial/train-*
- config_name: pqa_labeled
data_files:
- split: train
path: pqa_labeled/train-*
- config_name: pqa_unlabeled
data_files:
- split: train
path: pqa_unlabeled/train-*
---
# Dataset Card for [Dataset Name]
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [PubMedQA homepage](https://pubmedqa.github.io/ )
- **Repository:** [PubMedQA repository](https://github.com/pubmedqa/pubmedqa)
- **Paper:** [PubMedQA: A Dataset for Biomedical Research Question Answering](https://arxiv.org/abs/1909.06146)
- **Leaderboard:** [PubMedQA: Leaderboard](https://pubmedqa.github.io/)
### Dataset Summary
The task of PubMedQA is to answer research questions with yes/no/maybe (e.g.: Do preoperative statins reduce atrial fibrillation after coronary artery bypass grafting?) using the corresponding abstracts.
### Supported Tasks and Leaderboards
The official leaderboard is available at: https://pubmedqa.github.io/.
500 questions in the `pqa_labeled` are used as the test set. They can be found at https://github.com/pubmedqa/pubmedqa.
### Languages
English
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
### Data Splits
[More Information Needed]
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@tuner007](https://github.com/tuner007) for adding this dataset.
annotations_creators:
- 专家生成
- 机器生成
language_creators:
- 专家生成
language:
- 英语
license:
- MIT许可证
multilinguality:
- 单语言
size_categories:
- 10万<样本数<100万
- 1万<样本数<10万
- 1千<样本数<1万
source_datasets:
- 原生数据集
task_categories:
- 问答任务
task_ids:
- 多项选择问答(multiple-choice QA)
paperswithcode_id: pubmedqa
pretty_name: PubMedQA
config_names:
- pqa_artificial
- pqa_labeled
- pqa_unlabeled
dataset_info:
- config_name: pqa_artificial
features:
- name: 文献ID(pubid)
dtype: 32位整数
- name: 问题
dtype: 字符串
- name: 上下文
sequence:
- name: 上下文内容
dtype: 字符串
- name: 标签
dtype: 字符串
- name: 医学主题词(MeSH)
dtype: 字符串
- name: 长答案
dtype: 字符串
- name: 最终判定
dtype: 字符串
splits:
- name: 训练集
num_bytes: 443501057
num_examples: 211269
download_size: 233411194
dataset_size: 443501057
- config_name: pqa_labeled
features:
- name: 文献ID(pubid)
dtype: 32位整数
- name: 问题
dtype: 字符串
- name: 上下文
sequence:
- name: 上下文内容
dtype: 字符串
- name: 标签
dtype: 字符串
- name: 医学主题词(MeSH)
dtype: 字符串
- name: 需推理预测项
dtype: 字符串
- name: 无推理预测项
dtype: 字符串
- name: 长答案
dtype: 字符串
- name: 最终判定
dtype: 字符串
splits:
- name: 训练集
num_bytes: 2088898
num_examples: 1000
download_size: 1075513
dataset_size: 2088898
- config_name: pqa_unlabeled
features:
- name: 文献ID(pubid)
dtype: 32位整数
- name: 问题
dtype: 字符串
- name: 上下文
sequence:
- name: 上下文内容
dtype: 字符串
- name: 标签
dtype: 字符串
- name: 医学主题词(MeSH)
dtype: 字符串
- name: 长答案
dtype: 字符串
splits:
- name: 训练集
num_bytes: 125922964
num_examples: 61249
download_size: 66010017
dataset_size: 125922964
configs:
- config_name: pqa_artificial
data_files:
- split: 训练集
path: pqa_artificial/train-*
- config_name: pqa_labeled
data_files:
- split: 训练集
path: pqa_labeled/train-*
- config_name: pqa_unlabeled
data_files:
- split: 训练集
path: pqa_unlabeled/train-*
# 数据集卡片:PubMedQA
## 目录
- [数据集概述](#dataset-description)
- [数据集摘要](#dataset-summary)
- [支持任务与排行榜](#supported-tasks-and-leaderboards)
- [语言](#languages)
- [数据集结构](#dataset-structure)
- [数据实例](#data-instances)
- [数据字段](#data-fields)
- [数据拆分](#data-splits)
- [数据集构建](#dataset-creation)
- [筛选依据](#curation-rationale)
- [源数据](#source-data)
- [注释](#annotations)
- [个人与敏感信息](#personal-and-sensitive-information)
- [数据使用注意事项](#considerations-for-using-the-data)
- [数据集的社会影响](#social-impact-of-dataset)
- [偏差讨论](#discussion-of-biases)
- [其他已知局限性](#other-known-limitations)
- [附加信息](#additional-information)
- [数据集维护者](#dataset-curators)
- [许可证信息](#licensing-information)
- [引用信息](#citation-information)
- [贡献](#contributions)
## 数据集概述
- **主页:** [PubMedQA主页](https://pubmedqa.github.io/ )
- **代码仓库:** [PubMedQA代码仓库](https://github.com/pubmedqa/pubmedqa)
- **论文:** [PubMedQA:生物医学研究问答数据集](https://arxiv.org/abs/1909.06146)
- **排行榜:** [PubMedQA:排行榜](https://pubmedqa.github.io/)
### 数据集摘要
PubMedQA的任务为基于对应文献摘要,以“是/否/可能”三种形式回答生物医学研究问题(例如:术前使用他汀类药物是否会降低冠状动脉搭桥术后的心房颤动发生率?)。
### 支持任务与排行榜
官方排行榜链接为:https://pubmedqa.github.io/。
`pqa_labeled`配置中的500条问题被用作测试集,相关数据可在https://github.com/pubmedqa/pubmedqa 获取。
### 语言
英语
## 数据集结构
### 数据实例
[需补充更多信息]
### 数据字段
[需补充更多信息]
### 数据拆分
[需补充更多信息]
## 数据集构建
### 筛选依据
[需补充更多信息]
### 源数据
#### 初始数据收集与标准化
[需补充更多信息]
#### 源语言生产者是谁?
[需补充更多信息]
### 注释
#### 注释流程
[需补充更多信息]
#### 注释者是谁?
[需补充更多信息]
### 个人与敏感信息
[需补充更多信息]
## 数据使用注意事项
### 数据集的社会影响
[需补充更多信息]
### 偏差讨论
[需补充更多信息]
### 其他已知局限性
[需补充更多信息]
## 附加信息
### 数据集维护者
[需补充更多信息]
### 许可证信息
[需补充更多信息]
### 引用信息
[需补充更多信息]
### 贡献
感谢[@tuner007](https://github.com/tuner007) 为本数据集的收录提供支持。
提供机构:
qitai123



