highnote/pubmed_qa
Source: Hugging Face, updated 2023-08-19, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/highnote/pubmed_qa
Resource description:
---
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
- 10K<n<100K
- 1K<n<10K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- multiple-choice-qa
paperswithcode_id: pubmedqa
pretty_name: PubMedQA
dataset_info:
- config_name: pqa_labeled
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
    - name: reasoning_required_pred
      dtype: string
    - name: reasoning_free_pred
      dtype: string
  - name: long_answer
    dtype: string
  - name: final_decision
    dtype: string
  splits:
  - name: train
    num_bytes: 2089200
    num_examples: 1000
  download_size: 687882700
  dataset_size: 2089200
- config_name: pqa_unlabeled
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
  - name: long_answer
    dtype: string
  splits:
  - name: train
    num_bytes: 125938502
    num_examples: 61249
  download_size: 687882700
  dataset_size: 125938502
- config_name: pqa_artificial
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
  - name: long_answer
    dtype: string
  - name: final_decision
    dtype: string
  splits:
  - name: train
    num_bytes: 443554667
    num_examples: 211269
  download_size: 687882700
  dataset_size: 443554667
config_names:
- pqa_artificial
- pqa_labeled
- pqa_unlabeled
duplicated_from: pubmed_qa
---
# Dataset Card for PubMedQA
## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)
## Dataset Description
- **Homepage:** [PubMedQA homepage](https://pubmedqa.github.io/)
- **Repository:** [PubMedQA repository](https://github.com/pubmedqa/pubmedqa)
- **Paper:** [PubMedQA: A Dataset for Biomedical Research Question Answering](https://arxiv.org/abs/1909.06146)
- **Leaderboard:** [PubMedQA leaderboard](https://pubmedqa.github.io/)
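
The card does not show how to load the data, so the following is a minimal sketch rather than an official recipe. It assumes the Hugging Face `datasets` library is installed and that the repository id `highnote/pubmed_qa` (taken from the download link) resolves on the Hub. `load_pubmedqa` is a hypothetical helper name introduced here for illustration. Depending on the installed `datasets` version, script-based repositories may need extra options or may not load at all.

```python
# Config names as listed under `config_names` in the card metadata.
CONFIGS = ("pqa_artificial", "pqa_labeled", "pqa_unlabeled")


def load_pubmedqa(config="pqa_labeled", repo_id="highnote/pubmed_qa"):
    """Load one PubMedQA configuration (hypothetical helper, not a library API)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config {config!r}; expected one of {CONFIGS}")
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # The metadata declares a single "train" split for every config.
    return load_dataset(repo_id, config, split="train")
```

With network access, `load_pubmedqa("pqa_labeled")` would be expected to return the 1,000-example labeled set described in the metadata. Validating the config name first gives a clearer error than a failed Hub lookup.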
### Dataset Summary
[More Information Needed]
### Supported Tasks and Leaderboards
[More Information Needed]
### Languages
English (`en`), as declared in the card metadata.
## Dataset Structure
### Data Instances
[More Information Needed]
### Data Fields
[More Information Needed]
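
Until this section is filled in, the `dataset_info` metadata is the only description of the fields. The sketch below checks one record against the `pqa_labeled` field list. It is an illustration under assumptions: `validate_record` is a hypothetical helper, the `context` sequence is assumed to arrive as a dict of lists (the usual `datasets` representation of a sequence of named fields), and only the three sub-fields shared by all configs are checked.

```python
from typing import Any, Dict, List

# Top-level fields from the `pqa_labeled` metadata (int32 -> int, string -> str).
TOP_LEVEL_FIELDS = {
    "pubid": int,
    "question": str,
    "long_answer": str,
    "final_decision": str,
}
# Sub-fields of the `context` sequence that every config lists.
CONTEXT_FIELDS = ("contexts", "labels", "meshes")


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    for name, expected in TOP_LEVEL_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    context = record.get("context")
    if not isinstance(context, dict):
        problems.append("context: expected a dict of lists")
        return problems
    for name in CONTEXT_FIELDS:
        values = context.get(name)
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            problems.append(f"context.{name}: expected a list of strings")
    return problems


# Made-up placeholder values, not real dataset content.
example = {
    "pubid": 1,
    "question": "Placeholder question?",
    "context": {"contexts": ["text"], "labels": ["label"], "meshes": ["term"]},
    "long_answer": "Placeholder answer.",
    "final_decision": "yes",
}
print(validate_record(example))  # []
```

For `pqa_unlabeled`, which has no `final_decision` field in the metadata, that entry would need to be dropped from `TOP_LEVEL_FIELDS`.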
### Data Splits
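Per the `dataset_info` metadata, each configuration has a single `train` split: `pqa_labeled` has 1,000 examples, `pqa_unlabeled` has 61,249, and `pqa_artificial` has 211,269.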
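
The split statistics in the `dataset_info` metadata can be totaled with a few lines of arithmetic. The sketch below copies the numbers from the metadata. The helper name `summarize` is introduced here for illustration only.

```python
# Split statistics copied from the `dataset_info` metadata of this card.
SPLITS = {
    "pqa_labeled": {"num_examples": 1000, "num_bytes": 2089200},
    "pqa_unlabeled": {"num_examples": 61249, "num_bytes": 125938502},
    "pqa_artificial": {"num_examples": 211269, "num_bytes": 443554667},
}
DOWNLOAD_SIZE = 687882700  # the same value is listed for every config


def summarize(splits):
    """Return total examples, total bytes, and average bytes per example."""
    total_examples = sum(s["num_examples"] for s in splits.values())
    total_bytes = sum(s["num_bytes"] for s in splits.values())
    avg_bytes = {name: s["num_bytes"] / s["num_examples"] for name, s in splits.items()}
    return total_examples, total_bytes, avg_bytes


total_examples, total_bytes, avg_bytes = summarize(SPLITS)
print(total_examples)  # 273518
print(total_bytes)     # 571582369
```

The three configs average roughly 2 KB per example. The shared `download_size` is larger than the summed `dataset_size` values, which is consistent with all configs being fetched from one common download, although the card does not state this.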
## Dataset Creation
### Curation Rationale
[More Information Needed]
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
MIT (`mit`), as declared in the card metadata.
### Citation Information
[More Information Needed]
### Contributions
Thanks to [@tuner007](https://github.com/tuner007) for adding this dataset.
Provider:
highnote
Summary of original information
Dataset overview
Dataset information
- Status: information pending



