
highnote/pubmed_qa

Hugging Face | Updated 2023-08-19 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/highnote/pubmed_qa
Resource overview:
---
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
- 10K<n<100K
- 1K<n<10K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- multiple-choice-qa
paperswithcode_id: pubmedqa
pretty_name: PubMedQA
dataset_info:
- config_name: pqa_labeled
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
    - name: reasoning_required_pred
      dtype: string
    - name: reasoning_free_pred
      dtype: string
  - name: long_answer
    dtype: string
  - name: final_decision
    dtype: string
  splits:
  - name: train
    num_bytes: 2089200
    num_examples: 1000
  download_size: 687882700
  dataset_size: 2089200
- config_name: pqa_unlabeled
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
  - name: long_answer
    dtype: string
  splits:
  - name: train
    num_bytes: 125938502
    num_examples: 61249
  download_size: 687882700
  dataset_size: 125938502
- config_name: pqa_artificial
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
  - name: long_answer
    dtype: string
  - name: final_decision
    dtype: string
  splits:
  - name: train
    num_bytes: 443554667
    num_examples: 211269
  download_size: 687882700
  dataset_size: 443554667
config_names:
- pqa_artificial
- pqa_labeled
- pqa_unlabeled
duplicated_from: pubmed_qa
---

# Dataset Card for PubMedQA

## Table of Contents
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [PUBMED_QA homepage](https://pubmedqa.github.io/)
- **Repository:** [PUBMED_QA repository](https://github.com/pubmedqa/pubmedqa)
- **Paper:** [PUBMED_QA: A Dataset for Biomedical Research Question Answering](https://arxiv.org/abs/1909.06146)
- **Leaderboard:** [PUBMED_QA: Leaderboard](https://pubmedqa.github.io/)

### Dataset Summary

[More Information Needed]

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@tuner007](https://github.com/tuner007) for adding this dataset.
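
Since the card's Data Fields and Data Splits sections are still empty, the metadata header is the only description of the three configurations. The sketch below restates that header as a plain Python table and adds two small checks. The helper names (`total_examples`, `total_bytes`, `missing_fields`) are hypothetical, and two things are assumptions rather than facts from the card: that `reasoning_required_pred` and `reasoning_free_pred` are nested under `context`, and that a loaded record exposes `context` as a dict keyed by subfield name.

```python
from typing import Any, Dict, List

# Subfields of `context` shared by all three configurations.
CONTEXT_BASE = ("contexts", "labels", "meshes")

# Numbers and field names copied from the card's metadata header.
# ASSUMPTION: the two reasoning_* fields sit under `context`.
CONFIGS: Dict[str, Dict[str, Any]] = {
    "pqa_labeled": {
        "num_examples": 1000,
        "num_bytes": 2089200,
        "fields": ("pubid", "question", "context", "long_answer", "final_decision"),
        "context_fields": CONTEXT_BASE
        + ("reasoning_required_pred", "reasoning_free_pred"),
    },
    "pqa_unlabeled": {
        "num_examples": 61249,
        "num_bytes": 125938502,
        "fields": ("pubid", "question", "context", "long_answer"),
        "context_fields": CONTEXT_BASE,
    },
    "pqa_artificial": {
        "num_examples": 211269,
        "num_bytes": 443554667,
        "fields": ("pubid", "question", "context", "long_answer", "final_decision"),
        "context_fields": CONTEXT_BASE,
    },
}


def total_examples() -> int:
    """Sum of train-split examples across all configurations."""
    return sum(cfg["num_examples"] for cfg in CONFIGS.values())


def total_bytes() -> int:
    """Sum of the per-configuration dataset sizes in bytes."""
    return sum(cfg["num_bytes"] for cfg in CONFIGS.values())


def missing_fields(config: str, record: Dict[str, Any]) -> List[str]:
    """List expected fields that are absent from `record` (hypothetical helper)."""
    spec = CONFIGS[config]
    missing = [name for name in spec["fields"] if name not in record]
    context = record.get("context")
    # ASSUMPTION: `context` is a dict of subfield name -> list of values.
    if isinstance(context, dict):
        missing += [
            f"context.{name}"
            for name in spec["context_fields"]
            if name not in context
        ]
    return missing


if __name__ == "__main__":
    print("examples:", total_examples())
    print("bytes:", total_bytes())
```

The per-configuration sizes add up to less than the listed `download_size` of 687882700 bytes, which is the same value for all three configurations in the header.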

Provider:
highnote
Original information summary

Dataset overview

Dataset information

  • Status: information pending