highnote/pubmed_qa

Name: highnote/pubmed_qa
Creator: highnote
Published: 2023-08-19 13:28:27
License: No description available

Source: Hugging Face. Updated: 2023-08-19. Indexed: 2024-03-04.

Download link:

https://hf-mirror.com/datasets/highnote/pubmed_qa


Resource description:

---
annotations_creators:
- expert-generated
- machine-generated
language_creators:
- expert-generated
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 100K<n<1M
- 10K<n<100K
- 1K<n<10K
source_datasets:
- original
task_categories:
- question-answering
task_ids:
- multiple-choice-qa
paperswithcode_id: pubmedqa
pretty_name: PubMedQA
dataset_info:
- config_name: pqa_labeled
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
  - name: reasoning_required_pred
    dtype: string
  - name: reasoning_free_pred
    dtype: string
  - name: long_answer
    dtype: string
  - name: final_decision
    dtype: string
  splits:
  - name: train
    num_bytes: 2089200
    num_examples: 1000
  download_size: 687882700
  dataset_size: 2089200
- config_name: pqa_unlabeled
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
  - name: long_answer
    dtype: string
  splits:
  - name: train
    num_bytes: 125938502
    num_examples: 61249
  download_size: 687882700
  dataset_size: 125938502
- config_name: pqa_artificial
  features:
  - name: pubid
    dtype: int32
  - name: question
    dtype: string
  - name: context
    sequence:
    - name: contexts
      dtype: string
    - name: labels
      dtype: string
    - name: meshes
      dtype: string
  - name: long_answer
    dtype: string
  - name: final_decision
    dtype: string
  splits:
  - name: train
    num_bytes: 443554667
    num_examples: 211269
  download_size: 687882700
  dataset_size: 443554667
config_names:
- pqa_artificial
- pqa_labeled
- pqa_unlabeled
duplicated_from: pubmed_qa
---

# Dataset Card for [Dataset Name]

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [PUBMED_QA homepage](https://pubmedqa.github.io/)
- **Repository:** [PUBMED_QA repository](https://github.com/pubmedqa/pubmedqa)
- **Paper:** [PUBMED_QA: A Dataset for Biomedical Research Question Answering](https://arxiv.org/abs/1909.06146)
- **Leaderboard:** [PUBMED_QA: Leaderboard](https://pubmedqa.github.io/)

### Dataset Summary

[More Information Needed]

### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

[More Information Needed]

## Dataset Structure

### Data Instances

[More Information Needed]

### Data Fields

[More Information Needed]

### Data Splits

[More Information Needed]

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

Thanks to [@tuner007](https://github.com/tuner007) for adding this dataset.
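Because the card's Data Fields and Data Splits sections are still empty, the `dataset_info` metadata above is the only description of the record layout. The following is a minimal sketch, not part of the original card, that restates that metadata in plain Python and checks a record against it. The names `CONFIGS`, `STRING_FIELDS`, `CONTEXT_FIELDS`, and `validate_record` are hypothetical, the sample record contains made-up values, and the sketch assumes that the `context` sequence is exposed as a dict of parallel string lists.

```python
from typing import Any

# Train-split sizes copied from the dataset_info metadata above.
CONFIGS = {
    "pqa_labeled": {"num_examples": 1000, "dataset_size": 2089200},
    "pqa_unlabeled": {"num_examples": 61249, "dataset_size": 125938502},
    "pqa_artificial": {"num_examples": 211269, "dataset_size": 443554667},
}

# Top-level string features per config, as listed in the metadata.
STRING_FIELDS = {
    "pqa_labeled": [
        "question",
        "reasoning_required_pred",
        "reasoning_free_pred",
        "long_answer",
        "final_decision",
    ],
    "pqa_unlabeled": ["question", "long_answer"],
    "pqa_artificial": ["question", "long_answer", "final_decision"],
}

# Sub-fields of the `context` sequence, shared by all three configs.
CONTEXT_FIELDS = ["contexts", "labels", "meshes"]


def validate_record(record: dict[str, Any], config: str) -> list[str]:
    """Return a list of schema problems; an empty list means the record fits."""
    problems = []
    if not isinstance(record.get("pubid"), int):
        problems.append("pubid must be an int")
    for field in STRING_FIELDS[config]:
        if not isinstance(record.get(field), str):
            problems.append(f"{field} must be a string")
    context = record.get("context")
    if not isinstance(context, dict):
        # Assumption: the sequence of named sub-fields arrives as a dict of lists.
        problems.append("context must be a dict of lists")
        return problems
    for field in CONTEXT_FIELDS:
        values = context.get(field)
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            problems.append(f"context.{field} must be a list of strings")
    return problems


# Hypothetical record with made-up values, shaped like a pqa_labeled row.
example = {
    "pubid": 12345678,
    "question": "Does intervention X improve outcome Y?",
    "context": {
        "contexts": ["Placeholder abstract paragraph."],
        "labels": ["BACKGROUND"],
        "meshes": ["Humans"],
    },
    "reasoning_required_pred": "yes",
    "reasoning_free_pred": "yes",
    "long_answer": "Placeholder conclusion sentence.",
    "final_decision": "yes",
}

total_examples = sum(cfg["num_examples"] for cfg in CONFIGS.values())
print(total_examples)  # 273518
print(validate_record(example, "pqa_labeled"))  # []
```

In practice the records would more likely come from the Hugging Face `datasets` library, for example `load_dataset("highnote/pubmed_qa", "pqa_labeled")`, assuming the repository is still available; the check above only needs plain dicts, so it also works on rows read from any other export.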


Provider:

highnote

Summary of original information

Dataset overview

Dataset information

Status: information pending
