bobox/OpenbookQA-4ST
Source: Hugging Face. Updated 2024-07-09. Indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/bobox/OpenbookQA-4ST
Resource summary:
The OpenBookQA-forSentenceTransformers dataset is a modified version of the OpenBookQA dataset, specifically designed for sentence transformers. This dataset aims to promote research in advanced question-answering, particularly questions that require multi-step reasoning, use of additional common and commonsense knowledge, and rich text comprehension. The dataset includes two configurations: all and filtered, where the filtered version is a subset based on human evaluation and clarity scores, including only questions with HumanScore > 0.9 and Clarity > 1.4. The primary task categories are question-answering and sentence similarity, with the task ID being open-domain QA. The dataset is in English, with a size of 1K<n<10K, and is monolingual. The source of the dataset is the original OpenBookQA dataset, with annotations created by crowdsourcing and expert generation. The dataset structure includes fields such as question, fact, answer, negatives, HumanScore, and Clarity. The download size of the dataset is 2.89 MB, the generated dataset size is 2.88 MB, and the total disk usage is 5.78 MB.
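The filtering rule described above (HumanScore > 0.9 and Clarity > 1.4) can be sketched in plain Python. This is a minimal illustration, not the script used to build the `filtered` configuration: the function names and the sample rows are hypothetical, and only the field names and thresholds come from the description.

```python
from typing import Dict, Iterable, List

# Thresholds quoted in the summary for the "filtered" configuration.
HUMAN_SCORE_MIN = 0.9
CLARITY_MIN = 1.4


def keep_row(row: Dict) -> bool:
    """Return True when a row passes both strict (>) thresholds."""
    return row["HumanScore"] > HUMAN_SCORE_MIN and row["Clarity"] > CLARITY_MIN


def filter_rows(rows: Iterable[Dict]) -> List[Dict]:
    """Keep only the rows that pass keep_row (hypothetical helper)."""
    return [row for row in rows if keep_row(row)]


# Hypothetical rows, invented for illustration; not taken from the dataset.
sample_rows = [
    {"question": "q1", "HumanScore": 1.0, "Clarity": 2.0},
    {"question": "q2", "HumanScore": 0.9, "Clarity": 2.0},  # fails: HumanScore is not > 0.9
    {"question": "q3", "HumanScore": 1.0, "Clarity": 1.2},  # fails: Clarity is not > 1.4
]

kept = filter_rows(sample_rows)
print([row["question"] for row in kept])
```

With the Hugging Face `datasets` library installed, real rows could presumably be obtained with `load_dataset("bobox/OpenbookQA-4ST", "all")`, assuming the repository exposes the configuration names listed in this page.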
Provider:
bobox
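Original information summary
OpenBookQA-forSentenceTransformers Dataset Overview
Dataset Description
Dataset Summary
- Language: English
- License: unknown
- Multilinguality: monolingual
- Dataset size: 1K<n<10K
- Source datasets: original
- Task categories:
  - Question answering
  - Sentence similarity
- Task ID: open-domain QA
- PapersWithCode ID: openbookqa
- Dataset name: OpenBookQA-forSentenceTransformers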
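Dataset Structure
Configurations
- Config name: all
  - Features:
    - question: string
    - fact: string
    - answer: string
    - negatives: sequence of strings
    - HumanScore: float
    - Clarity: float
  - Splits:
    - train: 4957 examples, 1067030 bytes
    - test: 500 examples, 108985 bytes
    - validation: 500 examples, 114183 bytes
  - Download size: 739046 bytes
  - Dataset size: 1290198 bytes
- Config name: filtered
  - Features:
    - question: string
    - fact: string
    - answer: string
    - negatives: sequence of strings
    - HumanScore: float
    - Clarity: float
  - Splits:
    - train: 2740 examples, 589804.76 bytes
    - test: 322 examples, 70186.34 bytes
    - validation: 264 examples, 60288.62 bytes
  - Download size: 418188 bytes
  - Dataset size: 720279.72 bytes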
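Given the feature schema above, one plausible way to prepare rows for sentence-transformer training is to expand each row into (anchor, positive, negative) triplets. The sketch below assumes that `answer` is the correct choice and that each string in `negatives` is an incorrect one; the pairing scheme, the function name, and the sample row are illustrative assumptions, not something the dataset card prescribes.

```python
from typing import Dict, Iterator, List, Tuple

Triplet = Tuple[str, str, str]


def row_to_triplets(row: Dict) -> Iterator[Triplet]:
    """Yield one (anchor, positive, negative) triplet per negative string.

    Assumption: the question is the anchor, the answer is the positive,
    and every entry of `negatives` is an incorrect choice.
    """
    for negative in row["negatives"]:
        yield (row["question"], row["answer"], negative)


# Hypothetical row that follows the listed schema; invented for illustration.
sample_row = {
    "question": "Which object conducts electricity?",
    "fact": "Metal is an electrical conductor.",
    "answer": "a suit of armor",
    "negatives": ["a wooden spoon", "a rubber glove", "a glass cup"],
    "HumanScore": 1.0,
    "Clarity": 2.0,
}

triplets: List[Triplet] = list(row_to_triplets(sample_row))
for anchor, positive, negative in triplets:
    print(anchor, "|", positive, "|", negative)
```

The `fact` field is left unused here; it could equally serve as the anchor or as an extra positive, depending on the training objective.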
Data Splits
| Name | Train | Validation | Test |
|---|---|---|---|
| all | 4957 | 500 | 500 |
| filtered | 2740 | 264 | 322 |
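The split counts in the table make it easy to see how much of each split survives filtering. The short sketch below computes the share of `all` examples that also appear in `filtered`; the counts are copied from the table, while the variable and function names are only illustrative.

```python
# Example counts copied from the split table.
splits = {
    "train": {"all": 4957, "filtered": 2740},
    "validation": {"all": 500, "filtered": 264},
    "test": {"all": 500, "filtered": 322},
}


def retention(counts: dict) -> float:
    """Fraction of the `all` split that is kept in `filtered`."""
    return counts["filtered"] / counts["all"]


for name, counts in splits.items():
    print(f"{name}: {retention(counts):.1%} retained")

# Totals across the three splits.
total_all = sum(c["all"] for c in splits.values())
total_filtered = sum(c["filtered"] for c in splits.values())
print(f"overall: {total_filtered}/{total_all} = {total_filtered / total_all:.1%}")
```

Retention differs by split, so the `filtered` test and validation sets are not the same proportion of their originals as the train set.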
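Additional Information
Licensing Information
- License: unknown
- Original dataset license: users should refer to the license of the original OpenBookQA dataset.
Contributions
- The original dataset was provided by the Allen Institute for AI.
- The dataset was adapted for sentence transformers by bobox, including filtering and restructuring the data.
Citation Information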
@inproceedings{OpenBookQA2018, title={Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering}, author={Todor Mihaylov and Peter Clark and Tushar Khot and Ashish Sabharwal}, booktitle={EMNLP}, year={2018} }



