abacusai/WikiQA-Altered_Numeric_QA
Source: Hugging Face. Updated 2024-01-17; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/abacusai/WikiQA-Altered_Numeric_QA
Resource description:
---
license: apache-2.0
configs:
- config_name: default
  data_files:
  - split: 2k
    path: data/2k-*
  - split: 4k
    path: data/4k-*
  - split: 8k
    path: data/8k-*
  - split: 16k
    path: data/16k-*
dataset_info:
  features:
  - name: conversations
    list:
    - name: from
      dtype: string
    - name: tok_len
      dtype: int64
    - name: value
      dtype: string
  splits:
  - name: 2k
    num_bytes: 2802096
    num_examples: 456
  - name: 4k
    num_bytes: 5492874
    num_examples: 456
  - name: 8k
    num_bytes: 10884816
    num_examples: 456
  - name: 16k
    num_bytes: 19884934
    num_examples: 456
  download_size: 8163043
  dataset_size: 39064720
---

# Dataset Card for "WikiQA-Altered_Numeric_QA"
The WikiQA task is to answer a question based on the information given in a Wikipedia document. We built our QA task on the short-answer-format data in Google Natural Questions. Each example is formatted as a document and a question. We ensure that the answer to the question is a short answer, either a single word or a short sentence copied verbatim from the document. Structuring the task this way lets us pinpoint exactly where the LLM was supposed to "look" for the answer in the context, and thus evaluate every part of the expanded context length by carefully placing the answer at different locations.
We selected large Wikipedia documents and truncated them to produce multiple versions of the same document, with sizes ranging from 2000 to 16000 tokens. For each document size, we also have multiple versions that place the question and the answer text at different locations, i.e. whether the answer occurs in the first 10%, the bulk, or the last 10% of the document. Having multiple versions of the same document allows an exhaustive and fair evaluation across model sizes, and across context positions within one model, since we are intrinsically asking for the same information.
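
The three placement regions described above can be sketched as a small helper. This is only an illustration: the function name `position_bucket`, its parameters, and the choice to measure position by the answer's starting token offset are assumptions, not part of the dataset's published tooling.

```python
def position_bucket(answer_start: int, doc_len: int) -> str:
    """Classify where an answer begins within a document.

    Hypothetical helper: `answer_start` is the token offset at which the
    answer begins and `doc_len` is the document length in tokens.
    """
    if doc_len <= 0:
        raise ValueError("doc_len must be positive")
    if not 0 <= answer_start < doc_len:
        raise ValueError("answer_start must lie inside the document")

    fraction = answer_start / doc_len
    if fraction < 0.10:
        return "first 10%"
    if fraction >= 0.90:
        return "last 10%"
    return "bulk"
```

Grouping model accuracy by this bucket, separately for each of the 2k, 4k, 8k and 16k splits, would give the per-position comparison the paragraph describes.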
A potential issue with a Wikipedia-based dataset is that the model could answer correctly from its pretraining corpus rather than from the context. To address this, we created another “altered” dataset. This data consists only of questions that have numerical answers. Here, we change the answer and every occurrence of the answer in the document to a different number, ensuring that if the LLM recalls the answer from its pretraining corpus, it gives a wrong answer. The modification is made as follows:
- If the answer is a year, which is quite frequent (i.e. it is between 1000 and 2100), we change it to a different random value within +/- 10 of the original value. We treat years as a special case so as not to make the document absurd by disrupting chronological information.
- If the answer is any other number, we change it to a different random number with the same number of digits.
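
The two alteration rules above can be sketched roughly as follows. This is a minimal illustration, not the authors' actual script: the names `alter_number` and `alter_document` are hypothetical, answers are assumed to be plain non-negative integer strings (no commas, signs, or decimals), and the digit-boundary matching in `alter_document` is an assumption about how "every occurrence" is identified.

```python
import random
import re


def alter_number(answer: str, rng: random.Random) -> str:
    """Return a different number following the two rules in the card."""
    if not re.fullmatch(r"[0-9]+", answer):
        raise ValueError("this sketch only handles plain integer strings")
    value = int(answer)

    if 1000 <= value <= 2100:
        # Year: pick a different value within +/- 10 of the original.
        low, high = value - 10, value + 10
    else:
        # Any other number: keep the same number of digits.
        digits = len(str(value))
        low = 10 ** (digits - 1) if digits > 1 else 0
        high = 10 ** digits - 1

    new_value = value
    while new_value == value:  # resample until the number actually changes
        new_value = rng.randint(low, high)
    return str(new_value)


def alter_document(document: str, answer: str, new_answer: str) -> str:
    """Replace every standalone occurrence of `answer` in the document.

    Assumption: an occurrence embedded in a longer digit run is left alone.
    """
    pattern = re.compile(rf"(?<!\d){re.escape(answer)}(?!\d)")
    return pattern.sub(new_answer, document)
```

For example, calling `alter_number("1987", random.Random(0))` yields some year between 1977 and 1997 other than 1987, and the same replacement would then be applied to the document text with `alter_document`.
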
We call our original QA task [Free Form QA (FFQA)](https://huggingface.co/datasets/abacusai/WikiQA-Free_Form_QA) and the altered task Altered Numeric QA (AltQA).
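
A minimal sketch of loading one of the splits declared in the card metadata, assuming the Hugging Face `datasets` library is installed and the repository id matches the download link above. The helpers `load_split` and `total_tokens` are hypothetical names, and reading `tok_len` as the token length of each conversation turn is an assumption based only on the field name.

```python
SPLITS = ("2k", "4k", "8k", "16k")  # split names from the card metadata


def load_split(split: str):
    """Load one context-length split of the altered dataset."""
    if split not in SPLITS:
        raise ValueError(f"unknown split {split!r}; expected one of {SPLITS}")
    # Imported lazily so the helper below works without the library.
    from datasets import load_dataset

    return load_dataset("abacusai/WikiQA-Altered_Numeric_QA", split=split)


def total_tokens(example: dict) -> int:
    """Sum `tok_len` over all turns in one example's `conversations` list."""
    return sum(turn["tok_len"] for turn in example["conversations"])
```

With network access, `load_split("2k")` should return a dataset of 456 examples according to the metadata, and `total_tokens` can be used to check how close each example is to its nominal context size.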
Provided by:
abacusai
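Summary of original information
Dataset overview
Dataset name
- Name: WikiQA-Altered_Numeric_QA
Dataset content
- Task: Answer questions based on Wikipedia documents.
- Format: A document and a question; the answer is a single word or a short sentence taken directly from the document.
- Characteristics: Large Wikipedia documents are truncated into versions of different lengths (2000 to 16000 tokens), with the question and answer text placed at different locations.
- Modification: For questions with numerical answers, the answer and every occurrence of it in the document are replaced with a different number, so that the model cannot answer by recalling its pretraining data.
Dataset structure
- Features:
  - conversations:
    - from: string
    - tok_len: integer
    - value: string
- Splits:
  - 2k: 456 examples, 2802096 bytes
  - 4k: 456 examples, 5492874 bytes
  - 8k: 456 examples, 10884816 bytes
  - 16k: 456 examples, 19884934 bytes
Dataset size
- Download size: 8163043 bytes
- Dataset size: 39064720 bytes
License
- License: Apache-2.0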



