Muennighoff/babi
Hugging Face, updated 2023-02-12, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/Muennighoff/babi
Description:
Creation (Copied & adapted from https://github.com/stanford-crfm/helm/blob/0eaaa62a2263ddb94e9850ee629423b010f57e4a/src/helm/benchmark/scenarios/babi_qa_scenario.py):
```python
# Shell commands in notebook syntax (Jupyter/Colab); in a terminal, drop the leading "!".
!wget http://www.thespermwhale.com/jaseweston/babi/tasks_1-20_v1-2.tar.gz
!tar -xf tasks_1-20_v1-2.tar.gz

import json
from typing import List

# Note: range(1, 20) yields tasks 1 through 19.
tasks = list(range(1, 20))
splits = ["train", "valid", "test"]


def process_path(path: str) -> str:
    """Turn a path string (task 19) from the original format 's,w' to a verbal model-friendly format 'south west'"""
    steps: List[str] = path.split(",")
    directions = {"s": "south", "n": "north", "e": "east", "w": "west"}
    path = " ".join([directions[step] for step in steps])
    return path


for split in splits:
    # One output file per split, holding the records of all tasks.
    with open(f"babi_{split}.jsonl", "w") as f_base:
        for task in tasks:
            split_path: str = f"./tasks_1-20_v1-2/en-valid/qa{task}_{split}.txt"
            with open(split_path, "r") as f:
                facts = list(f)
                story: List[str] = []
                for fact in facts:
                    # Each line starts with an index; index 1 marks a new story.
                    fid = int(fact.split(" ")[0])
                    if fid == 1:
                        story = []
                    fact = " ".join(fact.split(" ")[1:])
                    is_question = "?" in fact
                    if is_question:
                        # Question lines are tab-separated: question, answer, supporting facts.
                        question, answer = fact.split("\t")[:2]
                        question, answer = question.strip(), answer.strip()
                        # All tasks except task 19 have a verbal single-word answer (e.g. kitchen, apple, yes).
                        # Task 19 (path finding) has a non-verbal answer format (e.g. 's,w'); convert it to words.
                        if task == 19:
                            answer = process_path(answer)
                        f_base.write(json.dumps({
                            "passage": "".join(story),
                            "question": question,
                            "answer": answer,
                            "task": task,
                        }) + "\n")
                        # Sanity check: story lines should never contain a question.
                        if any("?" in line for line in story):
                            print("STORY", "".join(story))
                    else:
                        story.append(fact)
```
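Provider:
Muennighoff
Summary of original information
Dataset overview
Dataset source
- The dataset is derived from the file tasks_1-20_v1-2.tar.gz, which is downloaded from the URL given in the script with wget and then extracted.
Dataset structure
- The source archive contains 20 tasks, numbered 1 to 20. Note that the creation script above iterates over range(1, 20), which covers tasks 1 to 19.
- Each task is divided into three splits: training (train), validation (valid), and test (test).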
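Data processing
- Processing converts the original text files to JSON records, one per line, and stores them in babi_{split}.jsonl files.
- For task 19, the answers are direction sequences (e.g. s,w); the process_path function converts them to words (e.g. south west).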
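The conversion described above can be tried on a few lines without downloading the archive. The following is a minimal sketch, not the script used to build the dataset: `parse_babi_lines` is a hypothetical helper name, the three sample lines are illustrative text in the bAbI layout (index, text, and for questions a tab-separated answer and supporting-fact index), and the task 19 path conversion is left out for brevity.

```python
import json
from typing import Dict, Iterable, List


def parse_babi_lines(lines: Iterable[str], task: int) -> List[Dict[str, object]]:
    """Hypothetical helper: turn raw bAbI lines into passage/question/answer records."""
    records: List[Dict[str, object]] = []
    story: List[str] = []
    for line in lines:
        # Split off the leading line index from the rest of the line.
        fid, _, text = line.partition(" ")
        if int(fid) == 1:  # A line numbered 1 starts a new story.
            story = []
        if "?" in text:
            # Question line: question, answer, supporting facts (tab-separated).
            question, answer = text.split("\t")[:2]
            records.append({
                "passage": "".join(story),
                "question": question.strip(),
                "answer": answer.strip(),
                "task": task,
            })
        else:
            story.append(text)
    return records


# Illustrative sample in the bAbI text layout (an assumption, not copied from the archive).
sample = [
    "1 Mary moved to the bathroom.\n",
    "2 John went to the hallway.\n",
    "3 Where is Mary? \tbathroom\t1\n",
]
records = parse_babi_lines(sample, task=1)
print(json.dumps(records[0]))
```

Statements stay in the running story and only question lines emit a record, so a story with several questions produces several records whose passages grow over time.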
Data content
- Each JSON record contains the following fields:
  - passage: the story text
  - question: the question
  - answer: the answer
  - task: the task number
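Records with these fields can be read back with the standard `json` module. The sketch below is one possible way to do it, under stated assumptions: `load_records` is a hypothetical helper, and the two demo records are made up for illustration rather than taken from the dataset. With real data, an open file such as `open("babi_train.jsonl")` could be passed in place of `demo_lines`.

```python
import json
from collections import Counter
from typing import Dict, Iterable, List


def load_records(lines: Iterable[str]) -> List[Dict[str, object]]:
    """Hypothetical helper: parse JSON Lines input, one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


# Made-up records in the documented schema (passage, question, answer, task).
demo_lines = [
    json.dumps({
        "passage": "Mary moved to the bathroom.\n",
        "question": "Where is Mary?",
        "answer": "bathroom",
        "task": 1,
    }) + "\n",
    json.dumps({
        "passage": "The office is east of the hallway.\nThe kitchen is north of the office.\n",
        "question": "How do you go from the kitchen to the hallway?",
        "answer": "south west",
        "task": 19,
    }) + "\n",
]

records = load_records(demo_lines)

# Count how many records belong to each task number.
per_task = Counter(record["task"] for record in records)
for task, count in sorted(per_task.items()):
    print(f"task {task}: {count} record(s)")
```

Because every record carries its own task number, a single file per split can hold all tasks and still be filtered or grouped by task after loading.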



