scan-tasks/scan-tasks

Name: scan-tasks/scan-tasks
Creator: scan-tasks
Published: 2024-01-18 11:15:22
License: no description provided

Source: Hugging Face. Updated 2024-01-18; indexed 2024-06-15.

Download link:

https://hf-mirror.com/datasets/scan-tasks/scan-tasks

Resource overview:

SCAN is a collection of simple language-driven navigation tasks for studying compositional learning and zero-shot generalization. It ships in multiple configurations, each with a train and a test split, and every example has two string fields: commands and actions. Its primary intended use is text-to-text generation, and its size category is 10K < n < 100K examples.
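As a minimal sketch of how one configuration might be loaded, the helper below wraps the Hugging Face `datasets` library. The helper name `load_scan` is hypothetical, the configuration names are the ones listed in this card, and the repository id is taken from the download link. Whether that repository still resolves, and whether it needs extra arguments such as `trust_remote_code=True`, are assumptions.

```python
# Configuration names as listed in this card.
SCAN_CONFIGS = (
    "simple", "addprim_jump", "addprim_turn_left",
    "filler_num0", "filler_num1", "filler_num2", "filler_num3",
    "length", "template_around_right", "template_jump_around_right",
    "template_opposite_right", "template_right",
)


def load_scan(config="simple", repo_id="scan-tasks/scan-tasks"):
    """Load one SCAN configuration (hypothetical helper, not a library API)."""
    if config not in SCAN_CONFIGS:
        raise ValueError(f"unknown SCAN configuration: {config!r}")
    # Imported lazily so the name check above works without the dependency.
    from datasets import load_dataset  # assumes `pip install datasets`
    # Assumption: the repository id still resolves on the Hub or a mirror.
    return load_dataset(repo_id, name=config)
```

If the download succeeds, `load_scan("addprim_jump")["train"][0]` would be expected to return a dictionary with the `commands` and `actions` fields described in this card.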

Provided by:

scan-tasks

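Original information summary

Dataset overview

Basic information

Dataset name: SCAN
Language: English
License: BSD
Multilinguality: monolingual
Dataset size: 10K<n<100K
Source data: original
Task category: text-to-text generation
Tags: multi-turn

Dataset configurations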

Configuration name: simple

Features:
- commands: string
- actions: string
Splits:
- train: 3217770 bytes, 16728 examples
- test: 799912 bytes, 4182 examples
Download size: 4080388 bytes
Dataset size: 4017682 bytes

Configuration name: addprim_jump

Features:
- commands: string
- actions: string
Splits:
- train: 2535625 bytes, 14670 examples
- test: 1508445 bytes, 7706 examples
Download size: 4111174 bytes
Dataset size: 4044070 bytes

Configuration name: addprim_turn_left

Features:
- commands: string
- actions: string
Splits:
- train: 3908891 bytes, 21890 examples
- test: 170063 bytes, 1208 examples
Download size: 4148216 bytes
Dataset size: 4078954 bytes

Configuration name: filler_num0

Features:
- commands: string
- actions: string
Splits:
- train: 2513034 bytes, 15225 examples
- test: 330087 bytes, 1173 examples
Download size: 2892291 bytes
Dataset size: 2843121 bytes

Configuration name: filler_num1

Features:
- commands: string
- actions: string
Splits:
- train: 2802865 bytes, 16290 examples
- test: 330087 bytes, 1173 examples
Download size: 3185317 bytes
Dataset size: 3132952 bytes

Configuration name: filler_num2

Features:
- commands: string
- actions: string
Splits:
- train: 3106220 bytes, 17391 examples
- test: 330087 bytes, 1173 examples
Download size: 3491975 bytes
Dataset size: 3436307 bytes

Configuration name: filler_num3

Features:
- commands: string
- actions: string
Splits:
- train: 3412704 bytes, 18528 examples
- test: 330087 bytes, 1173 examples
Download size: 3801870 bytes
Dataset size: 3742791 bytes

Configuration name: length

Features:
- commands: string
- actions: string
Splits:
- train: 2672464 bytes, 16990 examples
- test: 1345218 bytes, 3920 examples
Download size: 4080388 bytes
Dataset size: 4017682 bytes

Configuration name: template_around_right

Features:
- commands: string
- actions: string
Splits:
- train: 2513034 bytes, 15225 examples
- test: 1229757 bytes, 4476 examples
Download size: 3801870 bytes
Dataset size: 3742791 bytes

Configuration name: template_jump_around_right

Features:
- commands: string
- actions: string
Splits:
- train: 3412704 bytes, 18528 examples
- test: 330087 bytes, 1173 examples
Download size: 3801870 bytes
Dataset size: 3742791 bytes

Configuration name: template_opposite_right

Features:
- commands: string
- actions: string
Splits:
- train: 2944398 bytes, 15225 examples
- test: 857943 bytes, 4476 examples
Download size: 3861420 bytes
Dataset size: 3802341 bytes

Configuration name: template_right

Features:
- commands: string
- actions: string
Splits:
- train: 3127623 bytes, 15225 examples
- test: 716403 bytes, 4476 examples
Download size: 3903105 bytes
Dataset size: 3844026 bytes
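The byte counts in the configuration list can be cross-checked with a little arithmetic: for each configuration, the reported dataset size should equal the train bytes plus the test bytes. The sketch below copies the figures from the list above; the names `SIZES` and `inconsistent_configs` are introduced here for illustration only.

```python
# (train_bytes, test_bytes, dataset_bytes), copied from the configuration list.
SIZES = {
    "simple": (3217770, 799912, 4017682),
    "addprim_jump": (2535625, 1508445, 4044070),
    "addprim_turn_left": (3908891, 170063, 4078954),
    "filler_num0": (2513034, 330087, 2843121),
    "filler_num1": (2802865, 330087, 3132952),
    "filler_num2": (3106220, 330087, 3436307),
    "filler_num3": (3412704, 330087, 3742791),
    "length": (2672464, 1345218, 4017682),
    "template_around_right": (2513034, 1229757, 3742791),
    "template_jump_around_right": (3412704, 330087, 3742791),
    "template_opposite_right": (2944398, 857943, 3802341),
    "template_right": (3127623, 716403, 3844026),
}


def inconsistent_configs(sizes):
    """Return configurations whose split bytes do not sum to the dataset size."""
    return [
        name
        for name, (train, test, total) in sizes.items()
        if train + test != total
    ]


if __name__ == "__main__":
    # An empty list means every reported dataset size matches its splits.
    print(inconsistent_configs(SIZES))
```

The download sizes are left out of the check because they describe the fetched files rather than the prepared splits, so they need not equal the sum.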

Dataset creation

Data fields

commands: string
actions: string

Data splits

Name	Training examples	Test examples
addprim_jump	14670	7706
addprim_turn_left	21890	1208
filler_num0	15225	1173
filler_num1	16290	1173
filler_num2	17391	1173
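The split table makes it easy to compare how much of each configuration is held out for testing. As a rough sketch, the code below parses the five rows shown above (tab-separated) and computes the test share of each; `SPLITS_TSV` and `held_out_share` are names introduced for this example.

```python
# Rows copied from the data splits table: name, training examples, test examples.
SPLITS_TSV = (
    "addprim_jump\t14670\t7706\n"
    "addprim_turn_left\t21890\t1208\n"
    "filler_num0\t15225\t1173\n"
    "filler_num1\t16290\t1173\n"
    "filler_num2\t17391\t1173\n"
)


def held_out_share(tsv):
    """Map each configuration name to test / (train + test)."""
    shares = {}
    for line in tsv.strip().splitlines():
        name, train, test = line.split("\t")
        shares[name] = int(test) / (int(train) + int(test))
    return shares


shares = held_out_share(SPLITS_TSV)
for name, share in shares.items():
    print(f"{name}: {share:.1%} of examples are in the test split")
```

The shares differ widely between configurations, which is worth keeping in mind when comparing accuracy across them.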

Citation information

@inproceedings{Lake2018GeneralizationWS,
  title={Generalization without Systematicity: On the Compositional Skills of Sequence-to-Sequence Recurrent Networks},
  author={Brenden M. Lake and Marco Baroni},
  booktitle={ICML},
  year={2018},
  url={https://arxiv.org/pdf/1711.00350.pdf}
}

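Collected summary

Dataset introduction

Background and challenges

Background overview

The scan-tasks dataset is a collection of language-driven navigation tasks for studying compositional learning and zero-shot generalization. It comprises multiple subtasks, each with command and action fields, and is suited to text-to-text generation. Its size category is 10K to 100K examples, its language is English, and its license is BSD.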

The content above was collected and summarized by 遇见数据集.