VyoJ/calvin-ABCD-D-subsets
Source: Hugging Face. Updated 2026-01-08; indexed 2026-03-29.
Download link: https://hf-mirror.com/datasets/VyoJ/calvin-ABCD-D-subsets
Description:
---
license: mit
language:
- en
size_categories:
- 1M<n<10M
---
# CALVIN Dataset - task_ABCD_D (Structured Subsets)
This repository contains the CALVIN task_ABCD_D dataset split into structured subsets for easier downloading and processing.
## Original Source
- URL: http://calvin.cs.uni-freiburg.de/dataset/task_ABCD_D.zip
- Original Size: ~600GB
## Structure
Each subset is a complete, self-contained dataset with the proper structure:
```
subset_training_000/
└── training/
    ├── scene_info.npy
    ├── lang_annotations/
    │   └── auto_lang_ann.npy
    ├── ep_lens.npy
    ├── ep_start_end_ids.npy
    ├── episode_XXXXXXX.npz
    └── ...
```
This structure is compatible with the CALVIN processing pipeline.
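To confirm that an extracted subset matches the tree above before handing it to a pipeline, a small check along these lines may help. It is a minimal sketch using only the standard library. The function name `missing_entries` is hypothetical, and the list of required files is taken directly from the tree, not from the CALVIN code base.

```python
from pathlib import Path

# Entries listed in the directory tree above (relative to the split folder).
REQUIRED = (
    "scene_info.npy",
    "lang_annotations/auto_lang_ann.npy",
    "ep_lens.npy",
    "ep_start_end_ids.npy",
)


def missing_entries(subset_root, split="training"):
    """Return the expected entries that are absent from subset_root/split."""
    split_dir = Path(subset_root) / split
    missing = [name for name in REQUIRED if not (split_dir / name).is_file()]
    # At least one episode file should be present as well.
    if not any(split_dir.glob("episode_*.npz")):
        missing.append("episode_*.npz")
    return missing
```

Calling `missing_entries("subset_training_000")` returns an empty list when every listed entry is present.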
## Training Subsets (24 total)
| Subset | Episodes | Size |
|--------|----------|------|
| subset_training_000 | 100000 (episode_0037682.npz - episode_0153816.npz) | 27.19 GB |
| subset_training_001 | 100000 (episode_0153817.npz - episode_0278465.npz) | 27.47 GB |
| subset_training_002 | 100000 (episode_0278466.npz - episode_0378465.npz) | 26.85 GB |
| subset_training_003 | 100000 (episode_0378466.npz - episode_0499017.npz) | 26.82 GB |
| subset_training_004 | 100000 (episode_0499018.npz - episode_0599017.npz) | 26.96 GB |
| subset_training_005 | 100000 (episode_0599018.npz - episode_0699017.npz) | 27.63 GB |
| subset_training_006 | 100000 (episode_0699018.npz - episode_0799017.npz) | 27.85 GB |
| subset_training_007 | 100000 (episode_0799018.npz - episode_0899017.npz) | 27.91 GB |
| subset_training_008 | 100000 (episode_0899018.npz - episode_0999017.npz) | 27.65 GB |
| subset_training_009 | 100000 (episode_0999018.npz - episode_1099017.npz) | 27.93 GB |
| subset_training_010 | 100000 (episode_1099018.npz - episode_1199017.npz) | 27.78 GB |
| subset_training_011 | 100000 (episode_1199018.npz - episode_1299017.npz) | 27.04 GB |
| subset_training_012 | 100000 (episode_1299018.npz - episode_1399017.npz) | 27.32 GB |
| subset_training_013 | 100000 (episode_1399018.npz - episode_1499017.npz) | 27.74 GB |
| subset_training_014 | 100000 (episode_1499018.npz - episode_1599017.npz) | 27.82 GB |
| subset_training_015 | 100000 (episode_1599018.npz - episode_1699017.npz) | 28.17 GB |
| subset_training_016 | 100000 (episode_1699018.npz - episode_1799017.npz) | 28.24 GB |
| subset_training_017 | 100000 (episode_1799018.npz - episode_1899017.npz) | 26.27 GB |
| subset_training_018 | 100000 (episode_1899018.npz - episode_1999017.npz) | 25.78 GB |
| subset_training_019 | 100000 (episode_1999018.npz - episode_2099017.npz) | 26.02 GB |
| subset_training_020 | 100000 (episode_2099018.npz - episode_2199017.npz) | 27.08 GB |
| subset_training_021 | 100000 (episode_2199018.npz - episode_2299017.npz) | 26.93 GB |
| subset_training_022 | 100000 (episode_2299018.npz - episode_2399017.npz) | 26.86 GB |
| subset_training_023 | 7126 (episode_2399018.npz - episode_2406143.npz) | 2.01 GB |
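When only a few episodes are needed, the ID ranges in the table can be used to work out which subset to download. The sketch below hard-codes the first ID of each subset from the table. The function name `subset_for_episode` is hypothetical. Note that the ranges of the first few subsets span more IDs than the 100000 files they contain, so an ID inside a range is not guaranteed to exist as a file.

```python
from bisect import bisect_right

# First episode ID of each training subset, copied from the table above.
# Subsets 004 to 023 start at 499018 and advance by 100000 IDs each.
FIRST_IDS = [37682, 153817, 278466, 378466] + [499018 + 100_000 * k for k in range(20)]
LAST_ID = 2406143  # last episode ID in subset_training_023


def subset_for_episode(episode_id):
    """Return the training subset whose ID range covers episode_id, or None."""
    if not FIRST_IDS[0] <= episode_id <= LAST_ID:
        return None
    index = bisect_right(FIRST_IDS, episode_id) - 1
    return f"subset_training_{index:03d}"
```

For example, `subset_for_episode(500000)` points at `subset_training_004`, whose range in the table is 0499018 to 0599017.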
## Validation Subsets (1 total)
| Subset | Episodes | Size |
|--------|----------|------|
| subset_validation_000 | 99022 (episode_0000000.npz - episode_0420498.npz) | 26.66 GB |
## How to Use
### Download a specific subset:
```bash
# Using huggingface-cli (--repo-type dataset is needed because the default repo type is model)
huggingface-cli download VyoJ/calvin-ABCD-D-subsets training/subset_training_000.zip --repo-type dataset --local-dir ./
```
Or using Python:
```python
from huggingface_hub import hf_hub_download

hf_hub_download(
    repo_id="VyoJ/calvin-ABCD-D-subsets",
    filename="training/subset_training_000.zip",
    repo_type="dataset",
    local_dir="./",
)
```
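To fetch several subsets in one go, the same `hf_hub_download` call can be wrapped in a loop. This is a sketch, not part of the original card: the helper names `subset_filenames` and `download_subsets` are hypothetical, and the validation path pattern (`validation/subset_validation_000.zip`) is an assumption extrapolated from the training path shown above.

```python
def subset_filenames(split="training", count=24):
    """Build repo-relative zip paths such as training/subset_training_000.zip."""
    # The pattern for splits other than "training" is an assumption.
    return [f"{split}/subset_{split}_{i:03d}.zip" for i in range(count)]


def download_subsets(filenames, local_dir="./"):
    """Download each listed file from the dataset repo, one at a time."""
    # Imported lazily so subset_filenames works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return [
        hf_hub_download(
            repo_id="VyoJ/calvin-ABCD-D-subsets",
            filename=name,
            repo_type="dataset",
            local_dir=local_dir,
        )
        for name in filenames
    ]
```

For instance, `download_subsets(subset_filenames()[:2])` would request the first two training subsets, roughly 55 GB according to the table.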
### Extract and process:
```bash
cd training
unzip subset_training_000.zip
# Now you have subset_training_000/training/ with all needed files
```
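On systems without `unzip`, the same extraction can be done with Python's standard `zipfile` module. A minimal sketch follows. The function name `extract_subset` is hypothetical, and it assumes the archive holds a single top-level `subset_training_000/` folder, as the comment above suggests.

```python
import zipfile
from pathlib import Path


def extract_subset(zip_path, dest_dir="."):
    """Extract a subset archive into dest_dir and return its top-level names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # Collect the first path component of every archive member.
        top_level = sorted({name.split("/", 1)[0] for name in archive.namelist()})
    return top_level
```

Each archive is around 27 GB, so the destination needs at least that much free space in addition to the zip file itself.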
### Process with CALVIN pipeline:
Point your pipeline to the subset directory (e.g., `subset_training_000/`) and it will work as if processing the full dataset.
## Reassembling Full Dataset
If you want to reassemble the full dataset:
1. Download all subsets for a split
2. Extract each subset
3. Merge episode files into a single directory
```python
import shutil
from pathlib import Path

# Run from the directory that holds the extracted subset_training_* folders
output_dir = Path("full_training")
output_dir.mkdir(exist_ok=True)

# Copy metadata from the first subset
first_subset = Path("subset_training_000/training")
shutil.copy(first_subset / "scene_info.npy", output_dir)
# dirs_exist_ok (Python 3.8+) lets the script be rerun without failing
shutil.copytree(first_subset / "lang_annotations", output_dir / "lang_annotations", dirs_exist_ok=True)

# Copy all episodes from all subsets
# Note: ep_lens.npy and ep_start_end_ids.npy are not copied or merged here
for subset_dir in sorted(Path(".").glob("subset_training_*/training")):
    for ep_file in subset_dir.glob("episode_*.npz"):
        shutil.copy(ep_file, output_dir)
```
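After merging, it can be worth checking that no episode file was lost. The sketch below compares the number of `episode_*.npz` files in the merged directory against the total implied by the training table (23 subsets of 100000 plus one of 7126). The helper names are hypothetical, and the check does not cover `ep_lens.npy` or `ep_start_end_ids.npy`.

```python
from pathlib import Path

# Per-subset file counts from the training table: 23 subsets of 100000 plus 7126.
EXPECTED_TRAINING_FILES = 23 * 100_000 + 7126


def count_episode_files(directory):
    """Count episode_*.npz files directly inside directory."""
    return sum(1 for _ in Path(directory).glob("episode_*.npz"))


def check_merge(directory, expected=EXPECTED_TRAINING_FILES):
    """Return (found, expected, ok) for a merged episode directory."""
    found = count_episode_files(directory)
    return found, expected, found == expected
```

Running `check_merge("full_training")` after the copy loop reports whether the merged directory holds the expected number of files.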
Provided by: VyoJ



