LeonOverload/primo-sft-json
Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/LeonOverload/primo-sft-json
Resource description:
---
license: apache-2.0
pretty_name: PRIMO SFT JSON
task_categories:
- video-text-to-text
language:
- en
configs:
- config_name: all
  data_files:
  - split: train
    path: "jsonl/part-*.jsonl"
- config_name: agibot
  data_files:
  - split: train
    path: "jsonl_subsets/agibot/part-*.jsonl"
- config_name: behavior-1k
  data_files:
  - split: train
    path: "jsonl_subsets/behavior-1k/part-*.jsonl"
- config_name: nextqa
  data_files:
  - split: train
    path: "jsonl_subsets/nextqa/part-*.jsonl"
- config_name: perceptiontest
  data_files:
  - split: train
    path: "jsonl_subsets/perceptiontest/part-*.jsonl"
- config_name: robotwin-clean
  data_files:
  - split: train
    path: "jsonl_subsets/robotwin-clean/part-*.jsonl"
- config_name: robotwin-randomized
  data_files:
  - split: train
    path: "jsonl_subsets/robotwin-randomized/part-*.jsonl"
- config_name: robovqa
  data_files:
  - split: train
    path: "jsonl_subsets/robovqa/part-*.jsonl"
- config_name: seed-bench-r1
  data_files:
  - split: train
    path: "jsonl_subsets/seed-bench-r1/part-*.jsonl"
- config_name: sharerobot
  data_files:
  - split: train
    path: "jsonl_subsets/sharerobot/part-*.jsonl"
- config_name: star
  data_files:
  - split: train
    path: "jsonl_subsets/star/part-*.jsonl"
---
# PRIMO SFT JSON
This repository contains JSON annotations for **PRIMO**.
## What Is Included
- `raw_json/`: original JSON files copied from the PRIMO release layout
- `jsonl/`: flattened JSONL shards for better Hugging Face Data Studio preview
- `jsonl_subsets/`: subset-specific JSONL shards used by the Dataset Viewer config selector
- `summary.json`: row/shard metadata generated at build time
## Split Type
- Task: Supervised Fine-Tuning
- Source pattern: `primo-sft/*/train_cot.json`
## Media Mapping
This repo stores annotations only.
Media files (videos/frames) should be prepared in a local folder like:
- `./primo-video/...`
The `path`, `init_frame_path`, and `current_frame_path` fields are expected to resolve against your local `PRIMO-Data` root.
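Because this repo ships annotations without media, each record's path fields have to be joined to a local root before the files can be opened. The following is a minimal sketch, assuming the stored values are paths relative to that root (the card does not state this explicitly). `resolve_media_paths` and `missing_media` are hypothetical helper names, not part of any PRIMO tooling.

```python
from pathlib import Path

# Field names come from the dataset card; a given record may not carry all three.
MEDIA_FIELDS = ("path", "init_frame_path", "current_frame_path")


def resolve_media_paths(row, media_root):
    """Return a copy of `row` with its string media fields joined to `media_root`.

    Assumption: stored values are relative paths. Non-string values
    (for example lists or None) are left untouched.
    """
    root = Path(media_root)
    resolved = dict(row)
    for field in MEDIA_FIELDS:
        value = row.get(field)
        if isinstance(value, str) and value:
            resolved[field] = str(root / value)
    return resolved


def missing_media(row, media_root):
    """List the media fields of `row` whose resolved file is absent locally."""
    resolved = resolve_media_paths(row, media_root)
    return [
        field
        for field in MEDIA_FIELDS
        if isinstance(row.get(field), str)
        and row.get(field)
        and not Path(resolved[field]).exists()
    ]
```

Running `missing_media` over a handful of rows before training is a cheap way to confirm that the local media layout matches what the annotations expect.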
## Quick Load Example
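```python
import json
from pathlib import Path

# Run from the repository root; adjust `root` if the repo lives elsewhere.
root = Path(".")
jsonl_dir = root / "jsonl"

rows = []
# Read the shards in name order; each non-empty line is one JSON record.
for fp in sorted(jsonl_dir.glob("part-*.jsonl")):
    with fp.open("r", encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rows.append(json.loads(line))

print(len(rows))
```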
## Build Metadata
- Total rows: **116755**
- Shards: **3**
- Shard size: **50000**
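These three figures are consistent with one another: 116755 rows at up to 50000 rows per shard require ceil(116755 / 50000) = 3 shards. The sketch below works through that arithmetic, assuming shards are filled in order up to the stated size so that only the last one is partial. The card does not list per-shard row counts, and `shard_sizes` is a hypothetical helper.

```python
import math


def shard_sizes(total_rows, shard_size):
    """Return the row count of each shard when shards are filled in order."""
    if total_rows < 0 or shard_size <= 0:
        raise ValueError("total_rows must be >= 0 and shard_size must be > 0")
    n_shards = math.ceil(total_rows / shard_size)
    sizes = [shard_size] * n_shards
    remainder = total_rows % shard_size
    if n_shards and remainder:
        # Only the final shard is partial under the sequential-fill assumption.
        sizes[-1] = remainder
    return sizes


sizes = shard_sizes(116755, 50000)
print(len(sizes), sizes)  # 3 [50000, 50000, 16755]
```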
## Viewer Subsets
The Hugging Face Dataset Viewer subset selector maps to these configs:
- `all`
- `agibot`
- `behavior-1k`
- `nextqa`
- `perceptiontest`
- `robotwin-clean`
- `robotwin-randomized`
- `robovqa`
- `seed-bench-r1`
- `sharerobot`
- `star`
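Each config name corresponds to a glob pattern in the card's YAML header: `all` reads `jsonl/part-*.jsonl`, and every other name reads `jsonl_subsets/<name>/part-*.jsonl`. Below is a minimal sketch of loading a single subset from a local copy of the repository using only the standard library. `subset_pattern` and `load_subset` are hypothetical helper names introduced for illustration.

```python
import json
from pathlib import Path

# Config names as listed in the dataset card.
CONFIGS = (
    "all", "agibot", "behavior-1k", "nextqa", "perceptiontest",
    "robotwin-clean", "robotwin-randomized", "robovqa",
    "seed-bench-r1", "sharerobot", "star",
)


def subset_pattern(config):
    """Map a config name to its glob pattern relative to the repo root."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    if config == "all":
        return "jsonl/part-*.jsonl"
    return f"jsonl_subsets/{config}/part-*.jsonl"


def load_subset(repo_root, config):
    """Read every shard of one config into a list of dicts, in shard-name order."""
    rows = []
    for fp in sorted(Path(repo_root).glob(subset_pattern(config))):
        with fp.open("r", encoding="utf-8") as f:
            rows.extend(json.loads(line) for line in f if line.strip())
    return rows
```

For example, `load_subset(".", "agibot")` run from the repository root would return only the rows of the `agibot` subset, which avoids reading all shards when one source is enough.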
## Citations
If you find our work helpful for your research, please consider citing it.
```
@misc{liu2026passiveobserveractivecritic,
title={From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation},
author={Yibin Liu and Yaxing Lyu and Daqi Gao and Zhixuan Liang and Weiliang Tang and Shilong Mu and Xiaokang Yang and Yao Mu},
year={2026},
eprint={2603.15600},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2603.15600},
}
```
Provided by: LeonOverload



