
LeonOverload/primo-sft-json

Source: Hugging Face. Updated 2026-04-08; indexed 2026-04-12.
Download link:
https://hf-mirror.com/datasets/LeonOverload/primo-sft-json
Resource description:
---
license: apache-2.0
pretty_name: PRIMO SFT JSON
task_categories:
- video-text-to-text
language:
- en
configs:
- config_name: all
  data_files:
  - split: train
    path: "jsonl/part-*.jsonl"
- config_name: agibot
  data_files:
  - split: train
    path: "jsonl_subsets/agibot/part-*.jsonl"
- config_name: behavior-1k
  data_files:
  - split: train
    path: "jsonl_subsets/behavior-1k/part-*.jsonl"
- config_name: nextqa
  data_files:
  - split: train
    path: "jsonl_subsets/nextqa/part-*.jsonl"
- config_name: perceptiontest
  data_files:
  - split: train
    path: "jsonl_subsets/perceptiontest/part-*.jsonl"
- config_name: robotwin-clean
  data_files:
  - split: train
    path: "jsonl_subsets/robotwin-clean/part-*.jsonl"
- config_name: robotwin-randomized
  data_files:
  - split: train
    path: "jsonl_subsets/robotwin-randomized/part-*.jsonl"
- config_name: robovqa
  data_files:
  - split: train
    path: "jsonl_subsets/robovqa/part-*.jsonl"
- config_name: seed-bench-r1
  data_files:
  - split: train
    path: "jsonl_subsets/seed-bench-r1/part-*.jsonl"
- config_name: sharerobot
  data_files:
  - split: train
    path: "jsonl_subsets/sharerobot/part-*.jsonl"
- config_name: star
  data_files:
  - split: train
    path: "jsonl_subsets/star/part-*.jsonl"
---

# PRIMO SFT JSON

This repository contains JSON annotations for **PRIMO**.

## What Is Included

- `raw_json/`: original JSON files copied from the PRIMO release layout
- `jsonl/`: flattened JSONL shards for better Hugging Face Data Studio preview
- `jsonl_subsets/`: subset-specific JSONL shards used by Dataset Viewer config selector
- `summary.json`: row/shard metadata generated at build time

## Split Type

- Task: Supervised Fine-Tuning
- Source pattern: `primo-sft/*/train_cot.json`

## Media Mapping

This repo stores annotations only.
Media files (videos/frames) should be prepared in a local folder like:

- `./primo-video/...`

The `path`, `init_frame_path`, and `current_frame_path` fields are expected to resolve against your local `PRIMO-Data` root.

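
The path resolution described under Media Mapping could look roughly like the sketch below. Only the three field names come from this card. The helper names `resolve_media_paths` and `missing_media` are hypothetical, and the code assumes each media field, when present, holds a relative path string; the actual row schema may differ.

```python
from pathlib import Path

# Field names taken from the card; everything else in this sketch is an assumption.
MEDIA_FIELDS = ("path", "init_frame_path", "current_frame_path")


def resolve_media_paths(row: dict, data_root: Path) -> dict:
    """Return a copy of `row` with its media fields joined onto `data_root`.

    Assumes each media field, when present, is a relative path string.
    Missing or non-string fields are left untouched.
    """
    resolved = dict(row)
    for field in MEDIA_FIELDS:
        value = row.get(field)
        if isinstance(value, str) and value:
            resolved[field] = str(data_root / value)
    return resolved


def missing_media(row: dict) -> list:
    """List the media fields of an already-resolved row whose files do not exist."""
    return [
        field
        for field in MEDIA_FIELDS
        if isinstance(row.get(field), str) and not Path(row[field]).exists()
    ]
```

Calling `missing_media(resolve_media_paths(row, Path("PRIMO-Data")))` on a few rows before training is a cheap way to check that the local media folder matches the annotations.
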
## Quick Load Example

```python
import json
from pathlib import Path

root = Path(".")
jsonl_dir = root / "jsonl"

rows = []
for fp in sorted(jsonl_dir.glob("part-*.jsonl")):
    with fp.open("r", encoding="utf-8") as f:
        for line in f:
            rows.append(json.loads(line))

print(len(rows))
```

## Build Metadata

- Total rows: **116755**
- Shards: **3**
- Shard size: **50000**

## Viewer Subsets

The Hugging Face Dataset Viewer subset selector maps to these configs:

- `all`
- `agibot`
- `behavior-1k`
- `nextqa`
- `perceptiontest`
- `robotwin-clean`
- `robotwin-randomized`
- `robovqa`
- `seed-bench-r1`
- `sharerobot`
- `star`

## Citations

If you find our work helpful for your research, please consider citing our work.

```
@misc{liu2026passiveobserveractivecritic,
  title={From Passive Observer to Active Critic: Reinforcement Learning Elicits Process Reasoning for Robotic Manipulation},
  author={Yibin Liu and Yaxing Lyu and Daqi Gao and Zhixuan Liang and Weiliang Tang and Shilong Mu and Xiaokang Yang and Yao Mu},
  year={2026},
  eprint={2603.15600},
  archivePrefix={arXiv},
  primaryClass={cs.RO},
  url={https://arxiv.org/abs/2603.15600},
}
```
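
As a complement to the Quick Load Example, the sketch below reads a single config from the Viewer Subsets list and yields rows lazily, so the full 116755 rows need not sit in memory at once. It assumes a local clone of this repository and follows the `data_files` patterns in the front matter. The function name `iter_rows` is hypothetical and not part of the release.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_rows(repo_root: Path, subset: str = "all") -> Iterator[dict]:
    """Yield one parsed row at a time from the JSONL shards of a config."""
    # "all" maps to jsonl/; every other config maps to jsonl_subsets/<name>/,
    # mirroring the data_files patterns declared in the card's front matter.
    if subset == "all":
        shard_dir = repo_root / "jsonl"
    else:
        shard_dir = repo_root / "jsonl_subsets" / subset

    for shard in sorted(shard_dir.glob("part-*.jsonl")):
        with shard.open("r", encoding="utf-8") as f:
            for line in f:
                if line.strip():  # skip blank lines defensively
                    yield json.loads(line)
```

For example, `sum(1 for _ in iter_rows(Path("."), "robovqa"))` counts the rows of one subset without building a list.
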

Provider:
LeonOverload