myconnects/motif
Source: Hugging Face (updated 2026-04-18, indexed 2026-04-26)
Download link:
https://hf-mirror.com/datasets/myconnects/motif
---
license: mit
task_categories:
- video-classification
- robotics
language:
- en
tags:
- robotics
- motion
- imitation-learning
- multimodal
- video
size_categories:
- 1K<n<10K
configs:
- config_name: human_motion
  data_files:
  - split: train
    path: human_motion/train-*
- config_name: stretch_motion
  data_files:
  - split: train
    path: stretch_motion/train-*
---
# MotIF-1K Dataset
Multimodal trajectories of human and Stretch-robot motion paired with task and motion annotations, released with the paper **"MotIF: Motion Instruction Fine-tuning"**.
- **Paper:** [MotIF: Motion Instruction Fine-tuning](https://arxiv.org/abs/2409.10683) (arXiv:2409.10683)
- **Project website:** https://motif-1k.github.io
- **Code (GitHub):** https://github.com/Minyoung1005/motif
- **Authors:** Minyoung Hwang, Joey Hejna, Dorsa Sadigh, Yonatan Bisk
- **Contact:** myhwang@mit.edu
## Abstract
Many robotics tasks require observing the full motion of the robot — not just start/end states — to correctly judge success (e.g., brushing hair requires the right trajectory, not just ending up "at" the hair). Off-the-shelf vision-language models (VLMs) struggle here because they are trained on single frames and lack robot-motion data. MotIF fine-tunes VLMs using **abstract visual motion representations** (e.g., keypoint trajectories overlaid on the initial frame, optical flow, key-frame storyboards) to semantically ground robot behavior in the environment. On MotIF-1K, the resulting model outperforms state-of-the-art VLMs by ≥2× in precision and by 56.1% in recall, generalizing across unseen motions, tasks, and environments.
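The precision and recall figures above refer to binary success detection: given a video and a motion description, the model decides whether the motion matches. As a reminder of what the two metrics measure, here is a minimal sketch using the standard definitions. It is not the authors' evaluation code; the function name `precision_recall` and the toy labels are illustrative assumptions.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels, where 1 means 'motion matches'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly accepted
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # wrongly accepted
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # wrongly rejected
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy labels, made up for illustration only (not results from the paper).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1]
precision, recall = precision_recall(y_true, y_pred)
```

High precision means the detector rarely accepts a wrong motion; high recall means it rarely rejects a correct one.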
## Configs
| Config | Rows | Source |
|-----------------|------|------------------------------------------------|
| `human_motion` | 653 | Human demonstrations (brushing, pouring, etc.) |
| `stretch_motion`| 370 | Hello Robot Stretch teleop demonstrations |
Covers 13 task categories with varied feasible motions per task.
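As a quick consistency check, the two row counts in the table sum to 1,023, which falls inside the `1K<n<10K` size category declared in the card metadata. The sketch below records the counts and shows one way to load both configs. The helper names `total_rows` and `load_all_configs` are hypothetical, and loading requires the `datasets` package and network access.

```python
# Row counts copied from the Configs table on this card.
CONFIG_ROWS = {"human_motion": 653, "stretch_motion": 370}


def total_rows(config_rows=CONFIG_ROWS):
    """Sum the documented row counts across configs."""
    return sum(config_rows.values())


def load_all_configs(repo_id="myconnects/motif"):
    """Load the train split of every config into a dict keyed by config name."""
    # Imported lazily so the arithmetic above works without `datasets` installed.
    from datasets import load_dataset

    return {name: load_dataset(repo_id, name, split="train") for name in CONFIG_ROWS}


total = total_rows()
```

After `splits = load_all_configs()`, comparing `len(splits[name])` with `CONFIG_ROWS[name]` is a simple way to confirm that a download matches the counts documented here.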
## Fields
- `traj_idx` (int) — trajectory index
- `num_steps` (int) — number of frames
- `trajectory` (list[list[int]]) — 2D keypoint path `[x, y]` per frame
- `task_instruction` (str) — high-level task (e.g. `"brush hair"`)
- `motion_description` (str) — motion verb phrase (e.g. `"move downward and upward, repeating 3 times"`)
- `video_raw`, `video_trajviz` (video) — original and trajectory-overlaid clips (mp4)
- `last_frame_raw`, `last_frame_trajviz` (image) — final-frame stills
- `opticalflow` (image) — optical-flow visualization (some rows may be null in `human_motion`)
- `storyboard_key{2,4,9,16}` (image) — storyboard grids at K key-frames
- `storyboard_key{2,4,9,16}_trajviz` (image) — same grids with trajectory overlay
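To make the field layout concrete, the sketch below expands the `storyboard_key{2,4,9,16}` shorthand into explicit column names and computes two simple statistics from a `trajectory` value. The helper names and the toy row are hypothetical, and the code assumes that each `[x, y]` pair is a pixel coordinate, which the card does not state explicitly.

```python
import math

# K values taken from the field list above.
STORYBOARD_SIZES = (2, 4, 9, 16)


def storyboard_fields(trajviz=False):
    """Expand the storyboard_key{2,4,9,16} shorthand into column names."""
    suffix = "_trajviz" if trajviz else ""
    return [f"storyboard_key{k}{suffix}" for k in STORYBOARD_SIZES]


def path_length(trajectory):
    """Total Euclidean length of the keypoint path (sum over consecutive points)."""
    return sum(math.dist(a, b) for a, b in zip(trajectory, trajectory[1:]))


def net_displacement(trajectory):
    """(dx, dy) from the first keypoint to the last; (0, 0) for an empty path."""
    if not trajectory:
        return (0, 0)
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    return (x1 - x0, y1 - y0)


# Toy row with made-up coordinates, shaped like the `trajectory` field.
row = {"num_steps": 4, "trajectory": [[10, 10], [10, 40], [10, 10], [13, 14]]}
length = path_length(row["trajectory"])
dx, dy = net_displacement(row["trajectory"])
```

A long path with a small net displacement, as in the toy row, is typical of a back-and-forth motion. Final-frame stills cannot show that pattern, but the trajectory can.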
## Usage
```python
from datasets import load_dataset
ds = load_dataset("myconnects/motif", "human_motion", split="train")
row = ds[0]
print(row["task_instruction"], row["motion_description"])
row["video_raw"] # decoded video
row["last_frame_trajviz"] # PIL Image
```
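Beyond reading a single row, two common first steps are grouping motion descriptions by task and skipping rows whose `opticalflow` is null. The sketch below does both on plain dict rows. The helper names and the toy rows are hypothetical, and the task and motion strings are placeholders rather than values guaranteed to appear in the dataset.

```python
from collections import defaultdict


def motions_by_task(rows):
    """Map each task_instruction to the set of motion_description values seen for it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["task_instruction"]].add(row["motion_description"])
    return dict(grouped)


def rows_with_optical_flow(rows):
    """Keep only rows whose opticalflow field is present (non-null)."""
    return [row for row in rows if row.get("opticalflow") is not None]


# Toy rows, made up for illustration; real rows carry images and videos as well.
rows = [
    {"task_instruction": "task A", "motion_description": "motion 1", "opticalflow": "img0"},
    {"task_instruction": "task A", "motion_description": "motion 2", "opticalflow": None},
    {"task_instruction": "task B", "motion_description": "motion 1", "opticalflow": "img2"},
]
grouped = motions_by_task(rows)
with_flow = rows_with_optical_flow(rows)
```

On the real dataset, calling `ds.select_columns(["task_instruction", "motion_description"])` before `motions_by_task` should avoid decoding videos and images while iterating.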
## Training & Evaluation Code
The [MotIF GitHub repository](https://github.com/Minyoung1005/motif) contains the full codebase: data-collection scripts, LoRA fine-tuning pipeline (LLaVA-based), evaluation (with optional logits), pretrained model checkpoints, a videoLM-architecture variant, a success-detection comparison (GPT / Gemini / MotIF), and a Gradio web UI.
## Citation
```bibtex
@article{hwang2024motif,
  title   = {MotIF: Motion Instruction Fine-tuning},
  author  = {Hwang, Minyoung and Hejna, Joey and Sadigh, Dorsa and Bisk, Yonatan},
  journal = {arXiv preprint arXiv:2409.10683},
  year    = {2024},
}
```
## Acknowledgements
We thank Abitha Thankaraj, Hao Zhu, Leena Mathur, Quanting Xie, Rosa Vitiello, Su Li, Tiffany Min, Vidhi Jain, and Yingshan Chang for helping us collect the dataset and providing thoughtful feedback.
Provider: myconnects



