myconnects/motif

Name: myconnects/motif
Creator: myconnects
Published: 2026-04-18 04:11:36
License: MIT

Source: Hugging Face; updated 2026-04-18; indexed 2026-04-26

Download link:

https://hf-mirror.com/datasets/myconnects/motif

Description:

---
license: mit
task_categories:
- video-classification
- robotics
language:
- en
tags:
- robotics
- motion
- imitation-learning
- multimodal
- video
size_categories:
- 1K<n<10K
configs:
- config_name: human_motion
  data_files:
  - split: train
    path: human_motion/train-*
- config_name: stretch_motion
  data_files:
  - split: train
    path: stretch_motion/train-*
---

# MotIF-1K Dataset

Multimodal trajectories of human and Stretch-robot motion paired with task and motion annotations, released with the paper **"MotIF: Motion Instruction Fine-tuning"**.

- **Paper:** [MotIF: Motion Instruction Fine-tuning](https://arxiv.org/abs/2409.10683) (arXiv:2409.10683)
- **Project website:** https://motif-1k.github.io
- **Code (GitHub):** https://github.com/Minyoung1005/motif
- **Authors:** Minyoung Hwang, Joey Hejna, Dorsa Sadigh, Yonatan Bisk
- **Contact:** myhwang@mit.edu

## Abstract

Many robotics tasks require observing the full motion of the robot, not just its start and end states, to correctly judge success (e.g., brushing hair requires the right trajectory, not just ending up "at" the hair). Off-the-shelf vision-language models (VLMs) struggle here because they are trained on single frames and lack robot-motion data. MotIF fine-tunes VLMs using **abstract visual motion representations** (e.g., keypoint trajectories overlaid on the initial frame, optical flow, key-frame storyboards) to semantically ground robot behavior in the environment. On MotIF-1K, the resulting model outperforms state-of-the-art VLMs by ≥2× in precision and by 56.1% in recall, generalizing across unseen motions, tasks, and environments.

## Configs

| Config           | Rows | Source                                         |
|------------------|------|------------------------------------------------|
| `human_motion`   | 653  | Human demonstrations (brushing, pouring, etc.) |
| `stretch_motion` | 370  | Hello Robot Stretch teleop demonstrations      |

The dataset covers 13 task categories, with varied feasible motions per task.
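
The two configs can be loaded in a single loop and checked against the row counts in the Configs table. The following is a minimal sketch, assuming the `datasets` library is installed and the Hub is reachable; `EXPECTED_ROWS` and `load_all_configs` are illustrative names and are not part of the dataset or the MotIF codebase.

```python
# Row counts copied from the Configs table in the dataset card.
EXPECTED_ROWS = {"human_motion": 653, "stretch_motion": 370}


def load_all_configs(repo_id="myconnects/motif"):
    """Load the train split of every config and compare sizes to the table.

    Hypothetical helper: `datasets` is imported lazily so the constants
    above stay usable even when the library is not installed.
    """
    from datasets import load_dataset

    loaded = {}
    for config, expected in EXPECTED_ROWS.items():
        ds = load_dataset(repo_id, config, split="train")
        if len(ds) != expected:
            # Report a mismatch instead of failing; the card may be outdated.
            print(f"{config}: got {len(ds)} rows, table says {expected}")
        loaded[config] = ds
    return loaded


# Total across both configs; this falls in the 1K<n<10K size category.
total_rows = sum(EXPECTED_ROWS.values())
print(total_rows)  # 1023
```

Calling `load_all_configs()` downloads both configs, which include video and image columns, so the first call may take a while.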

## Fields

- `traj_idx` (int): trajectory index
- `num_steps` (int): number of frames
- `trajectory` (list[list[int]]): 2D keypoint path, one `[x, y]` pair per frame
- `task_instruction` (str): high-level task (e.g. `"brush hair"`)
- `motion_description` (str): motion verb phrase (e.g. `"move downward and upward, repeating 3 times"`)
- `video_raw`, `video_trajviz` (video): original and trajectory-overlaid clips (mp4)
- `last_frame_raw`, `last_frame_trajviz` (image): final-frame stills
- `opticalflow` (image): optical-flow visualization (some rows may be null in `human_motion`)
- `storyboard_key{2,4,9,16}` (image): storyboard grids at K key-frames
- `storyboard_key{2,4,9,16}_trajviz` (image): same grids with trajectory overlay

## Usage

```python
from datasets import load_dataset

# Load the train split of the human-demonstration config.
ds = load_dataset("myconnects/motif", "human_motion", split="train")

row = ds[0]
print(row["task_instruction"], row["motion_description"])
row["video_raw"]           # decoded video
row["last_frame_trajviz"]  # PIL Image
```

## Training & Evaluation Code

The [MotIF GitHub repository](https://github.com/Minyoung1005/motif) contains the full codebase: data-collection scripts, a LoRA fine-tuning pipeline (LLaVA-based), evaluation (with optional logits), pretrained model checkpoints, a videoLM-architecture variant, a success-detection comparison (GPT / Gemini / MotIF), and a Gradio web UI.

## Citation

```bibtex
@article{hwang2024motif,
  title   = {MotIF: Motion Instruction Fine-tuning},
  author  = {Hwang, Minyoung and Hejna, Joey and Sadigh, Dorsa and Bisk, Yonatan},
  journal = {arXiv preprint arXiv:2409.10683},
  year    = {2024},
}
```

## Acknowledgements

We thank Abitha Thankaraj, Hao Zhu, Leena Mathur, Quanting Xie, Rosa Vitiello, Su Li, Tiffany Min, Vidhi Jain, and Yingshan Chang for helping us collect the dataset and providing thoughtful feedback.
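
As a supplement to the Fields section: the `trajectory` field is a plain list of `[x, y]` pairs, so simple summaries can be computed without decoding any video. The sketch below uses a toy path rather than real dataset rows, and it assumes the coordinates are pixel positions in image space, which the card does not state explicitly. The helper names `path_length` and `bounding_box` are illustrative only.

```python
import math


def path_length(trajectory):
    """Sum of Euclidean distances between consecutive [x, y] keypoints."""
    return sum(math.dist(p, q) for p, q in zip(trajectory, trajectory[1:]))


def bounding_box(trajectory):
    """Return (x_min, y_min, x_max, y_max) over all keypoints."""
    xs = [x for x, _ in trajectory]
    ys = [y for _, y in trajectory]
    return min(xs), min(ys), max(xs), max(ys)


# Toy path in the same list[list[int]] layout as the `trajectory` field.
toy = [[0, 0], [3, 4], [3, 10]]
length = path_length(toy)  # segments of length 5 and 6
box = bounding_box(toy)
print(length, box)  # 11.0 (0, 0, 3, 10)
```

With a loaded row, the same helpers would be called as `path_length(row["trajectory"])`.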


Provider:

myconnects
