ChangChrisLiu/GNN_Disassembly_WorldModel

Name: ChangChrisLiu/GNN_Disassembly_WorldModel
Creator: ChangChrisLiu
Published: 2026-04-11 02:09:35
License: CC BY 4.0

Source: Hugging Face · Updated 2026-04-11 · Indexed 2026-04-12

Download link:

https://hf-mirror.com/datasets/ChangChrisLiu/GNN_Disassembly_WorldModel

Resource description:

---
license: cc-by-4.0
task_categories:
- robotics
- image-segmentation
- graph-ml
language:
- en
tags:
- robotics
- manipulation
- disassembly
- constraint-graph
- gnn
- world-model
- sam2
- segmentation
- ur5e
size_categories:
- 1K<n<10K
pretty_name: GNN Disassembly World Model Dataset
---

# GNN Disassembly World Model Dataset

Real robot disassembly episodes with per-view constraint graphs, SAM2 segmentation masks, 256D feature embeddings, 3D positions, and synchronized robot states.

**Hardware:** UR5e + Robotiq 2F-85 gripper, OAK-D Pro (side), RealSense D435i (wrist)

## Overview

This dataset contains teleoperated demonstrations of a robot disassembling a desktop motherboard. Each episode includes:

- **Dual-camera RGB-D video** at 30 Hz (side + wrist views, 1280×720)
- **13D robot state** per frame (6 joints + 6 TCP pose + 1 gripper)
- **Robot actions** (frame-to-frame state deltas)
- **Per-view constraint graphs** with directed edges and per-frame states
- **Binary segmentation masks** per component per frame
- **256D SAM2 feature embeddings** per component per frame
- **3D position centroids** (depth backprojection, meters)

## Dataset Structure

```
data/disassembly/desktop/
├── session_XXXX_YYYYYY/                          # Timestamped sessions
│   ├── session_metadata.json
│   ├── episode_00/                               # One episode = one component removal
│   │   ├── metadata.json                         # goal_component, component_counts
│   │   ├── robot_states.npy                      # (T, 13) float32
│   │   ├── robot_actions.npy                     # (T-1, 13) float32 deltas
│   │   ├── timestamps.npy                        # (T, 3) float64
│   │   ├── side/
│   │   │   ├── rgb/frame_XXXXXX.png              # 1280x720 RGB (side camera)
│   │   │   └── depth/frame_XXXXXX.npy            # 1280x720 uint16 (mm)
│   │   ├── wrist/
│   │   │   ├── rgb/frame_XXXXXX.png              # 1280x720 RGB (wrist camera)
│   │   │   └── depth/frame_XXXXXX.npy
│   │   └── annotations/                          # Only for labeled episodes
│   │       ├── side_graph.json                   # Side view constraint graph
│   │       ├── wrist_graph.json                  # Wrist view constraint graph
│   │       ├── side_masks/frame_XXXXXX.npz       # {component_id: (H,W) uint8}
│   │       ├── wrist_masks/frame_XXXXXX.npz
│   │       ├── side_embeddings/frame_XXXXXX.npz  # {component_id: (256,) float32}
│   │       ├── wrist_embeddings/frame_XXXXXX.npz
│   │       ├── side_centroids/frame_XXXXXX.json  # {component_id: [x,y,z]} meters
│   │       ├── wrist_centroids/frame_XXXXXX.json
│   │       └── dataset_card.json
│   └── episode_01/ ...
└── session_.../ ...
```

## Component Types (Type Vocab)

8 standard component types are used throughout the dataset:

| Index | Type | Color | Description |
|-------|------|-------|-------------|
| 0 | `cpu_fan` | #FF6B6B | CPU cooling fan |
| 1 | `cpu_bracket` | #4ECDC4 | CPU retention bracket |
| 2 | `cpu` | #45B7D1 | CPU processor |
| 3 | `ram_clip` | #96CEB4 | RAM retention clip |
| 4 | `ram` | #FFEAA7 | RAM stick |
| 5 | `connector` | #DDA0DD | Cable/connector |
| 6 | `graphic_card` | #FF8C42 | GPU card |
| 7 | `motherboard` | #8B5CF6 | Main board |

**Dynamic instances:** Components of types `ram`, `ram_clip`, and `connector` can have multiple instances (e.g., `ram_1`, `ram_2`). They share the same type one-hot encoding; instances are distinguished by their 256D SAM2 embedding and 3D position.

**Occluded components:** `cpu_bracket` and `cpu` are typically hidden under `cpu_fan` at the start of each episode and become visible mid-episode. This is tracked in `frame_states.visibility` using delta encoding.

## Constraint Graph Semantics

Edges are **directed prerequisite constraints**: `A -> B` means "A blocks the removal of B" (A must be released before B can be removed).

### Auto-edge rules (physical knowledge)

```
cpu_fan      -> cpu_bracket   (fan covers bracket)
cpu_fan      -> motherboard   (fan attached to board)
cpu_bracket  -> cpu           (bracket holds CPU)
cpu_bracket  -> motherboard   (bracket bolted to board)
cpu          -> motherboard   (CPU in socket)
ram_N        -> motherboard   (RAM in slot)
ram_clip_N   -> motherboard   (clip attached to board)
connector_N  -> motherboard   (connector plugged in)
graphic_card -> motherboard   (GPU in PCIe slot)
```

Manual edges (not auto-generated): `ram_clip_N -> ram_M`. Users manually pair each clip with its matching RAM stick.

### Edge states

- **Locked** (`true`, value=1): constraint active, component cannot be removed
- **Unlocked** (`false`, value=0): constraint released, component is free
- **Monotonic:** once unlocked during an episode, stays unlocked

### Delta-encoded frame states

Frame states are stored as deltas: only frames where the state changes are recorded. To resolve the state at frame N, accumulate deltas from frame 0 through N:

```python
def resolve_frame_state(graph_json, frame_idx):
    constraints = {}
    visibility = {}
    for c in graph_json["components"]:
        visibility[c["id"]] = True  # default visible
    for e in graph_json["edges"]:
        constraints[f"{e['src']}->{e['dst']}"] = True  # default locked
    frame_states = graph_json.get("frame_states", {})
    for f in sorted([int(k) for k in frame_states]):
        if f > frame_idx:
            break
        fs = frame_states[str(f)]
        constraints.update(fs.get("constraints", {}))
        visibility.update(fs.get("visibility", {}))
    return constraints, visibility
```

## Graph JSON Structure

```json
{
  "view": "side",
  "episode_id": "episode_00",
  "goal_component": "cpu_fan",
  "components": [
    {"id": "cpu_fan", "type": "cpu_fan", "color": "#FF6B6B"},
    {"id": "ram_1", "type": "ram", "color": "#FFEAA7"}
  ],
  "edges": [
    {"src": "cpu_fan", "dst": "cpu_bracket", "directed": true},
    {"src": "ram_clip_1", "dst": "ram_1", "directed": true}
  ],
  "frame_states": {
    "0": {
      "constraints": {"cpu_fan->cpu_bracket": true},
      "visibility": {"cpu_fan": true, "cpu_bracket": false, "cpu": false}
    },
    "152": {
      "constraints": {"cpu_fan->cpu_bracket": false},
      "visibility": {"cpu_fan": false, "cpu_bracket": true, "cpu": true}
    }
  },
  "node_positions": {"cpu_fan": [120, 80]},
  "embedding_dim": 256,
  "feature_extractor": "sam2.1_hiera_base_plus",
  "type_vocab": ["cpu_fan", "cpu_bracket", "cpu", "ram_clip", "ram", "connector", "graphic_card", "motherboard"]
}
```

## Node Features for GNN Training

Each node has a 268D feature vector (for K=8 types):

| Feature | Dim | Source |
|---------|-----|--------|
| SAM2 embedding | 256 | Masked average pool of `sam2.1_hiera_b+` encoder features |
| 3D position | 3 | Depth backprojection, averaged over all valid mask pixels (meters, camera frame) |
| Component type one-hot | 8 | Index by `type_vocab`; multiple instances share the same one-hot |
| Visibility | 1 | Binary flag for this camera at this frame |
| **Total** | **268** | |

**Robot state** (13D) is stored separately in `robot_states.npy` and can be concatenated at training time to form Graph B features (281D per node).

**3D position handling:** When the wrist camera is too close to the surface, depth becomes invalid. In those cases, the centroid entry is missing for that component. Handle this in your data loader (e.g., interpolate from the previous frame or set zeros with a `depth_valid` flag).

## Converting to PyG (PyTorch Geometric)

**IMPORTANT:** The labeling tool stores only sparse physical constraint edges. For GNN training, expand to a **fully connected graph** so message passing works across all node pairs.

Edge features encode whether a constraint exists:

| has_constraint | is_locked | Meaning |
|---|---|---|
| 1 | 1 | Physical constraint exists, locked |
| 1 | 0 | Physical constraint exists, released |
| 0 | 0 | No physical constraint (message passing only) |

### Example: Load a frame into PyG

```python
import json
import numpy as np
import torch
from pathlib import Path
from torch_geometric.data import Data


def resolve_frame_state(graph_json, frame_idx):
    """Resolve delta-encoded constraints and visibility at a frame."""
    constraints = {}
    visibility = {}
    for c in graph_json["components"]:
        visibility[c["id"]] = True
    for e in graph_json["edges"]:
        constraints[f"{e['src']}->{e['dst']}"] = True
    fs_dict = graph_json.get("frame_states", {})
    for f in sorted([int(k) for k in fs_dict]):
        if f > frame_idx:
            break
        fs = fs_dict[str(f)]
        constraints.update(fs.get("constraints", {}))
        visibility.update(fs.get("visibility", {}))
    return constraints, visibility


def load_pyg_frame(episode_dir: Path, view: str, frame_idx: int) -> Data:
    """Load one frame of a view's graph as a fully connected PyG Data object."""
    anno_dir = episode_dir / "annotations"

    # Load graph JSON
    with open(anno_dir / f"{view}_graph.json") as f:
        graph = json.load(f)
    nodes = graph["components"]
    type_vocab = graph["type_vocab"]
    N = len(nodes)

    # Load per-frame data
    masks_npz = np.load(anno_dir / f"{view}_masks" / f"frame_{frame_idx:06d}.npz")
    embeddings_npz = np.load(anno_dir / f"{view}_embeddings" / f"frame_{frame_idx:06d}.npz")
    with open(anno_dir / f"{view}_centroids" / f"frame_{frame_idx:06d}.json") as f:
        centroids = json.load(f)

    # Resolve delta-encoded state
    constraints, visibility = resolve_frame_state(graph, frame_idx)

    # Build node features: [256D SAM2 embedding, 3D pos, 8D type one-hot, 1D visibility]
    x_list = []
    for node in nodes:
        cid = node["id"]
        emb = embeddings_npz[cid] if cid in embeddings_npz.files else np.zeros(256, dtype=np.float32)
        pos = centroids.get(cid, [0.0, 0.0, 0.0])
        type_oh = [1.0 if t == node["type"] else 0.0 for t in type_vocab]
        vis = [1.0 if visibility.get(cid, True) else 0.0]
        x_list.append(list(emb) + list(pos) + type_oh + vis)

    # Build FULLY CONNECTED edge index + 2D edge features
    constraint_set = {(e["src"], e["dst"]) for e in graph["edges"]}
    src_idx, dst_idx, edge_attr = [], [], []
    for i in range(N):
        for j in range(N):
            if i == j:
                continue  # no self-loops
            src_id = nodes[i]["id"]
            dst_id = nodes[j]["id"]
            src_idx.append(i)
            dst_idx.append(j)
            if (src_id, dst_id) in constraint_set:
                is_locked = constraints.get(f"{src_id}->{dst_id}", True)
                edge_attr.append([1.0, 1.0 if is_locked else 0.0])
            else:
                edge_attr.append([0.0, 0.0])  # message passing only

    return Data(
        x=torch.tensor(x_list, dtype=torch.float32),                   # (N, 268)
        edge_index=torch.tensor([src_idx, dst_idx], dtype=torch.long),  # (2, N*(N-1))
        edge_attr=torch.tensor(edge_attr, dtype=torch.float32),         # (N*(N-1), 2)
        num_nodes=N,
    )


def load_episode(episode_dir: Path, view: str = "side"):
    """Generator: yield PyG Data objects for each annotated frame."""
    anno_dir = episode_dir / "annotations"
    mask_dir = anno_dir / f"{view}_masks"
    for npz_path in sorted(mask_dir.glob("frame_*.npz")):
        frame_idx = int(npz_path.stem.split("_")[1])
        yield frame_idx, load_pyg_frame(episode_dir, view, frame_idx)


# Usage
from pathlib import Path

episode = Path("data/disassembly/desktop/session_0408_164005/episode_00")
for frame_idx, data in load_episode(episode, view="side"):
    print(f"Frame {frame_idx}: {data.num_nodes} nodes, {data.num_edges} edges")
    # data.x shape:          (N, 268)
    # data.edge_index shape: (2, N*(N-1))
    # data.edge_attr shape:  (N*(N-1), 2)
```

### Adding Robot State (Graph B)

To use robot state as additional node features, broadcast the 13D state to all nodes:

```python
# episode_dir, frame_idx, and data are the same objects as in the loading example above.
robot_states = np.load(episode_dir / "robot_states.npy")  # (T, 13)

# At frame t, concatenate to every node:
robot_at_t = torch.tensor(robot_states[frame_idx], dtype=torch.float32)  # (13,)
robot_broadcast = robot_at_t.unsqueeze(0).expand(data.num_nodes, -1)     # (N, 13)
data.x = torch.cat([data.x, robot_broadcast], dim=1)                     # (N, 281)
```

## Recording Hardware

- **Robot:** UR5e + Robotiq 2F-85 gripper
- **Side camera:** Luxonis OAK-D Pro (static, workspace view)
  - Intrinsics: fx=1033.8, fy=1033.7, cx=632.9, cy=359.9
- **Wrist camera:** Intel RealSense D435i (mounted on robot wrist)
  - Intrinsics: fx=906.6, fy=905.8, cx=645.9, cy=364.6
- **Recording rate:** 30 Hz
- **Image size:** 1280 × 720
- **Depth format:** uint16, millimeters
- **Teleoperation:** Thrustmaster SOL-R2 HOSAS controllers

## Annotation Tool

Annotations were created with a custom SAM2-based labeling tool:

- **Repository:** https://github.com/ChangChrisLiu/gnn-world-model
- **Backend:** FastAPI + SAM2 (`sam2.1_hiera_base_plus`)
- **Frontend:** Vanilla HTML/JS with 5 interaction modes (BBox, Point, Polygon, Brush, Eraser)
- **Features:** Per-view independent graphs, dynamic component instances, interactive graph editor, scroll-to-zoom, undo/redo

## License

This dataset is released under **CC BY 4.0**. You are free to use, share, and adapt the data for any purpose (including commercial) as long as you provide attribution.

## Acknowledgements

This work was conducted at Texas A&M University for submission to CoRL 2026.

Built using:

- [Segment Anything Model 2 (SAM2)](https://github.com/facebookresearch/sam2) by Meta AI
- [PyTorch Geometric](https://pytorch-geometric.readthedocs.io/)
- [Hugging Face Datasets](https://huggingface.co/docs/datasets)
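The dataset card's note on 3D position handling suggests either carrying the previous frame's position forward or zero-filling with a `depth_valid` flag. A minimal sketch of one way to combine both follows; the helper name `fill_missing_centroids` and the `(position, depth_valid)` return layout are assumptions made for illustration, not part of the dataset's tooling.

```python
from typing import Dict, List, Sequence, Tuple

Centroid = List[float]


def fill_missing_centroids(
    frames: Sequence[Dict[str, Centroid]],
    component_ids: Sequence[str],
) -> List[Dict[str, Tuple[Centroid, float]]]:
    """Forward-fill missing centroids and attach a depth_valid flag.

    frames[t] is assumed to be the parsed content of one
    `*_centroids/frame_XXXXXX.json` file: {component_id: [x, y, z]} in
    meters, with the entry absent when depth was invalid for that component.
    """
    last_seen: Dict[str, Centroid] = {}
    filled = []
    for centroids in frames:
        out = {}
        for cid in component_ids:
            if cid in centroids:
                last_seen[cid] = list(centroids[cid])
                out[cid] = (list(last_seen[cid]), 1.0)  # measured this frame
            else:
                # Reuse the last valid position, or zeros if never observed;
                # depth_valid = 0.0 marks the value as not measured.
                out[cid] = (list(last_seen.get(cid, [0.0, 0.0, 0.0])), 0.0)
        filled.append(out)
    return filled
```

The flag can be appended to the node feature vector as an extra dimension so a model can learn to discount filled-in positions; doing so changes the feature width from the 268D layout described in the card.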

Provider:

ChangChrisLiu

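Collected summary

Dataset introduction

Construction

In robot manipulation, the GNN_Disassembly_WorldModel dataset is built by combining multimodal data capture with automated annotation. A UR5e robot arm and an OAK-D Pro camera synchronously record 30 Hz RGB images, depth, and robot state in two manipulation scenarios, desktop disassembly and Tower of Hanoi, to form the raw data stream. An offline auto-annotation pipeline then uses HSV segmentation and the SAM2 model to generate fine-grained masks, extracts 3D positions by depth backprojection, and derives constraint-graph states from symbolic rules, so that every frame carries component masks, 256-D feature embeddings, 3D coordinates, and visibility labels. The annotations are then manually verified and corrected in a browser interface to produce a structured per-frame graph representation, which is finally packaged as PyG graph data with 270-D node features.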
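The depth backprojection step mentioned above can be sketched with the standard pinhole camera model. This is an illustrative sketch, not the dataset's actual pipeline: the function name `backproject_centroid` is hypothetical, and treating a depth value of 0 as invalid is an assumption (a common convention for uint16 depth maps). It is written against plain nested sequences so it runs without third-party packages; NumPy arrays also work because they iterate row by row.

```python
from typing import Optional, Sequence, Tuple


def backproject_centroid(
    depth_mm: Sequence[Sequence[int]],
    mask: Sequence[Sequence[int]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> Optional[Tuple[float, float, float]]:
    """Average 3D position (meters, camera frame) over valid mask pixels.

    depth_mm: H x W depth in millimeters; 0 is ASSUMED to mean "invalid".
    mask:     H x W binary mask for one component.
    Returns None when no masked pixel has valid depth.
    """
    sum_x = sum_y = sum_z = 0.0
    count = 0
    for v, (depth_row, mask_row) in enumerate(zip(depth_mm, mask)):
        for u, (d, m) in enumerate(zip(depth_row, mask_row)):
            if not m or d <= 0:
                continue
            z = d / 1000.0               # millimeters -> meters
            sum_x += (u - cx) * z / fx   # pinhole model: X = (u - cx) * Z / fx
            sum_y += (v - cy) * z / fy   # pinhole model: Y = (v - cy) * Z / fy
            sum_z += z
            count += 1
    if count == 0:
        return None
    return (sum_x / count, sum_y / count, sum_z / count)
```

For the side camera, the intrinsics listed in the dataset card would be passed as `fx=1033.8, fy=1033.7, cx=632.9, cy=359.9`. A `None` result corresponds to the case where the centroid entry is missing for a component.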

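Features

The dataset's core feature is a graph representation that is unified across domains and integrated with rich multimodal information. It covers two robot manipulation domains, desktop disassembly and Tower of Hanoi, and both use the same 270-D node feature format: a 256-D SAM2 visual embedding, 3D spatial coordinates, a fixed 10-D type encoding, and a visibility flag (256 + 3 + 10 + 1 = 270), which keeps node dimensions consistent. Constraint graphs encode physical dependencies as directed edges, with 3-D edge features describing constraint state, lock status, and direction. Data organization follows a strict alignment guarantee: each frame's masks, embeddings, depth, and robot state are keyed by the same frame index, which makes loading efficient. The dataset also provides four graph-loading variants, ranging from product nodes only to graphs that include robot state and action conditioning, giving research on GNN world models a flexible and solid foundation.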
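Because per-frame files are keyed by a shared frame index, a loader can check alignment before training by intersecting the indices present in each annotation folder. The sketch below assumes the `annotations/{view}_masks`, `{view}_embeddings`, and `{view}_centroids` layout and the `frame_XXXXXX` naming shown in the dataset card; the helper name `aligned_frame_indices` is hypothetical.

```python
from pathlib import Path
from typing import List


def aligned_frame_indices(episode_dir: Path, view: str = "side") -> List[int]:
    """Return frame indices that have a mask, an embedding, and a centroid file."""
    anno_dir = Path(episode_dir) / "annotations"
    # Sub-folder -> filename pattern, following the dataset card's layout.
    patterns = {
        f"{view}_masks": "frame_*.npz",
        f"{view}_embeddings": "frame_*.npz",
        f"{view}_centroids": "frame_*.json",
    }
    index_sets = []
    for subdir, pattern in patterns.items():
        indices = {
            int(path.stem.split("_")[1])  # "frame_000152" -> 152
            for path in (anno_dir / subdir).glob(pattern)
        }
        index_sets.append(indices)
    return sorted(set.intersection(*index_sets))
```

Frames missing from the result have at least one absent file for that view and can be skipped or handled explicitly by the loader.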

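Usage

Researchers can access the structured graph data through the provided self-contained Python loader. The loader reads a specified session and episode from the dataset root, automatically parses each frame's masks, embeddings, depth, and constraint graph, and converts them into `torch_geometric.data.Data` objects. Users can choose one of four loading variants to match their modeling goal, such as a product-only graph or a graph that includes robot state or action conditioning, to suit different world-model architectures. The fixed 10-D type encoding is supplied through a YAML configuration file, which keeps node features consistent across domains. Loading supports batching and graph data transforms out of the box, so it can be integrated directly into GNN training pipelines for tasks such as constraint-aware video generation, robot manipulation planning, and physical reasoning.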

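Background and challenges

Background overview

The GNN_Disassembly_WorldModel dataset was created in 2026 by Chang Liu et al. at Texas A&M University to advance research in robot manipulation. It focuses on building constraint-aware world models, using graph neural networks (GNNs) on two representative manipulation scenarios, desktop disassembly and Tower of Hanoi, to address the modeling of physical constraints in robot video generation. The core research question is how to combine visual information with structured constraint graphs in order to predict object state changes in dynamic environments and give autonomous manipulation systems a reliable basis for decisions. By providing synchronized robot states, depth images, and SAM2 segmentation masks, the dataset offers researchers a rich multimodal resource for robot learning and planning tasks.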

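Current challenges

The dataset targets constraint-aware video generation in robot manipulation, in particular how to accurately model physical dependencies between objects, such as the removal order in a disassembly task or the stacking rules in Tower of Hanoi. Challenges during construction include synchronized capture and alignment of multi-source data, so that each frame's image, depth, and robot state are temporally consistent. The auto-annotation stage must also cope with segmentation accuracy in complex scenes, keeping masks and embedding vectors reliable when objects are occluded or change dynamically. Finally, the unified node feature format has to balance semantic differences between domains so that the graph structure remains useful when generalizing across tasks.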

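Common scenarios

Practical applications

In industrial automation and intelligent robotics, the dataset supports the development of autonomous decision systems for complex assembly tasks. A world model based on constraint graphs can predict whether a disassembly sequence is feasible and help plan robot arm motion paths. The task variants in the Tower of Hanoi domain simulate reconfiguration scenarios, and their goal prompts and state annotations provide training data for instruction-following robots. In deployment, such models can reduce reliance on precise dynamics simulation by inferring object interaction constraints directly from visual perception, improving manipulation robustness in unstructured environments.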
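Checking the feasibility of a disassembly sequence can be illustrated directly on the constraint graph, using the dataset card's edge semantics (`A -> B` means A blocks the removal of B, with lock states keyed as `"A->B"`). The sketch below is symbolic bookkeeping rather than a learned world model, and both function names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]  # (src, dst): src blocks the removal of dst


def removable_components(
    components: Iterable[str],
    edges: Iterable[Edge],
    locked: Dict[str, bool],
) -> Set[str]:
    """Components with no locked incoming constraint.

    `locked` maps "src->dst" to True (locked) or False (released);
    edges missing from the mapping are treated as locked by default.
    """
    blocked = {dst for src, dst in edges if locked.get(f"{src}->{dst}", True)}
    return set(components) - blocked


def feasible_removal_order(
    components: Iterable[str],
    edges: Iterable[Edge],
) -> List[str]:
    """One removal order that respects every prerequisite edge.

    Raises graphlib.CycleError if the constraints are cyclic.
    """
    sorter = TopologicalSorter({c: set() for c in components})
    for src, dst in edges:
        sorter.add(dst, src)  # dst can only be removed after src
    return list(sorter.static_order())
```

With the auto-edge rules for `cpu_fan`, `cpu_bracket`, `cpu`, and `motherboard`, only `cpu_fan` is removable while all edges are locked, and the only valid order removes the fan, then the bracket, then the CPU, then the board.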

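Derived and related work

Research derived from this dataset centers on extending graph neural networks to physical reasoning. Related work includes dynamic graph attention mechanisms based on constraint edge features, multimodal fusion architectures that incorporate robot state encoding, and transfer learning frameworks that use SAM2 features for zero-shot component recognition. Some studies further explore action-conditioned generative models for predicting manipulation sequences, learning the causal effect of robot actions from frame-to-frame state differences. Together, these efforts advance a hybrid reasoning paradigm that combines symbolic constraints with neural representations.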

The content above was collected and summarized by 遇见数据集.
