
jdopensource/JoyAI-Image-OpenSpatial

Source: Hugging Face · Updated 2026-04-15 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/jdopensource/JoyAI-Image-OpenSpatial
Resource overview:
---
license: apache-2.0
task_categories:
- visual-question-answering
- image-to-text
language:
- en
tags:
- spatial-understanding
- 3d-vision
- depth-estimation
- 3d-grounding
- multi-view
size_categories:
- 1M<n<10M
configs:
- config_name: default
  data_files:
  - split: train
    path: data/*.parquet
dataset_info:
  config_name: default
  features:
  - name: conversations
    list:
    - name: "from"
      dtype: string
    - name: value
      dtype: string
  - name: id
    dtype: string
  - name: data_source
    dtype: string
  - name: images
    list:
    - name: bytes
      dtype: binary
    - name: path
      dtype: string
  - name: type
    dtype: string
  - name: meta_info
    dtype: string
  splits:
  - name: train
    num_examples: 2335335
  download_size: 2362232012800
  dataset_size: 2362232012800
---

# JoyAI-Image-OpenSpatial

A spatial understanding dataset built on [OpenSpatial](https://github.com/VINHYU/OpenSpatial) and used in [JoyAI-Image](https://github.com/jd-opensource/JoyAI-Image).

The full dataset contains about **3M** multi-turn visual-spatial QA samples drawn from **7 open-source datasets** and web data. The open-source datasets are ARKitScenes, ScanNet, ScanNet++, HyperSim, Matterport3D, WildRGB-D, and Ego-Exo4D. Tasks cover a wide range of spatial understanding capabilities, including 3D object grounding, depth ordering, spatial relation reasoning, distance estimation, and more.

We have released **~2.3M** QA samples (2,335,335 examples in the `train` split) constructed from the open-source datasets. The remaining web data will be open-sourced in a future release.

## Quick Start

```python
from datasets import load_dataset

# Stream the train split so the full download (about 2.36 TB) is not required up front.
ds = load_dataset("jdopensource/JoyAI-Image-OpenSpatial", split="train", streaming=True)

# Print the conversation turns of the first sample, then stop.
for sample in ds:
    print(sample["conversations"])
    break
```

## Data Format

Each parquet file contains the following columns:

| Column | Type | Description |
|---|---|---|
| `conversations` | `list[{from, value}]` | Multi-turn conversation pairs (`human` / `gpt`). The human turn provides camera parameters and a spatial reasoning question; the gpt turn provides structured spatial annotations (e.g., 3D bounding boxes, depth ordering, spatial relations). |
| `id` | `string` | Unique sample identifier |
| `data_source` | `string` | Source dataset (e.g., `arkitscenes`, `scannet`, `scannetpp`, `hypersim`, `matterport3d`, `wildrgbd`, `Ego-Exo4D`) |
| `images` | `list[{bytes, path}]` | Embedded image data (PNG bytes) |
| `type` | `string` | Data type label |
| `meta_info` | `string` | JSON string with image dimensions (`width`, `height`, `resized_width`, `resized_height`) |

## TODO

- [ ] Release 3D lifting data
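
As a follow-up to the Data Format columns, here is a minimal sketch of how one sample might be unpacked using only the Python standard library. The helper names `png_size` and `summarize_sample` are hypothetical and not part of the dataset or of any library. The sketch assumes that `meta_info` parses to a JSON object and that each entry in `images` carries raw PNG bytes under the `bytes` key, as the column descriptions state. Image dimensions are read directly from the PNG header (the 8-byte signature followed by the IHDR chunk), so no imaging library is needed.

```python
import json
import struct

# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes):
    """Return (width, height) from a PNG's IHDR chunk, or None if data is not a PNG.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type ("IHDR"),
    then width and height as big-endian unsigned 32-bit integers.
    """
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        return None
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def summarize_sample(sample: dict) -> dict:
    """Collect the conversation turns, image sizes, and parsed metadata of one sample.

    Hypothetical helper: field names follow the Data Format table; the shape of
    the returned dictionary is an illustration only.
    """
    turns = [(turn["from"], turn["value"]) for turn in sample["conversations"]]
    raw_meta = sample.get("meta_info")
    meta = json.loads(raw_meta) if raw_meta else {}
    sizes = [png_size(image["bytes"]) for image in sample["images"]]
    return {
        "id": sample["id"],
        "data_source": sample["data_source"],
        "turns": turns,
        "image_sizes": sizes,
        "meta": meta,
    }
```

Each `sample` yielded by the Quick Start loop could be passed to `summarize_sample`. Comparing `image_sizes` against the `width`/`height` and `resized_width`/`resized_height` values in `meta` is one way to check which resolution the embedded bytes actually use, since the dataset card does not say.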
Provider:
jdopensource