PRISM
Source: ModelScope community. Updated 2025-12-05; indexed 2025-06-07.
Download link: https://modelscope.cn/datasets/allenai/PRISM
# PRISM
[[Paper]](https://arxiv.org/pdf/2505.13441) [[arXiv]](https://arxiv.org/abs/2505.13441) [[Project Website]](https://abhaybd.github.io/GraspMolmo/)
Purpose-driven Robotic Interaction in Scene Manipulation (PRISM) is a large-scale synthetic dataset for Task-Oriented Grasping featuring cluttered environments and diverse, realistic task descriptions. We use 2,365 object instances from ShapeNet-Sem along with stable grasps from ACRONYM to compose 10,000 unique and diverse scenes. Each scene is captured from 10 views, and each view has multiple tasks to be performed. This results in 379k task-grasp samples in total.
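To give a sense of scale, the figures above can be combined with some quick arithmetic. This is only a rough sketch: the sample count is the rounded "379k" figure from the description, so the averages are approximate, and the variable names are illustrative.

```python
# Figures quoted in the dataset description.
num_scenes = 10_000
views_per_scene = 10
total_samples = 379_000  # rounded ("379k"), so results below are approximate

total_views = num_scenes * views_per_scene
avg_samples_per_view = total_samples / total_views
avg_samples_per_scene = total_samples / num_scenes

print(f"views: {total_views}")
print(f"samples per view (approx.): {avg_samples_per_view:.2f}")
print(f"samples per scene (approx.): {avg_samples_per_scene:.1f}")
```

On average this works out to a little under four task-grasp samples per view.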
The dataset card contains the tasks and corresponding descriptions for the train/test datasets. The RGB images, point clouds, segmentation maps, and other per-view data are stored separately in data files, packaged as `.tar` archives under `PRISM-train` and `PRISM-test`, and can be retrieved as needed for each data sample.
## Data Files
Each `.tar` file contains multiple `<scene_id>.hdf5` files, each with the following structure:
```
view_<i>/
    rgb: RGB image as (H,W,3) array
    xyz: Back-projected point-cloud (from RGB-D view) as (H,W,3) array of XYZ points
    seg: Segmentation map as (H,W) array where each pixel is index of object name in object_names
    object_names: List of object names visible in view
    normals (optional): Point-cloud normals as (H,W,3) array
    view_pose: Camera pose in world frame as (4,4) array
    cam_params: Camera intrinsics matrix as (3,3) array
    obs_<j>/
        grasp_pose: Grasp pose in camera frame as (4,4) array
        grasp_point: Point being grasped in camera frame as (3,) array
        grasp_point_px: Point being grasped projected onto image plane as (2,) array
        annot: YAML-formatted object with the following keys: ["annotation_id", "grasp_description", "object_description", "object_category", "object_id", "grasp_id"]
```
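The nested `view_<i>/obs_<j>` layout can be walked with a small generator. The following is a minimal sketch, not part of the official tooling: the helper name `iter_observations` is hypothetical, and the demo uses a plain nested dictionary with placeholder values as a stand-in for an opened `<scene_id>.hdf5` file.

```python
from typing import Any, Iterator, Mapping, Tuple


def iter_observations(scene: Mapping[str, Any]) -> Iterator[Tuple[str, str, Any]]:
    """Yield (view_id, obs_id, obs_group) for every observation in one scene.

    `scene` is any nested mapping that follows the layout described above.
    """
    for view_id in scene.keys():
        if not view_id.startswith("view_"):
            continue
        view = scene[view_id]
        for obs_id in view.keys():
            # Skip per-view entries such as rgb, xyz, and seg.
            if obs_id.startswith("obs_"):
                yield view_id, obs_id, view[obs_id]


# Stand-in for an opened scene file; all values here are placeholders.
demo_scene = {
    "view_0": {
        "rgb": "...",
        "obs_0": {"grasp_point_px": [320.0, 240.0]},
        "obs_1": {"grasp_point_px": [100.0, 50.0]},
    },
    "view_1": {"rgb": "...", "obs_0": {"grasp_point_px": [12.0, 34.0]}},
}

pairs = [(view_id, obs_id) for view_id, obs_id, _ in iter_observations(demo_scene)]
print(pairs)
```

With real data, the same function should also accept an `h5py.File(path, "r")` handle, since h5py groups support `keys()` and item access. Individual arrays can then be read by slicing, for example `obs["grasp_point_px"][:]`.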
### Reading Data Files
Here's an example of how to extract the required information from the data files to create a `datasets.Dataset` of image, task, and corresponding point, as was used to train [GraspMolmo](https://github.com/abhaybd/GraspMolmo).
```python
import os

import datasets
import huggingface_hub as hf_hub
import h5py
from PIL import Image
import numpy as np


def point_to_xml(grasp_pt: np.ndarray):
    # Format a normalized (x, y) point as a <point> tag with percentage coordinates.
    if grasp_pt.ndim == 2:
        assert grasp_pt.shape == (1, 2)
        grasp_pt = grasp_pt[0]
    assert grasp_pt.shape == (2,)
    point_desc = "Where to grasp the object"
    return f"<point x=\"{grasp_pt[0]*100:.1f}\" y=\"{grasp_pt[1]*100:.1f}\" alt=\"{point_desc}\">{point_desc}</point>"


def map_sample(file_loc_map: dict[str, str], ex: dict):
    # Look up the extracted HDF5 file for this sample's scene.
    h5_path = file_loc_map[ex["scene_path"]]
    with h5py.File(h5_path, "r") as f:
        img = Image.fromarray(f[ex["view_id"]]["rgb"][:])
        grasp_pt_px = f[ex["view_id"]][ex["obs_id"]]["grasp_point_px"][:]
        # Normalize pixel coordinates to [0, 1] by image width and height.
        grasp_pt_px = grasp_pt_px / np.array([img.width, img.height])

    task = ex["task"]
    prompt = f"Point to the grasp that would accomplish the following task: {task}"
    point_xml = point_to_xml(grasp_pt_px)
    response = f"In order to accomplish the task \"{task}\", the optimal grasp is described as follows: \"{ex['matching_grasp_desc']}\".\n\n{point_xml}"

    return dict(
        image=img,
        prompt=prompt,
        text=response,
        style="pointing"
    )


def build_pointing_dataset(split: str, num_proc: int = 10) -> datasets.Dataset:
    # List the .tar chunks for this split and build a download URL for each.
    hf_fs = hf_hub.HfFileSystem()
    chunks = hf_fs.ls(f"datasets/allenai/PRISM/PRISM-{split}", detail=False)
    urls = []
    for chunk in chunks:
        path = chunk[len("datasets/allenai/PRISM/"):]
        urls.append(hf_hub.hf_hub_url(repo_id="allenai/PRISM", filename=path, repo_type="dataset"))

    dl_manager = datasets.DownloadManager(dataset_name="allenai/PRISM", record_checksums=False)
    paths = dl_manager.download_and_extract(urls)

    # Map each extracted file name to its location on disk.
    file_loc_map = {}
    for path in paths:
        path = str(path)
        for file in os.listdir(path):
            file_loc_map[file] = os.path.join(path, file)

    # Join the task metadata with the image and grasp point from the data files.
    metadata_ds = datasets.load_dataset("allenai/PRISM", split=split)
    dataset = metadata_ds.map(lambda ex: map_sample(file_loc_map, ex), num_proc=num_proc)
    return dataset


if __name__ == "__main__":
    build_pointing_dataset("train")
    build_pointing_dataset("test")
```
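The response text stores the grasp point as percentages of the image width and height. When inspecting such responses, it can be handy to map a `<point>` tag back to pixel coordinates. Here is a minimal sketch of that inverse step. The helper name `xml_to_pixel` and the regular expression are assumptions made for illustration and are not part of GraspMolmo.

```python
import re
from typing import Tuple

# Matches the x/y attributes in the tag format produced by point_to_xml.
# Assumes non-negative coordinates.
_POINT_RE = re.compile(r'<point x="([0-9.]+)" y="([0-9.]+)"')


def xml_to_pixel(text: str, width: int, height: int) -> Tuple[float, float]:
    """Convert the first <point> tag in `text` to pixel coordinates."""
    match = _POINT_RE.search(text)
    if match is None:
        raise ValueError("no <point> tag found")
    x_pct, y_pct = float(match.group(1)), float(match.group(2))
    # Undo the normalization: percent of width/height back to pixels.
    return x_pct / 100 * width, y_pct / 100 * height


response = (
    "Example response text.\n\n"
    '<point x="25.0" y="50.0" alt="Where to grasp the object">'
    "Where to grasp the object</point>"
)
px = xml_to_pixel(response, width=640, height=480)
print(px)
```

Because the tag keeps only one decimal place of a percentage, the recovered pixel can differ from the original `grasp_point_px` by up to 0.05% of the image width or height.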
Provider: maas
Created: 2025-06-05



