digit-pose-estimation

Name: digit-pose-estimation
Creator: maas
Published: 2025-11-27 16:34:18
License: No description provided

ModelScope Community · updated 2025-11-27 · added 2025-05-24

Download link:

https://modelscope.cn/datasets/facebook/digit-pose-estimation

Overview:

# Dataset Details

This dataset contains time-synchronized pairs of DIGIT images and SE(3) object poses. In our setup, the robot hand is stationary, with its palm facing downward and pressing against an object resting on a table. DIGIT sensors are mounted on the index, middle, and ring fingertips, and all three are in contact with the object. A human manually perturbs the object's pose by translating and rotating it in SE(2). We use tag tracking to obtain the object's pose. We collect data using two objects, a Pringles can and the YCB sugar box, each with a tag fixed to its top surface. The following image illustrates our setup:

![](assets/pose_estimation.png)

This dataset is part of TacBench for evaluating Sparsh touch representations. For more information, please visit https://sparsh-ssl.github.io/.

## Uses

This dataset contains aligned DIGIT tactile data and world-frame object poses. It is designed to evaluate [Sparsh encoders](https://huggingface.co/collections/facebook/sparsh-67167ce57566196a4526c328) on a perception task: predicting the object's relative pose change with respect to each finger's sensor gel, denoted as $S_t^{t-H} \triangleq (\Delta x, \Delta y, \Delta \theta) \in \mathbf{SE}(2)$, where $H$ is the time stride. For more information on how to use this dataset and set up the corresponding downstream tasks, please refer to the [Sparsh repository](https://github.com/facebookresearch/sparsh).

## Dataset Structure

The dataset is a collection of sequences in which a human manually perturbs the pose of one of the two objects (the Pringles can or the YCB sugar box). Each sequence is stored as a pickle file containing the following labeled data:

- DIGIT tactile images for the index, middle, and ring fingers
- Object pose from tag tracking, in the format (x, y, z, qw, qx, qy, qz)
- Robot hand joint positions
- `object_index_rel_pose_n5`: the object's pose change over the last 5 samples, stored as a transformation matrix and expressed with respect to the index finger
- `object_middle_rel_pose_n5`: the same quantity, expressed with respect to the middle finger
- `object_ring_rel_pose_n5`: the same quantity, expressed with respect to the ring finger

We also provide a reference (no-contact) image for each DIGIT to facilitate pre-processing such as background subtraction.

```bash
train
├── pringles
│   ├── bag_00.pkl
│   ├── ...
│   ├── bag_37.pkl
│   ├── bag_38.pkl
├── sugar
│   ├── ...
test
├── pringles
│   ├── bag_00.pkl
│   ├── ...
│   ├── bag_05.pkl
│   ├── bag_06.pkl
├── sugar
│   ├── ...
bgs
├── digit_index.png
├── digit_index.png
├── digit_index.png
```

The following code is an example of how to load the data:

```python
import io
import pickle

import numpy as np
from PIL import Image


def load_bin_image(io_buf):
    # Decode an encoded image held in a bytes buffer into a NumPy array
    img = Image.open(io.BytesIO(io_buf))
    img = np.array(img)
    return img


def load_dataset_poses(dataset_name, finger_type, t_stride):
    # dataset_name is the sequence path WITHOUT the ".pkl" extension
    path_data = f"{dataset_name}.pkl"
    with open(path_data, "rb") as file:
        data = pickle.load(file)

    # Truncate both streams to a common length so every image has a pose label
    idx_max = min(
        len(data[f"digit_{finger_type}"]),
        len(data[f"object_{finger_type}_rel_pose_n{t_stride}"]),
    )
    dataset_digit = data[f"digit_{finger_type}"][:idx_max]
    dataset_poses = data[f"object_{finger_type}_rel_pose_n{t_stride}"][:idx_max]
    return dataset_digit, dataset_poses


# finger_type is "index", "middle" or "ring"; t_stride=5 matches the *_n5 keys
dataset_digit, dataset_poses = load_dataset_poses("train/pringles/bag_00", "ring", 5)
delta_rel_pose_gt = dataset_poses[0]
img = load_bin_image(dataset_digit[0])
```

Please refer to the [Sparsh repository](https://github.com/facebookresearch/sparsh) for further information about using the pose estimation dataset and training the downstream task.
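The directory tree groups sequences by split (`train`, `test`) and object (`pringles`, `sugar`). As a minimal sketch, assuming every sequence follows the `bag_NN.pkl` naming shown in the tree, the hypothetical helper `list_sequences` collects all sequences for one split/object pair and strips the `.pkl` suffix, because `load_dataset_poses` appends that suffix itself.

```python
import tempfile
from pathlib import Path


def list_sequences(root, split, obj):
    """Return sorted sequence paths, without the .pkl suffix, for one split/object pair.

    Hypothetical helper: assumes a root/<split>/<object>/bag_NN.pkl layout.
    Zero-padded names (bag_00, bag_01, ...) make lexicographic order match numeric order.
    """
    folder = Path(root) / split / obj
    return [str(p.with_suffix("")) for p in sorted(folder.glob("bag_*.pkl"))]


# Demo on a throwaway directory that mimics train/pringles/ with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp) / "train" / "pringles"
    folder.mkdir(parents=True)
    for name in ["bag_01.pkl", "bag_00.pkl", "notes.txt"]:
        (folder / name).touch()
    names = [Path(s).name for s in list_sequences(tmp, "train", "pringles")]

print(names)  # ['bag_00', 'bag_01']
```

Each returned string can be passed directly as the `dataset_name` argument of `load_dataset_poses`.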
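The no-contact reference images in `bgs/` are provided for pre-processing such as background subtraction. A minimal sketch follows, assuming a tactile frame and its reference image share the same shape and an 8-bit dtype (as produced by `np.array(Image.open(...))` in `load_bin_image`). The name `subtract_background` is illustrative, and the Sparsh repository may pre-process images differently.

```python
import numpy as np


def subtract_background(frame, background):
    """Signed per-pixel difference between a tactile frame and its no-contact reference.

    Both inputs are assumed to be uint8 arrays of identical shape (e.g. H x W x 3).
    Casting to int16 first prevents uint8 wrap-around where the frame is darker.
    """
    frame = np.asarray(frame)
    background = np.asarray(background)
    if frame.shape != background.shape:
        raise ValueError(f"shape mismatch: {frame.shape} vs {background.shape}")
    return frame.astype(np.int16) - background.astype(np.int16)


# Synthetic 4x4 RGB example: uniform background, one brighter and one darker pixel
bg = np.full((4, 4, 3), 100, dtype=np.uint8)
frame = bg.copy()
frame[1, 1] = 130
frame[2, 2] = 90
diff = subtract_background(frame, bg)
print(diff[1, 1], diff[2, 2], diff[0, 0])
```

With real data, `background` would be the matching image from `bgs/`, decoded into an array the same way as the DIGIT frames.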
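Raw object poses are stored as (x, y, z, qw, qx, qy, qz), i.e. a translation followed by a scalar-first quaternion. The sketch below, using hypothetical helper names, converts such a pose into a 4x4 homogeneous transform and computes the change between samples `t - stride` and `t` as $T_{t-H}^{-1} T_t$. It operates on the world-frame tag poses only; the stored `object_*_rel_pose_n5` labels are expressed with respect to each finger, which would additionally require the fingertip poses, so treat this as an illustration of the idea rather than a reproduction of the dataset's labels.

```python
import numpy as np


def pose7_to_matrix(pose):
    """Convert (x, y, z, qw, qx, qy, qz) into a 4x4 homogeneous transform."""
    x, y, z, qw, qx, qy, qz = np.asarray(pose, dtype=float)
    # Re-normalize in case the tracked quaternion is not exactly unit length
    norm = np.sqrt(qw**2 + qx**2 + qy**2 + qz**2)
    qw, qx, qy, qz = qw / norm, qx / norm, qy / norm, qz / norm
    R = np.array([
        [1 - 2 * (qy**2 + qz**2), 2 * (qx * qy - qz * qw), 2 * (qx * qz + qy * qw)],
        [2 * (qx * qy + qz * qw), 1 - 2 * (qx**2 + qz**2), 2 * (qy * qz - qx * qw)],
        [2 * (qx * qz - qy * qw), 2 * (qy * qz + qx * qw), 1 - 2 * (qx**2 + qy**2)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [x, y, z]
    return T


def relative_transform(poses, t, stride):
    """Pose at sample t expressed in the frame of the pose at sample t - stride."""
    T_prev = pose7_to_matrix(poses[t - stride])
    T_curr = pose7_to_matrix(poses[t])
    return np.linalg.inv(T_prev) @ T_curr


# Hypothetical two-sample trajectory: shift 0.01 along x and turn 10 degrees about z
half = np.deg2rad(10.0) / 2
poses = [
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.01, 0.0, 0.0, np.cos(half), 0.0, 0.0, np.sin(half)],
]
T_rel = relative_transform(poses, t=1, stride=1)
print(np.round(T_rel, 4))
```

With a full sequence, `stride=5` would mirror the `_n5` suffix of the provided labels.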
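The prediction target is defined as $S_t^{t-H} \triangleq (\Delta x, \Delta y, \Delta \theta) \in \mathbf{SE}(2)$, while the `*_rel_pose_n5` labels are stored as transformation matrices. Assuming those are 4x4 homogeneous matrices whose in-plane motion lies in the x-y plane of the finger frame (the axis convention is not stated here, so it is worth checking against the Sparsh repository), the planar components can be read off as in this sketch; `se2_from_transform` is a hypothetical name.

```python
import numpy as np


def se2_from_transform(T):
    """Reduce a 4x4 homogeneous transform to a planar (dx, dy, dtheta) triple.

    Assumes the relevant motion lies in the x-y plane of the reference frame,
    so the yaw angle is taken from the upper-left 2x2 block of the rotation.
    """
    T = np.asarray(T, dtype=float)
    if T.shape != (4, 4):
        raise ValueError(f"expected a 4x4 matrix, got shape {T.shape}")
    dx, dy = T[0, 3], T[1, 3]
    dtheta = np.arctan2(T[1, 0], T[0, 0])  # rotation about z, in radians, within [-pi, pi]
    return np.array([dx, dy, dtheta])


# Example: a translation of 0.002 along x combined with a 5-degree rotation about z
theta = np.deg2rad(5.0)
T = np.eye(4)
T[:2, :2] = [[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]]
T[0, 3] = 0.002
print(se2_from_transform(T))
```

Applied to `delta_rel_pose_gt` from the loading example, this would yield a three-component target, provided the stored matrix really is 4x4; translations stay in whatever units the dataset uses.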
## BibTeX entry and citation info

```bibtex
@inproceedings{higuera2024sparsh,
  title={Sparsh: Self-supervised touch representations for vision-based tactile sensing},
  author={Carolina Higuera and Akash Sharma and Chaithanya Krishna Bodduluri and Taosha Fan and Patrick Lancaster and Mrinal Kalakrishnan and Michael Kaess and Byron Boots and Mike Lambeta and Tingfan Wu and Mustafa Mukadam},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024},
  url={https://openreview.net/forum?id=xYJn2e1uu8}
}
```

Provider:

maas

Created:

2025-05-20
