Caesarrr/co3d_parquet
Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/Caesarrr/co3d_parquet
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: train-*.parquet
  - split: validation
    path: val-*.parquet
  - split: test
    path: test-*.parquet
---
# Probe CO3D Parquet Export
This directory contains a Parquet export of the probe-ready CO3D subset.
## Summary
- Source layout: `experiments/probe/datasets/co3d`
- Export layout: `experiments/probe/datasets/co3d_parquet`
- Row unit: one sequence with exactly 8 selected frames
- Categories: 51
- Selected sequences: 4396
- Original on-disk sequences: 20273
- Valid sequences before per-category truncation: 15938
- Rows per shard: 64
- Shards: train=55, val=7, test=7
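
The shard counts can be sanity-checked against the row total. The sketch below assumes (this is not stated in the summary) that only the last shard of each split may be partially filled. Under that assumption the 69 shards hold between 4227 and 4416 rows, and the 4396 selected sequences fall inside that range. `row_bounds` is a hypothetical helper written for this check.

```python
ROWS_PER_SHARD = 64
SHARDS = {"train": 55, "val": 7, "test": 7}
SELECTED_SEQUENCES = 4396


def row_bounds(shards_per_split, rows_per_shard):
    """Return (min_rows, max_rows), assuming only each split's last shard may be partial."""
    # Upper bound: every shard is full.
    max_rows = sum(n * rows_per_shard for n in shards_per_split.values())
    # Lower bound: each split's last shard holds a single row.
    min_rows = sum((n - 1) * rows_per_shard + 1 for n in shards_per_split.values())
    return min_rows, max_rows


low, high = row_bounds(SHARDS, ROWS_PER_SHARD)
print(low, high, low <= SELECTED_SEQUENCES <= high)  # 4227 4416 True
```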
## Column Overview
- Scalar metadata:
  `split`, `category`, `sequence_name`, `camera_source`,
  `available_frame_count`, `selected_frame_count`, `quality_score`,
  `total_span_deg`, `max_slot_error_deg`, `rmse_slot_error_deg`,
  `sweep_deg`, `monotonicity`
- Frame media columns:
  `image_0..7`, `mask_0..7`, `depth_0..7`, `object_only_0..7`
- Additional metadata:
  `camera_poses_npz`, `selected_frames_json`,
  `trajectory_metrics_json`, `sequence_annotation_json`,
  `frame_annotations_json`
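
The `image_0..7` shorthand stands for eight columns per media type, one per selected frame. A minimal sketch of expanding it into explicit column names follows; `media_columns` is a hypothetical helper, not part of the export tooling.

```python
MEDIA_PREFIXES = ("image", "mask", "depth", "object_only")
FRAMES_PER_SEQUENCE = 8


def media_columns(prefixes=MEDIA_PREFIXES, n_frames=FRAMES_PER_SEQUENCE):
    """Map each media prefix to its per-frame column names, e.g. image_0 ... image_7."""
    return {p: [f"{p}_{i}" for i in range(n_frames)] for p in prefixes}


columns = media_columns()
print(columns["depth"][:2])                   # ['depth_0', 'depth_1']
print(sum(len(v) for v in columns.values()))  # 32
```

With a loaded row, `[sample[c] for c in columns["image"]]` collects the eight frames in slot order.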
## Loading Example
```python
from io import BytesIO
import json

import numpy as np
from datasets import load_dataset

# Load the three splits from the local Parquet shards.
ds = load_dataset(
    "parquet",
    data_files={
        "train": "train-*.parquet",
        "validation": "val-*.parquet",
        "test": "test-*.parquet",
    },
)

# Each row is one sequence; per-frame columns are indexed 0..7.
sample = ds["train"][0]
image0 = sample["image_0"]
mask0 = sample["mask_0"]
depth0 = sample["depth_0"]
object_only0 = sample["object_only_0"]

# The *_json columns hold JSON-encoded strings.
selected_frames = json.loads(sample["selected_frames_json"])
sequence_annotation = json.loads(sample["sequence_annotation_json"])
frame_annotations = json.loads(sample["frame_annotations_json"])

# Camera poses are stored as the raw bytes of an .npz archive.
camera_poses = np.load(BytesIO(sample["camera_poses_npz"]))
```
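
In the loading example, `np.load` on the `camera_poses_npz` bytes returns a lazy `NpzFile` rather than plain arrays. The sketch below materializes such a payload into a dictionary. The array names inside the real archive are not documented here, so the example builds a synthetic payload with made-up keys (`R`, `T`) purely for illustration; `npz_bytes_to_dict` is likewise a hypothetical helper.

```python
from io import BytesIO

import numpy as np


def npz_bytes_to_dict(payload: bytes) -> dict:
    """Decode an in-memory .npz archive into {array_name: ndarray}."""
    with np.load(BytesIO(payload)) as archive:
        # Indexing the archive reads each array fully before it is closed.
        return {name: archive[name] for name in archive.files}


# Synthetic stand-in for sample["camera_poses_npz"]; the keys "R" and "T" are hypothetical.
buffer = BytesIO()
np.savez(buffer, R=np.tile(np.eye(3), (8, 1, 1)), T=np.zeros((8, 3)))
poses = npz_bytes_to_dict(buffer.getvalue())
print({name: array.shape for name, array in poses.items()})
```

On a real row, `npz_bytes_to_dict(sample["camera_poses_npz"]).keys()` would list the array names actually stored by the export.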
## Notes
- All media columns are embedded in the Parquet shards, so the export is self-contained.
- `selected_frames_json` is the exact per-sequence metadata produced by the probe builder.
- `assets/` contains the category distribution figures copied from the source export.
- Verify the upstream dataset redistribution terms and set the final Hugging Face metadata before publishing.
Provider:
Caesarrr



