
Caesarrr/co3d_parquet

Hugging Face · Updated 2026-04-19 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/Caesarrr/co3d_parquet
Resource description:
---
configs:
- config_name: default
  data_files:
  - split: train
    path: train-*.parquet
  - split: validation
    path: val-*.parquet
  - split: test
    path: test-*.parquet
---

# Probe CO3D Parquet Export

This directory contains a Parquet export of the probe-ready CO3D subset.

## Summary

- Source layout: `experiments/probe/datasets/co3d`
- Export layout: `experiments/probe/datasets/co3d_parquet`
- Row unit: one sequence with exactly 8 selected frames
- Categories: 51
- Selected sequences: 4396
- Original on-disk sequences: 20273
- Valid sequences before per-category truncation: 15938
- Rows per shard: 64
- Shards: train=55, val=7, test=7

## Column Overview

- Scalar metadata: `split`, `category`, `sequence_name`, `camera_source`, `available_frame_count`, `selected_frame_count`, `quality_score`, `total_span_deg`, `max_slot_error_deg`, `rmse_slot_error_deg`, `sweep_deg`, `monotonicity`
- Frame media columns: `image_0..7`, `mask_0..7`, `depth_0..7`, `object_only_0..7`
- Additional metadata: `camera_poses_npz`, `selected_frames_json`, `trajectory_metrics_json`, `sequence_annotation_json`, `frame_annotations_json`

## Loading Example

```python
from io import BytesIO
import json

import numpy as np
from datasets import load_dataset

ds = load_dataset(
    "parquet",
    data_files={
        "train": "train-*.parquet",
        "validation": "val-*.parquet",
        "test": "test-*.parquet",
    },
)

sample = ds["train"][0]
image0 = sample["image_0"]
mask0 = sample["mask_0"]
depth0 = sample["depth_0"]
object_only0 = sample["object_only_0"]

selected_frames = json.loads(sample["selected_frames_json"])
sequence_annotation = json.loads(sample["sequence_annotation_json"])
frame_annotations = json.loads(sample["frame_annotations_json"])
camera_poses = np.load(BytesIO(sample["camera_poses_npz"]))
```

## Notes

- All media columns are embedded in the Parquet shards, so the export is self-contained.
- `selected_frames_json` is the exact per-sequence metadata produced by the probe builder.
- `assets/` contains the category distribution figures copied from the source export.
- Verify the upstream dataset redistribution terms and set the final Hugging Face metadata before publishing.
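
The shorthand `image_0..7` in the column overview stands for eight separate columns per modality (`image_0` through `image_7`, and likewise for `mask`, `depth`, and `object_only`). The following is a minimal sketch of gathering one modality's frames from a row into an ordered list. The helper names `frame_columns` and `collect_frames` are hypothetical and not part of the export, and the sketch assumes only that a row behaves like a dictionary keyed by column name.

```python
from typing import Any, Dict, List

# Modality prefixes and frame count as listed in the column overview.
FRAME_PREFIXES = ("image", "mask", "depth", "object_only")
NUM_FRAMES = 8


def frame_columns(prefix: str, num_frames: int = NUM_FRAMES) -> List[str]:
    """Expand a prefix such as "image" into ["image_0", ..., "image_7"]."""
    return [f"{prefix}_{i}" for i in range(num_frames)]


def collect_frames(
    sample: Dict[str, Any], prefix: str, num_frames: int = NUM_FRAMES
) -> List[Any]:
    """Return the per-frame values of one modality in slot order (hypothetical helper)."""
    return [sample[name] for name in frame_columns(prefix, num_frames)]
```

With a row loaded as in the example above, `collect_frames(sample, "depth")` would return the eight depth entries in slot order. The array names stored inside `camera_poses_npz` are not documented here; `camera_poses.files` lists them after `np.load`.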

Provider:
Caesarrr