Caesarrr/co3d_parquet

Name: Caesarrr/co3d_parquet
Creator: Caesarrr
Published: 2026-04-19 12:21:05
License: no description provided

Source: Hugging Face. Updated 2026-04-19; indexed 2026-04-26.

Download link:

https://hf-mirror.com/datasets/Caesarrr/co3d_parquet


Description:

---
configs:
- config_name: default
  data_files:
  - split: train
    path: train-*.parquet
  - split: validation
    path: val-*.parquet
  - split: test
    path: test-*.parquet
---

# Probe CO3D Parquet Export

This directory contains a Parquet export of the probe-ready CO3D subset.

## Summary

- Source layout: `experiments/probe/datasets/co3d`
- Export layout: `experiments/probe/datasets/co3d_parquet`
- Row unit: one sequence with exactly 8 selected frames
- Categories: 51
- Selected sequences: 4396
- Original on-disk sequences: 20273
- Valid sequences before per-category truncation: 15938
- Rows per shard: 64
- Shards: train=55, val=7, test=7

## Column Overview

- Scalar metadata: `split`, `category`, `sequence_name`, `camera_source`, `available_frame_count`, `selected_frame_count`, `quality_score`, `total_span_deg`, `max_slot_error_deg`, `rmse_slot_error_deg`, `sweep_deg`, `monotonicity`
- Frame media columns: `image_0..7`, `mask_0..7`, `depth_0..7`, `object_only_0..7`
- Additional metadata: `camera_poses_npz`, `selected_frames_json`, `trajectory_metrics_json`, `sequence_annotation_json`, `frame_annotations_json`

## Loading Example

```python
from io import BytesIO
import json

import numpy as np
from datasets import load_dataset

ds = load_dataset(
    "parquet",
    data_files={
        "train": "train-*.parquet",
        "validation": "val-*.parquet",
        "test": "test-*.parquet",
    },
)

sample = ds["train"][0]
image0 = sample["image_0"]
mask0 = sample["mask_0"]
depth0 = sample["depth_0"]
object_only0 = sample["object_only_0"]
selected_frames = json.loads(sample["selected_frames_json"])
sequence_annotation = json.loads(sample["sequence_annotation_json"])
frame_annotations = json.loads(sample["frame_annotations_json"])
camera_poses = np.load(BytesIO(sample["camera_poses_npz"]))
```

## Notes

- All media columns are embedded in the Parquet shards, so the export is self-contained.
- `selected_frames_json` is the exact per-sequence metadata produced by the probe builder.
- `assets/` contains the category distribution figures copied from the source export.
- Verify the upstream dataset redistribution terms and set the final Hugging Face metadata before publishing.
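
Because each row stores its 8 frames as separate columns (`image_0..7`, `mask_0..7`, `depth_0..7`, `object_only_0..7`), it can be convenient to regroup them into ordered lists. The following is a minimal sketch of one way to do that. The helper names `gather_frames` and `frames_by_kind` are hypothetical and not part of the export, and the sketch makes no assumption about how each media value is decoded (raw bytes, an image object, or something else); it only collects whatever the row holds.

```python
# Hypothetical helpers (not part of the export) for regrouping the
# per-frame media columns of one row into ordered lists.

FRAMES_PER_SEQUENCE = 8  # "exactly 8 selected frames" per the Summary
MEDIA_PREFIXES = ("image", "mask", "depth", "object_only")


def gather_frames(row, prefix, num_frames=FRAMES_PER_SEQUENCE):
    """Return the values of `<prefix>_0` .. `<prefix>_{num_frames-1}` in frame order.

    Raises KeyError if any expected column is missing from the row.
    """
    return [row[f"{prefix}_{i}"] for i in range(num_frames)]


def frames_by_kind(row):
    """Map each media prefix to its ordered list of per-frame values."""
    return {prefix: gather_frames(row, prefix) for prefix in MEDIA_PREFIXES}
```

With a row loaded as in the Loading Example, `frames_by_kind(sample)["image"]` would then give the eight `image_*` values in slot order, leaving decoding to the caller.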


Provider: Caesarrr
