Voxel51/toon3d

Name: Voxel51/toon3d
Creator: Voxel51
Published: 2026-03-17 22:25:20
License: No description available

Hugging Face | Updated 2026-03-17 | Indexed 2026-04-05

Download link:

https://hf-mirror.com/datasets/Voxel51/toon3d


Resource overview:

---
annotations_creators: []
language: en
size_categories:
- n<1K
task_ids: []
pretty_name: toon3d
tags:
- fiftyone
- group
- 3d
dataset_summary: >
  This is a [FiftyOne](https://github.com/voxel51/fiftyone) dataset with 12 samples.

  ## Installation

  If you haven't already, install FiftyOne:

  ```bash
  pip install -U fiftyone
  ```

  ## Usage

  ```python
  import fiftyone as fo
  from fiftyone.utils.huggingface import load_from_hub

  # Load the dataset
  # Note: other available arguments include 'max_samples', etc
  dataset = load_from_hub("Voxel51/toon3d")

  # Launch the App
  session = fo.launch_app(dataset)
  ```
---

# Dataset Card for toon3d

![image/png](toon3d.gif)

This is a [FiftyOne](https://github.com/voxel51/fiftyone) dataset with 12 samples.

## Installation

If you haven't already, install FiftyOne:

```bash
pip install -U fiftyone
```

## Usage

```python
import fiftyone as fo
from huggingface_hub import snapshot_download

# Download the dataset snapshot to the current working directory
snapshot_download(
    repo_id="Voxel51/toon3d",
    local_dir=".",
    repo_type="dataset"
)

# Load dataset from current directory using FiftyOne's native format
dataset = fo.Dataset.from_dir(
    dataset_dir=".",  # Current directory contains the dataset files
    dataset_type=fo.types.FiftyOneDataset,  # Specify FiftyOne dataset format
    name="toon3d"  # Assign a name to the dataset for identification
)
```

## Dataset Details

### Dataset Description

Toon3D is a multi-view cartoon scene reconstruction dataset introduced in the paper *"Toon3D: Seeing Cartoons from New Perspectives"* (Weber et al., arXiv:2405.10320, 2024). It contains 12 scenes from popular hand-drawn cartoons and anime, each comprising 5–12 frames that depict the same environment from geometrically inconsistent viewpoints.

This FiftyOne dataset wraps the Toon3D scenes together with the outputs of the authors' custom Structure-from-Motion (SfM) pipeline, which recovers camera poses and dense 3D point clouds from the geometrically inconsistent input images.

- **Curated by:** Ethan Weber, Riley Peterlinz, Rohan Mathur, Frederik Warburg, Alexei A. Efros, Angjoo Kanazawa (UC Berkeley / Teton.ai)
- **Paper:** [arXiv:2405.10320](https://arxiv.org/abs/2405.10320)
- **Project page:** https://toon3d.studio/
- **Dataset repository:** https://github.com/ethanweber/toon3d-dataset

---

## FiftyOne Dataset Structure

### Overview

| Property | Value |
|---|---|
| Dataset name | `toon3d` |
| FiftyOne media type | `group` |
| Number of groups | 12 |
| Number of samples | 92 |
| Default group slice | `frame_00` |
| Total named slices | 13 |

### Groups

Each **group** corresponds to one cartoon scene. There are 12 groups, one per scene:

| Group (scene tag) | Frame slices | Total samples (frames + 3D) |
|---|---|---|
| `avatar-house` | `frame_00` – `frame_07` | 9 |
| `bobs-burgers` | `frame_00` – `frame_06` | 8 |
| `bojak-room` | `frame_00` – `frame_11` | 13 |
| `family-guy-dining` | `frame_00` – `frame_06` | 8 |
| `family-guy-house` | `frame_00` – `frame_05` | 7 |
| `krusty-krab` | `frame_00` – `frame_08` | 10 |
| `magic-school-bus` | `frame_00` – `frame_04` | 6 |
| `mystery-machine` | `frame_00` – `frame_05` | 7 |
| `planet-express` | `frame_00` – `frame_04` | 6 |
| `simpsons-house` | `frame_00` – `frame_04` | 6 |
| `smith-residence` | `frame_00` – `frame_03` | 5 |
| `spirited-away` | `frame_00` – `frame_05` | 7 |

Groups have a variable number of frame slices (4–12). Slice names are zero-padded to two digits (`frame_00`–`frame_11`) and are consistent across all groups; scenes with fewer frames simply have no sample in the higher-indexed slices.

### Slices

#### Frame slices: `frame_00` through `frame_11`

Each frame slice sample represents one original cartoon image from the scene.

**`filepath`** points to the original cartoon PNG (`toon3d-dataset/<scene>/images/<NNNNN>.png`). Images are resized to a maximum of 960 × 720 px during preprocessing.
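
The group and slice layout described in this section can be cross-checked with plain Python, without FiftyOne or the dataset files. This is only a minimal sketch: the frame counts are copied from the group table, and `FRAME_COUNTS`, `slice_names`, `total_samples` and `all_slice_names` are illustrative names, not part of any API.

```python
# Frame counts per scene, copied from the group table (assumption: the table is accurate).
FRAME_COUNTS = {
    "avatar-house": 8,
    "bobs-burgers": 7,
    "bojak-room": 12,
    "family-guy-dining": 7,
    "family-guy-house": 6,
    "krusty-krab": 9,
    "magic-school-bus": 5,
    "mystery-machine": 6,
    "planet-express": 5,
    "simpsons-house": 5,
    "smith-residence": 4,
    "spirited-away": 6,
}


def slice_names(scene):
    """Return the populated slice names for a scene: its frame slices plus scene_3d."""
    frames = [f"frame_{i:02d}" for i in range(FRAME_COUNTS[scene])]
    return frames + ["scene_3d"]


def total_samples():
    """Count samples across all groups (frames plus one 3D sample per scene)."""
    return sum(len(slice_names(scene)) for scene in FRAME_COUNTS)


def all_slice_names():
    """Return the sorted union of slice names used by any scene."""
    return sorted({name for scene in FRAME_COUNTS for name in slice_names(scene)})


print(total_samples())         # 92
print(len(all_slice_names()))  # 13
```

The two printed values match the "Number of samples" and "Total named slices" rows of the overview table.
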

**Fields:**

| Field | Type | Description |
|---|---|---|
| `group` | `EmbeddedDocumentField(Group)` | Group membership: id shared across all slices of the same scene, name is the slice identifier (e.g. `frame_02`) |
| `tags` | `ListField(StringField)` | Single-element list containing the scene name, e.g. `["bobs-burgers"]` |
| `depth` | `EmbeddedDocumentField(Heatmap)` | Marigold monocular depth estimate visualised as a heatmap. `map_path` points to the depth colormap PNG at `toon3d-dataset/<scene>/depth-images/<NNNNN>.png` |
| `keypoints` | `EmbeddedDocumentField(Keypoints)` | Human-annotated 2D sparse correspondences from the Toon3D Labeler, stored as a single `fo.Keypoint` with `label="correspondence"`. Coordinates are normalised to `[0, 1]`. Only valid (visible) points are included. `None` if no valid points exist for the frame. |
| `scene` | `StringField` | Scene name, e.g. `"bobs-burgers"` |
| `frame_idx` | `IntField` | Zero-based frame index within the scene (matches the SfM frame order after filtering invalid images) |
| `n_frames` | `IntField` | Total number of valid frames in this scene |
| `camera` | `DictField` | Camera parameters recovered by the Toon3D SfM pipeline (see below) |

**`camera` dict schema:**

| Key | Type | Description |
|---|---|---|
| `fl_x` | `float` | Learned focal length in pixels, x-axis |
| `fl_y` | `float` | Learned focal length in pixels, y-axis |
| `cx` | `float` | Principal point x (half image width) |
| `cy` | `float` | Principal point y (half image height) |
| `w` | `int` | Image width in pixels |
| `h` | `int` | Image height in pixels |
| `transform_matrix` | `list[list[float]]` | 4×4 camera-to-world matrix in OpenCV / Nerfstudio convention. Camera 0 is placed at the world origin; all other cameras are expressed relative to it. |

The `transform_matrix` coordinate convention matches Nerfstudio's OPENCV camera model.
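
To make the intrinsic keys of the `camera` dict concrete, here is a minimal sketch of how they could be used. It assumes a standard distortion-free OpenCV pinhole model (x right, y down, z forward in camera space) and that depth is measured along +Z; the helper names and the numeric values are hypothetical and chosen only for illustration.

```python
def intrinsics_matrix(camera):
    """Build the 3x3 pinhole matrix K from a `camera` dict."""
    return [
        [camera["fl_x"], 0.0, camera["cx"]],
        [0.0, camera["fl_y"], camera["cy"]],
        [0.0, 0.0, 1.0],
    ]


def project(camera, point_cam):
    """Project a camera-space point (x right, y down, z forward) to pixel coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = camera["fl_x"] * x / z + camera["cx"]
    v = camera["fl_y"] * y / z + camera["cy"]
    return u, v


def backproject(camera, u, v, depth):
    """Lift pixel (u, v) with depth along +Z back to a camera-space point."""
    x = (u - camera["cx"]) * depth / camera["fl_x"]
    y = (v - camera["cy"]) * depth / camera["fl_y"]
    return x, y, depth


# Hypothetical values for illustration only; real values come from sample["camera"].
camera = {"fl_x": 800.0, "fl_y": 800.0, "cx": 480.0, "cy": 360.0, "w": 960, "h": 720}

u, v = project(camera, (0.5, -0.25, 2.0))
print(u, v)  # 680.0 260.0
```

Moving a back-projected point into world space would additionally require multiplying by the 4×4 `transform_matrix`; that step is left out here.
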
A coordinate flip `diag(1, -1, -1)` is applied to the internal toon3d rotation before writing, converting from the SfM-internal convention to Nerfstudio convention. In world space, camera 0 looks in the **+Z** direction with **−Y** as the image-up axis.

---

#### `scene_3d` slice

Each group has exactly one `scene_3d` sample. Its **`filepath`** points to a `.fo3d` scene file (written by `fo3d.Scene.write(..., resolve_relative_paths=True)`).

**Fields:**

| Field | Type | Description |
|---|---|---|
| `group` | `EmbeddedDocumentField(Group)` | Group membership: same group id as the frame slices for this scene |
| `tags` | `ListField(StringField)` | `[scene_name]` |
| `scene` | `StringField` | Scene name |
| `n_frames` | `IntField` | Number of per-frame PLY nodes in the scene |

**`.fo3d` scene contents:**

Each `.fo3d` file contains:

- **`camera`**: a `PerspectiveCamera` with:
  - `up = "-Y"`: consistent with the SfM coordinate convention where image-up maps to world −Y
  - `position`: auto-computed as the point cloud centroid offset by 1.5× the scene diagonal along −Z, placing the viewer behind the scene looking forward
  - `look_at`: centroid of the combined `all.ply` point cloud
  - `fov = 50.0°`, `near = 0.1`, `far = 2000.0`
- **N `PlyMesh` nodes** (one per frame) with:
  - `name` = `"frame_NNNNN"` (zero-padded 5 digits)
  - `ply_path`: absolute path to `outputs/<scene>/run/<timestamp>/nerfstudio/plys/<NNNNN>.ply`
  - `is_point_cloud = False` (set to `True` for point cloud rendering)
  - `center_geometry = False`: the PLYs are already in SfM world space and must not be re-centred
  - `default_material`: `MeshBasicMaterial` with a distinct per-frame colour (cycles through 13 colours: `#e63946`, `#457b9d`, `#2a9d8f`, `#e9c46a`, `#f4a261`, `#264653`, `#a8dadc`, `#bc4749`, `#6a994e`, `#7209b7`, `#3a0ca3`, `#f77f00`, `#4cc9f0`)

The PLY format is ASCII with per-vertex fields `x y z red green blue` (float32 x/y/z, uint8 RGB).
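
The PLY layout and the viewer placement rule described for the `.fo3d` camera can be sketched in plain Python. This is an illustration, not the build script's code: it assumes "scene diagonal" means the axis-aligned bounding-box diagonal, the header shown is a typical ASCII PLY header rather than one copied from the dataset, and `read_ascii_ply` and `viewer_position` are hypothetical helper names.

```python
import io
import math


def read_ascii_ply(stream):
    """Parse an ASCII PLY whose vertices are `x y z red green blue`.

    Returns a list of (x, y, z) tuples and a list of (r, g, b) tuples.
    """
    lines = iter(stream)
    n_vertices = 0
    for line in lines:
        tokens = line.split()
        if tokens[:2] == ["element", "vertex"]:
            n_vertices = int(tokens[2])
        elif tokens == ["end_header"]:
            break
    points, colors = [], []
    for _ in range(n_vertices):
        values = next(lines).split()
        points.append(tuple(float(v) for v in values[:3]))
        colors.append(tuple(int(v) for v in values[3:6]))
    return points, colors


def viewer_position(points, scale=1.5):
    """Return (centroid, centroid offset by scale x bounding-box diagonal along -Z)."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    lo = [min(p[i] for p in points) for i in range(3)]
    hi = [max(p[i] for p in points) for i in range(3)]
    diagonal = math.dist(lo, hi)  # assumption: axis-aligned bounding-box diagonal
    position = (centroid[0], centroid[1], centroid[2] - scale * diagonal)
    return centroid, position


# Tiny hand-written point cloud, for illustration only.
PLY_TEXT = """ply
format ascii 1.0
element vertex 2
property float x
property float y
property float z
property uchar red
property uchar green
property uchar blue
end_header
0.0 0.0 0.0 230 57 70
2.0 0.0 0.0 69 123 157
"""

points, colors = read_ascii_ply(io.StringIO(PLY_TEXT))
look_at, position = viewer_position(points)
print(look_at)   # (1.0, 0.0, 0.0)
print(position)  # (1.0, 0.0, -3.0)
```

For a real file, the same function could be called on an opened per-frame PLY; large point clouds would be better served by a dedicated PLY library.
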
Points are in SfM world space, back-projected from Marigold depth maps through the recovered camera poses. All per-frame PLYs share the same coordinate system and can be composed directly.

---

### How the Dataset Was Built

1. **SfM outputs generated** with `tnd-run` (from the `toon3d` package) for all 12 scenes. Each run reads `toon3d-dataset/<scene>/` (images, `.pt` depth tensors, `points.json`) and writes Nerfstudio-format outputs to `outputs/<scene>/run/<timestamp>/nerfstudio/`.
2. **`.fo3d` scene files written**, one per scene, into `outputs/<scene>/run/<timestamp>/nerfstudio/fo3d/scene.fo3d`. The camera position is computed programmatically from the `all.ply` centroid and extent.
3. **FiftyOne grouped dataset created** with `fo.Dataset.add_group_field("group", default="frame_00")`. All 92 samples are added in a single `dataset.add_samples(all_samples)` call.

The build script is `build_dataset.py` at the root of this workspace.

---

## Citation

```bibtex
@inproceedings{weber2024toon3d,
  title     = {Toon3D: Seeing Cartoons from New Perspectives},
  author    = {Ethan Weber and Riley Peterlinz and Rohan Mathur and Frederik Warburg and Alexei A. Efros and Angjoo Kanazawa},
  booktitle = {arXiv:2405.10320},
  year      = {2024},
}
```

Provided by:

Voxel51
